Re: [Condor-users] [Globus-discuss] Job submission from GT4 to condor 6.7.10 (job never executes)



On Fri, 2005-09-30 at 00:55 -0700, Mustansar Mehmood wrote:
> Hi,
>    I am using GT4 with Condor 6.7.10 on a Fedora Core 4
> machine. I am able to submit and execute a job with
> GRAM without using Condor (without bringing Condor into
> the picture). I can also create a job ad and submit it
> through Condor by using condor_submit with
> this ClassAd file:
> --------------------------------------------
> universe       = grid
>   grid_type      = gt4
>   executable     = /bin/hostname
>   log            = ad.log
>   output         = ad.musti_ouput
>   error          = ad.error
>   globusscheduler = https://ucf-7.linuxclass.marist.edu:8443
>   jobmanager_type = Fork
>   should_transfer_files = YES
>   when_to_transfer_output = ON_EXIT
>   queue
>  -------------------------------------
> 
> Even this scenario works for me:
> globusrun-ws -submit -Ft Condor -S -o job.epr -b -c /bin/touch touched_it
> 
> which creates a ClassAd like this:
> ---------------------------------------------
> #
> # description file for condor submission
> #
> Universe = vanilla
> Notification = Never
> Executable = /bin/touch
> Requirements = OpSys == "LINUX"  && Arch == "INTEL"
> Environment = GLOBUS_LOCATION=/usr/local/globus;X509_CERT_DIR=/etc/grid-security/certificates;X509_USER_PROXY=;X509_USER_CERT=;X509_USER_KEY=;HOME=/home/globus;LOGNAME=globus;JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun-1.5.0.03/jre;GLOBUS_GRAM_JOB_HANDLE=https://<ip>:8443/wsrf/services/ManagedExecutableJobService?b7b4c80a-317c-11da-99e0-000d60eb0162;LD_LIBRARY_PATH=
> Arguments = touched_it
> InitialDir = /home/globus
> Input = /dev/null
> Log = /usr/local/globus/var/globus-condor.log
> log_xml = True
> #Extra attributes specified by client
> 
> Output = /dev/null
> Error = /dev/null
> ---------------------------------------------------
> 
> 
> --------------------PROBLEM-CAUSING SUBMISSION SCENARIO--------------------
> But once I use this command syntax:
> globusrun-ws -submit -factory https://<ip>:8443/wsrf/services/ManagedJobFactoryService -Ft Condor -f /usr/local/globus/test/globus_wsrf_gram_service_java_test_unit/test.xml
> 
> for example:
> globusrun-ws -submit -factory https://148.100.51.27:8443/wsrf/services/ManagedJobFactoryService -factory-type Condor -f /usr/local/globus/test/globus_wsrf_gram_service_java_test_unit/test.xml
> Submitting job...Done.
> Job ID: uuid:885a1146-3186-11da-883e-000d60eb0162
> Termination time: 10/01/2005 07:48 GMT
> Current job state: Pending
> 
> my job stays "Pending" forever. It does create a
> ClassAd for the RSL, but the job never executes, and I don't see
> any errors on my container side.
> ----------------RSL-------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <job>
>     <executable>/bin/hostname</executable>

You changed this to execute 'hostname' instead of 'echo', but neglected
to remove the <argument> elements.  This will most certainly cause
hostname to fail.  Condor just isn't failing the job for some reason.

Peter
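
A minimal sketch of that fix, assuming the job description is the un-namespaced XML quoted in this message: the Python below drops every <argument> element so that /bin/hostname is invoked with no arguments. JOB_XML is a trimmed copy of the quoted test.xml, and strip_arguments is a hypothetical helper written for illustration, not part of Globus or Condor.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the job description quoted in this thread (assumption:
# the file has no XML namespace, as in the quoted text).
JOB_XML = """<job>
    <executable>/bin/hostname</executable>
    <directory>${GLOBUS_USER_HOME}</directory>
    <argument>12</argument>
    <argument>abc</argument>
    <stdout>${GLOBUS_USER_HOME}/stdout</stdout>
    <stderr>${GLOBUS_USER_HOME}/stderr</stderr>
    <count>1</count>
</job>"""


def strip_arguments(job_xml: str) -> str:
    """Return the job description with every <argument> child removed."""
    job = ET.fromstring(job_xml)
    # findall() returns a list, so removing children while looping is safe.
    for arg in job.findall("argument"):
        job.remove(arg)
    return ET.tostring(job, encoding="unicode")


cleaned = strip_arguments(JOB_XML)
print(cleaned)
```

The cleaned text could then be saved to a new file and passed to globusrun-ws with -f in place of the original test.xml.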

>     <directory>${GLOBUS_USER_HOME}</directory>
>     <argument>12</argument>
>     <argument>abc</argument>
>     <argument>34</argument>
>     <argument>pdscaex_instr_GrADS_grads23_28919.cfg</argument>
>     <argument>pgwynnel was here</argument>
>     <environment>
>         <name>PI</name>
>         <value>3.141</value>
>     </environment>
>     <environment>
>         <name>GLOBUS_DUROC_SUBJOB_INDEX</name>
>         <value>0</value>
>     </environment>
>     <stdout>${GLOBUS_USER_HOME}/stdout</stdout>
>     <stderr>${GLOBUS_USER_HOME}/stderr</stderr>
>     <count>1</count>
>     <jobType>multiple</jobType>
> </job>
> ----------------------------------------------
> It creates a ClassAd file for Condor, which looks like this:
> -------------------CLASS_AD-----------------------------
> #
> # description file for condor submission
> #
> Universe = vanilla
> Notification = Never
> Executable = /bin/hostname
> Requirements = OpSys == "LINUX"  && Arch == "INTEL"
> Environment =
> PI=3.141;GLOBUS_DUROC_SUBJOB_INDEX=0;GLOBUS_LOCATION=/usr/local/globus;X509_CERT_DIR=/etc/grid-security/certificates;X509_USER_PROXY=;X509_USER_CERT=;X509_USER_KEY=;HOME=/home/globus;LOGNAME=globus;JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun-1.5.0.03/jre;GLOBUS_GRAM_JOB_HANDLE=https://148.100.51.27:8443/wsrf/services/ManagedExecutableJobService?ce468320-3171-11da-8d85-000d60eb0162;LD_LIBRARY_PATH=
> Arguments = 12 abc 34 pdscaex_instr_GrADS_grads23_28919.cfg pgwynnel was here
> InitialDir = /home/globus
> Input = /dev/null
> Log = /usr/local/globus/var/globus-condor.log
> log_xml = True
> #Extra attributes specified by client
> 
> Output = /home/globus/stdout
> Error = /home/globus/stderr
> queue 1
> ---------------------------------------------------------
> 
> Here are the last few entries from the scheduler log file:
> ------------------------------------------------
> 9/30 02:57:14 (pid:8885) Activity on stashed negotiator socket
> 9/30 02:57:14 (pid:8885) Negotiating for owner: KBPSD@xxxxxxxxxxxxxxxxxxxxxxxxxxx
> 9/30 02:57:14 (pid:8885) Checking consistency running and runnable jobs
> 9/30 02:57:14 (pid:8885) Tables are consistent
> 9/30 02:57:15 (pid:8885) Out of servers - 0 jobs matched, 4 jobs idle, 4 jobs rejected
> 9/30 02:57:15 (pid:8885) Activity on stashed negotiator socket
> 9/30 02:57:15 (pid:8885) Negotiating for owner: globus@xxxxxxxxxxxxxxxxxxxxxxxxxxx
> 9/30 02:57:15 (pid:8885) Checking consistency running and runnable jobs
> 9/30 02:57:15 (pid:8885) Tables are consistent
> 9/30 02:57:15 (pid:8885) Out of servers - 0 jobs matched, 4 jobs idle, 4 jobs rejected
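
For reading entries like these in bulk, a small sketch, assuming each log entry sits on one line in the actual file (the line breaks above come from mail wrapping): the Python below pulls the matched/idle/rejected counts out of the "Out of servers" lines. The regular expression and the negotiation_summaries helper are illustrative assumptions, not a Condor tool, and the counts alone do not say why the jobs were rejected.

```python
import re

# Sample entries copied from the quoted log, with wrapped lines rejoined.
LOG_LINES = [
    "9/30 02:57:15 (pid:8885) Tables are consistent",
    "9/30 02:57:15 (pid:8885) Out of servers - 0 jobs matched, "
    "4 jobs idle, 4 jobs rejected",
]

# Matches the summary text exactly as it appears in the quoted log.
SUMMARY = re.compile(r"(\d+) jobs matched, (\d+) jobs idle, (\d+) jobs rejected")


def negotiation_summaries(lines):
    """Return one dict of counts per line that carries a negotiation summary."""
    results = []
    for line in lines:
        found = SUMMARY.search(line)
        if found:
            matched, idle, rejected = (int(n) for n in found.groups())
            results.append({"matched": matched, "idle": idle, "rejected": rejected})
    return results


for summary in negotiation_summaries(LOG_LINES):
    print(summary)
```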
> 
> -------------------------------globus-condor.conf--------
> /usr/local/globus/etc]$ cat globus-condor.conf
> log_path=/usr/local/globus/var/globus-condor.log
> -------------------------------------------------
> 
> --------------globus-condor.log------------------------
> <c>
>     <a n="MyType"><s>SubmitEvent</s></a>
>     <a n="EventTypeNumber"><i>0</i></a>
>     <a n="EventTime"><s>2005-09-30T03:48:05</s></a>
>     <a n="Cluster"><i>68</i></a>
>     <a n="Proc"><i>0</i></a>
>     <a n="Subproc"><i>0</i></a>
>     <a n="SubmitHost"><s>&lt;<ip>:59194&gt;</s></a>
> </c>
> 
> 
> 
> Analysis and Questions:
> =======================
> Possibly my syntax for submitting the job is wrong (if so,
> could someone please correct me).
> Secondly, my machine is not the central manager in the
> Condor pool. Could that be the problem, since apparently
> I am not referring to the central manager? I think it
> should be automatic, though, since I can submit other jobs
> without bringing the pool's central manager into
> the picture.
> Otherwise, there is something wrong with my Condor or Globus
> configuration.
> I have been stuck on this problem for a while, but
> somehow people are too busy, or maybe I did not give a
> clear enough description of the problem. Please let me
> know if there is anything I need to do to fix it. Any
> help will be appreciated. Thanks in advance.
> Mustansar
> Marist College, Poughkeepsie, NY
