
Re: [Condor-users] How to solve problem between condor and globus?



Hi Fu-Ming (I suppose this is your first name),

I can see in the condor_config that osgc01.grid.sinica.edu.tw is the central manager, and you are submitting the job to this host. I suppose you have Globus installed on it with Condor as the scheduler.

So why do you use the Globus Universe and not Vanilla?
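
For comparison, a vanilla-universe version of your submit file might look like the sketch below. It is only an illustration, assuming you submit directly to the local Condor pool on osgc01 instead of going through the Globus jobmanager. The file names are the ones from your job4.jdl, the Requirements line is left out so that any machine can match, and the transfer attribute is spelled should_transfer_files.

```
# Sketch only: job4.jdl rewritten for the vanilla universe.
# Assumption: the job is submitted straight to the local pool, not via Globus.
Universe                = vanilla
Executable              = job4.sh
Output                  = job4.out
Error                   = job4.err
Log                     = job4.log
should_transfer_files   = IF_NEEDED
when_to_transfer_output = ON_EXIT
Queue
```

If this runs, the pool itself is working and the remaining problem is on the Globus side.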

More important are the requirements in the condor_config. Try replacing UWCS_* with TESTINGMODE_*. You can find these settings after Part 3, in the section that starts with:
#####################################################################
##  This where you choose the configuration that you would like to
##  use.  It has no defaults so it must be defined.  We start this
##  file off with the UWCS_* policy.
###################################################################### 
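
In the stock condor_config this banner is followed by a block of policy assignments. A sketch of the change, assuming your file still has the default layout (the exact list of macros can differ between Condor versions, so check the names against your own file):

```
# Sketch only: switch the policy from UWCS_* to TESTINGMODE_*.
# Default form of each line is e.g.  START = $(UWCS_START)
WANT_SUSPEND = $(TESTINGMODE_WANT_SUSPEND)
WANT_VACATE  = $(TESTINGMODE_WANT_VACATE)
START        = $(TESTINGMODE_START)
SUSPEND      = $(TESTINGMODE_SUSPEND)
CONTINUE     = $(TESTINGMODE_CONTINUE)
PREEMPT      = $(TESTINGMODE_PREEMPT)
KILL         = $(TESTINGMODE_KILL)
```

With the testing policy the machines should always be willing to start jobs and never suspend or preempt them, which removes the machine-side policy as a possible cause. Run condor_reconfig afterwards so the daemons re-read the file.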

I am sending you one of my config files as an example.

Pedro

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Fu-Ming Tsai
Sent: Thursday, 22 December 2005 09:44
To: Condor-Users Mail List
Subject: Re: [Condor-users] How to solve problem between condor and globus?

Hello, Pedro,
Please refer to the attached file. It is my global condor_config file.

The following is my submit file.
[root@osgc01 root]# more /home/sary357/job/job4.jdl
Universe        = globus
globusscheduler = osgc01.grid.sinica.edu.tw/jobmanager-condor
Executable      = job4.sh
Output          = job4.out
Error           = job4.err
Log             = job4.log
Requirements    = (Name=="vm2@xxxxxxxxxxxxxxxxxxxxxxxxx")
should_transer_file =  IF_NEEDED
when_to_transfer_output = ON_EXIT
Queue
[root@osgc01 root]# more /home/sary357/job/job4.sh
#!/bin/bash
/bin/hostname

Thank you for your attention!!

BR
On Wed, 21 Dec 2005 19:12:40 +0100, Pedro R. Br輍ger Taboada wrote
> I see several possible problems: staging, universe and expression. I need to see 
> the submit file and the condor_config file. Perhaps then I can solve 
> your problem.
> 
> Pedro
> 
> -----Ursprgliche Nachricht-----
> Von: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx] Im Auftrag von Fu-Ming Tsai
> Gesendet: Dienstag, 20. Dezember 2005 11:06
> An: Condor-Users Mail List
> Betreff: Re: [Condor-users] How to solve problem between condor and globus?
> 
> Sorry, all,
> After trying many times, I gave up and used NFS.
> However, I still cannot submit a Globus job to Condor,
> so I tried to get some debug information.
> 
> [sary357@osgc01 job]$ condor_q -analyze
> ---
> 4206.000:  Run analysis summary.  Of 4 machines,
>       3 are rejected by your job's requirements
>       0 reject your job because of their own requirements
>       0 match but are serving users with a better priority in the pool
>       1 match but reject the job for unknown reasons
>       0 match but will not currently preempt their existing job
>       0 are available to run your job
> 
> WARNING: Analysis is only meaningful for Globus universe jobs using matchmaking.
> ---
> 4207.000:  Run analysis summary.  Of 4 machines,
>       0 are rejected by your job's requirements
>       3 reject your job because of their own requirements
>       0 match but are serving users with a better priority in the pool
>       1 match but reject the job for unknown reasons
>       0 match but will not currently preempt their existing job
>       0 are available to run your job
>         Last successful match: Tue Dec 20 09:45:24 2005
>         Last failed match: Tue Dec 20 09:55:31 2005
>         Reason for last match failure: no match found
> 
> == StarterLog.vm2==
> 12/20 17:35:36 Shadow version: $CondorVersion: 6.7.7 Apr 27 2005 $
> 12/20 17:35:36 Submitting machine is "osgc01.grid.sinica.edu.tw"
> 12/20 17:35:36 ShouldTransferFiles is "NO", NOT transfering files
> 12/20 17:35:36 Submit UidDomain: "grid.sinica.edu.tw"
> 12/20 17:35:36  Local UidDomain: "grid.sinica.edu.tw"
> 12/20 17:35:36 Initialized user_priv as "sary357"
> 
> 12/20 17:35:36 Done moving to directory "/opt/osg/osgs01/execute/dir_6591"
> 
> 12/20 17:35:36 JICShadow::initIOProxy(): Job does not define WantIOProxy
> 12/20 17:35:36 No StarterUserLog found in job ClassAd
> 12/20 17:35:36 Starter will not write a local UserLog
> 12/20 17:35:36 Starting a VANILLA universe job with ID: 4207.0
> 12/20 17:35:36 In OsProc::OsProc()
> 12/20 17:35:36 Main job KillSignal: 15 (SIGTERM)
> 12/20 17:35:36 Main job RmKillSignal: 15 (SIGTERM)
> 12/20 17:35:36 Main job HoldKillSignal: 15 (SIGTERM)
> 12/20 17:35:36 in VanillaProc::StartJob()
> 12/20 17:35:36 in OsProc::StartJob()
> 12/20 17:35:36 IWD: /home/sary357/gram_scratch_tUb21E3Wqv
> 12/20 17:35:36 Input file: /dev/null
> 12/20 17:35:36 Failed to open '/home/sary357/.globus/job/osgc01.grid.sinica.edu.tw/17186.1135070994/stdout' as standard output: No such file or directory (errno 2)
> 12/20 17:35:36 Failed to open '/home/sary357/.globus/job/osgc01.grid.sinica.edu.tw/17186.1135070994/stderr' as standard error: No such file or directory (errno 2)
> 12/20 17:35:36 Failed to open some/all of the std files...
> 12/20 17:35:36 Aborting OsProc::StartJob.
> 12/20 17:35:36 Failed to start job, exiting
> 12/20 17:35:36 ShutdownFast all jobs.
> 12/20 17:35:36 Got ShutdownFast when no jobs running.
> 12/20 17:35:36 Removing /opt/osg/osgs01/execute/dir_6591
> 
> 12/20 17:35:36 Attempting to remove /opt/osg/osgs01/execute/dir_6591 as SuperUser (root)
> =========================
> 
> [sary357@osgc01 job]$ condor_q -better-analyze 4206
> 
> -- Submitter: osgc01.grid.sinica.edu.tw : <140.109.98.41:41846> : osgc01.grid.sinica.edu.tw
> ---
> 4206.000:  Run analysis summary.  Of 4 machines,
>       3 are rejected by your job's requirements
>       0 reject your job because of their own requirements
>       0 match but are serving users with a better priority in the pool
>       1 match but reject the job for unknown reasons
>       0 match but will not currently preempt their existing job
>       0 are available to run your job
> 
> The Requirements expression for your job is:
> 
> ( ( target.Name == "vm2@xxxxxxxxxxxxxxxxxxxxxxxxx" ) )
> 
>     Condition                         Machines Matched    Suggestion
>     ---------                         ----------------    ----------
> 1   ( ( target.Name == "vm2@xxxxxxxxxxxxxxxxxxxxxxxxx" ) )
>                                       1
> 
> WARNING: Analysis is only meaningful for Globus universe jobs using matchmaking.
> [sary357@osgc01 job]$ condor_q -better-analyze 4207
> 
> -- Submitter: osgc01.grid.sinica.edu.tw : <140.109.98.41:41846> : osgc01.grid.sinica.edu.tw
> Segmentation fault
> 
> I'm sure the FileDomain on those 2 machines is the same.
> It looks like the output file and the error file cannot be created. Does 
> anyone know why?
> 
> BR
> 
> ----------------------------------------------------------------------
> "Gravitation is not responsible for people falling in love."
> 
> Fu-Ming Tsai
> Academia Sinica Computing Centre, Academia Sinica
> sary357@xxxxxxxxxxxxxxxxxx
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Condor-users mailing list
> Condor-users@xxxxxxxxxxx
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users


Attachment: condor_config
Description: Binary data