
Re: [Condor-users] How to solve problem between condor and globus?



Thank you, Pedro,
  I want to run Globus jobs because Condor here is used for OSG. Could
you also explain what the "TESTINGMODE_*" and "UWCS_*" parameters mean?
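For reference, here is a sketch of what that switch looks like. It assumes the stock Condor 6.7 condor_config layout; macro names and defaults differ between releases, so compare it against your own file. In Part 3, the generic policy macros (START, SUSPEND, CONTINUE, PREEMPT, KILL and friends) are each defined in terms of one predefined policy. As I understand the stock file, UWCS_* is the desktop-style policy from UW Computer Sciences: a machine only starts jobs when its keyboard has been idle and its load is low, and it suspends or evicts them when the owner comes back. TESTINGMODE_* is a simple testing policy: always start, never suspend or preempt. Under UWCS_START a recently used or busy machine refuses jobs, and condor_q -analyze counts such machines among those that "reject your job because of their own requirements".

```
## Part 3 policy selection (sketch; stock macro names assumed)
## UWCS_* policy, the default in the stock file, commented out:
#WANT_SUSPEND = $(UWCS_WANT_SUSPEND)
#WANT_VACATE  = $(UWCS_WANT_VACATE)
#START        = $(UWCS_START)
#SUSPEND      = $(UWCS_SUSPEND)
#CONTINUE     = $(UWCS_CONTINUE)
#PREEMPT      = $(UWCS_PREEMPT)
#KILL         = $(UWCS_KILL)

## TESTINGMODE_* policy: start jobs whenever the slot is free and
## never suspend, vacate or kill them on behalf of the machine owner.
WANT_SUSPEND = $(TESTINGMODE_WANT_SUSPEND)
WANT_VACATE  = $(TESTINGMODE_WANT_VACATE)
START        = $(TESTINGMODE_START)
SUSPEND      = $(TESTINGMODE_SUSPEND)
CONTINUE     = $(TESTINGMODE_CONTINUE)
PREEMPT      = $(TESTINGMODE_PREEMPT)
KILL         = $(TESTINGMODE_KILL)
```

After editing the file on the execute machines, running condor_reconfig there should make the startd pick up the new expressions, and condor_config_val START shows which definition is in effect.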

BR

On Fri, 23 Dec 2005 11:22:11 +0100, Pedro R. Brügger Taboada wrote
> Hi Fu-Ming (I suppose this is your first name),
> 
> I can see in the condor_config that osgc01.grid.sinica.edu.tw is the
> central manager and that you're submitting the job to this host. I
> suppose you have Globus on it, with Condor as the scheduler.
> 
> So why do you use the Globus universe and not the vanilla universe?
> 
> More important are the requirements (policy expressions) in the
> condor_config. Try replacing UWCS_* with TESTINGMODE_*. You can find
> these settings in Part 3, after this section:
> #####################################################################
> ##  This where you choose the configuration that you would like to
> ##  use.  It has no defaults so it must be defined.  We start this
> ##  file off with the UWCS_* policy.
> ######################################################################
> 
> I am sending you one of my config files as an example.
> 
> Pedro
> 
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Fu-Ming Tsai
> Sent: Thursday, 22 December 2005 09:44
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] How to solve problem between condor and globus?
> 
> Hello, Pedro,
> Please refer to the attached file. It's my global condor_config file.
> 
> The following is my submit file:
> [root@osgc01 root]# more /home/sary357/job/job4.jdl
> Universe        = globus
> globusscheduler = osgc01.grid.sinica.edu.tw/jobmanager-condor
> Executable      = job4.sh
> Output          = job4.out
> Error           = job4.err
> Log             = job4.log
> Requirements    = (Name=="vm2@xxxxxxxxxxxxxxxxxxxxxxxxx")
> should_transfer_files   = IF_NEEDED
> when_to_transfer_output = ON_EXIT
> Queue
> [root@osgc01 root]# more /home/sary357/job/job4.sh
> #!/bin/bash
> /bin/hostname
> 
> Thank you for your attention!!
> 
> BR
> On Wed, 21 Dec 2005 19:12:40 +0100, Pedro R. Brügger Taboada wrote
> > I see several problems: file staging, the universe, and the
> > expression. I need to see the submit file and the condor_config
> > file; then perhaps I can solve your problem.
> > 
> > Pedro
> > 
> > -----Original Message-----
> > From: condor-users-bounces@xxxxxxxxxxx
> > [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Fu-Ming Tsai
> > Sent: Tuesday, 20 December 2005 11:06
> > To: Condor-Users Mail List
> > Subject: Re: [Condor-users] How to solve problem between condor and globus?
> > 
> > Sorry, all,
> > After many attempts, I gave up and used NFS.
> > However, I still cannot submit Globus jobs to Condor,
> > so I tried to collect some debugging information.
> > 
> > [sary357@osgc01 job]$ condor_q -analyze
> > ---
> > 4206.000:  Run analysis summary.  Of 4 machines,
> >       3 are rejected by your job's requirements
> >       0 reject your job because of their own requirements
> >       0 match but are serving users with a better priority in the pool
> >       1 match but reject the job for unknown reasons
> >       0 match but will not currently preempt their existing job
> >       0 are available to run your job
> > 
> > WARNING: Analysis is only meaningful for Globus universe jobs using matchmaking.
> > ---
> > 4207.000:  Run analysis summary.  Of 4 machines,
> >       0 are rejected by your job's requirements
> >       3 reject your job because of their own requirements
> >       0 match but are serving users with a better priority in the pool
> >       1 match but reject the job for unknown reasons
> >       0 match but will not currently preempt their existing job
> >       0 are available to run your job
> >         Last successful match: Tue Dec 20 09:45:24 2005
> >         Last failed match: Tue Dec 20 09:55:31 2005
> >         Reason for last match failure: no match found
> > 
> > == StarterLog.vm2==
> > 12/20 17:35:36 Shadow version: $CondorVersion: 6.7.7 Apr 27 2005 $
> > 12/20 17:35:36 Submitting machine is "osgc01.grid.sinica.edu.tw"
> > 12/20 17:35:36 ShouldTransferFiles is "NO", NOT transfering files
> > 12/20 17:35:36 Submit UidDomain: "grid.sinica.edu.tw"
> > 12/20 17:35:36  Local UidDomain: "grid.sinica.edu.tw"
> > 12/20 17:35:36 Initialized user_priv as "sary357"
> > 12/20 17:35:36 Done moving to directory "/opt/osg/osgs01/execute/dir_6591"
> > 12/20 17:35:36 JICShadow::initIOProxy(): Job does not define WantIOProxy
> > 12/20 17:35:36 No StarterUserLog found in job ClassAd
> > 12/20 17:35:36 Starter will not write a local UserLog
> > 12/20 17:35:36 Starting a VANILLA universe job with ID: 4207.0
> > 12/20 17:35:36 In OsProc::OsProc()
> > 12/20 17:35:36 Main job KillSignal: 15 (SIGTERM)
> > 12/20 17:35:36 Main job RmKillSignal: 15 (SIGTERM)
> > 12/20 17:35:36 Main job HoldKillSignal: 15 (SIGTERM)
> > 12/20 17:35:36 in VanillaProc::StartJob()
> > 12/20 17:35:36 in OsProc::StartJob()
> > 12/20 17:35:36 IWD: /home/sary357/gram_scratch_tUb21E3Wqv
> > 12/20 17:35:36 Input file: /dev/null
> > 12/20 17:35:36 Failed to open '/home/sary357/.globus/job/osgc01.grid.sinica.edu.tw/17186.1135070994/stdout' as standard output: No such file or directory (errno 2)
> > 12/20 17:35:36 Failed to open '/home/sary357/.globus/job/osgc01.grid.sinica.edu.tw/17186.1135070994/stderr' as standard error: No such file or directory (errno 2)
> > 12/20 17:35:36 Failed to open some/all of the std files...
> > 12/20 17:35:36 Aborting OsProc::StartJob.
> > 12/20 17:35:36 Failed to start job, exiting
> > 12/20 17:35:36 ShutdownFast all jobs.
> > 12/20 17:35:36 Got ShutdownFast when no jobs running.
> > 12/20 17:35:36 Removing /opt/osg/osgs01/execute/dir_6591
> > 12/20 17:35:36 Attempting to remove /opt/osg/osgs01/execute/dir_6591 as SuperUser (root)
> > =========================
> > 
> > [sary357@osgc01 job]$ condor_q -better-analyze 4206
> > 
> > -- Submitter: osgc01.grid.sinica.edu.tw : <140.109.98.41:41846> : osgc01.grid.sinica.edu.tw
> > ---
> > 4206.000:  Run analysis summary.  Of 4 machines,
> >       3 are rejected by your job's requirements
> >       0 reject your job because of their own requirements
> >       0 match but are serving users with a better priority in the pool
> >       1 match but reject the job for unknown reasons
> >       0 match but will not currently preempt their existing job
> >       0 are available to run your job
> > 
> > The Requirements expression for your job is:
> > 
> > ( ( target.Name == "vm2@xxxxxxxxxxxxxxxxxxxxxxxxx" ) )
> > 
> >     Condition                         Machines Matched    Suggestion
> >     ---------                         ----------------    ----------
> > 1   ( ( target.Name == "vm2@xxxxxxxxxxxxxxxxxxxxxxxxx" ) )
> >                                       1
> > 
> > WARNING: Analysis is only meaningful for Globus universe jobs using matchmaking.
> > [sary357@osgc01 job]$ condor_q -better-analyze 4207
> > 
> > -- Submitter: osgc01.grid.sinica.edu.tw : <140.109.98.41:41846> : osgc01.grid.sinica.edu.tw
> > Segmentation fault
> > 
> > I'm sure the FILESYSTEM_DOMAIN is the same on those two machines.
> > It looks like the output and error files cannot be created. Does
> > anyone know why?
> > 
> > BR
> > 
> > ----------------------------------------------------------------------
> > "Gravitation is not responsible for people falling in love."
> > 
> > Fu-Ming Tsai
> > Academia Sinica Computing Centre, Academia Sinica
> > sary357@xxxxxxxxxxxxxxxxxx
> > ------------------------------------------------------------------------
> > 
> > _______________________________________________
> > Condor-users mailing list
> > Condor-users@xxxxxxxxxxx
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users

