
Re: [Condor-users] CondorG pools



Do the two nodes, the manager and the other one, have a shared set
of home directories for the globus user?  Globus expects that the
stdout, stderr, and proxy files will be exported to the other worker
nodes via NFS; that's what the .globus directory in each user's home
directory is for.
You can change this if you know what you're doing and hack the globus
scripts to have Condor transfer the files to the worker node, but it's
tricky (tricky enough that I haven't tried it yet, even though I have
several Globus-Condor pools and have been running them for 6 years).
You are looking for a perl module called condor.pm, about 6 directories
deep in the globus software.
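
If you want a quick way to check, and a rough sketch of the usual fix,
something like the following should do it (the hostnames are the ones
from your hold message; the paths, export options, and mount options
are just an example to adapt to your own setup):

  # On hydrogen (the gatekeeper), drop a marker file in the globus home:
  su - globus -c 'touch ~/nfs-share-test'

  # On helium (the worker), see whether the same file shows up:
  su - globus -c 'ls -l ~/nfs-share-test'

  # If it is not there, the two machines do not share /home/globus,
  # which matches the "No such file or directory" hold reason.
  # One common way to share it is a plain NFS export from the gatekeeper:

  # /etc/exports on hydrogen (then run: exportfs -ra)
  /home/globus  helium.adiroy.com(rw,sync,no_subtree_check)

  # /etc/fstab on helium (then run: mount /home/globus)
  hydrogen.adiroy.com:/home/globus  /home/globus  nfs  defaults  0  0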

Steve


On Wed, 17 Nov 2010, Roy, Kevin (LNG-SEA) wrote:

I am using condor 7.4.4 on two VMware Ubuntu machines.



I have set up Globus and can submit and run jobs.  I have set up Condor
and can submit and run jobs.  With only one machine I can use Globus to
run jobs on Condor and vice versa.



When I add a second machine and issue a submit with 5 jobs, 2-3 go to
one machine and the rest to the other.  On the manager machine the
jobs run without a problem.  On the second machine the jobs appear to
start to run and are then put on hold for the following reason (from
condor_q -better-analyze):



Hold reason: Error from helium.adiroy.com: Failed to open
'/home/globus/.globus/job/hydrogen.adiroy.com/16073795612117631466.2588226358823932351/stdout'
as standard output: No such file or directory (errno 2)



The directory does not exist, and creating it by hand does nothing.  I
have opened my ports for GLOBUS_TCP_PORT and this too did nothing.  I
have searched quite extensively on the web but cannot find any more
information.  Can someone help me?  Thanks in advance.



The job is defined as:

executable = /bin/hostname
globusscheduler = hydrogen
universe = globus
output = condorg.out.$(cluster).$(Process)
log = condorg.log.$(cluster).$(Process)

should_transfer_files = YES
when_to_transfer_output = ON_EXIT

stream_output = true
stream_error = true

queue 5



--
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.