
[Condor-users] condor_submit -name option



Hi All
 
We've been playing with submitting jobs to a remote schedd. This is Windows to Windows.
 
It works fine with the -remote option, which spools the data across to the remote machine,
but that can take a long, long time when submitting thousands of jobs.
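For anyone scripting thousands of submissions, the only difference between the two approaches is the option passed to condor_submit. As far as I understand the manual, -remote is equivalent to -name plus -spool. The helper below is a hypothetical sketch (the function name, the schedd name and the submit file name are all made up for illustration) that just builds the argument list:

```python
from typing import List


def condor_submit_argv(submit_file: str, schedd: str, spool: bool) -> List[str]:
    """Build a condor_submit command line that targets a remote schedd.

    spool=True  -> "-remote": input files are copied (spooled) to the
                   remote schedd, which is slow for thousands of jobs.
    spool=False -> "-name":   the job is queued on the named schedd, but the
                   paths in the job still refer to the submitting PC.
    """
    option = "-remote" if spool else "-name"
    return ["condor_submit", option, schedd, submit_file]
```

It could then be run with something like `subprocess.run(condor_submit_argv("cpubound.sub", "remote-schedd", spool=False), check=True)`.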
 
So we thought we'd try the -name option instead, which does not spool the data across.
On the remote schedd, condor_q shows the jobs on hold with the reason
"cannot access initial working directory", and the shadow log shows:
 
5/18 11:25:40 ******************************************************
5/18 11:25:40 ** condor_shadow (CONDOR_SHADOW) STARTING UP
5/18 11:25:40 ** C:\PROGRA~1\condor\bin\condor_shadow.exe
5/18 11:25:40 ** SubsystemInfo: name=SHADOW type=SHADOW(6) class=DAEMON(1)
5/18 11:25:40 ** Configuration: subsystem:SHADOW local:<NONE> class:DAEMON
5/18 11:25:40 ** $CondorVersion: 7.2.4 Jun 15 2009 BuildID: 159529 $
5/18 11:25:40 ** $CondorPlatform: INTEL-WINNT50 $
5/18 11:25:40 ** PID = 3336
5/18 11:25:40 ** Log last touched 5/18 11:16:24
5/18 11:25:40 ******************************************************
5/18 11:25:40 Using config source: c:\PROGRA~1\condor\condor_config
5/18 11:25:40 Using local config sources:
5/18 11:25:40    C:\PROGRA~1\condor/condor_config.local
5/18 11:25:40 DaemonCore: Command Socket at <130.116.144.59:9342>
5/18 11:25:40 Initializing a VANILLA shadow for job 6.0
5/18 11:25:40 (6.0) (3336): WriteUserLog::initialize: safe_fopen_wrapper("C:\Data\condor_stuff\examples\cpubound\cpubound_6_0.log",a+tc) failed - errno 2 (No such file or directory)
5/18 11:25:40 (6.0) (3336): WriteUserLog::initialize: failed to open file
5/18 11:25:40 (6.0) (3336):
 
Path does not exist.
He who travels without bounds
Can't locate data.
 
5/18 11:25:40 (6.0) (3336): Cannot access initial working directory C:\Data\condor_stuff\examples\cpubound: No such file or directory
5/18 11:25:40 (6.0) (3336): Job 6.0 going into Hold state (code 14,2): Cannot access initial working directory C:\Data\condor_stuff\examples\cpubound: No such file or directory
5/18 11:25:40 (6.0) (3336): RemoteResource::killStarter(): DCStartd object NULL!
5/18 11:25:40 (6.0) (3336): **** condor_shadow (condor_SHADOW) pid 3336 EXITING WITH STATUS 112

The initial working directory is the local path on the original submitting PC, and obviously
does not exist on the remote schedd. No wonder the shadow can't find it.
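The log suggests the shadow on the remote schedd resolves both the initial working directory and the user log path against its own filesystem. A quick way to see which paths fail there is a preflight check run on the schedd machine. This is a hypothetical sketch, not a Condor tool: the function name is made up, and the dictionary keys simply borrow the job ClassAd attribute names Iwd and UserLog as labels.

```python
import os
from typing import Dict, List


def missing_paths(job_paths: Dict[str, str]) -> List[str]:
    """Return the keys whose paths are not usable on THIS machine.

    Meant to be run on the remote schedd host, because that is where the
    shadow looks for the initial working directory and the user log.
    """
    missing = []
    for name, path in job_paths.items():
        if name == "Iwd":
            # The initial working directory itself must exist.
            ok = os.path.isdir(path)
        else:
            # A log file can be created, but only if its directory exists
            # (errno 2 in the log above means the directory is missing).
            ok = os.path.isdir(os.path.dirname(path) or ".")
        if not ok:
            missing.append(name)
    return missing
```

On the remote schedd, calling it with the paths from the log above would presumably report both Iwd and UserLog as missing, matching the hold reason.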
 
What am I not understanding? Can someone please help/explain?
 
Thanks.
 
Cheers
 
Greg
 
P.S. I like the Confucius-like saying in the log file! :)
 

Dr. Greg Hitchen
Physical Scientist | Electron Beam Laboratory
Earth Sciences and Resource Engineering
CSIRO

Phone: +61 8 6436 8663 | Fax: +61 8 6436 8555 | Mobile: 0407 952 748 
greg.hitchen@csiro.au | www.csiro.au | www.csiro.au/org/CESRE

Address: 26 Dick Perry Avenue, Kensington WA 6151
