[Condor-users] condor_submit -name option
- Date: Tue, 18 May 2010 11:34:03 +0800
- From: <Greg.Hitchen@xxxxxxxx>
- Subject: [Condor-users] condor_submit -name option
Hi All,

We've been playing with submission of jobs to a remote schedd. This is Windows to Windows.

It works fine when using the -remote option, which spools the data across to the remote machine. It can take a long, long time when submitting thousands of jobs, though.

Thought we'd try the -name option instead, which does not spool the data across.
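To make the two submission modes concrete, here is a minimal Python sketch that only builds the two command lines and does not run them. The submit file name `cpubound.sub` and the schedd name `remote-schedd.example.org` are placeholders, not values from our setup.

```python
def build_submit_command(submit_file, schedd_name, spool_input):
    """Return the condor_submit argument list for a remote schedd.

    spool_input=True  -> -remote (input files are spooled to the schedd)
    spool_input=False -> -name   (no spooling; paths stay as submitted)
    """
    mode_flag = "-remote" if spool_input else "-name"
    return ["condor_submit", mode_flag, schedd_name, submit_file]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for spool_input in (True, False):
        cmd = build_submit_command(
            "cpubound.sub", "remote-schedd.example.org", spool_input
        )
        print(" ".join(cmd))
```

The only difference between the two invocations is the flag. The submit description file itself is unchanged.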
On the remote schedd, condor_q shows the jobs on hold with the reason "cannot access initial working directory", and the shadow log shows:
5/18 11:25:40 ******************************************************
5/18 11:25:40 ** condor_shadow (CONDOR_SHADOW) STARTING UP
5/18 11:25:40 ** C:\PROGRA~1\condor\bin\condor_shadow.exe
5/18 11:25:40 ** SubsystemInfo: name=SHADOW type=SHADOW(6) class=DAEMON(1)
5/18 11:25:40 ** Configuration: subsystem:SHADOW local:<NONE> class:DAEMON
5/18 11:25:40 ** $CondorVersion: 7.2.4 Jun 15 2009 BuildID: 159529 $
5/18 11:25:40 ** $CondorPlatform: INTEL-WINNT50 $
5/18 11:25:40 ** PID = 3336
5/18 11:25:40 ** Log last touched 5/18 11:16:24
5/18 11:25:40 ******************************************************
5/18 11:25:40 Using config source: c:\PROGRA~1\condor\condor_config
5/18 11:25:40 Using local config sources:
5/18 11:25:40    C:\PROGRA~1\condor/condor_config.local
5/18 11:25:40 DaemonCore: Command Socket at <130.116.144.59:9342>
5/18 11:25:40 Initializing a VANILLA shadow for job 6.0
5/18 11:25:40 (6.0) (3336): WriteUserLog::initialize: safe_fopen_wrapper("C:\Data\condor_stuff\examples\cpubound\cpubound_6_0.log",a+tc) failed - errno 2 (No such file or directory)
5/18 11:25:40 (6.0) (3336): WriteUserLog::initialize: failed to open file
5/18 11:25:40 (6.0) (3336): Path does not exist.
    He who travels without bounds
    Can't locate data.
5/18 11:25:40 (6.0) (3336): Cannot access initial working directory C:\Data\condor_stuff\examples\cpubound: No such file or directory
5/18 11:25:40 (6.0) (3336): Job 6.0 going into Hold state (code 14,2): Cannot access initial working directory C:\Data\condor_stuff\examples\cpubound: No such file or directory
5/18 11:25:40 (6.0) (3336): RemoteResource::killStarter(): DCStartd object NULL!
5/18 11:25:40 (6.0) (3336): **** condor_shadow (condor_SHADOW) pid 3336 EXITING WITH STATUS 112
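With thousands of jobs it may help to pull the hold details out of the shadow log automatically. The following is a rough Python sketch, assuming the hold line always has the shape seen in the log above and that the two numbers in "(code 14,2)" are a hold code and a sub-code. The function name and the regular expression are my own, not part of Condor.

```python
import re

# Pattern modelled on the single hold line seen in the shadow log;
# other Condor versions may format this line differently.
HOLD_RE = re.compile(
    r"Job (?P<job>\d+\.\d+) going into Hold state "
    r"\(code (?P<code>\d+),(?P<subcode>\d+)\): (?P<reason>.*)"
)


def parse_hold_line(line):
    """Return job id, hold code, sub-code and reason, or None if no match."""
    match = HOLD_RE.search(line)
    if match is None:
        return None
    return {
        "job": match.group("job"),
        "code": int(match.group("code")),
        "subcode": int(match.group("subcode")),
        "reason": match.group("reason").strip(),
    }


# The hold line from the log above, joined back onto one line.
sample = (
    r"5/18 11:25:40 (6.0) (3336): Job 6.0 going into Hold state (code 14,2): "
    r"Cannot access initial working directory "
    r"C:\Data\condor_stuff\examples\cpubound: No such file or directory"
)

if __name__ == "__main__":
    print(parse_hold_line(sample))
```

Lines that do not describe a hold return None, so the function can be run over every line of a log file.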
The initial working directory is the local path on the original submitting PC, and obviously does not exist on the remote schedd. No wonder it can't find it.

What am I not understanding? Can someone please help/explain?

Thanks.

Cheers
Greg

P.S. I like the Confucius-like saying in the log file! :)
Dr. Greg Hitchen
Physical Scientist | Electron Beam Laboratory
Earth Sciences and Resource Engineering
CSIRO
Phone: +61 8 6436 8663 | Fax: +61 8 6436 8555 | Mobile: 0407 952 748
greg.hitchen@csiro.au | www.csiro.au | www.csiro.au/org/CESRE
Address: 26 Dick Perry Avenue, Kensington WA 6151