Subject: Re: [Condor-users] Job fails to run / Job leaves around unkillable processes
Here is everything in the starter log from the last 2 seconds of running that process. As you can see from the log below, IWD is set to C:\condor\execute\dir_6728. You can also see it failing to delete that directory later, even though this is a directory that it created. Again, usernames and domains have been changed to protect the guilty. I'm not sure why the starter can create a directory and copy an executable into it, but then can't run the executable or later delete the directory. This is very strange.
Is there a line in the starter log that looks like this?
IWD: <some path>
It would be before the message that Create_Process failed. This is
the initial working directory. If it's different
from the path to the executable, then that might be the directory
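
For anyone who wants to check this quickly, here is a rough Python sketch that pulls the IWD lines out of a starter log and reports whether each directory currently exists. Only the "IWD: <path>" shape comes from the log line quoted above; the assumption that other text (such as a timestamp) may precede it, and the helper names find_iwd_paths and report_iwd, are mine and not part of Condor.

```python
import os
import re

# Matches a line containing "IWD: <path>". Anything before "IWD:" (for
# example a timestamp) is ignored. The exact log layout is an assumption.
IWD_PATTERN = re.compile(r"\bIWD:\s*(\S.*?)\s*$")

def find_iwd_paths(log_text):
    """Return every path that follows an 'IWD:' marker, in log order."""
    paths = []
    for line in log_text.splitlines():
        match = IWD_PATTERN.search(line)
        if match:
            paths.append(match.group(1))
    return paths

def report_iwd(log_text):
    """Pair each IWD path with whether it exists as a directory right now."""
    return [(path, os.path.isdir(path)) for path in find_iwd_paths(log_text)]
```

Run it on the machine that executed the job, since the existence check only says something about the local file system, and only for the account running the script.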
On 10/29/2010 3:00 PM, Torrin Jones wrote:
Thanks for the response. Good points. However . . .
V: is actually a physical hard drive on my computer and at
the moment, condor is only installed on my computer. I was
doing a test to see if my software would work with the latest
version. So everything is contained on my computer that has V
as a physical hard drive. So condor should be able to get at
it. I also checked to see if these directories actually
exist. They do and, as far as I can tell, they are accessible by
anybody, including condor (which is running as NT
After all this, I wanted to be sure, so I moved everything to
c:\temp and changed all paths in the submit description file to
relative paths and then submitted to condor to see if anything
changed. Unfortunately, I still have the same problem.
I've attached the new submit description file and the output
log file. IP address, port numbers, usernames, etc. have been
changed to protect the guilty. Below is what came out of
Yep, 267 is "The
directory name is invalid". From looking at your .job
file, I'm wondering if the invalid directory isn't
v:\temp\condor or v:\shared\condor rather than
c:\condor\execute\dir_6136 as the error message seems to
I'm guessing that v: is a network drive. So I gotta
wonder, is v: really valid in the context of the job?
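
As a side note, the few Win32 error codes that tend to come up when a process fails to start can be kept in a small lookup table. This is only an illustrative Python sketch: the numeric values and names are the standard ones from winerror.h, while describe_win32_error is a made-up helper and the choice of which codes to list is mine.

```python
# A few standard Win32 error codes (values and names as in winerror.h).
WIN32_ERRORS = {
    2: ("ERROR_FILE_NOT_FOUND", "The system cannot find the file specified."),
    3: ("ERROR_PATH_NOT_FOUND", "The system cannot find the path specified."),
    5: ("ERROR_ACCESS_DENIED", "Access is denied."),
    267: ("ERROR_DIRECTORY", "The directory name is invalid."),
}

def describe_win32_error(code):
    """Return 'NAME (code): message' for a listed code, else a fallback."""
    if code in WIN32_ERRORS:
        name, message = WIN32_ERRORS[code]
        return f"{name} ({code}): {message}"
    return f"Unknown error ({code})"
```

For example, describe_win32_error(267) gives the ERROR_DIRECTORY text discussed in this thread.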
10/28 08:35:33 ERROR
...) failed: " at line 530 in file
MSDN says 267 means "The directory name
is invalid." However, the directory is
there. Here is the scenario. I submit a small
job (condor_dummy.job, attached). All
condor_dummy.exe does is print out a line like
this . . .
Run by DOMAIN\USER on COMPUTERNAME at DATE
It's basically a quick condor test.
Anyway, I submit the job and condor tries to
run it. However it fails and I get the above
message in the StarterLog.slot1. Here is the
kicker. It will retry and fail. However, if I
leave it in the queue long enough, it will
eventually succeed. When I ran the job
yesterday, it tried 28 times. The final time,
it succeeded. Here is another thing I'm seeing.
After it succeeded, I looked in Process
Explorer and saw 27 condor_exec.exe processes running.
They were unkillable. I tried
every approach I could think of. Killing them
as Admin, as NT AUTHORITY\SYSTEM, even putting a
debugger on them and killing them that way,