Re: [Condor-users] Job fails to run / Job leaves around unkillable processes



Thanks for the response.  Good points.  However . . . 

V: is actually a physical hard drive on my computer, and at the moment Condor is only installed on my computer.  I was doing a test to see if my software would work with the latest version, so everything is contained on this one machine, which has V: as a physical hard drive.  Condor should therefore be able to get at it.  I also checked that these directories actually exist.  They do and, as far as I can tell, they are accessible by anybody, including Condor (which is running as NT AUTHORITY\SYSTEM).

After all this, I wanted to be sure, so I moved everything to c:\temp, changed all paths in the submit description file to relative paths, and resubmitted to Condor to see if anything changed.  Unfortunately, I still have the same problem.

I've attached the new submit description file and the output log file.  IP address, port numbers, usernames, etc. have been changed to protect the guilty.   Below is what came out of StarterLog.slot1.

10/29 12:47:28 Create_Process: CreateProcess failed, errno=267
10/29 12:47:28 ERROR "Create_Process(C:\condor\execute\dir_6728\condor_exec.exe,, ...) failed: " at line 530 in file ..\src\condor_starter.V6.1\os_proc.cpp
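
A minimal sketch, assuming Python is available on the machine, for pulling the error code out of StarterLog lines like the two above and translating it. The function name and the small lookup table are hypothetical helpers written for this illustration; the table is deliberately partial and lists only a few well-known Win32 codes.

```python
import re

# Partial table of Win32 error codes (illustrative; only a few entries).
WIN32_ERRORS = {
    2: "ERROR_FILE_NOT_FOUND: The system cannot find the file specified.",
    3: "ERROR_PATH_NOT_FOUND: The system cannot find the path specified.",
    5: "ERROR_ACCESS_DENIED: Access is denied.",
    267: "ERROR_DIRECTORY: The directory name is invalid.",
}

# Matches lines such as:
#   10/29 12:47:28 Create_Process: CreateProcess failed, errno=267
FAILURE_RE = re.compile(
    r"^(?P<stamp>\d+/\d+ \d+:\d+:\d+) "
    r"Create_Process: CreateProcess failed, errno=(?P<code>\d+)"
)


def find_create_process_failures(log_text):
    """Return a (timestamp, code, description) tuple per CreateProcess failure."""
    failures = []
    for line in log_text.splitlines():
        match = FAILURE_RE.match(line.strip())
        if match:
            code = int(match.group("code"))
            description = WIN32_ERRORS.get(code, "unknown error code")
            failures.append((match.group("stamp"), code, description))
    return failures


if __name__ == "__main__":
    sample = r"""
10/29 12:47:28 Create_Process: CreateProcess failed, errno=267
10/29 12:47:28 ERROR "Create_Process(C:\condor\execute\dir_6728\condor_exec.exe,, ...) failed: " at line 530 in file ..\src\condor_starter.V6.1\os_proc.cpp
"""
    for stamp, code, description in find_create_process_failures(sample):
        print(stamp, code, description)
```

Running the same function over the whole StarterLog.slot1 would also show how many attempts failed before one succeeded.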



On Fri, Oct 29, 2010 at 8:51 AM, John (TJ) Knoeller <johnkn@xxxxxxxxxxx> wrote:
Yep, 267 is "The directory name is invalid".  From looking at your .job file, I'm wondering if the invalid directory isn't
v:\temp\condor or v:\shared\condor rather than c:\condor\execute\dir_6136, as the error message seems to imply.

I'm guessing that v: is a network drive.  So I gotta wonder, is v: really valid in the context of the job?
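
One way to answer that question is to test the directories from inside the job's own context. The following is a rough sketch, assuming Python is installed on the execute machine and the script is submitted as (or run by) the job itself; `check_directories` is a hypothetical helper, and the paths in the example are the ones mentioned in this thread.

```python
import os
import tempfile


def check_directories(paths):
    """Map each path to 'ok', 'missing', or 'not writable'."""
    results = {}
    for path in paths:
        if not os.path.isdir(path):
            results[path] = "missing"
            continue
        try:
            # Actually creating a file tests write access directly.
            with tempfile.TemporaryFile(dir=path):
                pass
            results[path] = "ok"
        except OSError:
            results[path] = "not writable"
    return results


if __name__ == "__main__":
    # Directories named in this thread; adjust to match the submit file.
    candidates = [r"v:\temp\condor", r"v:\shared\condor", r"c:\temp"]
    for path, status in check_directories(candidates).items():
        print(path, "->", status)
```

Running it once interactively and once under the account the job runs as would show whether the two contexts see the same drives.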


On 10/29/2010 9:31 AM, Torrin Jones wrote:
Using Condor 7.4.4 on Windows XP.

Any idea what would cause an error 267?

From StarterLog.slot1 . . .

10/28 08:35:33 Create_Process: CreateProcess failed, errno=267
10/28 08:35:33 ERROR "Create_Process(C:\condor\execute\dir_6136\condor_exec.exe,, ...) failed: " at line 530 in file ..\src\condor_starter.V6.1\os_proc.cpp

MSDN says 267 means "The directory name is invalid."  However, the directory is there.  Here is the scenario: I submit a small job (condor_dummy.job, attached).  All condor_dummy.exe does is print out a line like this . . .

Run by DOMAIN\USER on COMPUTERNAME at DATE TIME.

It's basically a quick condor test.

Anyway, I submit the job and Condor tries to run it.  However, it fails and I get the above message in StarterLog.slot1.  Here is the kicker: it will retry and fail, but if I leave it in the queue long enough, it will eventually succeed.  When I ran the job yesterday, it tried 28 times, and the final attempt succeeded.  Here is another thing I'm seeing.  After it succeeded, I looked in Process Explorer and saw 27 condor_exec.exe processes running, and they were unkillable.  I tried every approach I could think of: killing them as Admin, as NT AUTHORITY\SYSTEM, even attaching a debugger and killing them that way.  Nothing worked.
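
For counting the leftover processes without Process Explorer, here is a minimal sketch that parses the CSV output of the Windows `tasklist` command (`tasklist /FO CSV /NH`). It assumes Python and `tasklist` are both available on the machine; `find_pids` is a hypothetical helper name, and the sample rows in the comment are made up for illustration.

```python
import csv
import io
import subprocess
import sys


def find_pids(csv_text, image_name):
    """Return the PIDs of all rows whose image name matches (case-insensitive).

    Expects rows shaped like tasklist's CSV output, for example:
      "condor_exec.exe","1234","Console","0","1,052 K"
    """
    pids = []
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip blank lines and informational messages that are not process rows.
        if len(row) >= 2 and row[0].lower() == image_name.lower() and row[1].isdigit():
            pids.append(int(row[1]))
    return pids


if __name__ == "__main__" and sys.platform == "win32":
    listing = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    ).stdout
    leftovers = find_pids(listing, "condor_exec.exe")
    print(len(leftovers), "condor_exec.exe processes:", leftovers)
```

Comparing the count against the number of failed attempts in StarterLog.slot1 would show whether each failed CreateProcess call leaves one process behind.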

So I have 2 issues.

1. The job fails to run.
2. The job leaves around unkillable processes.

Any ideas?  Has anybody seen anything like this?
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/


Attachment: 5.log
Description: Binary data

Attachment: condor_dummy.job
Description: Binary data