Re: [Condor-users] weird reason for held job
- Date: Thu, 16 Dec 2010 07:02:07 -0500
- From: Matthew Farrellee <matt@xxxxxxxxxx>
- Subject: Re: [Condor-users] weird reason for held job
I'm going to guess it is this...
What's really happening is the job's Iwd doesn't exist on the execute
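As an illustration of why a missing working directory can look like a missing executable, here is a minimal Python sketch. It is only an analogy on a POSIX system, not Condor's starter code, and the names `try_run` and `missing_iwd` are made up for the example. The child process is started with a working directory that does not exist, and the launch fails with ENOENT even though the executable itself is present.

```python
import errno
import os
import subprocess
import sys
import tempfile


def try_run(cwd):
    """Start a trivial child process in `cwd`; return 0 or the errno of the failure."""
    try:
        subprocess.run([sys.executable, "-c", "pass"], cwd=cwd, check=True)
        return 0
    except OSError as exc:
        return exc.errno


# Create a directory and remove it again to get a path that is known not to exist.
# This stands in for an Iwd that exists on the submit host but not on the worker.
missing_iwd = tempfile.mkdtemp()
os.rmdir(missing_iwd)

# The executable (sys.executable) exists in both cases; only the working
# directory differs between the two calls.
print(try_run(tempfile.gettempdir()))
print(errno.errorcode.get(try_run(missing_iwd)))
```

The second call fails before the program ever runs, which would match a job submitted from a local /tmp subdirectory that the worker node cannot see.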
On 12/15/2010 04:22 PM, Dennis Box wrote:
No, I prepared it on a Linux box, and the worker nodes are Linux as well.
I should have added that if I condor_submit the .cmd file from an
NFS-mounted directory it runs, but if I create a subdirectory in /tmp and
submit the .cmd file from there I get the 'no such file or directory' hold
reason. The full path to the executable is in the .cmd file.
On 12/15/10 3:04 PM, Ian Cottam wrote:
Did it start
Did you prepare it under Windows, such that Linux sees
On 15/12/2010 20:38, "Dennis Box"<dbox@xxxxxxxx> wrote:
I can create a condor job which gets held almost immediately after
[dbox@gpsn01 ~]$ condor_q dbox
-- Submitter: gpsn01.fnal.gov :<188.8.131.52:60205> : gpsn01.fnal.gov
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
2070.0 dbox 12/15 13:58 0+00:00:00 H 0 0.0
Looking deeper into why it is held:
[dbox@gpsn01 ~]$ condor_q -l 2070.0 | grep HoldReason
LastHoldReason = "Error from slot2@xxxxxxxxxxxxxxxx: Failed to execute
with arguments 360: No such file or directory"
LastHoldReasonCode = 6
LastHoldReasonSubCode = 2
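A side note on the sub-code: assuming LastHoldReasonSubCode carries a Unix errno here (the "No such file or directory" text in the hold reason suggests it does), it can be decoded with Python's standard `errno` module. The helper name `describe_errno` is hypothetical, and this is only a sketch for reading the number, not anything Condor provides.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, system message) for a Unix errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# 2 is the value of LastHoldReasonSubCode in the condor_q output above.
name, message = describe_errno(2)
print(name, "-", message)
```

The symbolic name for 2 is ENOENT, which is the error the kernel reports for any missing path component, not only for a missing executable.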
Here's the weird part: when I ssh to the machine where the error occurs
and look at the file, it seems to be fine!
[dbox@gpwn002 ~]$ ls -la
-rwxr-xr-x 1 dbox gpcf 800 Dec 15 13:57
Any suggestions on how to proceed to debug this?