
Re: [Condor-users] condor_run vs. condor_submit and non-nfs directories



On 5/30/07, Christoph Spielmann <cspielma@xxxxxxxxxx> wrote:
hi everybody!

We use Condor on one of our Linux clusters here. The installation seems
to be okay, but when I try to submit a job to Condor from a non-NFS
directory it fails with the famous condor_shadow (condor_SHADOW)
EXITING WITH STATUS 112 error message. The detailed error message is:

5/30 12:15:37 (2203.0) (29451): Job 2203.0 going into Hold state (code
6,2): Error from starter on vm2@xxxxxxxxxxxxxxxxxxxxxxxx: Failed to
execute '/tmp/.condor_run.29439': No such file or directory
5/30 12:15:37 (2203.0) (29451): ZKM: setting default map to (null)
5/30 12:15:37 (2203.0) (29451): **** condor_shadow (condor_SHADOW)
EXITING WITH STATUS 112

I searched the mailing-list archives and found quite a lot of people with
the same problem, but none of the proposed solutions worked for us. We
tried versions 6.8.5 and 6.9.2, both dynamically linked. The problem
shows up on both versions. Sometimes it does work, but in 99% of the
trial runs it doesn't.

The funny thing is that it doesn't work when I use condor_run in
combination with a shell command like /bin/hostname or /bin/date, but
when I write a simple hello-world C program, write a submit description
file for it, and submit that file with condor_submit, it works as
expected. Even on non-NFS directories!


condor_run does not use file transfer. You must have a shared
filesystem to use condor_run, or at least have the executable in the
same place on every machine. (That is why /bin/hostname works.)

I'd bet the reason it works on a few occasions is that every now and
then your job runs on the submit machine, and can find the executable.
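
For comparison, here is a minimal sketch of a submit description file
that turns on Condor's file transfer, so the job does not depend on a
shared filesystem. The file names (hello, hello.out, and so on) are
placeholders, not taken from the original report, and the settings are
an assumption about what a working condor_submit setup might look like:

```
# Hypothetical submit description file; all file names are placeholders.
universe                = vanilla
executable              = hello
output                  = hello.out
error                   = hello.err
log                     = hello.log

# Ask Condor to copy the executable (and any input files) to the
# execute machine instead of relying on a shared filesystem.
should_transfer_files   = YES
# Copy output files back to the submit machine when the job exits.
when_to_transfer_output = ON_EXIT

queue
```

Submitted with condor_submit, a file like this asks Condor to ship the
executable to whichever machine runs the job, which condor_run does not
do.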

-Erik