Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] condor_run vs. condor_submit and non-nfs directories

Date: Wed, 30 May 2007 16:21:53 +0200
From: Christoph Spielmann <cspielma@xxxxxxxxxx>
Subject: Re: [Condor-users] condor_run vs. condor_submit and non-nfs directories

Erik Paulson wrote:

On 5/30/07, Christoph Spielmann <cspielma@xxxxxxxxxx> wrote:

Hi everybody!

We use Condor on one of our Linux clusters here. The installation seems
to be okay, but when I try to submit a job to Condor from a non-NFS
directory it fails with the famous condor_shadow (condor_SHADOW)
EXITING WITH STATUS 112 error message. The detailed error message is:

5/30 12:15:37 (2203.0) (29451): Job 2203.0 going into Hold state (code
6,2): Error from starter on vm2@xxxxxxxxxxxxxxxxxxxxxxxx: Failed to
execute '/tmp/.condor_run.29439': No such file or directory
5/30 12:15:37 (2203.0) (29451): ZKM: setting default map to (null)
5/30 12:15:37 (2203.0) (29451): **** condor_shadow (condor_SHADOW)
EXITING WITH STATUS 112

I searched the mailing-list archives and found quite a lot of people with
the same problem, but none of the proposed solutions worked for us. We
tried versions 6.8.5 and 6.9.2, both dynamically linked. The problem
shows up on both versions. Sometimes it does work, but in 99% of the
trial runs it doesn't.

The funny thing is that it doesn't work when I use condor_run in
combination with a shell command like /bin/hostname or /bin/date, but
when I write a simple hello-world C program and a submit description
file for it, and submit that file with condor_submit, it works as
expected. Even on non-NFS directories!


condor_run does not use file transfer. You must have a shared
filesystem to use condor_run, or at least have the executable in the
same place on every machine. (That is why /bin/hostname works.)

I'd bet the reason it works on a few occasions is that every now and
then your job runs on the submit machine, and can find the executable.
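
For comparison, a minimal sketch of a submit description file that uses file transfer with condor_submit. The file names (hello.sub, hello, hello.out, hello.err, hello.log) are placeholders and not taken from the original setup, and the transfer settings are an assumption about how such a job would typically be configured:

```
# hello.sub: hypothetical submit description file (all names are placeholders)
universe                = vanilla
executable              = hello

# Ask Condor to copy the executable to the execute machine and
# bring the output files back when the job exits.
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT

output                  = hello.out
error                   = hello.err
log                     = hello.log
queue
```

Submitted with "condor_submit hello.sub", a job like this does not depend on the submit directory being visible on the execute machine, which is the part condor_run does not handle.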

-Erik

Hello Erik!

Well, I just checked, and both hostname AND date are in the same place (/bin) on all machines, so that's not the problem. Actually, the root filesystem of the nodes is mounted via NFS; only machine-specific things like /tmp and /etc are mounted separately on each machine. They are all mounted at the same place, of course.
