Erik Paulson wrote:
On 5/30/07, Christoph Spielmann <cspielma@xxxxxxxxxx> wrote:
> Hi everybody!
>
> We use Condor on one of our Linux clusters here. The installation seems to be okay, but when I try to submit a job to Condor from a non-NFS directory, it fails with the famous condor_shadow (condor_SHADOW) EXITING WITH STATUS 112 error message. The detailed error message is:
>
> 5/30 12:15:37 (2203.0) (29451): Job 2203.0 going into Hold state (code 6,2): Error from starter on vm2@xxxxxxxxxxxxxxxxxxxxxxxx: Failed to execute '/tmp/.condor_run.29439': No such file or directory
> 5/30 12:15:37 (2203.0) (29451): ZKM: setting default map to (null)
> 5/30 12:15:37 (2203.0) (29451): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 112
>
> I searched the mailing-list archives and found quite a lot of people with the same problem, but none of the proposed solutions worked for us. We tried versions 6.8.5 and 6.9.2, both dynamically linked. The problem shows up on both versions. Sometimes it does work, but in 99% of the trial runs it doesn't.
>
> The funny thing is that it doesn't work when I use condor_run in combination with a shell command like /bin/hostname or /bin/date, but when I write a simple hello-world C program and a submit description file for that program, and submit the description file with condor_submit, it works as expected. Even in non-NFS directories!

condor_run does not use file transfer. You must have a shared filesystem to use condor_run, or at least have the executable in the same place on every machine. (That is why /bin/hostname works.) I'd bet the reason it works on a few occasions is that every now and then your job runs on the submit machine, and can find the executable.
-Erik
Well, I just checked, and both hostname AND date are in the same place (/bin) on all machines, so that's not the problem. Actually, the filesystem root of the nodes is mounted via NFS; only machine-specific directories like /tmp, /etc... are mounted separately on each machine. But they are all mounted in the same place, of course...
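
For comparison with condor_run, here is a minimal sketch of a submit description file that runs /bin/hostname through condor_submit with Condor's file transfer mechanism turned on, which is the route reported in this thread to work from non-NFS directories. It is an illustration under assumptions, not a confirmed fix for the problem above: the output, error, and log file names are hypothetical, and the sketch assumes /bin/hostname exists at that path on every execute machine, so the executable itself is not transferred.

```
# Sketch only: file names below are hypothetical placeholders.
universe                = vanilla
executable              = /bin/hostname

# The binary is assumed to be present at this path on every execute
# machine, so do not copy it from the submit machine.
transfer_executable     = false

# Use Condor's file transfer instead of relying on a shared filesystem.
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT

output                  = hostname.out
error                   = hostname.err
log                     = hostname.log

queue
```

Saved under a name such as hostname.sub (also hypothetical), this would be submitted with condor_submit hostname.sub. Unlike condor_run, this path does not depend on a wrapper script being visible at the same location on the execute machine.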