
[Condor-users] "Failed to execute" error message



I have a newly installed cluster (SUSE Linux 10.0 x86_64) on which I had Condor working. To keep a long story short, I had to reinstall the master node's OS to work around some hardware support issues.

Now I can't get Condor to work. When we submit a job it just sits there, periodically trying to run again but failing. The job log repeats the following two messages:

...
001 (122.000.000) 02/16 16:07:47 Job executing on host: <192.168.128.211:33123>
...
007 (122.000.000) 02/16 16:07:47 Shadow exception!
Error from starter on vm2@xxxxxxxxxxxxxxxxxxxxxxxxxxxx: Failed to execute '/data/PRD/bin/perfwrap condor_exec.exe /data/PRD/jobs/queue/test-trc-sacreps-20060216-160732.29748/test/perf.txt /data/PRD/bin/runjob /data/PRD/jobs/queue/test-trc-sacreps-20060216-160732.29748/test/test.inf': No such file or directory
       0  -  Run Bytes Sent By Job
       0  -  Run Bytes Received By Job
...

The "No such file or directory" is apparently my problem, but I can see every file on the given command line except condor_exec.exe. My impression is that this file is a temporary copy of the job. If I knew where it is supposed to be located, that might help me track down the problem. Can anyone tell me where to look, or give me other ideas to try?
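
A rough sketch of how the absolute paths in that command line could be checked, in case it helps anyone reproduce this. The function names (check_command, shebang_interpreter) are made up for illustration and are not part of Condor. It assumes the check is run on the execute node named in the log, since that is where the starter runs the command, and it relies on the fact that on Linux exec can also report "No such file or directory" when a script's #! interpreter is missing, not only when the file itself is. Relative names such as condor_exec.exe are skipped because the sketch does not know the job's working directory.

```python
import os
import shlex


def shebang_interpreter(path):
    """Return the interpreter named on a script's #! line, or None."""
    with open(path, "rb") as f:
        first = f.readline(256)
    if not first.startswith(b"#!"):
        return None
    parts = first[2:].strip().split()
    return parts[0].decode() if parts else None


def check_command(cmdline):
    """List (path, problem) pairs for the absolute paths in a command line."""
    problems = []
    for i, arg in enumerate(shlex.split(cmdline)):
        if not os.path.isabs(arg):
            # Relative names are resolved against the job's working
            # directory, which this sketch does not know; skip them.
            continue
        if not os.path.exists(arg):
            problems.append((arg, "missing"))
            continue
        if i == 0 and os.path.isfile(arg):
            # Only the first word is executed; the rest are arguments.
            if not os.access(arg, os.X_OK):
                problems.append((arg, "not executable"))
            interp = shebang_interpreter(arg)
            if interp and not os.path.exists(interp):
                problems.append((arg, "interpreter missing: " + interp))
    return problems
```

Calling check_command() with the quoted command string from the job log returns an empty list if every absolute path exists and the first one is executable with a usable interpreter. It does not detect a missing dynamic loader for a compiled binary, which is another way exec can fail with the same message.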

--
-       -       -       -       -       -
Stephen Creps
Coordinator of UNIX Systems
Information Technology Group
Chemistry Department
Indiana University
http://www.chem.indiana.edu/itg/
sacreps@xxxxxxxxxxx
(812) 855-8450
Chemistry C206