
Re: [Condor-users] "Failed to execute" error message

If this is a "file not found" problem, check that your environment variables are being passed to your job the way you expect. You can see which environment variables are set for a given job with condor_q -l.
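
As a rough illustration of that check, the sketch below pulls the environment attribute out of the text that condor_q -l prints for a job. It assumes the job ad stores the environment in an attribute named Env or Environment (the name depends on the Condor version), and the helper names job_environment and condor_q_long are made up for this example, not part of Condor.

```python
import subprocess


def job_environment(classad_text):
    """Return the raw environment string from `condor_q -l` output, or None.

    Assumption: the attribute is called Env or Environment, depending on
    the Condor version. The value is returned unsplit, because the
    delimiter between variables also differs between versions.
    """
    for line in classad_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip().lower() in ("env", "environment"):
            # Drop surrounding whitespace and the enclosing double quotes.
            return value.strip().strip('"')
    return None


def condor_q_long(job_id):
    """Run `condor_q -l <job_id>` and return its output as text."""
    result = subprocess.run(
        ["condor_q", "-l", job_id],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Example use on a submit machine (job id written as cluster.proc):
#   print(job_environment(condor_q_long("122.0")))
```

If this returns None or an empty string, the job is running without the variables you set in your login shell, so anything the job locates through PATH or a similar variable may not be found on the execute node.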

Rok Roskar
University of Washington
Department of Astronomy

On Feb 16, 2006, at 1:14 PM, Stephen Creps wrote:

   I have a newly-installed cluster (SUSE Linux 10.0 x86_64) on which I
had Condor working.  Without going into a long story, suffice it to say
it was necessary to reinstall the master node's OS to work around some
hardware support issues.

   Now I can't get Condor to work.  When we submit a job it just sits
there, periodically trying to run again but failing.  The job log
repeats the following two messages:

001 (122.000.000) 02/16 16:07:47 Job executing on host:
007 (122.000.000) 02/16 16:07:47 Shadow exception!
        Error from starter on vm2@xxxxxxxxxxxxxxxxxxxxxxxxxxxx: Failed
to execute '/data/PRD/bin/perfwrap condor_exec.exe
48/test/perf.txt /data/PRD/bin/runjob
/data/PRD/jobs/queue/test-trc-sacreps-20060216-160732.29748/test/ test.inf':
No such file or directory
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job

   The "No such file or directory" is apparently my problem, but I can
see all the files on the given command line except for condor_exec.exe.
It is my impression that this file is a temporary copy of the job. If I
knew where this file is supposed to be located it might help me track
down the problem.  Can anyone tell me where to look, or give me other
ideas to try?

-       -       -       -       -       -
Stephen Creps
Coordinator of UNIX Systems
Information Technology Group
Chemistry Department
Indiana University
(812) 855-8450
Chemistry C206
