
[Condor-users] simple condor error




I have a user trying to submit a simple vanilla universe job.
This is the job submit file:

[root@ilcsim LinacBench1]# more myJob1.run
universe = vanilla
executable = /prj/ilc/lebrun/CHEF/LinacBench1/runGrid.csh
transfer_output = true
transfer_error = true
transfer_executable = false
log = myjob.log.$(Cluster).$(Process)
notification = NEVER
remote_initialdir = /prj/ilc/lebrun/CHEF/LinacBench1
output = myjob.out.$(Cluster).$(Process)
error = myjob.err.$(Cluster).$(Process)
queue

Here's what's in ShadowLog

6/20 13:57:14 ******************************************************
6/20 13:57:14 ** condor_shadow (CONDOR_SHADOW) STARTING UP
6/20 13:57:14 ** /opt/condor-6.7.19/sbin/condor_shadow
6/20 13:57:14 ** $CondorVersion: 6.7.19 May 10 2006 $
6/20 13:57:14 ** $CondorPlatform: I386-LINUX_RH9 $
6/20 13:57:14 ** PID = 24934
6/20 13:57:14 ** Log last touched 6/13 15:32:28
6/20 13:57:14 ******************************************************
6/20 13:57:14 Using config file: /etc/condor/condor_config
6/20 13:57:14 Using local config files: /opt/condor-6.7.19/local.ilcsim/condor_config.local
6/20 13:57:14 DaemonCore: Command Socket at <131.225.110.52:33278>
6/20 13:57:14 Initializing a VANILLA shadow for job 9.0
6/20 13:57:14 (9.0) (24934): Request to run on <131.225.110.52:32775> was ACCEPTED
6/20 13:57:14 (9.0) (24934): ERROR "Error from starter on vm1@xxxxxxxxxxxxxxx: Failed to execute '/prj/ilc/lebrun/CHEF/LinacBench1/runGrid.csh condor_exec.exe': No such file or directory" at line 597 in file pseudo_ops.C
6/20 13:57:16 **********************************************


And the same in the UserLog

[root@ilcsim log]# more /prj/ilc/lebrun/CHEF/LinacBench1/myjob.log.13.0
000 (013.000.000) 06/20 15:27:23 Job submitted from host: <131.225.110.52:32774>
...
001 (013.000.000) 06/20 15:27:27 Job executing on host: <131.225.110.52:32775>
...
007 (013.000.000) 06/20 15:27:27 Shadow exception!
Error from starter on vm1@xxxxxxxxxxxxxxx: Failed to execute '/prj/ilc/lebrun/CHEF/LinacBench1/runGrid.csh condor_exec.exe': No such file or directory
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
...
001 (013.000.000) 06/20 15:27:30 Job executing on host: <131.225.110.52:32775>
...
007 (013.000.000) 06/20 15:27:30 Shadow exception!
Error from starter on vm1@xxxxxxxxxxxxxxx: Failed to execute '/prj/ilc/lebrun/CHEF/LinacBench1/runGrid.csh condor_exec.exe': No such file or directory
----------------------------------------------------
[root@ilcsim LinacBench1]# ls -l /prj/ilc/lebrun/CHEF/LinacBench1/runGrid.csh
-rwxr-xr-x 1 lebrun bphys 262 Jun 20 15:26 /prj/ilc/lebrun/CHEF/LinacBench1/runGrid.csh

------------------------------------------------------------------

So, what's wrong? Why is it trying to execute
'/prj/ilc/lebrun/CHEF/LinacBench1/runGrid.csh condor_exec.exe'
which doesn't exist,
instead of
'/prj/ilc/lebrun/CHEF/LinacBench1/runGrid.csh'
which does?

Something basic is wrong.  Jobs being run by other users using
a very similar basic job file don't have this problem.
Any idea what?
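
One possibility that may be worth ruling out (a guess, not a confirmed diagnosis): the kernel also reports "No such file or directory" when a script exists but the interpreter named on its "#!" line does not, or when that line ends in a DOS carriage return. The "condor_exec.exe" after the path in the message may simply be part of the argument list the starter passes, not part of the filename. A minimal sketch of such a check, assuming a POSIX shell on the execute machine; check_interp is a hypothetical helper name, not a Condor tool:

```shell
# check_interp FILE
# Hypothetical helper (not part of Condor): reports whether the interpreter
# named on FILE's "#!" line exists and is executable on this machine.
check_interp() {
    script=$1
    # Read only the first line of the script.
    first_line=$(head -n 1 "$script")
    # A trailing carriage return makes the kernel look for "interp\r".
    cr=$(printf '\r')
    case $first_line in
        *"$cr") echo "DOS line ending on #! line: $script"; return 1 ;;
    esac
    case $first_line in
        '#!'*) ;;
        *) echo "no #! line: $script"; return 1 ;;
    esac
    # Strip "#!" and any blanks after it, then keep the first word,
    # which is the interpreter path.
    interp=$(printf '%s\n' "$first_line" | sed 's/^#![[:space:]]*//' | awk '{print $1}')
    if [ -x "$interp" ]; then
        echo "interpreter ok: $interp"
    else
        echo "interpreter missing: $interp"
        return 1
    fi
}
```

It would be run on the execute machine, as the job's user, as: check_interp /prj/ilc/lebrun/CHEF/LinacBench1/runGrid.csh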

Steve Timm




------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525  timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Div/Core Support Services Dept./Scientific Computing Section
Assistant Group Leader, Farms and Clustered Systems Group
Lead of Computing Farms Team