
Re: [Condor-users] simple condor error



The #! line in the script didn't point to the correct path for csh, but
we have now fixed that and the user's job is still not running, with
the same error.

Steve


------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525  timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Div/Core Support Services Dept./Scientific Computing Section
Assistant Group Leader, Farms and Clustered Systems Group
Lead of Computing Farms Team

On Tue, 20 Jun 2006, Dan Bradley wrote:

Steve,

The error message that says failure to execute 'blah condor_exec.exe
...' is actually showing you the command, followed by the args,
_including_ arg 0, which is the name "condor_exec.exe" used by condor
for all jobs.  This is confusing.  The error message should not be
interpreted as meaning that Condor is trying to access a file literally
named 'blah condor_exec.exe'.  I'm fixing the error message to make
this clear.
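
A minimal sketch of how such a message can arise, assuming the text is built by joining the command with its full argument vector. This is an illustration only, not Condor's actual code; the function name `describe_exec_failure` is hypothetical.

```python
def describe_exec_failure(executable, argv, reason):
    """Build an error string the confusing way: command plus all args.

    Hypothetical helper for illustration; not taken from Condor.
    """
    # Arg 0 is included in the join, so it appears right after the command
    # and the pair reads like a single file name containing a space.
    shown = " ".join([executable] + list(argv))
    return f"Failed to execute '{shown}': {reason}"


message = describe_exec_failure(
    "/prj/ilc/lebrun/CHEF/LinacBench1/runGrid.csh",
    ["condor_exec.exe"],  # arg 0, the name Condor uses for all jobs
    "No such file or directory",
)
print(message)
```

Read this way, the quoted string is a command line, and only the first word of it is the file being executed.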

Does the #! line at the top of the script point to the correct path for
csh?  I believe an error in that could cause the problem you are
seeing.
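
The following sketch shows why a bad #! line can produce "No such file or directory" even though the script itself exists. It assumes a Linux host and a temporary directory that allows execution; the interpreter path `/no/such/dir/csh` and the helper names `shebang_interpreter` and `exec_errno` are made up for the demonstration.

```python
import errno
import os
import subprocess
import tempfile


def shebang_interpreter(script_path):
    """Return the interpreter path named on the #! line, or None."""
    with open(script_path, "rb") as handle:
        first_line = handle.readline().decode("utf-8", "replace").strip()
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    return parts[0] if parts else None


def exec_errno(script_path):
    """Try to run the script; return None on success, else the exec errno."""
    try:
        subprocess.run([script_path], check=False)
    except OSError as exc:
        return exc.errno
    return None


with tempfile.TemporaryDirectory() as workdir:
    script = os.path.join(workdir, "runGrid.csh")
    with open(script, "w") as handle:
        # Deliberately wrong interpreter path (hypothetical).
        handle.write("#!/no/such/dir/csh\necho hello\n")
    os.chmod(script, 0o755)

    interpreter = shebang_interpreter(script)
    script_exists = os.path.exists(script)
    interpreter_exists = os.path.exists(interpreter)
    failure = exec_errno(script)

print("script exists:     ", script_exists)
print("interpreter exists:", interpreter_exists)
print("exec failed with ENOENT:", failure == errno.ENOENT)
```

Running `shebang_interpreter` against the real script on the execute machine, and checking that the returned path exists there, is a quick way to rule this cause in or out.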

--Dan

On Jun 20, 2006, at 3:52 PM, Steven Timm wrote:


I have a user trying to submit a simple vanilla universe job.
This is the job submit file:

[root@ilcsim LinacBench1]# more myJob1.run
universe = vanilla
executable = /prj/ilc/lebrun/CHEF/LinacBench1/runGrid.csh
transfer_output = true
transfer_error = true
transfer_executable = false
log = myjob.log.$(Cluster).$(Process)
notification = NEVER
remote_initialdir = /prj/ilc/lebrun/CHEF/LinacBench1
output = myjob.out.$(Cluster).$(Process)
error = myjob.err.$(Cluster).$(Process)
queue

Here's what's in ShadowLog

6/20 13:57:14 ******************************************************
6/20 13:57:14 ** condor_shadow (CONDOR_SHADOW) STARTING UP
6/20 13:57:14 ** /opt/condor-6.7.19/sbin/condor_shadow
6/20 13:57:14 ** $CondorVersion: 6.7.19 May 10 2006 $
6/20 13:57:14 ** $CondorPlatform: I386-LINUX_RH9 $
6/20 13:57:14 ** PID = 24934
6/20 13:57:14 ** Log last touched 6/13 15:32:28
6/20 13:57:14 ******************************************************
6/20 13:57:14 Using config file: /etc/condor/condor_config
6/20 13:57:14 Using local config files: /opt/condor-6.7.19/local.ilcsim/condor_config.local
6/20 13:57:14 DaemonCore: Command Socket at <131.225.110.52:33278>
6/20 13:57:14 Initializing a VANILLA shadow for job 9.0
6/20 13:57:14 (9.0) (24934): Request to run on <131.225.110.52:32775> was ACCEPTED
6/20 13:57:14 (9.0) (24934): ERROR "Error from starter on vm1@xxxxxxxxxxxxxxx: Failed to execute '/prj/ilc/lebrun/CHEF/LinacBench1/runGrid.csh condor_exec.exe': No such file or directory" at line 597 in file pseudo_ops.C
6/20 13:57:16 **********************************************


And the same in the UserLog

[root@ilcsim log]# more /prj/ilc/lebrun/CHEF/LinacBench1/myjob.log.13.0
000 (013.000.000) 06/20 15:27:23 Job submitted from host: <131.225.110.52:32774>
...
001 (013.000.000) 06/20 15:27:27 Job executing on host: <131.225.110.52:32775>
...
007 (013.000.000) 06/20 15:27:27 Shadow exception!
         Error from starter on vm1@xxxxxxxxxxxxxxx: Failed to execute '/prj/ilc/lebrun/CHEF/LinacBench1/runGrid.csh condor_exec.exe': No such file or directory
         0  -  Run Bytes Sent By Job
         0  -  Run Bytes Received By Job
...
001 (013.000.000) 06/20 15:27:30 Job executing on host: <131.225.110.52:32775>
...
007 (013.000.000) 06/20 15:27:30 Shadow exception!
         Error from starter on vm1@xxxxxxxxxxxxxxx: Failed to execute '/prj/ilc/lebrun/CHEF/LinacBench1/runGrid.csh condor_exec.exe': No such file or directory
----------------------------------------------------
[root@ilcsim LinacBench1]# ls -l /prj/ilc/lebrun/CHEF/LinacBench1/runGrid.csh
-rwxr-xr-x    1 lebrun   bphys         262 Jun 20 15:26 /prj/ilc/lebrun/CHEF/LinacBench1/runGrid.csh

------------------------------------------------------------------

So what's wrong? Why is it trying to execute
'/prj/ilc/lebrun/CHEF/LinacBench1/runGrid.csh condor_exec.exe',
which doesn't exist, instead of
'/prj/ilc/lebrun/CHEF/LinacBench1/runGrid.csh',
which does?

Something basic is wrong.  Jobs being run by other users using
a very similar basic job file don't have this problem.
Any idea what?

Steve Timm




_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at either
https://lists.cs.wisc.edu/archive/condor-users/
http://www.opencondor.org/spaces/viewmailarchive.action?key=CONDOR
