
Re: [Condor-users] Bug



I don't think you are running into bugs, but you do have some problems.

Martin Lukac wrote:

ShadowLog:
10/1 17:22:26 ******************************************************
10/1 17:22:26 ** condor_shadow (CONDOR_SHADOW) STARTING UP
10/1 17:22:26 ** $CondorVersion: 6.6.0 Nov 13 2003 $
10/1 17:22:26 ** $CondorPlatform: INTEL-LINUX-GLIBC23 $
10/1 17:22:26 ** PID = 17003
10/1 17:22:26 ******************************************************
10/1 17:22:26 Using config file: /opt/condor/etc/condor_config
10/1 17:22:26 Using local config files: /opt/condor/local.frontend-0/condor_config.local
10/1 17:22:26 DaemonCore: Command Socket at <10.1.1.1:53694>
10/1 17:22:27 Initializing a VANILLA shadow
10/1 17:22:27 (31.0) (17003): Request to run on <10.255.255.253:32775> was ACCEPTED
10/1 17:22:27 (31.0) (17003): ERROR "Error from starter on compute-0-1.local:
Failed to execute '/disk/local/NAMD/NAMD_2.5_Source/Linux-i686-MPI/test/test.out
condor_exec.exe': No such file or directory" at line 659 in file pseudo_ops.C
10/1 17:22:27 (31.0) (17003): Unable to log ULOG_SHADOW_EXCEPTION event

The starter on compute-0-1.local could not find the executable at the path it was given. It sounds like you don't have a shared filesystem between the computers, and you didn't tell Condor to transfer files. Is that correct?


I recommend reading the following two sections of the manual:

2.5.3 Submitting Jobs Using a Shared File System
2.5.4 Submitting Jobs Without a Shared File System: Condor's File Transfer Mechanism

http://www.cs.wisc.edu/condor/manual/v6.6/2_5Submitting_Job.html
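
If there is no shared filesystem, a submit description file along the lines of the sketch below may help. This is only an illustration, assuming the file transfer commands described in section 2.5.4 are available in your version. The executable and log names are taken from your logs; the output, error, and input file names are placeholders I made up.

```
# Sketch only: adjust names and paths to match your job.
universe                = vanilla
executable              = test.out
log                     = test.log
# Placeholder names for the job's stdout and stderr.
output                  = test.stdout
error                   = test.stderr
# Ask Condor to copy files to the execute machine and back.
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
# Hypothetical input files; list whatever your job actually reads.
# transfer_input_files  = input1.dat, input2.dat
queue
```

Submit it from the directory that contains the executable, so that relative names resolve on the submit machine rather than on the compute node.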


MasterLog:
9/30 17:17:13 Can't send UPDATE_MASTER_AD to collector frontend-0.local <10.1.1.1:9618>: Failed to send UDP update command to collector

It looks like you have a DNS problem. frontend-0.local does not look like a correct hostname.


10/1 17:21:34 UserLog::initialize: open("/home/condor/spool/cluster30.proc0.subproc0/test.log") failed - errno 13 (Permission denied)

It looks like the permissions on /home/condor/spool are set incorrectly. Make sure that they are open enough for the condor user to access. If you are running Condor as a different user, make sure the permissions are appropriate for that user instead.


-alain