
[Condor-users] Shadow exception: Unable to talk to job: disconnected



Hello,

I installed Condor 6.6.6 (for RedHat 9) for use on a cluster of Itanium 2
CPUs (IA64 architecture).
Operating System: Red Hat Linux Advanced Server release 2.1AS 
Kernel: Linux version 2.4.18-e.31smp

Starting the daemons on the nodes seems OK (condor_status lists
the compute nodes of my pool). I then submitted the basic
example 'hello', and a shadow exception occurred during execution.

007 (002.000.000) 10/13 14:35:35 Shadow exception!
Unable to talk to job: disconnected

	32  -  Run Bytes Sent By Job
	32  -  Run Bytes Received By Job

The contents of the shadow log on the master node follow.

10/13 14:06:59 (?.?) (22653):******* Standard Shadow starting up *******
10/13 14:06:59 (?.?) (22653):** $CondorVersion: 6.6.6 Jul 26 2004 $
10/13 14:06:59 (?.?) (22653):** $CondorPlatform: I386-LINUX_RH9 $
10/13 14:06:59 (?.?) (22653):*******************************************
10/13 14:06:59 (?.?) (22653):uid=59062, euid=59062, gid=59001, egid=59001
10/13 14:06:59 (?.?) (22653):Hostname = "<129.88.97.11:36924>", Job = 1.0
10/13 14:06:59 (1.0) (22653):Requesting Primary Starter
10/13 14:06:59 (1.0) (22653):Shadow: Request to run a job was ACCEPTED
10/13 14:06:59 (1.0) (22653):Shadow: RSC_SOCK connected, fd = 17
10/13 14:06:59 (1.0) (22653):Shadow: CLIENT_LOG connected, fd = 18
10/13 14:06:59 (1.0) (22653):My_Filesystem_Domain = "imag.fr"
10/13 14:06:59 (1.0) (22653):My_UID_Domain = "imag.fr"
10/13 14:07:09 (1.0) (22653):ERROR "Unable to talk to job: disconnected
" at line 116 in file receivers.C
10/13 14:07:09 (1.0) (22653):Shadow: DoCleanup: unlinking TmpCkpt '/home/externe/cahon/condor/hosts/ita11/spool/cluster1.proc0.subproc0.tmp'
10/13 14:07:09 (1.0) (22653):Trying to unlink /home/externe/cahon/condor/hosts/ita11/spool/cluster1.proc0.subproc0.tmp
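
For anyone who wants to scan a longer shadow log for the same information,
here is a minimal Python sketch that pulls out the $CondorPlatform string
and the ERROR entries. It assumes every entry starts with
"MM/DD HH:MM:SS (job) (pid):" as in the excerpt above; the function name
summarize_shadow_log and the regular expression are my own illustration,
not part of Condor.

```python
import re

# Assumed entry layout, taken from the excerpt above:
#   10/13 14:06:59 (1.0) (22653):Requesting Primary Starter
LINE_RE = re.compile(r"^(\d+/\d+ \d+:\d+:\d+) \(([^)]*)\) \((\d+)\):(.*)$")
PLATFORM_RE = re.compile(r"\$CondorPlatform: (\S+) \$")


def summarize_shadow_log(text):
    """Return (platform, errors) found in the given shadow log text.

    platform is the last $CondorPlatform value seen (or None);
    errors is a list of (timestamp, job, message) tuples.
    """
    platform = None
    errors = []
    for line in text.splitlines():
        entry = LINE_RE.match(line)
        if entry is None:
            continue  # continuation of a multi-line message; skipped here
        timestamp, job, _pid, message = entry.groups()
        found = PLATFORM_RE.search(message)
        if found:
            platform = found.group(1)
        if message.startswith("ERROR"):
            errors.append((timestamp, job, message))
    return platform, errors


# A few lines copied from the log above.
sample = "\n".join([
    "10/13 14:06:59 (?.?) (22653):** $CondorVersion: 6.6.6 Jul 26 2004 $",
    "10/13 14:06:59 (?.?) (22653):** $CondorPlatform: I386-LINUX_RH9 $",
    "10/13 14:06:59 (1.0) (22653):Requesting Primary Starter",
    '10/13 14:07:09 (1.0) (22653):ERROR "Unable to talk to job: disconnected',
    '" at line 116 in file receivers.C',
])

platform, errors = summarize_shadow_log(sample)
print("platform:", platform)
for timestamp, job, message in errors:
    print(timestamp, job, message)
```

On the excerpt this reports the platform the shadow binary was built for
(I386-LINUX_RH9), which can then be compared by hand with the output of
uname -m on the IA64 nodes.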

Can someone help me?

Thanks a lot.

Regards.
-- 
Sebastien CAHON,
http://www.lifl.fr/~cahon/