
[Condor-users] Shadow Exception !!!



I've installed the latest version of Condor on my PC,
and it runs fine under Linux (Red Hat 9). The
problem appears when I submit a job: after
running for a few seconds a shadow exception appears,
and then the job starts again until another shadow
exception stops it. I don't know what's happening; if I
run the program on the same PC without Condor, it works
perfectly.
Looking at the logs, the first thing I saw in my
job's log was this:
000 (003.000.000) 09/21 13:40:49 Job submitted from host: <193.147.240.233:36284>
...
001 (003.000.000) 09/21 13:40:52 Job executing on host: <193.147.240.233:36286>
...
006 (003.000.000) 09/21 13:41:00 Image size of job updated: 1332
...
007 (003.000.000) 09/21 13:41:42 Shadow exception!
        Can no longer talk to condor_starter <193.147.240.233:36286>
        0  -  Run Bytes Sent By Job
        14829  -  Run Bytes Received By Job
*****************************************************

So I looked in the StarterLog, and this is what I
found:

Starterlog:

9/21 13:40:52 ******************************************************
9/21 13:40:52 ** condor_starter (CONDOR_STARTER) STARTING UP
9/21 13:40:52 ** /home/condor/condor-6.7.1/sbin/condor_starter
9/21 13:40:52 ** $CondorVersion: 6.7.1 Aug 10 2004 $
9/21 13:40:52 ** $CondorPlatform: I386-LINUX_RH9 $
9/21 13:40:52 ** PID = 15935
9/21 13:40:52 ******************************************************
9/21 13:40:52 Using config file: /home/condor/condor-6.7.1/etc/condor_config
9/21 13:40:52 Using local config files: /home/condor/condor-6.7.1/local.golem/condor_config.local
9/21 13:40:52 DaemonCore: Command Socket at <193.147.240.233:36308>
9/21 13:40:52 Done setting resource limits
9/21 13:40:52 Communicating with shadow <193.147.240.233:36306>
9/21 13:40:52 Submitting machine is "golem.imim.es"
9/21 13:40:52 File transfer completed successfully.
9/21 13:40:52 Starting a VANILLA universe job with ID: 3.0
9/21 13:40:52 IWD: /home/condor/condor-6.7.1/local.golem/execute/dir_15935
9/21 13:40:52 Output file: /home/condor/condor-6.7.1/local.golem/execute/dir_15935/2program.out
9/21 13:40:52 Error file: /home/condor/condor-6.7.1/local.golem/execute/dir_15935/2program.err
9/21 13:40:52 About to exec /home/condor/condor-6.7.1/local.golem/execute/dir_15935/condor_exec.exe
9/21 13:40:52 Create_Process succeeded, pid=15937
9/21 13:41:42 Process exited, pid=15937, status=0
9/21 13:41:42 ReliSock: put_file: Failed to open file /home/condor/condor-6.7.1/local.golem/execute/dir_15935/2program.log, errno = 2.
9/21 13:41:42 ERROR "DoUpload: Failed to send file /home/condor/condor-6.7.1/local.golem/execute/dir_15935/2program.log, exiting at 1408
" at line 1407 in file file_transfer.C
9/21 13:41:42 ShutdownFast all jobs.
*****************************************************
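
A side note on the StarterLog above: the "errno = 2" in the put_file line is a standard POSIX error number. Below is a minimal Python sketch for decoding it. It assumes the starter is reporting the usual Linux errno value, and the helper name describe_errno is made up for illustration. It only translates the number; it does not explain why the file is missing.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return the symbolic name and the OS message for an errno value."""
    # errno.errorcode maps numeric codes to names such as 'ENOENT'.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror gives the platform's human-readable message for the code.
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # 2 is the value reported in the StarterLog put_file line.
    print(describe_errno(2))
```

On Linux this reports ENOENT ("No such file or directory"), which matches the log: the starter could not find 2program.log in the execute directory when it tried to send it back.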

The ShadowLog:
9/21 13:40:52 ******************************************************
9/21 13:40:52 ** condor_shadow (CONDOR_SHADOW) STARTING UP
9/21 13:40:52 ** /home/condor/condor-6.7.1/sbin/condor_shadow
9/21 13:40:52 ** $CondorVersion: 6.7.1 Aug 10 2004 $
9/21 13:40:52 ** $CondorPlatform: I386-LINUX_RH9 $
9/21 13:40:52 ** PID = 15934
9/21 13:40:52 ******************************************************
9/21 13:40:52 Using config file: /home/condor/condor-6.7.1/etc/condor_config
9/21 13:40:52 Using local config files: /home/condor/condor-6.7.1/local.golem/condor_config.local
9/21 13:40:52 DaemonCore: Command Socket at <193.147.240.233:36306>
9/21 13:40:52 Initializing a VANILLA shadow for job 3.0
9/21 13:40:52 (3.0) (15934): Request to run on <193.147.240.233:36286> was ACCEPTED
9/21 13:41:42 (3.0) (15934): ERROR "Can no longer talk to condor_starter <193.147.240.233:36286>" at line 93 in file NTreceivers.C
*********************

Does anyone know where the problem is?

BTW, I have only one machine, so every time a
job is submitted it starts running immediately.
If you need more info, let me know.




		