
[Condor-users] shadow exception error?



Dear condor-users:

I just installed Condor 6.6.11 on our 12-node Linux cluster, which uses NFS.
When I run condor_q and condor_status on the master node (central manager) and on the slave nodes (compute nodes), I get the normal output showing how many jobs are running and which machines are in my pool. I then ran the test example "sh_loop" from condor-6.6.11/examples as user condor, using condor_submit sh_loop.cmd on the master node, and the job terminated normally. However, when I submitted the same sh_loop.cmd from a slave node, I got the following shadow exception errors in sh_loop.log:

000 (005.000.000) 07/19 22:13:57 Job submitted from host: <10.0.2.2:36083>
..
007 (005.000.000) 07/19 22:14:00 Shadow exception!
        Can no longer talk to condor_starter on execute machine (10.0.2.1)
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
..
007 (005.000.000) 07/19 22:14:01 Shadow exception!
        Can no longer talk to condor_starter on execute machine (10.0.2.1)
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
..
007 (005.000.000) 07/19 22:14:03 Shadow exception!
        Can no longer talk to condor_starter on execute machine (10.0.2.1)
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
..
007 (005.000.000) 07/19 22:14:04 Shadow exception!
        Can no longer talk to condor_starter on execute machine (10.0.2.1)
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
..
007 (005.000.000) 07/19 22:14:05 Shadow exception!
        Can no longer talk to condor_starter on execute machine (10.0.2.1)
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
..
009 (005.000.000) 07/19 22:14:40 Job was aborted by the user.
        via condor_rm (by user condor)
..

Does anybody know what might be causing this?
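
In case it helps anyone looking at a similar log, here is a minimal sketch in Python that tallies the shadow exceptions per execute machine. It assumes the layout shown in the excerpt above: a three-digit event code, the job ID in parentheses, a date and time, then the message, with indented detail lines below. The names summarize_shadow_exceptions and SAMPLE_LOG are made up for illustration and are not part of Condor.

```python
import re
from collections import Counter

# Event header as it appears in the log excerpt, e.g.
# "007 (005.000.000) 07/19 22:14:00 Shadow exception!"
HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\) (\S+) (\S+) (.*)$")
# Detail line naming the execute machine, e.g. "... execute machine (10.0.2.1)"
HOST = re.compile(r"execute machine \(([^)]+)\)")


def summarize_shadow_exceptions(log_text):
    """Count 007 (shadow exception) events per execute machine.

    Hypothetical helper: it relies only on the layout visible in the
    excerpt above, not on any official log specification.
    """
    counts = Counter()
    in_exception = False
    for line in log_text.splitlines():
        header = HEADER.match(line)
        if header:
            # A new event starts; remember whether it is a shadow exception.
            in_exception = header.group(1) == "007"
            continue
        if in_exception:
            host = HOST.search(line)
            if host:
                counts[host.group(1)] += 1
                in_exception = False  # count each event once
    return counts


# Shortened copy of the log excerpt, used only as sample input.
SAMPLE_LOG = """\
000 (005.000.000) 07/19 22:13:57 Job submitted from host: <10.0.2.2:36083>
..
007 (005.000.000) 07/19 22:14:00 Shadow exception!
        Can no longer talk to condor_starter on execute machine (10.0.2.1)
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
..
007 (005.000.000) 07/19 22:14:01 Shadow exception!
        Can no longer talk to condor_starter on execute machine (10.0.2.1)
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
..
009 (005.000.000) 07/19 22:14:40 Job was aborted by the user.
        via condor_rm (by user condor)
..
"""

for machine, count in summarize_shadow_exceptions(SAMPLE_LOG).items():
    print(machine, count)
```

To run it on a real log, pass the contents of sh_loop.log instead of SAMPLE_LOG, for example summarize_shadow_exceptions(open("sh_loop.log").read()).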


 				

        Jun Wang
        junwang@xxxxxxxxxxxxxx
          2006-07-19