Re: [Condor-users] shadow exception error?



Hi Jun

I think you can set ALL_DEBUG = D_FULLDEBUG, or set D_FULLDEBUG in the debug
setting of a specific daemon. The log files will then contain more information.
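
As a rough sketch of what that could look like in the configuration (the choice
of the shadow and starter daemons is my assumption, based on the error message;
the per-daemon names follow Condor's <SUBSYS>_DEBUG convention):

```
# Option 1: verbose logging for every daemon
ALL_DEBUG = D_FULLDEBUG

# Option 2: verbose logging only for the daemons named in the error
# (assumed to be the relevant ones here)
SHADOW_DEBUG  = D_FULLDEBUG
STARTER_DEBUG = D_FULLDEBUG
```

After changing the configuration, running condor_reconfig should make the
daemons pick it up. The extra detail would then most likely show up in the
ShadowLog on the submit machine and the StarterLog on the execute machine.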

Yaoheng Zhang

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Jun Wang
Sent: 19 July 2006 22:35
To: condor-users@xxxxxxxxxxx
Subject: [Condor-users] shadow exception error?

Dear condor-users:

I just installed condor_6.6.11 on our Linux cluster, which consists of 12 nodes
with an NFS file system.
When I typed condor_q and condor_status on the master node (central manager)
and on the slave nodes (compute nodes), I got the normal screen output, which
told me how many jobs are running, etc., and which machines are in my pool.
Then I tried to run the test example "sh_loop" under condor-6.6.11/examples as
user condor, with condor_submit sh_loop.cmd on my master node. The job
terminated normally. However, when I submitted sh_loop.cmd on my slave node, I
got a shadow exception error message in the file sh_loop.log, as shown below:

000 (005.000.000) 07/19 22:13:57 Job submitted from host: <10.0.2.2:36083>
..
007 (005.000.000) 07/19 22:14:00 Shadow exception!
        Can no longer talk to condor_starter on execute machine (10.0.2.1)
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
..
007 (005.000.000) 07/19 22:14:01 Shadow exception!
        Can no longer talk to condor_starter on execute machine (10.0.2.1)
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
..
007 (005.000.000) 07/19 22:14:03 Shadow exception!
        Can no longer talk to condor_starter on execute machine (10.0.2.1)
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
..
007 (005.000.000) 07/19 22:14:04 Shadow exception!
        Can no longer talk to condor_starter on execute machine (10.0.2.1)
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
..
007 (005.000.000) 07/19 22:14:05 Shadow exception!
        Can no longer talk to condor_starter on execute machine (10.0.2.1)
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
..
009 (005.000.000) 07/19 22:14:40 Job was aborted by the user.
        via condor_rm (by user condor)
..

Does anybody know the possible cause?

        Jun Wang
        junwang@xxxxxxxxxxxxxx
          2006-07-19


