
[HTCondor-users] Shadow Error!




Hello, HTCondor users.
I submitted a job and ran into a problem that appears to be a "Shadow Error".

The details of my current setup are below:

1. 'condor_submit' command:
nickeys@ubuntu:/home/condor$ ls
submit  test
nickeys@ubuntu:/home/condor$ condor_submit submit
Submitting job(s).
1 job(s) submitted to cluster 5.
nickeys@ubuntu:/home/condor$ condor_q


-- Schedd: ubuntu : <xx.xxx.xx.xxx:9618?... @ 02/08/17 18:48:54
OWNER   BATCH_NAME              SUBMITTED     DONE   RUN  IDLE TOTAL JOB_IDS
nickeys CMD: /home/condor/test  2/8  18:48       _     1     _     1 5.0

1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
nickeys@ubuntu:/home/condor$ condor_q


-- Schedd: ubuntu : <xx.xxx.xx.xxx:9618?... @ 02/08/17 18:48:58
OWNER   BATCH_NAME              SUBMITTED     DONE   RUN  IDLE TOTAL JOB_IDS
nickeys CMD: /home/condor/test  2/8  18:48       _     1     _     1 5.0

1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
nickeys@ubuntu:/home/condor$ condor_q


-- Schedd: ubuntu : <xx.xxx.xx.xxx:9618?... @ 02/08/17 18:49:06
OWNER   BATCH_NAME              SUBMITTED     DONE   RUN  IDLE TOTAL JOB_IDS
nickeys CMD: /home/condor/test  2/8  18:48       _     _     1     1 5.0

1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended



2. 'submit' file:
Executable = test
Arguments  = 10 20
Log        = test.log
Output     = test.out
Error      = test.error
Queue


3. Job log (user log?):
000 (006.000.000) 02/08 19:04:34 Job submitted from host: <10.150.21.171:9618?addrs=10.150.21.171-9618+[--1]-9618&noUDP&sock=26312_3ac7>
...
001 (006.000.000) 02/08 19:04:34 Job executing on host: <10.150.21.170:9618?addrs=10.150.21.170-9618+[--1]-9618&noUDP&sock=226707_09b2_88>
...
007 (006.000.000) 02/08 19:04:35 Shadow exception!
    Error from slot1@ubuntu: Create_Process failed to register the job with the ProcD
    0  -  Run Bytes Sent By Job
    0  -  Run Bytes Received By Job
...
001 (006.000.000) 02/08 19:04:36 Job executing on host: <10.150.21.170:9618?addrs=10.150.21.170-9618+[--1]-9618&noUDP&sock=226707_09b2_88>
...
007 (006.000.000) 02/08 19:04:37 Shadow exception!
    Error from slot1@ubuntu: Create_Process failed to register the job with the ProcD
    0  -  Run Bytes Sent By Job
    0  -  Run Bytes Received By Job
...
001 (006.000.000) 02/08 19:04:38 Job executing on host: <10.150.21.170:9618?addrs=10.150.21.170-9618+[--1]-9618&noUDP&sock=226707_09b2_88>
...
007 (006.000.000) 02/08 19:04:40 Shadow exception!
    Error from slot1@ubuntu: Create_Process failed to register the job with the ProcD
    0  -  Run Bytes Sent By Job
    0  -  Run Bytes Received By Job
...

4. SchedLog:
02/08/17 19:14:40 (pid:26312) Increasing flock level for nickeys to 2 from 1. (Due to lack of activity from negotiator)
02/08/17 19:14:40 (pid:26312) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
02/08/17 19:14:40 (pid:26312) TransferQueueManager upload 1m I/O load: 0 bytes/s  0.000 disk load  0.000 net load
02/08/17 19:14:40 (pid:26312) TransferQueueManager download 1m I/O load: 0 bytes/s  0.000 disk load  0.000 net load
02/08/17 19:14:40 (pid:26312) condor_read() failed: recv() 5 bytes from collector aiden-1.dgist.ac.kr returned -1, timeout=20, errno=104 Connection reset by peer.
02/08/17 19:14:40 (pid:26312) IO: Failed to read packet header
02/08/17 19:14:40 (pid:26312) SECMAN: no classad from server, failing
02/08/17 19:14:40 (pid:26312) ERROR: SECMAN:2007:Failed to end classad message.
02/08/17 19:14:40 (pid:26312) Failed to start non-blocking update to <10.150.21.171:9618>.
02/08/17 19:14:40 (pid:26312) condor_read() failed: recv() 5 bytes from collector aiden-1.dgist.ac.kr returned -1, timeout=20, errno=104 Connection reset by peer.
02/08/17 19:14:40 (pid:26312) IO: Failed to read packet header
02/08/17 19:14:40 (pid:26312) SECMAN: no classad from server, failing
02/08/17 19:14:40 (pid:26312) ERROR: SECMAN:2007:Failed to end classad message.
02/08/17 19:14:40 (pid:26312) Failed to start non-blocking update to <10.150.21.171:9618>.
02/08/17 19:15:00 (pid:26054) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
02/08/17 19:15:00 (pid:26054) TransferQueueManager upload 1m I/O load: 0 bytes/s  0.000 disk load  0.000 net load
02/08/17 19:15:00 (pid:26054) TransferQueueManager download 1m I/O load: 0 bytes/s  0.000 disk load  0.000 net load
02/08/17 19:19:41 (pid:26312) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
02/08/17 19:19:41 (pid:26312) TransferQueueManager upload 1m I/O load: 0 bytes/s  0.000 disk load  0.000 net load
02/08/17 19:19:41 (pid:26312) TransferQueueManager download 1m I/O load: 0 bytes/s  0.000 disk load  0.000 net load
02/08/17 19:19:41 (pid:26312) condor_read() failed: recv() 5 bytes from collector aiden-1.dgist.ac.kr returned -1, timeout=20, errno=104 Connection reset by peer.
02/08/17 19:19:41 (pid:26312) IO: Failed to read packet header
02/08/17 19:19:41 (pid:26312) SECMAN: no classad from server, failing
02/08/17 19:19:41 (pid:26312) ERROR: SECMAN:2007:Failed to end classad message.
02/08/17 19:19:41 (pid:26312) Failed to start non-blocking update to <10.150.21.171:9618>.
02/08/17 19:19:41 (pid:26312) condor_read() failed: recv() 5 bytes from collector aiden-1.dgist.ac.kr returned -1, timeout=20, errno=104 Connection reset by peer.
02/08/17 19:19:41 (pid:26312) IO: Failed to read packet header
02/08/17 19:19:41 (pid:26312) SECMAN: no classad from server, failing
02/08/17 19:19:41 (pid:26312) ERROR: SECMAN:2007:Failed to end classad message.
02/08/17 19:19:41 (pid:26312) Failed to start non-blocking update to <10.150.21.171:9618>.
02/08/17 19:20:00 (pid:26054) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
02/08/17 19:20:00 (pid:26054) TransferQueueManager upload 1m I/O load: 0 bytes/s  0.000 disk load  0.000 net load
02/08/17 19:20:00 (pid:26054) TransferQueueManager download 1m I/O load: 0 bytes/s  0.000 disk load  0.000 net load



5. host command:
nickeys@ubuntu:/home/condor$ host xxx.xxx.xxx.xxx
Host xxx.xxx.xxx.xxx.in-addr.arpa. not found: 3(NXDOMAIN)

I have been searching for a while, but I could not find a solution that fits my case.
Any clue on how to resolve this problem would be appreciated.

Sincerely,