
[Condor-users] [Birdbath Related] Can't remove Jobs and Clusters



Hi Matt,

I've accumulated many jobs with JobStatus = 1 in the queue. I'm now trying to kill all of these jobs (or just remove them) using the removeJob and/or removeCluster functions, without success. The condor.Status returned is a null object every time. Could you please have a look at the log and advise? A few things that I noticed in the log are listed below. Sorry if this is very basic, but I can't make sense of it :(

a) ProcAPI sanity failure
b) command 60011 (DC_NOP), calling handler (handle_nop())
c) VACATE_SERVICE
d) RELEASE_CLAIM

Cheers
Afras

Log

3/26 20:53:00 (pid:4920) ProcAPI sanity failure, user_time = -167

3/26 20:53:00 (pid:4920) ProcAPI sanity failure, age = -97025303

3/26 20:54:18 (pid:4920) Activity on stashed negotiator socket

3/26 20:54:18 (pid:4920) Negotiating for owner: s2vp@afrasyab-LAPTOP

3/26 20:54:18 (pid:4920) Checking consistency running and runnable jobs

3/26 20:54:18 (pid:4920) Tables are consistent

3/26 20:54:18 (pid:4920) Out of servers - 1 jobs matched, 1 jobs idle, 1 jobs rejected

3/26 20:54:18 (pid:4920) Activity on stashed negotiator socket

3/26 20:54:18 (pid:4920) Negotiating for owner: S2VP@afrasyab-LAPTOP

3/26 20:54:18 (pid:4920) Checking consistency running and runnable jobs

3/26 20:54:18 (pid:4920) Tables are consistent

3/26 20:54:18 (pid:4920) Out of servers - 1 jobs matched, 104 jobs idle, 3 jobs rejected

3/26 20:54:22 (pid:4920) Starting add_shadow_birthdate(25.0)

3/26 20:54:22 (pid:4920) Started shadow for job 25.0 on "<192.168.1.3:2472>", (shadow pid = 124)

3/26 20:54:23 (pid:4920) Sent ad to central manager for s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP

3/26 20:54:23 (pid:4920) Sent ad to 1 collectors for s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP

3/26 20:54:23 (pid:4920) Sent ad to central manager for S2VP@afrasyab-LAPTOP

3/26 20:54:23 (pid:4920) Sent ad to 1 collectors for S2VP@afrasyab-LAPTOP

3/26 20:54:23 (pid:4920) Sent ad to central manager for s2vp@afrasyab-LAPTOP

3/26 20:54:23 (pid:4920) Sent ad to 1 collectors for s2vp@afrasyab-LAPTOP

3/26 20:54:24 (pid:4920) DaemonCore: Command received via UDP from host <192.168.1.3:3283>

3/26 20:54:24 (pid:4920) DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())

3/26 20:54:24 (pid:4920) Shadow pid 124 for job 25.0 exited with status 4

3/26 20:54:24 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:54:26 (pid:4920) Starting add_shadow_birthdate(13.0)

3/26 20:54:26 (pid:4920) Started shadow for job 13.0 on "<192.168.1.2:2115>", (shadow pid = 3884)

3/26 20:54:26 (pid:4920) DaemonCore: Command received via UDP from host <192.168.1.3:3285>

3/26 20:54:26 (pid:4920) DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())

3/26 20:54:26 (pid:4920) Shadow pid 3884 for job 13.0 exited with status 4

3/26 20:54:26 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:54:28 (pid:4920) Starting add_shadow_birthdate(25.0)

3/26 20:54:28 (pid:4920) Started shadow for job 25.0 on "<192.168.1.3:2472>", (shadow pid = 3068)

3/26 20:54:28 (pid:4920) DaemonCore: Command received via UDP from host <192.168.1.3:3287>

3/26 20:54:28 (pid:4920) DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())

3/26 20:54:28 (pid:4920) Shadow pid 3068 for job 25.0 exited with status 4

3/26 20:54:28 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:54:30 (pid:4920) Starting add_shadow_birthdate(13.0)

3/26 20:54:31 (pid:4920) Started shadow for job 13.0 on "<192.168.1.2:2115>", (shadow pid = 1120)

3/26 20:54:31 (pid:4920) DaemonCore: Command received via UDP from host <192.168.1.3:3289>

3/26 20:54:31 (pid:4920) DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())

3/26 20:54:31 (pid:4920) Shadow pid 1120 for job 13.0 exited with status 4

3/26 20:54:31 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:54:33 (pid:4920) Starting add_shadow_birthdate(25.0)

3/26 20:54:34 (pid:4920) Started shadow for job 25.0 on "<192.168.1.3:2472>", (shadow pid = 5428)

3/26 20:54:34 (pid:4920) DaemonCore: Command received via UDP from host <192.168.1.3:3291>

3/26 20:54:34 (pid:4920) DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())

3/26 20:54:34 (pid:4920) Shadow pid 5428 for job 25.0 exited with status 4

3/26 20:54:34 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:54:36 (pid:4920) Starting add_shadow_birthdate(13.0)

3/26 20:54:36 (pid:4920) Started shadow for job 13.0 on "<192.168.1.2:2115>", (shadow pid = 4396)

3/26 20:54:37 (pid:4920) DaemonCore: Command received via UDP from host <192.168.1.3:3296>

3/26 20:54:37 (pid:4920) DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())

3/26 20:54:37 (pid:4920) Shadow pid 4396 for job 13.0 exited with status 4

3/26 20:54:37 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:54:38 (pid:4920) Starting add_shadow_birthdate(25.0)

3/26 20:54:38 (pid:4920) Started shadow for job 25.0 on "<192.168.1.3:2472>", (shadow pid = 5400)

3/26 20:54:39 (pid:4920) DaemonCore: Command received via UDP from host <192.168.1.3:3298>

3/26 20:54:39 (pid:4920) DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())

3/26 20:54:39 (pid:4920) Shadow pid 5400 for job 25.0 exited with status 4

3/26 20:54:39 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:54:40 (pid:4920) Starting add_shadow_birthdate(13.0)

3/26 20:54:40 (pid:4920) Started shadow for job 13.0 on "<192.168.1.2:2115>", (shadow pid = 4660)

3/26 20:54:41 (pid:4920) DaemonCore: Command received via UDP from host <192.168.1.3:3300>

3/26 20:54:41 (pid:4920) DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())

3/26 20:54:41 (pid:4920) Shadow pid 4660 for job 13.0 exited with status 4

3/26 20:54:41 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:54:42 (pid:4920) Starting add_shadow_birthdate(25.0)

3/26 20:54:42 (pid:4920) Started shadow for job 25.0 on "<192.168.1.3:2472>", (shadow pid = 4992)

3/26 20:54:43 (pid:4920) DaemonCore: Command received via UDP from host <192.168.1.3:3302>

3/26 20:54:43 (pid:4920) DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())

3/26 20:54:43 (pid:4920) Shadow pid 4992 for job 25.0 exited with status 4

3/26 20:54:43 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:54:43 (pid:4920) Match for cluster 25 has had 5 shadow exceptions, relinquishing.

3/26 20:54:43 (pid:4920) Sent RELEASE_CLAIM to startd on <192.168.1.3:2472>

3/26 20:54:43 (pid:4920) Match record (<192.168.1.3:2472>, 25, 0) deleted

3/26 20:54:43 (pid:4920) DaemonCore: Command received via TCP from host <192.168.1.3:3305>

3/26 20:54:43 (pid:4920) DaemonCore: received command 443 (VACATE_SERVICE), calling handler (vacate_service)

3/26 20:54:43 (pid:4920) Got VACATE_SERVICE from <192.168.1.3:3305>

3/26 20:54:44 (pid:4920) Starting add_shadow_birthdate(13.0)

3/26 20:54:44 (pid:4920) Started shadow for job 13.0 on "<192.168.1.2:2115>", (shadow pid = 3924)

3/26 20:54:44 (pid:4920) Sent ad to central manager for s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP

3/26 20:54:44 (pid:4920) Sent ad to 1 collectors for s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP

3/26 20:54:44 (pid:4920) Sent ad to central manager for S2VP@afrasyab-LAPTOP

3/26 20:54:44 (pid:4920) Sent ad to 1 collectors for S2VP@afrasyab-LAPTOP

3/26 20:54:44 (pid:4920) Sent ad to central manager for s2vp@afrasyab-LAPTOP

3/26 20:54:44 (pid:4920) Sent ad to 1 collectors for s2vp@afrasyab-LAPTOP

3/26 20:54:45 (pid:4920) DaemonCore: Command received via UDP from host <192.168.1.3:3314>

3/26 20:54:45 (pid:4920) DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())

3/26 20:54:45 (pid:4920) Shadow pid 3924 for job 13.0 exited with status 4

3/26 20:54:45 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:54:45 (pid:4920) Match for cluster 13 has had 5 shadow exceptions, relinquishing.

3/26 20:54:45 (pid:4920) Sent RELEASE_CLAIM to startd on <192.168.1.2:2115>

3/26 20:54:45 (pid:4920) Match record (<192.168.1.2:2115>, 13, 0) deleted

3/26 20:54:49 (pid:4920) DaemonCore: Command received via TCP from host <192.168.1.2:3748>

3/26 20:54:49 (pid:4920) DaemonCore: received command 443 (VACATE_SERVICE), calling handler (vacate_service)

3/26 20:54:49 (pid:4920) Got VACATE_SERVICE from <192.168.1.2:3748>

3/26 20:57:00 (pid:4920) ProcAPI sanity failure, user_time = -165

3/26 20:57:01 (pid:4920) ProcAPI sanity failure, age = -97025063

3/26 20:58:43 (pid:4920) Received HTTP POST connection from <192.168.1.3:3345>

3/26 20:58:43 (pid:4920) About to serve HTTP request...

3/26 20:58:44 (pid:4920) Completed servicing HTTP request

3/26 20:59:19 (pid:4920) Activity on stashed negotiator socket

3/26 20:59:19 (pid:4920) Negotiating for owner: s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP

3/26 20:59:19 (pid:4920) Checking consistency running and runnable jobs

3/26 20:59:19 (pid:4920) Tables are consistent

3/26 20:59:19 (pid:4920) Out of jobs - 1 jobs matched, 0 jobs idle, flock level = 0

3/26 20:59:19 (pid:4920) Sent ad to central manager for s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP

3/26 20:59:19 (pid:4920) Sent ad to 1 collectors for s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP

3/26 20:59:19 (pid:4920) Sent ad to central manager for S2VP@afrasyab-LAPTOP

3/26 20:59:19 (pid:4920) Sent ad to 1 collectors for S2VP@afrasyab-LAPTOP

3/26 20:59:19 (pid:4920) Sent ad to central manager for s2vp@afrasyab-LAPTOP

3/26 20:59:19 (pid:4920) Sent ad to 1 collectors for s2vp@afrasyab-LAPTOP

3/26 20:59:19 (pid:4920) Activity on stashed negotiator socket

3/26 20:59:19 (pid:4920) Negotiating for owner: s2vp@afrasyab-LAPTOP

3/26 20:59:19 (pid:4920) Checking consistency running and runnable jobs

3/26 20:59:19 (pid:4920) Tables are consistent

3/26 20:59:19 (pid:4920) Out of servers - 0 jobs matched, 2 jobs idle, 1 jobs rejected

3/26 20:59:19 (pid:4920) Activity on stashed negotiator socket

3/26 20:59:19 (pid:4920) Negotiating for owner: S2VP@afrasyab-LAPTOP

3/26 20:59:19 (pid:4920) Checking consistency running and runnable jobs

3/26 20:59:19 (pid:4920) Tables are consistent

3/26 20:59:19 (pid:4920) Out of servers - 1 jobs matched, 104 jobs idle, 3 jobs rejected

3/26 20:59:23 (pid:4920) Starting add_shadow_birthdate(25.0)

3/26 20:59:24 (pid:4920) Started shadow for job 25.0 on "<192.168.1.3:2472>", (shadow pid = 5908)

3/26 20:59:24 (pid:4920) Sent ad to central manager for s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP

3/26 20:59:24 (pid:4920) Sent ad to 1 collectors for s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP

3/26 20:59:24 (pid:4920) Sent ad to central manager for S2VP@afrasyab-LAPTOP

3/26 20:59:24 (pid:4920) Sent ad to 1 collectors for S2VP@afrasyab-LAPTOP

3/26 20:59:24 (pid:4920) Sent ad to central manager for s2vp@afrasyab-LAPTOP

3/26 20:59:24 (pid:4920) Sent ad to 1 collectors for s2vp@afrasyab-LAPTOP

3/26 20:59:25 (pid:4920) DaemonCore: Command received via UDP from host <192.168.1.3:3367>

3/26 20:59:25 (pid:4920) DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())

3/26 20:59:25 (pid:4920) Shadow pid 5908 for job 25.0 exited with status 4

3/26 20:59:25 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:59:27 (pid:4920) Starting add_shadow_birthdate(12.0)

3/26 20:59:27 (pid:4920) Started shadow for job 12.0 on "<192.168.1.2:2115>", (shadow pid = 5372)

3/26 20:59:27 (pid:4920) DaemonCore: Command received via UDP from host <192.168.1.3:3369>

3/26 20:59:27 (pid:4920) DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())

3/26 20:59:27 (pid:4920) Shadow pid 5372 for job 12.0 exited with status 4

3/26 20:59:27 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:59:29 (pid:4920) Starting add_shadow_birthdate(25.0)

3/26 20:59:30 (pid:4920) Started shadow for job 25.0 on "<192.168.1.3:2472>", (shadow pid = 4972)

3/26 20:59:30 (pid:4920) DaemonCore: Command received via UDP from host <192.168.1.3:3371>

3/26 20:59:30 (pid:4920) DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())

3/26 20:59:30 (pid:4920) Shadow pid 4972 for job 25.0 exited with status 4

3/26 20:59:30 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:59:32 (pid:4920) Starting add_shadow_birthdate(12.0)

3/26 20:59:32 (pid:4920) Started shadow for job 12.0 on "<192.168.1.2:2115>", (shadow pid = 3224)

3/26 20:59:33 (pid:4920) DaemonCore: Command received via UDP from host <192.168.1.3:3373>

3/26 20:59:33 (pid:4920) DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())

3/26 20:59:33 (pid:4920) Shadow pid 3224 for job 12.0 exited with status 4

3/26 20:59:33 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:59:34 (pid:4920) Starting add_shadow_birthdate(25.0)

3/26 20:59:34 (pid:4920) Started shadow for job 25.0 on "<192.168.1.3:2472>", (shadow pid = 3000)

3/26 20:59:34 (pid:4920) DaemonCore: Command received via UDP from host <192.168.1.3:3375>

3/26 20:59:34 (pid:4920) DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())

3/26 20:59:34 (pid:4920) Shadow pid 3000 for job 25.0 exited with status 4

3/26 20:59:34 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:59:36 (pid:4920) Starting add_shadow_birthdate(12.0)

3/26 20:59:36 (pid:4920) Started shadow for job 12.0 on "<192.168.1.2:2115>", (shadow pid = 5884)

3/26 20:59:37 (pid:4920) DaemonCore: Command received via UDP from host <192.168.1.3:3380>

3/26 20:59:37 (pid:4920) DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())

3/26 20:59:37 (pid:4920) Shadow pid 5884 for job 12.0 exited with status 4

3/26 20:59:37 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:59:38 (pid:4920) Starting add_shadow_birthdate(25.0)

3/26 20:59:38 (pid:4920) Started shadow for job 25.0 on "<192.168.1.3:2472>", (shadow pid = 652)

3/26 20:59:38 (pid:4920) Received HTTP POST connection from <192.168.1.3:3382>

3/26 20:59:38 (pid:4920) About to serve HTTP request...

3/26 20:59:39 (pid:4920) Completed servicing HTTP request

3/26 20:59:40 (pid:4920) Starting add_shadow_birthdate(12.0)

3/26 20:59:40 (pid:4920) Started shadow for job 12.0 on "<192.168.1.2:2115>", (shadow pid = 4500)

3/26 20:59:40 (pid:4920) Sent ad to central manager for s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP

3/26 20:59:40 (pid:4920) Sent ad to 1 collectors for s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP

3/26 20:59:40 (pid:4920) Sent ad to central manager for S2VP@afrasyab-LAPTOP

3/26 20:59:40 (pid:4920) Sent ad to 1 collectors for S2VP@afrasyab-LAPTOP

3/26 20:59:40 (pid:4920) Sent ad to central manager for s2vp@afrasyab-LAPTOP

3/26 20:59:40 (pid:4920) Sent ad to 1 collectors for s2vp@afrasyab-LAPTOP

3/26 20:59:40 (pid:4920) DaemonCore: Command received via UDP from host <192.168.1.3:3388>

3/26 20:59:40 (pid:4920) DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())

3/26 20:59:40 (pid:4920) Shadow pid 652 for job 25.0 exited with status 4

3/26 20:59:40 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:59:41 (pid:4920) DaemonCore: Command received via UDP from host <192.168.1.3:3389>

3/26 20:59:41 (pid:4920) DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())

3/26 20:59:41 (pid:4920) Shadow pid 4500 for job 12.0 exited with status 4

3/26 20:59:41 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:59:43 (pid:4920) Starting add_shadow_birthdate(25.0)

3/26 20:59:43 (pid:4920) Started shadow for job 25.0 on "<192.168.1.3:2472>", (shadow pid = 4300)

3/26 20:59:43 (pid:4920) DaemonCore: Command received via UDP from host <192.168.1.3:3391>

3/26 20:59:43 (pid:4920) DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())

3/26 20:59:43 (pid:4920) Shadow pid 4300 for job 25.0 exited with status 4

3/26 20:59:43 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:59:43 (pid:4920) Match for cluster 25 has had 5 shadow exceptions, relinquishing.

3/26 20:59:43 (pid:4920) Sent RELEASE_CLAIM to startd on <192.168.1.3:2472>

3/26 20:59:43 (pid:4920) Match record (<192.168.1.3:2472>, 25, 0) deleted

3/26 20:59:43 (pid:4920) DaemonCore: Command received via TCP from host <192.168.1.3:3394>

3/26 20:59:43 (pid:4920) DaemonCore: received command 443 (VACATE_SERVICE), calling handler (vacate_service)

3/26 20:59:43 (pid:4920) Got VACATE_SERVICE from <192.168.1.3:3394>

3/26 20:59:45 (pid:4920) Starting add_shadow_birthdate(12.0)

3/26 20:59:46 (pid:4920) Started shadow for job 12.0 on "<192.168.1.2:2115>", (shadow pid = 4992)

3/26 20:59:46 (pid:4920) Sent ad to central manager for s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP

3/26 20:59:46 (pid:4920) Sent ad to 1 collectors for s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP

3/26 20:59:46 (pid:4920) Sent ad to central manager for S2VP@afrasyab-LAPTOP

3/26 20:59:46 (pid:4920) Sent ad to 1 collectors for S2VP@afrasyab-LAPTOP

3/26 20:59:46 (pid:4920) Sent ad to central manager for s2vp@afrasyab-LAPTOP

3/26 20:59:46 (pid:4920) Sent ad to 1 collectors for s2vp@afrasyab-LAPTOP

3/26 20:59:46 (pid:4920) DaemonCore: Command received via UDP from host <192.168.1.3:3402>

3/26 20:59:46 (pid:4920) DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())

3/26 20:59:46 (pid:4920) Shadow pid 4992 for job 12.0 exited with status 4

3/26 20:59:46 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:59:46 (pid:4920) Match for cluster 12 has had 5 shadow exceptions, relinquishing.

3/26 20:59:46 (pid:4920) Sent RELEASE_CLAIM to startd on <192.168.1.2:2115>

3/26 20:59:46 (pid:4920) Match record (<192.168.1.2:2115>, 12, 0) deleted

3/26 20:59:51 (pid:4920) DaemonCore: Command received via TCP from host <192.168.1.2:3761>

3/26 20:59:51 (pid:4920) DaemonCore: received command 443 (VACATE_SERVICE), calling handler (vacate_service)

3/26 20:59:51 (pid:4920) Got VACATE_SERVICE from <192.168.1.2:3761>

3/26 21:01:01 (pid:4920) ProcAPI sanity failure, user_time = -164

3/26 21:01:01 (pid:4920) ProcAPI sanity failure, age = -97024822
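
To make the repeating pattern in the log above easier to see, here is a minimal Python sketch that tallies the "Shadow pid ... exited with status ..." lines per job. It is only a reading aid for a log in this format: the function name tally_shadow_exits is made up for illustration and is not part of Condor or Birdbath, and the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches log lines such as:
#   3/26 20:54:24 (pid:4920) Shadow pid 124 for job 25.0 exited with status 4
SHADOW_EXIT = re.compile(
    r"Shadow pid (\d+) for job (\d+)\.(\d+) exited with status (\d+)"
)


def tally_shadow_exits(lines):
    """Count shadow exits per (job id, exit status) pair.

    Hypothetical helper for illustration; it only parses text and does
    not talk to any Condor daemon.
    """
    counts = Counter()
    for line in lines:
        match = SHADOW_EXIT.search(line)
        if match:
            job_id = f"{match.group(2)}.{match.group(3)}"  # cluster.proc
            status = int(match.group(4))
            counts[(job_id, status)] += 1
    return counts


# A few lines copied from the log above.
sample = [
    "3/26 20:54:24 (pid:4920) Shadow pid 124 for job 25.0 exited with status 4",
    "3/26 20:54:24 (pid:4920) ERROR: Shadow exited with job exception code!",
    "3/26 20:54:26 (pid:4920) Shadow pid 3884 for job 13.0 exited with status 4",
    "3/26 20:54:28 (pid:4920) Shadow pid 3068 for job 25.0 exited with status 4",
]

counts = tally_shadow_exits(sample)
for (job_id, status), n in sorted(counts.items()):
    print(f"job {job_id}: {n} shadow exit(s) with status {status}")
```

Run against the full log file (for example by passing an open file object instead of the sample list), this shows at a glance which jobs keep cycling through shadow exceptions before the match is relinquished.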