Re: [Condor-users] [Birdbath Related] Can't remove Jobs and Clusters



You need to call CloseSpool() on the jobs before they can be removed. If a job has completed, calling CloseSpool() tells the Schedd that the job can be collected, so you won't even have to call RemoveJob().
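
To make the ordering concrete, here is a minimal sketch in Python. It does not use the real SOAP stubs: FakeSchedd is a hypothetical in-memory stand-in, and close_spool, remove_job and clear_queue are illustrative names that only model the behaviour described above (removal is refused while the spool is open, and a completed job is collected as soon as its spool is closed).

```python
# Hypothetical stand-in for the schedd; NOT the real Birdbath stubs.
# It only models the ordering rule: close the spool first, then remove.
IDLE = 1       # Condor JobStatus value for an idle job
COMPLETED = 4  # Condor JobStatus value for a completed job


class FakeSchedd:
    def __init__(self, jobs):
        # jobs maps (cluster, proc) -> JobStatus
        self.jobs = dict(jobs)
        self.spool_closed = set()

    def close_spool(self, cluster, proc):
        key = (cluster, proc)
        self.spool_closed.add(key)
        if self.jobs.get(key) == COMPLETED:
            # A completed job with a closed spool can be collected.
            del self.jobs[key]

    def remove_job(self, cluster, proc):
        key = (cluster, proc)
        if key not in self.spool_closed:
            raise RuntimeError("spool still open for %d.%d" % key)
        self.jobs.pop(key, None)


def clear_queue(schedd, job_ids):
    """Close the spool for every job, then remove whatever is still queued."""
    for cluster, proc in job_ids:
        schedd.close_spool(cluster, proc)
        if (cluster, proc) in schedd.jobs:
            # Not completed, so it was not collected; remove it explicitly.
            schedd.remove_job(cluster, proc)


if __name__ == "__main__":
    schedd = FakeSchedd({(25, 0): IDLE, (13, 0): COMPLETED})
    clear_queue(schedd, [(25, 0), (13, 0)])
    print(schedd.jobs)
```

With the real client the calls go through the generated stubs, but the order should be the same: close the spool first, then remove only the jobs that are still in the queue.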

As for the Status being null, that shouldn't happen.


matt

On Mar 26, 2006, at 2:15 PM, Afrasyab Bashir wrote:

Hi Matt,

I've accumulated many jobs with JobStatus = 1 in the queue. Now I'm trying
to kill all these jobs (or just remove them) using the removeJob and/or
removeCluster functions, without success. The condor.Status returned is a null object every time. Could you please have a look at the log and advise? A few things that I have noticed in the log are listed below. Sorry if you
find this very basic, but I can't understand it :(

a) ProcAPI sanity failure
b) command 60011 (DC_NOP), calling handler (handle_nop())
c) VACATE_SERVICE
d) RELEASE_CLAIM

Cheers
Afras

Log

3/26 20:53:00 (pid:4920) ProcAPI sanity failure, user_time = -167

3/26 20:53:00 (pid:4920) ProcAPI sanity failure, age = -97025303

3/26 20:54:18 (pid:4920) Activity on stashed negotiator socket

3/26 20:54:18 (pid:4920) Negotiating for owner: s2vp@afrasyab-LAPTOP

3/26 20:54:18 (pid:4920) Checking consistency running and runnable jobs

3/26 20:54:18 (pid:4920) Tables are consistent

3/26 20:54:18 (pid:4920) Out of servers - 1 jobs matched, 1 jobs idle, 1
jobs rejected

3/26 20:54:18 (pid:4920) Activity on stashed negotiator socket

3/26 20:54:18 (pid:4920) Negotiating for owner: S2VP@afrasyab-LAPTOP

3/26 20:54:18 (pid:4920) Checking consistency running and runnable jobs

3/26 20:54:18 (pid:4920) Tables are consistent

3/26 20:54:18 (pid:4920) Out of servers - 1 jobs matched, 104 jobs idle, 3
jobs rejected

3/26 20:54:22 (pid:4920) Starting add_shadow_birthdate(25.0)

3/26 20:54:22 (pid:4920) Started shadow for job 25.0 on
"<192.168.1.3:2472>", (shadow pid = 124)

3/26 20:54:23 (pid:4920) Sent ad to central manager for
s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP

3/26 20:54:23 (pid:4920) Sent ad to 1 collectors for
s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP

3/26 20:54:23 (pid:4920) Sent ad to central manager for S2VP@afrasyab-LAPTOP

3/26 20:54:23 (pid:4920) Sent ad to 1 collectors for S2VP@afrasyab-LAPTOP

3/26 20:54:23 (pid:4920) Sent ad to central manager for s2vp@afrasyab-LAPTOP

3/26 20:54:23 (pid:4920) Sent ad to 1 collectors for s2vp@afrasyab-LAPTOP

3/26 20:54:24 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3283>

3/26 20:54:24 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())

3/26 20:54:24 (pid:4920) Shadow pid 124 for job 25.0 exited with status 4

3/26 20:54:24 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:54:26 (pid:4920) Starting add_shadow_birthdate(13.0)

3/26 20:54:26 (pid:4920) Started shadow for job 13.0 on
"<192.168.1.2:2115>", (shadow pid = 3884)

3/26 20:54:26 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3285>

3/26 20:54:26 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())

3/26 20:54:26 (pid:4920) Shadow pid 3884 for job 13.0 exited with status 4

3/26 20:54:26 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:54:28 (pid:4920) Starting add_shadow_birthdate(25.0)

3/26 20:54:28 (pid:4920) Started shadow for job 25.0 on
"<192.168.1.3:2472>", (shadow pid = 3068)

3/26 20:54:28 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3287>

3/26 20:54:28 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())

3/26 20:54:28 (pid:4920) Shadow pid 3068 for job 25.0 exited with status 4

3/26 20:54:28 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:54:30 (pid:4920) Starting add_shadow_birthdate(13.0)

3/26 20:54:31 (pid:4920) Started shadow for job 13.0 on
"<192.168.1.2:2115>", (shadow pid = 1120)

3/26 20:54:31 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3289>

3/26 20:54:31 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())

3/26 20:54:31 (pid:4920) Shadow pid 1120 for job 13.0 exited with status 4

3/26 20:54:31 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:54:33 (pid:4920) Starting add_shadow_birthdate(25.0)

3/26 20:54:34 (pid:4920) Started shadow for job 25.0 on
"<192.168.1.3:2472>", (shadow pid = 5428)

3/26 20:54:34 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3291>

3/26 20:54:34 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())

3/26 20:54:34 (pid:4920) Shadow pid 5428 for job 25.0 exited with status 4

3/26 20:54:34 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:54:36 (pid:4920) Starting add_shadow_birthdate(13.0)

3/26 20:54:36 (pid:4920) Started shadow for job 13.0 on
"<192.168.1.2:2115>", (shadow pid = 4396)

3/26 20:54:37 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3296>

3/26 20:54:37 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())

3/26 20:54:37 (pid:4920) Shadow pid 4396 for job 13.0 exited with status 4

3/26 20:54:37 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:54:38 (pid:4920) Starting add_shadow_birthdate(25.0)

3/26 20:54:38 (pid:4920) Started shadow for job 25.0 on
"<192.168.1.3:2472>", (shadow pid = 5400)

3/26 20:54:39 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3298>

3/26 20:54:39 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())

3/26 20:54:39 (pid:4920) Shadow pid 5400 for job 25.0 exited with status 4

3/26 20:54:39 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:54:40 (pid:4920) Starting add_shadow_birthdate(13.0)

3/26 20:54:40 (pid:4920) Started shadow for job 13.0 on
"<192.168.1.2:2115>", (shadow pid = 4660)

3/26 20:54:41 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3300>

3/26 20:54:41 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())

3/26 20:54:41 (pid:4920) Shadow pid 4660 for job 13.0 exited with status 4

3/26 20:54:41 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:54:42 (pid:4920) Starting add_shadow_birthdate(25.0)

3/26 20:54:42 (pid:4920) Started shadow for job 25.0 on
"<192.168.1.3:2472>", (shadow pid = 4992)

3/26 20:54:43 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3302>

3/26 20:54:43 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())

3/26 20:54:43 (pid:4920) Shadow pid 4992 for job 25.0 exited with status 4

3/26 20:54:43 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:54:43 (pid:4920) Match for cluster 25 has had 5 shadow exceptions,
relinquishing.

3/26 20:54:43 (pid:4920) Sent RELEASE_CLAIM to startd on <192.168.1.3:2472>

3/26 20:54:43 (pid:4920) Match record (<192.168.1.3:2472>, 25, 0) deleted

3/26 20:54:43 (pid:4920) DaemonCore: Command received via TCP from host
<192.168.1.3:3305>

3/26 20:54:43 (pid:4920) DaemonCore: received command 443 (VACATE_SERVICE),
calling handler (vacate_service)

3/26 20:54:43 (pid:4920) Got VACATE_SERVICE from <192.168.1.3:3305>

3/26 20:54:44 (pid:4920) Starting add_shadow_birthdate(13.0)

3/26 20:54:44 (pid:4920) Started shadow for job 13.0 on
"<192.168.1.2:2115>", (shadow pid = 3924)

3/26 20:54:44 (pid:4920) Sent ad to central manager for
s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP

3/26 20:54:44 (pid:4920) Sent ad to 1 collectors for
s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP

3/26 20:54:44 (pid:4920) Sent ad to central manager for S2VP@afrasyab-LAPTOP

3/26 20:54:44 (pid:4920) Sent ad to 1 collectors for S2VP@afrasyab-LAPTOP

3/26 20:54:44 (pid:4920) Sent ad to central manager for s2vp@afrasyab-LAPTOP

3/26 20:54:44 (pid:4920) Sent ad to 1 collectors for s2vp@afrasyab-LAPTOP

3/26 20:54:45 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3314>

3/26 20:54:45 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())

3/26 20:54:45 (pid:4920) Shadow pid 3924 for job 13.0 exited with status 4

3/26 20:54:45 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:54:45 (pid:4920) Match for cluster 13 has had 5 shadow exceptions,
relinquishing.

3/26 20:54:45 (pid:4920) Sent RELEASE_CLAIM to startd on <192.168.1.2:2115>

3/26 20:54:45 (pid:4920) Match record (<192.168.1.2:2115>, 13, 0) deleted

3/26 20:54:49 (pid:4920) DaemonCore: Command received via TCP from host
<192.168.1.2:3748>

3/26 20:54:49 (pid:4920) DaemonCore: received command 443 (VACATE_SERVICE),
calling handler (vacate_service)

3/26 20:54:49 (pid:4920) Got VACATE_SERVICE from <192.168.1.2:3748>

3/26 20:57:00 (pid:4920) ProcAPI sanity failure, user_time = -165

3/26 20:57:01 (pid:4920) ProcAPI sanity failure, age = -97025063

3/26 20:58:43 (pid:4920) Received HTTP POST connection from
<192.168.1.3:3345>

3/26 20:58:43 (pid:4920) About to serve HTTP request...

3/26 20:58:44 (pid:4920) Completed servicing HTTP request

3/26 20:59:19 (pid:4920) Activity on stashed negotiator socket

3/26 20:59:19 (pid:4920) Negotiating for owner:
s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP

3/26 20:59:19 (pid:4920) Checking consistency running and runnable jobs

3/26 20:59:19 (pid:4920) Tables are consistent

3/26 20:59:19 (pid:4920) Out of jobs - 1 jobs matched, 0 jobs idle, flock
level = 0

3/26 20:59:19 (pid:4920) Sent ad to central manager for
s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP

3/26 20:59:19 (pid:4920) Sent ad to 1 collectors for
s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP

3/26 20:59:19 (pid:4920) Sent ad to central manager for S2VP@afrasyab-LAPTOP

3/26 20:59:19 (pid:4920) Sent ad to 1 collectors for S2VP@afrasyab-LAPTOP

3/26 20:59:19 (pid:4920) Sent ad to central manager for s2vp@afrasyab-LAPTOP

3/26 20:59:19 (pid:4920) Sent ad to 1 collectors for s2vp@afrasyab-LAPTOP

3/26 20:59:19 (pid:4920) Activity on stashed negotiator socket

3/26 20:59:19 (pid:4920) Negotiating for owner: s2vp@afrasyab-LAPTOP

3/26 20:59:19 (pid:4920) Checking consistency running and runnable jobs

3/26 20:59:19 (pid:4920) Tables are consistent

3/26 20:59:19 (pid:4920) Out of servers - 0 jobs matched, 2 jobs idle, 1
jobs rejected

3/26 20:59:19 (pid:4920) Activity on stashed negotiator socket

3/26 20:59:19 (pid:4920) Negotiating for owner: S2VP@afrasyab-LAPTOP

3/26 20:59:19 (pid:4920) Checking consistency running and runnable jobs

3/26 20:59:19 (pid:4920) Tables are consistent

3/26 20:59:19 (pid:4920) Out of servers - 1 jobs matched, 104 jobs idle, 3
jobs rejected

3/26 20:59:23 (pid:4920) Starting add_shadow_birthdate(25.0)

3/26 20:59:24 (pid:4920) Started shadow for job 25.0 on
"<192.168.1.3:2472>", (shadow pid = 5908)

3/26 20:59:24 (pid:4920) Sent ad to central manager for
s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP

3/26 20:59:24 (pid:4920) Sent ad to 1 collectors for
s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP

3/26 20:59:24 (pid:4920) Sent ad to central manager for S2VP@afrasyab-LAPTOP

3/26 20:59:24 (pid:4920) Sent ad to 1 collectors for S2VP@afrasyab-LAPTOP

3/26 20:59:24 (pid:4920) Sent ad to central manager for s2vp@afrasyab-LAPTOP

3/26 20:59:24 (pid:4920) Sent ad to 1 collectors for s2vp@afrasyab-LAPTOP

3/26 20:59:25 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3367>

3/26 20:59:25 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())

3/26 20:59:25 (pid:4920) Shadow pid 5908 for job 25.0 exited with status 4

3/26 20:59:25 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:59:27 (pid:4920) Starting add_shadow_birthdate(12.0)

3/26 20:59:27 (pid:4920) Started shadow for job 12.0 on
"<192.168.1.2:2115>", (shadow pid = 5372)

3/26 20:59:27 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3369>

3/26 20:59:27 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())

3/26 20:59:27 (pid:4920) Shadow pid 5372 for job 12.0 exited with status 4

3/26 20:59:27 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:59:29 (pid:4920) Starting add_shadow_birthdate(25.0)

3/26 20:59:30 (pid:4920) Started shadow for job 25.0 on
"<192.168.1.3:2472>", (shadow pid = 4972)

3/26 20:59:30 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3371>

3/26 20:59:30 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())

3/26 20:59:30 (pid:4920) Shadow pid 4972 for job 25.0 exited with status 4

3/26 20:59:30 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:59:32 (pid:4920) Starting add_shadow_birthdate(12.0)

3/26 20:59:32 (pid:4920) Started shadow for job 12.0 on
"<192.168.1.2:2115>", (shadow pid = 3224)

3/26 20:59:33 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3373>

3/26 20:59:33 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())

3/26 20:59:33 (pid:4920) Shadow pid 3224 for job 12.0 exited with status 4

3/26 20:59:33 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:59:34 (pid:4920) Starting add_shadow_birthdate(25.0)

3/26 20:59:34 (pid:4920) Started shadow for job 25.0 on
"<192.168.1.3:2472>", (shadow pid = 3000)

3/26 20:59:34 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3375>

3/26 20:59:34 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())

3/26 20:59:34 (pid:4920) Shadow pid 3000 for job 25.0 exited with status 4

3/26 20:59:34 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:59:36 (pid:4920) Starting add_shadow_birthdate(12.0)

3/26 20:59:36 (pid:4920) Started shadow for job 12.0 on
"<192.168.1.2:2115>", (shadow pid = 5884)

3/26 20:59:37 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3380>

3/26 20:59:37 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())

3/26 20:59:37 (pid:4920) Shadow pid 5884 for job 12.0 exited with status 4

3/26 20:59:37 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:59:38 (pid:4920) Starting add_shadow_birthdate(25.0)

3/26 20:59:38 (pid:4920) Started shadow for job 25.0 on
"<192.168.1.3:2472>", (shadow pid = 652)

3/26 20:59:38 (pid:4920) Received HTTP POST connection from
<192.168.1.3:3382>

3/26 20:59:38 (pid:4920) About to serve HTTP request...

3/26 20:59:39 (pid:4920) Completed servicing HTTP request

3/26 20:59:40 (pid:4920) Starting add_shadow_birthdate(12.0)

3/26 20:59:40 (pid:4920) Started shadow for job 12.0 on
"<192.168.1.2:2115>", (shadow pid = 4500)

3/26 20:59:40 (pid:4920) Sent ad to central manager for
s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP

3/26 20:59:40 (pid:4920) Sent ad to 1 collectors for
s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP

3/26 20:59:40 (pid:4920) Sent ad to central manager for S2VP@afrasyab-LAPTOP

3/26 20:59:40 (pid:4920) Sent ad to 1 collectors for S2VP@afrasyab-LAPTOP

3/26 20:59:40 (pid:4920) Sent ad to central manager for s2vp@afrasyab-LAPTOP

3/26 20:59:40 (pid:4920) Sent ad to 1 collectors for s2vp@afrasyab-LAPTOP

3/26 20:59:40 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3388>

3/26 20:59:40 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())

3/26 20:59:40 (pid:4920) Shadow pid 652 for job 25.0 exited with status 4

3/26 20:59:40 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:59:41 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3389>

3/26 20:59:41 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())

3/26 20:59:41 (pid:4920) Shadow pid 4500 for job 12.0 exited with status 4

3/26 20:59:41 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:59:43 (pid:4920) Starting add_shadow_birthdate(25.0)

3/26 20:59:43 (pid:4920) Started shadow for job 25.0 on
"<192.168.1.3:2472>", (shadow pid = 4300)

3/26 20:59:43 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3391>

3/26 20:59:43 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())

3/26 20:59:43 (pid:4920) Shadow pid 4300 for job 25.0 exited with status 4

3/26 20:59:43 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:59:43 (pid:4920) Match for cluster 25 has had 5 shadow exceptions,
relinquishing.

3/26 20:59:43 (pid:4920) Sent RELEASE_CLAIM to startd on <192.168.1.3:2472>

3/26 20:59:43 (pid:4920) Match record (<192.168.1.3:2472>, 25, 0) deleted

3/26 20:59:43 (pid:4920) DaemonCore: Command received via TCP from host
<192.168.1.3:3394>

3/26 20:59:43 (pid:4920) DaemonCore: received command 443 (VACATE_SERVICE),
calling handler (vacate_service)

3/26 20:59:43 (pid:4920) Got VACATE_SERVICE from <192.168.1.3:3394>

3/26 20:59:45 (pid:4920) Starting add_shadow_birthdate(12.0)

3/26 20:59:46 (pid:4920) Started shadow for job 12.0 on
"<192.168.1.2:2115>", (shadow pid = 4992)

3/26 20:59:46 (pid:4920) Sent ad to central manager for
s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP

3/26 20:59:46 (pid:4920) Sent ad to 1 collectors for
s2vp@afrasyab-LAPTOP@afrasyab-LAPTOP

3/26 20:59:46 (pid:4920) Sent ad to central manager for S2VP@afrasyab-LAPTOP

3/26 20:59:46 (pid:4920) Sent ad to 1 collectors for S2VP@afrasyab-LAPTOP

3/26 20:59:46 (pid:4920) Sent ad to central manager for s2vp@afrasyab-LAPTOP

3/26 20:59:46 (pid:4920) Sent ad to 1 collectors for s2vp@afrasyab-LAPTOP

3/26 20:59:46 (pid:4920) DaemonCore: Command received via UDP from host
<192.168.1.3:3402>

3/26 20:59:46 (pid:4920) DaemonCore: received command 60011 (DC_NOP),
calling handler (handle_nop())

3/26 20:59:46 (pid:4920) Shadow pid 4992 for job 12.0 exited with status 4

3/26 20:59:46 (pid:4920) ERROR: Shadow exited with job exception code!

3/26 20:59:46 (pid:4920) Match for cluster 12 has had 5 shadow exceptions,
relinquishing.

3/26 20:59:46 (pid:4920) Sent RELEASE_CLAIM to startd on <192.168.1.2:2115>

3/26 20:59:46 (pid:4920) Match record (<192.168.1.2:2115>, 12, 0) deleted

3/26 20:59:51 (pid:4920) DaemonCore: Command received via TCP from host
<192.168.1.2:3761>

3/26 20:59:51 (pid:4920) DaemonCore: received command 443 (VACATE_SERVICE),
calling handler (vacate_service)

3/26 20:59:51 (pid:4920) Got VACATE_SERVICE from <192.168.1.2:3761>

3/26 21:01:01 (pid:4920) ProcAPI sanity failure, user_time = -164

3/26 21:01:01 (pid:4920) ProcAPI sanity failure, age = -97024822
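
A log like the one above is easier to read once the repeated shadow exits are tallied per job. The following is a small sketch, assuming the "Shadow pid ... for job ... exited with status ..." line format shown in this log; the function name count_shadow_exits is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   "Shadow pid 124 for job 25.0 exited with status 4"
SHADOW_EXIT = re.compile(
    r"Shadow pid \d+ for job (\d+\.\d+) exited with status (\d+)"
)


def count_shadow_exits(log_text):
    """Return a Counter mapping (job id, exit status) to occurrences."""
    counts = Counter()
    for match in SHADOW_EXIT.finditer(log_text):
        job_id = match.group(1)
        status = int(match.group(2))
        counts[(job_id, status)] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "3/26 20:54:24 (pid:4920) Shadow pid 124 for job 25.0 exited with status 4\n"
        "3/26 20:54:26 (pid:4920) Shadow pid 3884 for job 13.0 exited with status 4\n"
        "3/26 20:54:28 (pid:4920) Shadow pid 3068 for job 25.0 exited with status 4\n"
    )
    for (job_id, status), n in sorted(count_shadow_exits(sample).items()):
        print(job_id, status, n)
```

Run against the full log text, this shows which jobs keep hitting the same shadow exit status before the match is relinquished.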

_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users