
[Condor-users] Strange claimed problems [SEC=UNCLASSIFIED]




UNCLASSIFIED

Hi All,

I've been having some very strange problems with non-existent jobs claiming processors. Here's what has happened:

1) I submitted a batch of jobs (around 10,000). Soon after, I realized I'd made a mistake with the executable, so I removed them with condor_rm. This cleared the jobs that had not started yet, but the ones that were running at the time were only marked as removed and stayed in the queue in that state (marked "X").

2) I fixed the executable and resubmitted the jobs. They were running rather slowly, so I checked condor_status and saw that a lot of the claimed nodes were idle, which often happens here anyway for reasons I haven't worked out.

3) I let all the resubmitted jobs finish running. About a day later, the first set of removed jobs was still showing up in the queue, so I ran condor_rm -all -forcex to really get rid of them. This worked, and condor_q is now empty.

4) I went back to check condor_status. A majority of processors are -still- sitting in Claimed/Idle, even though there are no jobs in the queue! condor_status -claimed shows that they are all claimed by me, which suggests they are still held by the first batch of jobs that should have been removed.

Note that no other jobs have been submitted/deleted other than these during this time.
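
For anyone who wants to check for the same symptom, here is a minimal sketch of how the stuck slots from step 4 could be listed. It assumes the Condor command-line tools are on the PATH; the function name claimed_idle_slots is made up for illustration, and the RemoteUser attribute is assumed to be present in the ads of claimed slots.

```shell
#!/bin/sh
# Sketch only: list slots whose state is Claimed but whose activity is Idle,
# together with the user holding the claim.
# claimed_idle_slots is a hypothetical helper name, not part of Condor.
claimed_idle_slots() {
    # Bail out early if the Condor tools are not installed on this machine.
    if ! command -v condor_status >/dev/null 2>&1; then
        echo "condor_status not found on PATH" >&2
        return 127
    fi
    # -constraint filters machine ads with a ClassAd expression;
    # -format prints the selected attributes, one slot per line.
    condor_status \
        -constraint 'State == "Claimed" && Activity == "Idle"' \
        -format "%s " Name \
        -format "%s\n" RemoteUser
}
```

Calling claimed_idle_slots while condor_q is empty should print nothing on a healthy pool; on mine it would presumably list most of the processors with my user name next to them.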

Has anyone seen this odd behaviour before, and is there any way of fixing it short of restarting the Condor master?

Thank you.
