
Re: [Condor-users] Claimed and Idle


What version of Condor are you running?


Henning Fehrmann wrote:

We had approximately 65,000 jobs in the queue, and the schedd was extremely busy.
We put 60,000 of them on hold to relieve the situation a bit.

Afterwards, we noticed that almost all the slots went into the 'Claimed Idle' state. Running condor_off and then condor_on for a node put its slots back into the 'Unclaimed Idle' state, but after a few minutes we found the slots in the undesired 'Claimed Idle' state again.

Here is a part of the StartLog of a particular node:
11/19 15:47:37 slot2: Got activate_claim request from shadow (<>)
11/19 15:47:37 slot2: Remote job ID is 6373407.0
11/19 15:47:37 slot2: Got universe "STANDARD" (1) from request classad
11/19 15:47:37 slot2: State change: claim-activation protocol successful
11/19 15:47:37 slot2: Changing activity: Idle -> Busy
11/19 15:48:18 slot2: Called deactivate_claim_forcibly()
11/19 15:48:18 condor_write(): Socket closed when trying to write 56 bytes to <>, fd is 5
11/19 15:48:18 Buf::write(): condor_write() failed
11/19 15:48:18 Starter pid 880 exited with status 0
11/19 15:48:18 slot2: State change: starter exited
11/19 15:48:18 slot2: Changing activity: Busy -> Idle
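
For anyone who wants to check how widespread this pattern is across nodes, here is a rough Python sketch that scans StartLog text for claims that were activated and then forcibly deactivated shortly afterwards. It assumes the line format shown in the excerpt above (timestamp, slot name, message); the function name short_activations and the 60-second threshold are made up for illustration and are not part of Condor. Since the log carries no year, the sketch pins one arbitrarily, so intervals spanning a year boundary are ignored.

```python
import re
from datetime import datetime

# Matches lines such as "11/19 15:47:37 slot2: Changing activity: Idle -> Busy".
# Lines without a "slotN:" prefix (e.g. condor_write() errors) are skipped.
LINE_RE = re.compile(r"^(\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (slot\d+): (.*)$")


def short_activations(log_text, max_seconds=60):
    """Return (slot, job_id, seconds) for each claim that went Idle -> Busy
    and was forcibly deactivated within max_seconds (hypothetical helper)."""
    started = {}   # slot -> time the activity changed to Busy
    job_ids = {}   # slot -> most recent "Remote job ID" seen for that slot
    results = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        stamp, slot, msg = match.groups()
        # The log has no year; 2000 is an arbitrary leap year so 02/29 parses.
        when = datetime.strptime("2000/" + stamp, "%Y/%m/%d %H:%M:%S")
        if msg.startswith("Remote job ID is "):
            job_ids[slot] = msg[len("Remote job ID is "):]
        elif msg == "Changing activity: Idle -> Busy":
            started[slot] = when
        elif msg.startswith("Called deactivate_claim_forcibly") and slot in started:
            elapsed = (when - started.pop(slot)).total_seconds()
            if 0 <= elapsed <= max_seconds:
                results.append((slot, job_ids.get(slot), elapsed))
    return results
```

Applied to the excerpt above, this reports slot2 with job 6373407.0 running for 41 seconds before the forced deactivation.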

Restarting Condor on the schedd host solved the problem.
Does anybody have a clue what happened?

Thank you,
Henning Fehrmann