
[Condor-users] Claimed and Idle



Hello, 

We had approximately 65000 jobs in the queue, and the schedd was extremely busy.
We put 60000 of them on hold to relieve the load a bit.

Afterwards, we noticed that almost all the slots had gone into 
the 'Claimed Idle' state.
Running condor_off and then condor_on on a node put its slots back into the 
'Unclaimed Idle' state, but after a few minutes we found the slots in the 
undesired 'Claimed Idle' state again.

Here is part of the StartLog from one particular node: 

11/19 15:47:37 slot2: Got activate_claim request from shadow (<10.20.30.1:39113>)
11/19 15:47:37 slot2: Remote job ID is 6373407.0
11/19 15:47:37 slot2: Got universe "STANDARD" (1) from request classad
11/19 15:47:37 slot2: State change: claim-activation protocol successful
11/19 15:47:37 slot2: Changing activity: Idle -> Busy
11/19 15:48:18 slot2: Called deactivate_claim_forcibly()
11/19 15:48:18 condor_write(): Socket closed when trying to write 56 bytes to <10.20.30.1:54703>, fd is 5
11/19 15:48:18 Buf::write(): condor_write() failed
11/19 15:48:18 Starter pid 880 exited with status 0
11/19 15:48:18 slot2: State change: starter exited
11/19 15:48:18 slot2: Changing activity: Busy -> Idle
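
To check whether other slots show the same short activation followed by a forced deactivation, the activity changes in a StartLog can be paired up and timed. The following is only a rough sketch in Python, not a Condor tool: the function name busy_intervals, the regular expression, and the placeholder year are assumptions made for illustration, and the sample lines are copied from the excerpt above.

```python
import re
from datetime import datetime

# Matches lines such as "11/19 15:47:37 slot2: Changing activity: Idle -> Busy"
ACTIVITY_RE = re.compile(
    r"^(\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (slot\d+): Changing activity: (\w+) -> (\w+)"
)


def busy_intervals(lines, year=2000):
    """Return (slot, seconds) for every completed Busy period in StartLog lines.

    The log timestamps carry no year, so `year` is an arbitrary placeholder
    (hypothetical parameter); periods that span New Year would come out wrong.
    """
    started = {}   # slot name -> time it last became Busy
    result = []
    for line in lines:
        m = ACTIVITY_RE.match(line.strip())
        if not m:
            continue  # ignore everything except activity changes
        stamp, slot, old, new = m.groups()
        when = datetime.strptime(f"{year}/{stamp}", "%Y/%m/%d %H:%M:%S")
        if new == "Busy":
            started[slot] = when
        elif old == "Busy" and slot in started:
            result.append((slot, (when - started.pop(slot)).total_seconds()))
    return result


# Lines taken from the StartLog excerpt in this message.
SAMPLE = """\
11/19 15:47:37 slot2: Changing activity: Idle -> Busy
11/19 15:48:18 slot2: Called deactivate_claim_forcibly()
11/19 15:48:18 slot2: Changing activity: Busy -> Idle
""".splitlines()

if __name__ == "__main__":
    for slot, seconds in busy_intervals(SAMPLE):
        print(f"{slot}: busy for {seconds:.0f} s")
```

For the excerpt above this reports that slot2 was busy for 41 seconds before the claim was deactivated.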

Restarting Condor on the schedd host solved the problem.
Does anybody have a clue what happened?

Thank you,
Henning Fehrmann