Re: [Condor-users] Claimed and Idle
- Date: Wed, 19 Nov 2008 10:15:58 -0600
- From: Dan Bradley <dan@xxxxxxxxxxxx>
- Subject: Re: [Condor-users] Claimed and Idle
What version of Condor are you running?
Henning Fehrmann wrote:
We had approximately 65000 jobs in the queue, and the schedd was extremely busy.
We put 60000 of them on hold to relieve the load a bit.
Afterwards, we realized that almost all the slots had gone into
the 'Claimed Idle' state.
Running condor_off and then condor_on for a node put its slots back into the
'Unclaimed Idle' state. After a few minutes we found the slots again in the
undesired 'Claimed Idle' state.
Here is a part of the StartLog of a particular node:
11/19 15:47:37 slot2: Got activate_claim request from shadow (<10.20.30.1:39113>)
11/19 15:47:37 slot2: Remote job ID is 6373407.0
11/19 15:47:37 slot2: Got universe "STANDARD" (1) from request classad
11/19 15:47:37 slot2: State change: claim-activation protocol successful
11/19 15:47:37 slot2: Changing activity: Idle -> Busy
11/19 15:48:18 slot2: Called deactivate_claim_forcibly()
11/19 15:48:18 condor_write(): Socket closed when trying to write 56 bytes to <10.20.30.1:54703>, fd is 5
11/19 15:48:18 Buf::write(): condor_write() failed
11/19 15:48:18 Starter pid 880 exited with status 0
11/19 15:48:18 slot2: State change: starter exited
11/19 15:48:18 slot2: Changing activity: Busy -> Idle
Restarting Condor on the schedd host solved the problem.
Does anybody have a clue what happened?
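
To see how long each slot actually stayed busy before falling back to idle, the "Changing activity" lines in a StartLog like the excerpt above could be paired up with a short script. This is only a rough sketch: the function name busy_intervals is made up for illustration, the regular expression assumes the exact line layout shown in the excerpt, and the year is an assumption because the log timestamps do not carry one.

```python
import re
from datetime import datetime

# Matches lines such as:
#   11/19 15:47:37 slot2: Changing activity: Idle -> Busy
# The layout is assumed from the StartLog excerpt; other versions may differ.
ACTIVITY_RE = re.compile(
    r"^(\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (slot\d+): Changing activity: (\w+) -> (\w+)"
)

def busy_intervals(lines, year=2008):
    """Return (slot, seconds) for each completed Busy period in the log lines.

    The year is an assumption, since StartLog timestamps in this format omit it.
    """
    started = {}    # slot name -> time it last entered Busy
    intervals = []
    for line in lines:
        m = ACTIVITY_RE.match(line)
        if not m:
            continue  # ignore everything except activity changes
        stamp, slot, old, new = m.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %m/%d %H:%M:%S")
        if new == "Busy":
            started[slot] = when
        elif old == "Busy" and slot in started:
            elapsed = (when - started.pop(slot)).total_seconds()
            intervals.append((slot, elapsed))
    return intervals

sample = [
    "11/19 15:47:37 slot2: Changing activity: Idle -> Busy",
    "11/19 15:48:18 slot2: Called deactivate_claim_forcibly()",
    "11/19 15:48:18 slot2: Changing activity: Busy -> Idle",
]
print(busy_intervals(sample))  # [('slot2', 41.0)]
```

For the excerpt above this reports that slot2 was busy for 41 seconds. Run over a whole StartLog, many similarly short busy periods would suggest that jobs are being started and then deactivated shortly afterwards, rather than the slots never receiving work.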