Re: [Condor-users] fetchwork vs. claim_worklife
- Date: Tue, 12 Apr 2011 13:15:16 -0500
- From: Dan Bradley <dan@xxxxxxxxxxxx>
- Subject: Re: [Condor-users] fetchwork vs. claim_worklife
On 4/12/11 12:46 PM, Carsten Aulbert wrote:
On Tuesday 12 April 2011 16:58:52 Dan Bradley wrote:
I am puzzled about why preemption is ineffective in the case where the
work-fetch job has higher rank than the existing claim. What version of
condor is this?
But I was not aware that preemption is needed to claim an idle slot
The logs you posted showed the slot transitioning to Claimed/Idle, not
Unclaimed/Idle. Therefore, the work-fetch job must preempt the claim of
the schedd that is holding it. I can't think of any reason why the
schedd would hold the claim after a job completes without starting
another job for an hour other than the schedd being very very busy.
Perhaps it would be worth looking into what exactly is going on with
that. One place to start would be the shadow log. Look at the shadow
that ran the job that ran on the claim before it transitioned to
Claimed/Idle for a long period of time. Did the shadow exit cleanly?
In the schedd log, can you see the schedd handling the exit of that
shadow? It should immediately launch another job on the claim at that point.
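As a rough aid for the log check described above, here is a minimal Python sketch that pulls out every line in a log file mentioning a given string, such as a job id or a shadow pid. The function name, the example path and the example id are all hypothetical. It assumes nothing about the log line format beyond plain text. The actual log locations can usually be found with condor_config_val SHADOW_LOG and condor_config_val SCHEDD_LOG.

```python
def lines_mentioning(log_path, needle):
    """Return (line_number, text) pairs for log lines that contain needle.

    Plain substring match only; no assumption is made about the log format.
    """
    hits = []
    # errors="replace" keeps the scan going if the log has odd bytes in it
    with open(log_path, errors="replace") as fh:
        for number, line in enumerate(fh, start=1):
            if needle in line:
                hits.append((number, line.rstrip("\n")))
    return hits


# Hypothetical usage (path and job id are placeholders, not real values):
#   for number, text in lines_mentioning("/path/to/ShadowLog", "1234.0"):
#       print(number, text)
```

Running the same search against the shadow log and then the schedd log makes it easier to line up when the shadow exited with when the schedd noticed.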
I am also curious why claims are sitting in Claimed/Idle for so long
after a job finishes. Is the schedd severely overloaded?
Not really - as far as I can tell, busy as usual with < ~50% CPU time on a
The schedd is single-threaded. It is possible for the CPU to be not
very busy but for the schedd to be having performance problems due to
disk I/O or blocking network communications. Is the schedd responsive
to condor_q queries?
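One way to put a number on "responsive" is to time a plain condor_q call from the submit machine. The following is only a sketch: the helper name and the 60 second timeout are arbitrary choices of mine, and it measures wall-clock time for the whole command, not schedd internals.

```python
import subprocess
import time


def time_command(argv, timeout=60.0):
    """Run argv and return (elapsed_seconds, returncode).

    returncode is None if the command did not finish within timeout.
    Output is discarded; only the wall-clock time is of interest here.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        code = proc.returncode
    except subprocess.TimeoutExpired:
        code = None
    return time.monotonic() - start, code


# Hypothetical usage on a machine where condor_q is on the PATH:
#   elapsed, code = time_command(["condor_q"])
#   print("condor_q took %.1f s, exit code %s" % (elapsed, code))
```

If the elapsed time is regularly many seconds, or the call times out, that would point toward the schedd being blocked on something even though CPU usage looks moderate.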