Is REQUEST_CLAIM_TIMEOUT something we can set directly? I don't see it
as a variable in the default condor_config file. I would like to set it
to something shorter, perhaps 1 minute; the current default seems to be
30 minutes. Here is an example of a log entry:

SchedLog:07/26/12 06:09:11 (pid:12862) Timed out requesting claim
slot4@xxxxxxxxxxx <> for XXXXX after

We seem to be running into multiple occurrences of this daily, and we
suspect it is due to preemption combined with MAXJOBRETIREMENTTIME
being set to 3 hours. In the example above, a scheduler gets matched
with slot4@xxxxxxxxxxx but can't claim it, because a job is still
running on c4.XXXX.com. That job is supposed to be preempted, but it
keeps running for the grace period specified by MAXJOBRETIREMENTTIME.
BTW, I'm not sure whether this is relevant, but it is happening across
two pools, with the scheduler in pool A and the matched slot in pool B.

My current thought is to reduce REQUEST_CLAIM_TIMEOUT to something
short, like a minute, so that if the slot is not freed up, the job will
just move on to the next free node. Currently the job stays tied up even
though other slots free up shortly afterward. Or is there a better way
to handle this issue?
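For concreteness, the change I have in mind would look roughly like the
following in the scheduler's local config file in pool A. This is only a
sketch, assuming REQUEST_CLAIM_TIMEOUT is read by the schedd and is
expressed in seconds (so the 30-minute default would be 1800), and that
the schedd picks it up after a condor_reconfig:

```
# Assumption: value is in seconds and applies to the condor_schedd.
# Give up on a claim request after 60 seconds instead of the
# 30-minute default, so the job can be rematched to another slot.
REQUEST_CLAIM_TIMEOUT = 60
```

MAXJOBRETIREMENTTIME would stay at 3 hours on the execute machines in
pool B; only the scheduler side would change.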