
Re: [Condor-users] Specifying Max Job Run Time



Terrence Martin wrote:
I do not suppose there is a recipe out there for restricting how long a job runs while making sure that jobs are not interrupted before that run time, and for long-running jobs to be kicked only if other jobs are waiting in the queue? Say a value of 72 hours that is both the minimum and maximum runtime for jobs, but if the jobs are the only ones on the cluster they can just keep running.

There are a lot of settings, it seems, from PREEMPTION_REQUIREMENTS to MaxJobRetirementTime to PREEMPT_LATENCY. It is not all that clear to me how to combine these settings to do what I want: put an upper limit on job runtime after which jobs cannot be guaranteed to keep running, while not kicking off jobs that are running long for legitimate reasons on an otherwise underused cluster.

Terrence,

It sounds like you just need jobs to preempt quickly but with a long retirement time. We have this at the CMS Tier 1 at FNAL. MaxJobRetirementTime is set to 48 hours. PREEMPTION_REQUIREMENTS is essentially set to

(CurrentTime - EnteredCurrentState) > (10*60)

In this configuration, a running job becomes eligible for preemption after 10 minutes if other jobs are waiting in the queue, but the preemption does not actually evict it until the job has run for 48 hours.
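
Adapted to the 72-hour limit asked about above, the same idea might look roughly like the fragment below. This is only a sketch: the 72-hour value comes from the question, not from the FNAL setup, and the PREEMPTION_REQUIREMENTS line is the simplified form quoted above, not the exact production expression. It also assumes the retirement time is set on the execute nodes and the preemption expression on the negotiator host.

```
# Execute nodes (condor_startd): a preempted job is allowed to keep
# running until it has accumulated this many seconds of runtime.
# Assumed value: 72 hours, taken from the question.
MAXJOBRETIREMENTTIME = 72 * 60 * 60

# Negotiator: allow preemption once the claim is more than 10 minutes
# old. Simplified form of the expression quoted above.
PREEMPTION_REQUIREMENTS = (CurrentTime - EnteredCurrentState) > (10 * 60)
```

Since preemption is only triggered when another job is waiting to be matched, a job on an otherwise idle cluster should simply keep running past the 72 hours.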

- B