
[HTCondor-users] Getting "next_job_start_delay" startup spreading without idle claimed slots



This is something I've been mulling for a while, and I finally had the flash of insight I needed.

In pools where you're not evicting jobs, it can be useful to throttle the rate of startup of certain kinds of jobs to offer a better chance for competing jobs to claim resources, or to limit the impact of a startup utilization spike in the job.

Now, you may be thinking of the SubmitterUserResourcesInUse attribute, which the negotiator uses to report the number of CPUs currently claimed by the user's running jobs. That lets you cap the total number of jobs running in the pool as a whole with a Requirements expression:

	Requirements = (isUndefined(SubmitterUserResourcesInUse) || SubmitterUserResourcesInUse < 50)

This ensures that the total number of jobs running will not exceed 50, BUT all 50 of those jobs might start in the same negotiator cycle - say on one of your spiffy 64-core machines - and inflict anguish on the machine with a 50-job spike of nearly simultaneous startup activity. In addition, this approach counts all of your jobs, not just the jobs in the cluster with this requirements expression. If you submitted another cluster which still has 100 jobs running, then none of the jobs in this cluster will start until your total use drops below 50.

You might also be thinking of the next_job_start_delay submit command. However, this does not space out the resource claims, only the startup of the executable on the exec node. So if you have 50 jobs with a next_job_start_delay of 60 seconds, your submission will claim all 50 slots immediately, but will take nearly an hour to get productive work running on all of them, leaving claimed resources sitting idle while they wait their turn to start.
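For reference, that approach looks like this in a submit description (the executable name and job count here are just placeholders):

	# Claims all 50 slots up front, then staggers each exec-node startup by 60s:
	executable = my_job.sh
	next_job_start_delay = 60
	queue 50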

With the technique below, your submission only claims a resource once the job's delay period has elapsed, and then immediately starts using the claim, unlike the next_job_start_delay approach - all via a straightforward requirements expression:


	START_DELAY = 120
	Requirements = (QDate <= time() - (ProcID * $(START_DELAY)))

The first ProcID, 0, will fire up right out of the gate at the next negotiator run. ProcID 1 will launch two minutes later, and so on until all of the jobs in the cluster have been launched. In the meantime, all of the pending jobs in the cluster will remain idle, holding no resource claims. The minimum resolution of the interval, of course, is your NEGOTIATOR_INTERVAL configuration setting, which defaults to 60 seconds.
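
Putting it all together, a minimal submit description using this technique might look like the following sketch (the executable name and job count are placeholders):

	executable   = my_job.sh
	START_DELAY  = 120
	# Each job can only match once ProcID * START_DELAY seconds have elapsed
	# since the cluster's queue date, so no claim is made before it's needed:
	Requirements = (QDate <= time() - (ProcID * $(START_DELAY)))
	queue 50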

If you'd like to get fancy, you can display a startup countdown in the job description via this submit description entry:

	# While the job is idle (JobStatus == 1), show the remaining wait in
	# minutes, rounded up by quantize(); once it runs, show the normal
	# description:
	+JobDescription = ifThenElse(JobStatus == 1, \
		strcat(string( quantize(-1 * (time() - (ProcID * $(START_DELAY)) - QDate ) / 60, 1) ), " min to go"), \
		"MyJobDescr")

This gives you the following in condor_q -nobatch mode:
	ID      OWNER            SUBMITTED     RUN_TIME ST PRI  SIZE CMD
	5864.2   pelletm         4/2  20:27   0+00:00:00 I  0     0.0 (0 min to go)
	5864.3   pelletm         4/2  20:27   0+00:00:00 I  0     0.0 (2 min to go)
	5864.4   pelletm         4/2  20:27   0+00:00:00 I  0     0.0 (4 min to go)

... and then, when the next job starts running:
	5864.2   pelletm         4/2  20:27   0+00:00:21 R  0     0.0 (MyJobDescr)
	5864.3   pelletm         4/2  20:27   0+00:00:00 I  0     0.0 (1 min to go)
	5864.4   pelletm         4/2  20:27   0+00:00:00 I  0     0.0 (3 min to go)

Since this keys off the queue date, a job which is held while it's running and then later released will start running again immediately, without any delay.
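
If you'd like to double-check the computed start times, condor_q's autoformat mode can evaluate an expression against each job ad - for example, with a hypothetical cluster ID of 5864 and the 120-second delay above:

	condor_q 5864 -af ProcId QDate 'QDate + (ProcId * 120)'

This prints each job's proc ID, its queue date, and the epoch time at which its Requirements expression becomes true.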

Enjoy!

	-Michael Pelletier.