Re: [HTCondor-users] Quotas - accepting surplus but not too much surplus

On Aug 5, 2013, at 10:39 AM, Keith Chadwick <chadwick@xxxxxxxx> wrote:

> At Fermilab, we use quotas and we also wanted a mechanism to allow jobs to complete,
> yet implement preemption.
> So...
> We started by histogramming the job durations, and analyzed the histograms.
> The result for the ensemble of our workloads (pretty much independent of the
> individual workloads) was that job duration peaked between 4 and 6 hours, and
> there was an exponential falloff from the peak.  More than 95% of jobs completed
> in less than 24 hours.
> The full analysis is available here:
> 	http://cd-docdb.fnal.gov/cgi-bin/ShowDocument?docid=3246
> Based on this analysis we set a preemption timeout of 24 hours.

Very interesting!

As a pro-tip, HTCondor 8.0 automatically calculates some similar histograms for you.  From the output of "condor_status -l -schedd":

JobsRuntimesHistogramBuckets = "30Sec, 1Min, 3Min, 10Min, 30Min, 1Hr, 3Hr, 6Hr, 12Hr, 1Day, 2Day, 4Day, 8Day, 16Day"
JobsCompletedRuntimes = "49013, 12376, 10268, 5203, 73025, 15853, 27233, 32692, 34783, 23434, 10062, 8, 0, 0, 0"
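If you want to repeat a check like Fermilab's "more than 95% finish within 24 hours" from these two attributes, a few lines of Python will do it. This is a minimal sketch, not anything shipped with HTCondor: the helper names are hypothetical, and it assumes the count list has one more entry than the bucket labels (15 vs. 14 here), with the first count being jobs shorter than 30Sec and the last being jobs of 16Day or longer. That alignment should be checked against the schedd ClassAd documentation for your version.

```python
# Sketch: summarize the schedd's completed-job runtime histogram.
# ASSUMPTION: the count list has one more entry than the bucket labels;
# counts[0] is jobs shorter than the first label, counts[i] is jobs between
# labels i-1 and i, and counts[-1] is jobs at or above the last label.

UNIT_SECONDS = {"Sec": 1, "Min": 60, "Hr": 3600, "Day": 86400}


def label_to_seconds(label):
    """Convert a bucket label such as '30Sec' or '2Day' to seconds."""
    label = label.strip()
    for unit, factor in UNIT_SECONDS.items():
        if label.endswith(unit):
            return int(label[: -len(unit)]) * factor
    raise ValueError(f"unrecognized bucket label: {label!r}")


def parse_histogram(buckets_attr, counts_attr):
    """Return (upper bounds in seconds, counts) from the two attribute strings."""
    bounds = [label_to_seconds(b) for b in buckets_attr.split(",")]
    counts = [int(c) for c in counts_attr.split(",")]
    if len(counts) != len(bounds) + 1:
        raise ValueError("expected exactly one more count than bucket labels")
    return bounds, counts


def fraction_below(bounds, counts, limit_seconds):
    """Fraction of jobs in buckets whose upper bound is <= limit_seconds.

    Only whole buckets are counted, so pick a limit equal to a bucket
    boundary (e.g. 24 * 3600 for the 1Day label) for a meaningful answer.
    """
    # zip() stops after len(bounds) pairs, so the open-ended last bucket is
    # never counted as "below" any limit.
    below = sum(c for c, upper in zip(counts, bounds) if upper <= limit_seconds)
    return below / sum(counts)


buckets = ("30Sec, 1Min, 3Min, 10Min, 30Min, 1Hr, 3Hr, 6Hr, 12Hr, "
           "1Day, 2Day, 4Day, 8Day, 16Day")
completed = ("49013, 12376, 10268, 5203, 73025, 15853, 27233, 32692, "
             "34783, 23434, 10062, 8, 0, 0, 0")
bounds, hist = parse_histogram(buckets, completed)
print(f"finished within 1Day: {fraction_below(bounds, hist, 24 * 3600):.1%}")
# prints: finished within 1Day: 96.6%
```

With the numbers above this comes out around 96.6%, close to the Fermilab figure, provided the bucket-alignment assumption holds.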

Unfortunately, the histogram buckets are not sysadmin-customizable.  Honestly, I haven't had much time to play with these locally.  I suspect the uneven buckets would cause me heartache.  It may also be useful to request the aggregate job runtime for each bucket instead of the job count.
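Until a per-bucket runtime total exists, a rough version can be estimated from the counts by charging every job in a bucket the bucket's midpoint. This is a hypothetical approximation, not an HTCondor feature: it assumes the same alignment of 15 counts to 14 labels (first count below 30Sec, last count 16Day and up), and it charges the open-ended last bucket only its lower bound, so that bucket is underestimated.

```python
# Sketch: approximate the aggregate runtime in each bucket from job counts.
# ASSUMPTIONS: counts[0] covers [0, 30Sec), counts[i] covers
# [label i-1, label i), and counts[-1] covers 16Day and longer; every job in a
# bucket is charged the bucket midpoint, and the open-ended last bucket is
# charged only its lower bound (an underestimate).

# Upper bounds in seconds for the labels 30Sec, 1Min, ..., 16Day quoted above.
BOUNDS = [30, 60, 180, 600, 1800, 3600, 10800, 21600, 43200,
          86400, 172800, 345600, 691200, 1382400]
COUNTS = [49013, 12376, 10268, 5203, 73025, 15853, 27233, 32692,
          34783, 23434, 10062, 8, 0, 0, 0]


def estimated_runtime_per_bucket(bounds, counts):
    """Return an estimated total runtime (seconds) for each bucket."""
    edges = [0] + list(bounds)
    estimates = []
    for i, count in enumerate(counts):
        if i < len(bounds):
            charge = (edges[i] + edges[i + 1]) / 2  # bucket midpoint
        else:
            charge = edges[-1]  # open-ended bucket: lower bound only
        estimates.append(count * charge)
    return estimates


est = estimated_runtime_per_bucket(BOUNDS, COUNTS)
short = 6  # buckets 0..5 hold jobs that ran for less than 1Hr
print(f"jobs under 1Hr: {sum(COUNTS[:short]) / sum(COUNTS):.0%} of jobs, "
      f"{sum(est[:short]) / sum(est):.0%} of estimated runtime")
# prints: jobs under 1Hr: 56% of jobs, 3% of estimated runtime
```

Under those assumptions, jobs shorter than an hour are a majority of the count but only a few percent of the consumed time, which is the kind of gap that would make a runtime-weighted histogram worth having.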

(there's a similar mechanism for job memory usage)

> The result is that users get their "dedicated" slots (quotas actually) and can
> "opportunistically" use more than their quota.  When sufficient quota'd users
> need slots, the opportunistic jobs are signaled that they should preempt with
> a preemption time of 24 hours.  Since the above analysis shows that the typical
> job duration is less than 24 hours, the jobs get to complete, and the cluster
> reclaims the slot for the quota'd use.

This mechanism gets a little blurry with the use of pilot jobs; however, I know there are experiments which aim to come up with nicer preemption mechanisms for pilots.