
Re: [HTCondor-users] Quotas - accepting surplus but not too much surplus



At Fermilab, we use quotas and we also wanted a mechanism to allow jobs to complete,
yet implement preemption.

So...

We started by histogramming the job durations, and analyzed the histograms.

The result for the ensemble of our workloads (pretty much independent of the
individual workloads) was that job duration peaked between 4 and 6 hours, with
an exponential falloff from the peak. More than 95% of jobs completed
in less than 24 hours.

The full analysis is available here:

	http://cd-docdb.fnal.gov/cgi-bin/ShowDocument?docid=3246

Based on this analysis we set a preemption timeout of 24 hours.

The result is that users get their "dedicated" slots (quotas, actually) and can
"opportunistically" use more than their quota.  When enough quota'd users
need slots, the opportunistic jobs are signaled that they should preempt, with
a preemption timeout of 24 hours.  Since the above analysis shows that the
typical job duration is less than 24 hours, the jobs get to complete, and the
cluster reclaims the slot for the quota'd use.
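
As a rough illustration of the policy described above, here is a minimal
configuration sketch. It is an assumption about how such a setup could be
written, not the actual Fermilab configuration: the choice of
MAXJOBRETIREMENTTIME as the "preemption timeout" and the particular
PREEMPTION_REQUIREMENTS expression are illustrative only.

```
# Sketch only; not the actual Fermilab configuration.

# Negotiator: let groups use idle slots beyond their quota.
GROUP_ACCEPT_SURPLUS = True

# Negotiator: allow preemption only when the requesting group is under
# its quota and the group currently holding the slot is over its quota.
PREEMPTION_REQUIREMENTS = (SubmitterGroupResourcesInUse < SubmitterGroupQuota) \
                       && (RemoteGroupResourcesInUse > RemoteGroupQuota)

# Startd: once a job is told to vacate, give it up to 24 hours
# (in seconds) to finish before it is actually killed.
MAXJOBRETIREMENTTIME = 24 * 60 * 60
```

With settings along these lines, a job running on surplus keeps its slot for
up to 24 hours after being selected for preemption, which covers the more
than 95% of jobs that finish within that time.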

-Keith.

At 10:14 AM -0500 8/5/13, Brian Bockelman wrote:
Hi Jerome,

Sorry, but I can't think of any nice way to implement this policy without preemption.

A "not nice" way to do this would be to set aside 20% of the slots as "A-only" and 20% of the slots as "B-only" via START expressions. Next, use RANK to make A and B jobs prefer their dedicated slots first.
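
A minimal sketch of what that could look like for group A follows. The attribute names DedicatedGroup and JobGroup are hypothetical, chosen only for illustration, and which machines make up the 20% is left to the administrator. The same pattern would be repeated with "B" on the B-only machines.

```
# Sketch only. DedicatedGroup and JobGroup are hypothetical names.

# Local configuration on the machines set aside as "A-only":
DedicatedGroup = "A"
STARTD_ATTRS = $(STARTD_ATTRS) DedicatedGroup
# Only start jobs that declare themselves as belonging to group A.
START = (TARGET.JobGroup =?= "A")

# Submit description file of a group A job:
#   +JobGroup = "A"
#   # Prefer the A-only slots; a true expression ranks above a false one.
#   rank = (TARGET.DedicatedGroup =?= "A")
```

The =?= operator is used so that a job or machine without the attribute evaluates to False rather than UNDEFINED.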

This is not very nice because it only approximates the policy you want (what happens if the A-only slots go offline for power maintenance?) and I'm pretty sure there are some side-cases that could cause priority inversion.

Hope this is nonetheless helpful!

Brian

On Aug 5, 2013, at 5:20 AM, Jerome Samson <jeromes@xxxxxx> wrote:

 Hello,

 I have a pool with dynamic group quota definitions. We set up:
 GROUP_NAMES =  A, B
 GROUP_QUOTA_DYNAMIC_A = .8
 GROUP_QUOTA_DYNAMIC_B = .2

 GROUP_ACCEPT_SURPLUS = True

For specific reasons we disabled Condor's preemption (so when a job starts as 'surplus' it gets enough time to finish correctly).

Thus, when a group needs more resources, the negotiator will find available resources in other groups. However, I would like to prevent a small portion of each group's quota from being allocated to other groups.

For instance, when group B is running many jobs and A has no running jobs, B will get surplus from A, but I would like it to get at most 0.8 of the whole pool.

 Is there a correct way to do so?

 Jerome
 +33 (0) 1 53 06 25 37
 ---------------------
 BUF Compagnie
 139-141 Boulevard Ney
 75018 Paris
 ---------------------
 Please consider the environment before printing this message
 _______________________________________________
 HTCondor-users mailing list
 To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
 subject: Unsubscribe
 You can also unsubscribe by visiting
 https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

 The archives can be found at:
 https://lists.cs.wisc.edu/archive/htcondor-users/

