Hi Todd,

this config does the trick. Thanks a lot.

I agree, preemption would do a better job, but it only makes sense for
short-running jobs; otherwise it is a waste of resources and time (if the jobs
cannot checkpoint). And that's our dilemma: a typical job - most of them run
in the vanilla universe - lasts several hours or even days, so preemption is
not really an option. Our users would not be amused if their jobs were killed
after a certain amount of runtime, or just when they were almost finished.

So running this "default quota" per user is only a disadvantage when the
pool's resources are not fully occupied. That's right, but one can't have
everything :)

Nevertheless, HTCondor is the best solution for non-dedicated clusters :)

Werner

On 02/11/2015 10:02 PM, Todd Tannenbaum wrote:
> On 2/10/2015 12:52 PM, Werner Hack wrote:
>> Hi all,
>>
>> I tried to limit the number of jobs running per user via a Requirements
>> definition, as described here: https://gist.github.com/dberzano/9995356
>>
>> Maybe this only works if static slots are used; for partitionable slots it
>> does not work this way. Or maybe I missed configuring something else?
>>
>> The manual says:
>> SubmitterUserResourcesInUse: The integer number of slots currently utilized
>> by the user submitting the candidate job.
>>
>> How is SubmitterUserResourcesInUse handled for partitionable slots? The
>> same way? Or is there a better way to set a simple per-user quota on
>> running jobs, so that one user cannot occupy the whole pool for a long
>> time?
>>
>> Any hint will be appreciated.
>> Best,
>> Werner
>>
>
> Hi Werner,
>
> I looked at your config settings on GitHub. My guess is it fails to work
> with partitionable slots because CLAIM_PARTITIONABLE_LEFTOVERS is True by
> default. What this means is that the negotiator will match a partitionable
> slot (pslot) with a job and give the pslot to the schedd.
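> For reference, the Requirements trick from that gist boils down to
> something like the following on the execute nodes (MAX_JOBS_PER_USER is
> just an illustrative macro name, not a built-in knob, and the limit of 5
> is arbitrary):
>
> # Only start a job if its submitter currently holds fewer than
> # MAX_JOBS_PER_USER slots pool-wide, as reported by the negotiator
> MAX_JOBS_PER_USER = 5
> START = ($(START)) && (SubmitterUserResourcesInUse < $(MAX_JOBS_PER_USER))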
> The schedd will then run as many jobs as possible on that pslot until some
> pslot resource (like CPU cores) is exhausted. The result is that your
> so-called normal users could run many jobs even if you only want them to be
> able to run one job (although they could only use one machine). I think it
> will work as you envisioned with partitionable slots if you add the
> following to your condor_config (on all your execute machines):
>
> # Turn off claiming leftover resources by the schedd
> # so that our quota via Requirements magic works
> CLAIM_PARTITIONABLE_LEFTOVERS = False
>
> # Optionally turn on the consumption-policy mechanism so that more than
> # one job can be matched to a machine per negotiation cycle, given that
> # CLAIM_PARTITIONABLE_LEFTOVERS is disabled above.
> CONSUMPTION_POLICY = True
>
> You can read about these knobs in the Manual, and/or you may find the
> following wisdom on the Wiki enlightening:
> https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=ConsumptionPolicies
>
> Finally, the whole concept of placing a quota on the number of jobs a user
> can run seems against the idea of high-throughput computing... Instead I'd
> suggest a policy that simply gives power users a better priority, and
> allows users with a better priority to preempt jobs submitted by users with
> a worse priority. That way, if poor Todd is a normal user and nobody else
> even wants to use the pool, Todd isn't limited to just a few jobs for no
> good reason... Of course, I understand that some jobs (especially
> non-idempotent jobs with side effects, like creating records in a database)
> don't like to be preempted, but I'd argue that is pretty rare.
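> Such a priority-based preemption policy can be sketched roughly like this
> in the negotiator's config (the 1.2 fudge factor is illustrative; tune it
> to taste):
>
> # Allow preemption only when the candidate submitter's priority is
> # meaningfully better (i.e. numerically lower) than that of the user
> # currently running on the machine
> PREEMPTION_REQUIREMENTS = ( RemoteUserPrio > SubmitterUserPrio * 1.2 )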
>
> Hope this helps,
> Todd
>
> --