
Re: [HTCondor-users] portionable slots and greedy users



Short jobs can also run on long-job slots, but a long job can't run on a short-job slot. I already have a hold policy: if a job is marked short, it is automatically held if it runs for more than 7 minutes.
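For reference, a hold policy like the one described might look roughly like this in the schedd configuration. This is a sketch that assumes the job attribute is called JobType and that "7 minutes" means 7 minutes in the current run; if a SYSTEM_PERIODIC_HOLD is already defined, the new clause would need to be combined with it using ||.

```
# Sketch: hold jobs marked short once they have been running for more
# than 7 minutes. JobStatus == 2 means "running"; EnteredCurrentStatus
# is when the job entered its current state (i.e. started this run).
SYSTEM_PERIODIC_HOLD = (JobType =?= "short") && (JobStatus == 2) && \
                       (time() - EnteredCurrentStatus > 7 * 60)
SYSTEM_PERIODIC_HOLD_REASON = "Short job ran longer than 7 minutes"
```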

How can I do this, or what is the preferred way?
* Mark a job as short with +JobType = "short". Is this the preferred way?
* Short jobs have no quotas. They can use 100% of the pool if they like.
* If a job doesn't have +JobType, assume it's a "long" job.
* Long jobs can only have 75% of the pool. I want to leave the rest of the pool free for users running short jobs.
* I want to prioritize jobs that have +JobType = "short".

The idea is that I don't want users to hog all the resources; I encourage people to run short jobs. Eventually, I will limit JobType = "long" jobs to 5% of the pool and no more.
How can I achieve this?
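One way to express the first and third points, as a sketch: users opt in from the submit description, and a schedd submit transform fills in "long" for any job that did not. The transform name DefaultJobType is hypothetical; DEFAULT only sets the attribute when the job does not already define it.

```
# In the user's submit description: opt in to the short class.
# (MY.JobType = "short" is an equivalent spelling.)
+JobType = "short"

# In the schedd configuration: jobs that did not set JobType are
# treated as long. "DefaultJobType" is a hypothetical transform name.
JOB_TRANSFORM_NAMES = $(JOB_TRANSFORM_NAMES) DefaultJobType
JOB_TRANSFORM_DefaultJobType @=end
   DEFAULT JobType "long"
@end
```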


On Fri, Oct 6, 2023 at 2:46 PM Todd L Miller via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
> If you can't trust your users, hopefully the condor folks can offer a
> workable solution. I can't think of an easy way out, sorry.

    The first post in the thread said that the long jobs were also
all characterized by having high memory requirements. You could write a
submit transform that matches whatever "high memory" means in this context
and inserts a concurrency limit. See

https://htcondor.readthedocs.io/en/latest/admin-manual/setting-up-special-environments.html#concurrency-limits

for details, but the idea is you set the maximum number of concurrently
running "high memory" jobs so that they can only use 95% of the pool.
(Maybe aim for 75% first and increase the limit as necessary? 95%
doesn't have a lot of slop...)
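A minimal sketch of that approach follows. The threshold (more than 16 GB requested), the limit name HIGHMEM, the transform name TagHighMem, and the value 750 (75% of a hypothetical 1000-slot pool) are all placeholders. The transform belongs in the schedd configuration and the limit in the central manager (negotiator) configuration.

```
# Schedd configuration: tag every high-memory job with the HIGHMEM
# concurrency limit. RequestMemory is in MB; 16384 is a placeholder.
# SET overwrites any concurrency limits the user supplied.
JOB_TRANSFORM_NAMES = $(JOB_TRANSFORM_NAMES) TagHighMem
JOB_TRANSFORM_TagHighMem @=end
   REQUIREMENTS RequestMemory > 16384
   SET ConcurrencyLimits "HIGHMEM"
@end

# Negotiator configuration: at most 750 HIGHMEM jobs may run at once
# (75% of a hypothetical 1000-slot pool). Each job consumes one unit.
HIGHMEM_LIMIT = 750
```

Counting one unit per job only approximates a fraction of the pool when jobs are roughly slot-sized. With partitionable slots, the limit syntax also accepts a per-job amount (for example "HIGHMEM:4"), which could be used to weight jobs by their core count.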

    That's if you want to reserve 5% of the pool for non-high-memory
jobs. This will waste capacity if you don't have "enough" of such jobs,
but should do a good job of ensuring small scheduling delays. If you
instead want the share of the pool's time spent running short jobs over
(roughly) the whole day to be 5%, you can use the same trick, but with
accounting groups instead of concurrency limits.
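The accounting-group variant might look something like the following. This is a sketch under assumptions: the group name group_long and the transform name LongGroup are hypothetical, the 0.95 quota follows the numbers above, and jobs left out of any group fall into the default "<none>" group, which gets the remainder. The transform sets AcctGroup and AccountingGroup the way the accounting_group submit command would; check the manual for your version before relying on it.

```
# Negotiator configuration: one accounting group with a dynamic
# (fractional) quota of the pool.
GROUP_NAMES = group_long
GROUP_QUOTA_DYNAMIC_group_long = 0.95
# Let group_long use quota that other jobs are not currently using;
# set to False for a hard cap instead.
GROUP_ACCEPT_SURPLUS_group_long = True

# Schedd configuration: put high-memory jobs into group_long.
# The 16384 MB threshold is a placeholder.
JOB_TRANSFORM_NAMES = $(JOB_TRANSFORM_NAMES) LongGroup
JOB_TRANSFORM_LongGroup @=end
   REQUIREMENTS RequestMemory > 16384
   SET AcctGroup "group_long"
   EVALSET AccountingGroup strcat("group_long.", Owner)
@end
```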

    You can also empirically determine shortness. Write a submit
transform that sets allowed_job_duration (or allowed_execute_duration, as
appropriate) to 300 seconds, plus a periodic_release expression that
releases the hold automatically. The periodic_release can't change the
value of allowed_job_duration, but you can probably say something like:

allowed_job_duration = ifThenElse( NumHolds == 0, 300, undefined )

instead of just "300". HoldReasonCode 46 (or 47) is reserved for
allowed_job_duration (or allowed_execute_duration) being exceeded, so the
periodic_release expression should be easy to write.
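Put together, the transform might look roughly like this. It is an untested sketch: the transform name ProbeShort is hypothetical, and it relies on the hedged suggestion above that AllowedJobDuration can be an expression that evaluates to undefined (no limit) after the first hold. For the execute-time variant, use AllowedExecuteDuration and hold code 47 instead. A released job restarts from the beginning unless it checkpoints.

```
# Schedd configuration: give every job a 300-second trial. If it is held
# for exceeding that (HoldReasonCode 46), release it once; after the
# first hold the duration limit evaluates to undefined, i.e. no limit.
# The "=?= undefined" test covers job ads that do not define NumHolds
# before their first hold. SET overwrites any user periodic_release.
JOB_TRANSFORM_NAMES = $(JOB_TRANSFORM_NAMES) ProbeShort
JOB_TRANSFORM_ProbeShort @=end
   SET AllowedJobDuration ifThenElse(NumHolds =?= undefined || NumHolds == 0, 300, undefined)
   SET PeriodicRelease (HoldReasonCode =?= 46) && (NumHolds == 1)
@end
```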

-- ToddM


--
--- Get your facts first, then you can distort them as you please.--