
Re: [HTCondor-users] Best Practice method to limit number of slots on a machine



Hi Lyle,

Something we do here (using partitionable slots) is define a custom "IO heavy" resource on each execution point, so that only a limited number of jobs (in our case, one) requesting that resource can run on a machine before the partitionable slot runs out of it:

MACHINE_RESOURCE_IoHeavy = 1

Now, only one job that has "request_ioheavy = 1" in its submit file (or RequestIoHeavy = 1 in its job ad) can run on each machine at a time.
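
As a rough illustration, a submit file for such a job might look like the sketch below. The executable name and the CPU, memory and disk requests are made-up placeholders; only the request_ioheavy line relates to the custom resource defined above.

```
# Hypothetical submit file for an IO heavy job.
# io_job.sh and the cpu/memory/disk values are placeholders, not from this thread.
executable      = io_job.sh
request_cpus    = 1
request_memory  = 1GB
request_disk    = 1GB

# Claims one unit of the custom IoHeavy resource from the partitionable slot.
request_ioheavy = 1

queue
```

Jobs that leave out request_ioheavy should not be counted against the IoHeavy resource, so they would be limited only by their usual CPU, memory and disk requests.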

This can be tweaked using DETECTED_CPUS as Christoph and Thomas have noted if you want to scale this based on the cores you have on the execution point:

MACHINE_RESOURCE_IoHeavy = INT(0.5 * $(DETECTED_CPUS))

Testing this in a Docker container using 4 cores:

[root@55362c5001d1 /]# condor_status -af:h Name SlotType TotalCpus TotalIoHeavy
Name               SlotType      TotalCpus TotalIoHeavy
slot1@55362c5001d1 Partitionable 4.0       2


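Now if I have two IO heavy jobs with one requesting 2 CPUs and one requesting 1 CPU, the 2 CPU job won't exhaust the CPU resources, and both can run on this machine. A third IO heavy job would have to wait, even though one CPU is still free, because both IoHeavy units are in use.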
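
For the 75% target mentioned in the original question, the same pattern could be adapted as sketched below. This variant is untested; the per-core values in the comments assume that INT() truncates toward zero, as described in the ClassAd function documentation.

```
# Sketch: allow roughly 75% as many IO heavy jobs as detected cores.
# Assuming INT() truncates toward zero, this would give:
#   4 cores -> INT(3.0) = 3
#   6 cores -> INT(4.5) = 4
#   8 cores -> INT(6.0) = 6
MACHINE_RESOURCE_IoHeavy = INT(0.75 * $(DETECTED_CPUS))
```

After restarting the startd, the condor_status command shown above should report the new TotalIoHeavy value.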

If all of your jobs are indeed IO heavy, you could then add a submit transform to your access points that puts "RequestIoHeavy = 1" in all job ads so that your users don't have to specify "request_ioheavy = 1" in all of their jobs.

Jason Patton

On Fri, Sep 9, 2022 at 3:58 AM Thomas Hartmann <thomas.hartmann@xxxxxxx> wrote:
Hi Lyle,

to be on the safe side, you can try the int() function [1]

Cheers,
  Thomas

[1]
https://htcondor.readthedocs.io/en/latest/man-pages/classads.html?highlight=int()#term-integer-INT-expr

On 09/09/2022 09.03, Lyle Pakula wrote:
> Hi Christoph,
>
> Yes, the issue is that the number of CPUs varies across the cluster, so a
> hard CPU limit would tend to saturate the weaker machines while leaving
> the powerful machines underutilised.
>
> Indeed ROUND would solve that problem - I'll give it a try and see what
> condor does!
>
> Thanks Lyle
>
>
>
> On Fri, Sep 9, 2022 at 3:47 PM Beyer, Christoph <christoph.beyer@xxxxxxx
> <mailto:christoph.beyer@xxxxxxx>> wrote:
>
>     Hi Lyle,
>
>     you can limit the number of CPUs that are used to create slots
>     (that's if you use partitionable slots) using:
>
>     NUM_CPUS -> lie about the number of detected CPUs
>     MAX_NUM_CPUS -> limit the max number of CPUs
>
>     The numbers should be integers - hence I am not sure if something
>     like '0.75 * $(DETECTED_CPUS)' will work - you have to try it I
>     guess ;)
>
>     If you have static slots it is very easy, just create fewer slots ...
>
>     Best
>     christoph
>
>
>     --
>     Christoph Beyer
>     DESY Hamburg
>     IT-Department
>
>     Notkestr. 85
>     Building 02b, Room 009
>     22607 Hamburg
>
>     phone:+49-(0)40-8998-2317
>     mail: christoph.beyer@xxxxxxx <mailto:christoph.beyer@xxxxxxx>
>
>     ------------------------------------------------------------------------
>     *From: *"Lyle Pakula" <Lyle@xxxxxxxxxxxxxxxx
>     <mailto:Lyle@xxxxxxxxxxxxxxxx>>
>     *To: *"HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx
>     <mailto:htcondor-users@xxxxxxxxxxx>>
>     *Sent: *Friday, 9 September 2022 00:43:01
>     *Subject: *[HTCondor-users] Best Practice method to limit number of
>     slots on a machine
>
>     Hi Everyone,
>     I'm wondering what people's best practice is to easily limit the
>     number of jobs running on a machine so as not to saturate that machine.
>
>     Our jobs are I/O heavy rather than compute heavy. So when a job is
>     submitted we try to spread it across the cluster using the rank below:
>
>     # Define Load Balancing on the AE Pool
>     NEGOTIATOR_PRE_JOB_RANK=( \$(NEGOTIATOR_PRE_JOB_RANK) ) * SlotID
>
>     But big jobs will still saturate. So what's the best way to limit
>     the number of slots used on a machine, noting our machines vary from
>     4-8 cores? Ideally we would limit the number of slots to, say, 75% of
>     what the max is. The setting below is a bit heavy handed.
>
>     # DON'T COUNT HYPERTHREADED CPUS AS THIS LEADS TO SATURATION
>     COUNT_HYPERTHREAD_CPUS=FALSE
>
>     Thanks, Lyle
>
>     --
>     AE CAPITAL
>     15 William St, Level 19, Melbourne VIC, Australia
>
>     p +61 3 9020 7801
>     m +61 (0)434 872 054
>     w http://www.aecapital.com.au <http://www.aecapital.com.au>
>
>
>     AE Capital Pty Limited (ACN 153 242 865) is regulated by the
>     Australian Securities & Investments Commission and is a Corporate
>     Authorised Representative of JFM Pty Limited (ACN 125 150 656),
>     holder of an Australian Financial Services Licence (AFSL 314585).
>     AE Capital Pty Limited is a member of the National Futures
>     Association (ID 0498660).
>
>     _______________________________________________
>     HTCondor-users mailing list
>     To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx
>     <mailto:htcondor-users-request@xxxxxxxxxxx> with a
>     subject: Unsubscribe
>     You can also unsubscribe by visiting
>     https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>     <https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users>
>
>     The archives can be found at:
>     https://lists.cs.wisc.edu/archive/htcondor-users/
>     <https://lists.cs.wisc.edu/archive/htcondor-users/>