Re: [HTCondor-users] Best practices for leaving cores/memory available for machine OS
- Date: Tue, 2 Jun 2015 17:26:14 +0000
- From: Jesse Farnham <Jesse.Farnham@xxxxxxx>
- Subject: Re: [HTCondor-users] Best practices for leaving cores/memory available for machine OS
Thanks, Todd and Rich. Sounds like Condor's defaults are reasonable. I think I'll stick with the default behavior for now, but if I encounter problems I'll use the configurations you mentioned to limit the cores and/or memory available to Condor.
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Todd Tannenbaum
Sent: Tuesday, June 02, 2015 1:07 PM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] Best practices for leaving cores/memory available for machine OS
On 6/1/2015 4:06 PM, Jesse Farnham wrote:
> I'm running an HTCondor pool in which all machines are available to
> run tasks at all times, i.e., they do not become unavailable when the
> keyboard or mouse is moved on those machines. I'm interested to know
> what the best practice is for setting up resource slots on these
> machines. By default, HTCondor seems to create one slot per core and
> evenly divide the machine's memory among the slots. For example, one
> of the machines in the pool has 8 GB of RAM and 2 cores, and HTCondor
> created two slots on that machine, each with 1 core and 4 GB of RAM.
> My worry is that in the event that both of these slots are 100%
> utilized by user-submitted jobs, this leaves no cores or memory free
> for the operating system itself, the Condor daemons, etc. What is the
> standard practice here? Do HTCondor pool administrators typically
> customize the slot allocation on worker machines to leave 1 or 2 cores
> and some fraction of RAM free for the OS itself, or is HTCondor's
> default behavior of evenly dividing all the resources of the machine
> among the job slots considered to be a reasonable default?
The CPU / resident memory usage of the OS itself seems pretty minimal in practice, so imho HTCondor's default behavior is reasonable unless you are also explicitly running some other CPU- and/or memory-hungry service(s) on these machines.
If you wanted to make less memory available to HTCondor, perhaps because you knew your nodes were also running a service that used 20% of the memory (for instance, we have many nodes here that also run Squid reverse http proxy services), you could put the following in your condor_config file(s):
# Only allow HTCondor to see/use 80% of the detected physical RAM
MEMORY = $(DETECTED_MEMORY)*0.8
Similar recipe to decrease CPU core usage:
# Keep HTCondor from seeing/using one CPU core
NUM_CPUS = $(DETECTED_CORES)-1
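[Editor's note, not part of Todd's original message: if you would rather hold back a fixed amount of RAM instead of a fraction, HTCondor also provides the RESERVED_MEMORY knob, which subtracts a number of megabytes from the detected physical memory. A sketch combining both recipes follows; the 1024 MB reservation is an illustrative placeholder, not a recommendation from the thread:]

```
# Reserve a fixed 1024 MB of RAM for the OS and Condor daemons
# (RESERVED_MEMORY is subtracted from the detected physical memory, in MB)
RESERVED_MEMORY = 1024

# Keep one CPU core out of the slot pool, as in the recipe above
NUM_CPUS = $(DETECTED_CORES)-1
```

After restarting the condor_startd on the worker node, `condor_config_val MEMORY NUM_CPUS` (or `condor_status -long` for that machine) should show the reduced totals.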
Hope the above helps,