
Re: [HTCondor-users] Best practices for leaving cores/memory available for machine OS



On 6/1/2015 4:06 PM, Jesse Farnham wrote:
> Hello,
>
> I’m running an HTCondor pool in which all machines are available to run
> tasks at all times, i.e., they do not become unavailable when the
> keyboard or mouse is moved on those machines. I’m interested to know
> what the best practice is for setting up resource slots on these
> machines. By default, HTCondor seems to create one slot per core and
> evenly divide the machine’s memory among the slots. For example, one of
> the machines in the pool has 8 GB of RAM and 2 cores, and HTCondor
> created two slots on that machine, each with 1 core and 4 GB of RAM.
>
> My worry is that in the event that both of these slots are 100% utilized
> by user-submitted jobs, this leaves no cores or memory free for the
> operating system itself, the Condor daemons, etc. What is the standard
> practice here? Do HTCondor pool administrators typically customize the
> slot allocation on worker machines to leave 1 or 2 cores and some
> fraction of RAM free for the OS itself, or is HTCondor’s default
> behavior of evenly dividing all the resources of the machine among the
> job slots considered to be a reasonable default?


The CPU and resident memory usage of the OS itself seems pretty minimal in practice, so imho HTCondor's default behavior is reasonable unless you are also running some other service(s) on these machines that are CPU and/or memory hungry.

If you wanted to make less memory available to HTCondor, perhaps because you know your nodes are also running a service that uses 20% of the memory (for instance, we have many nodes here that also run Squid reverse HTTP proxy services), you could put the following in your condor_config file(s):

  # Only allow HTCondor to see/use 80% of the detected physical RAM
  MEMORY = $(DETECTED_MEMORY)*0.8

A similar recipe decreases the number of CPU cores HTCondor uses:

  # Keep HTCondor from seeing/using one CPU core
  NUM_CPUS = $(DETECTED_CORES)-1
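
To get a feel for what these two settings would do on the 8 GB / 2 core machine from the question, here is a rough arithmetic sketch in Python. It is not HTCondor code: the function name slot_layout and its parameters are made up for illustration, it assumes the detected memory is exactly 8192 MB (a real machine will usually report somewhat less), and it assumes the default behavior described above of one slot per core with memory divided evenly. HTCondor's own rounding may differ slightly.

```python
def slot_layout(detected_memory_mb, detected_cores,
                memory_fraction=0.8, reserved_cores=1):
    """Estimate the static slots left after applying both overrides.

    Hypothetical helper: assumes one slot per visible core and an even
    split of the visible memory among those slots.
    """
    # Mirrors MEMORY = $(DETECTED_MEMORY)*0.8, truncated to whole MB
    usable_memory_mb = int(detected_memory_mb * memory_fraction)
    # Mirrors NUM_CPUS = $(DETECTED_CORES)-1, but never drops below one core
    usable_cores = max(detected_cores - reserved_cores, 1)
    per_slot_mb = usable_memory_mb // usable_cores
    return [{"cpus": 1, "memory_mb": per_slot_mb} for _ in range(usable_cores)]


# The example machine from the question: 8 GB of RAM (assumed 8192 MB), 2 cores
slots = slot_layout(8192, 2)
for slot in slots:
    print(slot)
```

Under those assumptions, applying both recipes together leaves a single slot with one core and roughly 6.4 GB of memory, so on a machine this small you would likely pick only one of the two.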

Hope the above helps,
Todd