
Re: [Condor-users] various questions about dynamic provisioning

On 7/12/2012 5:50 AM, Carsten Aulbert wrote:
Hi all

We are running dynamic slots on many machines (mostly 7.6.x, but we should
migrate to 7.8 soon to get the condor_defrag daemon).

But I have a couple of questions regarding dynamic slots which so far we
have not been able to solve:

I will take a pass at some answers, especially pointing out some new functionality in v7.8+ that should make this easier than in v7.6...

*** Memory limits/RequestMemory/ImageSize
We run a user wrapper script which sets ulimits based on RequestMemory
or, for the few machines still running static slots, based on
the slot settings.

FYI, as of Condor v7.9.0 (currently in beta testing, public release scheduled for the first week of August), setting a ulimit like this is supported directly inside Condor, so there is no need for a wrapper script. The config knob is STARTER_RLIMIT_AS, which lets you specify the RLIMIT_AS value passed to setrlimit() as a ClassAd expression w/ access to both the machine and job ad attributes.

Of course, as you know, a limitation of ulimits in general is that they limit the size of each process in a user's job, NOT the collective size of all processes in the job....

Actually, we use 110% of this limit. This should
prevent users from exceeding their allocated share.
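For illustration only, here is a minimal Python sketch of such a wrapper. It assumes Condor's USER_JOB_WRAPPER mechanism, where the wrapper receives the job's command line as its arguments and must exec it. It also assumes the starter exposes the job ad through a file named in the _CONDOR_JOB_AD environment variable. The helper names and the 110% constant are hypothetical and are not Condor's own code.

```python
#!/usr/bin/env python3
"""Hypothetical job wrapper: cap each process's address space at 110% of RequestMemory."""
import os
import resource
import sys

HEADROOM_PERCENT = 110  # site policy described above; adjust as needed


def read_request_memory_mb(ad_lines):
    """Return RequestMemory (Mbytes) from job-ad lines, or None if missing or not a literal int."""
    for line in ad_lines:
        name, sep, value = line.partition("=")
        if sep and name.strip().lower() == "requestmemory":
            try:
                return int(value.strip())
            except ValueError:
                return None  # e.g. RequestMemory is still an unevaluated expression
    return None


def address_space_limit_bytes(request_memory_mb, headroom_percent=HEADROOM_PERCENT):
    """Convert RequestMemory (Mbytes) plus headroom into an RLIMIT_AS value in bytes."""
    return request_memory_mb * 1024 * 1024 * headroom_percent // 100


def apply_limit(limit_bytes):
    """Set soft and hard RLIMIT_AS, never trying to raise an existing finite hard limit."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        limit_bytes = min(limit_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))


def main(argv):
    ad_path = os.environ.get("_CONDOR_JOB_AD")  # assumption: set by condor_starter
    request_mb = None
    if ad_path:
        with open(ad_path) as ad:
            request_mb = read_request_memory_mb(ad)
    if request_mb:
        # Per-process limit: N processes may still use N times this in total.
        apply_limit(address_space_limit_bytes(request_mb))
    os.execvp(argv[1], argv[1:])  # replace the wrapper with the real job


if __name__ == "__main__" and len(sys.argv) > 1:  # act only when given a job command line
    main(sys.argv)
```

The limit is set with setrlimit() on the wrapper's own process just before exec, so the job and every child it forks inherit it. That is why the limit is per process and not per job.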

However, we face the problem that if a job fails and goes back to the
queue, its ImageSize can be larger than RequestMemory. The negotiator will
then match the job to a host, and the target machine will partition off the
requested slot, but it will fail to start the job because the ImageSize is
too large. The only hint we see here is something along the lines of
"Job requirements not met" in the StarterLog.slot1_x.

The user only sees the job being idle and condor_q -b telling her that
many machines potentially match the job.

Is there a way to either tell the user what the problem is or change our
way of requesting memory? E.g., should RequestMemory be the maximum of
0.0011 * ImageSize and a statically given number?

I know that in v7.8.0+ there should be no problem with RequestMemory being a ClassAd expression; just make it an ifthenelse(). Also in v7.8.0+, you can set up a default request_memory value for use by condor_submit in the condor_config file. Perhaps even better for what you want, you can tell the startd to modify the incoming request_memory request via the following config knob:

MODIFY_REQUEST_EXPR_REQUESTMEMORY
An integer expression used by the condor_startd daemon to modify the evaluated value of the RequestMemory job ClassAd attribute before it is used to provision a dynamic slot. The default value is given by
      quantize(RequestMemory,{TotalSlotMemory / TotalSlotCpus / 4})
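To show what the default does, here is a small Python sketch of quantize() for the integer, single-element-list case above. In that case quantize() rounds its first argument up to the next multiple of the second. The sketch assumes integer division, as with ClassAd integers. The machine size is a made-up example, and the code illustrates the arithmetic only; it is not Condor's implementation.

```python
def quantize(value, step):
    """Round value up to the next multiple of step, i.e. ceiling(value / step) * step."""
    return -(-value // step) * step


def default_modified_request_memory(request_memory, total_slot_memory, total_slot_cpus):
    """Mimic quantize(RequestMemory, {TotalSlotMemory / TotalSlotCpus / 4}) with integer math."""
    step = total_slot_memory // total_slot_cpus // 4
    return quantize(request_memory, step)


# Hypothetical 8-core node with 16384 Mbytes: step = 16384 // 8 // 4 = 512
for request in (100, 1000, 2048):
    print(request, "->", default_modified_request_memory(request, 16384, 8))
# prints 100 -> 512, then 1000 -> 1024, then 2048 -> 2048
```

On such a node, a job asking for 1000 Mbytes would therefore get a 1024 Mbyte dynamic slot.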

For example, you could put the following in the condor_config on your execute nodes:

MODIFY_REQUEST_EXPR_REQUESTMEMORY = ifthenelse(RequestMemory > 0.0011 * ImageSize,RequestMemory,0.0011 * ImageSize)
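In plain arithmetic, an ifthenelse() of this shape takes the larger of the two values. Here is a Python sketch using the 0.0011 factor from the original question. ImageSize is in Kbytes, so the factor works out to roughly ImageSize in Mbytes plus 10%. The function name and sample numbers are hypothetical.

```python
IMAGE_SIZE_FACTOR = 0.0011  # Kbytes -> Mbytes (about 1/1000) with about 10% headroom


def modified_request_memory(request_memory_mb, image_size_kb):
    """Same logic as ifthenelse(RequestMemory > f * ImageSize, RequestMemory, f * ImageSize)."""
    from_image = IMAGE_SIZE_FACTOR * image_size_kb
    return request_memory_mb if request_memory_mb > from_image else from_image


# A job that asked for 2000 Mbytes but grew to ImageSize = 2500000 Kbytes before being
# requeued would be provisioned about 2750 Mbytes, so its ImageSize should then fit.
print(modified_request_memory(2000, 2_500_000))
```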

Note that in v7.8.0+, instead of using ImageSize, I suggest you use the attribute MemoryUsage. MemoryUsage is an integer expression in units of Mbytes (warning: recall that ImageSize is Kbytes) that represents the peak memory usage for the job. MemoryUsage will always be set by default to what we feel is the most "accurate" metric for stating how much physical RAM a job is using (all processes in a job). The default for MemoryUsage is to total up the ResidentSetSize for each process in the job (but of course there is a knob to change it).
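To make the units concrete, here is a Python sketch of that default. It assumes the default adds up ResidentSetSize (in Kbytes) over all of the job's processes and rounds up to whole Mbytes, and that the peak is the largest such value over time. The sample data and function names are hypothetical, not Condor internals.

```python
def memory_usage_mb(rss_kb_per_process):
    """Total resident set size of all processes (Kbytes), rounded up to whole Mbytes."""
    return (sum(rss_kb_per_process) + 1023) // 1024


def peak_memory_usage_mb(samples):
    """Largest total over a series of per-process RSS samples (0 if there are none)."""
    return max((memory_usage_mb(sample) for sample in samples), default=0)


# Two sampling points for a job with a parent and one or two children (Kbytes per process).
samples = [
    [150_000, 400_000, 400_000],
    [150_000, 900_000],
]
print(peak_memory_usage_mb(samples))  # prints 1026
```

MemoryUsage is already in Mbytes, so headroom on top of it is a plain factor such as 1.1. ImageSize-based expressions also need a Kbyte conversion.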

*** How to schedule a limited number of jobs per execute node

In the good ol' days with static slots, you could add a requirement on
the slot number if you had jobs which were really heavy on the local
scratch disk. However, if you added something like

Requirements = strcmp("slot1_1@.*",RemoteHost)

the job would never match, for obvious reasons (the slot is not split off until after negotiation).

Is there a way to achieve this? I've looked at concurrency limits but so
far have not found a good way to use them for this scenario.

The Good news: in v7.6/v7.8, the partitionable resources are hard-coded to be CPU, Disk, and Memory. In v7.9.0+ these resources are no longer hard-coded, so dealing w/ your scenario above is trivial, since we added a first pass at generic partitionable resources.

So in v7.9.0, if you wanted to limit the number of heavy local io jobs to just two per machine, you would enter the following into your condor_config:

# include a custom "localio" limit that can also be partitioned:
SLOT_TYPE_1 = cpus=100%,disk=100%,swap=100%,localio=2

and inside your job submit file you can not only include the usual

request_memory = <int>
request_cpus = <int>
request_disk = <int>

but you could also include

request_localio = <int>

Cool, eh? Perhaps this will convince you to run a v7.9.x Condor release? :)
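Here is a toy Python model (not Condor code) of a partitionable slot carving dynamic slots out of its resources. The custom "localio" resource is treated just like cpus or memory. The class and the resource amounts are hypothetical; in Condor the real accounting happens in the startd.

```python
from dataclasses import dataclass


@dataclass
class PartitionableSlot:
    """Toy model: free amount of each resource left on one execute node."""
    free: dict

    def try_claim(self, request):
        """Carve off a dynamic slot if every requested resource still fits."""
        if any(self.free.get(name, 0) < amount for name, amount in request.items()):
            return False
        for name, amount in request.items():
            self.free[name] -= amount
        return True


node = PartitionableSlot({"cpus": 8, "memory": 16384, "localio": 2})
heavy = {"cpus": 1, "memory": 1024, "localio": 1}  # like request_localio = 1
light = {"cpus": 1, "memory": 1024}                # no request_localio at all

print([node.try_claim(heavy) for _ in range(3)])  # [True, True, False]: third heavy job refused
print(node.try_claim(light))                       # True: ordinary jobs still fit
```

Light jobs never ask for localio, so only cpus and memory limit them. The localio count then acts purely as a per-node cap on heavy jobs.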

The Bad news: I cannot think of an easy way to do the above in v7.8.0. I am guessing you are mainly using partitionable slots for CPU and Memory? If so, perhaps as a hack you could achieve the above by (ab)using request_disk. For instance, to limit just two heavy local io jobs per machine, if your machine has 10GB of disk space, you could claim your heavy local io jobs need 4 GB of disk and other jobs just need 50 MB or whatever. Another idea is concurrency limits, but you'd have to make one limit per machine in your pool. Yuck.
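The arithmetic behind the request_disk hack, as a Python sketch using the example numbers above (in Mbytes here for readability). Pick a heavy-job disk request large enough that a third heavy job cannot fit, but small enough that two still do. The function names are hypothetical.

```python
def heavy_request_range(total_disk, max_heavy_jobs):
    """Smallest and largest per-job request that lets at most max_heavy_jobs fit."""
    smallest = total_disk // (max_heavy_jobs + 1) + 1  # one more job must overflow
    largest = total_disk // max_heavy_jobs             # max_heavy_jobs must still fit
    return smallest, largest


def light_jobs_left(total_disk, heavy_request, heavy_running, light_request):
    """How many light jobs still fit once the heavy jobs have carved off their disk."""
    return (total_disk - heavy_request * heavy_running) // light_request


total_mb = 10 * 1024                           # the 10GB disk from the example
print(heavy_request_range(total_mb, 2))        # (3414, 5120): 4 GB = 4096 lies inside
print(light_jobs_left(total_mb, 4096, 2, 50))  # 40 light jobs at 50 MB each
```

Disk is a single shared pool, so light jobs that start first can also crowd out heavy ones. That is part of what makes this a hack.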

PS: Inconsistency in the condor manual: one page talks about request_{cpus,memory,disk}, while on another page the underscore is missing.

Actually, the first instance is talking about what goes into your condor_submit job submit file, and the second is talking about the job classad attribute. The job classad attribute is "RequestMemory". In condor_submit, you can say either "request_memory=X" or "RequestMemory=X".

Hope the above helps,
best regards from Madison,