
Re: [Condor-users] Getting Condor to observe slot limits?




Condor does not (yet) support setrlimit. You can still set the limits yourself inside a USER_JOB_WRAPPER script if you wish.
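As a rough illustration of the wrapper approach, here is a minimal sketch in Python. It assumes the usual wrapper convention, where Condor runs the wrapper with the job's executable and arguments as the wrapper's own arguments, and the wrapper finishes by exec'ing the job. The limit values in LIMITS are made up for the example and are not taken from any slot definition. Note also that RLIMIT_FSIZE caps the size of each file the job writes, not its total disk usage, so this is only an approximation of a disk limit.

```python
#!/usr/bin/env python3
"""Sketch of a USER_JOB_WRAPPER that applies rlimits before starting the job."""
import os
import resource
import sys

# Hypothetical per-job limits in bytes; adjust to your own slot policy.
LIMITS = {
    resource.RLIMIT_AS: 2 * 1024**3,      # virtual address space
    resource.RLIMIT_FSIZE: 10 * 1024**3,  # largest single file the job may write
}


def clamp(requested, current_hard):
    """Return the limit to request, never exceeding the existing hard limit."""
    if current_hard == resource.RLIM_INFINITY:
        return requested
    return min(requested, current_hard)


def apply_limits(limits):
    """Set soft and hard limits for every resource in the mapping."""
    for res, value in limits.items():
        _, hard = resource.getrlimit(res)
        new_limit = clamp(value, hard)
        resource.setrlimit(res, (new_limit, new_limit))


# When invoked as "wrapper job arg1 arg2 ...", apply the limits and replace
# this process with the job. With no job arguments the script does nothing.
if __name__ == "__main__" and len(sys.argv) > 1:
    apply_limits(LIMITS)
    os.execvp(sys.argv[1], sys.argv[1:])
```

To try it, point USER_JOB_WRAPPER in the execute node's configuration at the script's path and make the script executable. Because the limits are set before the exec, the job and any children it starts inherit them.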

PREEMPT _can_ discriminate between a heterogeneous set of slot definitions. For example, the machine attributes Memory, VirtualMemory, and Disk in this expression refer to the attributes of the particular slot that is running the job, whereas TotalMemory, TotalVirtualMemory, and TotalDisk refer to properties of the whole machine. Prior to 6.9.5, DiskUsage was not updated in the copy of the job ClassAd that was used to evaluate PREEMPT. Now, for vanilla universe jobs, DiskUsage is updated, so you can PREEMPT based on disk usage as well as virtual memory usage.
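To make the per-slot behaviour concrete, here is a small Python model of how such a check plays out on two differently sized slots. It is not Condor code, and the Slot and Job classes are hypothetical. It assumes the usual units: Memory in MiB, and ImageSize, Disk, and DiskUsage in KiB. The expression in the comment is an untested example of the kind of PREEMPT policy being modelled.

```python
from dataclasses import dataclass

# Models a policy along the lines of (assumed, untested):
#   PREEMPT = (ImageSize > Memory * 1024) || (DiskUsage > Disk)


@dataclass
class Slot:
    name: str
    memory_mb: int  # the slot's own Memory attribute, in MiB
    disk_kb: int    # the slot's own Disk attribute, in KiB


@dataclass
class Job:
    image_size_kb: int  # job ImageSize, in KiB
    disk_usage_kb: int  # job DiskUsage, in KiB


def should_preempt(slot, job):
    """Compare the job's usage against the limits of the slot it runs in."""
    over_memory = job.image_size_kb > slot.memory_mb * 1024
    over_disk = job.disk_usage_kb > slot.disk_kb
    return over_memory or over_disk


small = Slot("slot1", memory_mb=512, disk_kb=1_000_000)
large = Slot("slot2", memory_mb=2048, disk_kb=8_000_000)
job = Job(image_size_kb=1_000_000, disk_usage_kb=500_000)

for slot in (small, large):
    print(slot.name, should_preempt(slot, job))
```

The same job exceeds the memory of the small slot but fits in the large one, so a single expression gives different answers depending on which slot evaluates it.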

--Dan

Mark Calleja wrote:

Hi,

I recently came across a very interesting condor-admin thread from a few years ago that raised the issue of having Condor enforce slot (then VM) limits:

http://www.cs.wisc.edu/condor/ligo-tickets/12616.html

Towards the end of that exchange, one of the Condor developers, in response to a user's comment that it would be useful for Condor to police these limits using setrlimit, writes:

"Agreed--I'm just trying hard to find a short-term solution that can hold you over while we improve things."

With no intention of putting anyone on the spot, can I ask whether this ever led to anything concrete? Speaking personally, I'd be very keen to see such a solution implemented, since in our flocked environment the execute nodes cannot depend on cooperation from the submit hosts (e.g. using periodic removes) to perform the policing. I realise that one can go some way towards achieving this using PREEMPT expressions, e.g. Todd's comments in https://lists.cs.wisc.edu/archive/condor-users/2007-November/msg00156.shtml, but that doesn't seem to be able to discriminate between a heterogeneous set of slot definitions on a multi-processor machine, e.g. we allow some slots more resources than others. Also, it would be nice to be able to enforce all of the resources in a slot definition, e.g. disk usage.

Cheers,
Mark