
Re: [HTCondor-users] dynamic allocation of RAM



On 03/15/2016 08:24 AM, Thomas Hartmann wrote:

> 2. handle RAM allocations more dynamically. for instance:
> 2.1. if a job wants to use more RAM than previously requested, see
> whether the machine on which it runs still has this amount of RAM
> available.
> 2.2. if it does, update the request_memory to a safe value and continue
> running the job.
> 2.3. if the extra RAM is not available, stop the job, update the
> request_memory to a safe value and put it back into the queue.

Courtesy of Lauren Michael:

> 2) The below lines added to the submit file will allow the jobs to
> self-police MemoryUsage, and will adjust the memory request in response
> (though "request_memory" would need to be replaced in the submit file, not
> added).
> +MemoryUsage = ( 800 ) * 2 / 3
> request_memory = ( MemoryUsage ) * 3 / 2
> periodic_hold = ( MemoryUsage >= ( ( RequestMemory ) * 3 / 2 ) )
> periodic_release = (JobStatus == 5) && ((CurrentTime - EnteredCurrentStatus) > 180) && (HoldReasonCode != 34)
> 
> These lines essentially say:
> Set the "request_memory" ("RequestMemory" in the job classad) to be a
> function of MemoryUsage, and artificially set the MemoryUsage to an initial
> value (800 MB * 2/3).
> Put the job on hold if the (real) MemoryUsage goes 50% above the current
> RequestMemory value.
> Release the held job (if held for the memory reason, and held for at least
> 3 minutes), so that it will be matched to run again on a compute "slot"
> with more memory (according to the new RequestMemory value).
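
To make the arithmetic in the quoted lines concrete, here is a minimal
Python sketch of the request/hold cycle. It is not HTCondor code: the
function names are hypothetical, memory is taken to be in MB, and the
integer division is an assumption about how the expressions evaluate.

```python
# Sketch of the quoted submit-file policy. All names are hypothetical.
# Assumptions: memory values are in MB and the arithmetic is integer.

def request_memory(memory_usage_mb):
    """request_memory = ( MemoryUsage ) * 3 / 2"""
    return memory_usage_mb * 3 // 2

def hold_threshold(request_memory_mb):
    """periodic_hold fires once MemoryUsage >= RequestMemory * 3 / 2."""
    return request_memory_mb * 3 // 2

def next_request(memory_usage_mb, request_memory_mb):
    """Request after one check: grown if the job was held, else unchanged."""
    if memory_usage_mb >= hold_threshold(request_memory_mb):
        # Held, then released: the request is recomputed from real usage.
        return request_memory(memory_usage_mb)
    return request_memory_mb

seeded_usage = 800 * 2 // 3                   # the artificial +MemoryUsage
first_request = request_memory(seeded_usage)  # what the job asks for at first

print(seeded_usage, first_request, hold_threshold(first_request))
print(next_request(1000, first_request))  # below the threshold: unchanged
print(next_request(1200, first_request))  # above the threshold: grows
```

Under these assumptions the first request comes out just under 800 MB,
the job is held at roughly 1.5 times that, and a job held at 1200 MB
comes back asking for 1800 MB.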

We removed the "HoldReasonCode != 34" clause from periodic_release, added
"periodic_remove = (time() - QDate) > 500000" (remove the job once more
than 500000 seconds, about 5.8 days, have passed since submission), and
have been running those jobs that way for quite some time. What I can't
tell you is how many of them actually use that magic: I won't dig into
that until things break, and so far they haven't. (Most of those jobs run
in under 800MB.)
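
For reference, a rough Python sketch of the release and remove conditions
as modified here. The function names are hypothetical and the timestamps
are plain Unix seconds; it only mirrors the two expressions and is not
how the schedd evaluates them.

```python
# Sketch of the modified periodic_release / periodic_remove conditions.
# All names are hypothetical; times are Unix timestamps in seconds.

HELD = 5  # JobStatus value for a held job

def should_release(job_status, now, entered_current_status):
    """Held for more than 3 minutes, with no check on HoldReasonCode."""
    return job_status == HELD and (now - entered_current_status) > 180

def should_remove(now, qdate):
    """periodic_remove = (time() - QDate) > 500000"""
    return (now - qdate) > 500000

print(should_release(HELD, now=1000, entered_current_status=800))  # held 200 s
print(should_release(HELD, now=1000, entered_current_status=900))  # held 100 s
print(should_remove(now=500001, qdate=0))
print(round(500000 / 86400, 2), "days")  # the remove cutoff in days
```

With the HoldReasonCode test gone, any held job is released after three
minutes regardless of why it was held, so the periodic_remove cutoff is
what keeps a job from cycling between hold and release indefinitely.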

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
