Re: [Condor-users] controlling memory intensive jobs
- Date: Thu, 24 Sep 2009 10:37:05 -0400
- From: Ian Chesal <ICHESAL@xxxxxxxxxx>
- Subject: Re: [Condor-users] controlling memory intensive jobs
> We have 10 servers, each with 64GB of memory and 16 cores. We don't
> want people to run all of their memory-intensive jobs at once,
> since that would crash the box. What do Condor admins typically do to
> control this, so that only 10 such jobs run, one per server?
I make all my users tell me up front how much memory their job needs to
run. It's a rough guess, but it's enough to keep Condor from scheduling
too many memory-intensive jobs on one machine. On the back end I round
each memory request up into one of 5 memory-size buckets.
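A minimal sketch of what the user-facing side of this could look like in a submit file. The attribute name +EstimatedMemoryMB and the numbers are illustrative, not anything the poster specified; the idea is just that the job declares an estimate and only matches slots provisioned with at least that much memory:

```
# Hypothetical submit-file sketch. The user supplies a rough
# per-job memory estimate (MB); an admin-side policy (not shown)
# rounds it up into one of the buckets.
universe           = vanilla
executable         = my_job
+EstimatedMemoryMB = 6000
# match only slots provisioned with at least the estimated memory
requirements       = (Memory >= 6000)
queue
```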
This makes the jobs easier to deal with when planning machine setups. I
don't allocate my machine resources evenly across slots. I unbalance
them on purpose so the slots service the 5 memory bins accordingly.
It can be less efficient if all the jobs in your queue fall into the
largest memory bin -- the slots provisioned with too little memory to
run them go unused. But that's better than having jobs fail. And it'll
hold until dynamic machine partitioning is mainstream in Condor.
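For a 16-core, 64 GB box, an unbalanced static-slot layout along these lines could be expressed in condor_config roughly as follows. This is a sketch with made-up numbers (and fewer bins than the five described), not the poster's actual configuration:

```
# Sketch: uneven static slots on a 16-core / 64 GB machine.
# 10 small + 4 medium + 2 large = 16 slots, 60 GB committed,
# leaving some memory as headroom for the OS.
NUM_SLOTS_TYPE_1 = 10
SLOT_TYPE_1 = cpus=1, memory=2048    # ten small 2 GB slots

NUM_SLOTS_TYPE_2 = 4
SLOT_TYPE_2 = cpus=1, memory=4096    # four medium 4 GB slots

NUM_SLOTS_TYPE_3 = 2
SLOT_TYPE_3 = cpus=1, memory=12288   # two large 12 GB slots
```

Jobs whose memory estimate exceeds a slot's provisioned memory simply won't match it, so at most two of the biggest jobs can land on this machine at once.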