
Re: [Condor-users] controlling memory intensive jobs


I've got similar hardware to Ian (multiple boxes, 4 cores x 2 CPUs, with 1
slot per processor) and I'd like to reconfigure the memory in a
similar way, so...

Noob question: Can someone point me towards the part of the manual that
explains how to go about this? This'll be my first memory
reconfiguration, so advice and details would also be appreciated.

[Condor 7.0.5 running on Rocks 5.1]


Health Protection Agency
-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ian Chesal
Sent: 09 November 2009 15:52
To: Condor-Users Mail List
Subject: Re: [Condor-users] controlling memory intensive jobs

> I am curious about your dynamic policies now. At our lab these servers
> keep having memory problems.

Mag, Matt Hope answered with much of what I would have said, so I won't
repeat it.

I don't actually use dynamic partitioning. It's new, and most of my farm
isn't running 7.2.x yet.

I too have a pretty good idea of how my jobs behave. They're all sliced
up into nice memory buckets by our submission front end. And for the
most part, because it's part of the engineer's job here, everyone knows
just how much memory and CPU their jobs will need. The submission front
end we use ensures no one gets into the system without a memory spec
and a disk spec on their jobs. If they don't specify something, the
system slaps a default spec on their jobs that assumes they use the
most of everything, so they quickly learn not to be lazy. Works wonders.
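
The front end itself isn't shown here, but the defaulting idea can be
sketched in a few lines of Python. Everything in this sketch is
hypothetical: the bucket sizes, the default disk figure, the JobSpec type
and the apply_defaults/bucket_for helpers are illustrative names, not part
of Condor or of the actual submission front end.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical memory buckets (MB); the real front end's values aren't given.
BUCKETS_MB = [2048, 4096, 8192]

# Hypothetical pessimistic defaults: assume the job uses the most of everything.
DEFAULT_MEMORY_MB = max(BUCKETS_MB)
DEFAULT_DISK_MB = 50_000


@dataclass
class JobSpec:
    memory_mb: Optional[int] = None
    disk_mb: Optional[int] = None


def apply_defaults(spec: JobSpec) -> JobSpec:
    """Fill any missing request with the largest (most pessimistic) value."""
    memory = spec.memory_mb if spec.memory_mb is not None else DEFAULT_MEMORY_MB
    disk = spec.disk_mb if spec.disk_mb is not None else DEFAULT_DISK_MB
    return JobSpec(memory_mb=memory, disk_mb=disk)


def bucket_for(memory_mb: int) -> int:
    """Round a memory request up to the smallest bucket that can hold it."""
    for size in BUCKETS_MB:
        if memory_mb <= size:
            return size
    raise ValueError(f"request of {memory_mb} MB exceeds the largest bucket")
```

A job submitted with no spec at all lands in the biggest bucket, which is
exactly the "don't be lazy" pressure described above.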

On my nodes I unbalance the static partitions to create slots that deal
with big-memory jobs and slots that deal with small-memory jobs. On my
4-core x 2-CPU machines I'll typically assign one core to each slot, but
4 of the 8 slots each get 15.5% of the RAM and disk and the other 4 each
get 9.5% of the RAM and disk. These numbers were arrived at after some
careful study of the jobs people run, how often they're wrong with their
memory guesses, and how we can best avoid out-of-memory problems when
pairing jobs on machines.
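
As a rough check of the arithmetic (4 x 15.5% + 4 x 9.5% = 100%), here is
a small Python sketch that emits startd config lines for that split. It
assumes the SLOT_TYPE_<N> / NUM_SLOTS_TYPE_<N> syntax described in the
Condor manual's section on configuring the startd for SMP machines, and it
assumes your version accepts fractional percentages such as 15.5%; check
the manual for 7.0.x before using the output. The slot_type_config helper
is hypothetical.

```python
import math


def slot_type_config(slot_types, cores):
    """Build startd config lines for statically partitioned slots.

    slot_types: list of (slot_count, percent_of_ram_and_disk) pairs.
    Every slot gets one CPU, so the slot counts must add up to `cores`.
    """
    if sum(count for count, _ in slot_types) != cores:
        raise ValueError("expected exactly one slot per core")
    total = sum(count * pct for count, pct in slot_types)
    if not math.isclose(total, 100.0):
        raise ValueError(f"RAM/disk shares add up to {total}%, not 100%")
    lines = []
    for n, (count, pct) in enumerate(slot_types, start=1):
        share = f"{pct:g}%"  # e.g. 15.5 -> "15.5%"
        # Assumed syntax: resource list per slot type, then a slot count.
        lines.append(f"SLOT_TYPE_{n} = cpus=1, ram={share}, disk={share}")
        lines.append(f"NUM_SLOTS_TYPE_{n} = {count}")
    return lines


# 4-core x 2-CPU box: four "big memory" slots and four "small memory" slots.
config = slot_type_config([(4, 15.5), (4, 9.5)], cores=8)
print("\n".join(config))
```

If someone later tweaks one bucket so the shares no longer total 100%, or
the slot count stops matching the core count, the helper raises instead of
quietly over- or under-committing the box.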

For the most part, all of my jobs require one core and only one core,
so the 1:1 slot:core ratio works.

I have one more trick up my sleeve that we use for multi-threaded or
multi-process jobs that want more than one core. Slot 1 on my machines
will vacate and set all the other slots on the box to Owner if a job
with the IsMultiThreaded=1 attribute lands in that slot.
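
The real policy lives in the startd configuration, which isn't reproduced
here. The Python below is only a toy model of the behaviour just
described, handy for reasoning about which slots get bumped; the Slot
class, the simplified state names and start_job are illustrative, and real
startd state handling is considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    slot_id: int
    state: str = "Unclaimed"  # simplified subset of startd states
    job_attrs: dict = field(default_factory=dict)


def start_job(slots, slot_id, job_attrs):
    """Toy model: a job with IsMultiThreaded=1 landing on slot 1 pushes
    every other slot on the same box into the Owner state."""
    by_id = {s.slot_id: s for s in slots}
    target = by_id[slot_id]
    target.state = "Claimed"
    target.job_attrs = dict(job_attrs)
    if slot_id == 1 and job_attrs.get("IsMultiThreaded") == 1:
        for s in slots:
            if s.slot_id != 1:
                # In the real setup any job running here would be vacated.
                s.state = "Owner"
                s.job_attrs = {}
    return slots
```

An ordinary job on slot 1, or a multi-threaded job on any other slot,
leaves the rest of the box alone in this model.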

So if a user needs a whole machine, they can submit targeting only slot
1 on machines, with the attribute IsMultiThreaded=1 set on the job, and
they're assured of obtaining the entire box when their job starts.
It's obviously a very destructive option and needs to be used with care,
or you can end up killing forward progress on your non-MT jobs.
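
For reference, a whole-machine submission could look something like the
submit description produced by this Python sketch. It assumes the vanilla
universe, that the machines publish the SlotID attribute, and that the
startd policy keys off a job attribute named IsMultiThreaded; the
executable path and the whole_machine_submit helper are hypothetical. The
leading + is how a submit file adds a custom attribute to the job ClassAd.

```python
def whole_machine_submit(executable: str, arguments: str = "") -> str:
    """Return a submit description that asks for slot 1 only and flags
    the job as multi-threaded (hypothetical helper)."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
    ]
    if arguments:
        lines.append(f"arguments = {arguments}")
    lines += [
        # Only match slot 1 on each machine (assumes SlotID is published).
        "requirements = (SlotID == 1)",
        # '+' adds a custom attribute to the job ClassAd.
        "+IsMultiThreaded = 1",
        "queue",
    ]
    return "\n".join(lines) + "\n"


print(whole_machine_submit("./my_mt_job"))
```

Writing the result to a file and handing it to condor_submit is the usual
next step; the destructive-use caveat above applies just the same.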

If you want the config snippets for the above setup, let me know and I'll
try to get them into postable shape. Actually, if you search the archives
you may find that I posted them at one point in a non-Altera farm format. :)

Hope that helps!

- Ian
