
Re: [Condor-users] controlling memory intensive jobs



I have no personal experience of subdividing machines in this manner (I have explored the idea, hence I know how to configure it).

I would make some general notes:

If you can achieve this without dynamic slots, don't use them. They are new, less tested, and have inherent flaws with respect to optimal packing of jobs. Going for a less general solution may often result in better performance and utilization in your own pool for the time being.

At my place of work the stream of jobs neatly falls into memory hungry, long running jobs (days) and memory light (well < 3GB) short running jobs (hours).

Since these different jobs come from two different teams, we simply partition our SMP-based farm so that there are a small number of 'fat' slots advertising quite a lot of memory and a larger number of 'slim' slots advertising less memory each. Every slot gets just the one core (for now; we have some spare capacity in that regard) and people target only those slots they can run on. The fat slots will kick off any slim jobs running on them if a fat job wants to run there. In this way we achieve maximal throughput for each team when they are the only users, and maximal shared throughput (less pre-emption overhead) when they co-exist.
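As a rough sketch (the slot counts and memory sizes below are illustrative assumptions, not our actual configuration), such a fat/slim split on a 16-core, 64 GB machine might look like this in the execute node's condor_config:

```
# Hypothetical 16-core, 64 GB machine: 4 'fat' slots and 12 'slim' slots.
# The sizes here are made-up examples, not production values.
SLOT_TYPE_1 = cpus=1, memory=10240   # fat: 10 GB each
NUM_SLOTS_TYPE_1 = 4
SLOT_TYPE_2 = cpus=1, memory=2048    # slim: 2 GB each
NUM_SLOTS_TYPE_2 = 12
```

Users can then steer jobs with an expression such as requirements = (Memory >= 8192) in their submit files, and fat-over-slim pre-emption can be arranged with a startd RANK expression that prefers high-memory jobs on the fat slots.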

We use the Windows Job Objects API to prevent a job ever breaching its slot's advertised limits and disrupting any other jobs.

Such a setup is easy to manage and control, especially for reporting purposes; it is just not as flexible.

> Let's say I have a very CPU-intensive program which takes up 8 cores on
> a 16-core server. What would you recommend as a good setting
> with dynamic slots?

The choice of CPU cores versus memory should be orthogonal unless you have NUMA machines with a NUMA-aware operating system. Sadly Condor will not help you there; you must understand your architecture to achieve the best partitioning.

Simply partitioning into either two slots at 50/50 each, or one big slot with 50% and the remainder split between 8/4/2 etc. slots, sounds sensible, but it is dependent on the jobs you tend to run and the latency/throughput trade-off you find acceptable.
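To make that concrete, here is a minimal sketch for the 8-cores-on-a-16-core-box case (the values are assumptions for illustration, not a recommendation):

```
# condor_config on the execute machine: one partitionable slot
# that owns all of the machine's resources
SLOT_TYPE_1 = cpus=100%, memory=100%, disk=100%
SLOT_TYPE_1_PARTITIONABLE = True
NUM_SLOTS_TYPE_1 = 1

# submit file for the CPU-hungry job: a dynamic slot is carved
# out to match what the job requests
request_cpus = 8
request_memory = 32768
```

The remaining 8 cores stay in the partitionable slot and can be carved into further dynamic slots for other jobs.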

Tuning for optimality requires considerable knowledge of the specific problem domain, I'm afraid. I hope the general notes above give you some ideas.

Matt

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Mag Gam
Sent: 09 November 2009 12:01
To: Condor-Users Mail List
Subject: Re: [Condor-users] controlling memory intensive jobs

Thank you, Matt.

Let's say I have a very CPU-intensive program which takes up 8 cores on
a 16-core server. What would you recommend as a good setting
with dynamic slots?



On Mon, Nov 9, 2009 at 4:56 AM, Matt Hope <Matt.Hope@xxxxxxxxxxxxxxx> wrote:
> The idea is that you have one 'true' slot, defined in the config and marked partitionable, which is then capable of forming multiple dynamic slots.
>
> The SLOT_TYPE_X_PARTITIONABLE setting would allow the more complex case where you might want, say, 3 slots, with two of them static.
>
> thus, on an 8 core box,
>
> # only need a small amount of memory and one core but should be always available
> SLOT_TYPE_1 = cpus=1, ram=256
> NUM_SLOTS_TYPE_1 = 2
> # the partitionable slot; I could leave all these at auto if need be
> SLOT_TYPE_2 = cpus=6, memory=auto
> SLOT_TYPE_2_PARTITIONABLE = True
> NUM_SLOTS_TYPE_2 = 1
>
> In the case you describe you don't need to bother with any of this; you would appear to need only one partitionable slot, and that's it.
>
>
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Mag Gam
> Sent: 07 November 2009 03:41
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] controlling memory intensive jobs
>
> Ian:
>
> I am curious about your dynamic policies now. At our lab these servers
> keep having memory problems.
>
> I looked at http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_MRG/1.1/html/Grid_User_Guide/chap-Grid_User_Guide-Dynamic_provisioning.html
> and tried to setup dynamic provision.
>
> I have a question about, SLOT_TYPE_X_PARTITIONABLE
>
> What is "X"? Do I need to do this in my configuration?
>
> SLOT_TYPE_0_PARTITIONABLE
> SLOT_TYPE_1_PARTITIONABLE
> SLOT_TYPE_3_PARTITIONABLE
> SLOT_TYPE_4_PARTITIONABLE
> SLOT_TYPE_5_PARTITIONABLE
> ...
> SLOT_TYPE_15_PARTITIONABLE
>
> for a 15 core box?
>
> Also, do I also need to do this:
> PartitionableSlot=TRUE
>
> I have done condor_status -l  but I don't see DynamicSlot=TRUE
>
>
> Any thoughts?
>
> TIA
>
> On Thu, Sep 24, 2009 at 9:37 AM, Ian Chesal <ICHESAL@xxxxxxxxxx> wrote:
>>> We have 10 servers which have 64GB of memory and 16 cores each. We don't
>>> want people to run all of their memory-intensive jobs at once,
>>> since it would crash the box. What do Condor admins typically do to
>>> control this, so that only 10 jobs run on 10 different servers?
>>
>> I make all my users tell me up front how much memory their job needs to
>> run. It's a rough guess, but enough to make sure Condor doesn't schedule
>> too many memory intensive jobs on my machines. In the back end I bin the
>> memory request so jobs are in one of 5 memory size estimate buckets.
>> This makes them easier to deal with when planning machine setups. I
>> don't allocate my machine resources evenly across slots. I unbalance
>> them on purpose to service the 5 bins of memory requirements accordingly.
>>
>> It can be less efficient if all the jobs in your queue are in the
>> largest memory bin -- you end up with slots that are allocated with too
>> little memory to run these going unused. But it's better than having
>> jobs fail. And it'll hold until dynamic machine partitioning is
>> mainstream in Condor.
>>
>> - Ian
>>
>>
>> _______________________________________________
>> Condor-users mailing list
>> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/condor-users/
>>
>
>
>