
Re: [Condor-users] controlling memory intensive jobs



The idea is that you have one 'true' slot (which is partitionable) defined in the config, and that slot can then be carved into multiple dynamic slots.

SLOT_TYPE_X_PARTITIONABLE covers the more complex case where you might want, say, three slots, with two of them static.

Thus, on an 8-core box:

# These only need a small amount of memory and one core, but should always be available
SLOT_TYPE_1 = cpus=1, memory=256
NUM_SLOTS_TYPE_1 = 2
# The partitionable slot; I could leave all of these at auto if need be
SLOT_TYPE_2 = cpus=6, memory=auto
SLOT_TYPE_2_PARTITIONABLE = True
NUM_SLOTS_TYPE_2 = 1
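
On the job side, each submit file would then state what it needs, and the startd carves a matching dynamic slot out of the partitionable one. A rough sketch (the executable name is just a placeholder):

```
# Hypothetical submit file: request_cpus/request_memory tell Condor how
# big a dynamic slot to carve from the partitionable slot.
universe        = vanilla
executable      = my_job          # placeholder
request_cpus    = 1
request_memory  = 2048            # MB
queue
```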

In the case you describe you don't need to bother with any of this: you appear to need only a single partitionable slot, and that's it.
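
For that simple case, something along these lines should do (an untested sketch, with resources left to auto-detect):

```
# One partitionable slot covering the whole machine
SLOT_TYPE_1               = cpus=auto, memory=auto
SLOT_TYPE_1_PARTITIONABLE = True
NUM_SLOTS_TYPE_1          = 1
NUM_SLOTS                 = 1
```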


-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Mag Gam
Sent: 07 November 2009 03:41
To: Condor-Users Mail List
Subject: Re: [Condor-users] controlling memory intensive jobs

Ian:

I am curious about your dynamic policies now. At our lab these servers
keep having memory problems.

I looked at http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_MRG/1.1/html/Grid_User_Guide/chap-Grid_User_Guide-Dynamic_provisioning.html
and tried to setup dynamic provision.

I have a question about, SLOT_TYPE_X_PARTITIONABLE

What is "X"? Do I need to do this in my configuration?

SLOT_TYPE_0_PARTITIONABLE
SLOT_TYPE_1_PARTITIONABLE
SLOT_TYPE_3_PARTITIONABLE
SLOT_TYPE_4_PARTITIONABLE
SLOT_TYPE_5_PARTITIONABLE
...
SLOT_TYPE_15_PARTITIONABLE

for a 15-core box?

Do I also need to do this:
PartitionableSlot=TRUE

I have done condor_status -l, but I don't see DynamicSlot=TRUE.


Any thoughts?

TIA

On Thu, Sep 24, 2009 at 9:37 AM, Ian Chesal <ICHESAL@xxxxxxxxxx> wrote:
>> We have 10 servers which have 64GB of memory with 16 cores. We don't
>> want to have people to run all of their memory intensive jobs at once
>> since it would crash the box. What do condor admins typically do to
>> control this? so only 10 jobs runs on 10 different servers?
>
> I make all my users tell me up front how much memory their job needs to
> run. It's a rough guess, but enough to make sure Condor doesn't schedule
> too many memory intensive jobs on my machines. In the back end I bin the
> memory request so jobs are in one of 5 memory size estimate buckets.
> This makes them easier to deal with when planning machine setups. I
> don't allocate my machine resources evenly across slots. I unbalance
> them on purpose to service the 5 bins of memory requirements accordingly.
>
> It can be less efficient if all the jobs in your queue are in the
> largest memory bin -- you end up with slots that are allocated with too
> little memory to run these going unused. But it's better than having
> jobs fail. And it'll hold until dynamic machine partitioning is
> mainstream in Condor.
>
> - Ian
>
>
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at: 
https://lists.cs.wisc.edu/archive/condor-users/
