
Re: [Condor-users] dynamic slots - over subscription



This is a mismatch between what people say they need and what they actually use.

This solution is not trying to find out who is breaking the rules; it just forces them to play by the rules.

CPU affinity simply ensures they only get what they asked for, not what they try to use. It won't kill jobs or prevent them from running multiple threads or processes, but it will limit them to a subset of the CPU cores. Doing it for subprocesses might be tricky on Linux (though fork() is handled cleanly, since children inherit the mask); on Windows you would use job objects for this.
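
A rough sketch of the bookkeeping for the mask service described in the quoted message below: hand out non-overlapping sets of cores, and reclaim them when the job that held them has gone away. This is only an illustration under stated assumptions. The names CoreAllocator, allocate, reap and pid_is_alive are made up for this example and are not part of Condor, and a real service would also need locking and some way for wrappers to talk to it.

```python
import os


class CoreAllocator:
    """Hands out disjoint sets of core ids and reclaims them from dead jobs.

    Hypothetical helper for illustration; not a Condor API.
    """

    def __init__(self, total_cores):
        self.free = set(range(total_cores))
        self.leases = {}  # pid -> set of core ids currently held by that job

    def reap(self, is_alive):
        """Return to the free pool any cores held by jobs that are no longer running."""
        for pid in [p for p in self.leases if not is_alive(p)]:
            self.free |= self.leases.pop(pid)

    def allocate(self, pid, request_cpus, is_alive):
        """Lease request_cpus cores to pid, or return None if too few are free."""
        self.reap(is_alive)
        if request_cpus > len(self.free):
            return None
        cores = set(sorted(self.free)[:request_cpus])
        self.free -= cores
        self.leases[pid] = cores
        return cores


def pid_is_alive(pid):
    """Liveness probe for POSIX systems: signal 0 only checks that the pid exists."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

A wrapper would call something like allocate(job_pid, 4, pid_is_alive) and then apply the returned set as the job's affinity mask. Picking the lowest-numbered free cores is the simplest policy; a NUMA-aware version would choose cores from the same socket where possible.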

Linux has the same basic CPU affinity facility as Windows: http://linux.die.net/man/2/sched_setaffinity (from kernel 2.5.8 onward), though there may be issues with the privileges required to set it. It's actually a per-thread setting (and if you can set it, they could write code to unset it), but assuming error or misunderstanding on the part of your users rather than active malfeasance, this should work.

I'm really not a Linux user though, so this is all supposition.
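
For what it's worth, a minimal sketch of a Linux wrapper built on that call, assuming Python 3.3 or later, which exposes it as os.sched_setaffinity. The function name run_pinned is made up for this example, and how the set of cores is chosen is left out. The wrapper pins itself first and then starts the job, relying on the child inheriting the mask.

```python
import os
import subprocess
import sys


def run_pinned(cores, argv):
    """Pin this process to `cores`, run argv as a child, and return its exit code.

    Hypothetical wrapper for illustration. Linux only.
    """
    # A pid of 0 means "the caller". Restricting yourself to a subset of the
    # cores you are already allowed to use does not need extra privileges.
    os.sched_setaffinity(0, cores)
    # The child inherits the affinity mask, so the job and anything it forks
    # start out confined to the same cores.
    return subprocess.call(argv)


if __name__ == "__main__":
    # Example: python wrapper.py 0,1 ./my_job arg1 arg2
    core_ids = {int(c) for c in sys.argv[1].split(",")}
    sys.exit(run_pinned(core_ids, sys.argv[2:]))
```

As noted above, nothing stops the job from calling sched_setaffinity itself to widen the mask again, so this only guards against mistakes, not against someone deliberately working around it.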

Matt

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Mag Gam
Sent: 13 January 2010 13:00
To: Condor-Users Mail List
Subject: Re: [Condor-users] dynamic slots - over subscription

Thanks Matt for your reply (and other replies).

We are running in a Linux environment and the only way to find the
offending user is to log into the box and see who is running what. I wish
there was an easier way to accomplish this. I thought the whole point
of dynamic slots is so people won't oversubscribe resources.

I suppose I can always do this, right?

START = $(LoadAvg) < $(NUM_CPUS)

This should prevent users from getting any more jobs started here.




On Wed, Jan 13, 2010 at 7:17 AM, Matt Hope <Matt.Hope@xxxxxxxxxxxxxxx> wrote:
> Enforce their choice :)
>
> Set the CPU affinity of the jobs to match their request. This is tricky with dynamic partitions since you need to work out a way to dynamically partition the mask (having Condor do this for you would be ideal).
>
> On Windows I'd write a little wrapper that asks a local service for a mask. On being asked, the service checks the liveness of all the currently active masks to see which, if any, have departed, freeing up cores in the mask. Use user job wrappers to ensure your CPU affinity is always applied on start-up.
>
> If anyone tries to work around this, either a) stamp on them hard, or b) move to job objects and clamp their memory usage as well.
>
> You could try to make this fancy (by trying to be NUMA friendly where possible).
>
> By effectively penalising people that do it wrong you would likely find that people moved towards getting it right.
>
> Matt
>
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Mag Gam
> Sent: 13 January 2010 11:41
> To: Condor-Users Mail List
> Subject: [Condor-users] dynamic slots - over subscription
>
> Let's say I have a process which takes up 4 CPUs, and I have 10 x 16-core
> servers with 64G of memory.
>
> To get more of my jobs to run I do:
> #This should be 4
> RequestCpus = 1
>
> How can I prevent users from doing this? This clearly creates extra load
> on the servers.
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/
>