Re: [Condor-users] Multi-Threaded Jobs on Condor
- Date: Fri, 14 Sep 2007 07:25:32 +0100
- From: "Simon Hammond" <simon.hammond@xxxxxxxxx>
- Subject: Re: [Condor-users] Multi-Threaded Jobs on Condor
I'll have a look into this (and probably end up posting for more
details!). Thanks again.
University of Warwick
On 14/09/2007, Cecile Garros <cecile@xxxxxxxxxxxxxxxxx> wrote:
> We faced the same problem recently. It seems one solution is to define more
> job slots than the actual number of CPUs. For your dual-core boxes you
> would have:
> * 2 slots of type 1, with type 1: cpus=1, ram=50%
> * 1 additional slot of type 2, with type 2: cpus=2, ram=100%
> To make this possible you have to "lie" to Condor and add the
> following attributes to your local configuration file:
> * NUM_CPUS=4
> * MEMORY=2*ActualMemory
> Then you have to specify that a job should not start on a type 1 slot if
> a type 2 slot is in use, and vice versa.
> Finally, you can tag your multi-threaded jobs to run on type 2 slots.
> Ordinary jobs should run on type 1 slots by default.
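> Roughly, the local configuration and the submit file could look like the
> sketch below (untested; exact macro names, percentages and slot numbering
> depend on your Condor version, so check the manual and condor_status; the
> IsMultiThreaded tag is just a made-up job attribute):
>
>   ## condor_config.local on each dual-core node
>   # "Lie" about the resources so the overlapping slots fit.
>   NUM_CPUS = 4
>   MEMORY   = 2048                  # i.e. 2 * the real memory, in MB
>
>   # Two single-CPU slots plus one whole-machine slot.
>   SLOT_TYPE_1      = cpus=1, ram=50%
>   NUM_SLOTS_TYPE_1 = 2
>   SLOT_TYPE_2      = cpus=2, ram=100%
>   NUM_SLOTS_TYPE_2 = 1
>
>   # Publish each slot's State so the slots can check each other.
>   STARTD_SLOT_EXPRS = State
>
>   # Type 1 slots take ordinary jobs only while the type 2 slot is idle;
>   # the type 2 slot takes multi-threaded jobs only while both type 1
>   # slots are idle. (Assumes slot3 is the type 2 slot.)
>   START = ( (SlotID <= 2 && (TARGET.IsMultiThreaded =!= True) && \
>              (slot3_State =?= "Unclaimed" || slot3_State =?= UNDEFINED)) || \
>             (SlotID == 3 && (TARGET.IsMultiThreaded =?= True) && \
>              (slot1_State =?= "Unclaimed" || slot1_State =?= UNDEFINED) && \
>              (slot2_State =?= "Unclaimed" || slot2_State =?= UNDEFINED)) )
>
>   ## submit description file for a multi-threaded job
>   universe         = vanilla
>   executable       = my_threaded_app
>   +IsMultiThreaded = True
>   requirements     = (SlotID == 3)
>   queue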
> A good summary of what has been done on this, with interesting links
> as well, is here:
> Hope this helps,
> >> We have a small cluster of dual-processor nodes. We want to be able
> >> to submit jobs which contain multi-threaded code through Condor.
> >> Ideally, we want the job to claim both processors on the node - if we
> >> use slots then the allocation can end up claiming two slots on different
> >> nodes. Is there any way to specify this in the parallel job submission
> >> file so that both 'slots' on the same node are claimed correctly?
> > Si,
> > Right now there's no way to accomplish this without preemption (or
> > suspension). That is, you can't have Condor hold a slot free while the
> > other slot is running a non-parallel job so a parallel job in the queue
> > gets the whole machine.
> > What you can do is set up a machine so that when a job tagged as being
> > "parallel" wants the machine it: a) always runs in slot 1; b) always
> > preempts the running job in slot 2; and c) always sets the START
> > expression for slot 2 to false while it's on the machine.
> > It's not an ideal solution, but it's the only way to achieve this right
> > now. If you're not on Windows you could use suspension instead of
> > preemption for the job in slot 2, which makes things a little better. Or
> > use checkpointing so you don't lose forward progress from the job in slot 2.
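> > As a rough, untested sketch (macro and attribute names may differ between
> > Condor versions, and ParallelJob is just a made-up tag added in the submit
> > file with "+ParallelJob = True"), the startd side could look something like:
> >
> >   # Copy the job's tag into the machine ad and share it across slots.
> >   STARTD_JOB_EXPRS  = $(STARTD_JOB_EXPRS) ParallelJob
> >   STARTD_SLOT_EXPRS = ParallelJob
> >
> >   # Parallel jobs only match slot 1; slot 2 refuses to start anything
> >   # while slot 1 is running a parallel job.
> >   START = ( (SlotID == 1) || \
> >             (SlotID == 2 && (TARGET.ParallelJob =!= True) && \
> >              (slot1_ParallelJob =!= True)) )
> >
> >   # Kick slot 2's job once slot 1 has picked up a parallel job
> >   # (or, outside Windows, suspend it instead via SUSPEND/CONTINUE).
> >   PREEMPT  = (SlotID == 2 && (slot1_ParallelJob =?= True))
> >   # SUSPEND  = (SlotID == 2 && (slot1_ParallelJob =?= True))
> >   # CONTINUE = (slot1_ParallelJob =!= True)
> >
> > with the parallel submit file also carrying something like
> > "requirements = (SlotID == 1)" to force it onto slot 1.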
> > If you search the archives you'll find a thread about setting up
> > complicated inter-slot start expressions that have jobs suspending and
> > preempting jobs in other slots. I can't remember the title now. Sorry.
> > :( Maybe one of the Condor guys can jump in with a pointer to the
> > complicated setup. It was a university that was doing it; they gave a
> > talk at Condor Week a few years ago about it.
> > - Ian
> > _______________________________________________
> > Condor-users mailing list
> > To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> > subject: Unsubscribe
> > You can also unsubscribe by visiting
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> > The archives can be found at:
> > https://lists.cs.wisc.edu/archive/condor-users/
> Cecile GARROS
> Solution Consultant
> Software Development Division
> Best Systems, Inc
> Phone: 029-860-7080
> Fax: 029-860-7081