
Re: [Condor-users] Multi-Threaded Jobs on Condor



Thanks guys,

I'll have a look into this (and probably end up posting for more
details!). Thanks again.



Si Hammond
University of Warwick

On 14/09/2007, Cecile Garros <cecile@xxxxxxxxxxxxxxxxx> wrote:
> Si,
>
> We faced the same problem recently. One solution seems to be to define more
> job slots than the actual number of CPUs. For your dual-core boxes you
> would have:
>
>    * 2 slots of type 1, with type 1: cpus=1, ram=50%
>    * 1 additional slot of type 2, with type 2: cpus=2, ram=100%
>
> To make this possible you have to "lie" to Condor and add the
> following attributes to your local configuration file:
>
>    * NUM_CPUS=4
>    * MEMORY=2*ActualMemory
>
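
As a rough check on the numbers above, this short Python sketch adds up what the three slots would need to advertise. It is only arithmetic, not Condor configuration syntax. The SLOT_TYPES table, the function name and the 2048 MB figure are assumptions made for illustration, and the RAM percentages are read here as fractions of the machine's real memory.

```python
# Hypothetical description of the layout from the message:
# slot type -> (number of slots, CPUs per slot, fraction of REAL memory per slot)
SLOT_TYPES = {
    1: (2, 1, 0.5),   # two single-CPU slots, half the RAM each
    2: (1, 2, 1.0),   # one whole-machine slot, all the RAM
}


def advertised_totals(slot_types, actual_memory_mb):
    """Return (total CPUs, total memory in MB) summed over all slots."""
    total_cpus = sum(n * c for n, c, _ in slot_types.values())
    total_memory = sum(n * f * actual_memory_mb for n, _, f in slot_types.values())
    return total_cpus, total_memory


cpus, memory = advertised_totals(SLOT_TYPES, actual_memory_mb=2048)
print(cpus, memory)
```

On a dual-core box the slots add up to 4 CPUs and twice the real memory, which is why NUM_CPUS and MEMORY have to be overstated for all three slots to fit.
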
> Then you have to specify that a job should not start on a type 1 slot if
> a type 2 slot is in use, and vice versa.
>
> Finally you can tag your multi-threaded jobs to run on type 2 slots.
> Ordinary jobs should run on type 1 slots as a default.
>
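
The start rule and the tagging described above can be modelled in a few lines of Python. This is a sketch of the logic only and does not show how it would be written as a Condor START expression. The function names and the `is_multithreaded` flag are hypothetical.

```python
def may_start(slot_type, busy_types):
    """Cross-type rule: a slot may accept a job only if no slot of the
    OTHER type is currently running one.

    slot_type  -- 1 (single-CPU slot) or 2 (whole-machine slot)
    busy_types -- set of slot types that currently have a job running
    """
    other_type = 2 if slot_type == 1 else 1
    return other_type not in busy_types


def target_slot_type(is_multithreaded):
    """Tagged multi-threaded jobs go to type 2; everything else to type 1."""
    return 2 if is_multithreaded else 1


# A serial job is running in one type 1 slot:
print(may_start(1, {1}))   # the second type 1 slot is still usable
print(may_start(2, {1}))   # the whole-machine slot has to wait
```

The model checks only the mutual exclusion between the two slot types. Whether a particular slot is itself already occupied is left to the scheduler.
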
> A good summary of what has been done on this, with interesting links
> as well, is here:
> https://lists.cs.wisc.edu/archive/condor-users/2007-June/msg00295.shtml
>
> Hope this helps,
> -Cecile
>
>
> >> We have a small cluster of dual processor nodes. We want to be able
> >> to submit jobs which contain a multi-threaded code through Condor.
> >>
> >> Ideally, we want the job to claim both processors on the node - if we
> >> use slots then the allocator can claim two slots on different nodes.
> >>
> >> Is there any way to specify this in the parallel job submission file
> >> so that both 'slots' on the same node are claimed correctly?
> >>
> >
> > Si,
> >
> > Right now there's no way to accomplish this without preemption (or
> > suspension). That is, you can't have Condor hold a slot free while the
> > other slot is running a non-parallel job so a parallel job in the queue
> > gets the whole machine.
> >
> > What you can do is set up a machine so that when a job tagged as
> > "parallel" wants the machine it a) always runs in slot 1; b) always
> > preempts the running job in slot 2; and c) always sets the START
> > expression for slot 2 to false while it's on the machine.
> >
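
The three rules above can be sketched in Python as follows. This models the decisions only and is not Condor policy syntax. The function name and the action strings are hypothetical, and what happens to a job already running in slot 1 is not covered because the message does not say.

```python
def place_parallel_job(slot2_job):
    """Model the arrival of a job tagged "parallel" on a two-slot machine.

    slot2_job -- name of the job running in slot 2, or None if it is idle.
    Returns (actions, slot2_start), where slot2_start is the value slot 2's
    START expression should have while the parallel job is on the machine.
    """
    actions = []
    if slot2_job is not None:
        # Rule b: whatever is running in slot 2 gets preempted.
        actions.append(f"preempt {slot2_job} in slot2")
    # Rule a: the parallel job always runs in slot 1.
    actions.append("run parallel job in slot1")
    # Rule c: slot 2 refuses new jobs while the parallel job is present.
    return actions, False


actions, slot2_start = place_parallel_job("serial-17")
print(actions, slot2_start)
```
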
> > It's not an ideal solution but it's the only way to achieve this right
> > now. If you're not on Windows, you could use suspension instead of
> > preemption for the job in slot 2, which makes things a little better. Or
> > use checkpointing so you don't lose forward progress from the job in slot 2.
> >
> > If you search the archives you'll find a thread about setting up
> > complicated inter-slot start expressions that have jobs suspending and
> > preempting jobs in other slots. I can't remember the title now. Sorry.
> > :( Maybe one of the Condor guys can jump in with a pointer to the
> > complicated setup. A university was doing it; they gave a
> > talk at Condor Week a few years ago about it.
> >
> > - Ian
> >
> >
> >
> >
> >
> >
> > _______________________________________________
> > Condor-users mailing list
> > To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> > subject: Unsubscribe
> > You can also unsubscribe by visiting
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> >
> > The archives can be found at:
> > https://lists.cs.wisc.edu/archive/condor-users/
> >
> >
> >
> >
>
>
> --
> Cecile GARROS
>
> Solution Consultant
> Software Development Division
> mailto:cecile@xxxxxxxxxxxxxxxxx
>
> ------------------------------------------------
> Best Systems, Inc
> Phone: 029-860-7080
> Fax:    029-860-7081
> http://www.bestsystems.co.jp
> ------------------------------------------------
>