
Re: [Condor-users] Multi-Threaded Jobs on Condor



Si,

We faced the same problem recently. One solution seems to be to configure more job slots than the actual number of CPUs. For your dual-core boxes you would have:

  * 2 slots of type 1, with type 1: cpus=1, ram=50%
  * 1 additional slot of type 2, with type 2: cpus=2, ram=100%

To make this possible you have to "lie" to Condor and add the following settings to your local configuration file:

  * NUM_CPUS=4
  * MEMORY=2*ActualMemory
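As a rough sketch, the slot layout and the overstated resources above might be written in a machine's local configuration as follows. This assumes Condor 7.x slot-type syntax and a box with 2048 MB of physical RAM (a made-up figure; substitute your own). Because MEMORY is doubled, the ram percentages are taken of the doubled value, so 25% and 50% here are my reading of the 50% and 100% of real RAM listed above. The CPU count works out as 2*1 + 1*2 = 4.

```
# Advertise 4 CPUs on a dual-core box so that two 1-CPU slots and
# one 2-CPU slot fit: 2*1 + 1*2 = 4.
NUM_CPUS = 4

# Twice the physical memory, in MB (assumes 2048 MB installed).
MEMORY = 4096

# Type 1: two slots for ordinary jobs, each with half of the real RAM.
SLOT_TYPE_1 = cpus=1, ram=25%
NUM_SLOTS_TYPE_1 = 2

# Type 2: one slot for multi-threaded jobs, with all of the real RAM.
SLOT_TYPE_2 = cpus=2, ram=50%
NUM_SLOTS_TYPE_2 = 1
```

As far as I know, type 1 slots are numbered first (slot1, slot2), which leaves the type 2 slot as slot3.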

Then you have to specify that a job must not start on a type 1 slot while the type 2 slot is in use, and vice versa.
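One way to express this, assuming slots 1 and 2 are the type 1 slots and slot 3 is the type 2 slot, is to share each slot's State with the other slots through STARTD_SLOT_ATTRS and test it in START. RequiresWholeMachine is a hypothetical job attribute chosen for this sketch, not something Condor defines. There is also a small window in which both kinds of slot can be matched in the same negotiation cycle, so treat this as a starting point rather than a complete policy.

```
# Copy each slot's State into every slot's ClassAd on this machine,
# making attributes such as slot1_State and slot3_State available.
STARTD_SLOT_ATTRS = State

# Slots 1-2 (type 1): untagged jobs only, and only while slot 3 is not claimed.
# Slot 3 (type 2): tagged jobs only, and only while slots 1 and 2 are not claimed.
START = ( (SlotID <= 2) && (slot3_State =!= "Claimed") \
          && (TARGET.RequiresWholeMachine =!= True) ) \
     || ( (SlotID == 3) && (slot1_State =!= "Claimed") \
          && (slot2_State =!= "Claimed") \
          && (TARGET.RequiresWholeMachine =?= True) )
```

This START replaces whatever policy the machine already had, so combine it (for example with &&) with any keyboard-idle or owner rules you rely on.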

Finally, you can tag your multi-threaded jobs so that they run on the type 2 slot. Ordinary jobs should run on the type 1 slots by default.
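On the submit side, the tag can be a custom job attribute added with the "+" syntax in the submit description file. In this minimal sketch, RequiresWholeMachine is a hypothetical attribute name (the machines' START expression would have to test for the same name), the file names are placeholders, and the vanilla universe is assumed because the job only needs one large slot.

```
# Submit description for a multi-threaded job (names are placeholders).
universe   = vanilla
executable = my_threaded_app
output     = job.out
error      = job.err
log        = job.log

# Custom job attribute used only to steer the job to the type 2 slot.
+RequiresWholeMachine = True

queue
```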

A good summary of what has been done on this, with some interesting links, is here:
https://lists.cs.wisc.edu/archive/condor-users/2007-June/msg00295.shtml

Hope this helps,
-Cecile


We have a small cluster of dual processor nodes. We want to be able
to submit jobs which contain a multi-threaded code through Condor.

Ideally, we want the job to claim both processors on the node; if we
use slots, the allocator can claim two slots on different nodes.

Is there any way to specify this in the parallel job submission file
so that both 'slots' on the same node are claimed correctly?

Si,

Right now there's no way to accomplish this without preemption (or
suspension). That is, you can't have Condor hold a slot free while the
other slot is running a non-parallel job so a parallel job in the queue
gets the whole machine.

What you can do is set up a machine so that when a job tagged as
"parallel" wants the machine, a) the job always runs in slot 1; b) it
always preempts the running job in slot 2; and c) the START expression
for slot 2 is always false while the job is on the machine.
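Ian's three rules could be sketched roughly as below, assuming a 2-slot machine and the hypothetical job attribute RequiresWholeMachine set on the "parallel" jobs. The job attribute is copied into slot 1's machine ad with STARTD_JOB_EXPRS (newer releases also accept STARTD_JOB_ATTRS), shared with slot 2 through STARTD_SLOT_ATTRS (some older releases spell it STARTD_SLOT_EXPRS), and then used to block and evict work in slot 2.

```
# Advertise the job's RequiresWholeMachine attribute in the ad of the
# slot running it, then share it with the other slot, where it shows
# up as slot1_RequiresWholeMachine.
STARTD_JOB_EXPRS = $(STARTD_JOB_EXPRS) RequiresWholeMachine
STARTD_SLOT_ATTRS = $(STARTD_SLOT_ATTRS) RequiresWholeMachine

# a) Tagged jobs are refused by slot 2, so they can only start in slot 1.
# c) Slot 2 refuses new jobs while slot 1 runs a tagged job.
START = (SlotID == 1) \
     || ( (SlotID == 2) && (TARGET.RequiresWholeMachine =!= True) \
          && (slot1_RequiresWholeMachine =!= True) )

# b) Evict whatever is running in slot 2 once slot 1 runs a tagged job.
PREEMPT = (SlotID == 2) && (slot1_RequiresWholeMachine =?= True)
```

As written, this replaces any existing START and PREEMPT policy, and it does not evict an ordinary job already running in slot 1, so a tagged job may still have to wait for slot 1 to free up.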

It's not an ideal solution, but it's the only way to achieve this right
now. If you're not on Windows, you could suspend the job in slot 2
instead of preempting it, which makes things a little better. Or use
checkpointing so you don't lose forward progress from the job in slot 2.

If you search the archives you'll find a thread about setting up
complicated inter-slot start expressions that have jobs suspending and
preempting jobs in other slots. I can't remember the title now. Sorry.
:( Maybe one of the Condor guys can jump in with a pointer to the
complicated setup. It was a university that was doing it; they gave a
talk about it at Condor Week a few years ago.

- Ian






_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at: https://lists.cs.wisc.edu/archive/condor-users/





--
Cecile GARROS

Solution Consultant
Software Development Division
mailto:cecile@xxxxxxxxxxxxxxxxx

------------------------------------------------
Best Systems, Inc
Phone: 029-860-7080
Fax: 	029-860-7081
http://www.bestsystems.co.jp
------------------------------------------------