Re: [Condor-users] Multi-Threaded Jobs on Condor


We faced the same problem recently. One solution seems to be to configure more job slots than the actual number of CPUs. For your dual-core boxes you would have:

  * 2 slots of type 1, with type 1: cpus=1, ram=50%
  * 1 additional slot of type 2, with type 2: cpus=2, ram=100%

To make this possible you have to "lie" to Condor by adding the following attributes to your local configuration file:

  * NUM_CPUS=4
  * MEMORY=2*ActualMemory
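
As a rough sketch, the whole layout might look like this in the local configuration file. It assumes a dual-core machine with 2048 MB of real memory (an example figure only) and a Condor version that uses the SLOT_TYPE_<N> names; older releases call these VIRTUAL_MACHINE_TYPE_<N>. Since the percentages are taken from the inflated totals, 25% and 50% here correspond to the 50% and 100% of real memory listed above.

```
# Advertise twice the real resources (2 cores and 2048 MB assumed).
NUM_CPUS = 4
MEMORY = 4096

# Type 1: two single-core slots, each with half of the real memory.
SLOT_TYPE_1 = cpus=1, ram=25%
NUM_SLOTS_TYPE_1 = 2

# Type 2: one dual-core slot with all of the real memory.
SLOT_TYPE_2 = cpus=2, ram=50%
NUM_SLOTS_TYPE_2 = 1
```

With this layout we would expect slot1 and slot2 to be the type 1 slots and slot3 the type 2 slot, but it is worth confirming with condor_status.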

Then you have to specify that a job should not start on a type 1 slot if the type 2 slot is in use, and vice versa.
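
One way this mutual exclusion could be written, as a sketch only: it assumes slot1 and slot2 are the type 1 slots, slot3 is the type 2 slot, and that your version supports STARTD_SLOT_ATTRS (STARTD_SLOT_EXPRS or STARTD_VM_EXPRS in older releases), which copies each slot's State into the other slots' ClassAds as slot<N>_State. The two helper macro names are made up for the example.

```
# Make every slot's State visible to its neighbours as slot<N>_State.
STARTD_SLOT_ATTRS = State

# Hypothetical helper macros; the names are ours, not Condor's.
SMALL_SLOTS_FREE = ( (slot1_State =!= "Claimed") && (slot2_State =!= "Claimed") )
BIG_SLOT_FREE = (slot3_State =!= "Claimed")

# The type 2 slot starts jobs only when both type 1 slots are unclaimed,
# and the type 1 slots start jobs only when the type 2 slot is unclaimed.
START = ( ((SlotID == 3) && $(SMALL_SLOTS_FREE)) || ((SlotID < 3) && $(BIG_SLOT_FREE)) )
```

If you already have a START policy, this would need to be ANDed into it rather than replacing it. We have not checked how it behaves when jobs for both slot types are matched in the same negotiation cycle, so treat it as a starting point.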

Finally you can tag your multi-threaded jobs to run on type 2 slots. Ordinary jobs should run on type 1 slots as a default.
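
For the tagging, a custom job attribute is one option. In the sketch below WantsWholeMachine is a made-up attribute name, my_threaded_app is a placeholder executable, and slot3 is again assumed to be the type 2 slot. The submit description file for a multi-threaded job could look like:

```
universe = vanilla
executable = my_threaded_app
+WantsWholeMachine = True
requirements = (SlotID == 3)
queue
```

On the machine side, a clause like the following (a hypothetical macro, to be ANDed into START) would keep untagged jobs on the type 1 slots without any change to their submit files:

```
# Only tagged jobs on slot3, only untagged jobs on slot1 and slot2.
TAG_MATCHES_SLOT = ( ((SlotID == 3) && (TARGET.WantsWholeMachine =?= True)) || ((SlotID < 3) && (TARGET.WantsWholeMachine =!= True)) )
```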

A good summary of what has been done on this, with interesting links as well, is here:

Hope this helps,

We have a small cluster of dual processor nodes. We want to be able
to submit jobs which contain a multi-threaded code through Condor.

Ideally, we want the job to claim both processors on the node - if we
use slots then the allocation can claim two slots on different nodes.

Is there any way to specify this in the parallel job submission file
so that both 'slots' on the same node are claimed correctly?


Right now there's no way to accomplish this without preemption (or
suspension). That is, you can't have Condor hold a slot free while the
other slot is running a non-parallel job so a parallel job in the queue
gets the whole machine.

What you can do is set up a machine so that when a job tagged as being
"parallel" wants the machine, it a) always runs in slot1; b) always
preempts the running job in slot2; and c) always sets the START
expression for slot2 to false while it's on the machine.
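
A sketch of how those three rules might be expressed, assuming a two-slot machine and a made-up job attribute IsParallelJob (set with +IsParallelJob = True in the submit file). It relies on STARTD_JOB_EXPRS to copy the job attribute into the slot's ClassAd and STARTD_SLOT_ATTRS to share it with the other slot; both names differ between Condor versions, so check the manual for yours.

```
# Copy the job's IsParallelJob attribute into the ad of the slot running it,
# then share it with the other slot as slot1_IsParallelJob.
STARTD_JOB_EXPRS = $(STARTD_JOB_EXPRS) IsParallelJob
STARTD_SLOT_ATTRS = IsParallelJob

# a) parallel jobs may only start in slot1;
# c) slot2 refuses new jobs while a parallel job holds slot1.
START = ( ((SlotID == 1) || (TARGET.IsParallelJob =!= True)) && ((SlotID != 2) || (slot1_IsParallelJob =!= True)) )

# b) evict whatever is running in slot2 once a parallel job is in slot1.
PREEMPT = ( (SlotID == 2) && (slot1_IsParallelJob =?= True) )
```

These expressions replace any existing START and PREEMPT policy, so in practice you would merge them with what you already have. I haven't tested this exact configuration.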

It's not an ideal solution but it's the only way to achieve this right
now. If you're not on Windows, you could suspend the job in slot2 instead
of preempting it, which makes things a little better. Or use
checkpointing so you don't lose forward progress from the job in slot2.
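
The suspension variant could look roughly like this, under the same kind of assumptions: two slots and a hypothetical IsParallelJob job attribute shared between slots via STARTD_JOB_EXPRS and STARTD_SLOT_ATTRS (names vary by Condor version).

```
STARTD_JOB_EXPRS = $(STARTD_JOB_EXPRS) IsParallelJob
STARTD_SLOT_ATTRS = IsParallelJob

# Suspend the slot2 job while a parallel job runs in slot1,
# and resume it once that job has gone.
WANT_SUSPEND = True
SUSPEND = ( (SlotID == 2) && (slot1_IsParallelJob =?= True) )
CONTINUE = ( (SlotID != 2) || (slot1_IsParallelJob =!= True) )
```

For this to suspend rather than evict, PREEMPT must not also fire for slot2 in the same situation. Keep in mind that a suspended job still holds its memory.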

If you search the archives you'll find a thread about setting up
complicated inter-slot start expressions that have jobs suspending and
preempting jobs in other slots. I can't remember the title now. Sorry.
:( Maybe one of the Condor guys can jump in with a pointer to the
complicated setup. It was a university that was doing it; they gave a
talk at Condor Week a few years ago about it.

- Ian

