
Re: [Condor-users] Condor & SMP configuration



You should have a look at http://nmi.cs.wisc.edu/node/1480

Best,


matt

Andrea Borsic wrote:
Dear All,

I have been reading the user forums and trying to follow the proposed solutions, but I am still having trouble configuring Condor for use on an SMP cluster. Each node in the cluster has 8 CPUs, and I am using the trick of presenting several types of slots so that jobs can run with 1, 2, 4, or 8 threads. The configuration file (included below) contains logic that should prevent too many jobs from running on a single node.

If I submit a mix of 8-, 4-, 2-, and 1-threaded jobs, I get, for example, an 8-threaded job running concurrently with three or four 2-threaded jobs on the same node, which exceeds the number of CPUs available. The SLOTx_START expressions should prevent this, but they seem to be ineffective.

Any suggestions would be appreciated.

Regards,

Andrea Borsic



# here we compute the total number of cpus in use
NUMBER_OF_CLAIMED_CPUS = \
(\
 (1*(slot1_State =?= "Claimed")) + \
 (1*(slot2_State =?= "Claimed")) + \
 (1*(slot3_State =?= "Claimed")) + \
 (1*(slot4_State =?= "Claimed")) + \
 (1*(slot5_State =?= "Claimed")) + \
 (1*(slot6_State =?= "Claimed")) + \
 (1*(slot7_State =?= "Claimed")) + \
 (1*(slot8_State =?= "Claimed")) + \
 (2*(slot9_State =?= "Claimed")) + \
 (2*(slot10_State =?= "Claimed")) + \
 (2*(slot11_State =?= "Claimed")) + \
 (2*(slot12_State =?= "Claimed")) + \
 (4*(slot13_State =?= "Claimed")) + \
 (4*(slot14_State =?= "Claimed")) + \
 (8*(slot15_State =?= "Claimed"))   \
)

# start single threaded jobs only if at least 1 cpu is available
SLOT1_START = ($(NUMBER_OF_CLAIMED_CPUS) <= 7)
SLOT2_START = ($(NUMBER_OF_CLAIMED_CPUS) <= 7)
SLOT3_START = ($(NUMBER_OF_CLAIMED_CPUS) <= 7)
SLOT4_START = ($(NUMBER_OF_CLAIMED_CPUS) <= 7)
SLOT5_START = ($(NUMBER_OF_CLAIMED_CPUS) <= 7)
SLOT6_START = ($(NUMBER_OF_CLAIMED_CPUS) <= 7)
SLOT7_START = ($(NUMBER_OF_CLAIMED_CPUS) <= 7)
SLOT8_START = ($(NUMBER_OF_CLAIMED_CPUS) <= 7)
# start 2 threaded jobs only if at least 2 cpus are available
SLOT9_START = ($(NUMBER_OF_CLAIMED_CPUS) <= 6)
SLOT10_START = ($(NUMBER_OF_CLAIMED_CPUS) <= 6)
SLOT11_START = ($(NUMBER_OF_CLAIMED_CPUS) <= 6)
SLOT12_START = ($(NUMBER_OF_CLAIMED_CPUS) <= 6)
# start 4 threaded jobs only if at least 4 cpus are available
SLOT13_START = ($(NUMBER_OF_CLAIMED_CPUS) <= 4)
SLOT14_START = ($(NUMBER_OF_CLAIMED_CPUS) <= 4)
# start 8 threaded jobs only if all cpus are available
SLOT15_START = ($(NUMBER_OF_CLAIMED_CPUS) == 0)

MEMORY = 32768
NUM_CPUS = 32

SLOT_TYPE_1 = cpus=1, ram=1024
SLOT_TYPE_2 = cpus=2, ram=2048
SLOT_TYPE_3 = cpus=4, ram=4096
SLOT_TYPE_4 = cpus=8, ram=8192
NUM_SLOTS_TYPE_1 = 8
NUM_SLOTS_TYPE_2 = 4
NUM_SLOTS_TYPE_3 = 2
NUM_SLOTS_TYPE_4 = 1
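
The intent of the START expressions above can be checked with a small model outside Condor. The following is only a sketch in Python, not Condor code: the names WEIGHTS, claimed_cpus and may_start are hypothetical, and it assumes that every slot can actually see the State of every other slot (in a real pool this depends on the startd configuration, for example STARTD_SLOT_ATTRS, which the posted file does not show). It also shows why every slot must appear in the sum: leaving one out, slot 3 for instance, lets the 8-CPU slot start while that slot is claimed.

```python
# Hypothetical model of the posted START logic; this is not Condor code.
# CPU weight of each slot, following the posted slot types:
# slots 1-8 use 1 CPU, 9-12 use 2, 13-14 use 4, slot 15 uses 8.
WEIGHTS = {n: 1 for n in range(1, 9)}
WEIGHTS.update({n: 2 for n in range(9, 13)})
WEIGHTS.update({n: 4 for n in range(13, 15)})
WEIGHTS[15] = 8

TOTAL_CPUS = 8  # physical CPUs per node


def claimed_cpus(claimed, counted=WEIGHTS):
    """Sum the weights of claimed slots, counting only slots in `counted`."""
    return sum(w for n, w in counted.items() if n in claimed)


def may_start(slot, claimed, counted=WEIGHTS):
    """Mirror SLOTn_START: allow a start only if the slot's CPUs still fit."""
    return claimed_cpus(claimed, counted) <= TOTAL_CPUS - WEIGHTS[slot]


if __name__ == "__main__":
    # With the 8-CPU slot claimed, a 2-CPU slot must be refused.
    print(may_start(9, {15}))
    # A sum that omits slot 3 does not notice that slot 3 is claimed.
    without_3 = {n: w for n, w in WEIGHTS.items() if n != 3}
    print(may_start(15, {3}, without_3), may_start(15, {3}))
```

If the model refuses a combination that the real pool accepts, the difference most likely lies in what each slot can see at evaluation time rather than in the thresholds themselves.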


