Re: [Condor-users] Condor & SMP configuration



Do you have NUMBER_OF_CLAIMED_CPUS added to STARTD_ATTRS?
You have to do that, otherwise it won't work.
Also see Matt's post that came later. Everyone I know who has made
it work has done something like what is on that web page.

Steve Timm


On Wed, 1 Oct 2008, Andrea Borsic wrote:

Dear All,

I have been reading the user forums and trying to follow the proposed
solutions, but I am still having trouble configuring Condor for use on
an SMP cluster. Each node in the cluster has 8 CPUs, and I am using the
trick of advertising several types of slots so that jobs can run with
1, 2, 4, or 8 threads. The configuration file (attached below) contains
logic that should prevent too many jobs from running on a single node.

If I submit a mix of 8-, 4-, 2-, and 1-threaded jobs, I get, for example,
an 8-threaded job running concurrently with three or four 2-threaded jobs
on the same node, exceeding the number of CPUs available. The SLOTx_START
expressions should prevent this, but they seem to be ineffective.
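One hypothesis consistent with this symptom, and with Steve's point about publishing the attributes, is that a slot's START expression cannot see the other slots' State. In ClassAds a missing attribute is UNDEFINED, and `UNDEFINED =?= "Claimed"` is FALSE, so every term of NUMBER_OF_CLAIMED_CPUS would be 0 and every SLOTx_START would be true. The Python below is only a toy model of that arithmetic, not HTCondor code; `claimed_cpus`, `start_allowed`, and `admit` are hypothetical names, and the slot numbering follows the configuration attached below. In HTCondor, copying attributes such as State into sibling slots' ads (as slot1_State, slot2_State, ...) is what the `STARTD_SLOT_ATTRS` setting is for; check the manual for the release in use.

```python
# Toy model (not HTCondor code) of the posted NUMBER_OF_CLAIMED_CPUS / START logic.
# Assumption: a State a slot cannot see is UNDEFINED, modelled as a missing dict key;
# UNDEFINED =?= "Claimed" is FALSE, so such a slot contributes 0 to the count.

PHYSICAL_CPUS = 8

# Slot number -> CPUs it stands for, following the attached config:
# slot1-8 are 1 CPU, slot9-12 are 2 CPUs, slot13-14 are 4 CPUs, slot15 is 8 CPUs.
SLOT_CPUS = {n: 1 for n in range(1, 9)}
SLOT_CPUS.update({n: 2 for n in range(9, 13)})
SLOT_CPUS.update({13: 4, 14: 4, 15: 8})


def claimed_cpus(visible_states):
    """Sum the CPU weight of every slot whose visible State is "Claimed"."""
    return sum(cpus for slot, cpus in SLOT_CPUS.items()
               if visible_states.get(slot) == "Claimed")


def start_allowed(slot, visible_states):
    """Same thresholds as SLOTx_START: the slot's width must fit in the free CPUs."""
    return claimed_cpus(visible_states) <= PHYSICAL_CPUS - SLOT_CPUS[slot]


def admit(requests, share_state):
    """Offer jobs to the given slots one at a time; return the CPUs actually claimed.

    share_state=False mimics slots that cannot see each other's State.
    """
    states = {}
    for slot in requests:
        visible = states if share_state else {}
        if start_allowed(slot, visible):
            states[slot] = "Claimed"
    return sum(SLOT_CPUS[slot] for slot in states)


# One 8-thread job, then three 2-thread jobs, as in the report above.
print(admit([15, 9, 10, 11], share_state=False))  # 14 CPUs on an 8-CPU node
print(admit([15, 9, 10, 11], share_state=True))   # 8: the 2-thread jobs are refused
```

When the states are not visible, the count never leaves 0 and the node ends up with 14 CPUs of work; when they are visible, the same thresholds stop the 2-threaded jobs once the 8-CPU slot is claimed.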

Any suggestions would be much appreciated.

Regards,

Andrea Borsic



# total number of CPUs in use: each claimed slot counts as many CPUs as it represents
NUMBER_OF_CLAIMED_CPUS = \
(\
(1*(slot1_State =?= "Claimed")) + \
(1*(slot2_State =?= "Claimed")) + \
(1*(slot3_State =?= "Claimed")) + \
(1*(slot4_State =?= "Claimed")) + \
(1*(slot5_State =?= "Claimed")) + \
(1*(slot6_State =?= "Claimed")) + \
(1*(slot7_State =?= "Claimed")) + \
(1*(slot8_State =?= "Claimed")) + \
(2*(slot9_State =?= "Claimed")) + \
(2*(slot10_State =?= "Claimed")) + \
(2*(slot11_State =?= "Claimed")) + \
(2*(slot12_State =?= "Claimed")) + \
(4*(slot13_State =?= "Claimed")) + \
(4*(slot14_State =?= "Claimed")) + \
(8*(slot15_State =?= "Claimed"))   \
)

# start single-threaded jobs only if at least 1 CPU is free
SLOT1_START = ($(NUMBER_OF_CLAIMED_CPUS) <= 7)
SLOT2_START = ($(NUMBER_OF_CLAIMED_CPUS) <= 7)
SLOT3_START = ($(NUMBER_OF_CLAIMED_CPUS) <= 7)
SLOT4_START = ($(NUMBER_OF_CLAIMED_CPUS) <= 7)
SLOT5_START = ($(NUMBER_OF_CLAIMED_CPUS) <= 7)
SLOT6_START = ($(NUMBER_OF_CLAIMED_CPUS) <= 7)
SLOT7_START = ($(NUMBER_OF_CLAIMED_CPUS) <= 7)
SLOT8_START = ($(NUMBER_OF_CLAIMED_CPUS) <= 7)
# start 2-threaded jobs only if at least 2 CPUs are free
SLOT9_START = ($(NUMBER_OF_CLAIMED_CPUS) <= 6)
SLOT10_START = ($(NUMBER_OF_CLAIMED_CPUS) <= 6)
SLOT11_START = ($(NUMBER_OF_CLAIMED_CPUS) <= 6)
SLOT12_START = ($(NUMBER_OF_CLAIMED_CPUS) <= 6)
# start 4-threaded jobs only if at least 4 CPUs are free
SLOT13_START = ($(NUMBER_OF_CLAIMED_CPUS) <= 4)
SLOT14_START = ($(NUMBER_OF_CLAIMED_CPUS) <= 4)
# start 8-threaded jobs only if all 8 CPUs are free
SLOT15_START = ($(NUMBER_OF_CLAIMED_CPUS) == 0)

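# advertise 4x the real node (8 CPUs, 8192 MB) so that each slot group
# below (8x1, 4x2, 2x4, 1x8 CPUs) can cover the whole node; the START
# expressions above are what keep actual usage within 8 CPUs
MEMORY = 32768
NUM_CPUS = 32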

SLOT_TYPE_1 = cpus=1, ram=1024
SLOT_TYPE_2 = cpus=2, ram=2048
SLOT_TYPE_3 = cpus=4, ram=4096
SLOT_TYPE_4 = cpus=8, ram=8192
NUM_SLOTS_TYPE_1 = 8
NUM_SLOTS_TYPE_2 = 4
NUM_SLOTS_TYPE_3 = 2
NUM_SLOTS_TYPE_4 = 1
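The numbers in this configuration fit together as follows: each slot type adds up to one full node (8x1, 4x2, 2x4, and 1x8 CPUs, with 8192 MB each), so NUM_CPUS = 32 and MEMORY = 32768 advertise four times the physical 8-CPU machine, and each SLOTx_START threshold is 8 minus the slot's width (7, 6, 4, 0; `<= 0` and `== 0` coincide because the count cannot be negative). The Python sketch below re-derives those numbers from the slot settings. It is an illustration, not HTCondor code, and it assumes, as the config itself does, that slots are numbered in slot-type order.

```python
# Toy consistency check of the attached slot layout; not HTCondor code.
# Assumption: slots are numbered in slot-type order (all type 1, then type 2, ...),
# which is what the slot9-slot15 weights in NUMBER_OF_CLAIMED_CPUS rely on.

PHYSICAL_CPUS = 8

# type -> (cpus, ram in MB), copied from SLOT_TYPE_1..4
SLOT_TYPES = {1: (1, 1024), 2: (2, 2048), 3: (4, 4096), 4: (8, 8192)}
# type -> count, copied from NUM_SLOTS_TYPE_1..4
NUM_SLOTS = {1: 8, 2: 4, 3: 2, 4: 1}


def slot_layout():
    """Map slot number -> CPUs, numbering slots in slot-type order."""
    layout = {}
    slot = 1
    for slot_type in sorted(SLOT_TYPES):
        cpus, _ram = SLOT_TYPES[slot_type]
        for _ in range(NUM_SLOTS[slot_type]):
            layout[slot] = cpus
            slot += 1
    return layout


layout = slot_layout()
total_cpus = sum(layout.values())
total_ram = sum(ram * NUM_SLOTS[t] for t, (_cpus, ram) in SLOT_TYPES.items())
# START threshold per slot width: claimed CPUs must be <= 8 - width
thresholds = {w: PHYSICAL_CPUS - w for w in sorted(set(layout.values()))}

print(total_cpus, total_ram)  # 32 32768, i.e. NUM_CPUS and MEMORY
print(thresholds)             # {1: 7, 2: 6, 4: 4, 8: 0}
```

With this numbering, slot1 to slot8 are all 1-CPU slots, so each of them needs its own `1*(slotN_State =?= "Claimed")` term in NUMBER_OF_CLAIMED_CPUS for the count to be complete.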



_______________________________________________
Condor-users mailing list
The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/


--
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.