
[HTCondor-users] dynamic slots configuration



Hello,

I'm running ATLAS (LHC) single-core and multi-core jobs, and I have observed
that nodes with 20 cores behave as black holes when they are running a mix
of multi-core and single-core jobs. I have configured dynamic slots and
defragmentation, and I'm wondering whether something is wrong in my
configuration. I have split each machine into 6 slots, 2 slots with 8 cores
and 4 slots with 1 core, in order to use all 20 cores:

CLAIM_WORKLIFE=3600
CONTINUE=TRUE
JOB_RENICE_INCREMENT=10
KILL=FALSE
NUM_SLOTS=6
NUM_SLOTS_TYPE_1=2
SLOT_TYPE_1_PARTITIONABLE=TRUE
SLOT_TYPE_1=cpus=8
NUM_SLOTS_TYPE_2=4
SLOT_TYPE_2_PARTITIONABLE=TRUE
SLOT_TYPE_2=cpus=1
PREEMPT=FALSE
RANK=0
SUSPEND=FALSE
SLOT_TYPE_1_CONSUMPTION_POLICY=False
SLOT_TYPE_2_CONSUMPTION_POLICY=False
CONSUMPTION_POLICY=False
CLAIM_PARTITIONABLE_LEFTOVERS=False
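
For comparison, a minimal sketch of the alternative I am considering (assuming
the rest of the configuration stays as above) is a single partitionable slot
owning all 20 cores, so condor can carve out 8-core and 1-core dynamic slots
on demand instead of the fixed 8/8/1/1/1/1 split leaving cores stranded:

# Sketch only: one partitionable slot with all cores;
# dynamic slots of any size are split off as jobs arrive.
NUM_SLOTS = 1
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = cpus=100%
SLOT_TYPE_1_PARTITIONABLE = TRUE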

Also, below you can see my defragmentation configuration file:

SETTABLE_ATTRS_CONFIG = DEFRAG_MAX_WHOLE_MACHINES, DEFRAG_MAX_CONCURRENT_DRAINING, DEFRAG_DRAINING_MACHINES_PER_HOUR
ENABLE_RUNTIME_CONFIG=TRUE

DEFRAG_MAX_WHOLE_MACHINES = 1
DEFRAG_MAX_CONCURRENT_DRAINING = 1
DEFRAG_DRAINING_MACHINES_PER_HOUR = 20
DEFRAG_WHOLE_MACHINE_EXPR = (Cpus >= 8 && PartitionableSlot)

and the defragmentation script looks like this:

#!/bin/bash


function setDefrag () {

   defrag_address=$(condor_status -any -autoformat MyAddress -constraint 'MyType =?= "Defrag"')

   echo "Setting DEFRAG_MAX_CONCURRENT_DRAINING=$3, DEFRAG_DRAINING_MACHINES_PER_HOUR=$4, DEFRAG_MAX_WHOLE_MACHINES=$5 (idle multicore=$1, running multicore=$2)"

   /usr/bin/condor_config_val -address "$defrag_address" -rset "DEFRAG_MAX_CONCURRENT_DRAINING = $3" >& /dev/null
   /usr/bin/condor_config_val -address "$defrag_address" -rset "DEFRAG_DRAINING_MACHINES_PER_HOUR = $4" >& /dev/null
   /usr/bin/condor_config_val -address "$defrag_address" -rset "DEFRAG_MAX_WHOLE_MACHINES = $5" >& /dev/null
   /usr/sbin/condor_reconfig -daemon defrag >& /dev/null
}

# Count only jobs with RequestCpus exactly 8; -x avoids also matching e.g. 18 or 48.
idle_jobs=$(condor_q atlas01 -constraint 'JobStatus==1' -af RequestCpus -name arc6atlas1.nipne.ro | grep -xc 8)
running_jobs=$(condor_q atlas01 -constraint 'JobStatus==2' -af RequestCpus -name arc6atlas1.nipne.ro | grep -xc 8)

if [ "$idle_jobs" -gt 15 ] && [ "$running_jobs" -lt 150 ]
then
   setDefrag "$idle_jobs" "$running_jobs" 40 25 120
elif [ "$idle_jobs" -gt 15 ] && [ "$running_jobs" -ge 150 ]
then
   setDefrag "$idle_jobs" "$running_jobs" 4 4 120
else
   setDefrag "$idle_jobs" "$running_jobs" 1 1 4
fi
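
A side note on the RequestCpus counting: a plain substring match like
`grep 8` also hits values such as 18 or 48, while `grep -xc 8` counts only
lines that are exactly "8". A quick sketch with made-up values standing in
for the condor_q output shows the difference:

```shell
# Simulated `condor_q -af RequestCpus` output; 48 and 18 are not 8-core jobs.
printf '8\n48\n1\n8\n18\n' > /tmp/requestcpus.txt

# Substring match overcounts: it also hits the 48 and 18 lines.
grep -c 8 /tmp/requestcpus.txt     # prints 4

# Whole-line match counts only the real 8-core jobs.
grep -xc 8 /tmp/requestcpus.txt    # prints 2
```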


Does my configuration look OK to you? Or should I look elsewhere for the
problem (i.e. not in the Condor configuration)?

Thank you,
Mihai



Dr. Mihai Ciubancan
IT Department
National Institute of Physics and Nuclear Engineering "Horia Hulubei"
Str. Reactorului no. 30, P.O. BOX MG-6
077125, Magurele - Bucharest, Romania
http://www.ifin.ro
Work:   +40214042360
Mobile: +40761345687
Fax:    +40214042395