
[Condor-users] RequestCpus > 1 and Dynamic (Partitionable) Slots



I've confirmed on many tests that jobs with RequestCpus > 1 don't seem to be compatible with dynamic slots.

Is this a condor version issue?  I'm running 7.4.4 on x86_64

Our system has many, many jobs that consume between 1 and 8 CPUs each, and many SMP machines with 4 and 32 cores.

(I can use condor_qedit to get a job to run on a dynamic slot just by switching its RequestCpus to 1. It will not run otherwise, even if START=TRUE.)

The message from condor_q -analyze is "2 reject your job because of their own requirements" (or however many slots are partitionable).

It would be nice to be able to take a job id and a node, and then ask for an explanation of why it's not running on that node.
 
If I run a bunch of jobs with 1 CPU each, the dynamic slot works as advertised, forking off new slots and reclaiming them later, quite nicely. I even like to leave some slots this way, since they are so much better about resource utilization in every other respect.

I've noticed one other thread posting about this, but have never seen a final solution.

https://www-auth.cs.wisc.edu/lists/condor-users/2009-June/msg00065.shtml

Has anyone gotten dynamic slots to work with RequestCpus > 1... where it actually decrements the number of cpus from those remaining?
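
To make the expected behaviour concrete, here is a minimal Python sketch of the accounting being asked about. It is a toy model only, not Condor code: the class name PartitionableSlot and its claim/release methods are invented for illustration, and the 4-CPU machine and job id 6490.0 are taken from the output below (6491.0 is a made-up second job).

```python
class PartitionableSlot:
    """Toy model of a partitionable slot (hypothetical, not Condor's implementation)."""

    def __init__(self, total_cpus):
        self.total_cpus = total_cpus
        self.free_cpus = total_cpus   # CPUs still left on the partitionable slot
        self.dynamic = {}             # job id -> CPUs held by its dynamic slot

    def claim(self, job_id, request_cpus=1):
        """Carve out a dynamic slot if enough CPUs remain; return True on success."""
        if request_cpus > self.free_cpus:
            return False
        self.free_cpus -= request_cpus
        self.dynamic[job_id] = request_cpus
        return True

    def release(self, job_id):
        """Return a finished job's CPUs to the partitionable slot."""
        self.free_cpus += self.dynamic.pop(job_id)


slot = PartitionableSlot(total_cpus=4)
started = slot.claim("6490.0", request_cpus=2)   # job with RequestCpus = 2
blocked = slot.claim("6491.0", request_cpus=4)   # only 2 CPUs remain, so this waits
print(started, blocked, slot.free_cpus)
```

In this model a job with RequestCpus = 2 leaves 2 CPUs on the parent slot, and those CPUs come back once the job is released. That is the behaviour I would expect from the real thing.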

> condor -version
$CondorVersion: 7.2.4 Apr 11 2010 $
$CondorPlatform: X86_64-LINUX_DEBIAN_UNKNOWN $

>condor_status ea-morpheus -l | grep Cpu
CpuIsBusy = false
Cpus = 1
CpuBusyTime = 0
CpuBusy = ( ( LoadAvg - CondorLoadAvg ) >= 0.500000 )
TotalCpus = 4

Machine doing nothing:

>cat /srv/condor/ea-morpheus
DAEMON_LIST = MASTER, STARTD
NUM_SLOTS=1
SLOT_TYPE_1=Cpu=4,auto
SLOT_TYPE_1_PARTITIONABLE=TRUE
NUM_SLOTS_TYPE_1=1
START=TRUE

JOB not running:

> condor_q 6490.0 -l | grep Req
AutoClusterAttrs = "JobUniverse,LastCheckpointPlatform,NumCkpts,RequestCpus,RequestDisk,RequestMemory,FileSystemDomain,DiskUsage,ImageSize,Requirements,NiceUser,ConcurrencyLimits"
RequestDisk = DiskUsage
RequestMemory = 500
RequestCpus = 2
Requirements = ( Memory >= 500 ) && ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= DiskUsage ) && ( ( RequestMemory * 1024 ) >= ImageSize ) && ( TARGET.FileSystemDomain == MY.FileSystemDomain )

- Erik