
Re: [HTCondor-users] how to manipulate the number of available slots on a running WN



I think you might be able to pull this off by supplying your own WithinResourceLimits expression.

If you look at a slot's Requirements expression, you see something like this:

Requirements = (START) && (IsValidCheckpointPlatform) && (WithinResourceLimits)
Start = SuspendedByAdmin =!= true
WithinResourceLimits = (ifThenElse(TARGET._condor_RequestCpus =!= undefined,MY.Cpus > 0 && TARGET._condor_RequestCpus <= MY.Cpus,ifThenElse(TARGET.RequestCpus =!= undefined,MY.Cpus > 0 && TARGET.RequestCpus <= MY.Cpus,1 <= MY.Cpus)) && ifThenElse(TARGET._condor_RequestMemory =!= undefined,MY.Memory > 0 && TARGET._condor_RequestMemory <= MY.Memory,ifThenElse(TARGET.RequestMemory =!= undefined,MY.Memory > 0 && TARGET.RequestMemory <= MY.Memory,false)) && ifThenElse(TARGET._condor_RequestDisk =!= undefined,MY.Disk > 0 && TARGET._condor_RequestDisk <= MY.Disk,ifThenElse(TARGET.RequestDisk =!= undefined,MY.Disk > 0 && TARGET.RequestDisk <= MY.Disk,false))

The WithinResourceLimits expression is normally automatic. If you don't set it in your configuration, it is generated for you from the resources AND CUSTOM RESOURCES that you define in your configuration.

But you can configure it yourself, and add a clause to it that prevents matching when the number of free cpus is
less than the "low power" number while in low power state. You can also greatly simplify this expression, because right now
the parts of the expression that refer to _condor_Request* don't really come into play, so you can leave them out without harm.

So you can configure:

WithinResourceLimits = My.Cpus > (MY.LowPower*8) && \
  (My.Cpus - (MY.LowPower*8)) >= IfThenElse(TARGET.RequestCpus is UNDEFINED, 1, TARGET.RequestCpus) && \
  My.Memory > 0 && My.Memory >= TARGET.RequestMemory && \
  My.Disk > 0 && My.Disk >= TARGET.RequestDisk && \
  (TARGET.RequestGPUs is UNDEFINED || My.Gpus >= TARGET.RequestGPUs)

(Be sure to have sub-clauses for all of the custom resources you define on your Startds.)

When the LowPower attribute on the slot is true, 8 cores of the slot will be unmatchable. When it is false, all of the cores can be matched.
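
For example, on a 16 core partitionable slot with LowPower true, a 1 core job only matches while My.Cpus (the free cores) is 9 or more, so at most 8 jobs end up running. The expression assumes the slot ad actually carries a LowPower attribute that can be changed on the running startd. A minimal, untested sketch of one way to do that, reusing the knobs from the original post, is below. The attribute name LowPower is taken from the expression above, the host name wn01.example.org is a placeholder, and enabling runtime configuration is my assumption about how you would want to flip it remotely:

```
# Sketch only: LowPower is the custom attribute used in WithinResourceLimits;
# wn01.example.org is a placeholder host name.

# Publish LowPower in the slot ads, off by default.
LowPower = False
STARTD_ATTRS = $(STARTD_ATTRS) LowPower

# Allow an administrator to change LowPower on the running startd.
ENABLE_RUNTIME_CONFIG = True
STARTD.SETTABLE_ATTRS_ADMINISTRATOR = LowPower

# Then, from a host with ADMINISTRATOR access (assumed workflow):
#   condor_config_val -name wn01.example.org -startd -rset "LowPower = True"
#   condor_reconfig -name wn01.example.org
```

Values set with -rset are not persistent, so they should be gone after the startd restarts. If the power save state has to survive a restart, persistent configuration (-set) would be the thing to look at instead.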

-tj

-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Beyer, Christoph
Sent: Thursday, June 13, 2019 1:17 PM
To: htcondor-users <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] how to manipulate the number of available slots on a running WN

Hi Tj,

Well, I would just like to remotely change some kind of ClassAd attribute on the host to persuade condor that the machine has only half the slots it used to have, so that running jobs are not affected. But slots that become free should not be matched again until the number of running slots is smaller than PowerSaveCpus.

Say a 16 core machine has 16 jobs running: I pull the trigger, and the host does not start any more jobs until it is down to 7, and then runs only 8 slots until I withdraw the powersave mode.

Best
Christoph

-- 
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx

----- Original Message -----
From: "johnkn" <johnkn@xxxxxxxxxxx>
To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
Sent: Thursday, June 13, 2019 17:03:47
Subject: Re: [HTCondor-users] how to manipulate the number of available slots on a running WN

Do you want this change to evict running jobs to get into powersave mode?
Do you want the machine to drain down to powersave mode?
Or do you plan to turn on powersave mode only when the machine is idle (or nearly so)?

-tj

-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Beyer, Christoph
Sent: Thursday, June 13, 2019 6:46 AM
To: htcondor-users <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] how to manipulate the number of available slots on a running WN


Hi,

I have another, maybe typical, administrative task: work on the power lines is scheduled, and I have to scale down the power consumption of the pool.

To do so, my initial idea is to reduce the number of available slots per condor WN ahead of the work:

(on the WN all slots are partitionable) 

PowerSave = false
PowerSaveCpus = 2
STARTD.SETTABLE_ATTRS_ADMINISTRATOR = StartJobs PowerSave
STARTD_ATTRS = StartJobs, PowerSave, $(STARTD_ATTRS)
NUM_CPUS = ifThenElse($(PowerSave), $(PowerSaveCpus), $(DETECTED_CPUS))

It works so far in the sense that setting 'PowerSave' to true sets 'NUM_CPUS' accordingly. Unfortunately, I seem to need a condor restart for the change to take effect and actually reduce the number of slots (which is not what I want)?

Is there a cleverer way to reduce the number of slots through HTCondor configuration, or any way to make the pool node honor the change without a restart?

Best
Christoph

-- 
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
