From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx>
On Behalf Of John M Knoeller
Sent: Thursday, June 25, 2020 20:35
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Machine classadd inside SYSTEM_PERIODIC_HOLD ?
I was reminded today of a couple of things that might be of interest to you.
Keep in mind that when a job is IDLE (JobStatus == 1) you probably don’t want to be evaluating this particular expression, since there is no meaningful value of TARGET.Disk (or whatever equivalent you come up with)
in that case.
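A minimal sketch of such a guard, assuming the machine's disk has been copied into the job ad as MachineAttrDisk0 (for example with job_machine_attrs = Disk in the submit file; that attribute is only one illustration of an "equivalent", not something this thread prescribes):

```
# Only compare disk values while the job is RUNNING (JobStatus == 2).
# MachineAttrDisk0 is an assumed attribute name; substitute your own equivalent.
SYSTEM_PERIODIC_HOLD = (JobStatus == 2) && (MachineAttrDisk0 =!= UNDEFINED) && (DiskUsage > MachineAttrDisk0)
```

Because the JobStatus test comes first, the expression is simply false for idle jobs rather than depending on a value that has no meaning yet.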
Also, the job’s DiskUsage is given an initial value by condor_submit, but it is also updated from values calculated on the execute node as the job runs. These values are passed back to the Schedd (through the Shadow), but
the values in the Schedd are not updated very frequently. The values in the Shadow *are* updated frequently, at least as often as they are updated on the execute node.
PERIODIC_HOLD is evaluated by the Shadow while the job is running. So it’s useful to keep in mind that the values that condor_q will show you for job attributes like DiskUsage are not necessarily the values that PERIODIC_HOLD
will see when evaluating policy while the job is running. condor_q will show you what is stored in the Schedd, but the Shadow is working with fresher data.
Also for this case, you have another policy option that I should have mentioned: you can configure the STARTD to put the job on hold if the DiskUsage exceeds the disk allotted to the slot. Since this is STARTD policy,
you can use a different value as the limit on each STARTD.
Your configuration might look something like this:
# Have the STARTD put a job on hold if its disk usage is greater than the disk assigned to the slot.
# This policy ignores VM universe jobs since they use a different method to allocate disk.
DISK_EXCEEDED = (JobUniverse != 13 && DiskUsage =!= UNDEFINED && DiskUsage > Disk)
HOLD_REASON_DISK_EXCEEDED = disk usage exceeded disk allotted to the job
use POLICY : WANT_HOLD_IF(DISK_EXCEEDED, $(HOLD_SUBCODE_DISK_EXCEEDED:103), $(HOLD_REASON_DISK_EXCEEDED))
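To illustrate a different limit on each STARTD, one possibility is to swap the DISK_EXCEEDED definition for one that compares against a fixed local cap. This is only a sketch: LOCAL_DISK_LIMIT_KB is a made-up macro name, the value is a placeholder, and it assumes DiskUsage is reported in KiB.

```
# Hypothetical per-machine cap in KiB; set a different value in each STARTD's local config.
LOCAL_DISK_LIMIT_KB = 50000000
# Same shape as the policy above, but compared against the local cap instead of the slot's Disk.
DISK_EXCEEDED = (JobUniverse != 13 && DiskUsage =!= UNDEFINED && DiskUsage > $(LOCAL_DISK_LIMIT_KB))
```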
Many thanks for your insights, I’ll look into this :]
There is no easy way to add a TARGET ad to the evaluation of SYSTEM_PERIODIC_HOLD. Doing that would require us to change the code in the Schedd, and even then the TARGET would not always be available when SYSTEM_PERIODIC_HOLD
was evaluated and would not be updated during the course of the job. The Schedd only sees the Machine ad once, when it gets a match from the Negotiator.
That does not mean that you can’t do something like this; you will just have to go about it a different way. If you control the job,
you can use $$() expansion or job_machine_attrs to inject an attribute into the job whose value comes from the machine,
then look at that attribute from the SYSTEM_PERIODIC_HOLD expression.
Unfortunately, the Disk attribute of a Machine ClassAd is dynamic, and not necessarily updated as frequently as you would need,
so you are better off just tagging the machines that have small disks with a special attribute at config time. Something like
this will be more reliably available for $$() expansion:
# add to the config of machines that have small disks
STARTD_ATTRS = $(STARTD_ATTRS) SmallDiskMachine
SmallDiskMachine = true
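As a rough sketch of how the Schedd side might then use that tag: the fragment below assumes the attribute is recorded into the job ad via SYSTEM_JOB_MACHINE_ATTRS (a submit-file line job_machine_attrs = SmallDiskMachine should have the same effect for jobs you control), which shows up in the job as MachineAttrSmallDiskMachine0. SMALL_DISK_LIMIT_KB is a made-up macro with a placeholder value in KiB.

```
# Schedd config: record the machine's SmallDiskMachine tag in every job ad at match time.
SYSTEM_JOB_MACHINE_ATTRS = $(SYSTEM_JOB_MACHINE_ATTRS) SmallDiskMachine

# Hypothetical disk cap (KiB) for jobs that land on tagged machines.
SMALL_DISK_LIMIT_KB = 10000000

# Hold running jobs on tagged machines once DiskUsage passes the cap.
# =?= keeps the test false when the tag was never recorded for the job.
SYSTEM_PERIODIC_HOLD = (JobStatus == 2) && (MachineAttrSmallDiskMachine0 =?= true) && (DiskUsage > $(SMALL_DISK_LIMIT_KB))
```

Note that this replaces any SYSTEM_PERIODIC_HOLD you already have, so an existing policy would need to be combined with it by hand.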