I was reminded today of a couple of things that might be of interest to you.
Keep in mind that when a job is IDLE (JobStatus == 1) you probably don't want to be evaluating this particular expression, since TARGET.Disk (or whatever equivalent you come up with) has no meaningful value in that case.
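For example, a periodic-hold expression along these lines would only fire while the job is actually running (JobStatus == 2). This is just a sketch - the 10 GB limit is a placeholder, and note that DiskUsage is reported in KiB:

SYSTEM_PERIODIC_HOLD = (JobStatus == 2) && (DiskUsage =!= UNDEFINED) && (DiskUsage > 10000000)
SYSTEM_PERIODIC_HOLD_REASON = "disk usage exceeded the configured limit"

Guarding on JobStatus avoids evaluating the comparison when the job is idle and the relevant attributes are undefined.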
Also, the job's DiskUsage is given an initial value by condor_submit, but it is also updated with values calculated on the execute node as the job runs. These values are passed back to the Schedd (through the Shadow), but the values in the Schedd are not updated very frequently. The values in the Shadow *are* updated frequently, at least as often as they are updated on the execute node.
PERIODIC_HOLD is evaluated by the Shadow while the job is running, so keep in mind that the values condor_q shows you for job attributes like DiskUsage are not necessarily the values PERIODIC_HOLD will see when evaluating policy. condor_q shows you what is stored in the Schedd, but the Shadow is working with fresher data.
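To see what the Schedd currently has for a job (remembering that it may lag behind the Shadow), you can query the attribute directly - 123.0 here is a placeholder job id:

condor_q 123.0 -af DiskUsage

The -af (autoformat) option prints just the requested attribute values, which makes it easy to spot when the Schedd's copy is stale relative to what the job is actually consuming on the execute node.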
Also for this case, you have another policy option that I should have mentioned: you can configure the STARTD to put the job on hold if its DiskUsage exceeds the disk allotted to the slot. Since this is STARTD policy, you can use a different limit on each STARTD.
Your configuration might look something like this:
# Have the STARTD put a job on hold if its disk usage is greater than the disk assigned to the slot.
# This policy ignores VM universe jobs, since they use a different method to allocate disk.
DISK_EXCEEDED = (JobUniverse != 13 && DiskUsage =!= UNDEFINED && DiskUsage > Disk)
HOLD_REASON_DISK_EXCEEDED = disk usage exceeded disk allotted to the job
use POLICY : WANT_HOLD_IF(DISK_EXCEEDED, $(HOLD_SUBCODE_DISK_EXCEEDED:103), $(HOLD_REASON_DISK_EXCEEDED))
Many thanks for your insights, I’ll look into this :]