
[HTCondor-users] Machine ClassAd inside SYSTEM_PERIODIC_HOLD?



We have startd nodes with differing hardware specs, especially disks (they differ with each purchase, that is to say every year).

On some old nodes we have 2 x 250GB disks; on the latest ones we have 3 x 1TB disks.


To protect machines from jobs that fill up their disks, we set a SYSTEM_PERIODIC_HOLD expression that takes the job's disk usage into account. It works well, but it is very restrictive because of those 250GB nodes.


I am therefore trying to set a periodic hold that is generous on machines with more disk space and more restrictive on those with less.

The jobs we receive are grid jobs, and they carry no disk requirement.


I tried to define the hold expression so that it uses the machine's disk size to compute a reasonable limit, one that could exceed the per-core limit we usually apply (currently 25 GB per core). However, it seems machine ClassAds are not defined when the hold expressions are evaluated.
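As a rough sketch of the intended limit (a Python model of the arithmetic, not a working HTCondor configuration), the per-core budget is the machine's TotalDisk divided by its TotalCpus. This assumes TotalDisk is reported in KiB and treats "GB"/"TB" as GiB/TiB; the function names, the choice of keeping the current 25 GB per core as a floor, and the core counts in the example are all hypothetical.

```python
# Hypothetical model of a per-machine, per-core disk limit.
# Assumption: TotalDisk is in KiB, and "GB" is treated as GiB.
GIB_IN_KIB = 1024 * 1024

# The per-core limit currently applied everywhere (25 GB per core).
FLOOR_GIB_PER_CORE = 25


def per_core_disk_kib(total_disk_kib: int, total_cpus: int) -> int:
    """Machine-wide disk divided evenly across its cores, in KiB."""
    if total_cpus <= 0:
        raise ValueError("total_cpus must be positive")
    return total_disk_kib // total_cpus


def disk_limit_kib(total_disk_kib: int, total_cpus: int) -> int:
    """Hypothetical policy: keep the 25 GB/core floor, but allow more
    on machines whose per-core share of disk is larger."""
    floor_kib = FLOOR_GIB_PER_CORE * GIB_IN_KIB
    return max(floor_kib, per_core_disk_kib(total_disk_kib, total_cpus))


if __name__ == "__main__":
    # Disk sizes come from the post; the core counts are hypothetical.
    old_node = disk_limit_kib(500 * GIB_IN_KIB, 32)       # 2 x 250GB
    new_node = disk_limit_kib(3 * 1024 * GIB_IN_KIB, 64)  # 3 x 1TB
    print(old_node // GIB_IN_KIB, new_node // GIB_IN_KIB)
```

On a hypothetical 32-core node with 500 GiB, the per-core share (about 15.6 GiB) is below the floor, so 25 GiB applies; a hypothetical 64-core node with 3 TiB would get 48 GiB per core.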

For instance, with:

   MAX_DISK_KB = debug( TARGET.TotalDisk / TARGET.TotalCpus )

and then the hold expression:

   (JobStatus == 1 || JobStatus == 2) && ((DiskUsage > $(MAX_DISK_KB) || ResidentSetSize > JobMemoryLimit * 2 ))


the shadow log then shows:

1592996884 2020/06/24 13:08:04 (83.0) (1308254): Classad debug: [0.00215ms] TARGET --> UNDEFINED
1592996884 2020/06/24 13:08:04 (83.0) (1308254): Classad debug: [0.10014ms] TARGET.TotalCpus --> UNDEFINED
1592996884 2020/06/24 13:08:04 (83.0) (1308254): Classad debug: [0.55599ms] TARGET.TotalDisk / TARGET.TotalCpus --> UNDEFINED
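The consequence of that UNDEFINED for the whole hold expression can be illustrated with a minimal Python model of ClassAd three-valued logic, with None standing in for UNDEFINED. It assumes the usual ClassAd rules (a comparison with UNDEFINED yields UNDEFINED, TRUE || UNDEFINED is TRUE, FALSE || UNDEFINED is UNDEFINED, TRUE && UNDEFINED is UNDEFINED) and that a periodic hold only fires when the expression is exactly TRUE. The helper names are illustrative, not HTCondor API.

```python
from typing import Optional

# ClassAd-style tri-state value: True, False, or None (None = UNDEFINED).
Tri = Optional[bool]


def gt(a, b) -> Tri:
    """Comparison: any UNDEFINED operand makes the result UNDEFINED."""
    if a is None or b is None:
        return None
    return a > b


def or_(a: Tri, b: Tri) -> Tri:
    """TRUE wins; otherwise UNDEFINED propagates."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def and_(a: Tri, b: Tri) -> Tri:
    """FALSE wins; otherwise UNDEFINED propagates."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def should_hold(job: dict, max_disk_kb) -> bool:
    """Model of the hold expression from the post."""
    idle_or_running = job["JobStatus"] in (1, 2)
    disk_over = gt(job["DiskUsage"], max_disk_kb)
    mem_over = gt(job["ResidentSetSize"], job["JobMemoryLimit"] * 2)
    result = and_(idle_or_running, or_(disk_over, mem_over))
    # Assumption: only a definite TRUE puts the job on hold.
    return result is True


if __name__ == "__main__":
    job = {"JobStatus": 2, "DiskUsage": 10**9,
           "ResidentSetSize": 100, "JobMemoryLimit": 100}
    print(should_hold(job, None), should_hold(job, 1000))
```

With the limit UNDEFINED, even an enormous DiskUsage cannot trigger the hold; only the memory clause can still make the expression TRUE.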


=> Question: is there a "simple" way to get this working? Is there any way to make machine ClassAds available when SYSTEM_PERIODIC_HOLD is evaluated?


Of course, we could get rid of those 250GB disks and nodes, but for now we still have to cope with them, and we will face the same issue again and again when new machines arrive with, say, 3 x 3TB NVMe disks or whatever…


Thanks && regards