
[HTCondor-users] Adding custom job classads on condor_starter nodes



Hello Experts,

We are running HTCondor jobs on preemptible Google Cloud instances. I want to add an attribute to the job ClassAd to identify the jobs affected by preempted instances.

In the schedd configuration file:

SYSTEM_JOB_MACHINE_ATTRS = $(SYSTEM_JOB_MACHINE_ATTRS) nodehealth

On the startd, the attribute is advertised in the machine ClassAd:

test.example:/etc/condor/config.d# condor_status -compact `hostname` -af machine nodehealth
test.example.com False1

I can see the following in the job ClassAd:

$ condor_q -run -af jobruncount MachineAttrnodehealth0 MachineAttrnodehealth1
1 False1 undefined
1 False1 undefined

But when I change the value of the attribute on the execute node (by directly modifying the HTCondor configuration and running condor_reconfig), the change is not reflected in the job ClassAd.

I have seen this message in the log file. Our executor directory is only

06/02/20 06:49:38 slot1_1: Failed to open '/spare/condor/dir_418909/.update.ad.tmp' for writing update ad: No such file or directory (2).

However, I do see that the .update.ad file inside the execute directory has the updated value, while .job.ad and .machine.ad still show the old value, since they are not updated dynamically.

# grep nodehealth .update.ad
nodehealth = "False4"

# grep nodehealth .job.ad
MachineAttrnodehealth0 = "False1"

# grep nodehealth .machine.ad
nodehealth = "False1"

# condor_status -compact `hostname` -af machine nodehealth
test.example.com False4
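
To illustrate the observation above, here is a minimal sketch of how a process running inside the job could read the current nodehealth value from .update.ad. It assumes the file holds one "name = value" pair per line, as the grep output suggests, and that the job can locate its execute directory through the _CONDOR_SCRATCH_DIR environment variable. The helper names read_ad_attribute and current_nodehealth are made up for this example. It only reads the value from inside the sandbox; it does not change the job ClassAd held by the schedd.

```python
import os
import re
from pathlib import Path

# One "name = value" pair per line (assumed layout of the ad files).
_ATTR_LINE = re.compile(r'^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*?)\s*$')


def read_ad_attribute(ad_path, name):
    """Return the value of `name` from a plain-text ad file, or None.

    Hypothetical helper: surrounding double quotes are stripped, and the
    attribute name is compared case-insensitively.
    """
    try:
        text = Path(ad_path).read_text()
    except OSError:
        # File missing or unreadable, e.g. no update has been written yet.
        return None
    for line in text.splitlines():
        match = _ATTR_LINE.match(line)
        if match and match.group(1).lower() == name.lower():
            value = match.group(2)
            if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
                value = value[1:-1]
            return value
    return None


def current_nodehealth(scratch_dir=None):
    """Look up nodehealth in .update.ad inside the job's execute directory."""
    scratch = scratch_dir or os.environ.get("_CONDOR_SCRATCH_DIR", ".")
    return read_ad_attribute(os.path.join(scratch, ".update.ad"), "nodehealth")


if __name__ == "__main__":
    print(current_nodehealth())
```

A job wrapper could call current_nodehealth() periodically and record the result in its own output, which would at least mark the affected jobs even though condor_q keeps showing the value captured at job start.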

After a hold/release the job picks up the new value, but I want to update the value in the running instance of the job.

I have gone through link [1], but it did not help either.

Any input is highly appreciated.

[1] https://www-auth.cs.wisc.edu/lists/htcondor-users/2016-September/msg00034.shtml

Thanks & Regards,
Vikrant Aggarwal