
Re: [HTCondor-users] Adding custom job classads on condor_starter nodes



Hi John,

Thanks for your response.

According to the documentation, STARTD_JOB_ATTRS serves a different purpose: it advertises job ClassAd attributes in the machine ClassAd. In my case I want to advertise machine ClassAd attributes in the job ClassAd.

STARTD_JOB_ATTRS
When the machine is claimed by a remote user, the condor_startd can also advertise arbitrary attributes from the job ClassAd in the machine ClassAd. List the attribute names to be advertised. NOTE: Since these are already ClassAd expressions, do not do anything unusual with strings. By default, the job ClassAd attributes JobUniverse, NiceUser, ExecutableSize and ImageSize are advertised into the machine ClassAd. This setting was formerly called STARTD_JOB_EXPRS. The older name is still supported, but support for the older name may be removed in a future version of HTCondor.


I also checked STARTD_ATTRS, but it doesn't seem to be the right candidate either, since it only adds attributes to the machine ClassAd. I thought of a hack on the submit side, but it doesn't work, so I discarded the idea.

submit_attrs = $(submit_attrs) nodeempted
nodeempted = ifthenelse(!isundefined(target.nodehealth),target.nodehealth,False)

But in the job ClassAd, nodeempted always comes out as False.
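
A possible explanation, offered as a guess rather than a confirmed diagnosis: when the expression is evaluated against the job ad alone there is no target ad, so target.nodehealth is undefined and the ifthenelse falls through to False. The Python sketch below only mimics that logic, with None standing in for undefined. The helper names are hypothetical and it does not use HTCondor's ClassAd library.

```python
# Toy model of the submit-side expression; None plays the role of ClassAd "undefined".
# is_undefined and nodeempted are hypothetical helpers, not HTCondor functions.

def is_undefined(value):
    return value is None

def nodeempted(target_ad):
    """Mimic ifthenelse(!isundefined(target.nodehealth), target.nodehealth, False)."""
    health = None if target_ad is None else target_ad.get("nodehealth")
    return health if not is_undefined(health) else False

# No target ad in scope (e.g. the job ad evaluated on its own).
print(nodeempted(None))                      # False
# With a matched machine ad in scope, the machine's value would come through.
print(nodeempted({"nodehealth": "False1"}))  # False1
```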

Thanks & Regards,
Vikrant Aggarwal


On Tue, Jun 2, 2020 at 8:33 PM John M Knoeller <johnkn@xxxxxxxxxxx> wrote:

SYSTEM_JOB_MACHINE_ATTRS is a list of Machine attributes copied from the match ad (which the Negotiator sends to the Schedd) into the Job ad when a job starts running. This is something that the Schedd does and only when a job starts.

A change on the execute node to the value of the attribute that SYSTEM_JOB_MACHINE_ATTRS copies will not be reflected in jobs until the next time a job starts on that machine *as the result of a full negotiation cycle*, so this can take a very long time to propagate, and the value will never change while a job is running.
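
The snapshot behaviour described above can be modelled in a few lines of Python. This is only an illustrative sketch, not HTCondor code: the start_job helper and the plain dictionaries standing in for ClassAds are hypothetical, and the MachineAttr<name>0 naming simply follows the condor_q output quoted further down.

```python
# Toy model of SYSTEM_JOB_MACHINE_ATTRS: values are copied once, at job start.
# Plain dicts stand in for ClassAds; start_job is a hypothetical helper.

def start_job(job_ad, machine_ad, attrs):
    """Copy the listed machine attributes into a new job ad as MachineAttr<name>0."""
    started = dict(job_ad)
    for name in attrs:
        if name in machine_ad:
            started[f"MachineAttr{name}0"] = machine_ad[name]
    return started

machine_ad = {"Machine": "test.example.com", "nodehealth": "False1"}
job_ad = start_job({"JobRunCount": 1}, machine_ad, ["nodehealth"])

# The execute node later changes its value; the running job's copy is untouched.
machine_ad["nodehealth"] = "False4"
print(job_ad["MachineAttrnodehealth0"])     # still the value seen at job start

# Only a fresh start (for example after hold/release) picks up the new value.
restarted = start_job({"JobRunCount": 2}, machine_ad, ["nodehealth"])
print(restarted["MachineAttrnodehealth0"])
```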

For something like node health, which can change as the job runs, I think you want to configure STARTD_JOB_ATTRS on the execute node instead of SYSTEM_JOB_MACHINE_ATTRS on the submit node.

-tj

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of ervikrant06@xxxxxxxxx
Sent: Tuesday, June 2, 2020 6:40 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Adding custom job classads on condor_starter nodes

Hello Experts,

We are running HTCondor jobs on preemptible Google Cloud instances. I want to add something to the job ClassAd to identify the jobs affected by preempted instances.

In the schedd configuration file:

SYSTEM_JOB_MACHINE_ATTRS = $(SYSTEM_JOB_MACHINE_ATTRS) nodehealth

On the startd, the attribute is advertised in the machine ClassAd:

test.example:/etc/condor/config.d# condor_status -compact `hostname` -af machine nodehealth
test.example.com False1

I can see the following in the job ClassAd:

$ condor_q -run -af jobruncount MachineAttrnodehealth0 MachineAttrnodehealth1
1 False1 undefined
1 False1 undefined

But when I change the value of the attribute (by directly modifying the HTCondor configuration and running condor_reconfig) on the execute node, the change is not reflected in the job ad.

I have seen this message in the log file. Our executor directory is only

06/02/20 06:49:38 slot1_1: Failed to open '/spare/condor/dir_418909/.update.ad.tmp' for writing update ad: No such file or directory (2).

However, I do see that the .update.ad file inside the execution directory has the updated value, but the machine and job ads still reflect the old value, as they can't change dynamically.

# grep nodehealth .update.ad
nodehealth = "False4"

# grep nodehealth .job.ad
MachineAttrnodehealth0 = "False1"

# grep nodehealth .machine.ad
nodehealth = "False1"

# condor_status -compact `hostname` -af machine nodehealth
test.example.com False4
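
Since the .update.ad file in the scratch directory does carry the new value, one workaround would be for the job itself to read that file while it runs. The following is a minimal Python sketch under that assumption: read_ad_attr and parse_ad_text are hypothetical helpers, they only handle simple "name = value" lines rather than full ClassAd syntax, and whether the starter writes this file reliably on a given setup (see the error in the log above) would need to be checked.

```python
# Minimal sketch: look up one attribute in a flat "name = value" ad file
# such as .update.ad. This is not a full ClassAd parser.
import os

def parse_ad_text(text):
    """Return a dict of attribute name -> raw value string, outer quotes stripped."""
    attrs = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if not sep:
            continue  # skip lines that are not simple assignments
        attrs[name.strip()] = value.strip().strip('"')
    return attrs

def read_ad_attr(path, name, default=None):
    """Read one attribute from an ad file; return default if file or attribute is missing."""
    if not os.path.exists(path):
        return default
    with open(path) as handle:
        return parse_ad_text(handle.read()).get(name, default)

if __name__ == "__main__":
    # Assumption: the job runs with its scratch directory as the working directory.
    print(read_ad_attr(".update.ad", "nodehealth", default="unknown"))
```

A long-running job could call read_ad_attr periodically and record the value it sees, which avoids depending on the job ad being refreshed.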

After a hold/release the job picks up the new value, but I want to update the value in the running instance of the job.

I have gone through link [1], but that was not useful either.

Any input is highly appreciated.


Thanks & Regards,

Vikrant Aggarwal
