
Re: [HTCondor-users] Update classad and STARTD_JOB_ATTRS



Hi Again,
Sorry for not explaining my goal.
Soon we will have a few Nvidia GPUs for deep learning jobs. The problem is that these jobs will run for a long time, probably more than 48 hours.
In order to provide reasonable service for all users I would like to enable preemption, but I wish to preempt only jobs that created a checkpoint in the last 30 minutes.
I'm trying to update a ClassAd attribute using chirp so that the negotiator will be able to decide whether to preempt the job, for example "Checkpoint = epoch time".
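
The rule I have in mind can be sketched roughly like this in Python. The function name and the constant are only illustrative and are not part of HTCondor; in a real pool the same comparison would have to be written as a ClassAd expression against whatever attribute ends up holding the checkpoint time.

```python
import time

# The 30-minute window described above, in seconds.
CHECKPOINT_WINDOW_SECONDS = 30 * 60


def recently_checkpointed(checkpoint_epoch, now=None, window=CHECKPOINT_WINDOW_SECONDS):
    """Return True if the job wrote a checkpoint within the last `window` seconds.

    checkpoint_epoch: epoch time of the last checkpoint, or None if the job
                      has never checkpointed (such a job is never preemptable).
    now:              current epoch time; defaults to time.time().
    """
    if checkpoint_epoch is None:
        return False
    if now is None:
        now = time.time()
    age = now - checkpoint_epoch
    # A negative age means a clock problem; treat it as "not recent".
    return 0 <= age <= window
```

A job that checkpointed 10 minutes ago would count as preemptable under this rule, while one that checkpointed an hour ago, or never, would not.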

So far I have been unable to publish the modified ClassAd.
Maybe there is a better way to accomplish it?

Many Thanks
David


From: Dudu Handelman <duduhandelman@xxxxxxxxxxx>
Sent: 28 October 2023 19:02
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>; Greg Thain <gthain@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Update classad and STARTD_JOB_ATTRS
 
Greg. 
Sorry, I wrote condor_q but obviously it's condor_status.

Thanks 
David


From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Dudu Handelman <duduhandelman@xxxxxxxxxxx>
Sent: Thursday, October 26, 2023 12:25:14 PM
To: Greg Thain <gthain@xxxxxxxxxxx>; HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Update classad and STARTD_JOB_ATTRS
 
Thanks Greg. 
When I use chirp to update a ClassAd attribute, the .job.ad file is not updated with the new value.
I have tried to use STARTD_CRON_AUTOPUBLISH = If_Changed
but the ClassAd still shows the original value when I look at the slot with condor_q.

Maybe I need another startd cron knob?

Thanks a million 
David



From: Greg Thain <gthain@xxxxxxxxxxx>
Sent: Wednesday, October 25, 2023 11:39:51 PM
To: Dudu Handelman <duduhandelman@xxxxxxxxxxx>; HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Update classad and STARTD_JOB_ATTRS
 
On 10/25/23 11:34, Dudu Handelman wrote:
> Thanks Greg.
> We all love knobs :-)
> For some reason it's not copying the chirp changes. Tomorrow I will
> verify that chirp is writing to the job ad file.


Ah -- just to be clear, STARTD_JOB_ATTRS copies the attributes as they
exist at job start time, and doesn't update them subsequently, even if
chirp updates those same attributes in the copy of the job ad in the schedd.

If you dynamically want to change attributes in the startd ad, you'll
need startd cron.
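
A startd cron job is a program the startd runs periodically; the "Attribute = value" lines it prints to stdout are merged into the slot ad. Below is a minimal sketch of such a script in Python. It assumes the job touches a marker file each time it writes a checkpoint; the marker path and the attribute name LastCheckpointTime are hypothetical and chosen only for illustration.

```python
import os

# Hypothetical: a file the job touches each time it writes a checkpoint.
MARKER_PATH = "/tmp/last_checkpoint"
# Hypothetical attribute name to publish in the slot ad.
ATTR_NAME = "LastCheckpointTime"


def classad_line(name, value):
    """Format one 'Attribute = value' line with an integer epoch value."""
    return f"{name} = {int(value)}"


def checkpoint_line(marker_path=MARKER_PATH, attr_name=ATTR_NAME):
    """Return the line to publish, or None if no checkpoint marker exists."""
    try:
        mtime = os.path.getmtime(marker_path)
    except OSError:
        # No marker file yet: publish nothing rather than a bogus value.
        return None
    return classad_line(attr_name, mtime)


if __name__ == "__main__":
    line = checkpoint_line()
    if line is not None:
        print(line)
```

The script would still need to be registered through the startd cron configuration (STARTD_CRON_JOBLIST plus the per-job EXECUTABLE and PERIOD knobs); check the manual for the exact settings in your version.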


-greg