Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] update of condor-version and job-behaviour

Date: Mon, 30 Aug 2021 22:43:02 +0200 (CEST)
From: Martin Flemming <martin.flemming@xxxxxxx>
Subject: Re: [HTCondor-users] update of condor-version and job-behaviour


Hi Greg!

Thanks for the clarification.

Indeed, I would like to avoid restarting the jobs, that is, to let them run properly to completion on each worker node, when I upgrade the cluster to a newer version of condor.

So, if I understand you right, I have to disable several batches of nodes before I start the update, in order to carry out the update as inconspicuously as possible. In other words, every condor package update includes an automatic restart of the daemons, and therefore also of the running jobs on the worker nodes?


By the way, on the master or schedds we use this workflow:

1) condor_off -master -fast
2) Upgrade the binaries
3) Restart the master
 All within 20 minutes.
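
The three steps above, together with the 20-minute budget, could be scripted roughly as follows. This is only a sketch, not what we actually run: `DRY_RUN`, `run`, `upgrade_schedd_host`, `UPGRADE_CMD` and `START_CMD` are hypothetical names, and the package and service commands are assumptions that depend on the site. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch: upgrade a schedd/master host and check the 20-minute budget.
# Hypothetical names: DRY_RUN, UPGRADE_CMD, START_CMD, run(), upgrade_schedd_host().
DRY_RUN="${DRY_RUN:-1}"                              # 1 = only print the commands
BUDGET_SECONDS=1200                                  # 20 minutes, as in the workflow above
UPGRADE_CMD="${UPGRADE_CMD:-yum -y update condor}"   # assumption: RPM-based host
START_CMD="${START_CMD:-systemctl start condor}"     # assumption: systemd unit named condor

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

upgrade_schedd_host() {
    start=$(date +%s)
    run condor_off -master -fast || return 1   # 1) stop the daemons, including the master
    run $UPGRADE_CMD || return 1               # 2) upgrade the binaries (word-split on purpose)
    run $START_CMD || return 1                 # 3) restart the master
    elapsed=$(( $(date +%s) - start ))
    if [ "$elapsed" -gt "$BUDGET_SECONDS" ]; then
        echo "WARNING: took ${elapsed}s, over the ${BUDGET_SECONDS}s budget" >&2
        return 2
    fi
    echo "done in ${elapsed}s"
}

upgrade_schedd_host
```

Setting `DRY_RUN=0` would execute the commands instead of printing them; the warning only reports an overrun after the fact, it does not prevent one.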

> If you are more concerned about the badput from restarting a running job,
> than the potential loss of throughput from keeping cores idle, you can run
> "condor_off -peaceful" on the worker node before your upgrade, and condor
> will wait until all the jobs exit before it, itself exits, at which time you
> could upgrade the machine.


I didn't know the command

condor_off -peaceful


In general, we disable worker nodes with

condor_config_val -startd -name bird055.desy.de -set "StartJobs = false"
condor_reconfig -startd -name bird055.desy.de
condor_drain -graceful bird055.desy.de


Is this equivalent?

All in all, the workflow should be:

- disable the worker node
- wait until all jobs are finished
- update
- enable the worker node again
- ?
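
As a rough sketch, the steps listed above might look like this when scripted on the worker node itself, using the "condor_off -peaceful" variant. The names `NODE`, `DRY_RUN`, `POLL_SECONDS`, `UPGRADE_CMD`, `run`, `claimed_slots` and `upgrade_worker` are hypothetical, the package command is an assumption, and whether `condor_on` is enough to re-enable the node afterwards depends on how the packages restart the service. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: retire one worker node peacefully, upgrade it, and re-enable it.
# Hypothetical names: NODE, DRY_RUN, POLL_SECONDS, UPGRADE_CMD, run(),
# claimed_slots(), upgrade_worker().
NODE="${NODE:-bird055.desy.de}"
DRY_RUN="${DRY_RUN:-1}"                              # 1 = only print the commands
POLL_SECONDS="${POLL_SECONDS:-60}"
UPGRADE_CMD="${UPGRADE_CMD:-yum -y update condor}"   # assumption: RPM-based host

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Number of slots on $NODE that still hold a claim (always 0 in dry-run mode).
claimed_slots() {
    if [ "$DRY_RUN" = "1" ]; then
        echo 0
        return
    fi
    condor_status -constraint "Machine == \"$NODE\" && State == \"Claimed\"" -af Name \
        | wc -l | tr -d ' '
}

upgrade_worker() {
    run condor_off -peaceful -name "$NODE" || return 1   # disable the worker node
    while [ "$(claimed_slots)" -gt 0 ]; do               # wait until all jobs are finished
        sleep "$POLL_SECONDS"
    done
    run $UPGRADE_CMD || return 1                         # update (word-split on purpose)
    run condor_on -name "$NODE"                          # enable the worker node again
}

upgrade_worker
```

Note that a failing `condor_status` call is treated here as "no claims left", so a real script would want stricter error handling before upgrading.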


thanks & cheers,
   Martin



On Mon, 30 Aug 2021, Greg Thain wrote:

Hi Martin:

When HTCondor is upgraded *on the worker node*, or, more generally, when the HTCondor worker node daemons restart for any reason:
Any running jobs are killed, will go back to the "I"dle state in the queue, and HTCondor will restart them, perhaps on another machine.
If you are more concerned about the badput from restarting a running job, than the potential loss of throughput from keeping cores idle, you can run "condor_off -peaceful" on the worker node before your upgrade, and condor will wait until all the jobs exit before it, itself exits, at which time you could upgrade the machine.
And just for completeness, upgrading the central manager will not evict jobs. Upgrading the access point (where the schedd runs) will not evict jobs, if the new daemons restart quickly enough.
-greg
 Hi !

 What is the default behaviour of running jobs on a worker node on which
 the condor packages are updated?

 a) the running jobs keep running fine under the old version, and each job
 started after the package update uses the newly installed
 condor version?

 b) the running jobs are cancelled after the update and
 rescheduled with the new version?

 c) the running jobs are cancelled and lost?

 d) ....

 cheers & thanks,

        Martin

 _______________________________________________
 HTCondor-users mailing list
 To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
 subject: Unsubscribe
 You can also unsubscribe by visiting
 https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

 The archives can be found at:
 https://lists.cs.wisc.edu/archive/htcondor-users/


Regards

       Martin Flemming


______________________________________________________
Martin Flemming
DESY / IT          office : Building 2b / 008a
Notkestr. 85       phone  : 040 - 8998 - 4667
22603 Hamburg      mail   : martin.flemming@xxxxxxx
______________________________________________________
