
Re: [HTCondor-users] 10.9.0 and 10.8.0 / 10.7.0 (Follow up, upgrading running nodes)



Yes, I know that there was only a week between 10.8.0 and 10.9.0.

You are correct that a node may be updated from X.Y to X.Y+1 with minimal impact.

I would be very comfortable with upgrading directly from 10.7.0 to 10.9.0. The 10.9.0 release only adds a few bug fixes and a new upgrade checker script from the 10.0.9 version.

Your selected sequence is fine. Upgrading in any sequence is fine, unless we call out a specific sequence in the release notes or announcement.

Yes, running older execution points with a newer central manager is fine. In production, we leave a few nodes back at the 10.0 version to ensure interoperability with the latest central manager version.
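To see which versions are actually mixed in a pool during a staggered upgrade like this, one option is to tabulate the `CondorVersion` attribute that the nodes advertise. The Python sketch below is hypothetical: it assumes input lines of the form printed by `condor_status -af Name CondorVersion` (a slot or machine name, a space, then a string such as `$CondorVersion: 10.9.0 <date> BuildID: <id> $`), and the host names, dates, and build IDs in the sample are made up.

```python
import re
from collections import defaultdict

# Assumed CondorVersion format: "$CondorVersion: X.Y.Z <date> BuildID: <id> $".
VERSION_RE = re.compile(r"\$CondorVersion:\s+(\d+)\.(\d+)\.(\d+)")


def versions_in_pool(lines):
    """Map each (major, minor, patch) version to the sorted machine names running it.

    Each line is assumed to be '<Name> <CondorVersion>'. Slot names such as
    'slot1@host' are collapsed to 'host' so each machine is listed once.
    Lines without a recognizable version string are skipped.
    """
    pool = defaultdict(set)
    for line in lines:
        name, _, rest = line.strip().partition(" ")
        match = VERSION_RE.search(rest)
        if not name or match is None:
            continue
        version = tuple(int(part) for part in match.groups())
        machine = name.rpartition("@")[2]  # "slot1@host" -> "host"; "host" -> "host"
        pool[version].add(machine)
    return {version: sorted(hosts) for version, hosts in sorted(pool.items())}


# Hypothetical sample: one execution point still on 10.7.0, one already on 10.9.0.
sample = [
    "slot1@ep01.example.org $CondorVersion: 10.7.0 2023-01-01 BuildID: 0 $",
    "slot2@ep01.example.org $CondorVersion: 10.7.0 2023-01-01 BuildID: 0 $",
    "slot1@ep02.example.org $CondorVersion: 10.9.0 2023-01-01 BuildID: 0 $",
]

for version, hosts in versions_in_pool(sample).items():
    print(".".join(map(str, version)), hosts)
```

Keying on the parsed integer tuple rather than the raw string keeps a hypothetical 10.10.x sorting after 10.9.x.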

...Tim

On 10/2/23 07:50, Weatherby,Gerard wrote:

Since 10.9.0 came out before I found time to update to 10.8...

Can we generalize the statement to say:

A node may be upgraded from version X.Y to version X.Y+1 with minimal impact on the running cluster?

Would the sequence always be upgrade the central manager first, then upgrade other nodes (access points, execution points)?

Is running older execution points with a newer central manager supported? (i.e., leaving execution points running 10.7 with a 10.9 central manager?)

 

From: Tim Theisen <tim@xxxxxxxxxxx>
Date: Monday, September 18, 2023 at 2:29 PM
To: Weatherby,Gerard <gweatherby@xxxxxxxx>, HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] 10.8.0 / 10.7.0 (Follow up, upgrading running nodes)

The condor_master will notice that new binaries have been installed and automatically restart the daemons.

For the Central Manager: It runs the collector and the negotiator. The collector maintains an in-memory database of the HTCondor pool. Upon restart, the collector repopulates that database as the nodes in the pool make their periodic reports to it. After several minutes, all the nodes will have reported back in, and you are back to where you were before the restart. Meanwhile, the Access Points (submit nodes) can still send jobs to slots that have been given to them to manage until the claim lifetime expires. So, when jobs complete, similar jobs can continue to start. However, no new matches can be made until the Central Manager comes back online.
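While the collector repopulates after a Central Manager restart, an administrator might want to wait until the pool looks complete again before moving on to the next node. Below is a minimal polling sketch in Python, assuming a hypothetical `count_machines` callback (in practice it could wrap a `condor_status` query and count distinct machines). The 15-minute timeout and 30-second interval are arbitrary choices, not HTCondor defaults.

```python
import time
from typing import Callable


def wait_for_pool(
    count_machines: Callable[[], int],
    expected: int,
    timeout_s: float = 900.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until at least `expected` machines are visible, or give up.

    count_machines is a hypothetical callback that returns how many machines
    the collector currently knows about. Returns True once the count reaches
    `expected`, or False if the deadline passes first.
    """
    deadline = clock() + timeout_s
    while True:
        if count_machines() >= expected:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)  # give nodes time to send their next periodic report
```

Injecting `sleep` and `clock` lets the loop be exercised with a fake clock instead of waiting in real time.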

For the Access Point (your submit node): When the schedd restarts, the new schedd will read the on-disk job queue and begin reconnecting to jobs that were running when the old schedd shut down. After a while, the condor_schedd emails out a restart report that details how many running jobs it was able to reconnect to.

In either case, the impact to running jobs should be minimal.

...Tim

On 9/18/23 11:49, Weatherby,Gerard wrote:

Thanks.

What's the impact on the cluster if components are upgraded? Specifically, in our case we have:


deb [arch=amd64] https://research.cs.wisc.edu/htcondor/repo/ubuntu/10.x focal main

deb-src https://research.cs.wisc.edu/htcondor/repo/ubuntu/10.x focal main

 

in /etc/apt/sources.list.d/htcondor10.list

 

and we'd do:

apt-get update && apt-get install htcondor


We would be looking to update:


our central manager

our dedicated submit node





 

From: Tim Theisen <tim@xxxxxxxxxxx>
Date: Friday, September 15, 2023 at 10:54 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>, Weatherby,Gerard <gweatherby@xxxxxxxx>
Subject: Re: [HTCondor-users] 10.8.0 / 10.7.0

Yes, you can mix 10.7.0 and 10.8.0 in a pool. We ran the configuration you described in our production pool during testing.

In fact, we also test the latest 10.0 release against the latest 10.x release. (HTCondor 10.0.8 interoperates with HTCondor 10.8.0.)

We strive to make HTCondor versions work well together. In particular, the latest LTS release must be able to interoperate with the previous LTS.

...Tim

 

On 9/15/23 07:02, Weatherby,Gerard wrote:

Can 10.8.0 nodes be mixed with 10.7.0 components? e.g. Can 10.7.0 execution points be mixed with a 10.8.0 central manager and access point?

 




_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
 
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
-- 
Tim Theisen (he, him, his)
Release Manager
HTCondor & Open Science Grid
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin - Madison
4261 Computer Sciences and Statistics
1210 W Dayton St
Madison, WI 53706-1685
+1 608 265 5736