
Re: [HTCondor-users] best way to switch a main negotiator/collector head towards a secondary?



Hi Greg,

thanks for the detailed explanation.

If I understand it correctly, there would be no risk of split-brain negotiators even if one part of the cluster talks to the primary while the other part talks to the secondary, e.g. while a change of the primary/secondary order is being pushed onto the cluster and takes a while to propagate. In particular, there would not be two `Accountingnew.log` files competing with each other, which the HAD would then have to move back and forth.

So, probably the safest way would be to stop the primary's negotiator for good and let the collector + HAD organize the failover to the secondary, before pushing any hard-wired changes to the cluster itself.

Cheers,
  Thomas

On 30/08/2021 17.01, Greg Thain wrote:

Hi Thomas:

I think it might help to go down into the details a bit here, to understand what the best approach is.

The CM has two components, the collector and the negotiator. In an HA setup, there are usually two collectors, and all HTCondor daemons advertise to both. Queries pick one collector. In HA terms, the collector is Active-Active. The negotiator is different: only one can be active at a time. However, if no negotiator is running in the pool, all jobs continue to run as usual, and schedds can even start new jobs with the matches they currently hold. The persistent state in the negotiator is the historical accounting information, kept in the "Accountingnew.log" file. In a HAD central manager setup, the HAD daemons periodically transfer the Accountingnew.log file from the active to the backup machine and exchange heartbeats between the two machines. If the currently active negotiator fails, HAD starts the other negotiator, with potentially somewhat out-of-date accounting information.
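
As a toy illustration of the behaviour described above, the active/standby logic can be sketched in a few lines of Python. This is not HTCondor code: the class names, method names, and host names (`ToyHAD`, `Negotiator`, `cm1`, `cm2`) are made up for the example, and the real daemons do considerably more. It only shows why there is never more than one active negotiator and why the backup may start with stale accounting data.

```python
from dataclasses import dataclass, field


@dataclass
class Negotiator:
    """Toy stand-in for one central manager's negotiator (hypothetical)."""
    name: str
    running: bool = False
    accounting: dict = field(default_factory=dict)  # user -> accumulated usage


class ToyHAD:
    """Hypothetical model: one active negotiator, state copied periodically."""

    def __init__(self, primary: Negotiator, backup: Negotiator):
        self.active, self.standby = primary, backup
        self.active.running = True

    def record_usage(self, user: str, hours: float) -> None:
        # Only the active negotiator updates the accounting state.
        acct = self.active.accounting
        acct[user] = acct.get(user, 0.0) + hours

    def replicate(self) -> None:
        # Periodic transfer of the accounting state from active to standby.
        self.standby.accounting = dict(self.active.accounting)

    def heartbeat(self, active_alive: bool) -> None:
        # If the active side stops answering, start the other negotiator.
        if not active_alive:
            self.active.running = False
            self.active, self.standby = self.standby, self.active
            self.active.running = True


cm1, cm2 = Negotiator("cm1"), Negotiator("cm2")
had = ToyHAD(cm1, cm2)
had.record_usage("alice", 10.0)
had.replicate()                    # backup now knows about 10 hours
had.record_usage("alice", 5.0)     # happens after the last transfer
had.heartbeat(active_alive=False)  # active negotiator is gone

print(had.active.name, had.active.accounting)  # cm2 {'alice': 10.0}
```

In this sketch the backup takes over with 10 hours on record instead of 15, which is the "somewhat out-of-date" accounting state; at no point are both negotiators marked as running.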

Most sites are willing to tolerate some time with no negotiator running at all, as little throughput will generally be lost.

-greg
