
Re: [HTCondor-users] Preparations to move master machine?



Hi,

Just a general remark: in my experience, moving a collector/negotiator is not such a big deal. Prepare the new one, shut down the old one, and start the new one, maybe behind a DNS alias.

Even if the pool runs without it for a couple of minutes, or up to half an hour and more, all that happens is that no new jobs get started in the pool. Running jobs are not affected by this change (it is a different story when it comes to the scheduler).

Once the new machine is online, it will catch up on the backlog of job starts within a couple of minutes, unless of course you have a special configuration and run millions of jobs with 5-second runtimes simultaneously, or something similar ...
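
As a rough sketch of the DNS alias idea (the name cm.example.org is made up for illustration and is not from your setup, and I am assuming the standard CONDOR_HOST and DAEMON_LIST knobs fit your configuration), it could look something like this:

```
# On all pool members: refer to the central manager by alias, not by IP.
# cm.example.org is a hypothetical alias name; use whatever your DNS provides.
CONDOR_HOST = cm.example.org

# On the new central manager only: run the central daemons.
# Adjust the list if this machine has other roles as well.
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR
```

With that in place, the actual move should mostly come down to repointing the alias at the new machine during the maintenance window. Nodes that still have the old IP hard-coded would of course not follow the alias.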

best
christoph

-- 
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx

----- Original Message -----
From: "Steffen Grunewald" <steffen.grunewald@xxxxxxxxxx>
To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
Sent: Wednesday, 13 May 2020 10:12:12
Subject: [HTCondor-users] Preparations to move master machine?

Good morning,

I've got to move the central (collector, negotiator) functionality off a failing
machine, and would like to do this with as little interruption as possible.

The old machine has an IP of a.b.c.100, the new one is at a.b.c.109, and I'd
like to use an aliased interface at a.b.c.190 to provide access, first at the
old machine, then (during a short maintenance) move that to the new one.

There is no firewall in effect, the a.b.c.0/24 network is purely internal.

Currently, most nodes still refer to a.b.c.100, and the central manager has
a matching NETWORK_INTERFACE=a.b.c.100 - since these machines are in production,
this part cannot be changed.

Is it possible (without any disruptions) to change NETWORK_INTERFACE to a.b.c.*,
to answer connection requests on both .100 and .190?
Would condor_shared_port then accept connections on both IP addresses, so that
machines set up with the updated CENTRAL_MANAGER setting can connect as well?
Any other pitfalls I didn't see yet? (I've got to think about keeping the
job history for accounting, but that's phase two.)

Thanks,
 Steffen

-- 
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1 * D-14476 Potsdam-Golm * Germany
~~~
Fon: +49-331-567 7274
Mail: steffen.grunewald(at)aei.mpg.de
~~~
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/