
Re: [HTCondor-users] Odd behavior of condor_master in SL7




> On Dec 4, 2017, at 3:36 PM, Dimitri Maziuk <dmaziuk@xxxxxxxxxxxxx> wrote:
> 
> On 12/04/2017 03:02 PM, Bob Ball wrote:
>> OK, so, systemd is still new to me, is the restart caused by this line
>> in /usr/lib/systemd/system/condor.service ....
>> [Service]
>> ...
>> Restart=always
>> 
>> But then, this seems inconsistent, in that I use the same command to
>> stop condor as specified in this same file
>> ExecStop=/usr/sbin/condor_off -master
>> 
>> So, I still don't have a real understanding of this.
> 
> It should be Restart=on-[something or other dep. on which signal is sent
> by condor_off when]

To elaborate on this -

HTCondor's configuration of the service basically says "automatically restart HTCondor whenever the condor_master stops (unless it was stopped via systemd)".

What is probably intended is "restart HTCondor whenever the condor_master fails". As Dimitri points out, that is probably "Restart=on-failure".

Bob:  you can create an override file (/etc/systemd/system/condor.service.d/override.conf) and change this setting on your own without waiting for a release.
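
A minimal sketch of such an override, assuming "Restart=on-failure" really is the behavior you want (that is, that condor_master exits cleanly when you run condor_off -master, which I have not verified here):

```ini
# /etc/systemd/system/condor.service.d/override.conf
# Drop-in that replaces only the Restart= setting from the packaged
# condor.service; everything else in the unit stays as shipped.
# Assumption: on-failure is the intended policy, so a clean exit of
# condor_master is not followed by an automatic restart.
[Service]
Restart=on-failure
```

After creating the file, run "systemctl daemon-reload" so systemd picks up the drop-in.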

> 
> Or if you drank the Kool-Aid: get rid of condor_off and probably
> condor_master as well, and shared_port and have systemd do their jobs.
> 

Both condor_master/condor_off and condor_shared_port have functionality that systemd does not provide.

If one were writing HTCondor from scratch and supporting only Linux, then yes, you could probably avoid having your own condor_master / condor_off (you'd either skip some functionality or provide it in a different layer).

However, it's not clear that systemd offers anything close to condor_shared_port's functionality (multiplexing a single TCP port across multiple services in order to simplify firewall configurations).

Brian