Re: [HTCondor-users] HTCondor daemons dying on SL7 worker nodes
- Date: Fri, 29 Jul 2016 21:23:25 +0000
- From: Zach Miller <zmiller@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] HTCondor daemons dying on SL7 worker nodes
Yep, Jaimie and I also just confirmed this. The WatchdogSec is set to 5 in our packaging. The master attempts to send a keepalive every (WatchdogSec/2) seconds, so if the timing is bad, having the master block for as little as 3 seconds could trigger systemd to kill off condor.
Workaround for now: Set WatchdogSec to something much higher. Having thought about this for approximately one minute, I'd suggest 60. :)
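
To make the timing argument concrete, here is a rough sketch of the worst case. It assumes a simplified model that is not taken from the HTCondor source: the master sends a keepalive every WatchdogSec/2 seconds, and systemd kills the service once more than WatchdogSec seconds pass without one. The function names are made up for illustration.

```python
def min_fatal_block(watchdog_sec: float) -> float:
    """Shortest block (seconds) that can trip the watchdog in the worst case.

    Worst case: the master blocks just before its next keepalive is due,
    i.e. keepalive_interval seconds after the last one was sent.
    """
    keepalive_interval = watchdog_sec / 2
    return watchdog_sec - keepalive_interval


def watchdog_fires(watchdog_sec: float, block_start: float, block_len: float) -> bool:
    """Return True if a single block makes the keepalive gap exceed watchdog_sec.

    block_start is measured in seconds since the last keepalive and is
    assumed to fall before the next scheduled keepalive.
    """
    keepalive_interval = watchdog_sec / 2
    # The next keepalive goes out on schedule, or as soon as the block ends,
    # whichever is later.
    next_keepalive = max(keepalive_interval, block_start + block_len)
    return next_keepalive > watchdog_sec


if __name__ == "__main__":
    for wd in (5, 60):
        print(f"WatchdogSec={wd}: blocks longer than {min_fatal_block(wd)}s can be fatal")
    # A 3 second block that starts just before the keepalive is due.
    print(watchdog_fires(5, 2.4, 3.0))
    # The same block right after a keepalive was sent.
    print(watchdog_fires(5, 0.0, 3.0))
```

Under this model, anything over 2.5 seconds is enough at WatchdogSec=5 if the timing is bad, while WatchdogSec=60 tolerates blocks of up to 30 seconds.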
Cheers,
-zach
> -----Original Message-----
> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf
> Of Brian Bockelman
> Sent: Friday, July 29, 2016 4:12 PM
> To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
> Subject: Re: [HTCondor-users] HTCondor daemons dying on SL7 worker nodes
>
>
> Hi Michael,
>
> It's not really a systemd issue. Condor's config file puts in place a
> directive of "if I haven't responded in the last 5 seconds, then consider
> me deadlocked and kill off all my processes."
>
> HTCondor asked, systemd listened.
>
> Sent from my iPhone
>
> On Jul 29, 2016, at 4:00 PM, Michael V Pelletier
> <Michael.V.Pelletier@xxxxxxxxxxxx> wrote:
>
>
>
> This seems to be another example of how systemd doesn't seem to
> acknowledge decades of UNIX-derived system management experience that
> people have accumulated over the years.
>
> http://suckless.org/sucks/systemd
>
> The philosophy of UNIX is, in part, "write programs that do one thing,
> and do it well."
>
>
>
> Which is one reason why systemd is broken up into, what, a dozen different
> daemons?
>
> One thing that sysvinit does poorly is manage services. That's why many
> (most?) commercial POSIX implementations have abandoned it.
>
> In fact, that sysvinit didn't keep up the "and do it well" half of the
> sentence is one of the motivations for the condor_master. It is
> encouraging to me that all the features the HTCondor team found missing
> from sysvinit have now made it into RHEL7's service management framework.
>
> Brian
>
>
>
> I hope I don't loathe it too much when I finally get around to
> installing a CentOS 7 VM.
>
> -Michael Pelletier.