Re: [HTCondor-users] HTCondor daemons dying on SL7 worker nodes
- Date: Wed, 24 Aug 2016 10:56:31 -0500
- From: Greg Thain <gthain@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] HTCondor daemons dying on SL7 worker nodes
On 07/27/2016 02:59 PM, andrew.lahiff@xxxxxxxxxx wrote:
> An interesting side effect of this is that while HTCondor deletes the job sandboxes, the Docker containers actually continue running, but HTCondor seems unaware of this, and therefore eventually starts running another set of jobs. So I end up with twice as many jobs running as there should be on an affected worker node, half of which are no longer under HTCondor's control.
Andrew (et al.)
I just wanted to close the loop on this: we've just pushed a bug fix
for the orphaned Docker containers. They are now removed whenever
the startd restarts.
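
For anyone who wants to check an affected node by hand, here is a rough
Python sketch of the same idea: list the containers on the node, compare
them with the jobs the startd still knows about, and force-remove the
rest. This is not the code in the fix. The "HTCJob" name prefix, the
helper names, and the way the list of active names is obtained are all
assumptions made for illustration; only the docker CLI calls
(docker ps --all --format, docker rm --force) are real.

```python
import subprocess


def list_container_names(prefix):
    """Return the names of all containers (running or stopped) that start with prefix."""
    out = subprocess.run(
        ["docker", "ps", "--all", "--format", "{{.Names}}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [name for name in out.splitlines() if name.startswith(prefix)]


def find_orphans(container_names, active_names):
    """Return the containers that no active job claims, preserving input order."""
    active = set(active_names)
    return [name for name in container_names if name not in active]


def remove_containers(names):
    """Force-remove each named container (stops it first if still running)."""
    for name in names:
        subprocess.run(["docker", "rm", "--force", name], check=True)


def clean_node(active_names, prefix="HTCJob"):
    """Remove containers with the (assumed) prefix that are not in active_names."""
    orphans = find_orphans(list_container_names(prefix), active_names)
    remove_containers(orphans)
    return orphans
```

Calling clean_node([]) right after a startd restart, before any new jobs
have started, would treat every matching container as orphaned. Check the
prefix against what docker ps shows on your own nodes before removing
anything.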