Re: [HTCondor-users] HTCondor daemons dying on SL7 worker nodes

On 07/27/2016 02:59 PM, andrew.lahiff@xxxxxxxxxx wrote:

An interesting side effect of this is that while HTCondor deletes the job sandboxes, the Docker containers actually continue running, but HTCondor seems unaware of this, and therefore eventually starts running another set of jobs. So I end up with twice as many jobs running as there should be on an affected worker node, half of which are no longer under HTCondor's control.

Andrew (et al.)

I just wanted to close the loop on this: we've just pushed a bug fix for the orphaned Docker containers. They are now removed when the startd restarts.
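
For anyone who wants to check a worker node by hand, here is a rough sketch in Python of how leftover containers could be spotted. This is not what the fix itself does, only an illustration. It assumes that containers started by HTCondor have names beginning with "HTCJob" (please verify this on your own nodes), and the list of containers still under HTCondor's control is something you would have to supply yourself; find_orphans and running_container_names are hypothetical helper names.

```python
import subprocess


def running_container_names():
    """Return the names of all running Docker containers on this host."""
    out = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def find_orphans(running, claimed, prefix="HTCJob"):
    """Return running containers that look like HTCondor jobs but are unclaimed.

    running: names of running containers (e.g. from running_container_names()).
    claimed: names of containers HTCondor still knows about (caller-supplied).
    prefix:  ASSUMED naming prefix for HTCondor-started containers.
    """
    claimed = set(claimed)
    return sorted(
        name for name in running
        if name.startswith(prefix) and name not in claimed
    )
```

Calling find_orphans(running_container_names(), claimed) prints nothing by itself; it only returns the candidate names, so you can look them over before deciding whether to stop or remove anything.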