Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] HTCondor daemons dying on SL7 worker nodes

Date: Wed, 24 Aug 2016 10:56:31 -0500
From: Greg Thain <gthain@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] HTCondor daemons dying on SL7 worker nodes

On 07/27/2016 02:59 PM, andrew.lahiff@xxxxxxxxxx wrote:

Hi,


An interesting side effect of this is that while HTCondor deletes the job sandboxes, the Docker containers actually continue running, but HTCondor seems unaware of this, and therefore eventually starts running another set of jobs. So I end up with twice as many jobs running as there should be on an affected worker node, half of which are no longer under HTCondor's control.


Andrew (et al.)

I just wanted to close the loop on this: we've just pushed a bug fix for the orphaned Docker containers. These are now removed when the startd restarts.


-greg

