Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] out-of-memory event?

Date: Fri, 27 Oct 2017 07:47:48 -0400
From: Michael Di Domenico <mdidomenico4@xxxxxxxxx>
Subject: Re: [HTCondor-users] out-of-memory event?

On Thu, Oct 26, 2017 at 4:33 PM, Greg Thain <gthain@xxxxxxxxxxx> wrote:
> On 10/26/2017 08:10 AM, Michael Di Domenico wrote:
>>
>>   the jobs were failing on only a few
>> specific hosts and at exactly the same time every day.  turns out there
>> is a cronjob on those machines that does 'systemctl restart
>> gdm.service'
>>
>> it's not clear exactly why restarting gdm kills off the jobs,
>
> Who is starting Condor on these machines?  If condor was started from a
> shell, I could understand this error.  If somehow, systemd thinks it is the
> owner of the condor cgroups, and is destroying the active cgroups out from
> under condor, that would explain this error as well.

condor is being started by systemd, using the systemd scripts supplied
with condor.  the start itself is triggered by a cron job once the box
is fully booted, if that makes a difference (i don't think it does
though).
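
One way to test the cgroup-ownership theory is to look at which cgroup the running condor_master actually sits in. The sketch below is only an illustration, not something from the thread: it parses the contents of /proc/<pid>/cgroup (lines of the form hierarchy-ID:controller-list:cgroup-path) and flags a process that lives under user.slice, which would suggest it was started from a login session rather than as a system service. The function names and the example path /system.slice/condor.service are assumptions for illustration; the real unit name and layout on a given host may differ.

```python
import sys
from pathlib import Path


def parse_proc_cgroup(text):
    """Parse /proc/<pid>/cgroup content into {controller-list: cgroup-path}."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "hierarchy-ID:controller-list:cgroup-path".
        # The controller list is empty for the cgroup v2 unified hierarchy.
        _hierarchy_id, controllers, path = line.split(":", 2)
        entries[controllers] = path
    return entries


def under_user_slice(entries):
    """True if any hierarchy places the process under user.slice."""
    return any(path.startswith("/user.slice/") for path in entries.values())


def read_cgroups(pid):
    """Read and parse the cgroup membership of a live process."""
    return parse_proc_cgroup(Path(f"/proc/{pid}/cgroup").read_text())


if __name__ == "__main__":
    # Hypothetical usage: pass the pid of condor_master as the only argument.
    cgroups = read_cgroups(int(sys.argv[1]))
    for controllers, path in sorted(cgroups.items()):
        print(f"{controllers or '(unified)'}: {path}")
    if under_user_slice(cgroups):
        print("process is inside a user session cgroup")
```

Running it against the condor_master pid shortly before and after the gdm restart would show whether the daemon's cgroup placement is what one would expect for a service started by systemd.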
