Re: [HTCondor-users] out-of-memory event?



On Thu, Oct 26, 2017 at 4:33 PM, Greg Thain <gthain@xxxxxxxxxxx> wrote:
> On 10/26/2017 08:10 AM, Michael Di Domenico wrote:
>>
>>   the jobs were failing on only a few
>> specific hosts and at exactly the same time everyday.  turns out there
>> is a cronjob on those machines that does 'systemctl restart
>> gdm.service'
>>
>> it's not clear exactly why restarting gdm kills off the jobs,
>
> Who is starting Condor on these machines?  If condor was started from a
> shell, I could understand this error.  If somehow, systemd thinks it is the
> owner of the condor cgroups, and is destroying the active cgroups out from
> under condor, that would explain this error as well.

condor is being started by systemd, using the systemd scripts supplied
with condor.  the condor service itself is started by a cron job once
the box is fully booted, if that makes a difference (i don't think it
does though).
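
for anyone wanting to check the cgroup-ownership theory above, here is a
rough sketch (python, not something from this thread) that shows which
cgroup a process sits in by parsing the text of /proc/<pid>/cgroup.  the
function name parse_cgroup and the SAMPLE text are made up for
illustration; the paths on a real condor node may look different.

```python
def parse_cgroup(text):
    """Parse the contents of /proc/<pid>/cgroup into {controller: path}.

    Each line has the form "hierarchy-ID:controller-list:cgroup-path".
    The controller list is comma-separated on cgroup v1 and empty on
    cgroup v2, in which case the key is the empty string.
    """
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split only on the first two colons; the path may contain colons.
        _, controllers, path = line.split(":", 2)
        for name in controllers.split(","):
            result[name] = path
    return result


# Hypothetical sample, NOT captured from the machines in this thread.
SAMPLE = (
    "4:memory:/system.slice/condor.service\n"
    "3:cpu,cpuacct:/system.slice/condor.service\n"
    "1:name=systemd:/user.slice/user-1000.slice/session-3.scope\n"
)

if __name__ == "__main__":
    for controller, path in sorted(parse_cgroup(SAMPLE).items()):
        print(f"{controller or '(unified)'}: {path}")
```

on a live node the same function can be fed the real file, e.g.
parse_cgroup(open("/proc/<pid>/cgroup").read()) with the pid of
condor_master or of a running job.  comparing the output before and after
the gdm restart might show whether the job's cgroup is being removed.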