Re: [HTCondor-users] schedd stopped working (died?) with SchedLog filled with 'WriteUserLog checking for event, log rotation, but no lock'
- Date: Mon, 17 Oct 2016 12:29:23 -0500
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
On 10/17/2016 4:44 AM, Thomas Hartmann wrote:
Job submission failed around 2:00 last night, after which the SchedLog
contained only lines such as
> WriteUserLog checking for event log rotation, but no lock
which had occurred before as well, but never exclusively.
A bit later, at ~2:16, the MasterLog started logging schedd daemons
being reaped/dying(?) with exit code 44. Restarts of the schedd went on
for ~20 min, after which the MasterLog went silent until the service got
So far I have found no information on schedd exit code 44, only on the
shadow's.
For any/all HTCondor daemons, exit code 44 means that the daemon in
question failed to write or rotate a log file, i.e., the write() system
call failed. Look at the filesystems holding the paths for LOG, LOCK,
and/or EVENT_LOG, which you can print via
condor_config_val LOG LOCK EVENT_LOG
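A rough way to test this by hand is sketched below, assuming a POSIX shell with `mktemp` and `dd` available. The helper `check_writable` is hypothetical, not an HTCondor tool: for each configured path it creates a scratch file in the relevant directory and writes 4 KiB into it, which should fail in roughly the situations where a daemon's own write() would. Running it as root can mask permission problems and may use root-reserved blocks, so treat it as an approximation.

```shell
#!/bin/sh
# check_writable: hypothetical helper, not part of HTCondor.
# For each absolute path given, pick the directory that must be writable
# (the path itself if it is a directory, else its parent), then try to
# create a scratch file there and write 4 KiB into it.
# The function body runs in a subshell so its variables do not leak.
check_writable() (
  rc=0
  for p in "$@"; do
    # Skip non-paths, e.g. the words of a "Not defined" message.
    case $p in /*) ;; *) continue ;; esac
    if [ -d "$p" ]; then dir=$p; else dir=$(dirname "$p"); fi
    if tmp=$(mktemp "$dir/.condor_write_test.XXXXXX" 2>/dev/null) &&
       dd if=/dev/zero of="$tmp" bs=4096 count=1 2>/dev/null; then
      echo "OK    $dir"
    else
      echo "FAIL  $dir (cannot create or write a file here)"
      rc=1
    fi
    # Remove the scratch file if mktemp managed to create it.
    if [ -n "$tmp" ]; then rm -f "$tmp"; fi
  done
  exit "$rc"
)
```

Possible usage: `check_writable $(condor_config_val LOG LOCK EVENT_LOG)`. The unquoted substitution splits on whitespace, so this assumes none of the paths contain spaces.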
The most likely cause is that the filesystem(s) in question were full
or went offline (e.g., an NFS mount failed, if they are not local).
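To tell those two causes apart, a sketch like the following could report block and inode usage for each path; inodes are included because running out of them also makes file creation fail with "no space". It assumes GNU coreutils `df` (for `-P` and `-i`) and `timeout`; the helper name `check_space` and the 95% default threshold (`LIMIT`) are arbitrary choices for illustration. The timeout matters because `df` on an unreachable hard-mounted NFS share can block indefinitely, which is itself a hint that the mount fell offline.

```shell
#!/bin/sh
# check_space: hypothetical helper, not part of HTCondor.
# Assumes df supports -P and -i (GNU coreutils) and that `timeout` exists.
# Prints block and inode usage for the filesystem holding each absolute
# path; returns 1 if either is at or above LIMIT percent (default 95), or
# if df fails or hangs (e.g. on an unreachable hard-mounted NFS share).
check_space() (
  limit=${LIMIT:-95}
  rc=0
  for p in "$@"; do
    case $p in /*) ;; *) continue ;; esac
    # Walk up to the nearest existing ancestor so df has something to stat.
    while [ ! -e "$p" ] && [ "$p" != / ]; do p=$(dirname "$p"); done
    out=$(timeout 10 df -P "$p" 2>/dev/null) || {
      echo "OFFLINE? $p (df failed or timed out)"; rc=1; continue
    }
    iout=$(timeout 10 df -Pi "$p" 2>/dev/null)
    # Field 5 of the data line is "Use%" (or "IUse%" with -i).
    used=$(printf '%s\n' "$out" | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
    iused=$(printf '%s\n' "$iout" | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
    # Some filesystems report "-" for inodes; treat non-numbers as 0.
    case $used in ''|*[!0-9]*) used=0 ;; esac
    case $iused in ''|*[!0-9]*) iused=0 ;; esac
    if [ "$used" -ge "$limit" ] || [ "$iused" -ge "$limit" ]; then
      echo "FULL  $p (blocks ${used}%, inodes ${iused}%)"; rc=1
    else
      echo "OK    $p (blocks ${used}%, inodes ${iused}%)"
    fi
  done
  exit "$rc"
)
```

Possible usage: `check_space $(condor_config_val LOG LOCK EVENT_LOG)`, or `LIMIT=90 check_space ...` for a stricter threshold.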
BTW, I would encourage you to put these directories on local disk if
they are not there already.
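Whether those directories are actually on local disk can be checked with GNU `stat -f`, which prints the filesystem type of a path. The sketch below, with the hypothetical helper `check_local`, flags a few common network and FUSE filesystem types; the list is illustrative rather than exhaustive, and a FUSE mount may well be local, hence the question mark in its output.

```shell
#!/bin/sh
# check_local: hypothetical helper, not part of HTCondor.
# Assumes GNU coreutils `stat` and `timeout`; `stat -f -c %T` prints the
# filesystem type name (e.g. "nfs", "ext2/ext3", "xfs", "tmpfs").
# Flags paths whose filesystem looks like a network or FUSE mount.
check_local() (
  rc=0
  for p in "$@"; do
    case $p in /*) ;; *) continue ;; esac
    while [ ! -e "$p" ] && [ "$p" != / ]; do p=$(dirname "$p"); done
    fstype=$(timeout 10 stat -f -c %T "$p" 2>/dev/null) || fstype=unknown
    case $fstype in
      # Common network/cluster filesystems; this list is not exhaustive.
      nfs*|cifs|smb*|afs|lustre|gpfs|ceph|fuse*)
        echo "REMOTE? $p ($fstype)"; rc=1 ;;
      unknown)
        echo "UNKNOWN $p (stat failed or timed out)"; rc=1 ;;
      *)
        echo "LOCAL   $p ($fstype)" ;;
    esac
  done
  exit "$rc"
)
```

Possible usage: `check_local $(condor_config_val LOG LOCK EVENT_LOG)`; any `REMOTE?` line marks a directory worth moving to local disk.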
Hope the above helps.