
Re: [HTCondor-users] schedd stopped working (died?) with SchedLog filled with 'WriteUserLog checking for event, log rotation, but no lock'



Hi Todd,

many thanks for the lead!
Actually, I may have been fooled by /var/log being a 'local' ext4 with
sufficient space/inodes (disk I/O OKish as well) - while the node is
actually a VM :-/
Maybe the scheduler node got 'thwarted' by some other guest or the
underlying attached storage!? Going to check...

Cheers and thanks,
  Thomas


On 2016-10-17 19:29, Todd Tannenbaum wrote:
> On 10/17/2016 4:44 AM, Thomas Hartmann wrote:
>> Job submission failed around 2:00 tonight after which the SchedLog [1]
>> contained only lines such as
>>  > WriteUserLog checking for event log rotation, but no lock
>> which occurred before as well, but not exclusively.
>>
>> A bit later, at ~2:16, the MasterLog [2] started to log schedd daemons
>> being reaped (dying?), exiting with code 44. Restarts of the schedd went
>> on for ~20m, after which the MasterLog went silent until the service got
>> restarted.
>>
>> So far I have found no information on exit code 44 for the schedd, only
>> for the shadow [3].
> 
> 
> Hi Thomas,
> 
> For any/all HTCondor daemons, exit code 44 means that the HTCondor
> daemon in question failed to write to or rotate a log file, i.e.,
> the write() system call failed.  Look at the filesystems holding the
> paths for LOG, LOCK, and/or EVENT_LOG via
>   condor_config_val LOG LOCK EVENT_LOG
> 
> The most likely cause for this is that the filesystem(s) in question
> were full or fell offline (e.g. an NFS mount failed, if they are not
> local).  BTW, I would encourage you to have these directories on local
> disk if they are not already.
> 
> Hope the above helps
> Todd
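
The check Todd describes can be sketched in Python. This is only a minimal
sketch under some assumptions: condor_config_val is on the PATH, the helper
names condor_path and check_writable are made up for illustration, and a
one-off probe like this will not catch a filesystem that only stalls or drops
out briefly (as may happen with storage attached to a VM).

```python
import os
import shutil
import subprocess
import tempfile


def condor_path(name):
    """Return the value of one HTCondor config variable, or None if unset.

    Hypothetical helper: it simply shells out to `condor_config_val NAME`.
    """
    result = subprocess.run(
        ["condor_config_val", name],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


def check_writable(path):
    """Report free space, free inodes, and whether a test write succeeds."""
    # EVENT_LOG names a file, so fall back to its parent directory.
    directory = path if os.path.isdir(path) else os.path.dirname(path) or "."
    usage = shutil.disk_usage(directory)
    stats = os.statvfs(directory)
    try:
        # Write and fsync a small temporary file to exercise write().
        with tempfile.NamedTemporaryFile(dir=directory) as handle:
            handle.write(b"htcondor write probe\n")
            handle.flush()
            os.fsync(handle.fileno())
        write_ok = True
    except OSError:
        write_ok = False
    return {
        "directory": directory,
        "free_bytes": usage.free,
        "free_inodes": stats.f_favail,
        "write_ok": write_ok,
    }
```

To use it, call condor_path() for each of LOG, LOCK and EVENT_LOG, skip any
that come back as None, and pass the rest to check_writable(). It should be
run as the user the daemons run as, since otherwise the write probe may fail
on permissions alone.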