Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] Increasing shadow->schedd timeout

Date: Mon, 22 Feb 2016 13:00:22 -0600
From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Increasing shadow->schedd timeout

On 2/22/2016 12:19 PM, Vladimir Brik wrote:

> Hello.
>
> We are having issues with our network filesystem that causes
> condor_schedd and condor_shadow to sometimes hang for long periods of
> time (I suspect when they try to update job logs), which I think causes
> unnecessary job restarts.
>
> Is it possible to increase the timeout for shadow->schedd connections?

Yes. You will want to use the knob SHADOW_NOT_RESPONDING_TIMEOUT. See the entries below, cut-n-pasted from section 3.3 of the HTCondor Manual.


best regards,
Todd


NOT_RESPONDING_TIMEOUT

When an HTCondor daemon's parent process is another HTCondor daemon, the child daemon will periodically send a short message to its parent stating that it is alive and well. If the parent does not hear from the child for a while, the parent assumes that the child is hung, kills the child, and restarts the child. This parameter controls how long the parent waits before killing the child. It is defined in terms of seconds and defaults to 3600 (1 hour). The child sends its alive and well messages at an interval of one third of this value.
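To make the arithmetic concrete, here is a minimal sketch of the relationship between the timeout and the keepalive interval described above. The function name is made up for illustration, and whether HTCondor rounds the interval internally is not stated in the manual excerpt, so plain division is an assumption.

```python
# Illustrative arithmetic only; this is not HTCondor code.
DEFAULT_NOT_RESPONDING_TIMEOUT = 3600  # seconds (1 hour), per the manual text


def keepalive_interval(timeout_seconds: int) -> float:
    """Interval between a child's alive-and-well messages: one third of the timeout."""
    return timeout_seconds / 3


# With the default timeout the child reports every 20 minutes.
print(keepalive_interval(DEFAULT_NOT_RESPONDING_TIMEOUT))  # 1200.0
# Doubling the timeout (a hypothetical choice) doubles the interval as well.
print(keepalive_interval(7200))  # 2400.0
```

So raising the timeout also makes the child report less often; the parent still allows three missed intervals before it kills the child.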


<SUBSYS>_NOT_RESPONDING_TIMEOUT

Identical to NOT_RESPONDING_TIMEOUT, but controls the timeout for a specific type of daemon. For example, SCHEDD_NOT_RESPONDING_TIMEOUT controls how long the condor_schedd's parent daemon will wait without receiving an alive and well message from the condor_schedd before killing it.
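As a rough illustration of how the two knobs relate, the sketch below reads `NAME = value` lines and picks a timeout for a given daemon type. It assumes the subsystem-specific knob takes precedence over the generic one, which the manual text implies but does not spell out. The parser, the helper names, and the 7200-second value are all hypothetical; this is not HTCondor's own configuration code.

```python
# A toy lookup, not HTCondor's real config machinery.
DEFAULT_TIMEOUT = 3600  # seconds, the documented default

# Hypothetical config text; 7200 is an arbitrary example value.
EXAMPLE_CONFIG = """
# generic timeout for all daemons
NOT_RESPONDING_TIMEOUT = 3600
# longer timeout for condor_shadow only
SHADOW_NOT_RESPONDING_TIMEOUT = 7200
"""


def parse_config(text: str) -> dict:
    """Collect NAME = value pairs, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            settings[name.strip().upper()] = value.strip()
    return settings


def effective_timeout(settings: dict, subsys: str) -> int:
    """Assumed precedence: <SUBSYS>_ knob first, then the generic knob, then the default."""
    for key in (f"{subsys.upper()}_NOT_RESPONDING_TIMEOUT", "NOT_RESPONDING_TIMEOUT"):
        if key in settings:
            return int(settings[key])
    return DEFAULT_TIMEOUT


settings = parse_config(EXAMPLE_CONFIG)
print(effective_timeout(settings, "SHADOW"))  # 7200
print(effective_timeout(settings, "SCHEDD"))  # 3600
```

Under that assumption, setting only SHADOW_NOT_RESPONDING_TIMEOUT lengthens the wait for condor_shadow while leaving every other daemon on the generic value.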
