
Re: [Condor-users] Large number of shadow exceptions due to Connection time out



Hello,

On 11/22/2010 02:43 AM, Carsten Aulbert wrote:
We are currently seeing a large number of shadows dying due to connection
timeouts. These are almost certainly caused by our network having a couple of
issues right now. However, is there any setting we can give Condor or the
Linux kernel to mitigate this a bit as a short-term solution before we
can weed out the networking problems at their root?

I believe setting "JobLeaseDuration" in your condor_config is what you want.

Example of a 24-hour job lease duration:
JobLeaseDuration                = 86400

From the manual:
JobLeaseDuration - The number of seconds set for a job lease, the amount of time that a job may continue running on a remote resource, despite its submitting machine’s lack of response. See section 2.14.4 for details on job leases.

A link to 2.14.4:
http://www.cs.wisc.edu/condor/manual/v7.5/2_14Special_Environment.html#SECTION003144000000000000000
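
For an individual job, the same lease can, as far as I know, also be requested from the submit description file with the job_lease_duration command. A minimal sketch, assuming a vanilla-universe job; the executable name is made up for illustration:

```
# Hypothetical submit description file; only job_lease_duration is the point here.
universe           = vanilla
executable         = my_job.sh
# Ask for a 24-hour lease (24 * 60 * 60 = 86400 seconds).
job_lease_duration = 86400
queue
```

This only affects jobs submitted after the change.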


-Mick