
[condor-users] condor_shadow timeout when losing contact with startd



Hello, I've noticed that when condor_shadow loses contact with
condor_startd on an execute machine, it typically takes roughly 2 hours
for the shadow to notice that the startd is gone and cause an exception,
thereby putting the job back into the queue.  My question is, can this
timeout be configured?  I'm on a very reliable network (internal LAN)
and don't need to allow 2 hours for network conditions to recover
so that the two can start communicating again.  In other words, if they
can't communicate, it's likely that the startd is dead (along with the
machine).

I grepped for HOUR in condor_config and didn't find anything.  I also
searched for settings related to SHADOW and STARTD, and didn't come up
with anything.  If anyone knows of a way to control this behaviour,
please share.  All Condor daemons involved are v6.6.0.

Thanks,
Geoff

Condor Support Information:
http://www.cs.wisc.edu/condor/condor-support/
To Unsubscribe, send mail to majordomo@xxxxxxxxxxx with
unsubscribe condor-users <your_email_address>