
Re: [condor-users] condor_shadow timeout when losing contact with startd



On Mon, Jan 26, 2004 at 01:26:45PM -0600, Geoff Lovett wrote:
> Hello, I've noticed that when condor_shadow loses contact with
> condor_startd on an execute machine, it typically takes roughly 2 hours
> for the shadow to notice that the startd is gone and cause an exception,
> thereby putting the job back into the queue.  My question is, can this
> timeout be configured?

I think you mean the condor_starter and not the condor_startd. The
starter is the daemon that launches and manages the job on the execute
machine.

By default the starter sends an update to the shadow every 20 minutes,
and the shadow should raise an exception after 3 missed updates, i.e.
after one hour. I'm not sure why it is taking 2 hours for you... maybe
I'm wrong about something.
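
A rough model of the missed-update rule described above, taking the
20-minute interval and 3-update threshold as given (both are stated
from memory, so treat them as assumptions). The UpdateWatchdog class
is hypothetical and is not Condor's shadow code; it only shows how the
one-hour figure falls out of interval x missed updates.

```python
# Hypothetical model of the missed-update rule; NOT Condor's source code.
from dataclasses import dataclass


@dataclass
class UpdateWatchdog:
    interval_s: int        # seconds between starter updates (assumed 20 min)
    max_missed: int = 3    # updates that may be missed before giving up
    last_update_s: float = 0.0

    def record_update(self, now_s: float) -> None:
        """Call whenever an update from the starter arrives."""
        self.last_update_s = now_s

    def timed_out(self, now_s: float) -> bool:
        """True once max_missed consecutive updates have failed to arrive."""
        return now_s - self.last_update_s >= self.interval_s * self.max_missed


watchdog = UpdateWatchdog(interval_s=20 * 60)
watchdog.record_update(now_s=0)
print(watchdog.timed_out(now_s=59 * 60))  # False: third update not yet due
print(watchdog.timed_out(now_s=60 * 60))  # True: third missed update, one hour
```

Under this model a 2-hour delay would need either a longer interval or
a higher miss threshold than the defaults quoted above.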

Anyhow, this interval is configurable in the condor_config file:

  SHADOW_UPDATE_INTERVAL = 300
  # 300 seconds == 5 minutes
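
A small sketch of what the setting above implies, assuming the same
3-missed-update rule still applies with the shorter interval. The
parse_config and estimated_timeout_s helpers are hypothetical; they
handle only plain "NAME = value" lines and "#" comment lines, whereas
the real condor_config parser does more (e.g. $(MACRO) expansion).

```python
# Hypothetical helpers: parse simple condor_config-style lines and estimate
# the shadow's timeout, assuming it gives up after 3 missed updates.
def parse_config(text: str) -> dict[str, str]:
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        name, sep, value = line.partition("=")
        if sep:
            settings[name.strip()] = value.strip()
    return settings


def estimated_timeout_s(interval_s: int, max_missed: int = 3) -> int:
    """Approximate wait before the shadow gives up on a silent starter."""
    return interval_s * max_missed


config = """
  SHADOW_UPDATE_INTERVAL = 300
  # 300 seconds == 5 minutes
"""
interval = int(parse_config(config)["SHADOW_UPDATE_INTERVAL"])
print(estimated_timeout_s(interval) // 60, "minutes")  # 15 minutes
```

So a 5-minute interval would cut the expected detection time from about
one hour to about 15 minutes, at the cost of more frequent updates.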


cheers,
-zach
