Re: [condor-users] condor_shadow timeout when losing contact with startd



Well, basically I'd like to use Condor in a semi-real-time application.
Right now Condor takes about two hours to requeue a job onto a new box
after a failure, and I'd like to get that down to maybe 20 minutes.  To
reproduce the two-hour timeout, I simply start a job and then turn off
the execute box (to simulate a crash).

Indeed, setting STARTER_UPDATE_INTERVAL hasn't decreased the timeout.

--Geoff


On Mon, 2004-01-26 at 16:14, Zachary Miller wrote:
> On Mon, Jan 26, 2004 at 02:55:44PM -0600, Geoff Lovett wrote:
> > Ah, ok :)  I'm trying to replicate the problem, and so far, 20 minutes
> > into it, it's still hung.  I'll use STARTER_UPDATE_INTERVAL instead of
> > SHADOW_UPDATE_INTERVAL and give it a shot.
> 
> i don't think this is going to fix your root problem though.  this update
> interval simply controls how often the job stats (memory usage, run time,
> etc.) get updated from the starter to the shadow.
> 
> what is actually happening in your case?  is the starter getting killed,
> is it hung, or something else?
> 
> 
> cheers,
> -zach
