Re: [Condor-users] Jobs repeatedly evicted after 30 mins



Thanks Jaime (and others who replied).

The problem was indeed caused by UDP being blocked.
Our original 9600-9700 port range had been expanded to
9000-10000, but ONLY for TCP. Once the expanded range was
correctly opened for UDP as well, the eviction problems disappeared.
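
For anyone hitting the same symptom, here is a minimal sketch of a UDP
round-trip check between two machines, written in Python. It is not part
of Condor; the function names, the example port and the host name in the
usage note are assumptions for illustration only. Run the listener on one
machine and the probe on the other, using a port inside the range you
believe is open.

```python
import socket
import threading


def start_echo_listener(host="0.0.0.0", port=0):
    """Bind a UDP socket and echo back one datagram from a background thread.

    Returns the bound port (port=0 lets the OS choose a free one).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    bound_port = sock.getsockname()[1]

    def serve():
        # Wait for a single datagram, send it straight back, then close.
        with sock:
            data, addr = sock.recvfrom(1024)
            sock.sendto(data, addr)

    threading.Thread(target=serve, daemon=True).start()
    return bound_port


def udp_probe(host, port, payload=b"ping", timeout=2.0):
    """Return True if the payload is echoed back from host:port in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        try:
            s.sendto(payload, (host, port))
            data, _ = s.recvfrom(1024)
        except OSError:
            # Covers both a timeout and an ICMP "port unreachable" error.
            return False
        return data == payload
```

For example, call start_echo_listener(port=9650) on an execute machine,
then udp_probe("execute-host", 9650) on the submit machine (both values
are placeholders). A False result only says the round trip failed; it
does not tell you which direction is blocked, and the chosen port must
not already be in use.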

Thanks.

Cheers

Greg

> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx 
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Jaime Frey
> Sent: Friday, 3 March 2006 4:05 AM
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] Jobs repeatedly evicted after 30 mins
> 
> 
> On Mar 1, 2006, at 9:19 PM, <Greg.Hitchen@xxxxxxxx> wrote:
> 
> > We have the situation where a user submits ~10 jobs,
> > all of which should run for ~5 hours. Many/most of
> > them get repeatedly evicted after 30 mins and requeued.
> > Below are the relevant logs from the submitting and execute machines
> > for one particular instance.
> >
> > I have tested this myself with different jobs and the eviction
> > ALWAYS happens a few seconds (about 20?) under 30 minutes.
> >
> > The line in the START LOG:
> >
> > 3/1 05:57:16 State change: claim timed out (condor_schedd gone?)
> >
> > seems to be the relevant one?
> >
> > ALL of the evictions (for different execute machines and different 
> > jobs, same submit machine) occur at 30 minutes.
> 
> While a job is running, the schedd periodically sends an alive
> message to the startd via UDP. If the startd doesn't receive any
> alive messages for a while, it will kill the claim (and the job).
> The default is for the schedd to send an alive every 5 minutes, and
> the startd will kill the job if it misses 6 alives, which matches
> your 30 minutes.
> 
> So it appears that UDP packets aren't making it from your submit  
> machine to your execute machines.
> 
> +--------------------------------+-----------------------------------+
> |           Jaime Frey           | I used to be a heavy gambler.     |
> |       jfrey@xxxxxxxxxxx        | But now I just make mental bets.  |
> | http://www.cs.wisc.edu/~jfrey/ | That's how I lost my mind.        |
> +--------------------------------+-----------------------------------+
> 