
Re: [Condor-users] VMs being cleaned up/removed



Hi Alain

Thanks for the fast response.  Yes, 6.7.3 or 6.7.6 sounds like it might
have the fix we need.  I am reluctant to go the TCP update route:
this is currently just a test configuration, and soon we will have 220
nodes behind the CM, so I guess that is getting towards a large pool.

Leslie

On Tue, 26 Apr 2005, Alain Roy wrote:

> 
> >I am running Condor version 6.7.2 on Scientific Linux 3.0.3
> >with 11 dual-cpu worker nodes with 4 VMs each.  There are three schedulers
> >and the CM is using kerberos authentication.
> >
> >I notice that fairly often, VMs will be "cleaned up" during housecleaning
> 
> Try a newer version of Condor.
> 
> Condor 6.7.3 has:
> 
> >This release contains all the bug fixes from the 6.6 stable series up to
> >and including version 6.6.7, and some of the fixes that will be included
> >in version 6.6.8. The bug fixes in version 6.6.8 that were not included in
> >version 6.7.3 are listed in a separate section of the 6.6.8 version history.
> 
> 
> Condor 6.6.8 has:
> >Fixed issues that would cause condor_startd to "disappear" from the
> >pool because of dropped machine ad updates. This fix applies to all
> >platforms, but the symptoms were exhibited predominantly on Windows machines.
> 
> And this is one of the bug fixes included in 6.7.3.
> 
> So there is a decent shot that this problem will be fixed by upgrading to 
> Condor 6.7.6, which is the most recent Condor release in the 6.7.x series.
> 
> The condor_startd advertises each virtual machine by sending a UDP update 
> to the collector. In some busy networks, these updates can be lost. If 
> upgrading doesn't work for you, you can tell Condor to use TCP instead. We 
> don't use this as a default in order to avoid having hundreds of
> simultaneous open TCP connections on large pools, but it's certainly 
> reasonable for your small pool. You can learn how to configure this in the 
> manual:
> 
> http://www.cs.wisc.edu/condor/manual/v6.7/3_11Setting_Up.html#sec:tcp-collector-update
> 
> Basically, you set "UPDATE_COLLECTOR_WITH_TCP = TRUE" in your config file.
> 
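
For reference, a minimal sketch of what that setting might look like in a
Condor config file. Only UPDATE_COLLECTOR_WITH_TCP comes from the message
above; the collector-side COLLECTOR_SOCKET_CACHE_SIZE line and its value
are assumptions based on my reading of the 6.7 manual section linked
above, so please check them against the manual before relying on them.

```text
# Sketch only, not a tested configuration.

# On the execute nodes: send ClassAd updates to the collector over TCP
# instead of UDP (the setting named in the message above).
UPDATE_COLLECTOR_WITH_TCP = TRUE

# On the central manager (assumption): the manual describes a cache of
# TCP sockets for the collector. The value here is a placeholder, not a
# recommendation; size it for the number of daemons sending updates.
COLLECTOR_SOCKET_CACHE_SIZE = 1000
```
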
> I hope this helps. If it doesn't, please do let us know. It's not a feature 
> that machines disappear from your pool!
> 
> -alain
> 

-- 
   ,-~~-.___.       ________________________________________________
  / |  '     \      groer@xxxxxxxxxxxxxxxxxxx  Department of Physics
 (  )        0           Tel: +1-416-978-2959  University of Toronto
  \_/-, ,----'           Fax: +1-416-978-8221  60 St. George Street
     ====           //                         Toronto, ON M5S 1A7
    /  \-'~;    /~~~(O)                        Canada
   /  __/~|   /       |  Office: McLennan Physics Lab Room 911
 =(  _____| (_________|  http://home.fnal.gov/~groer
     Leslie S. Groer