
Re: [Condor-users] offline compute nodes and Rooster





On 10/17/10 10:48 AM, Paul Haldane wrote:

2. Hacked together a script using condor_advertise to publish ads for offline machines.  This works and, with a sensible setting for ROOSTER_UNHIBERNATE, leads to hibernating machines being woken up by Rooster to service jobs.  The remaining problem was that the ads disappeared after about 20 minutes.  A bit more poking around took me back to Ian's message to the list (https://lists.cs.wisc.edu/archive/condor-users/2010-January/msg00148.shtml).  Adding ClassAdLifetime to the published ad seems to have done the trick (at least the test machine has stayed visible for over 25 minutes).
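For illustration only, here is a minimal Python sketch of the kind of ad text such a script might write to a file before handing it to condor_advertise. The function name, the machine name, and the lifetime value are hypothetical. Only the Offline and ClassAdLifetime attributes come from this thread; MyType and Name are assumptions about what a machine ad carries, and a real offline ad would need many more attributes than shown here.

```python
def render_offline_ad(machine_name, lifetime_seconds):
    """Render a minimal ClassAd as text, one 'Attribute = value' per line.

    Hypothetical helper: the attribute set is deliberately incomplete.
    """
    if lifetime_seconds <= 0:
        raise ValueError("lifetime_seconds must be positive")
    attrs = [
        ("MyType", '"Machine"'),                 # assumed ad type for a startd ad
        ("Name", '"%s"' % machine_name),         # assumed identifying attribute
        ("Offline", "True"),                     # marks the machine as offline
        ("ClassAdLifetime", str(int(lifetime_seconds))),  # seconds before the ad expires
    ]
    return "\n".join("%s = %s" % (key, value) for key, value in attrs) + "\n"


if __name__ == "__main__":
    # Example values only; pick a lifetime that suits your pool.
    print(render_offline_ad("node01.example.org", 86400), end="")
```

The resulting text would be saved to a file and passed to condor_advertise together with the update command; the exact invocation depends on the pool and is not shown.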

I've just looked at the implementation of OFFLINE_EXPIRE_ADS_AFTER. Strangely, it only has any effect if the ad is advertised via the command UPDATE_STARTD_AD_WITH_ACK and Offline is not set to true in the ad that is sent to the collector. The collector then sets Offline=true and overrides a bunch of other stuff too, including ClassAdLifetime. In all other cases, ClassAdLifetime is just preserved as is in the ad.

This certainly doesn't match the documented behavior. I'm looking into what should be done about it.
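To make the behavior described above concrete, here is a rough Python model of it. This is not the collector's source code: the function name and the dict representation of an ad are hypothetical, and it assumes the value the collector writes into ClassAdLifetime is the OFFLINE_EXPIRE_ADS_AFTER setting. The other attributes the collector overrides are left out.

```python
UPDATE_STARTD_AD = "UPDATE_STARTD_AD"
UPDATE_STARTD_AD_WITH_ACK = "UPDATE_STARTD_AD_WITH_ACK"


def collector_receive(command, ad, offline_expire_ads_after):
    """Return the ad as the collector would store it (simplified model).

    `ad` is a plain dict standing in for a ClassAd.
    """
    stored = dict(ad)  # copy so the caller's ad is not modified
    if command == UPDATE_STARTD_AD_WITH_ACK and stored.get("Offline") is not True:
        # The only path on which OFFLINE_EXPIRE_ADS_AFTER has any effect:
        # the collector marks the ad offline and overrides its lifetime.
        stored["Offline"] = True
        stored["ClassAdLifetime"] = offline_expire_ads_after  # assumed source of the value
    # In every other case ClassAdLifetime is preserved exactly as sent.
    return stored


if __name__ == "__main__":
    sent = {"Name": "node01.example.org", "Offline": True, "ClassAdLifetime": 900}
    # An ad that already has Offline=True keeps its own lifetime.
    print(collector_receive(UPDATE_STARTD_AD, sent, 86400))
```

Under this model, an ad that is already marked Offline=True keeps whatever ClassAdLifetime it was sent with, regardless of which update command is used, which is consistent with setting ClassAdLifetime explicitly in the published ad.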

--Dan