
Re: [Condor-users] jobs vacating reason

On Fri, Dec 10, 2010 at 7:44 AM, Erik Aronesty <erik@xxxxxxx> wrote:
In case anyone wants to know the solution: the symptom is processes dying after exactly 20 minutes. That's the clue that ALIVEs aren't getting through.

Removing the entry in /etc/hosts that mapped "f0.<mydom>.local" to 127.0.0.1 on the schedd machine immediately allowed the ALIVEs to go through. (That machine was also the collector/negotiator, so I'm not sure the problem is specific to the schedd.)
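
For anyone who wants to check a machine for this kind of entry, here is a rough Python sketch that scans hosts-file text for names mapped to an IPv4 loopback address. It is not part of Condor; the function name loopback_host_entries and the set of "expected" localhost names are my own assumptions, so adjust them for your site.

```python
import ipaddress

# Assumption: these names are expected to map to loopback and are harmless.
EXPECTED_LOOPBACK_NAMES = {"localhost", "localhost.localdomain"}


def loopback_host_entries(hosts_text):
    """Return (address, name) pairs where a non-localhost name maps to 127.0.0.0/8."""
    suspects = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        try:
            ip = ipaddress.ip_address(address)
        except ValueError:
            continue  # skip lines whose first field is not an IP address
        if ip.version == 4 and ip.is_loopback:
            for name in names:
                if name.lower() not in EXPECTED_LOOPBACK_NAMES:
                    suspects.append((address, name))
    return suspects
```

To use it, read the file and pass the contents in, e.g. loopback_host_entries(open("/etc/hosts").read()). Any pair it returns is a candidate for the kind of mapping described above; whether it actually causes trouble depends on your setup.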

I ran into this /etc/hosts problem a while back too, but with different symptoms.

Another solution is to set the NETWORK_INTERFACE parameter, or use BIND_ALL_INTERFACES = True, as noted here:
https://lists.cs.wisc.edu/archive/condor-users/2007-August/msg00116.shtml

--
David Brodbeck
System Administrator, Linguistics
University of Washington