
Re: [Condor-users] Condor nodes vanish temporarily



Hi, 

I tried your suggestion, and after setting these keys in the condor_config
file on the pool machines, my machines no longer vanish from the list. 

Thank you very much for your help! 



On Saturday, 05.03.2011, at 09:50 -0600, Daniel Forrest wrote:
> Felix Wolfheimer wrote:
> 
> > I'm using Condor 7.4.4 on a pool of four machines running Windows Server
> > 2003 R2. When I look at the machine status using condor_status I can see
> > that the machines vanish temporarily from the list and come back some
> > minutes later (this takes up to about 30 min.). The machines are up and
> > running 24/7, are always connected to an internal LAN, and can see
> > (ping, nslookup, etc.) each other all the time.
> > 
> > I've looked at the collector logfile and found the following statements
> > which seem to be related to the issue:
> > 
> > *** Removing stale ad <my_computer_name>
> > 
> > where my_computer_name is the name of the machine which vanishes from
> > the list.
> > 
> > As the machines have two network interfaces, I tried to explicitly bind
> > Condor to one of them using NETWORK_INTERFACE = ..., but that did not
> > change anything. The firewalls on the machines are also switched off.
> > 
> > Does anyone have an idea what the issue could be?
> 
> This sounds familiar.  On the pool machines, try setting this:
> 
> STARTD_DEBUG = D_COMMAND D_NETWORK
> MASTER_DEBUG = D_COMMAND D_NETWORK
> 
> in "condor_config".  If this clears up the problem, I can go into more
> detail as to what the problem might be and why this "fixes" it.
>
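To check whether the debug settings really stop the disappearances, one option is to count how often the collector drops each machine's ad before and after the change. The script below is a minimal sketch, not an HTCondor tool. It assumes that the collector log contains lines like the "*** Removing stale ad <my_computer_name>" line quoted above. Because the exact wording, asterisk count, and quoting in a real CollectorLog may differ, the pattern tolerates an optional colon, double quotes, or angle brackets around the name. The function name `count_stale_ads` and the command-line usage are hypothetical.

```python
import re
import sys
from collections import Counter

# Assumed log format: "*** Removing stale ad <name>" (as quoted in the thread),
# optionally written as '**** Removing stale ad: "name"'. Adjust if your log differs.
STALE_AD = re.compile(r'\*+\s*Removing stale ad:?\s*"?<?([^">\s]+)>?"?')


def count_stale_ads(lines):
    """Return a Counter mapping machine/ad name -> number of stale-ad removals."""
    counts = Counter()
    for line in lines:
        match = STALE_AD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_stale_ads.py /path/to/CollectorLog
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for name, n in count_stale_ads(log).most_common():
            print(f"{name}: {n}")
```

If the fix works, running the script on a log segment from after the configuration change should show few or no removals for the affected machines.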