
Re: [Condor-users] Condor nodes vanish temporarily



Felix Wolfheimer wrote:

> I'm using Condor 7.4.4 on a pool of four machines running Windows Server
> 2003 R2. When I look at the machine status using condor_status I can see
> that the machines vanish temporarily from the list and come back some
> minutes later (takes up to about 30 min.). The machines are up and
> running 7x24h and they are always connected to an internal LAN and can see
> (ping, nslookup etc.) each other all the time.
> 
> I've looked at the collector logfile and found the following statements
> which seem to be related to the issue:
> 
> *** Removing stale ad <my_computer_name>
> 
> where my_computer_name is the name of the machine which vanishes from
> the list.
> 
> As the machines have two network interfaces I tried to explicitly bind
> Condor to one of them using NETWORK_INTERFACE = ... but that did not
> change anything. The firewalls of the machines are also switched off.
> 
> Has anyone an idea what could be the issue?

This sounds familiar.  On the pool machines, try setting this:

STARTD_DEBUG = D_COMMAND D_NETWORK
MASTER_DEBUG = D_COMMAND D_NETWORK

in "condor_config".  If this clears up the problem, I can go into more
detail as to what the problem might be and why this "fixes" it.
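
To judge whether the change makes a difference, it may help to count how
often each machine is dropped before and after applying it. Below is a
minimal Python sketch, not an official Condor tool. It assumes the
collector log contains lines shaped like the "Removing stale ad <name>"
message quoted above; real log lines may carry a timestamp prefix or
extra fields, so the pattern may need adjusting. The function name
count_stale_ads and the script name are hypothetical.

```python
import re
import sys
from collections import Counter

# Assumed line shape, based only on the excerpt quoted in this thread.
# The machine name is taken to be the first token after the message.
STALE_AD = re.compile(r"Removing stale ad\s+(\S+)")


def count_stale_ads(lines):
    """Count 'Removing stale ad' messages per machine name."""
    counts = Counter()
    for line in lines:
        match = STALE_AD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name):
    #   python count_stale_ads.py <path to the collector log>
    with open(sys.argv[1], errors="replace") as log:
        for machine, n in count_stale_ads(log).most_common():
            print(f"{machine}: {n}")
```

Running it against the collector log once before and once after the
configuration change gives a rough per-machine comparison.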

-- 
Dan