Re: [Condor-users] Condor nodes vanish temporarily
- Date: Sat, 5 Mar 2011 09:50:54 -0600
- From: Daniel Forrest <dan.forrest@xxxxxxxxxxxxx>
- Subject: Re: [Condor-users] Condor nodes vanish temporarily
Felix Wolfheimer wrote:
> I'm using Condor 7.4.4 on a pool of four machines running Windows Server
> 2003 R2. When I look at the machine status using condor_status I can see
> that the machines vanish temporarily from the list and come back some
> minutes later (takes up to about 30 min.). The machines are up and
> running 7x24h and they are always connected to an internal LAN and can see
> (ping, nslookup etc.) each other all the time.
>
> I've looked at the collector logfile and found the following statements
> which seem to be related to the issue:
>
> *** Removing stale ad <my_computer_name>
>
> where my_computer_name is the name of the machine which vanishes from
> the list.
>
> As the machines have two network interfaces I tried to explicitly bind
> Condor to one of them using NETWORK_INTERFACE = ... but that did not
> change anything. The firewalls of the machines are also switched off.
>
> Does anyone have an idea what the issue could be?

This sounds familiar. On the pool machines, try setting this:

    STARTD_DEBUG = D_COMMAND D_NETWORK
    MASTER_DEBUG = D_COMMAND D_NETWORK

in "condor_config". If this clears up the problem, I can go into more
detail as to what the problem might be and why this "fixes" it.
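
To judge whether the change actually helps, it may be useful to count the
"Removing stale ad" messages in the collector log before and after applying
it. The following is only a rough sketch: it assumes each removal is logged
on a single line containing that phrase followed by the machine name, as in
the excerpt quoted above. The function names are hypothetical, and the log
path is left to the caller because it depends on the installation.

```python
import re
from collections import Counter

# Assumption: a removal appears on one line as
#   ... Removing stale ad <machine name>
# (an optional colon after "ad" is tolerated). Adjust the pattern if the
# collector log on your pool is formatted differently.
STALE_AD = re.compile(r"Removing stale ad:?\s*(.*)")


def count_stale_ads(lines):
    """Return a Counter mapping machine name -> number of stale-ad removals."""
    counts = Counter()
    for line in lines:
        match = STALE_AD.search(line)
        if match:
            # Everything after the phrase is treated as the machine name.
            counts[match.group(1).strip()] += 1
    return counts


def count_stale_ads_in_file(path):
    """Convenience wrapper: scan a log file given its path."""
    with open(path, errors="replace") as log:
        return count_stale_ads(log)
```

Calling count_stale_ads_in_file() on a copy of the collector log and
printing the result of .most_common() lists the machines that were dropped
most often. If the counts stop growing after the debug settings are in
place, that suggests the change made a difference.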
--
Dan