
RE: [Condor-users] Explaining the Claimed + Idle state



> On Mon, Feb 07, 2005 at 03:25:21PM -0500, Ian Chesal wrote:
> > I'm seeing a fair number of VM's in my system reporting "Claimed + Idle"
> > for a long, long period of time. What can bring about this state?
> > There are no starters on these machines. Condor does not appear to be
> > actually running anything. Yet they are claimed and idling and not
> > doing any work.
> > 
> 
> A condor_status -l to one of those machines should show what 
> schedd has it claimed - it will be the ClientMachine 
> attribute. I would be curious what the schedd is doing.

They are all claimed by the same machine ttc-eahmed3 -- and this machine
is showing LOTS of condor_write errors in its SchedLog -- I've restarted
condor on the machine (with net stop/net start). No condor_write errors
in the last 5 minutes, but I'm not holding my breath. I doubt a restart
alone has fixed this.
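
For anyone who wants to repeat the check across a whole pool, here is a
rough sketch of how the ClientMachine tally could be scripted. It assumes
the long-format output of condor_status -l is one "Attr = value" pair per
line with a blank line between ads, and that the State, Activity and
ClientMachine attributes are present on claimed machines. The function
names are my own and not part of Condor.

```python
import subprocess
from collections import Counter


def parse_long_ads(text):
    """Split `condor_status -l` style output into one dict per machine ad.

    Assumes one `Attr = value` pair per line and a blank line between ads.
    """
    ads = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the ad we were collecting, if any.
            if current:
                ads.append(current)
                current = {}
            continue
        key, sep, value = line.partition("=")
        if sep:
            # Drop surrounding whitespace and the quotes around string values.
            current[key.strip()] = value.strip().strip('"')
    if current:
        ads.append(current)
    return ads


def claimed_idle_by_client(ads):
    """Count Claimed + Idle ads per claiming schedd host (ClientMachine)."""
    return Counter(
        ad.get("ClientMachine", "<none>")
        for ad in ads
        if ad.get("State") == "Claimed" and ad.get("Activity") == "Idle"
    )


def query_pool():
    """Run `condor_status -l` against the local pool and tally the result."""
    out = subprocess.run(
        ["condor_status", "-l"], capture_output=True, text=True, check=True
    ).stdout
    return claimed_idle_by_client(parse_long_ads(out))
```

Calling query_pool() on a machine that can reach the collector should
return a count per ClientMachine. If one host owns nearly every
Claimed + Idle VM, that schedd's SchedLog is the place to look.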

This is the third machine at our site to get this "plague" of
condor_write errors for the schedd. It's no longer isolated to two
machines in two cubicles. See condor-admin bug report #11869. This is
beginning to worry me greatly. These machines have full network
connectivity. No dropped pings to these machines. Condor just can't seem
to keep an open port. Happens on both Windows XP and Linux machines.
They are all running 6.7.3 with SEC_DEFAULT_NEGOTIATION = NEVER set to
stop the condor_startd memory leak bug in 6.7.3. Could this be the
problem?

- Ian