[Condor-users] Non-reporting of Windows nodes

I raised this some time ago, but to my knowledge it is not yet resolved.

I have a mixed pool (6.6.6 and 6.6.7; Windows XP, Windows 2000 and various Linux).
The central node runs Whitebox Linux (effectively a RH9 clone).

Periodically, the Windows nodes (I have 4 at present) vanish from the
condor_status reports, often for hours or days. They usually jump back into
action if I log on to them and run condor_status or something similar.
Once they have vanished from the reports, I cannot "prod" them remotely with condor.
They are not hibernating, and when they vanish they are not even sleeping/suspended
or running screensavers. In fact, one of them is my desktop: I can be
typing away, running condor jobs (over a remote login to the central node),
and then suddenly find that my machine is no longer in the pool.
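
To pin down exactly when a node drops out, something like the following could be run periodically on the central node. It is only a minimal sketch: the host names are hypothetical placeholders, the helper names are made up for illustration, and it assumes that `condor_status -format "%s\n" Machine` prints one machine name per line.

```python
import subprocess

# Hypothetical host names; replace with the real Windows nodes in the pool.
EXPECTED_WINDOWS_NODES = ["winxp-1.example.org", "win2k-1.example.org"]


def missing_nodes(expected, status_output):
    """Return the expected hosts that do not appear in the status output.

    status_output is assumed to hold one machine name per line; the
    comparison ignores case and blank lines.
    """
    reported = {
        line.strip().lower()
        for line in status_output.splitlines()
        if line.strip()
    }
    return sorted(host for host in expected if host.lower() not in reported)


def query_pool():
    """Ask the collector which machines are currently reporting.

    Assumes condor_status is on the PATH of the machine running this.
    """
    result = subprocess.run(
        ["condor_status", "-format", "%s\n", "Machine"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `missing_nodes(EXPECTED_WINDOWS_NODES, query_pool())` from a scheduled job and logging the result with a timestamp would show when each node disappears and when it reports back in.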

The machines usually report back in eventually, but it is a bit of a pain
when I am showing new users what the pool can do and the nodes keep disappearing.

I have no back-fill jobs.

Any ideas?

All nodes are execute+schedule.

This problem was also present when the whole pool was running 6.6.6.