I’m running a heavily virtualized environment with Windows 2003 clients. We need to start running Windows 2008 clients, but the 2008 VMs drop out of condor_status (and so stop accepting jobs) after being idle for a few hours. If you RDP into the system, the network connection is re-established for a few hours before the VM drops off again.
I know there are a lot of possible causes, but I am curious whether anything is established or well documented about this issue.
So the moving parts…
Dell R610 Servers deployed with Broadcom 1Gb adapters
VMware ESXi 4.1
Windows 2008 Server
Things I have tried or am in the process of trying:
Ensure Windows Firewall isn’t causing issues (will disable)
Check Windows 2008 Power Management settings
Try running on a system that has Intel network adapters (we mostly use Broadcom)
Try different Condor versions (we have tried both 7.2.4 and 7.6.2)
Investigate Broadcom network adapter settings in VMware
Try enabling Wake On LAN functionality in VMware
Set VMware CPU reservation to a non-zero value (might be idling VM to 0% usage)
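While working through the list above, it may help to record exactly when each VM disappears from the pool, so a drop-off can be matched against whichever setting was changed. The following is only a rough sketch: it assumes you can get text with one machine name per line (for example from a condor_status call that prints only the Machine attribute; check the exact options against your Condor version), and the host names in the usage note are made up.

```python
def parse_machines(status_output):
    """Collect machine names from text that has one name per line."""
    return {line.strip().lower() for line in status_output.splitlines() if line.strip()}


def missing_machines(expected, status_output):
    """Return the expected machines that do not appear in the output, sorted."""
    seen = parse_machines(status_output)
    return sorted(name for name in expected if name.lower() not in seen)


def report(expected, status_output, timestamp):
    """Build one log line; 'timestamp' is any string the caller supplies."""
    missing = missing_machines(expected, status_output)
    state = "missing: " + ", ".join(missing) if missing else "all present"
    return f"{timestamp} {state}"
```

Calling report(["vm-a", "vm-b"], output, timestamp) every few minutes from a scheduled task and appending the result to a file would give a timeline of drop-offs (vm-a and vm-b are hypothetical names).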
My worry is that the VMware network driver is doing some ‘magic’: intercepting ping and other network requests rather than passing them to the VM, so that it can idle or suspend the VM.
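One way to test that theory would be to probe a TCP port on the guest from another machine, since a completed TCP connection has to be answered by the guest OS itself, unlike a ping that something in between might answer. This is a minimal sketch under assumptions: the host and port are whatever you choose to probe (3389 is the RDP default), and the probe traffic itself may be enough to keep the VM awake, which would also be a useful data point.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) completes within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or no route
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe_line(host, port, timestamp):
    """Build one log line; 'timestamp' is any string the caller supplies."""
    state = "up" if tcp_reachable(host, port) else "DOWN"
    return f"{timestamp} {host}:{port} {state}"
```

Running probe_line against one idle 2008 VM at a fixed interval, and leaving a second idle VM unprobed, would show whether the drop-off follows network silence or something else.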
Thanks for your time everyone!