We have a Condor setup that has evolved over time from 6.4.x via 6.5.x to
6.6.x (currently 6.6.3). Ever since the move to 6.6, condor_collector on the
master node (which is also a view host) has been a total resource hog, using
100% of the CPU.
Checking the collector log with debugging output enabled, I see a constant
stream of messages like these:
10/7 11:16:42 Got INVALIDATE_STARTD_ADS
10/7 11:16:42 **** Removing stale ad: "< x.x.x.edu , 192.168.0.22 >"
10/7 11:16:42 (Invalidated 1 ads)
10/7 11:16:42 (Invalidated 0 ads)
10/7 11:16:42 StartdAd : Updating ... "< x.x.x.edu , 192.168.0.22>"
10/7 11:16:42 (Could not get startd's private ad)
About 330 of these messages are logged every second. This was not an issue
under the earlier Condor versions. This particular installation has not been
used for some time because of this problem, but it needs to become usable
again. I would appreciate any help or ideas.
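
In case it helps anyone reproduce the numbers, here is a rough sketch of how
a message rate like the one above can be measured from the log. It is not part
of Condor; the function name summarize and the SAMPLE list are made up for
illustration, and it assumes every log line has the "M/D HH:MM:SS message"
layout shown in the excerpt. It counts lines per timestamp, groups messages by
kind (quoted ad names and numbers are masked), and counts how often each
quoted ad name appears, which should show whether a single startd is behind
the flood.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt: "M/D HH:MM:SS message text"
LINE_RE = re.compile(r"^(\d{1,2}/\d{1,2} \d{2}:\d{2}:\d{2}) (.*)$")
QUOTED_RE = re.compile(r'"([^"]*)"')


def summarize(lines):
    """Return (lines per second, lines per message kind, lines per ad name)."""
    per_second = Counter()
    per_kind = Counter()
    per_ad = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that does not look like a log line
        stamp, message = match.groups()
        per_second[stamp] += 1
        # Count each quoted ad name, ignoring whitespace differences.
        for name in QUOTED_RE.findall(message):
            per_ad["".join(name.split())] += 1
        # Mask ad names and numbers so similar messages group together.
        kind = QUOTED_RE.sub('"<ad>"', message)
        kind = re.sub(r"\d+", "N", kind)
        per_kind[kind] += 1
    return per_second, per_kind, per_ad


# The six lines from the excerpt above, used as a small self-check.
SAMPLE = [
    '10/7 11:16:42 Got INVALIDATE_STARTD_ADS',
    '10/7 11:16:42 **** Removing stale ad: "< x.x.x.edu , 192.168.0.22 >"',
    '10/7 11:16:42 (Invalidated 1 ads)',
    '10/7 11:16:42 (Invalidated 0 ads)',
    '10/7 11:16:42 StartdAd : Updating ... "< x.x.x.edu , 192.168.0.22>"',
    "10/7 11:16:42 (Could not get startd's private ad)",
]

if __name__ == "__main__":
    per_second, per_kind, per_ad = summarize(SAMPLE)
    for stamp, count in per_second.most_common(5):
        print(stamp, count)
    for kind, count in per_kind.most_common():
        print(count, kind)
    for name, count in per_ad.most_common(5):
        print(count, name)
```

To run it on a real log, pass an open file object to summarize instead of
SAMPLE; where the collector log lives depends on the local configuration.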
load average: 1.22, 1.06, 0.75