
Re: [Condor-users] Could not get startd's private ad



Constantinos Evangelinos wrote:

We have a Condor setup that's evolved over time from 6.4.x via 6.5.x to 6.6.x (6.6.3 currently). Ever since the change to 6.6 happened condor_collector on the master node (which is also a view host) has become a total resource hog, grabbing the CPU 100% of the time.

Checking the collector logs with the debugging info, I see a constant stream of messages of this type:

10/7 11:16:42 Got INVALIDATE_STARTD_ADS
10/7 11:16:42           **** Removing stale ad: "< x.x.x.edu , 192.168.0.22 >"
10/7 11:16:42 (Invalidated 1 ads)
10/7 11:16:42 (Invalidated 0 ads)
10/7 11:16:42 StartdAd     : Updating ... "< x.x.x.edu , 192.168.0.22>"
10/7 11:16:42   (Could not get startd's private ad)

About 330 of these actions are recorded every second. This was not an issue under earlier Condor installations. This particular installation has not been used for some time because of this problem, but it needs to become usable again. I would appreciate any help or ideas.
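A message rate like the one quoted above can be checked by tallying the collector log lines per one-second timestamp. This is a minimal sketch that assumes each log line starts with an `M/D HH:MM:SS` timestamp, as in the excerpt above. The function name `messages_per_second` is hypothetical and not part of Condor.

```python
import re
from collections import Counter

# Assumed log-line prefix "M/D HH:MM:SS", e.g. "10/7 11:16:42", as in the excerpt
TIMESTAMP = re.compile(r"^(\d{1,2}/\d{1,2} \d{2}:\d{2}:\d{2})\s")

def messages_per_second(lines, needle="Could not get startd's private ad"):
    """Count lines containing `needle`, grouped by their one-second timestamp."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.match(line)
        if match and needle in line:
            counts[match.group(1)] += 1
    return counts

if __name__ == "__main__":
    sample = [
        "10/7 11:16:42 Got INVALIDATE_STARTD_ADS",
        '10/7 11:16:42 StartdAd     : Updating ... "< x.x.x.edu , 192.168.0.22>"',
        "10/7 11:16:42   (Could not get startd's private ad)",
        "10/7 11:16:43   (Could not get startd's private ad)",
    ]
    print(messages_per_second(sample))
```

In practice the lines would come from the collector's own log file opened with `open(...)`. Its location is site-specific and is not given in the thread.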


The command 'uptime', for example, gives you three load averages, one for each time span over which the mean is computed:

-> load average: 1.22, 1.06, 0.75

From 'man uptime':
      ... average number of jobs in the run queue over the last 1, 5 and
      15 minutes
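As a small illustration of these three figures, the sketch below extracts them from an `uptime`-style line such as the one quoted above. The helper name `parse_load_averages` is hypothetical. On Unix, Python's standard `os.getloadavg()` returns the same 1-, 5- and 15-minute values directly, with no text parsing needed.

```python
import os
import re

def parse_load_averages(text):
    """Extract the 1-, 5- and 15-minute load averages from uptime-style output."""
    # Accepts "load average: a, b, c" and the comma-less "load averages: a b c"
    match = re.search(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", text)
    if match is None:
        raise ValueError("no load average found in: %r" % text)
    return tuple(float(value) for value in match.groups())

if __name__ == "__main__":
    print(parse_load_averages("load average: 1.22, 1.06, 0.75"))
    # The same three numbers read directly from the OS (not available on Windows)
    if hasattr(os, "getloadavg"):
        print(os.getloadavg())
```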

For a well-equilibrated hyperthreaded dual-Xeon (one that has been working at full capacity for a long time), it gives roughly 4, 4, 4: just one job in the queue for each virtual CPU.

Now, I do not know whether Condor's view of the load average is normalized to the 0-1 range, to account for the differences in what counts as a 'normal healthy state' on a mono-, bi- or quad-processor machine. Just curious ;-)
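Whether Condor normalizes its load figure this way is exactly the open question here, and this sketch does not answer it. It only illustrates what dividing by the number of logical CPUs would give. For example, on the hyperthreaded dual-Xeon above (assumed to report 4 logical CPUs), a steady 4, 4, 4 would become 1.0, 1.0, 1.0. The function name `per_cpu_load` is hypothetical.

```python
import os

def per_cpu_load(load_averages, ncpus):
    """Divide each load average by the number of logical CPUs.

    A result near 1.0 means roughly one runnable job per logical CPU,
    whether the machine has one, two or four processors.
    """
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    return tuple(load / ncpus for load in load_averages)

if __name__ == "__main__":
    # Hyperthreaded dual-Xeon: 2 physical CPUs x 2 hardware threads = 4 logical CPUs
    print(per_cpu_load((4.0, 4.0, 4.0), 4))
    # Current machine, if the platform exposes both values
    if hasattr(os, "getloadavg") and os.cpu_count():
        print(per_cpu_load(os.getloadavg(), os.cpu_count()))
```

Dividing by logical rather than physical CPUs is itself a choice: with hyperthreading, a per-CPU value of 1.0 does not mean twice the throughput of a non-hyperthreaded machine at the same value.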

Cheers,

Alain

--
------------------------------------------------------------
Dr Alain EMPAIN <alain.empain@xxxxxxxxx> <alain@xxxxxxxxxx>
Bioinformatics, Molecular Genetics, Fac. Med. Vet., University of Liège, Belgium
Bd de Colonster, B43 B-4000 Liège (Sart-Tilman)
WORK: +32 4 366 3821 FAX: +32 4 366 4122
HOME: rue des Martyrs,7 B- 4550 Nandrin +32 85 51 23 41 GSM: +32 497 70 17 64
--------------------------------------------------------------------------------
"I worry about my child and the Internet all the time, even though she's
too young to have logged on yet. Here's what I worry about. I worry that
10 or 15 years from now, she will come to me and say 'Daddy, where were
you when they took freedom of the press away from the Internet?'" --Mike Godwin, Electronic Frontier Foundation
--------------------------------------------------------------------------------