Re: [Condor-users] busy&load calculation problems



On 3/6/06, Andrey Kaliazin <A.Kaliazin@xxxxxxxxxxx> wrote:
> Dear all,
>
> We have various Windows boxes running Condor 6.7.10-16, and most of
> them, but not all, have this repeating error in their StartLog file:
>
> ...
> 3/6 17:32:30 loadavg thread died, restarting. (exit code=2)
> 3/6 17:32:35 no loadavg samples this minute, maybe thread died???
> ...
>
>
> I suppose that some misconfiguration of the WMI subsystem leads to those
> errors, which in turn leads Condor to the wrong conclusion:
> workstations appear to be idle in terms of CPU load, which is not good.
>
> $ condor_status -l |grep 'oad\|usy'
> ...
> CpuBusy = ((LoadAvg - CondorLoadAvg) >= 0.500000)
> CondorLoadAvg = 0.000000
> LoadAvg = 0.000000
> TotalLoadAvg = 0.000000
> TotalCondorLoadAvg = 0.000000
> CpuBusyTime = 0
> CpuIsBusy = FALSE
> Activity = "Busy"
> ...
>
> Who knows how to fix it? Any help is appreciated.
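
For reference, the CpuBusy expression quoted above can be checked by hand. The following is a minimal Python sketch, not Condor code: the function name cpu_busy and the second pair of sample values are made up for illustration, and only the 0.5 threshold and the zero readings come from the condor_status output. With both load averages reported as 0.0, the expression can only evaluate to false, which matches the CpuIsBusy = FALSE shown.

```python
def cpu_busy(load_avg, condor_load_avg, threshold=0.5):
    """Mirror of the quoted expression: (LoadAvg - CondorLoadAvg) >= 0.5.

    The function name and the threshold parameter are illustrative only.
    """
    return (load_avg - condor_load_avg) >= threshold


# Values copied from the condor_status output above: no load samples at all.
print(cpu_busy(0.0, 0.0))

# Hypothetical values for a machine whose load sampling works and whose
# owner is using the CPU (non-Condor load of roughly 0.8).
print(cpu_busy(0.9, 0.1))
```

So as long as the loadavg thread delivers no samples, the non-Condor load is always computed as zero and the machine never looks busy to Condor, whatever its real CPU usage is.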

Are these 64-bit Windows machines?

The WMI query stuff has been broken in the 6.6.x series for a while
(at least on AMD-based 64-bit Windows machines). I have no idea whether
it's fixed in the 6.7.x series.

Note that older 6.6.x releases (sorry, I forget the specific release)
suffered from a serious memory and handle leak in these
situations (sufficient to kill the Condor subsystem after a few
weeks). Again, I would guess that the latest 6.7 includes the fix for
this.

Sorry I can't be more help than that.

Matt