
Re: [Condor-users] LoadAvg calculation/bug?



I'm not seeing the same results on my systems.  I found a four-core node
with three slots claimed and busy and some sort of non-Condor load on
it:

carrion $ condor_status -l nauro-10 | grep -i loadavg | grep -vi start |
grep -vi busy
CondorLoadAvg = 1.000000
LoadAvg = 1.000000
TotalLoadAvg = 4.000000
TotalCondorLoadAvg = 2.990000
CondorLoadAvg = 0.000000
LoadAvg = 1.000000
TotalLoadAvg = 4.000000
TotalCondorLoadAvg = 2.990000
CondorLoadAvg = 1.000000
LoadAvg = 1.000000
TotalLoadAvg = 4.000000
TotalCondorLoadAvg = 2.990000
CondorLoadAvg = 1.000000
LoadAvg = 1.000000
TotalLoadAvg = 4.000000
TotalCondorLoadAvg = 2.990000


This machine is running 32bit Ubuntu with the RHEL5 Condor 7.4.2
tarball.
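
To make the arithmetic in that output explicit, here is a minimal Python sketch that reads the attribute lines pasted above and subtracts TotalCondorLoadAvg from TotalLoadAvg to estimate the load Condor does not attribute to its own jobs. The helper names (parse_slot_ads, non_condor_load) are made up for illustration and are not part of Condor. It assumes a new slot ad begins whenever an attribute name repeats, and that the Total* values are identical in every slot's ad, as they are in this sample.

```python
# Sample text copied from the condor_status output above.
SAMPLE = """\
CondorLoadAvg = 1.000000
LoadAvg = 1.000000
TotalLoadAvg = 4.000000
TotalCondorLoadAvg = 2.990000
CondorLoadAvg = 0.000000
LoadAvg = 1.000000
TotalLoadAvg = 4.000000
TotalCondorLoadAvg = 2.990000
CondorLoadAvg = 1.000000
LoadAvg = 1.000000
TotalLoadAvg = 4.000000
TotalCondorLoadAvg = 2.990000
CondorLoadAvg = 1.000000
LoadAvg = 1.000000
TotalLoadAvg = 4.000000
TotalCondorLoadAvg = 2.990000
"""


def parse_slot_ads(text):
    """Group 'Name = value' lines into one dict per slot.

    Assumption: a repeated attribute name marks the start of the next slot.
    """
    slots, current = [], {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        if name in current:
            slots.append(current)
            current = {}
        current[name] = float(value)
    if current:
        slots.append(current)
    return slots


def non_condor_load(slots):
    """Machine-wide load minus the load Condor attributes to its jobs."""
    first = slots[0]  # assumption: Total* values match across slots
    return first["TotalLoadAvg"] - first["TotalCondorLoadAvg"]


slots = parse_slot_ads(SAMPLE)
condor_sum = sum(slot["CondorLoadAvg"] for slot in slots)
print("slots:", len(slots))
print("sum of per-slot CondorLoadAvg:", condor_sum)
print("non-Condor load:", round(non_condor_load(slots), 2))
```

On this sample the per-slot CondorLoadAvg values sum to roughly TotalCondorLoadAvg, and the remaining load of about 1 matches the non-Condor activity on the node.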

- dave


On Thu, 2010-06-10 at 16:05 -0700, kristian kvilekval wrote:
> Thanks for the pointer, but TotalLoadAvg still seems to be based on
> Condor jobs only.  What we are looking to do is not start jobs on nodes
> that have any recent load or jobs on them.  As not all people will use
> Condor to start their jobs, we need to use the actual kernel load
> average.  Note below that Condor is reporting no load for b0003, but
> really there is a load average of 2.
> 
> 
> $ condor_status -l b0003| fgrep TotalLoad
> TotalLoadAvg = 0.0
> TotalLoadAvg = 0.0
> TotalLoadAvg = 0.0
> TotalLoadAvg = 0.0
> TotalLoadAvg = 0.0
> (bqenv)bqphytomorph@claw$ ssh b0003
> bqphytomorph@b0003$ w
>  23:02:40 up 78 days, 43 min,  2 users,  load average: 2.17, 1.84, 1.76
> USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
> diana    pts/1    nail00           Wed22    1:18m  1:52m  1:52m /cluster/home/matlab_2009a_x86_64/bin/glnxa64/
> bqphytom pts/3    nail99           23:02    0.00s  0.00s  0.00s w
> 
> 
> 
> On Thu, 2010-06-10 at 17:23 -0500, David Kotz wrote:
> > Rather than LoadAvg, I think you should use target.TotalLoadAvg.
> > LoadAvg refers to the load average of a single slot on a multicore
> > machine, and without specifying target.TotalLoadAvg, the expression
> > might (I'm guessing) actually look at the load average of a slot on the
> > submit machine.  Condor has some disambiguation built in, but I like to
> > specify, just in case.
> > 
> > - dave
> > 
> > 
> > 
> > On Thu, 2010-06-10 at 14:46 -0700, kgk wrote:
> > > Condor: 7.5.2
> > > Debian Linux distribution AMD64
> > > Nodes: 64
> > > 
> > > We have a shared cluster where users may log in and start jobs
> > > manually.  We would prefer that nodes/slots with a high local load
> > > average be avoided for Condor jobs.
> > > We have added Rank = (100 - LoadAvg) to our standard submit scripts.
> > > However, using condor_status I see that many nodes (already being used
> > > by others) show a LoadAvg of 0.0, meaning they are scheduled with
> > > equal rank.
> > > 
> > > In some Condor documents it seems that LoadAvg is determined by the
> > > submitted Condor jobs, and in others it seems to be the true machine
> > > load reported by the OS.
> > > 
> > > 1. Is LoadAvg supposed to be the kernel-reported load average?
> > > 2. If so, then I believe there is a bug.
> > > 3. If not, then how should I select for machines with no or very low
> > > load?
> > > 
> > > Thanks,
> > > Kris
> > > _______________________________________________
> > > Condor-users mailing list
> > > To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> > > subject: Unsubscribe
> > > You can also unsubscribe by visiting
> > > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> > > 
> > > The archives can be found at:
> > > https://lists.cs.wisc.edu/archive/condor-users/
> > 
> > 
> 