
Re: [Condor-users] LoadAvg calculation/bug?



Thanks for the pointer, but TotalLoadAvg still seems to be based on
Condor jobs only.  What we are looking to do is avoid starting jobs on
nodes that have any recent load or jobs on them.  Since not everyone
uses Condor to start their jobs, we need to use the actual kernel load
average.  Note below that Condor reports no load for b0003, even though
the kernel load average is about 2.


$ condor_status -l b0003| fgrep TotalLoad
TotalLoadAvg = 0.0
TotalLoadAvg = 0.0
TotalLoadAvg = 0.0
TotalLoadAvg = 0.0
TotalLoadAvg = 0.0
(bqenv)bqphytomorph@claw$ ssh b0003
bqphytomorph@b0003$ w
 23:02:40 up 78 days, 43 min,  2 users,  load average: 2.17, 1.84, 1.76
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
diana    pts/1    nail00           Wed22    1:18m  1:52m  1:52m /cluster/home/matlab_2009a_x86_64/bin/glnxa64/
bqphytom pts/3    nail99           23:02    0.00s  0.00s  0.00s w
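
To make the mismatch above easier to check across many nodes, here is a
minimal sketch in Python that parses the two outputs and flags a node where
the kernel load is well above what Condor advertises.  The function names
and the 0.5 tolerance are my own hypothetical choices, not anything provided
by Condor, and the parsing assumes the output formats shown above.

```python
import re


def parse_uptime_load(line):
    """Return the (1, 5, 15)-minute load averages from a `w`/`uptime` header line."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", line
    )
    if match is None:
        raise ValueError("no load average found in: %r" % line)
    return tuple(float(value) for value in match.groups())


def parse_total_load_avg(classad_text):
    """Return every TotalLoadAvg value (one per slot) from `condor_status -l` output."""
    values = re.findall(
        r"^TotalLoadAvg\s*=\s*([\d.]+)\s*$", classad_text, re.MULTILINE
    )
    return [float(value) for value in values]


def condor_underreports(classad_text, uptime_line, tolerance=0.5):
    """True if the kernel 1-minute load exceeds Condor's figure by more than `tolerance`.

    `tolerance` is an arbitrary, hypothetical threshold chosen for illustration.
    """
    kernel_1min = parse_uptime_load(uptime_line)[0]
    condor_max = max(parse_total_load_avg(classad_text), default=0.0)
    return kernel_1min - condor_max > tolerance
```

Fed the five "TotalLoadAvg = 0.0" lines and the `w` header from b0003, the
last function returns True.  On the node itself, Python's os.getloadavg()
returns the same three kernel figures without parsing `w`.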



On Thu, 2010-06-10 at 17:23 -0500, David Kotz wrote:
> Rather than LoadAvg, I think you should use target.TotalLoadAvg.
> LoadAvg refers to the load average of a single slot on a multicore
> machine, and without specifying target.TotalLoadAvg, the expression
> might (I'm guessing) actually look at the load average of a slot on the
> submit machine.  Condor has some disambiguation built in, but I like to
> specify, just in case.
> 
> - dave
> 
> 
> 
> On Thu, 2010-06-10 at 14:46 -0700, kgk wrote:
> > Condor: 7.5.2
> > Debian Linux distribution AMD64
> > Nodes: 64
> > 
> > We have a shared cluster where users may log in and start jobs manually.
> > We would prefer
> > that nodes/slots with a high local load average be avoided for Condor
> > jobs.
> > We have added Rank = (100 - LoadAvg) to our standard submit scripts.
> > However, using condor_status I see that many nodes (already being used by
> > others)
> > show a LoadAvg of 0.0, meaning they are scheduled with equal rank.
> > 
> > In some condor documents it seems that LoadAvg is determined by the
> > submitted condor jobs and in others it seems to be the true machine
> > load reported by the OS.
> > 
> > 1. Is LoadAvg supposed to be the kernel reported load average?
> > 2. If so, then I believe there is a bug
> > 3. If not, then how should I select for machines with no or very low
> > load?
> > 
> > Thanks,
> > Kris
> > _______________________________________________
> > Condor-users mailing list
> > To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> > subject: Unsubscribe
> > You can also unsubscribe by visiting
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> > 
> > The archives can be found at:
> > https://lists.cs.wisc.edu/archive/condor-users/
> 
> 