
Re: [HTCondor-users] imbalance question



Hi Lee,

By default, HTCondor fills one machine before moving on to the next. If you would rather spread your jobs across the entire cluster, you can easily change this behavior; see the following page on our wiki:

    https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToFillPoolBreadthFirst
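To make the difference between the two policies concrete, here is a toy Python model. It is a sketch under stated assumptions, not HTCondor's negotiator: the `Machine` class, the `place` function, the host names, and both scoring lambdas are hypothetical stand-ins for a ranking expression. In HTCondor such preferences are set in the negotiator's configuration (for example via `NEGOTIATOR_PRE_JOB_RANK`), and the wiki page above gives the actual settings. A best-fit score that prefers the machine with the least free memory packs one host full before touching the next, which is one way jobs could pile onto the hosts with the least RAM.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    free_memory_mb: int                          # memory not yet handed out
    running: list = field(default_factory=list)  # jobs placed on this machine


def place(jobs, machines, job_memory_mb, score):
    """Greedily give each job to the eligible machine with the highest score.

    `score` is a hypothetical stand-in for a negotiator ranking expression;
    it is NOT HTCondor's actual matchmaking logic.
    """
    for job in jobs:
        eligible = [m for m in machines if m.free_memory_mb >= job_memory_mb]
        if not eligible:
            continue  # in this toy model the job simply stays idle
        best = max(eligible, key=score)  # ties go to the first machine listed
        best.free_memory_mb -= job_memory_mb
        best.running.append(job)
    return {m.name: len(m.running) for m in machines}


def fresh_pool():
    # Hypothetical pool: identical hosts except for the amount of RAM.
    return [Machine("small", 16_000), Machine("medium", 32_000), Machine("large", 64_000)]


# Depth-first (best fit): prefer the machine with the LEAST free memory,
# so one machine fills up before the next one is used.
depth_first = place(range(6), fresh_pool(), 4_000, score=lambda m: -m.free_memory_mb)

# Breadth-first: prefer the machine running the FEWEST jobs,
# so jobs spread across the pool.
breadth_first = place(range(6), fresh_pool(), 4_000, score=lambda m: -len(m.running))

print("depth-first:  ", depth_first)    # → {'small': 4, 'medium': 2, 'large': 0}
print("breadth-first:", breadth_first)  # → {'small': 2, 'medium': 2, 'large': 2}
```

Only the scoring function changes between the two runs; the matching loop is identical, which mirrors the idea that breadth-first filling is a configuration change rather than a different scheduler.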

...Tim

On 3/10/20 6:29 PM, Lee Damon wrote:
I suspect I'm missing something fundamental, but it's the end of the work day and my brain is done.

I have a 6-host cluster. The hosts are mostly the same: they're all VMs running the same OS, configured the same way (configuration management via Puppet*), and they all have the same NFS mount access to the data. The only real difference is how much RAM the hosts have.

Users are submitting jobs, and those jobs keep going to the two busiest nodes in the cluster instead of being spread around. I've just tested it myself and see the same behavior.

When I set requirements = (name of idle host), the job goes to the idle host with no problems. However, if no hostname requirement is set, the jobs keep going to the same busy hosts. Oddly, the busiest hosts are the ones with the least available RAM overall.

I was pretty sure HTCondor should be doing a better job of balancing the load. What am I missing here?

; condor_status
Name                                         OpSys      Arch   State     Activi

slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx   LINUX      X86_64 Unclaimed Idle  
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx   LINUX      X86_64 Unclaimed Idle  
slot1_1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Claimed   Busy  
slot1_2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Claimed   Busy  
slot1_3@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Claimed   Busy  
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx   LINUX      X86_64 Unclaimed Idle  
slot1_1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Claimed   Busy  
slot1_2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Claimed   Busy  
slot1_3@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Claimed   Busy  
slot1_5@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Claimed   Busy  
slot1_6@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Claimed   Busy  
slot1_7@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Claimed   Busy  
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxx    LINUX      X86_64 Unclaimed Idle  
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxx    LINUX      X86_64 Unclaimed Idle  
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxx    LINUX      X86_64 Unclaimed Idle  

               Machines Owner Claimed Unclaimed Matched Preempting  Drain

  X86_64/LINUX       15     0       9         6       0          0      0

         Total       15     0       9         6       0          0      0
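In the listing above, rows named `slot1_N@host` are dynamic slots carved out of the partitionable slot `slot1` on the same host, so counting Claimed rows per host shows where the jobs actually landed. The sketch below does that count; the `STATUS` text and host names are made up (the real names are redacted in the archive), and `claimed_per_host` is a hypothetical helper that assumes State is the fourth whitespace-separated column, as in this output.

```python
# Hypothetical excerpt in the same layout as the condor_status output above.
STATUS = """\
Name          OpSys Arch   State     Activity
slot1@hostA   LINUX X86_64 Unclaimed Idle
slot1@hostB   LINUX X86_64 Unclaimed Idle
slot1_1@hostB LINUX X86_64 Claimed   Busy
slot1_2@hostB LINUX X86_64 Claimed   Busy
slot1_3@hostB LINUX X86_64 Claimed   Busy
slot1@hostC   LINUX X86_64 Unclaimed Idle
"""


def claimed_per_host(text):
    """Count Claimed slots per host in condor_status-style text."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Slot rows start with "slotN@host" or "slotN_M@host"; skip headers,
        # blank lines and the summary table.
        if len(fields) < 4 or "@" not in fields[0]:
            continue
        _slot, host = fields[0].split("@", 1)
        counts.setdefault(host, 0)  # idle hosts still appear, with a count of 0
        if fields[3] == "Claimed":  # fourth column is State
            counts[host] += 1
    return counts


print(claimed_per_host(STATUS))  # → {'hostA': 0, 'hostB': 3, 'hostC': 0}
```

Applied to the full listing, the same count gives two hosts with claimed slots (3 and 6) and four with none, matching the 9 Claimed / 6 Unclaimed totals.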

; ssh chrusm0 uptime ; ssh chrusm1 uptime ; ssh chrulg0 uptime ; ssh omics0 uptime ; ssh omics1 uptime ; ssh omics2 uptime
 16:24:15 up 21 days, 40 min, 10 users,  load average: 12.07, 13.56, 15.00
 16:24:16 up 20 days, 23:34,  0 users,  load average: 7.20, 7.20, 7.15
 16:24:16 up 21 days, 40 min,  5 users,  load average: 0.00, 0.02, 0.11
 16:24:17 up 76 days,  4:58,  0 users,  load average: 0.02, 1.53, 2.91
 16:24:18 up 76 days,  4:57,  0 users,  load average: 0.00, 0.40, 1.14
 16:24:18 up 76 days,  4:55,  0 users,  load average: 0.00, 0.01, 0.10
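The `uptime` lines can be matched back to host names, assuming (as the `;`-separated command implies) that they come back in the order the hosts were listed. The sketch below copies the output verbatim and flags hosts whose 1-minute load average meets a threshold; `busy_hosts` and the default threshold of 1.0 are hypothetical choices for illustration.

```python
import re

# Host order taken from the `ssh ... uptime` command; each line below is the
# corresponding output, copied in the same order.
HOSTS = ["chrusm0", "chrusm1", "chrulg0", "omics0", "omics1", "omics2"]
UPTIME_OUTPUT = """\
 16:24:15 up 21 days, 40 min, 10 users,  load average: 12.07, 13.56, 15.00
 16:24:16 up 20 days, 23:34,  0 users,  load average: 7.20, 7.20, 7.15
 16:24:16 up 21 days, 40 min,  5 users,  load average: 0.00, 0.02, 0.11
 16:24:17 up 76 days,  4:58,  0 users,  load average: 0.02, 1.53, 2.91
 16:24:18 up 76 days,  4:57,  0 users,  load average: 0.00, 0.40, 1.14
 16:24:18 up 76 days,  4:55,  0 users,  load average: 0.00, 0.01, 0.10
"""

# uptime prints the 1-, 5- and 15-minute load averages after "load average:".
LOAD_RE = re.compile(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)")


def busy_hosts(hosts, output, threshold=1.0):
    """Return hosts whose 1-minute load average is at or above `threshold`."""
    busy = []
    for host, line in zip(hosts, output.splitlines()):
        match = LOAD_RE.search(line)
        if match and float(match.group(1)) >= threshold:
            busy.append(host)
    return busy


print(busy_hosts(HOSTS, UPTIME_OUTPUT))  # → ['chrusm0', 'chrusm1']
```

Under that ordering assumption, the two loaded hosts line up with the two hosts that hold claimed slots in the condor_status listing.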

thanks,
nomad

* - this is a different lab than the one I emailed about last week. Different hosts and configuration management system.


-- 
Tim Theisen
Release Manager
HTCondor & Open Science Grid
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin - Madison
4261 Computer Sciences and Statistics
1210 W Dayton St
Madison, WI 53706-1685
+1 608 265 5736