
Re: [Condor-users] Groups with weighted slots in 7.6.9



Hi William,

I agree that it looks like #2958, and that fix went in at 7.6.7. 

Can you describe any configuration related to GROUP_AUTOREGROUP[_*]
and/or GROUP_ACCEPT_SURPLUS[_*]?
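
To make the arithmetic in the report quoted below concrete, here is a minimal sketch of the behavior as described. It is not the negotiator's actual code: the function name and the weighted_demand flag are hypothetical, and the only assumption is the one the report itself suggests, namely that the starting "pieLeft" equals the number of idle jobs rather than the number of idle jobs times the slot weight.

```python
def matches_before_quota_exceeded(num_jobs, cpus_per_job, weighted_demand):
    """Count how many jobs match before the remaining pie runs out.

    Hypothetical model of the reported symptom, not HTCondor code.
    weighted_demand=False: pie starts at the idle job count (as observed).
    weighted_demand=True:  pie starts at jobs * weight (as expected).
    """
    if weighted_demand:
        pie_left = float(num_jobs * cpus_per_job)
    else:
        pie_left = float(num_jobs)

    matched = 0
    for _ in range(num_jobs):
        # Each match costs the slot weight (here: the number of cores).
        if pie_left < cpus_per_job:
            break  # corresponds to the "group quota exceeded" rejection
        pie_left -= cpus_per_job
        matched += 1
    return matched, pie_left


# 20 idle jobs, 8 cores each, as in the report.
buggy = matches_before_quota_exceeded(20, 8, weighted_demand=False)
expected = matches_before_quota_exceeded(20, 8, weighted_demand=True)
print(buggy)     # (2, 4.0)
print(expected)  # (20, 0.0)
```

With the unweighted starting value the model stops after floor(20/8) = 2 matches with 4.0 left over, which is what the logs show; with the weighted value all 20 jobs fit.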


On Tue, 2012-09-11 at 16:37 -0400, William Strecker-Kellogg wrote:
> Hi all,
> 
> I'm having an interesting problem with group quotas and weighted
> slots. While experimenting I came across what looks like a bug similar
> to the one that ticket #2958 was supposed to address.
> 
> The setup involves a large number of machines with three 8-core slots
> each (about 2000 cores total). When using group quotas I see the
> following behavior:
> 
> First, I submit 20 jobs matching only those slots (no other contention,
> plenty of free slots), each with "request_cpus = 8" and belonging to an
> AccountingGroup with a quota of >2000. What I see (grep for
> "group_atlas.prod.mp" in the attached logs for the full story) is that
> the first two jobs match, then the rest are rejected with "group quota
> exceeded" warnings. It appears that the groupQuota it sees is 20 (the
> number of idle jobs): after the first match 8 are used, after the
> second 16 are used, and the next fails because "pieLeft" is 4.0. It is
> as if the weights are applied only after a match and are not accounted
> for in the matchmaking limit (pieLeft is 20.0 at the start; should it
> be 160.0?).
> 
> It is reproducible with numbers other than 20 jobs and 8 cores: with N
> k-core jobs in the queue, up to floor(N/k) jobs will match before the
> quota is reported as exceeded.
> 
> The workaround I found is to set "SlotWeight=1" on the 8-core slots,
> which makes things work great except for the accounting (which doesn't
> matter for what we are doing right now).
> 
> We may be going to 7.8 soon, so it may not be an issue if it is fixed
> there, but in case it isn't I figured I'd report my findings anyway.
> 
> Thanks,
> Will Strecker-Kellogg
> RACF/BNL