Re: [Condor-users] group <none> appeared after upgrade from 7.4.3 to 7.6.2



On Tue, 2011-10-18 at 15:41 -0500, Joe Boyd wrote:

> In the attached log, group group_ntp has a quota of 843.433 with a usage 
> of 679.  Group group_highprio is also under quota.  When we were running 
> 7.4, I would see that free slots would go to those groups first, and 
> then any slots left over would get used by the jobs that weren't 
> submitted with a group that had a quota.

Taking group_ntp as one example, the negotiator log indicates that it
was unable to match more than 679 jobs to slots.  Running
"condor_q -better-analyze <job-id>" on one of the idle jobs may shed
some light on why.

As Dan suggested, if your problem is the "overlapping effective pool
problem"(*), then something like this should give you some improvement:

GROUP_QUOTA_ROUND_ROBIN_RATE = 5
# or some other smallish number >= 1

(*) "overlapping effective pool problem" is an awkward term for what
happens when two accounting groups compete for the same subset of
slots.  For example, if the jobs in group_a and group_b could each only
match the same small subset of slots, then the first group to negotiate
could claim all of those slots and the second group would be starved.

Using GROUP_QUOTA_ROUND_ROBIN_RATE can be expensive, since a smaller
value means more negotiation passes per cycle, so you should seek the
largest number that works for you.  It becomes a trade-off between
optimal group loading and negotiation expense.
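
To make the starvation and the trade-off concrete, here is a toy
Python model. It is only a sketch and not the negotiator's actual
algorithm: the function name negotiate, the "rate" argument, and the
assumption that both groups can match only the same shared slots are
all made up for illustration. Each pass over the groups hands every
group at most "rate" slots, and the number of passes stands in for
negotiation expense.

```python
def negotiate(idle_jobs, shared_slots, rate=None):
    """Toy round-robin hand-out of slots that every group can match.

    idle_jobs    -- dict of group name -> idle job count, in negotiation order
    shared_slots -- number of slots all groups are competing for
    rate         -- max slots per group per pass (None = unlimited)
    Returns (matched, passes).
    """
    matched = {name: 0 for name in idle_jobs}
    remaining = dict(idle_jobs)
    free = shared_slots
    passes = 0
    while free > 0 and any(remaining.values()):
        passes += 1
        for name in idle_jobs:
            # A group takes what it wants, capped by the rate and by
            # whatever is still free in this pass.
            cap = remaining[name] if rate is None else min(remaining[name], rate)
            take = min(cap, free)
            matched[name] += take
            remaining[name] -= take
            free -= take
    return matched, passes


jobs = {"group_a": 10, "group_b": 10}
for rate in (None, 5, 1):
    print(rate, negotiate(jobs, shared_slots=10, rate=rate))
```

In this model, with no rate limit group_a takes all 10 slots in one
pass and group_b gets nothing. With a rate of 5 the slots split evenly
in a single pass, while a rate of 1 gives the same split but needs
five passes, which is the extra expense mentioned above.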