
Re: [HTCondor-users] Not enough jobs to fill batch (submitter limit exceeded)



On 5/30/2018 5:57 PM, Greg Thain wrote:
> On 05/30/2018 04:15 AM, Petr Vokac wrote:
>> I'm trying to figure out why our negotiator is not able to schedule
>> sufficient number of jobs and "Rejected" messages in NegotiatorLog
>> file tells me "submitter limit exceeded". Could this be caused by
>> worker node "SLOT_WEIGHT = Cpus * ScalingFactorHEPSPEC06" which
>> averages to ~ 10 and
>> negotiator "NEGOTIATOR_USE_SLOT_WEIGHTS = True"? I was trying to look
>> what comes in Matchmaker::SubmitterLimitPermits and numbers seemed to
>> me a bit suspicious with respect to average match_cost ~ 10 (slot
>> weight).
>
> Note that when the negotiator is computing submitter limits and quota
> it is before it has matched the request to any particular machine.
> Therefore, it doesn't have a machine ad to find a
> ScalingFactorHEPSPEC06 from, and this is probably not being evaluated
> to the result you would like.

OK, now the negotiator's behavior makes a bit more sense to me.

I don't really need scaled CPU power during negotiation. By scaling
SLOT_WEIGHT by CPU power I was trying to get a fairer fairshare: our
cluster is not homogeneous, and I would like the fairshare calculation
to account for more than just the number of Cpus used.

Do you think that "SLOT_WEIGHT = Cpus *
ifThenElse(isUndefined(ScalingFactorHEPSPEC06), 1,
ScalingFactorHEPSPEC06)" could behave better during negotiation and
still preserve a more accurate fairshare calculation?
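
To make the intent of that expression concrete, here is a small Python
sketch of the arithmetic I have in mind. It is only a toy model, not
HTCondor code: the function name slot_weight and the sample values
(8 Cpus, factor 10.0) are made up for illustration, and None stands in
for an undefined ScalingFactorHEPSPEC06, i.e. the case where no machine
ad is available.

```python
from typing import Optional


def slot_weight(cpus: int, scaling_factor: Optional[float]) -> float:
    """Toy model of: Cpus * ifThenElse(isUndefined(F), 1, F).

    scaling_factor=None plays the role of an undefined
    ScalingFactorHEPSPEC06 attribute (hypothetical stand-in).
    """
    # Fall back to a neutral factor of 1 when the attribute is undefined,
    # so the weight degrades to a plain Cpus count instead of undefined.
    factor = 1.0 if scaling_factor is None else scaling_factor
    return cpus * factor


# Attribute defined (illustrative values only): weight scales with CPU power.
weight_with_ad = slot_weight(8, 10.0)

# Attribute undefined: weight falls back to the number of Cpus.
weight_without_ad = slot_weight(8, None)

print(weight_with_ad, weight_without_ad)
```

Whether the negotiator would actually take the fallback branch when it
computes submitter limits is exactly what I am unsure about; the sketch
only shows what the expression is meant to evaluate to in each case.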

Petr