
[HTCondor-users] Fair-share limits reached while whole machines are available and jobs are idle



Hello,

I recently noticed something strange with our Condor pool. There are many idle jobs in the queue, yet nearly as many slots are available. There are even whole machines with no jobs running, and still none of the idle jobs are allocated to these empty slots.

After digging through the negotiator logs and ClassAds, it seems that many jobs are being rejected because of fair-share limits. Rejections far outnumber matches, and as far as I can tell they are all due to fair-share limits. Judging by the LastNegotiationCycleSubmittersShareLimit* ClassAd attribute, all the ones being rejected appear in the list it provides.

These jobs are all submitted from the default <none> group, which has the surplus flag set. The negotiator log shows "Group <none> is using its quota 2629 - halting negotiation".
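
For anyone wanting to check their own pool for the same symptom, here is a minimal sketch that tallies how often that message appears in a negotiator log. It relies only on the message text quoted above; the function name, the sample lines, and the idea that any timestamp prefix can simply be ignored are assumptions for illustration, not something taken from HTCondor itself.

```python
import re
from collections import Counter

# Pattern built only from the message quoted above. Anything before
# "Group" on the line (for example a timestamp) is ignored by search().
HALT_RE = re.compile(
    r"Group (\S+) is using its quota (\d+(?:\.\d+)?) - halting negotiation"
)


def count_quota_halts(lines):
    """Count 'halting negotiation' messages per (group, quota) pair.

    `lines` is any iterable of log lines, such as an open file object.
    This is a hypothetical helper, not part of any HTCondor tool.
    """
    halts = Counter()
    for line in lines:
        match = HALT_RE.search(line)
        if match:
            group = match.group(1)
            quota = float(match.group(2))
            halts[(group, quota)] += 1
    return halts


if __name__ == "__main__":
    # Made-up sample lines; a real run would read the negotiator log file.
    sample = [
        "Group <none> is using its quota 2629 - halting negotiation",
        "an unrelated log line",
    ]
    for (group, quota), count in count_quota_halts(sample).items():
        print(group, quota, count)
```

A count that stays high across many negotiation cycles while slots sit unclaimed would match the behaviour described here.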

Could something be wrong with user priorities and quotas that prevents slot matches? I also wonder whether it's related to a bug fixed in 8.7.10 (https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=6714) (https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=6750)

Thanks for any help or thoughts,

Alec