
Re: [HTCondor-users] manipulate ranking/priority of very-short-jobs-users



Hi,

Fair share can indeed be used for this kind of thing, *when the system is full*.  It works like this (in some cases) on our grid cluster.

Our interactive cluster is rarely completely full.  So reducing start priority does not help in this case - if my start priority is rock bottom, but there is nobody else waiting ... you get the picture.  You need something that says "hang on, the stop/start rate for this user is absurd - throttle back new job starts" (regardless of how full the cluster is or whether other users are waiting).
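
To make the idea concrete, here is a rough sketch of such a per-user start throttle as a sliding-window counter. This is not an HTCondor feature or API; the class name StartRateThrottle, its parameters, and the limit of 3 starts per 60 seconds are all made up for illustration. The point is only that the decision depends on the user's own recent start rate, not on how full the cluster is.

```python
from collections import defaultdict, deque


class StartRateThrottle:
    """Per-user sliding-window limit on job starts (hypothetical, illustrative only)."""

    def __init__(self, max_starts, window_seconds):
        self.max_starts = max_starts          # allowed starts per window, per user
        self.window_seconds = window_seconds  # length of the sliding window
        self._starts = defaultdict(deque)     # user -> timestamps of recent starts

    def _expire(self, user, now):
        # Drop recorded starts that have fallen out of the window.
        recent = self._starts[user]
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        return recent

    def may_start(self, user, now):
        """Return True and record the start if the user is under the limit."""
        recent = self._expire(user, now)
        if len(recent) >= self.max_starts:
            return False  # this user's start rate is too high: hold the job back
        recent.append(now)
        return True


# Hypothetical limit: at most 3 job starts per user in any 60-second window.
throttle = StartRateThrottle(max_starts=3, window_seconds=60)
for t in (0, 1, 2, 3):
    print("alice", t, throttle.may_start("alice", t))
print("bob", 3, throttle.may_start("bob", 3))
```

Only the offending user is held back; everyone else starts jobs as usual, which is the difference from a cluster-wide start throttle.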

JT


> On 17 Aug 2023, at 17:29, Luehring, Frederick C <luehring@xxxxxxxxxxx> wrote:
> 
> Hey Y'all,
> 
> Is there a built-in method for condor to apply fair-share scheduling:
> 
> https://en.wikipedia.org/wiki/Fair-share_scheduling
> 
> The ATLAS Panda implementation does something along the lines of a fair-share 
> algorithm using numbers of jobs submitted instead of CPU. When a user who has 
> not submitted a job in over a week starts submitting new jobs, his/her jobs get 
> the highest user priority of 10000. As the user submits additional jobs they are 
> assigned lower and lower priority and I have seen users who submit gazillions of 
> jobs get down to negative priority below -5000. Eventually Panda will move the 
> user's jobs into a throttled state which is a sort of circuit breaker that 
> temporarily prevents the user's new jobs from starting. The user's submission 
> priority recovers because the incremental priority reduction caused by 
> previously submitted jobs is removed 7 days after the job submission. This sort 
> of approach seems like what is needed. The system could increase the priority of 
> a limited number of short jobs to allow users who are not abusing the queuing 
> system to quickly run a limited number of short test jobs when developing the code.
> 
> Fred
> 
> On 8/17/23 2:42 AM, Jeff Templon wrote:
>> Thanks!  I didn't know about this stuff.
>> 
>>> On 16 Aug 2023, at 17:28, Todd L Miller via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
>>> 
>>>> Another issue to take into account is that a high start rate can put pressure on other systems, like shared file systems.
>>> 
>>> 	We already have a few throttles for high overall start rates.
>> 
>> Usually the problem is not so much high overall start rates, here it's usually one user who generates 90% of the high start rate.  I really don't like making everyone suffer because of one clumsy user.  OTOH the other users might let this user know how clumsy he/she is - peer communication tends to be effective.
>> 
>> JT
>> 
>> 
>> _______________________________________________
>> HTCondor-users mailing list
>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>> 
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/htcondor-users/
> 
> -- 
> Frederick Luehring Indiana U luehring@xxxxxx       +1 812 855 1025  IU
> http://cern.ch/Fred.Luehring Fred.Luehring@xxxxxxx +41 22 767 11 66 CERN
> 
> 