
Re: [HTCondor-users] manipulate ranking/priority of very-short-jobs-users



Hi,

Fair share is built into HTCondor, and you can tweak it to a certain extent using SLOT_WEIGHT etc. I just miss an explicit way to penalize very short job runtimes by making them more costly for the user ...
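One way to express this idea, sketched in Python: bill every job for at least a fixed minimum wall time, so that a flood of very short jobs costs the user more accounting usage than the same work done in fewer, longer jobs. MIN_CHARGED_SECONDS and charged_usage are hypothetical names, not HTCondor settings. The sketch only assumes that usage is charged as roughly slot weight times wall-clock time.

```python
# Hypothetical floor: every job is billed for at least 10 minutes of wall time.
# This is NOT an HTCondor configuration knob, just an illustration of the idea.
MIN_CHARGED_SECONDS = 600


def charged_usage(wall_seconds: float, slot_weight: float) -> float:
    """Usage charged to the submitter: slot weight times wall time, with a minimum."""
    return slot_weight * max(wall_seconds, MIN_CHARGED_SECONDS)
```

With a 600 s floor, 100 jobs of 30 s each are charged 60000 slot-seconds, versus 3000 for the same 3000 s of work done as a single job.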

Best
christoph

-- 
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx

----- Original Message -----
From: "Luehring, Frederick C" <luehring@xxxxxxxxxxx>
To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
Sent: Thursday, August 17, 2023 17:29:49
Subject: Re: [HTCondor-users] manipulate ranking/priority of very-short-jobs-users

Hey Y'all,

Is there a built-in way for HTCondor to apply fair-share scheduling?

https://en.wikipedia.org/wiki/Fair-share_scheduling

The ATLAS Panda implementation does something along the lines of a fair-share
algorithm, using the number of jobs submitted instead of CPU usage. When a user
who has not submitted a job in over a week starts submitting new jobs, their
jobs get the highest user priority of 10000. As the user submits additional
jobs, they are assigned lower and lower priority, and I have seen users who
submit gazillions of jobs get down to negative priorities below -5000.
Eventually Panda moves the user's jobs into a throttled state, a sort of circuit
breaker that temporarily prevents the user's new jobs from starting. The user's
submission priority recovers because the incremental priority reduction caused
by each previously submitted job is removed 7 days after that job's submission.
This sort of approach seems like what is needed. The system could also raise the
priority of a limited number of short jobs, so that users who are not abusing
the queuing system can quickly run a few short test jobs while developing code.
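A minimal Python sketch of the job-count scheme described above, not Panda's actual code. The starting priority of 10000 and the 7-day expiry come from the description; the per-job reduction (PENALTY_PER_JOB) and the throttle cut-off (THROTTLE_THRESHOLD) are hypothetical values, since the real Panda parameters are not given here.

```python
from collections import deque
from dataclasses import dataclass, field

WINDOW_SECONDS = 7 * 24 * 3600  # each job's reduction expires 7 days after submission
BASE_PRIORITY = 10000           # priority of a user with no submissions in the last week
PENALTY_PER_JOB = 1             # hypothetical: reduction contributed by each submitted job
THROTTLE_THRESHOLD = -5000      # hypothetical: below this, the user's new jobs are held back


@dataclass
class UserPriority:
    # Submission timestamps (seconds), oldest first.
    submissions: deque = field(default_factory=deque)

    def _expire(self, now: float) -> None:
        # Drop the reductions of jobs submitted 7 or more days ago.
        while self.submissions and now - self.submissions[0] >= WINDOW_SECONDS:
            self.submissions.popleft()

    def priority(self, now: float) -> int:
        # Current priority: base minus one reduction per job still inside the window.
        self._expire(now)
        return BASE_PRIORITY - PENALTY_PER_JOB * len(self.submissions)

    def submit(self, now: float) -> int:
        # The new job is assigned the current priority, then adds its own reduction.
        assigned = self.priority(now)
        self.submissions.append(now)
        return assigned

    def throttled(self, now: float) -> bool:
        # Circuit breaker: new jobs do not start while priority is below the threshold.
        return self.priority(now) < THROTTLE_THRESHOLD
```

Because each reduction is tied to its own submission time, a heavy submitter recovers gradually as older jobs age out of the 7-day window, rather than being reset all at once.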

Fred

On 8/17/23 2:42 AM, Jeff Templon wrote:
> Thanks!  I didn't know about this stuff.
> 
>> On 16 Aug 2023, at 17:28, Todd L Miller via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
>>
>>> Another issue to take into account is that a high start rate can put pressure on other systems, like shared file systems.
>>
>> 	We already have a few throttles for high overall start rates.
> 
> Usually the problem is not so much high overall start rates; here it's usually one user who generates 90% of the high start rate.  I really don't like making everyone suffer because of one clumsy user.  OTOH the other users might let this user know how clumsy he/she is - peer communication tends to be effective.
> 
> JT
> 
> 
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> 
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/

-- 
Frederick Luehring Indiana U luehring@xxxxxx       +1 812 855 1025  IU
http://cern.ch/Fred.Luehring Fred.Luehring@xxxxxxx +41 22 767 11 66 CERN

