
Re: [HTCondor-users] greedy_user ?



Since HTCondor 10.2.0, you can use the -setfloor option of condor_userprio to set a minimum number of cores for a particular user, regardless of fair share. There is also a -setceil option (available since version 8.9.9) to set a maximum number of cores for a particular user.
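
As a rough illustration of the floor option, the command below assumes the same argument order as -setfactor (user name, then value). The user name pronto@example.com and the value 16 are assumptions chosen for the example. The command needs HTCondor 10.2.0 or later and, typically, administrator access to the negotiator.

```text
# Assumed user name and core count, for illustration only.
# Guarantee this user at least 16 cores regardless of fair share:
condor_userprio -setfloor pronto@example.com 16
```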

With your current version, you can't take advantage of this new feature, but you have something to look forward to.

...Tim

On 9/19/23 14:32, Beaumont, Martin wrote:

Hi Joe,

 

Still can't make it work for some reason.

I tried adding "Rank = 1000000.0" to the submit file.

condor_q -long does show the new rank of the job, but it still won't take precedence when all other jobs are idle.

I tried adding "DEDICATED_SCHEDULER_USE_FIFO = False" to the CM's config file, but nothing changed.

I also tried replacing "RANK = Scheduler =?= $(DedicatedScheduler)" on the execute node with:

RANK = ("AcctGroupUser" == "pronto" * 1000000000000) + (Scheduler =?= $(DedicatedScheduler))

or simply

RANK = ("AcctGroupUser" == "pronto" * 1000000000000)

and still nothing changed.

 

Finally, I also updated to 10.0.8 from 9.0.17. Other than all 3 jobs waiting longer in the idle state before the first one went back to running, it didn't seem to change anything.

Somewhere during my tests, I tried 10.7.0, but then it was the second job that started running instead of the third one when the first got pre-empted. And I'm not sure if that was because the Schedd suddenly kept crashing or something else...

 

Martin

 

From: JOSEPH RYAN REUSS <jrreuss@xxxxxxxx>
Sent: September 15, 2023 4:02 PM
To: Beaumont, Martin <Martin.Beaumont@xxxxxxxxxxxxxxx>; HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: greedy_user ?

 

Hi Martin,

 

So, with parallel universe jobs, things work a little differently because the parallel universe runs jobs FIFO. You will then need to assign the job a rank in the submit file: add 'Rank = <floating_point_rank>', and the job with the higher rank should run first when trying to match to a machine.
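
For reference, a parallel universe submit description with such a rank line might look like the sketch below. The executable, arguments, and machine_count are placeholder values, not taken from this thread. Note that the linked documentation describes rank in a submit description as the job's preference among machines that already satisfy its requirements.

```text
# Hypothetical parallel universe submit description
universe      = parallel
executable    = /bin/sleep
arguments     = 120
machine_count = 2
# Constant rank, as suggested above
rank          = 1000000.0
queue
```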

 

Here's the documentation: https://htcondor.readthedocs.io/en/latest/users-manual/submitting-a-job.html?#about-requirements-and-rank

 

Best,

Joe


From: Beaumont, Martin <Martin.Beaumont@xxxxxxxxxxxxxxx>
Sent: Friday, September 15, 2023 2:44 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: JOSEPH RYAN REUSS <jrreuss@xxxxxxxx>
Subject: RE: greedy_user ?

 

Hi Joseph,

 

Thanks for the quick reply!

 

Will this work with parallel universe jobs (DedicatedScheduler)? Because I'm trying what you said right now and it doesn't seem to work.

 

Job 116 is from user "test".

Job 117 is from user "test2".

Job 118 is from user "test2", with "accounting_group_user = pronto" added to the submit file.

All 3 jobs are parallel universe.

 

Jobs are set to be Pre-empted after running for 120 seconds (for quick testing purposes): use POLICY: Preempt_if_Runtime_Exceeds( 120 )

Job 116 keeps going back to the running state after being pre-empted.

I would assume Job 118 would start running instead.

 

Is this about FIFO? If so, is there any way to change it?

 

Also, I have dynamic partitionable slots configured:

 

DedicatedScheduler = "DedicatedScheduler@sms1"

STARTD_ATTRS = \$(STARTD_ATTRS), DedicatedScheduler

START = True

SUSPEND = False

CONTINUE = True

PREEMPT = False

KILL = False

WANT_SUSPEND = False

WANT_VACATE = False

RANK = Scheduler =?= \$(DedicatedScheduler)

use FEATURE: PartitionableSlot( 1, auto )

 

 

 

 

Thanks!

 

Martin

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of JOSEPH RYAN REUSS via HTCondor-users
Sent: September 15, 2023 2:32 PM
To: htcondor-users@xxxxxxxxxxx
Cc: JOSEPH RYAN REUSS <jrreuss@xxxxxxxx>
Subject: Re: [HTCondor-users] greedy_user ?

 

Hi Martin!

 

HTCondor assigns fair share by user, and a user is not necessarily a human, so let's create a high-priority user that a human can submit under when a job needs high priority. Set 'accounting_group_user = <some_user>' in your submit file to override the default user and charge the job to <some_user> instead. You can then set the priority factor of that user by running 'condor_userprio -setfactor <some_user> <priority number>' on the AP you are submitting the job from.
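
A minimal sketch of that setup, assuming a vanilla universe test job. The file name, the executable, the domain example.com, and the factor value are placeholders; pronto is the accounting user name used elsewhere in this thread. A lower priority factor gives the user a better effective priority.

```text
# urgent.sub (hypothetical file name)
universe              = vanilla
executable            = /bin/sleep
arguments             = 60
# Charge this job to the high-priority accounting user instead of the submitter
accounting_group_user = pronto
queue

# Then, once, give that user a low (favorable) priority factor:
#   condor_userprio -setfactor pronto@example.com 1
```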

 

Here's the documentation for reference:

https://htcondor.readthedocs.io/en/latest/man-pages/condor_userprio.html

https://htcondor.readthedocs.io/en/latest/admin-manual/user-priorities-negotiation.html?highlight=accounting_group_user#group-accounting

 


From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Beaumont, Martin <Martin.Beaumont@xxxxxxxxxxxxxxx>
Sent: Friday, September 15, 2023 12:56 PM
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] greedy_user ?

 

Hi,

 

We sometimes have urgent jobs that we'd want to bypass all other jobs as soon as possible. Something like a reversed nice_user (greedy_user?).

Now that I know how to Hold or Preempt jobs with a time limit, I'd like a way for an urgent job to be put at the front of the queue, regardless of other users, fair-share, priorities, weights, quotas, job universe, etc. The system would then wait for enough resources to be free and launch that job before every other regular job from the queue.

 

Is there a configuration that could enable such behavior?

 

Thanks!

 

Martin

 


-- 
Tim Theisen (he, him, his)
Release Manager
HTCondor & Open Science Grid
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin - Madison
4261 Computer Sciences and Statistics
1210 W Dayton St
Madison, WI 53706-1685
+1 608 265 5736