
Re: [HTCondor-users] greedy_user ?



Hi Joe,

 

Still can’t make it work for some reason.

 

I tried adding ‘Rank = 1000000.0’ to the submit file.

condor_q -long does show the new rank of the job, but it still won’t take precedence when all the other jobs are idle.

 

I tried adding ‘DEDICATED_SCHEDULER_USE_FIFO = False’ to the CM’s config file, but nothing changed.

 

I also tried replacing ‘RANK = Scheduler =?= $(DedicatedScheduler)’ on execute node with:

RANK = ("AcctGroupUser" == "pronto" * 1000000000000) + (Scheduler =?= $(DedicatedScheduler))

or simply

RANK = ("AcctGroupUser" == "pronto" * 1000000000000)

and still nothing changed.
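
On the expression itself, here is a minimal sketch of a well-formed version, assuming the intent is to prefer jobs whose AcctGroupUser attribute is "pronto". In the two lines above, "AcctGroupUser" in double quotes is a string literal rather than a reference to the job attribute, and * binds more tightly than ==, so the multiplication is applied to the string "pronto". The weight is kept from the attempt above; the TARGET. prefix is an assumption added here to make the job-ad lookup explicit. Whether a startd RANK changes the order in which the dedicated scheduler starts parallel jobs is a separate question that this sketch does not settle.

```
# Execute-node configuration (sketch under the assumptions above, not a tested fix).
# TARGET.AcctGroupUser refers to the attribute in the candidate job's ClassAd.
# =?= evaluates to False rather than Undefined when the job has no AcctGroupUser.
RANK = ((TARGET.AcctGroupUser =?= "pronto") * 1000000000000) + (Scheduler =?= $(DedicatedScheduler))
```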

 

Finally, I also updated from 9.0.17 to 10.0.8. Other than all 3 jobs waiting longer in the idle state before the first one goes back to running, it didn’t seem to change anything.

 

Somewhere during my tests, I tried with 10.7.0, but then it was the second job that started running instead of the third one when the first got pre-empted. And I’m not sure if that was because the Schedd suddenly kept crashing or something else…

 

Martin

 

From: JOSEPH RYAN REUSS <jrreuss@xxxxxxxx>
Sent: September 15, 2023 4:02 PM
To: Beaumont, Martin <Martin.Beaumont@xxxxxxxxxxxxxxx>; HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: greedy_user ?

 

Hi Martin,

 

So, with parallel universe jobs, things work a little differently, because the parallel universe runs jobs FIFO. You will need to assign the job a rank within the submit file: add 'Rank = <floating_point_rank>', and the job with the higher rank should be run first when trying to match to a machine.

 

 

Best,

Joe


From: Beaumont, Martin <Martin.Beaumont@xxxxxxxxxxxxxxx>
Sent: Friday, September 15, 2023 2:44 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: JOSEPH RYAN REUSS <jrreuss@xxxxxxxx>
Subject: RE: greedy_user ?

 

Hi Joseph,

 

Thanks for the quick reply!

 

Will this work with parallel universe jobs (DedicatedScheduler)? Because I’m trying what you said right now and it doesn’t seem to work.

 

Job 116 is from user “test”.

Job 117 is from user “test2”.

Job 118 is from user “test2”, with “accounting_group_user = pronto” added to the submit file.

All 3 jobs are parallel universe.

 

Jobs are set to be Pre-empted after running for 120 seconds (for quick testing purposes): use POLICY: Preempt_if_Runtime_Exceeds( 120 )

Job 116 keeps going back to the running state after being pre-empted.

I would assume Job 118 would start running instead.

 

Is this about FIFO? If so, is there any way to change it?

 

Also, I have dynamic partitionable slots configured:

 

DedicatedScheduler = "DedicatedScheduler@sms1"

STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler

START = True

SUSPEND = False

CONTINUE = True

PREEMPT = False

KILL = False

WANT_SUSPEND = False

WANT_VACATE = False

RANK = Scheduler =?= $(DedicatedScheduler)

use FEATURE: PartitionableSlot( 1, auto )

 

 

 

 

Thanks!

 

Martin

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of JOSEPH RYAN REUSS via HTCondor-users
Sent: September 15, 2023 2:32 PM
To: htcondor-users@xxxxxxxxxxx
Cc: JOSEPH RYAN REUSS <jrreuss@xxxxxxxx>
Subject: Re: [HTCondor-users] greedy_user ?

 

Hi Martin!

 

Condor assigns fair share by user, and a user is not necessarily a human, so you can create a high-priority user for urgent jobs to run under. Set 'accounting_group_user = <some_user>' in your submit file to override the default user and use <some_user> instead. You can then set the priority of that user by running 'condor_userprio -setfactor <some_user> <priority number>' on the AP you are submitting the job from.
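
As a rough sketch of those two steps, assuming the high-priority accounting user is called pronto (the name used elsewhere in this thread): the executable, arguments, machine_count, the example.com domain, and the factor value are placeholders rather than values from this thread. Note that with condor_userprio a numerically lower priority factor means better priority, and that user names typically carry a domain suffix in its output.

```
# Submit description (sketch; placeholder job details)
universe              = parallel
executable            = /bin/sleep
arguments             = 300
machine_count         = 1
# Charge this job to the dedicated high-priority accounting user
accounting_group_user = pronto
queue

# Then, from a shell, set that user's priority factor (placeholder domain and value):
#   condor_userprio -setfactor pronto@example.com 1
```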

 

Here's the documentation for reference:

 


From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Beaumont, Martin <Martin.Beaumont@xxxxxxxxxxxxxxx>
Sent: Friday, September 15, 2023 12:56 PM
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] greedy_user ?

 

Hi,

 

We sometimes have urgent jobs that we’d want to bypass all other jobs as soon as possible. Something like a reversed nice_user (greedy_user?).

 

Now that I know how to hold or preempt jobs with a time limit, I’d like a way for an urgent job to be put at the front of the queue, regardless of other users, fair-share, priorities, weights, quotas, job universe, etc. The system would then wait for enough resources to be free and launch that job before every other regular job in the queue.

 

Is there a configuration that could enable such behavior?

 

Thanks!

 

Martin