
Re: [HTCondor-users] On a job starving issue



If setting this configuration knob results in an actual change in the Schedd's set of significant attributes, then it will delete the autocluster id from all jobs in the queue. But it won't recalculate a new autocluster id for a job until it needs to.

 

So it won't affect jobs that are already running, but it will affect any future attempts to get matches from the Negotiator.
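
The behavior described above resembles lazy cache invalidation. The Python sketch below is a toy model under that assumption, not the Schedd's actual code; the class `ToyQueue`, its methods, and the attribute names `Mem` and `JobClass` are all hypothetical.

```python
class ToyQueue:
    """Toy model: cached autocluster ids are cleared on reconfig, rebuilt lazily."""

    def __init__(self, jobs, significant_attrs):
        self.jobs = jobs                       # job id -> dict of job attributes
        self.significant = tuple(significant_attrs)
        self.cached_id = {}                    # job id -> cached autocluster id
        self.key_to_id = {}                    # attribute values -> autocluster id

    def reconfig(self, significant_attrs):
        new = tuple(significant_attrs)
        if new == self.significant:
            return                             # no actual change: keep cached ids
        self.significant = new
        self.cached_id.clear()                 # drop the cached id from every job
        self.key_to_id.clear()

    def autocluster_id(self, job_id):
        # Computed only on demand, e.g. the next time a match is requested.
        if job_id not in self.cached_id:
            key = tuple(self.jobs[job_id].get(a) for a in self.significant)
            self.cached_id[job_id] = self.key_to_id.setdefault(
                key, len(self.key_to_id) + 1)
        return self.cached_id[job_id]


queue = ToyQueue(
    {1: {"Mem": 2048, "JobClass": "good"}, 2: {"Mem": 2048, "JobClass": "bad"}},
    ["Mem"],
)
shared_before = queue.autocluster_id(1) == queue.autocluster_id(2)

queue.reconfig(["Mem", "JobClass"])
ids_after_reconfig = dict(queue.cached_id)     # empty until a job is looked up again
shared_after = queue.autocluster_id(1) == queue.autocluster_id(2)
```

In this model nothing is recomputed at reconfig time; each job gets a fresh id only when it is next looked up.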

-tj

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Weiming Shi
Sent: Tuesday, May 7, 2019 2:08 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] On a job starving issue

 

Hi John,

 

Thanks for the information. I will give it a try. I am curious whether changing ADD_SIGNIFICANT_ATTRIBUTES and running condor_reconfig on the submit machine will change the autoclusters of the jobs already in the queue.

 

 

 

 

On Tue, May 7, 2019 at 9:49 AM John M Knoeller <johnkn@xxxxxxxxxxx> wrote:

You can force any attribute to be part of the set used for autoclustering, so that "good" jobs will never be autoclustered with "bad" jobs.

 

Just configure

 

ADD_SIGNIFICANT_ATTRIBUTES = <Attr1>  <Attr2>  <ETC>

 

on your submit machine. <Attr1>, <Attr2>, etc. above are attributes that you want used for autoclustering, that are not currently used, and that can distinguish between your "good" and your "bad" jobs.
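
As a rough illustration of the idea (a toy model in Python, not HTCondor's actual implementation), autoclustering can be pictured as grouping jobs by the values of their significant attributes. The attribute name `JobClass`, the function `autocluster`, and the job ads below are hypothetical, made up for this sketch.

```python
from collections import defaultdict


def autocluster(jobs, significant_attrs):
    """Group job ids by the values of their significant attributes (toy model)."""
    clusters = defaultdict(list)
    for job in jobs:
        # Jobs whose significant attributes all match land in the same group.
        key = tuple(job.get(attr) for attr in significant_attrs)
        clusters[key].append(job["Id"])
    return dict(clusters)


# Hypothetical job ads; "JobClass" is an invented custom attribute.
jobs = [
    {"Id": 1, "RequestMemory": 2048, "JobClass": "good"},
    {"Id": 2, "RequestMemory": 2048, "JobClass": "bad"},
    {"Id": 3, "RequestMemory": 2048, "JobClass": "good"},
]

# With only RequestMemory significant, all three jobs share one group.
before = autocluster(jobs, ["RequestMemory"])

# Adding JobClass to the significant set separates "good" from "bad".
after = autocluster(jobs, ["RequestMemory", "JobClass"])

print(before)
print(after)
```

In this model, a rejection that applies to a whole group would hit jobs 1 and 3 only while `JobClass` is missing from the key.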

 

-tj

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Weiming Shi
Sent: Monday, May 6, 2019 10:21 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] On a job starving issue

 

Hi HTCondor Community,

 

Is there a way to disable autoclustering during the matchmaking of condor jobs, or a way to re-initiate the matchmaking when the runnable queue has not changed?

 

Our main motivation is to prevent the 'good' jobs (which should be scheduled) from being clustered with the 'bad' jobs (which should be rejected) when the significant attributes of the 'bad' jobs and 'good' jobs are not sufficient to separate them into two different clusters.

 

A starvation issue can happen when we have multiple pools and we allow jobs to flock to multiple pools concurrently (by setting flock_increment = #pools). Since the jobs can flock to multiple pools concurrently, the order in which the pool masters are negotiated with is no longer deterministic. When the 'bad' jobs are rejected because of the resource capacity of a particular pool, the 'good' jobs that are clustered with the 'bad' jobs are also rejected and are not reconsidered for matchmaking with other pools that have idle resources and no capacity issues.

 

Thanks

 

Weiming
