
[HTCondor-users] On a job starving issue



Hi HTCondor Community,

Is there a way to disable autoclustering during the matchmaking of Condor jobs, or a way to re-initiate matchmaking when the runnable queue has not changed?

Our main motivation is to prevent the 'good' jobs (which should be scheduled) from being clustered with the 'bad' jobs (which should be rejected) when the significant attributes of the 'bad' and 'good' jobs are not sufficient to separate them into two different clusters.

A starvation issue can happen when we have multiple pools and allow jobs to flock to multiple pools concurrently (by setting flock_increment = #pools). Since the jobs can flock to multiple pools concurrently, the order in which the pool masters are negotiated with is no longer deterministic. When the 'bad' jobs are rejected because of the resource capacity of a particular pool, the 'good' jobs that are clustered with the 'bad' jobs are also rejected and are not reconsidered for matchmaking with other pools that have idle resources and no capacity issue.
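
To make the behaviour described above concrete, here is a small toy model in Python. It is only a sketch of the effect as we observe it, not HTCondor code: the job dictionaries, the quota_group attribute, and the autocluster() and negotiate() helpers are all hypothetical names made up for illustration, and the skip-the-rest-of-the-cluster rule is an assumption about how a rejection propagates.

```python
from collections import defaultdict

# Hypothetical toy model, not HTCondor code. Jobs are plain dicts, and an
# autocluster is the set of jobs sharing the same significant attribute values.
DEFAULT_ATTRS = ("request_cpus", "request_memory")  # assumed for illustration


def autocluster(jobs, attrs):
    """Group jobs by the tuple of their significant attribute values."""
    clusters = defaultdict(list)
    for job in jobs:
        clusters[tuple(job[a] for a in attrs)].append(job)
    return clusters


def negotiate(jobs, pool_accepts, attrs=DEFAULT_ATTRS):
    """Match jobs against one pool; after the first rejection in a cluster,
    the remaining jobs of that cluster are skipped without being evaluated."""
    matched, skipped = [], []
    for members in autocluster(jobs, attrs).values():
        rejected = False
        for job in members:
            if not rejected and pool_accepts(job):
                matched.append(job["id"])
            else:
                rejected = True
                skipped.append(job["id"])
    return matched, skipped


jobs = [
    {"id": "bad-1", "request_cpus": 1, "request_memory": 2048, "quota_group": "a"},
    {"id": "good-1", "request_cpus": 1, "request_memory": 2048, "quota_group": "b"},
    {"id": "good-2", "request_cpus": 1, "request_memory": 2048, "quota_group": "b"},
]


def pool_accepts(job):
    """This pool has no capacity left for the hypothetical quota group 'a'."""
    return job["quota_group"] != "a"


# quota_group is not significant: all three jobs share one cluster and starve.
starved = negotiate(jobs, pool_accepts)

# quota_group is significant: the good jobs get their own cluster and match.
separated = negotiate(jobs, pool_accepts, DEFAULT_ATTRS + ("quota_group",))

print(starved)
print(separated)
```

In the first call the 'good' jobs are never evaluated on their own because they share a cluster with the rejected 'bad' job. In the second call, adding the distinguishing attribute to the significant set splits them into separate clusters, which is the separation we are currently unable to get.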

Thanks

Weiming