[HTCondor-users] Prematurely exiting job_router route via Hooks



Hi all,

 We are using the job_router to hook into the scheduling of jobs [1,2]. Basically, we have a database of file placements and pick the best hosts for data locality.
In principle, this works flawlessly.

 However, many jobs do not actually benefit from this - some input files aren't present in our system, and some users just provide inadequate input lists. The ClassAd expressions for route `requirements` aren't sufficient to detect this. Only our hooks find out, when they try the database query.
 While this works, it means we end up with hundreds of routed jobs, each regularly calling update hooks, with no benefit at all. Either we let all those bogus routes persist, or we severely restrict routing and thereby also penalise jobs that would profit. Either way, we've seen massive load spikes and degraded router or service performance.

 Is there a way for *hooks* to end a route prematurely once it has been established? Our `translate` hooks can already detect whether routing a job is useful, but we haven't found a way to communicate this to the router.
Hook failure is just logged as an error and retried. I've pondered having a hook remove its own routed job via `condor_rm`, but that seems rather hacky.
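
For illustration, the `condor_rm` workaround could look roughly like the sketch below. It assumes the hook receives the routed job's ClassAd on standard input as plain `Attr = value` lines; `parse_job_ad`, `removal_command`, `end_route_if_useless` and the `is_useful` callback (standing in for our database query) are hypothetical names, not part of HTCondor. How the router reacts once the routed job is gone is not something I have verified.

```python
import subprocess


def parse_job_ad(text):
    """Parse 'Attr = value' lines into a dict of raw (unevaluated) strings."""
    ad = {}
    for line in text.splitlines():
        # Split at the first '=' only, so expressions like 'a == b' stay intact.
        name, sep, value = line.partition("=")
        if sep:
            ad[name.strip()] = value.strip()
    return ad


def removal_command(ad):
    """Build the condor_rm call for the job described by the ad."""
    return ["condor_rm", f"{ad['ClusterId']}.{ad['ProcId']}"]


def end_route_if_useless(ad_text, is_useful, run=subprocess.run):
    """Remove the routed job when is_useful(ad) is false.

    is_useful is a hypothetical callback wrapping the locality lookup.
    Returns the command that was run, or None if the job was kept.
    """
    ad = parse_job_ad(ad_text)
    if is_useful(ad):
        return None
    cmd = removal_command(ad)
    run(cmd, check=True)  # raises CalledProcessError if condor_rm fails
    return cmd
```

A hook script would call `end_route_if_useless(sys.stdin.read(), is_useful)` with its own lookup. The `run` parameter exists only so the logic can be exercised without a running pool.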

Cheers,
Max

[1] HTCondor Hooks
http://research.cs.wisc.edu/htcondor/manual/current/3_3Configuration.html#SECTION004333000000000000000

[2] Example configuration
https://bitbucket.org/kitcmscomputing/hpda/src/61b129feaa9e94aab80fd4a989446783d762f0b2/docs/examples/htcondor/HPDA_Hook.cfg?at=master&fileviewer=file-view-default

[3] Router and Router Hooks
http://research.cs.wisc.edu/htcondor/manual/current/5_4HTCondor_Job.html#sec:JobRouter
http://research.cs.wisc.edu/htcondor/manual/current/4_4Hooks.html#SECTION00542000000000000000
