
Re: [HTCondor-users] Prematurely exiting job_router route via Hooks



Hi Max,

I can think of two approaches:

1) Use the Python ClassAd bindings to add a new function to libclassad.  This allows arbitrary code to be invoked from the requirements expression (quite a good way to shoot yourself in the foot!).

2) Route the job twice:
   a) Route A sets EditJobInPlace = true and runs a hook that determines whether this job should be routed; the hook then sets RequiresRealRouting to true or false.  In the requirements expression for Route A, include (RequiresRealRouting =?= UNDEFINED), so that each job is pre-processed only once.
   b) Route B has the requirement (RequiresRealRouting =?= true).
   This way, Route A does some amount of "pre-processing" without actually duplicating the job, and Route B is applied only to those jobs that have the correct settings.
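
   For approach 2, the Route A hook could look roughly like the sketch below. It assumes the translate hook receives the route ad, a "------" separator line, and then the job ad on stdin, and that it prints the edited job ad on stdout; please check that against the hook documentation for your version. The Input_Files attribute and the is_worth_routing function are hypothetical placeholders for your file placement query, and the ad is handled as plain "Attr = value" text rather than through the ClassAd bindings.

```python
import sys

SEPARATOR = "------"  # assumed separator between the route ad and the job ad


def parse_ad(text):
    """Rough parse of 'Attr = value' lines into a dict of raw value strings."""
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            ad[name.strip()] = value.strip()
    return ad


def is_worth_routing(job_ad):
    """Hypothetical stand-in for the file placement database lookup.

    Here a job counts as worth routing if it has a non-empty Input_Files
    attribute (an assumed attribute name); replace with the real query.
    """
    return job_ad.get("Input_Files", '""') not in ('""', "")


def translate(stdin_text, decide):
    """Return the job ad text with RequiresRealRouting set to true or false."""
    _route_text, sep, job_text = stdin_text.partition(SEPARATOR + "\n")
    if not sep:
        # No separator found: treat the whole input as the job ad.
        job_text = stdin_text
    decision = "true" if decide(parse_ad(job_text)) else "false"
    # Drop any earlier RequiresRealRouting line, then append the new decision.
    kept = [
        line
        for line in job_text.splitlines()
        if line.partition("=")[0].strip().lower() != "requiresrealrouting"
    ]
    kept.append("RequiresRealRouting = " + decision)
    return "\n".join(kept) + "\n"


def main():
    """Entry point for the hook executable: read stdin, write the edited ad."""
    sys.stdout.write(translate(sys.stdin.read(), is_worth_routing))
```

   The hook executable would end with a call to main(). Since the decision is stored in the job ad itself, Route A stops matching the job once the attribute is defined, and Route B only ever sees jobs marked true.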

Brian

> On Sep 22, 2016, at 3:23 AM, Fischer, Max (SCC) <max.fischer@xxxxxxx> wrote:
> 
> Hi all,
> 
> We are using the job_router to hook into the scheduling of jobs [1,2]. Basically, we have a database of file placements, and pick the best hosts for locality.
> In principle, this works flawlessly.
> 
> However, many jobs do not actually benefit from this - some input files aren't present in our system, and some users just provide inadequate input lists. The ClassAd expressions for route `requirements` aren't sufficient to detect this. Only our hooks find out when trying the query.
> While this works, it means we get hundreds of routed jobs, each regularly calling update hooks, with no benefit at all. Either we let all those bogus routes persist, or we also severely restrict jobs that would profit. Either way, we've seen some massive load spikes and degraded performance of the router or our service.
> 
> Is there a way for *hooks* to end a route prematurely once it has been established? Our `translate` hooks can already detect if routing a job is useful, but we haven't found a way to tell this to the router.
> Hook failure is just logged as an error and retried. I've pondered having a hook remove its own routed job via `condor_rm`, but it seems rather hacky.
> 
> Cheers,
> Max
> 
> [1] HTCondor Hooks
> http://research.cs.wisc.edu/htcondor/manual/current/3_3Configuration.html#SECTION004333000000000000000
> 
> [2] Example configuration
> https://bitbucket.org/kitcmscomputing/hpda/src/61b129feaa9e94aab80fd4a989446783d762f0b2/docs/examples/htcondor/HPDA_Hook.cfg?at=master&fileviewer=file-view-default
> 
> [3] Router and Router Hooks
> http://research.cs.wisc.edu/htcondor/manual/current/5_4HTCondor_Job.html#sec:JobRouter
> http://research.cs.wisc.edu/htcondor/manual/current/4_4Hooks.html#SECTION00542000000000000000