
Re: [HTCondor-users] Prematurely exiting job_router route via Hooks



Hi Jose, Brian, all,

so I did try condor_rm'ing from inside the hooks...
Well, kudos to the HTCondor team that this actually *worked*.
I have the test code at [1] in case somebody wants to try. Basically the update hook reads the routed job's ID and calls condor_rm on it.

First of all, it accomplishes the goal: the routed job is removed, releasing the original job from the route.

However, it is slightly messy:
- The job_router continues with its current cycle and tries to update the now-removed job. It handles this gracefully, but still logs a complaint.
- The hook may be invoked once with incomplete ClassAd data, so it must handle that case gracefully as well.
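For readers who do not want to open the gist, here is a minimal sketch of the idea. It is not the code from [1]. It assumes the update hook gets the routed job's ClassAd as plain "Attr = value" text on stdin and that ClusterId and ProcId are enough to identify the job; the function names parse_job_id and remove_routed_job are made up for illustration.

```python
import re
import subprocess

# Matches lines such as "ClusterId = 12" in old-style ClassAd text.
_ID_ATTR = re.compile(r"^\s*(ClusterId|ProcId)\s*=\s*(\d+)\s*$", re.MULTILINE)


def parse_job_id(classad_text):
    """Return "cluster.proc" for the job, or None if the ad is incomplete."""
    found = {name: int(value) for name, value in _ID_ATTR.findall(classad_text)}
    if "ClusterId" not in found or "ProcId" not in found:
        # The hook may see a partial ad once; do nothing in that case.
        return None
    return "{}.{}".format(found["ClusterId"], found["ProcId"])


def remove_routed_job(classad_text, runner=subprocess.call):
    """Call condor_rm on the routed job described by classad_text.

    `runner` is injectable (an assumption of this sketch) so the logic
    can be exercised without a running HTCondor pool.
    Returns the removed job ID, or None if nothing was removed.
    """
    job_id = parse_job_id(classad_text)
    if job_id is None:
        return None
    runner(["condor_rm", job_id])
    return job_id
```

An actual hook script would call something like remove_routed_job(sys.stdin.read()), preferably only after checking whatever condition makes the route pointless. The early return on a missing ID covers the incomplete-ClassAd case from the second point above.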

Cheers,
Max

[1]
https://gist.github.com/maxfischer2781/5c3bb079fb730e7242267cdb326866ce

> On 22.09.2016 at 16:25, Jose Caballero <jcaballero.hep@xxxxxxxxx> wrote:
> 
>> 
>> 
>> Is there a way for *hooks* to end a route prematurely once it has been established? Our `translate` hooks can already detect if routing a job is useful, but we haven't found a way to tell this to the router.
> 
> 
> Hi Max,
> 
> I am facing a similar issue. Sometimes I have jobs that cannot be
> executed, and I would like them to not be routed at all.
> 
> BTW, there is a way to generate the routing tables with a script.
> This can become very handy as it allows you to perform checks and
> validations, and generate the routing tables on the fly based on the
> results of those validations.
> I am not sure that would help in your case, but just in case. [1]
> In my case, what I do as well is to have a special route for those
> jobs with requirements that cannot be satisfied. In that route, I also
> 
> 1. set  EditJobInPlace, so no extra job is cloned from it
> 2. set an ad-hoc classad like "do_no_reroute_me_anymore", that I can
> use to detect whether that job has already passed through the routing
> mechanism or not
> 3. I finally set a very aggressive PERIODIC_REMOVE expression for that
> job, killing it after a few seconds of being IDLE.
> 
> However, I think that solution is ugly even though it seems to work.
> I think there should be a configuration variable like
> "HOLD_IF_NO_ROUTE" or similar that allows us to decide if we want the
> current behavior or we prefer to put on HOLD those jobs that fail at
> the TRANSLATE hook.
> 
> Anyway, I am more than interested in any progress you make dealing with this.
> 
> Cheers,
> Jose
> 
> [1] http://research.cs.wisc.edu/htcondor/manual/v8.4/3_3Configuration.html#27292
