
Re: [HTCondor-users] Rank expression for match making using a python script



Hi Luke,

We're successfully using the job_router this way to match jobs to worker nodes where input files are cached [1, 2]. This is very powerful.
Basically, all our jobs either provide a list of files as a plain ClassAd attribute or point to a file containing the list. The hook reads this, queries our database, and updates the RANK expression.
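
As a rough illustration of the last step, here is a minimal Python sketch of how a hook might turn a database result into a RANK expression. It is not the code from [2]: the function name, the example host names, and the shape of the cached_counts mapping are all assumptions, and it assumes that ClassAd arithmetic treats a true comparison as 1.

```python
import re

# Conservative host name pattern, so nothing unexpected ends up inside the
# quoted ClassAd string.
_HOSTNAME = re.compile(r"[A-Za-z0-9.-]+")


def build_rank_expression(cached_counts):
    """Return a Rank expression preferring machines that cache more input files.

    cached_counts maps a worker node host name to the number of the job's
    input files cached there (a hypothetical result of the database query).
    """
    terms = []
    for node, count in sorted(cached_counts.items()):
        if count <= 0:
            continue  # nodes without cached files add nothing to the rank
        if not _HOSTNAME.fullmatch(node):
            raise ValueError(f"unexpected host name: {node!r}")
        # Assumption: a true comparison counts as 1 in ClassAd arithmetic,
        # so each matching machine scores its number of cached files.
        terms.append(f'(TARGET.Machine == "{node}") * {count}')
    # No cached files anywhere: a constant rank expresses no preference.
    return " + ".join(terms) if terms else "0"


if __name__ == "__main__":
    print(build_rank_expression({"wn01.example.org": 3, "wn02.example.org": 1}))
```

A translate hook would then write the returned string into the Rank attribute of the job ad before handing the ad back to the job router.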

We have some problems with scalability since the cluster is shared with other resource types. Many jobs queued at our schedd do not profit from locality, but trigger hooks anyway.
If you expect jobs to wait in the queue for a long time compared to how long files stay cached, you have to update them with hooks as well. This can create considerable load bursts, since all hooks run in parallel (even if some exit right away for load balancing).
Combined, these load spikes can become problematic when managing O(1000) jobs per schedd.
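
One way to keep each hook invocation cheap is to decide as early as possible whether there is anything to do. The following Python sketch shows such a check. It is an illustration only and not taken from the hooks in [2]: the attribute names InputFileList and LocalityRankUpdated and the refresh interval are hypothetical. Note that it does not avoid starting the hook process itself, it only keeps the expensive database query out of the common case.

```python
import time

# Hypothetical job ad attribute names and refresh interval; adjust to
# whatever the jobs at your site actually carry.
INPUT_FILES_ATTR = "InputFileList"
LAST_UPDATE_ATTR = "LocalityRankUpdated"
REFRESH_SECONDS = 600


def needs_rank_update(job, now=None):
    """Decide whether a hook should query the database for this job.

    job is a plain dict of job ad attributes. Jobs without an input file
    list cannot profit from locality, so the hook can exit right away.
    Jobs that were ranked recently are skipped until the interval passes.
    """
    if now is None:
        now = time.time()
    if not job.get(INPUT_FILES_ATTR):
        return False  # nothing to look up for this job
    last_update = job.get(LAST_UPDATE_ATTR)
    if last_update is None:
        return True  # never ranked so far
    return now - last_update >= REFRESH_SECONDS
```

Choosing the refresh interval relative to how long files typically stay cached trades the freshness of the rank against the load on the schedd.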

Cheers,
Max

[1]
http://iopscience.iop.org/article/10.1088/1742-6596/664/9/092008/pdf

[2] Router Config and Hooks
https://bitbucket.org/kitcmscomputing/hpda/src/61b129feaa9e94aab80fd4a989446783d762f0b2/docs/examples/htcondor/HPDA_Hook.cfg?at=master&fileviewer=file-view-default
https://bitbucket.org/kitcmscomputing/hpda/src/61b129feaa9e94aab80fd4a989446783d762f0b2/hpda/interfaces/htcondor/router_hooks.py?at=master&fileviewer=file-view-default


> On 11.08.2016 at 17:08, L Kreczko <L.Kreczko@xxxxxxxxxxxxx> wrote:
> 
> Hi Jose,
> 
> Thanks for the reply. Answers are inline.
> 
> On 11 August 2016 at 15:35, Jose Caballero <jcaballero.hep@xxxxxxxxx> wrote:
> Hi Luke,
> 
> I am not really an expert, but anyways...
> Quick question: when (or where) do you want to analyze the input
> files? Before submitting the job, or once the job has been submitted
> but just before the matchmaking?
> After submission, before matchmaking. Essentially I want to be able to influence on which node a job can run based on the listed input files (or a custom jobAd). 
> 
> If you can analyze the input files before submission, I guess you can
> always set a convenient list of classads in the submit file, as a
> result of the analysis, and then let the rank expression + matchmaking
> pick the right node.
> I see. So I could add a list to the JobAd asking preferably for any of the nodes in the list.
> 
> If that is not feasible, and it must be done after submission, you can
> try manipulating the job classads just before matchmaking is
> called, using the Job Router hooks
> (which I am currently learning myself how to use, so still in the
> process of understanding them):
> 
>       http://research.cs.wisc.edu/htcondor/manual/v8.5/4_4Hooks.html#SECTION00542000000000000000
> Yes, I think this is where I've seen the python script being used. Job Router sounds good, might try this one:
> PILOT_HOOK_TRANSLATE_JOB = /usr/libexec/htcondor-pilot-job-router/pilot-translate.py
> JOB_ROUTER_HOOK_KEYWORD = PILOT
> 
> from https://github.com/treydock/htcondor-pilot-job-router
> 
> 
> 
> What I am not sure about is whether you will be able to inspect the content of
> the input file at that point. Maybe you can. It would be nice to know.
> Essentially I want to check the path of the input file (not the content) and ask our HDFS where the files (or chunks of them) are located.
> This will give me a list of nodes that the job should run on (therefore reducing network traffic). So making HTCondor behave more like YARN. 
> 
> 
> Cheers,
> Luke 
> 
> 
> 
> 
> 2016-08-10 17:37 GMT-04:00 L Kreczko <L.Kreczko@xxxxxxxxxxxxx>:
> > Dear experts,
> >
> > I am trying to write a simple python script that takes the job input files,
> > does a quick analysis and returns useful results to condor for match making.
> > I am very sure that this was demonstrated somewhere (python scripts in
> > condor expressions), but I cannot find it in my notes nor via google.
> >
> > Could you please send me some pointers?
> >
> > What I want to do:
> > - before matchmaking analyse job input files with python
> > - return a result that condor can use for making the match making decision
> > (i.e. some nodes are better for input A than others)
> >
> > Cheers,
> > Luke
> >
> >
> > _______________________________________________
> > HTCondor-users mailing list
> > To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> > subject: Unsubscribe
> > You can also unsubscribe by visiting
> > https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> >
> > The archives can be found at:
> > https://lists.cs.wisc.edu/archive/htcondor-users/
> 
> 
> 
> -- 
> *********************************************************
>   Dr Lukasz Kreczko            
>   Research Associate
>   Department of Physics
>   Particle Physics Group
> 
>   University of Bristol
>   HH Wills Physics Lab
>   University of Bristol
>   Tyndall Avenue
>   Bristol
>   BS8 1TL
> 
>   +44 (0)117 928 8724  
>   L.Kreczko@xxxxxxxxxxxxx
>   
>   A top 5 UK university with leading employers (2015)
>   A top 5 UK university for research (2014 REF)
>   A world top 40 university (QS Ranking 2015)
> *********************************************************
