
Re: [HTCondor-users] [CondorLIGO] Depth-first for non-DAG and DAG jobs?



Hi Steffen,

I just looked into this. The DAGMAN_SUBMIT_DEPTH_FIRST implementation has not really changed since that email thread, so whatever improvements Kent had in mind never happened.

Having said that, I'm pretty sure the current approach is safe, just not optimal for all DAG structures. For very wide graphs with short-running jobs, children of top-level nodes will get submitted before some of the other top-level nodes, but the DAG will still run correctly. In that case it behaves almost the same whether the option is set to True or False.

So I think this is still the best approach for what you're trying to accomplish. How useful it is really depends on the shape of the graph and the runtime of the jobs. But since we fundamentally have to follow the dependencies of your graph, I think it's the best way to do what you're looking for.
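For anyone finding this thread later: the knob in question is a single DAGMan configuration setting. A minimal sketch (set it in the HTCondor configuration used by the DAGMan job, or in a DAGMan-specific config file):

```
# Submit ready DAG nodes depth-first (children before unrelated
# siblings) instead of the default breadth-first order.
DAGMAN_SUBMIT_DEPTH_FIRST = True
```

As noted above, this only changes the order in which ready nodes are submitted; DAG dependencies are always respected either way.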

Mark





On Fri, Apr 5, 2019 at 2:32 AM Steffen Grunewald <steffen.grunewald@xxxxxxxxxx> wrote:
Good morning,

I've been asked why my matching patterns look rather irregular.

To keep slots as unfragmented as possible, and instead spending most-
fragmented slots first, my policy contains
 NEGOTIATOR_DEPTH_FIRST = True
also supported by a NEGOTIATOR_PRE_JOB_RANK that prefers matching with
already fragmented resources.
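For context, such a policy might look roughly like the following. The NEGOTIATOR_PRE_JOB_RANK expression here is only an illustrative sketch of "prefer already-fragmented machines", not the actual expression from my configuration:

```
# Consider the most-fragmented slots first when matching.
NEGOTIATOR_DEPTH_FIRST = True

# Illustrative rank only: among matching partitionable slots, prefer
# the one with the fewest unclaimed cores, so jobs pack onto nodes
# that are already partly used. A real policy might also weigh
# Memory, Disk, etc.
NEGOTIATOR_PRE_JOB_RANK = - Cpus
```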

For "normal" jobs that works nicely: jobs get matched against the same
node until its resources are exhausted.

I'd like to apply the same for DAGs as well, but I'm puzzled by the old
discussion here:
 https://lists.cs.wisc.edu/archive/htcondor-users/2007-January/msg00056.shtml
that somewhat discourages setting DAGMAN_SUBMIT_DEPTH_FIRST = True as well,
since I apparently cannot rule out side effects (parents never reaching
the top of the matching queue?).

Since more than 10 years have passed, has this "much easier to implement"
(as Kent stated) approach seen any brush-up, making this setting safe?
What are the recommendations for my situation nowadays?

Job runtimes are as inhomogeneous as one could imagine, and preemption and
defrag would hit precisely the wrong jobs; this is why I'd like to avoid
them as much as possible and have a clean matching pattern instead.

My last resort would be to set up special rules ("full-node requests only",
i.e. one static "cover-all" slot) for a subset of nodes, but we all know
that walls always happen to be in the wrong places once set up...


Thanks,
 Steffen


--
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1 * D-14476 Potsdam-Golm * Germany
~~~
Fon: +49-331-567 7274
Mail: steffen.grunewald(at)aei.mpg.de
~~~
_______________________________________________
Condorligo mailing list
Condorligo@xxxxxxxxxx
https://lists.aei.mpg.de/mailman/listinfo/condorligo


--
Mark Coatsworth
Systems Programmer
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin-Madison