Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] Avoiding combinatorial explosion in dependencies between spliced DAGS

Date: Thu, 30 Jul 2015 13:58:34 -0500 (CDT)
From: "R. Kent Wenger" <wenger@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Avoiding combinatorial explosion in dependencies between spliced DAGS

On Thu, 30 Jul 2015, John N Calley wrote:

I make a lot of use of SPLICE-ing to compose dags into complex workflows and these often have dependencies on each other. DAGMAN deals with this by adding dependencies between every final node for the PARENT dag and every initial node of the CHILD dag. When there are thousands of initial and final nodes (as is common with my workflows) this can result in extremely large numbers of dependencies and I've had cases where parsing a rescue dag took quite a few hours. I've been living with this for a while, but I recently came up with a work-around and I wondered if others might have any thoughts on it or perhaps better ways of dealing with the issue.
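To put rough numbers on the growth described above, here is a small sketch. The function names and the figure of 2000 nodes are illustrative assumptions, not anything taken from DAGMan itself.

```python
def direct_edges(parent_finals: int, child_initials: int) -> int:
    """Dependencies when every final node of the PARENT dag is linked
    to every initial node of the CHILD dag."""
    return parent_finals * child_initials


def edges_via_join_node(parent_finals: int, child_initials: int) -> int:
    """Dependencies when a single placeholder node sits between the
    final nodes and the initial nodes."""
    return parent_finals + child_initials


if __name__ == "__main__":
    n = 2000  # illustrative node count, not a measured value
    print(direct_edges(n, n))         # 4000000
    print(edges_via_join_node(n, n))  # 4000
```

The product grows quadratically with the node count, while the placeholder arrangement grows linearly.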

We're glad that you're finding splices useful. Hopefully we can make some improvements to make them more useful...

What I have now started to do is to add a final NOOP job to each of my sub-dags, so at least I just have all the dependencies from initial jobs in the CHILD dag with this single final place-holder node. I assume that I could do the same thing to make every one of my dags start with a NOOP initial node that all the real initial nodes depend on, though I haven't actually tried this. This is clearly not the intended use of the NOOP keyword and it's a bit of a hack, so I wondered if others had better ideas?

Hmm, I wouldn't consider this a hack. There's not really a specific "intended" use for NOOP nodes -- they're for whatever someone finds useful, as in this case.
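As a rough illustration of the NOOP place-holder approach, the sketch below generates the extra DAG-file lines for a single final node that every real final node feeds into. The helper name, the node name ALL_DONE, and the submit file name noop.sub are hypothetical; only the JOB ... NOOP and PARENT ... CHILD keywords are DAGMan syntax.

```python
from typing import List


def noop_sink_lines(final_nodes: List[str],
                    sink_name: str = "ALL_DONE",       # hypothetical node name
                    sink_submit: str = "noop.sub") -> List[str]:  # hypothetical file
    """Return DAG-file lines that add one NOOP node and make it a child
    of every node in final_nodes."""
    lines = [f"JOB {sink_name} {sink_submit} NOOP"]
    # One dependency line per real final node keeps each line short.
    for node in final_nodes:
        lines.append(f"PARENT {node} CHILD {sink_name}")
    return lines


if __name__ == "__main__":
    print("\n".join(noop_sink_lines(["A1", "A2", "A3"])))
```

Appended to a sub-dag, these lines leave the splice with exactly one final node. A NOOP initial node would be built the same way with the PARENT and CHILD roles swapped.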

Also, it would seem that it would be easy for DAGMAN to do this for me as part of the SPLICE-ing process and the result would be a good deal cleaner. I don't see any reason for DAGMAN not to do this. Am I missing something? If not, please consider it a feature request.

That's actually something we thought of pretty much when splices were first implemented. Anyhow, there is already a corresponding feature request:


  https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=3587,4

I guess it's kind of languished until now because nobody has really run into a use case where it was really necessary (or, if they did, we didn't find out about it).

Maybe it's time to move that up in priority... At any rate, though, there's no reason to not do it, other than its relative priority among the several hundred outstanding DAGMan bugs/feature requests.

What I'd really like to do is to reach 'into' each sub-dag and insert dependencies between specific final nodes and specific initial nodes. I've considered hacking this solution together, but the ways of doing it that I can think of seem inelegant. I wonder if anyone has thoughts on how to do this kind of thing cleanly? To expand a bit, this comes up when I want to do Analysis A on samples 1-2000 and then I want to do Analysis B on the same samples. Analysis B for sample 1 depends on Analysis A for the same sample, but not on Analysis A for any other samples. It's a shame to require that Analysis A finish for all samples before I start Analysis B for any samples, but that is what I feel stuck with at the moment.

So you're saying that right now you have all of the A nodes in one splice, and all of the B nodes in another splice, right? I guess one thing I would want to understand in this case is what is driving your decomposition of the workflow. Because if you have a single splice that has all of your As and all of your Bs, you could do this easily. Or, if your decomposition is governed by size, you could have a splice that has A1-A100 and B1-B100, another splice that has A101-A200, B101-B200, etc.
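A minimal sketch of that size-based decomposition, assuming hypothetical submit files analysis_a.sub and analysis_b.sub, a hypothetical sample macro, and hypothetical file names chunk1.dag, chunk2.dag, and so on. Each chunk holds matching A and B nodes with a per-sample dependency, so the top-level DAG needs no dependencies between splices at all.

```python
from typing import List


def chunk_dag(first: int, last: int) -> List[str]:
    """DAG-file lines for one splice covering samples first..last,
    with B<i> depending only on A<i>."""
    lines = []
    for i in range(first, last + 1):
        lines.append(f"JOB A{i} analysis_a.sub")   # hypothetical submit file
        lines.append(f'VARS A{i} sample="{i}"')    # hypothetical macro name
        lines.append(f"JOB B{i} analysis_b.sub")   # hypothetical submit file
        lines.append(f'VARS B{i} sample="{i}"')
        lines.append(f"PARENT A{i} CHILD B{i}")
    return lines


def top_level_dag(n_samples: int, chunk_size: int) -> List[str]:
    """DAG-file lines that splice in one chunk file per block of samples."""
    starts = range(1, n_samples + 1, chunk_size)
    return [f"SPLICE chunk{k} chunk{k}.dag"
            for k, _ in enumerate(starts, start=1)]


if __name__ == "__main__":
    print("\n".join(chunk_dag(1, 2)))
    print("\n".join(top_level_dag(2000, 100)))
```

With this layout, Analysis B for a sample can start as soon as Analysis A for that same sample finishes, regardless of which chunk the other samples are in.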

If you really do need to have all of the As in one splice and all of the Bs in another I guess it might be possible to implement some kind of "weaker" dependency between splices, wherein a given node in the second splice only depends on some of the nodes in the first splice. That would definitely take some thinking, though, about how the dependencies should be specified, and this is something that hasn't come up previously, as far as I know, so I don't have any pre-existing ideas on it.


So, to summarize:
1) There's no problem with using NOOP nodes as you describe.

2) There's no reason to not have DAGMan automatically introduce such nodes. (This would also allow splices to have pre and post scripts, which would make them more consistent with sub-DAGs.)

3) Before any kind of implementation of the more flexible inter-splice dependencies, there would have to be some serious thinking involved, probably starting with a better understanding of your use case.


Kent Wenger
CHTC Team

Follow-Ups:
- Re: [HTCondor-users] Avoiding combinatorial explosion in dependencies between spliced DAGS
  - From: John N Calley

References:
- [HTCondor-users] Avoiding combinatorial explosion in dependencies between spliced DAGS
  - From: John N Calley
