
[HTCondor-users] DAGMan and decision making



Hi all,


We have a use case here in which users generate jobs with an Octave script and submit them through HTCondor. The Octave script itself does not currently run as an HTCondor job.

The script is somewhat complex: it waits for the results of the jobs it previously submitted, evaluates them and incorporates them into its database, then finds optimal parameters and launches additional jobs, in a loop of undefined length.
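
As a rough illustration only, the loop has approximately the shape below. This is a minimal Python sketch, not the users' actual Octave code: the function names, the toy scoring function, the starting parameters and the stopping threshold are all made up, and the "jobs" are evaluated locally instead of being submitted to the scheduler.

```python
def submit_batch_and_wait(params):
    """Hypothetical stand-in for 'submit one job per parameter and wait'.

    Each 'job' just scores its parameter with a toy function whose
    optimum is at 3.0; a real job would run on the pool instead.
    """
    return [(p, (p - 3.0) ** 2) for p in params]


def propose_next(database, width):
    """Pick new candidate parameters around the best result seen so far."""
    best_param, _ = min(database, key=lambda record: record[1])
    return [best_param - width, best_param, best_param + width]


def run_loop(max_rounds=50, tolerance=0.05):
    """Submit, wait, evaluate, re-plan; repeat until the result is good enough."""
    database = []              # accumulated (parameter, score) records
    params = [0.0, 5.0, 10.0]  # assumed starting points
    width = 2.0                # search width, halved after every round
    for round_no in range(1, max_rounds + 1):
        database.extend(submit_batch_and_wait(params))
        best_param, best_score = min(database, key=lambda record: record[1])
        if best_score < tolerance:
            break              # satisfactory: stop launching jobs
        width /= 2
        params = propose_next(database, width)
    return best_param, best_score, round_no


if __name__ == "__main__":
    print(run_loop())
```

The point of the sketch is that the number of rounds is not known in advance: each round's parameters depend on the results of all earlier rounds.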


Furthermore, they plan to make the workflow more complex by adding different “routes”, chosen depending on what happens during the computation.

For example, a Monte Carlo first step followed by a check: if the result is not satisfactory, the Monte Carlo run is paused, another program is launched, and once its result is available it is fed back to the Monte Carlo so it can continue.


Now, as the one in charge of the system, I don’t like having an Octave script on the submit node manage the HTCondor jobs on its own, making plenty of queries to the scheduler to try to handle every possibility. I’d prefer something more reliable, like DAGMan.


We have never used DAGMan before, so I’m very new to it. Both the users and I are interested in trying it out to replace the Octave script, but we are unsure whether it can do what we require.


Can there be an infinite loop?

Can there be different routes within the looped workflow?

Can a “decision making” job within the loop dynamically modify and add to the ongoing DAGMan workflow?

If so, how would it be done?


Thanks!


Martin