
Re: [Condor-users] About DAGMan Model



On Thu, 30 Mar 2006, Xuke wrote:

>     I'm new to Condor. I understand that DAGMan forbids cyclic
> dependencies between jobs to avoid deadlocks. However, since I notice that
> DAGMan supports the keyword "Retry" to repeatedly execute a failed
> job, I'm wondering whether it is possible in DAGMan to specify the
> looped execution of another DAG a predefined number of times (just
> like the "for" loop in many common programming languages)?

Well, the problem with using retry for looping is that the retry only
happens if your job fails, and if you run out of retries the node is
considered failed.  So that probably won't do what you want.  I guess
you might be able to achieve what you want by using a POST script that
counted the number of tries somehow, exited nonzero (so that DAGMan
retries the node) until the loop had run the required number of times,
and then exited with 0.  Of course, you'd have to set
the number of retries for that node to be greater than or equal to the
number of times you wanted to actually execute the lower-level DAG.
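A minimal sketch of such a counting POST script, written in Python purely for illustration. The script name count_loop.py, the counter file loop.count, the node name Loop and the submit file inner.condor.sub are all hypothetical; the only idea taken from the above is that the script exits nonzero (so DAGMan treats the node as failed and retries it) until the counter reaches the requested number of passes, and then exits 0.

```python
#!/usr/bin/env python3
"""Hypothetical DAGMan POST script that turns RETRY into a counted loop.

Assumed DAG file lines (all names are illustrative only):
    JOB Loop inner.condor.sub
    SCRIPT POST Loop count_loop.py loop.count 5
    RETRY Loop 5
"""
import sys
from pathlib import Path


def record_pass(done_so_far: int, iterations: int) -> tuple[int, int]:
    """Count one more pass; return (new count, exit status for DAGMan).

    A nonzero status makes DAGMan consider the node failed and retry it;
    0 marks the node successful, which ends the loop.
    """
    done = done_so_far + 1
    return done, (0 if done >= iterations else 1)


def main(argv: list[str]) -> int:
    counter = Path(argv[1])    # file that persists the count between retries
    iterations = int(argv[2])  # how many times the node should run in total
    done_so_far = int(counter.read_text()) if counter.exists() else 0
    done, status = record_pass(done_so_far, iterations)
    counter.write_text(str(done))
    return status


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Note that this sketch ignores the exit status of the job itself, so a genuine failure would be counted as a completed pass, and the counter file has to be deleted before the DAG is run again.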

How many times do you need to loop?  Would it be possible to "unroll"
the loop?  That would be a much cleaner way to do this.
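Unrolling is also easy to automate. The Python sketch below (illustrative only) writes out a DAG with one JOB line per iteration, chained in sequence by PARENT/CHILD lines. The node names IterN and the submit file names are hypothetical; putting "{i}" in the template is one way to give each iteration its own submit file if the copies need to be kept apart.

```python
def unrolled_dag(iterations: int, submit_template: str) -> str:
    """Return DAG file text that runs the loop body `iterations` times in sequence.

    `submit_template` may contain "{i}" to give each iteration its own
    submit file (e.g. "iter{i}.sub"); otherwise every node shares one file.
    """
    jobs = [f"JOB Iter{i} {submit_template.format(i=i)}" for i in range(iterations)]
    edges = [f"PARENT Iter{i} CHILD Iter{i + 1}" for i in range(iterations - 1)]
    return "\n".join(jobs + edges) + "\n"


if __name__ == "__main__":
    # Hypothetical usage: a five-pass loop over one shared submit file.
    print(unrolled_dag(5, "loop_body.sub"), end="")
```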

>     Also, how can I use DAGMan to specify an alternative choice between two
> jobs for execution? I think I read a paper that says DAGMan supports the
> specification of sequential, parallel and alternative relations. But
> after reading the Condor manual's section on DAGMan, I'm a little
> confused about how to specify the alternative relation with DAGMan.

Right now about the only way to do this would be to have a POST script
of one node do something like overwrite the submit file for a subsequent
node.  Note that if you do this, some submit file for every node must
exist at the time you submit the DAG; also, if you change a submit file
on the fly, the new version should have the same log file as the old
version.
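As a rough illustration of the submit-file-swapping trick, here is a hypothetical Python POST script. It assumes two prepared submit files, branch_a.sub and branch_b.sub, that differ only in what they run and name the same log file as the placeholder next.sub listed on the downstream node's JOB line. It also assumes the finished job's exit code is passed in via the $RETURN macro; check that your DAGMan version supports it.

```python
#!/usr/bin/env python3
"""Hypothetical POST script selecting which of two jobs runs next.

Assumed DAG file lines (all names are illustrative only):
    JOB Decide decide.sub
    JOB Next next.sub
    SCRIPT POST Decide choose_branch.py $RETURN next.sub
    PARENT Decide CHILD Next
"""
import shutil
import sys


def choose_branch(job_status: int) -> str:
    """Map the deciding job's exit code to the submit file to run next."""
    return "branch_a.sub" if job_status == 0 else "branch_b.sub"


def main(argv: list[str]) -> int:
    job_status = int(argv[1])  # exit code of the finished job (via $RETURN)
    target = argv[2]           # placeholder submit file of the downstream node
    # Both branch files must use the same "log =" file as the placeholder,
    # because the node's log file should not change on the fly.
    shutil.copyfile(choose_branch(job_status), target)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Because the script always exits 0, a nonzero exit from the deciding job selects branch B rather than failing the DAG.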

Kent Wenger
Condor Team