
[Condor-users] Recursive DAGman



Without going into too much detail at this stage, we have a user who is
trying to get a recursive DAGman to work.
Can anyone point me at examples or advice on this?

At the end of the DAG script he does a convergence test and, if
necessary, re-submits the DAG with updated files.
At first this failed because DAGman thought it was the same job and
the lock file stopped it running.
He 'fixed' this by renaming the recursive DAGman script.
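
To make the pattern concrete, the final step does roughly the following.
This is only a sketch of the idea, not his actual script: the function
names, the loop.dag file name, the tolerance and the per-round naming
scheme are all hypothetical, and it assumes the lock file is tied to the
DAG file name, which is why each round gets a fresh name. It illustrates
the resubmission pattern only; it is not a fix for the recovery-mode loop.

```python
import shutil
import subprocess
from pathlib import Path


def converged(previous: float, current: float, tol: float = 1e-6) -> bool:
    """Return True when successive results differ by no more than tol."""
    return abs(current - previous) <= tol


def next_dag_path(template: Path, iteration: int) -> Path:
    """Give each round its own DAG file name, e.g. loop.dag -> loop.3.dag."""
    return template.with_name(f"{template.stem}.{iteration}{template.suffix}")


def submit_command(dag_file: Path) -> list[str]:
    """Build the condor_submit_dag command line for one round."""
    return ["condor_submit_dag", str(dag_file)]


def resubmit_if_needed(template: Path, iteration: int,
                       previous: float, current: float) -> bool:
    """Copy the DAG to a fresh name and submit it, unless converged."""
    if converged(previous, current):
        return False
    dag_file = next_dag_path(template, iteration + 1)
    shutil.copyfile(template, dag_file)  # new name, hence a new lock file
    subprocess.run(submit_command(dag_file), check=True)
    return True
```

Here resubmit_if_needed(Path("loop.dag"), 2, old, new) would submit
loop.3.dag if the two values differ by more than the tolerance, and do
nothing otherwise.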

This is the comment I got from him:
"So we can see, it has some issue that the parent process (I'm not
actually sure whether this is the parent dagman process or the parent
script) exits, causing the newly launched dagman process to get signal 3
and thus enter recovery mode. It does this infinitely, never escaping from
this loop until I removed the dagman process from the queue using
condor_rm."

I can supply more details if anyone can help; I've also asked him to
create a bare-bones example of the problem
(the real one is quite hairy/messy).

Thanks all
-Ian


-- 
Ian Cottam
IT Services -- supporting research
Faculty of Engineering and Physical Sciences
The University of Manchester
"The only strategy that is guaranteed to fail is not taking risks." Mark
Zuckerberg