Re: [condor-users] Calling condor_submit_dag recursively
Hi Alain,

Thanks for your reply.

Alain Roy wrote:

> > I'm trying to call condor_submit_dag recursively from within a DAGMan POST script, but without much luck.
>
> I would think that would work, but I'm not aware that we've tried it.
>
> But it makes me wonder: why do you want to submit a DAG from a POST script? Shouldn't it be another node in the DAG?
>
> The purpose of a POST script is to do simple post-processing and to decide if the job that is in the node succeeded or failed. In this case, your DAG node will be considered to have succeeded if condor_submit_dag returns 0. Is that the semantics you want?

The point is that I don't know in advance how many nodes I'll need, so recursion via POST scripts seemed ideal to me.

> > The initial submission works fine, but when the POST script tries to kick off another dag (either by passing it to a "system" call, or by forking and exec'ing a new process), nothing happens. The job and dagman log files report no errors.
>
> Did you get any output from condor_submit_dag? Did it return an error code?
>
> It's possible that the environment was set up so that it couldn't find condor_submit_dag, and it just failed. Could that be what happened?
>
> -alain

As it turned out, I believe it was a subtle (to me, anyway) file contention problem: the newly submitted DAG would try to open and write to its DAG log files while the old DAGMan was still using them. What solved the problem was forking off a new process from the POST script and having it sleep for a few seconds before submitting the new DAG. I must admit, it took me quite a while to stumble on that!

Cheers,

Mark
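The workaround Mark describes (fork from the POST script, sleep, then submit the next DAG) might look like the following Unix-only Python sketch. It is an illustration under stated assumptions, not Mark's actual script: the function name `submit_later`, the 5-second default delay, and the detaching steps (`os.setsid`, redirecting stdio to `/dev/null`) are hypothetical choices. A fixed sleep only makes the log-file clash less likely; it does not guarantee the old DAGMan has released its files.

```python
import os
import time


def submit_later(dag_file, delay_s=5.0, submit_cmd="condor_submit_dag"):
    """Fork a detached child that sleeps, then execs `submit_cmd dag_file`.

    Hypothetical helper. Returns the child's PID to the caller (the POST
    script), which can then exit right away so DAGMan sees the node finish.
    """
    pid = os.fork()
    if pid > 0:
        # Parent (the POST script): do not wait for the child.
        return pid

    # Child: must never return into the caller's code, whatever happens.
    try:
        # Start a new session so the child is detached from the
        # POST script's process group.
        os.setsid()
        # Point stdin/stdout/stderr at /dev/null so the child does not keep
        # the POST script's original descriptors open.
        devnull = os.open(os.devnull, os.O_RDWR)
        for fd in (0, 1, 2):
            os.dup2(devnull, fd)
        if devnull > 2:
            os.close(devnull)
        # Assumed delay: give the old DAGMan time to let go of its log files.
        time.sleep(delay_s)
        # Replace the child with the submit command (PATH is searched).
        os.execvp(submit_cmd, [submit_cmd, dag_file])
    finally:
        # Only reached if something above failed, e.g. command not found.
        os._exit(127)
```

In a POST script one would call something like `submit_later("next.dag")` (a hypothetical file name) and then exit with status 0, since DAGMan uses the POST script's exit code to decide whether the node succeeded. Because the parent returns immediately, the POST script can finish while the child waits out the delay.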