
Re: [HTCondor-users] Tracking DAGMan jobs



> Wrap it in a nested dag and this should be pretty easy: The toplevel DAG will handle all the messy details.
> 
> subdag external mydag the-original-dag.dag
> script post mydag post.script
> script pre mydag pre.script
> 
> Nathan Panike

I was going to wrap it that way anyway, as a clean way to add a FINAL node, but the problem of finding the clusterID remains.

According to the documentation (and confirmed by experiment), $JOBID can be used as an argument to a POST script, but not to a PRE script, since the PRE script runs before the node's job has been submitted.

Let me try to be a bit clearer about my requirements.

* I want to insert a record in a database when the DAG starts, and update it when the DAG ends.
* I want to include the clusterID of the DAGMan process in the database row, so that, for example, someone can manually "condor_rm" it or otherwise examine its status.
* I would prefer to use the clusterID as the key when updating the row, to avoid having to allocate an additional unique ID and pass it to the POST script.

So if I insert the database row in the PRE script, that script still needs some way to find the DAG's own clusterID.

Now, testing with

$ cat testwrap.dag
SUBDAG EXTERNAL mydag test.dag
SCRIPT PRE mydag do_final.sh 0 $JOB
SCRIPT POST mydag do_final.sh $RETURN $JOB $JOBID

it looks like both the PRE and POST scripts have CONDOR_ID in their environment, so I guess I can use that (undocumented) feature.
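For what it's worth, here is a minimal sketch of what do_final.sh could look like for the testwrap.dag above. It assumes, based only on the experiment, that CONDOR_ID holds the outer DAGMan job's ID in the form cluster.proc, and it tells PRE from POST by whether a third argument ($JOBID) was passed. The flat file named by DB_FILE is a hypothetical stand-in for the real database, and the function names are my own.

```shell
#!/bin/sh
# do_final.sh: hypothetical sketch of the wrapper DAG's PRE/POST script.
# Called from testwrap.dag as:
#   PRE:  do_final.sh 0 $JOB
#   POST: do_final.sh $RETURN $JOB $JOBID
# Assumption: CONDOR_ID holds the outer DAGMan job's "cluster.proc"
# (observed in testing, not documented).

# Print the cluster part of a "cluster.proc" job ID, e.g. 1234.0 -> 1234.
cluster_from_condor_id() {
    printf '%s\n' "${1%%.*}"
}

main() {
    if [ "$#" -lt 2 ]; then
        echo "usage: do_final.sh STATUS NODE [JOBID]" >&2
        return 2
    fi
    status="$1"
    node="$2"
    jobid="${3:-}"        # only the POST script passes $JOBID

    if [ -z "${CONDOR_ID:-}" ]; then
        echo "do_final.sh: CONDOR_ID is not set" >&2
        return 1
    fi
    cluster=$(cluster_from_condor_id "$CONDOR_ID")

    # Stand-in for the database: append one line per event to a flat file.
    # DB_FILE is a hypothetical name; swap these lines for real SQL calls.
    db="${DB_FILE:-dag_runs.txt}"
    if [ -z "$jobid" ]; then
        # PRE: insert the row, keyed by the DAGMan cluster ID.
        echo "INSERT cluster=$cluster node=$node state=running" >> "$db"
    else
        # POST: update the row with the same key, recording the result.
        echo "UPDATE cluster=$cluster node=$node jobid=$jobid return=$status" >> "$db"
    fi

    # The POST script's exit code decides whether the node succeeded,
    # so pass through success (0) or failure (1) of the wrapped DAG.
    [ "$status" = "0" ]
}

# The script's exit status is that of its last command, i.e. main's.
main "$@"
```

To try it by hand outside DAGMan, something like `CONDOR_ID=1234.0 ./do_final.sh 0 mydag` should append an INSERT line to dag_runs.txt.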

One slightly messy thing I noticed about wrapping the DAG as a SUBDAG EXTERNAL is that if it fails, we get two rescue DAGs: one for the sub-DAG and one for the outer DAG.
SPLICE doesn't have this issue (a splice is merged into the parent DAG, so only one DAGMan runs), but you can't attach PRE/POST scripts to a SPLICE. You can, however, attach them to a FINAL node:

SPLICE mydag test.dag
FINAL final_node /dev/null NOOP
SCRIPT PRE final_node do_final.sh $DAG_STATUS $FAILED_COUNT
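A similarly hedged sketch of the FINAL-node variant, which receives $DAG_STATUS and $FAILED_COUNT instead. It again assumes CONDOR_ID is present (I have only checked that for the SUBDAG wrapper), and DB_FILE and the helper names are hypothetical. It returns non-zero when the DAG failed because, as I read the manual, the FINAL node's result determines the overall DAG result, so a PRE script that always succeeded could make a failed DAG look successful.

```shell
#!/bin/sh
# Hypothetical PRE script for the FINAL node in the SPLICE variant:
#   SCRIPT PRE final_node do_final.sh $DAG_STATUS $FAILED_COUNT
# Treated as success only if DAG_STATUS is 0 and no nodes failed.

# Print a one-word outcome for the row update.
dag_outcome() {
    if [ "$1" = "0" ] && [ "$2" = "0" ]; then
        echo "succeeded"
    else
        echo "failed"
    fi
}

main() {
    if [ "$#" -lt 2 ]; then
        echo "usage: do_final.sh DAG_STATUS FAILED_COUNT" >&2
        return 2
    fi
    # Assumes CONDOR_ID ("cluster.proc") is set, as seen for PRE/POST.
    cluster="${CONDOR_ID%%.*}"
    outcome=$(dag_outcome "$1" "$2")

    # Stand-in for the SQL UPDATE keyed by the DAGMan cluster ID.
    echo "UPDATE cluster=${cluster:-unknown} state=$outcome failed_nodes=$2" \
        >> "${DB_FILE:-dag_runs.txt}"

    # Fail the FINAL node when the DAG failed, so the failure is not masked.
    [ "$outcome" = "succeeded" ]
}

main "$@"
```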

Regards,

Brian.