
Re: [HTCondor-users] Behaviour of DAGMAN_ALWAYS_RUN_POST in absence of PRE



Brian Candler wrote:

 
> On 03/04/2017 19:02, Dimitri Maziuk wrote:
>> I wonder: in what scenario a post script that starts with
>>
>> #!/bin/sh
>> if [ $1 -ne 0 ] ; then exit $1 ; fi
>>
>> would cause problems?

> Only when you forget to do it.

> We recently had a problem when a broken dataset ended up getting
> deployed. It was controlled by a top-level dag with subdags. After
> grubbing through various condor log files, it turns out it was due to
> one of the inner dags failing, but the top level DAG had POST scripts to
> notify progress, and they weren't handling $RETURN properly.

> So I was just wondering if it was possible to idiot-proof this.

I like the idea of handling this with a single line in your DAG file, for example:

  RUN_POST_ON_JOB_FAIL ALL_NODES false

(On the other hand, doing it in configuration rather than with a DAG command would make it easier to apply across splices and sub-DAGs, but then you'd have no way to set it on a per-node basis.)
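In the meantime, the guard Dimitri describes can be written out a bit more defensively. What follows is only a sketch, not a tested recipe: it assumes the DAG file passes the job's exit status as the first argument (for example `SCRIPT POST NodeA post.sh $RETURN`), and the names `post.sh`, `NodeA`, `post_guard`, `notify_progress` and `run_post` are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical POST script (post.sh). Assumed DAG file wiring:
#   SCRIPT POST NodeA post.sh $RETURN
# so that $1 carries the node job's exit status.

# Return 0 only if the job status is exactly 0; otherwise report failure.
post_guard() {
    job_status="${1:-1}"    # a missing or empty argument counts as failure
    if [ "$job_status" -eq 0 ] 2>/dev/null; then
        return 0
    fi
    echo "node job failed with status $job_status" >&2
    # Pass the status through when it is a usable exit code (1..255);
    # anything else (negative, non-numeric) falls back to 1.
    if [ "$job_status" -gt 0 ] 2>/dev/null && [ "$job_status" -le 255 ] 2>/dev/null; then
        return "$job_status"
    fi
    return 1
}

# Placeholder for whatever the POST script actually does on success.
notify_progress() {
    echo "node finished OK"
}

# Run the guard first; only notify when the job itself succeeded.
run_post() {
    post_guard "$1" || return $?
    notify_progress
}

# A real script would finish with:
#   run_post "$@"
```

The point of putting the guard in a function is that the "did the job fail?" check happens before any notification logic, so a failed inner DAG cannot be reported as progress just because the POST script itself ran cleanly.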

Kent