[HTCondor-users] DAGMAN Workflow Assertion ERROR



I recently upgraded my HTCondor cluster from 8.6.12 to 9.10.0. I have a DAG file, test.dag, that looks like this:

```
JOB A test.sub DONE
JOB B test.sub
JOB C test.sub
JOB D test.sub
PARENT A CHILD B C
PARENT B C CHILD D

SCRIPT PRE A pre.sh
SCRIPT POST A post.sh
```

With version 8.6.12, running `condor_submit_dag test.dag` would execute just nodes B, C, and D.

With version 9.10.0, however, the entire DAG is stuck idle. Looking at `test.dag.dagman.out`, I see `ERROR "Assertion ERROR on (GetStatus() != STATUS_DONE)" at line 749 in file ./src/condor_dagman/`.

If I remove `DONE` from the first JOB in `test.dag`, everything runs fine.

The documentation says "Users should generally not use the DONE keyword." and to use NOOP instead (https://htcondor.readthedocs.io/en/latest/users-manual/dagman-workflows.html#job). But I don't see anything about the behavior of `DONE` changing between these two versions. And since DAGMan still uses it, I wouldn't think that using it would result in the Assertion ERROR being thrown.

I don't want to use NOOP because I don't want the PRE and POST scripts to run, and I don't want to have to manually comment out all of the PRE and POST scripts.
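
For reference, the manual commenting-out could in principle be scripted. The following is only a minimal sketch in Python, assuming the simple `SCRIPT PRE|POST <node> <executable>` form used in `test.dag` and that `DONE` appears after the submit file name on the `JOB` line. The function name `comment_out_scripts_for_done_nodes` is hypothetical and not part of HTCondor.

```python
def comment_out_scripts_for_done_nodes(dag_text: str) -> str:
    """Return dag_text with SCRIPT PRE/POST lines commented out
    for every node whose JOB line carries the DONE keyword."""
    lines = dag_text.splitlines()

    # First pass: collect the names of nodes marked DONE.
    # Assumed layout: JOB <node> <submit file> [options...] DONE
    done_nodes = set()
    for line in lines:
        tokens = line.split()
        if (len(tokens) >= 4
                and tokens[0].upper() == "JOB"
                and "DONE" in (t.upper() for t in tokens[3:])):
            done_nodes.add(tokens[1])

    # Second pass: prefix matching SCRIPT lines with '#'.
    # Assumed layout: SCRIPT PRE|POST <node> <executable> [args...]
    out = []
    for line in lines:
        tokens = line.split()
        if (len(tokens) >= 4
                and tokens[0].upper() == "SCRIPT"
                and tokens[1].upper() in ("PRE", "POST")
                and tokens[2] in done_nodes):
            out.append("# " + line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Write a modified copy and leave the original DAG file untouched.
    with open("test.dag") as src:
        modified = comment_out_scripts_for_done_nodes(src.read())
    with open("test.noscripts.dag", "w") as dst:
        dst.write(modified)
```

This only rewrites the DAG file. It leaves `DONE` in place, and I have not verified whether it avoids the assertion in 9.10.0.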

Is there a way to get `test.dag` to run when JOB A is marked as DONE?

Thanks,

Curtis