Re: [HTCondor-users] DAGMAN_ABORT_ON_SCARY_SUBMIT



On Wed, 7 May 2014, Jiande Wang wrote:

I submitted several DAGMan jobs at almost the same time. Each DAG file looks like the following:

Job A ......
...
I got this error message in the DAGMan output file:


ERROR: node D1: job ID in userlog submit event (9025.0.0) doesn't match ID reported earlier by submit command (9021.0.0)! Aborting DAG; set DAGMAN_ABORT_ON_SCARY_SUBMIT to false if you are *sure* this shouldn't cause an abort.


Is this because Condor cannot handle several "D1" jobs at the same time, even though they belong to different DAGs?

Are your D1 jobs from the different DAGs using the same log file? If so, and you're using a version of HTCondor older than 7.9.0, that is your problem.

Any suggestions on this?

Two possibilities:

1) Change your submit files so that no node job user log file is shared between more than one DAG that is running at the same time.

2) Upgrade to HTCondor 7.9.0 or later -- that will make everything work okay as long as your DAG files have different names.
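
For option 1, it can help to check which user log files are named by more than one DAG before submitting. What follows is only a minimal sketch in Python, not part of HTCondor: the helper names log_path and shared_logs are hypothetical, and it assumes each node job's submit file sets the log with a plain "log = <path>" line. It does not expand macros such as $(Cluster) or honor initialdir.

```python
import os
import re
from collections import defaultdict

# Matches a submit-file line such as "log = d1.log" (command names are case-insensitive).
LOG_LINE = re.compile(r"^\s*log\s*=\s*(.+?)\s*$", re.IGNORECASE)


def log_path(submit_text, base_dir="."):
    """Return the normalized user log path named in a submit description, or None."""
    path = None
    for line in submit_text.splitlines():
        match = LOG_LINE.match(line)
        if match:
            path = match.group(1)  # assumption: the last assignment wins
    if path is None:
        return None
    # Resolve relative paths against the directory the job is submitted from.
    return os.path.normpath(os.path.join(base_dir, path))


def shared_logs(dags):
    """Find log files used by node jobs of more than one DAG.

    dags maps a DAG name to a list of (submit_text, base_dir) pairs.
    Returns {log_path: sorted list of DAG names} for every shared log.
    """
    users = defaultdict(set)
    for dag_name, submits in dags.items():
        for text, base_dir in submits:
            path = log_path(text, base_dir)
            if path is not None:
                users[path].add(dag_name)
    return {p: sorted(names) for p, names in users.items() if len(names) > 1}
```

Any path this reports is a candidate for renaming, for example by putting the DAG name in the log file name so that each DAG's D1 job writes to its own log.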

Kent Wenger
CHTC Team