
Re: [HTCondor-users] DAGs and job ID mismatch



On Thu, 24 Jan 2013, Michael O'Donnell wrote:

I have submitted 5 different DAGs on the same submit machine. Some of these
DAGs finished with failed nodes and a rescue file was generated. I then
resubmitted the DAGs with failures, but no jobs run. In the DAG log I am
told that the job ID in the userlog does not match the previously reported
ID:

ERROR: node j806: job ID in userlog submit event (917.0.0) doesn't match ID
reported earlier by submit command (1099.0.0)!  Aborting DAG; set
DAGMAN_ABORT_ON_SCARY_SUBMIT to false if you are *sure* this shouldn't
cause an abort.

Is it possible that there is any overlap of node job log files between
DAGs? (In other words, a node job from DAG 1 and a node job from DAG 2 use
the same log file.) If that's the case, it will cause problems like this...

But seeing the dagman.out files from all 5 of the DAGs would be useful.
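
One way to check for the log-file overlap described above is to scan each
DAG's node submit files and report any log path used by more than one DAG.
The following is only a rough sketch, not an HTCondor tool: the function
names are made up for illustration, it assumes plain "JOB <name> <submit
file>" lines and a literal "log = <path>" line in each submit file (no
macros, no DIR keyword, no initialdir), and it assumes relative paths are
resolved against the directory holding the DAG file.

```python
import os
import re
from collections import defaultdict

# Hypothetical helpers; they only understand the simplest DAG/submit syntax.
JOB_RE = re.compile(r"^\s*JOB\s+(\S+)\s+(\S+)", re.IGNORECASE)
LOG_RE = re.compile(r"^\s*log\s*=\s*(.+?)\s*$", re.IGNORECASE)


def node_submit_files(dag_text):
    """Return {node_name: submit_file} from the JOB lines of a DAG file."""
    nodes = {}
    for line in dag_text.splitlines():
        m = JOB_RE.match(line)
        if m:
            nodes[m.group(1)] = m.group(2)
    return nodes


def log_path(submit_text):
    """Return the value of the last 'log = ...' line, or None if absent."""
    path = None
    for line in submit_text.splitlines():
        m = LOG_RE.match(line)
        if m:
            path = m.group(1)
    return path


def _read_file(path):
    with open(path) as f:
        return f.read()


def shared_logs(dag_paths, read=_read_file):
    """Map each log path used by more than one DAG to the DAGs that use it."""
    users = defaultdict(set)
    for dag in dag_paths:
        # Assumption: relative paths resolve against the DAG file's directory.
        dag_dir = os.path.dirname(os.path.abspath(dag))
        for submit_file in node_submit_files(read(dag)).values():
            log = log_path(read(os.path.join(dag_dir, submit_file)))
            if log is not None:
                users[os.path.normpath(os.path.join(dag_dir, log))].add(dag)
    return {log: dags for log, dags in users.items() if len(dags) > 1}
```

Calling shared_logs() with the paths of all 5 DAG files returns an empty
dict when no node job log is shared between DAGs. Any entry it does return
names a log file worth a closer look.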

Kent Wenger