
Re: [HTCondor-users] DAGMAN_ABORT_ON_SCARY_SUBMIT



Thanks for your quick reply. You are right, we are using an old version of Condor.


On May 7, 2014, at 2:50 PM, "R. Kent Wenger" <wenger@xxxxxxxxxxx> wrote:

> On Wed, 7 May 2014, Jiande Wang wrote:
> 
>> I submitted several dagman jobs at almost the same time. Each dagman script is like the following:
>> 
>> Job A ......
>> ...
>> I got this error message in the DAGMan output file:
>> 
>> 
>> ERROR: node D1: job ID in userlog submit event (9025.0.0) doesn't match ID reported earlier by submit command (9021.0.0)!  Aborting DAG; set DAGMAN_ABORT_ON_SCARY_SUBMIT to false if you are *sure* this shouldn't cause an abort.
>> 
>> 
>> Is this because Condor cannot handle several "D1" jobs at the same time, even though they belong to different DAGMan instances?
> 
> Are your D1 jobs from the different DAGs using the same log file?  If so, and you're using a version of HTCondor older than 7.9.0, that is your problem.
> 
>> Any suggestions on this?
> 
> Two possibilities:
> 
> 1) Change your submit files so that no node job user log file is shared between more than one DAG that is running at the same time.
> 
> 2) Upgrade to a post-7.9.0 version of HTCondor -- that will make everything work okay as long as your DAG files have different names.
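
For option 1, a minimal sketch of what separate node job logs could look like, assuming each DAG gets its own copy of the D1 submit file. The file names, executable, and log names below are hypothetical and not taken from the actual setup; the only point is that the two "log" values differ, so the D1 nodes of concurrently running DAGs never write to the same user log.

```
# --- d1_dag1.sub (hypothetical): node D1 as referenced by the first DAG ---
executable = d1.sh
log        = dag1_D1.log
queue

# --- d1_dag2.sub (hypothetical): node D1 as referenced by the second DAG ---
executable = d1.sh
log        = dag2_D1.log
queue
```

Each DAG file would then point its D1 node at its own submit file. Running each DAG from its own directory with a relative log path would presumably have the same effect, since the logs would then resolve to different files.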
> 
> Kent Wenger
> CHTC Team