
[Condor-users] aborting DAG because of bad event



This doesn't make much sense to me. I had a large DAG, and I noticed that a lot of jobs had been put on hold, so I ran condor_release on them.
The end of my dagman.out file is below.

It says there was a bad event (I have no idea what the event was),
and then it says it's aborting the DAG, but it also says it's continuing with the DAG.

What just happened?
Thanks
-Peter

01/29 09:05:33 Currently monitoring 1 Condor log file(s)
01/29 09:05:34 Currently monitoring 1 Condor log file(s)
01/29 09:05:34 BAD EVENT: job (377972.0.0) executing, total end count != 0 (1)
01/29 09:05:34 ERROR: aborting DAG because of bad event (BAD EVENT: job (377972.0.0) executing, total end count != 0 (1))
01/29 09:05:34 BAD EVENT: job (377972.0.0) ended, total end count != 1 (2)
01/29 09:05:34 Continuing with DAG in spite of bad event (BAD EVENT: job (377972.0.0) ended, total end count != 1 (2)) because of allow_events setting
01/29 09:05:34 BAD EVENT: job (376465.0.0) executing, total end count != 0 (1)
01/29 09:05:34 ERROR: aborting DAG because of bad event (BAD EVENT: job (376465.0.0) executing, total end count != 0 (1))
01/29 09:05:34 BAD EVENT: job (376465.0.0) ended, total end count != 1 (2)
01/29 09:05:34 Continuing with DAG in spite of bad event (BAD EVENT: job (376465.0.0) ended, total end count != 1 (2)) because of allow_events setting
01/29 09:05:34 Aborting DAG...
01/29 09:05:35 Writing Rescue DAG to ../.dag/2vpw- g10.dag.rescue001.rescue001...
01/29 09:05:35 Removing submitted jobs...
01/29 09:05:35 Removing any/all submitted Condor/Stork jobs...
01/29 09:05:36 Note: 663691422 total job deferrals because of -MaxJobs limit (4000)
01/29 09:05:36 Note: 0 total job deferrals because of -MaxIdle limit (0)
01/29 09:05:36 Note: 0 total job deferrals because of node category throttles
01/29 09:05:36 Note: 0 total PRE script deferrals because of -MaxPre limit (0)
01/29 09:05:36 Note: 0 total POST script deferrals because of -MaxPost limit (0)
01/29 09:05:36 Warning: ReadMultipleUserLogs destructor called, but still monitoring 1 log(s)!
01/29 09:05:36 **** condor_scheduniv_exec.376286.0 (condor_DAGMAN) pid 10470 EXITING WITH STATUS 1