Re: [Condor-users] Bad event error in condor DAG



On Thu, 25 Aug 2005, Alexander Dietz wrote:

> I ran a DAG on a cluster with Fedora Core 3 (Heidelberg) and using
> condor version 6.7.10, but I always get a bad event error:
>
> 8/23 09:14:29 EVENT ERROR: job 1782278.0.0 ended; total end count != 1 (2)
> 8/23 09:14:29 WARNING: bad event here may indicate a serious bug in
> Condor -- beware!
> 8/23 09:14:29 Continuing with DAG in spite of bad event (EVENT ERROR:
> job 1782278.0.0 ended; total end count != 1 (2)) because of allow_events
> setting

I took a look at your dagman.out file, and I now know what the problem
is.  Sometimes, when a node job aborts, Condor writes both a terminated
and an aborted event to the job log, so DAGMan counts two end events for
that job (the "(2)" in the error message).  This is a bug in Condor.

In this case, it isn't actually hurting anything, so you can ignore the
warnings.  (If you see a ULOG_JOB_TERMINATED event followed immediately
by a ULOG_JOB_ABORTED for the same job, don't worry about the warnings.)
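
If you want to check a log for this pattern mechanically, here is a
rough sketch in Python.  It assumes you have already reduced the job
log to an ordered list of (job_id, event_name) pairs; that input form
and the function name benign_double_end_jobs are made up for
illustration and are not part of Condor.  Only the two event names
come from the explanation above.

```python
from typing import Iterable, List, Tuple

TERMINATED = "ULOG_JOB_TERMINATED"
ABORTED = "ULOG_JOB_ABORTED"


def benign_double_end_jobs(events: Iterable[Tuple[str, str]]) -> List[str]:
    """Return job ids whose terminated event is immediately followed
    by an aborted event for the same job.

    `events` is a hypothetical pre-parsed form of the job log:
    (job_id, event_name) pairs in the order they appear in the log.
    """
    benign = []
    previous = None  # the (job_id, event_name) pair seen just before
    for job_id, name in events:
        if previous == (job_id, TERMINATED) and name == ABORTED:
            benign.append(job_id)
        previous = (job_id, name)
    return benign


if __name__ == "__main__":
    sample = [
        ("1782278.0.0", TERMINATED),
        ("1782278.0.0", ABORTED),
    ]
    print(benign_double_end_jobs(sample))
```

Any job this reports matches the harmless case described above; a
double end count for a job it does not report is worth a closer look.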

Kent Wenger
Condor Team