Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] Bad event in condor

Date: Fri, 7 Oct 2005 11:59:39 -0500 (CDT)
From: "R. Kent Wenger" <wenger@xxxxxxxxxxx>
Subject: Re: [Condor-users] Bad event in condor

On Thu, 6 Oct 2005, Alexander Dietz wrote:

> I get a 'bad-event' error when running a DAG with Condor version 6.7.10.
> Could this be a bug, or how can I find out what's going on?
>
>
> 10/6 07:45:06 Event: ULOG_JOB_ABORTED for Condor Job 759b6aa466ffc2a17142d1dcba59db92 (800289.0)
> 10/6 07:45:06 EVENT ERROR: job 800289.0.0 ended; total end count != 1 (2)
> 10/6 07:45:06 WARNING: bad event here may indicate a serious bug in Condor -- beware!
> 10/6 07:45:06 Continuing with DAG in spite of bad event (EVENT ERROR: job 800289.0.0 ended; total end count != 1 (2)) because of allow_events setting

Yes, there's a bug.  Is your node job a Grid job?  There is a known
problem where Grid jobs are more likely to write two terminated events
to the log.  This problem has at least been reduced in 6.7.12.

One thing to check -- look at the actual user log of your job, and check
that you just got two terminated events as opposed to the job actually
running twice.  If all that happened is that you got two terminated
events, it won't really hurt anything except that you'll get the above
warning in DAGMan.  Most likely this is what happened.
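
The check described above can be sketched in Python. This is a minimal sketch, not a Condor tool: it assumes each user log event starts with a header line of the form "005 (800289.000.000) 10/06 07:45:06 Job terminated.", and that event numbers 001, 005, and 009 mean executing, terminated, and aborted. The names count_events and diagnose are hypothetical helpers introduced for illustration.

```python
import re
from collections import Counter

# Assumed shape of an event header line in the user log:
#   "005 (800289.000.000) 10/06 07:45:06 Job terminated."
EVENT_HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\)")

# Assumed event numbers: 001 = executing, 005 = terminated, 009 = aborted.
EXECUTE, TERMINATED, ABORTED = 1, 5, 9


def count_events(log_text, cluster, proc=0):
    """Count events per event number for one job (cluster.proc)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = EVENT_HEADER.match(line)
        if m and int(m.group(2)) == cluster and int(m.group(3)) == proc:
            counts[int(m.group(1))] += 1
    return counts


def diagnose(counts):
    """Rough reading of the counts; a hint only, not a definitive verdict."""
    ends = counts[TERMINATED] + counts[ABORTED]
    if ends <= 1:
        return "ok"
    if counts[EXECUTE] > 1:
        # More than one execute event: the job may really have run twice
        # (or it may have been legitimately restarted).
        return "possibly ran more than once"
    return "duplicate end event only"
```

For example, diagnose(count_events(open(path).read(), 800289)) could be run on the node job's user log, where path is whatever log file the submit file names. More than one end event with a single execute event suggests the harmless duplicate-event case.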

Kent Wenger
Condor Team
