Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] Dagman "BAD EVENT" problems on Windows

Date: Tue, 17 Jan 2012 17:23:41 -0600 (CST)
From: "R. Kent Wenger" <wenger@xxxxxxxxxxx>
Subject: Re: [Condor-users] Dagman "BAD EVENT" problems on Windows

On Tue, 17 Jan 2012, Rowe, Thomas wrote:

I'm running Condor Stable on Windows. A couple times I've seen my big DAGs die with incomprehensible "BAD EVENT" stuff. The dagman.out log below seems to indicate 5886 exits successfully, but then an unexpected ULOG_EXECUTING event happens for no clear reason?
There are a bunch of these "bad event" messages scattered throughout the log alongside "Continuing with DAG in spite of bad event". But then suddenly "Aborting DAG" happens and everything gets condor_rm'ed. I can't figure out what the proximate issue to the "Aborting DAG" message is.

In general DAGMan doesn't like to see any events for a job after the TERMINATED event. I'll have to look at the code -- it may be that, even after a bad event, DAGMan completes a "cycle" of reading events, so that may be why the abort happens some time after the bad events.

At any rate, you should be able to avoid the DAG aborting by setting the DAGMAN_ALLOW_EVENTS configuration parameter appropriately (see
http://research.cs.wisc.edu/condor/manual/v7.7/3_3Configuration.html#sec:DAGMan-Config-File-Entries).
If you set it to 1, I think that should avoid the DAG aborts in your case.

You can set DAGMAN_ALLOW_EVENTS with a DAG configuration file (see
http://research.cs.wisc.edu/condor/manual/v7.7/2_10DAGMan_Applications.html#SECTION003106500000000000000)
or by setting the environment variable _CONDOR_DAGMAN_ALLOW_EVENTS in
the shell in which you run condor_submit_dag.
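
As a rough illustration of the two options above, here is a minimal Python sketch. The helper names (write_dag_config, env_with_allow_events) and the file name in the usage note are made up for this example; only the DAGMAN_ALLOW_EVENTS setting and the _CONDOR_DAGMAN_ALLOW_EVENTS environment variable come from the message itself. The value 1 is the one suggested above, and it may not be the right choice for every DAG.

```python
import os
from pathlib import Path


def write_dag_config(path, allow_events=1):
    """Write a one-line DAGMan configuration file (hypothetical helper).

    The file holds a single 'NAME = value' setting, which is the usual
    Condor configuration syntax.
    """
    path = Path(path)
    path.write_text(f"DAGMAN_ALLOW_EVENTS = {allow_events}\n")
    return path


def env_with_allow_events(allow_events=1, base_env=None):
    """Return a copy of the environment with the override variable set.

    Condor reads environment variables prefixed with _CONDOR_ as
    configuration overrides, so this mirrors setting the variable in
    the shell before running condor_submit_dag.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["_CONDOR_DAGMAN_ALLOW_EVENTS"] = str(allow_events)
    return env
```

To use the environment-variable route, the returned dictionary could be passed as env= to subprocess.run(["condor_submit_dag", "my.dag"], ...), where my.dag is a placeholder name. For the configuration-file route, the manual section linked above describes how to point a DAG at the file that write_dag_config produces.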

I'm curious what's going on though, because at least some of your bad events happened on DAG-level NOOP jobs, which seems really weird. Can you send me a copy of your dag file and your dagman.out file? I'd like to take a look at them to try to figure out what is going on, rather than just working around the problem with the DAGMAN_ALLOW_EVENTS setting.


Kent Wenger
Condor Team

References:
- [Condor-users] Dagman "BAD EVENT" problems on Windows
  - From: Rowe, Thomas
