Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

[Condor-users] Restarting completed dag jobs does not work anymore with 6.8.0

Date: Thu, 17 Aug 2006 16:02:49 +0200
From: Horvátth Szabolcs <szabolcs@xxxxxxxxxxxxx>
Subject: [Condor-users] Restarting completed dag jobs does not work anymore with 6.8.0

Hi,

For quite a while - using the 6.7.x series - we have used a script to restart the parent jobs that a child job depends on, by traversing the hierarchy and restarting (using hold + release) the jobs that are required for the completion of that child job. (Sometimes software license issues, disk problems or data read / write errors can make a task unusable for a while, although restarting after a short amount of time makes it work and lets the whole DAG continue.)

The script restarts the parent jobs, waits for their completion, then modifies the child jobs' data using qedit and restarts the child jobs (hold and release again). This worked fine with 6.7, but with 6.8 I get a DAG error message in the dagman.out file and *all* tasks in the DAGMan job go into the removed state. The reason given is: RemoveReason = "via condor_rm (by user szabolcs)"
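The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the original script: the function names, the `parents` mapping (child job ID to its parent job IDs) and the edited attribute `Args` are all hypothetical. It only builds the `condor_hold`, `condor_release` and `condor_qedit` command lines in the order the post describes, without running them.

```python
from typing import Dict, List


def ancestors_in_run_order(child: str, parents: Dict[str, List[str]]) -> List[str]:
    """Return every job the child depends on, each parent before its children."""
    order: List[str] = []
    seen = set()

    def visit(job: str) -> None:
        for parent in parents.get(job, []):
            if parent not in seen:
                seen.add(parent)
                visit(parent)          # restart grandparents first
                order.append(parent)

    visit(child)
    return order


def restart_commands(job_id: str) -> List[List[str]]:
    """Hold + release, the restart trick described in the post."""
    return [["condor_hold", job_id], ["condor_release", job_id]]


def plan_restart(child: str,
                 parents: Dict[str, List[str]],
                 edits: Dict[str, str]) -> List[List[str]]:
    """Build the full command sequence for restarting one child job."""
    plan: List[List[str]] = []
    for job in ancestors_in_run_order(child, parents):
        plan.extend(restart_commands(job))
    # A real script would wait here until the parent jobs have completed.
    for attr, value in edits.items():
        # condor_qedit takes: job ID, attribute name, attribute value
        plan.append(["condor_qedit", child, attr, value])
    plan.extend(restart_commands(child))
    return plan
```

Each entry in the returned list could be passed to `subprocess.run`. The wait for parent completion is deliberately left out, since the post does not say how the script detects it.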


8/17 15:53:02 BAD EVENT: job (34202.0.0) executing, total end count != 0 (1)

8/17 15:53:02 ERROR: aborting DAG because of bad event (BAD EVENT: job (34202.0.0) executing, total end count != 0 (1))

8/17 15:53:02 Aborting DAG...
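To find which jobs triggered the abort, the dagman.out lines can be scanned for the BAD EVENT marker. The sketch below is only an illustration: the regular expression is inferred from the three log lines quoted above, not from any documented log format, and `find_bad_events` is a hypothetical helper name.

```python
import re
from typing import Iterable, List, Tuple

# Matches lines of the form "<date> <time> BAD EVENT: job (<id>) <description>".
# The pattern is an assumption based on the log excerpt in this post.
BAD_EVENT = re.compile(r"^(\S+ \S+) BAD EVENT: job\s*\((\d+\.\d+\.\d+)\)\s+(.*)$")


def find_bad_events(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, job ID, description) for each BAD EVENT line."""
    hits: List[Tuple[str, str, str]] = []
    for line in lines:
        match = BAD_EVENT.match(line.strip())
        if match:
            hits.append((match.group(1), match.group(2), match.group(3)))
    return hits
```

Only lines that start with the BAD EVENT marker right after the timestamp are reported, so the follow-up "ERROR: aborting DAG" line that repeats the same message is not counted twice.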

Now this is not really good for me. Could you tell me what happens under the hood? How can I avoid it and get my script working, or simply disable this "error" checking?

Thanks in advance!

Cheers,
Szabolcs

Follow-Ups:
- Re: [Condor-users] Restarting completed dag jobs does not work anymore with 6.8.0
  - From: Peter F. Couvares
- Re: [Condor-users] Restarting completed dag jobs does not work anymore with 6.8.0
  - From: R. Kent Wenger
