Re: [Condor-users] Dagman error with 6.9.1 on windows
- Date: Tue, 16 Jan 2007 13:10:10 +0100
- From: Horvátth Szabolcs <szabolcs@xxxxxxxxxxxxx>
- Subject: Re: [Condor-users] Dagman error with 6.9.1 on windows
Hi Dan,
Sorry for the delay. In the meantime, Kent told me that the error code
indicates a "DLL not initialized" error on Windows, so it's kind of an
ethereal problem.
The expressions in the submit file are right, and the same job
submitted after a restart usually works ok.
(As I wrote, I had already seen and logged this kind of error, but before
6.9.1 it did not crash the schedd.)
Anyway, thanks for fixing the bug and taking the time to investigate.
Cheers,
Szabolcs
Dan Bradley wrote:
Szabolcs,
I investigated your report and found a bug in 6.9.1. I'm very sorry
about that!
I have yet to identify the full effects of this bug, but it certainly
strikes in the case you found, where OnExitRemove evaluates to
UNDEFINED, and also when OnExitHold evaluates to UNDEFINED.
The bug is fixed for 6.9.2.
Now the question is why your OnExitRemove expression is evaluating to
UNDEFINED. I assume your DAG's condor.sub file contains the usual expression:
on_exit_remove = ( ExitSignal == 11 || (ExitCode >=0 && ExitCode <= 2))
Unfortunately, I can't answer that myself by looking at your report,
because the log message is not reliable when it claims the OnExitRemove
expression was never set. I've fixed that too for the next release.
What I observe about this expression is that it evaluates to undefined
when ExitSignal is undefined and (ExitCode < 0 || ExitCode > 2). I
really doubt that is intended. I'll find out and get this expression
fixed if it is indeed broken.
--Dan
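
For readers following the thread, the behaviour Dan describes can be reproduced with a small three-valued logic model. This is only a sketch in Python, not Condor code: it assumes that a comparison against UNDEFINED yields UNDEFINED, that TRUE || UNDEFINED is TRUE, and that FALSE || UNDEFINED is UNDEFINED. The helper names (eq, ge, le, and3, or3, on_exit_remove) are made up for illustration, and None stands in for UNDEFINED.

```python
# Sketch of three-valued evaluation; None is a stand-in for UNDEFINED.
# Assumption: this mirrors ClassAd semantics only for the operators used here.
UNDEFINED = None


def eq(a, b):
    """a == b, or UNDEFINED if either operand is UNDEFINED."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return a == b


def ge(a, b):
    """a >= b, or UNDEFINED if either operand is UNDEFINED."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return a >= b


def le(a, b):
    """a <= b, or UNDEFINED if either operand is UNDEFINED."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return a <= b


def and3(a, b):
    """Logical AND: FALSE wins over UNDEFINED."""
    if a is False or b is False:
        return False
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return True


def or3(a, b):
    """Logical OR: TRUE wins over UNDEFINED."""
    if a is True or b is True:
        return True
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return False


def on_exit_remove(exit_signal, exit_code):
    """Model of: ExitSignal == 11 || (ExitCode >= 0 && ExitCode <= 2)."""
    return or3(eq(exit_signal, 11),
               and3(ge(exit_code, 0), le(exit_code, 2)))


if __name__ == "__main__":
    # Normal exit (no signal) with an exit code inside and outside 0..2.
    print(on_exit_remove(UNDEFINED, 1))
    print(on_exit_remove(UNDEFINED, 3))
```

Under these assumptions, a job that exits normally (ExitSignal undefined) with an exit code outside 0..2 makes the whole expression UNDEFINED rather than FALSE, which is the case that triggered the 6.9.1 bug.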