
Re: [Condor-users] Dagman error with 6.9.1 on windows



Hi Dan,

Sorry for the delay. In the meantime, Kent told me that the error code
indicates a "DLL not initialized" error on Windows, so it's kind of an ethereal problem.
The expressions in the submit file are right, and the same job
submitted after a restart usually works OK.
(As I wrote, I had already seen and logged this kind of error, but before 6.9.1 it did not crash the schedd.)

Anyway, thanks for fixing the bug and taking the time to investigate.

Cheers,
Szabolcs

Dan Bradley wrote:
Szabolcs,

I investigated your report and found a bug in 6.9.1. I'm very sorry about that!

I have yet to identify the full effects of this bug, but it certainly strikes in the case you found, where OnExitRemove evaluates to UNDEFINED, and also when OnExitHold evaluates to UNDEFINED.

The bug is fixed for 6.9.2.

Now the question is why your OnExitRemove expression is evaluating to UNDEFINED. I assume your dag condor.sub file contains the usual expression:

on_exit_remove = ( ExitSignal == 11 || (ExitCode >= 0 && ExitCode <= 2))

Unfortunately, I can't answer that myself by looking at your report, because the log message is not reliable when it claims the OnExitRemove expression was never set. I've fixed that too for the next release.

What I observe about this expression is that it evaluates to undefined when ExitSignal is undefined and (ExitCode < 0 || ExitCode > 2). I really doubt that is intended. I'll find out and get this expression fixed if it is indeed broken.
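
To make that evaluation concrete, here is a minimal Python sketch of the three-valued logic involved. It is only an illustration, not Condor code: None stands in for UNDEFINED, and the helper names (eq, ge, le, and3, or3, on_exit_remove) are made up for this example. It assumes the usual ClassAd rules, where a comparison against UNDEFINED yields UNDEFINED, "||" yields TRUE if either side is TRUE, and "&&" yields FALSE if either side is FALSE.

```python
UNDEFINED = None  # stand-in for the ClassAd UNDEFINED value (assumption of this sketch)

def eq(a, b):
    # A comparison with an undefined operand is itself undefined.
    return UNDEFINED if a is UNDEFINED or b is UNDEFINED else a == b

def ge(a, b):
    return UNDEFINED if a is UNDEFINED or b is UNDEFINED else a >= b

def le(a, b):
    return UNDEFINED if a is UNDEFINED or b is UNDEFINED else a <= b

def and3(a, b):
    # FALSE dominates; otherwise any undefined operand makes the result undefined.
    if a is False or b is False:
        return False
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return True

def or3(a, b):
    # TRUE dominates; otherwise any undefined operand makes the result undefined.
    if a is True or b is True:
        return True
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return False

def on_exit_remove(exit_signal, exit_code):
    # Mirrors: ( ExitSignal == 11 || (ExitCode >= 0 && ExitCode <= 2) )
    return or3(eq(exit_signal, 11),
               and3(ge(exit_code, 0), le(exit_code, 2)))

if __name__ == "__main__":
    for signal, code in [(UNDEFINED, 1), (UNDEFINED, 5), (11, UNDEFINED), (9, 5)]:
        print(signal, code, on_exit_remove(signal, code))
```

With ExitSignal undefined, the left side of the "||" is undefined, so the whole expression is only defined when the ExitCode range test is true. An ExitCode outside 0..2 makes the right side false, and undefined "||" false stays undefined.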

--Dan