
Re: [Condor-users] Dagman "BAD EVENT" problems on Windows



On Tue, 18 Jan 2012, Kent Wenger wrote:
>> I will experiment with the DAGMAN_ALLOW_EVENTS hack. Thanks.
>>
>> I hope the attached may help you fix any bugs. I'd be happy to help 
>> you test.
>
> I took a quick look at the files you sent (thanks for sending such complete info) and I'm about 95% sure that the problem is this:
> 
>    MultiLogFiles: macros ('$(...') not allowed in log file name
>    (log_scen0.sim_$(seed).txt) in DAG node submit files
> 
> (this is an error message from the dagman.out file).
> 
> I would bet that if you change your log file names to not have a macro in them (or get rid of the log file name entirely) things will work okay.

I removed the "log" specifier from the submit files entirely, and the macro warnings are gone. The DAG still aborts eventually, and dagman.out is still littered with BAD_EVENT messages. Nothing distinguishes the last BAD_EVENT message from the many before it; the DAG just seems to abort spontaneously. It seems to get farther along than before, but I cannot be statistically sure of that.
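
For anyone who wants to check their own node submit files for the same macro-in-log-name problem, here is a minimal sketch in Python. It is not part of Condor; the function name find_macro_log_names and the sample submit text (including sim.exe) are made up for illustration, and it assumes the "log" keyword can be matched case-insensitively. The log file name is the one from the dagman.out error message quoted above.

```python
import re

# Matches a submit-file line of the form "log = <name>".
# Assumption: the keyword is matched case-insensitively.
LOG_LINE = re.compile(r"^\s*log\s*=\s*(.+?)\s*$", re.IGNORECASE)


def find_macro_log_names(submit_text):
    """Return the log file names in submit_text that contain a '$(' macro."""
    hits = []
    for line in submit_text.splitlines():
        match = LOG_LINE.match(line)
        if match and "$(" in match.group(1):
            hits.append(match.group(1))
    return hits


# Hypothetical submit file contents; only the log name comes from the thread.
sample = """\
executable = sim.exe
log = log_scen0.sim_$(seed).txt
queue
"""

print(find_macro_log_names(sample))
```

An empty result means no "log" line in that text uses a macro; it says nothing about the BAD_EVENT messages themselves.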

I will try again with DAGMAN_ALLOW_EVENTS set to '1'. If I still get the abort error, what are the real consequences of setting DAGMAN_ALLOW_EVENTS to '5'? I really need this to work. I'm extremely nervous about this failure and have a lot staked on DAGMan working reliably. I'm happy to help debug.