
Re: [Condor-users] Suppress Windows error dialogs popping up for crashing Condor jobs



Derrick, I have run into similar problems, and generally this is handled in 
the application. One thought is to check whether the developers can add a 
switch that makes the program exit with an error code and report the error 
on STDOUT instead of showing a popup message. I was working on a numerical 
hydrologic model that was written by someone else in Fortran, and it had a 
popup that required the user to click OK when the program completed 
successfully (as if you would not know the program completed its analysis 
successfully). Anyhow, I was able to change the underlying code so that 
popups did not occur. I would imagine this could be done in your case.

Most of the applications I run are wrapped inside a Python script, which 
gives me a better programming language than something like DOS batch 
files. VBS or something else could also be used. I had also looked into 
SendKeys, but I had a difficult time getting it to work because there was 
something different about the window station environment (a popup occurs, 
but it does not actually exist), and although SendKeys worked when running 
the application locally, it would not work when executed via Condor. 

A couple of other ideas: monitor the CPU usage of the exe task, and if it 
falls below a threshold and remains there for a certain duration, kill it. 
You can also set a maximum runtime for a Condor job and kill the job if 
this is exceeded. Although these methods work, in my opinion the best 
method is to add a switch or something that allows error messages to be 
sent to STDOUT instead of a popup. There may be a better way, but this is 
what I did in the past.
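
The maximum-runtime idea can be sketched in a Python wrapper like this. It 
is only an illustration: the function name run_with_limit and the choice of 
None as the "timed out" result are made up for this example, and it covers 
the runtime limit only, not the CPU-threshold check.

```python
import subprocess
import sys


def run_with_limit(cmd, max_seconds):
    """Run cmd and return its exit code, or None if it ran too long.

    run_with_limit is a hypothetical helper name, not part of Condor.
    """
    try:
        completed = subprocess.run(cmd, timeout=max_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child before raising this.
        return None
    return completed.returncode


# Example: a child that exits normally with code 0.
exit_code = run_with_limit([sys.executable, "-c", "print('done')"], 60)
```

Note that only the process started by the wrapper is killed on timeout; 
anything that process launched itself may need separate cleanup.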


mike





From:
Derrick Karimi <derrick.karimi@xxxxxxxxx>
To:
Condor-Users Mail List <condor-users@xxxxxxxxxxx>
Date:
05/18/2011 07:13 AM
Subject:
[Condor-users] Suppress Windows error dialogs popping up for crashing 
Condor jobs
Sent by:
condor-users-bounces@xxxxxxxxxxx



Hi,

I am working on fault tolerance on our system.  When our jobs run, they 
sometimes crash.  I told the developers to fix the code, but they told me 
to rerun the job because they can't reproduce the problem... I will work 
on their attitude later.

My problem was Windows popping up various error reporting and crash 
dialogs.  When the dialog pops up, the process won't exit until the user 
clicks OK, and eventually Condor will restart the job.  The first process 
is still holding resources and the second process keeps failing.  After 
mucking with 4 different places in the registry and UI on XP, Vista and 7 
(as well as every place in the UI where I could control error reporting, 
and disabling the error reporting service), I was still seeing popups.  I 
started using the Windows SetErrorMode function, which in practice only 
worked for me on Windows 7 and Vista.  On XP I was still seeing an 
"Application Error" popup saying memory could not be "read" on a simple 
null value dereference.
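
For reference, a minimal sketch of calling SetErrorMode from a Python 
wrapper before launching the job. It relies on the documented Windows 
behavior that a child process inherits the parent's error mode by default. 
The helper names suppress_error_dialogs and run_quietly are made up for 
this example, and as noted above this may not be enough on XP.

```python
import ctypes
import subprocess
import sys

# Flag values as defined in the Windows SDK headers.
SEM_FAILCRITICALERRORS = 0x0001
SEM_NOGPFAULTERRORBOX = 0x0002
SEM_NOOPENFILEERRORBOX = 0x8000
QUIET_FLAGS = (SEM_FAILCRITICALERRORS
               | SEM_NOGPFAULTERRORBOX
               | SEM_NOOPENFILEERRORBOX)


def suppress_error_dialogs():
    """Set this process's error mode; return the old mode (None off Windows)."""
    if sys.platform != "win32":
        return None
    return ctypes.windll.kernel32.SetErrorMode(QUIET_FLAGS)


def run_quietly(cmd):
    """Set the error mode, then run cmd and return its exit code."""
    suppress_error_dialogs()
    # The child inherits the error mode set above (default creation flags).
    return subprocess.call(cmd)
```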

Finally I came across the article
http://support.microsoft.com/kb/128642


which tells you to set in the registry: 
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Windows\ErrorMode = 2

This seems to suppress the failure dialog on the XP systems.  
As a note: I am still not sure whether you also need to disable the Dr. 
Watson debugger... but I have done that on the way to finding this solution.
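
A sketch of applying the same registry setting from Python with the 
standard winreg module, for rolling it out to several execute machines. 
The function name set_error_mode is made up for this example. It assumes 
the script runs with administrator rights, and whether a restart is 
needed for the change to take effect is not confirmed here.

```python
import sys

KEY_PATH = r"SYSTEM\CurrentControlSet\Control\Windows"
VALUE_NAME = "ErrorMode"


def set_error_mode(mode=2):
    """Write ErrorMode as a DWORD under HKEY_LOCAL_MACHINE and read it back."""
    if sys.platform != "win32":
        raise RuntimeError("the registry is only available on Windows")
    import winreg  # Windows-only standard library module

    access = winreg.KEY_SET_VALUE | winreg.KEY_QUERY_VALUE
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0, access) as key:
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_DWORD, mode)
        value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
    return value
```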

--Derrick
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/