
Re: [Condor-users] Suppress Windows error dialogs popping up for crashing Condor jobs



The ErrorMode registry value is the only sure way to prevent this.  You can
set it at run time if you have admin privileges.
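
A minimal sketch of writing that value from code, assuming "the error mode"
here means the ErrorMode registry value from KB128642 quoted further down
this thread. The helper names is_valid_error_mode and set_system_error_mode
are hypothetical, the 0..2 range is what the KB article documents, and
whether the change takes effect without a reboot is not something this
thread confirms. Outside Windows the function is a stub that returns false.

```cpp
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// KB128642 documents three values for ErrorMode: 0, 1 and 2.
// (Hypothetical helper, not part of any Condor or Windows API.)
bool is_valid_error_mode(std::uint32_t mode) {
    return mode <= 2;
}

// Writes HKLM\SYSTEM\CurrentControlSet\Control\Windows\ErrorMode.
// Returns true on success. Returns false for an invalid value, when the
// registry call fails (for example without admin rights), or on any
// non-Windows platform.
bool set_system_error_mode(std::uint32_t mode) {
    if (!is_valid_error_mode(mode)) {
        return false;
    }
#ifdef _WIN32
    HKEY key = nullptr;
    if (RegOpenKeyExA(HKEY_LOCAL_MACHINE,
                      "SYSTEM\\CurrentControlSet\\Control\\Windows",
                      0, KEY_SET_VALUE, &key) != ERROR_SUCCESS) {
        return false;
    }
    const DWORD value = mode;
    const LONG rc = RegSetValueExA(key, "ErrorMode", 0, REG_DWORD,
                                   reinterpret_cast<const BYTE*>(&value),
                                   sizeof(value));
    RegCloseKey(key);
    return rc == ERROR_SUCCESS;
#else
    return false;
#endif
}
```

A setup tool run by an administrator could call set_system_error_mode(2)
once per execute machine; the job itself would normally not have the
rights to do this.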

When developing in C++, application developers can wrap their code in a

try {
    // application code goes here
} catch (...) {
    std::exit(11);  // needs <cstdlib>
}

block, and then simply call exit(11) if the program fails.

11 is nice because it's the same number as the Linux segfault signal
(SIGSEGV), and might have a chance of being recognized by someone.
catch (...) catches many memory faults.  I think some things can slip
past it though.
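
A slightly fuller sketch of that wrapper, under some assumptions: the
function name run_guarded and the constant kCrashExitCode are hypothetical,
and the job is passed in as a callable standing in for the real program
body. It only shows how escaping C++ exceptions get mapped to exit code 11.

```cpp
#include <cstdio>
#include <exception>
#include <functional>
#include <stdexcept>

// Exit code suggested above: the same number as SIGSEGV on Linux.
constexpr int kCrashExitCode = 11;

// Runs `job` and returns its result. Any C++ exception that escapes the
// job is logged to stderr and turned into kCrashExitCode instead of
// reaching the runtime's default handler.
int run_guarded(const std::function<int()>& job) {
    try {
        return job();
    } catch (const std::exception& e) {
        std::fprintf(stderr, "job failed: %s\n", e.what());
    } catch (...) {
        std::fprintf(stderr, "job failed: unknown exception\n");
    }
    return kCrashExitCode;
}
```

A program would typically use it as "int main() { return
run_guarded(real_main); }", where real_main is whatever function holds the
actual work. One caveat on the memory faults: with Visual C++, catch (...)
only sees structured exceptions such as access violations when the code is
compiled with /EHa. Under the default /EHsc model those faults bypass the
handler, which may be one way things slip past it.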

- Erik



On Wed, May 18, 2011 at 11:53 AM, Derrick Karimi
<derrick.karimi@xxxxxxxxx> wrote:
> Thanks for your input.  I still need a check like the one you describe,
> based on CPU usage and kill time.  In particular I want to guard against
> infinite-loop bugs, or against developers popping up a message box on
> purpose.  They know they aren't supposed to pop up an error message
> explicitly in a job; they are supposed to call our wrapped versions
> of all API calls, which route anything that would produce a GUI through a
> switch that just logs when running in Condor mode.
> My problem was an unexpected crash that made Windows produce an invalid
> memory access error message, and for some reason the programmatic method
> of calling the Windows API function SetErrorMode missed this one on
> Windows XP.  The registry key I listed fixed that, but it is a tough
> choice between telling the customer to edit their registry and spending
> additional dev/test/doc time to develop a configuration tool for them.
> If there is a way to keep that dialog from appearing entirely from within
> the code on XP, I would love to know about it.
> As for implementing your idea of automatically killing a long-running job
> that is not using much CPU: do you implement this in Condor with a
> periodic remove?  Or do you implement it in a thread of the Condor job via
> its Python wrapper?
>
>
> --Derrick
>
> On Wed, May 18, 2011 at 9:30 AM, Michael O'Donnell <odonnellm@xxxxxxxx>
> wrote:
>>
>> Derrick, I have run into similar problems, and generally this is handled
>> in the application. One thought is to check whether the developers can
>> add a switch that makes the program write the error to STDOUT and exit
>> with an error code instead of showing a popup message. I was working on
>> a numerical hydrologic model that was written by someone else in Fortran,
>> and it had a popup that required the user to click OK when the program
>> completed successfully (as if you would not know the program had
>> completed its analysis successfully). Anyhow, I was able to change the
>> underlying code so that popups did not occur. I would imagine this could
>> be done in your case.
>>
>> Most of the applications I run are wrapped inside a Python script,
>> which gives me a better programming language than something like
>> DOS batch files. VBS or something else could also be used. I also
>> looked into SendKeys, but I had a difficult time getting it to work
>> because there was something different about the window station
>> environment (a popup occurs, but it does not actually exist), and
>> although SendKeys worked when running the application locally, it did
>> not work when executed via Condor.
>>
>> A couple of other ideas: monitor the CPU usage of the exe's process. If
>> it falls below a threshold and remains there for a certain duration,
>> kill it. You can also set a maximum runtime for a Condor job and kill
>> the job if that is exceeded. Although these methods work, in my opinion
>> the best method is to add a switch or something similar that sends error
>> messages to STDOUT instead of a popup. There may be a better way, but
>> this is what I did in the past.
>>
>>
>> mike
>>
>>
>>
>>
>>
>> From: Derrick Karimi <derrick.karimi@xxxxxxxxx>
>> To: Condor-Users Mail List <condor-users@xxxxxxxxxxx>
>> Date: 05/18/2011 07:13 AM
>> Subject: [Condor-users] Suppress Windows error dialogs popping up for
>> crashing Condor jobs
>> Sent by: condor-users-bounces@xxxxxxxxxxx
>>
>>
>>
>> Hi,
>>
>> I am working on fault tolerance in our system.  When our jobs run, they
>> sometimes crash.  I told the developers to fix the code, but they told me
>> to rerun the job because they can't reproduce the problem... I will
>> work on their attitude later.
>>
>> My problem was Windows popping up various error reporting and crash
>> dialogs.  When the dialog pops up, the process won't exit until the user
>> clicks OK, and eventually Condor will restart the job.  The first process
>> is still holding resources, and the second process keeps failing.  After
>> mucking with 4 different places in the registry on XP, Vista and 7
>> (as well as every place in the UI where I could control error reporting,
>> and disabling the error reporting service), I was still seeing popups.  I
>> started using the Windows SetErrorMode function, which in practice only
>> worked for me on Windows 7 and Vista.  On XP I was still seeing an
>> "Application Error" popup saying the memory could not be "read" on a
>> simple null pointer dereference.
>>
>> Finally I came across the article
>> http://support.microsoft.com/kb/128642
>>
>>
>> which tells you to set in the registry:
>> HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Windows\ErrorMode = 2
>>
>> This seems to suppress the failure dialog on the XP systems.
>> As a Note: I am still not sure if you need to also disable the Dr. Watson
>> debugger...but I have done that on the way to finding this solution.
>>
>> --Derrick
>> _______________________________________________
>> Condor-users mailing list
>> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/condor-users/
>>
>>
>>
>
>
>
> --
> --Derrick
>
>
>