
Re: [Condor-users] assertion, hang



Matt,

I don't know whether a reboot is required for this to take effect. I
know that on our machines the modal dialog was always visible somewhere; I
can't remember whether you had to log in first or whether it appeared over
the login dialog. It's been two years since I did the initial tweaking
of my execution nodes, so I'm fuzzy on some of the details. By the way,
we use value = 1. I don't know why we chose that over 2, but that's what
we use.

Try downloading pslist.exe from sysinternals.com; it's free. In fact,
get all their free tools; they are indispensable. Log onto a machine
with the assertion failure and run pslist -t to get a process tree.
Look at the process tree under condor_starter to see whether anything
else is attached to it, such as a debugger.
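The check described above amounts to walking the process tree downward from condor_starter. Here is a minimal Python sketch that does that over a snapshot of (PID, parent PID, name) rows, such as you might transcribe from pslist -t output. The sample snapshot is made up, and the set of "suspect" names (Dr. Watson's drwtsn32.exe, dwwin.exe, vsjitdebugger.exe) is an assumption about what a crash handler or debugger might be called on these machines, not something reported in this thread.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Proc(NamedTuple):
    pid: int
    ppid: int
    name: str


# Assumed names of crash handlers / debuggers worth a second look.
SUSPECT = {"drwtsn32.exe", "dwwin.exe", "vsjitdebugger.exe"}


def descendants(procs: List[Proc], root_name: str) -> List[Proc]:
    """Return every process below any process named root_name."""
    children: Dict[int, List[Proc]] = defaultdict(list)
    for p in procs:
        children[p.ppid].append(p)
    stack = [p for p in procs if p.name.lower() == root_name.lower()]
    seen = {p.pid for p in stack}  # guards against PID reuse creating cycles
    found: List[Proc] = []
    while stack:
        parent = stack.pop()
        for child in children[parent.pid]:
            if child.pid not in seen:
                seen.add(child.pid)
                found.append(child)
                stack.append(child)
    return found


def suspects(procs: List[Proc]) -> List[Proc]:
    """Processes under condor_starter whose names look like a debugger."""
    return [p for p in descendants(procs, "condor_starter.exe")
            if p.name.lower() in SUSPECT]


# Hypothetical snapshot, for illustration only.
snapshot = [
    Proc(100, 4, "condor_master.exe"),
    Proc(200, 100, "condor_startd.exe"),
    Proc(300, 200, "condor_starter.exe"),
    Proc(310, 300, "cmd.exe"),
    Proc(320, 310, "exemilpNET.exe"),
    Proc(330, 320, "drwtsn32.exe"),
    Proc(400, 1, "explorer.exe"),
]

if __name__ == "__main__":
    for p in suspects(snapshot):
        print(p)
```

A real snapshot would come from pslist or a similar tool; only the tree walk is meant to carry over.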

Let me know what you find either way.

-Bryan

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Matthew Galati
Sent: Monday, March 20, 2006 12:20 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] assertion, hang

I tried changing this to value=2. However, it still seems to hang when I
hit an assertion failure. I logged onto the process nodes that had the
assertion failure. The .exe is still in the process list, but there was
no dialog box that I could see, so the registry change did not seem to
help. Do I need to reboot after the registry change?

Matt 


> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx 
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Bryan S. Maher
> Sent: Monday, March 20, 2006 10:02 AM
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] assertion, hang
> 
> Matt,
> 
> In addition to the UNC cmd.exe registry change, we also make 
> a registry change that suppresses the Windows modal dialog 
> that pops up when an application crashes.  I believe 
> your application may be triggering this modal dialog, and 
> until the dialog is cleared, Condor will think the 
> application is still running.
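A minimal Python illustration of why a blocking dialog keeps the job in the queue: the parent (standing in for condor_starter, an assumption for illustration only) learns the outcome only when the child process exits. A child that fails an assertion exits promptly with a nonzero status; a child that blocks, simulated here with time.sleep rather than a real dialog, never reports anything, and the parent can notice only by imposing its own timeout. The child programs and the run_child helper are hypothetical and use Python's assert, not the C assert() from this thread.

```python
import subprocess
import sys
from typing import Optional, Tuple

# Hypothetical child programs (not the thread's exemilpNET.exe):
# FAILS_ASSERT exits as soon as the assertion fails;
# BLOCKS sleeps, standing in for a process stuck behind a modal crash dialog.
FAILS_ASSERT = "assert 1 == 2, 'assertion failed'"
BLOCKS = "import time; time.sleep(30)"


def run_child(code: str, timeout: float) -> Optional[Tuple[int, str]]:
    """Run `code` in a child interpreter.

    Returns (exit status, stderr) once the child exits, or None if it is
    still running when `timeout` seconds elapse (the child is then killed).
    """
    try:
        done = subprocess.run([sys.executable, "-c", code],
                              capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    return done.returncode, done.stderr


if __name__ == "__main__":
    print(run_child(FAILS_ASSERT, timeout=10))  # nonzero status, AssertionError on stderr
    print(run_child(BLOCKS, timeout=1))         # None: the parent never saw an exit
```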
> 
> Change the value of the following registry key:
> 
> HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Windows\ErrorMode
> 
> MS KB229012 explains the value settings:
> 
> http://support.microsoft.com/?scid=http%3a%2f%2fwww.support.microsoft.com%2fkb%2f229012%2fen-us%2f
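Setting the value by hand on every execute node is tedious, so here is a small Python sketch that builds (and, on Windows only, runs) the reg.exe command for the key above. It assumes ErrorMode is a REG_DWORD and that the script runs with administrator rights; the function names are hypothetical, and the accepted range 0 to 2 is an assumption to check against KB229012 before choosing 1 or 2.

```python
import subprocess
import sys
from typing import List

# Key and value name from the message above; reg.exe accepts the HKLM abbreviation.
KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\Windows"
VALUE_NAME = "ErrorMode"


def reg_add_command(mode: int) -> List[str]:
    """Build a reg.exe command line that sets ErrorMode to `mode`.

    Assumes ErrorMode is a REG_DWORD; /f overwrites the existing value
    without prompting.
    """
    if mode not in (0, 1, 2):  # assumed valid range; see KB229012
        raise ValueError(f"unexpected ErrorMode value: {mode}")
    return ["reg", "add", KEY, "/v", VALUE_NAME,
            "/t", "REG_DWORD", "/d", str(mode), "/f"]


def apply_error_mode(mode: int) -> None:
    """Run the command; only works on Windows, typically with admin rights."""
    if sys.platform != "win32":
        raise OSError("reg.exe is only available on Windows")
    subprocess.run(reg_add_command(mode), check=True)


if __name__ == "__main__":
    # Print the command rather than running it, e.g. to paste into a login script.
    print(" ".join(reg_add_command(1)))
```

Printing the command instead of running it keeps the sketch harmless on a workstation; apply_error_mode is the variant you would run on an execute node.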
> 
> Hope this helps.
> 
> -Bryan
> 
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Matthew Galati
> Sent: Friday, March 17, 2006 10:14 PM
> To: Condor-Users Mail List
> Subject: [Condor-users] assertion, hang
> 
> My Condor pool consists of a set of machines running Windows 
> Server 2003. All of my input, executables and output are on a 
> shared Windows drive. Here is part of my submit file:
> 
> ====
> environment = PATH=\\ordsrv3\ormpdata\bin\WinXP-Debug;c:\WINDOWS\system32;c:\WINNT\system32
> executable  = condor_exec.bat
> initialdir  = \\ordsrv3\ormpdata\milprun\test_win
> transfer_executable = false
> should_transfer_files = NO
> requirements = (OpSys=="WINNT52")
> 
>    output   = 10teams.out
>    error    = 10teams.err
>    log      = 10teams.log
>    universe = vanilla
>    arguments = --parm \\ordsrv3\ormpdata\parm\milpwin.parm --instance 10teams
>    queue 1
> 
>    output   = 22433.out
>    error    = 22433.err
>    log      = 22433.log
>    universe = vanilla
>    arguments = --parm \\ordsrv3\ormpdata\parm\milpwin.parm --instance 22433
>    queue 1
> ====
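Because each job in the submit file above differs only in the instance name, the per-job blocks can be generated rather than copied by hand. A minimal Python sketch, assuming every instance uses the same parameter file and the shared settings shown above; submit_description is a hypothetical helper name.

```python
from typing import Iterable

# Shared settings copied from the submit file above.
HEADER = r"""environment = PATH=\\ordsrv3\ormpdata\bin\WinXP-Debug;c:\WINDOWS\system32;c:\WINNT\system32
executable  = condor_exec.bat
initialdir  = \\ordsrv3\ormpdata\milprun\test_win
transfer_executable = false
should_transfer_files = NO
requirements = (OpSys=="WINNT52")
"""

# Parameter file passed to every instance (assumed identical for all jobs).
PARM = r"\\ordsrv3\ormpdata\parm\milpwin.parm"


def submit_description(instances: Iterable[str]) -> str:
    """Return submit-file text with one output/error/log/arguments/queue block per instance."""
    blocks = [HEADER]
    for name in instances:
        blocks.append("\n".join([
            f"output    = {name}.out",
            f"error     = {name}.err",
            f"log       = {name}.log",
            "universe  = vanilla",
            f"arguments = --parm {PARM} --instance {name}",
            "queue 1",
            "",  # trailing newline so blocks are separated by a blank line
        ]))
    return "\n".join(blocks)


if __name__ == "__main__":
    print(submit_description(["10teams", "22433"]))
```

The returned text could be saved to a file and passed to condor_submit; adding an instance then means adding one name to the list.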
> 
> I am using condor_exec.bat as a wrapper to my executable. If 
> I try to run the executable directly, I get a Shadow Exception 
> at "CreateProcess". The .bat file was suggested on this mailing 
> list, and it seems to work.
> 
> condor_exec.bat:
> \\ordsrv3\ormpdata\bin\WinXP-Debug\exemilpNET.exe %*
> 
> 
> If my executable dies due to an assertion failure (this is a 
> C app, using assert()), the failure is correctly reported 
> to stderr. However, the job seems to hang. That is, it 
> stays in the Condor queue indefinitely, as if Condor does not 
> know that it is done, even after the assertion. Is there 
> some way to handle this situation? I want Condor to treat the 
> assertion as a completion so that it moves on to the next job in 
> the queue.
> 
> Thanks,
> Matt
> 
> 
> _______________________________________________
> Condor-users mailing list
> Condor-users@xxxxxxxxxxx
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> 