
Re: [Condor-users] Job ExitCode different on Windows 7 vs Windows XP



On Wednesday, May 18, 2011 at 10:44 PM, Derrick Karimi wrote:

I am trying to add fault tolerance to my Condor pool.  I am attempting to retry jobs up to 5 times if they return a non-zero ExitCode, using requirements in the submission file:
== 0 || (ExitCode != 0 && JobRunCount >= 5)

This is working on Windows 7 machines, but not on my XP machines.  Condor believes the return code of the failing jobs is always zero on the XP machines.
A better way to say this would be:

The command "C:\WINDOWS\system32\cmd.exe /Q /C condor_exec.bat" on Windows XP has an ERRORLEVEL of 0 and on Windows 7 an ERRORLEVEL of 1 (Windows return code speak).

It isn't that Condor is always returning zero; it's that cmd.exe is always returning 0 on XP and Condor is just echoing this back to you.
 
I have attached snippets from two StarterLogs, one on a Win7 slot and one on an XP slot.  In each case I logged onto the machine a job was running on and induced a failure in the same way.  I have confirmed in my application logs and the job stdout log that the .bat file referenced as the command in the submit file is returning a non-zero error code.  I think I am returning the error code from the .bat file in the "right" way.

I am using Condor 7.2.5.  Does anyone know if this was a bug that was fixed?
Doubtful, since there isn't likely a Condor bug here. This looks like a fundamental difference between cmd.exe on Windows XP and Windows 7. IIRC cmd.exe on XP was limited to returning errorlevel codes of 0 or 1. See: http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/ntcmds_shelloverview.mspx?mfr=true

There's a note in that page that says:

"If a command completes an operation successfully, it returns an exit code of zero (0) or no exit code."

You can try some tests to convince yourself of this. If I have


test.bat:
@echo off
exit /b %1



I can call it from a cmd prompt and see that it works:

C:\tmp>.\test.bat 0

C:\tmp>echo %ERRORLEVEL%
0

C:\tmp>.\test.bat 1

C:\tmp>echo %ERRORLEVEL%
1

C:\tmp>.\test.bat 2

C:\tmp>echo %ERRORLEVEL%
2



So that works, but now call it the same way Condor has to call it:



C:\tmp>C:\WINDOWS\system32\cmd.exe /Q /C test.bat 0

C:\tmp>echo %ERRORLEVEL%
0

C:\tmp>C:\WINDOWS\system32\cmd.exe /Q /C test.bat 1

C:\tmp>echo %ERRORLEVEL%
0

C:\tmp>C:\WINDOWS\system32\cmd.exe /Q /C test.bat 2

C:\tmp>echo %ERRORLEVEL%
0



Repeat on Windows 7 to see if cmd.exe has gotten better at echoing the error level of the last command it runs.
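
If you'd rather not eyeball %ERRORLEVEL% by hand on each machine, a small script can launch the command as a child process and report the exit code the parent receives, which is roughly the position Condor is in. This is only a sketch, not how Condor itself is implemented; the function names child_exit_code and report are made up for illustration, and it assumes a Python interpreter is available on the test machine.

```python
import subprocess
import sys


def child_exit_code(argv):
    """Run argv as a child process and return the exit code the parent sees.

    argv is a list such as
    ["C:\\WINDOWS\\system32\\cmd.exe", "/Q", "/C", "test.bat", "2"].
    """
    # subprocess.call waits for the child to finish and returns its exit code.
    return subprocess.call(argv)


def report(argv):
    """Hypothetical helper: print the child's exit code and return it."""
    code = child_exit_code(argv)
    sys.stdout.write("child exit code: %d\n" % code)
    return code
```

Calling report() with the cmd.exe command line from the tests above, once on the XP machine and once on the Windows 7 machine, should show whether the two versions of cmd.exe hand back different codes for the same test.bat.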

It seems pretty critical, so perhaps there is some other explanation for the behavior I am seeing.  I need some help.  Is this the kind of grief I should expect when working with .bat files?
More the kind of grief you should expect from working with Windows XP. Error levels aren't echoed by cmd.exe.

Regards,
- Ian 

-- 
Ian Chesal
ichesal@xxxxxxxxxxxxxxxxxx
http://www.cyclecomputing.com/