
Re: [Condor-users] Job ExitCode different on Windows 7 vs Windows XP



Ian, thanks for the response.

You are right that I stated the subject incorrectly.  I actually tracked the problem down to "exit /b" behaving differently on Win7 vs XP.  If I remove the /b from the exit line of my .bat file, cmd seems to propagate the error back to Condor.  I made some scripts similar to the ones you listed to track down the problem.
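
For anyone finding this in the archive, here is a minimal sketch of that change. It assumes a job wrapper that launches a placeholder program (my_app.exe is hypothetical, not the real job) and ends with a plain "exit" instead of "exit /b":

```bat
@echo off
rem Hypothetical wrapper: my_app.exe stands in for the real job executable.
my_app.exe %*

rem "exit /b N" only leaves the batch script and sets ERRORLEVEL.
rem Plain "exit N" terminates cmd.exe itself and makes N its process exit code.
exit %ERRORLEVEL%
```

Because a plain "exit" ends cmd.exe itself, running a script like this from an interactive prompt will also close that prompt.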

--Derrick

On Thu, May 19, 2011 at 3:12 PM, Ian Chesal <ichesal@xxxxxxxxxxxxxxxxxx> wrote:

On Wednesday, May 18, 2011 at 10:44 PM, Derrick Karimi wrote:

I am trying to add fault tolerance to my Condor pool.  I am attempting to retry jobs up to 5 times if they return a non-zero ExitCode, using requirements in the submission file:
== 0 || (ExitCode != 0 && JobRunCount >= 5)

This is working on Windows 7 machines, but not on my XP machines.  Condor believes the return code of the failing jobs is always zero on the XP machines.

A better way to say this would be:

The command "C:\WINDOWS\system32\cmd.exe /Q /C condor_exec.bat" on Windows XP has an ERRORLEVEL of 0 and on Windows 7 an ERRORLEVEL of 1 (Windows return code speak).

It isn't that Condor is always returning zero; it's that cmd.exe is always returning 0 on XP and Condor is just echoing this back to you.
 
I have attached snippets from two StarterLogs, one from a Win7 slot and one from an XP slot.  In each case I logged onto the machine a job was running on and simulated a failure in the same way.  I have verified in my application logs and job stdout log that the .bat file referenced as the command in the submit file is returning a non-zero error code.  I think I am returning the error code from the .bat file in the "right" way.

I am using Condor 7.2.5.  Does anyone know if this was a bug that was fixed?

Doubtful, since there isn't likely a Condor bug here -- this looks like a fundamental difference between cmd.exe on Windows XP and Windows 7. IIRC cmd.exe on XP was limited to returning errorlevel codes of 0 or 1. See: http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/ntcmds_shelloverview.mspx?mfr=true

There's a note in that page that says:

"If a command completes an operation successfully, it returns an exit code of zero (0) or no exit code."

You can try some tests to convince yourself of this. If I have


test.bat:
@echo off
exit /b %1



I can call it from a cmd prompt and see that it works:

C:\tmp>.\test.bat 0

C:\tmp>echo %ERRORLEVEL%
0

C:\tmp>.\test.bat 1

C:\tmp>echo %ERRORLEVEL%
1

C:\tmp>.\test.bat 2

C:\tmp>echo %ERRORLEVEL%
2



So that works, but now call it the same way Condor has to call it:



C:\tmp>C:\WINDOWS\system32\cmd.exe /Q /C test.bat 0

C:\tmp>echo %ERRORLEVEL%
0

C:\tmp>C:\WINDOWS\system32\cmd.exe /Q /C test.bat 1

C:\tmp>echo %ERRORLEVEL%
0

C:\tmp>C:\WINDOWS\system32\cmd.exe /Q /C test.bat 2

C:\tmp>echo %ERRORLEVEL%
0



Repeat on Windows 7 to see if cmd.exe has gotten better at echoing the error level of the last command it runs.
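
The comparison can also be scripted so the same file runs unchanged on both machines. This is only a sketch: it assumes test.bat from above sits in the current directory, and it uses %COMSPEC% (the standard environment variable that points at cmd.exe) in place of the hard-coded path.

```bat
@echo off
rem Run test.bat through a child cmd.exe, the same way as the manual
rem test above, and report what the child hands back for each code.
setlocal enabledelayedexpansion
for %%C in (0 1 2) do (
    "%COMSPEC%" /Q /C test.bat %%C
    rem !ERRORLEVEL! is read at run time; %ERRORLEVEL% would be
    rem expanded once, when the whole parenthesized block is parsed.
    echo requested %%C, cmd.exe returned !ERRORLEVEL!
)
endlocal
```

Saving the output from each machine makes it easy to compare the XP and Windows 7 results side by side.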

It seems pretty critical, so perhaps there is some other explanation for the behavior I am seeing.  I need some help.  Is this the kind of grief I should expect when working with .bat files?

More the kind of grief you should expect from working with Windows XP. Error levels aren't echoed by cmd.exe.

Regards,
- Ian 




