
[Condor-users] RE: Condor, Windows XP, exit codes and the on_exit_remove setting



As a further follow-up to this problem, I tried changing the exit code
in the perl script to 0, 1, 2 and 3 respectively. For exit(0) the log
for the job reported:

	(1) Normal termination (return value 0)

But for exit(1), exit(2) and exit(3) the log for the job reported:

	(1) Normal termination (return value 1)

The same line appeared in all three cases.

I also ran the wrapper.bat file from a cmd window to make sure I could
test the error level returned properly in Windows. In all four cases
the %errorlevel% environment variable was set correctly to 0, 1, 2, or
3, matching the exit() call made by the perl script wrapped in the bat
file.
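
For anyone who wants to repeat that sanity check without the perl/bat
pair, here is a minimal sketch in Python (my substitution for
illustration, not what the test above used). The helper name
child_exit_code is hypothetical. It starts a child process that exits
with a chosen code and reports what the parent process observes, which
is the same thing the %errorlevel% check does by hand.

```python
import subprocess
import sys


def child_exit_code(code):
    """Run a child Python process that exits with `code`.

    Returns the exit status as seen by the parent process.
    Hypothetical helper for illustration; not part of Condor.
    """
    result = subprocess.run(
        [sys.executable, "-c", f"import sys; sys.exit({code})"]
    )
    return result.returncode


# Try the same four codes used in the perl test.
for requested in (0, 1, 2, 3):
    observed = child_exit_code(requested)
    print(f"requested {requested}, observed {observed}")
```

If the observed value matches the requested one for every code when run
directly, but the job log still reports 1, the codes are presumably
being collapsed somewhere between the job and the log rather than in
the script itself.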

So I'm down to thinking that this is indeed a Condor on Windows issue.
It seems to be able to distinguish only between error level 0 (success)
and anything else (failure).

Am I right? I didn't see the original message pass to the mailing list
(possibly because of the inclusion of the .bat file) so you may need to
read below for a little background.

Thanks!
Ian

-----Original Message-----
From: Ian Chesal 
Sent: September 1, 2004 5:21 PM
To: 'Condor-Users Mail List'
Subject: Condor, Windows XP, exit codes and the on_exit_remove setting


In my job ticket (attached to this email) I have the following line:

	on_exit_remove = (ExitCode != 2)

Which, as I understood things after reading the manual for
condor_submit, should ensure that my jobs are requeued if they exit with
code 2 but finish normally if they exit with any other code.
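
To spell out that reading of the expression, here is a small Python
model of it. This is only a sketch of how I understand the setting from
the manual, not Condor's own evaluation code, and the function names
are hypothetical.

```python
def on_exit_remove(exit_code):
    """Model of the submit-file expression (ExitCode != 2).

    True means the job is removed from the queue when it exits.
    This mirrors my reading of the manual, not Condor internals.
    """
    return exit_code != 2


def outcome(exit_code):
    """Describe what should happen to a job exiting with `exit_code`."""
    if on_exit_remove(exit_code):
        return "removed from the queue"
    return "kept in the queue to run again"


for code in (0, 1, 2, 3):
    print(f"exit code {code}: {outcome(code)}")
```

Under this reading only an exit code of exactly 2 keeps the job in the
queue, so a job whose code is reported as 1 would be removed.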

To test this theory I wrote a tiny perl script that simply called:

	exit(2);

I wrapped this perl script as a batch script so it would run as an
"executable" (using pl2bat) on Windows and fired it off to Condor with
the aforementioned submission ticket.

However, instead of requeueing the job when it exited, Condor allowed it
to terminate. Judging from the log and the email Condor sent me, Condor
thinks the job exited with status code 1, not 2.

The cause could be the bat script wrapper (also attached, but it may get
filtered out by our corporate email server).

I thought I'd take a chance and ask first: am I using this condor_submit
tag properly? And can Condor distinguish between exit codes on Windows,
or is it a 0/1 situation where 1 is all errors and 0 is success?

Thanks!

Ian

-----Original Message-----
From: Condor 
Sent: September 1, 2004 4:06 PM
To: Ian Chesal
Subject: [Condor] Condor Job 23.0


This is an automated email from the Condor system
on machine "ttc-ichesal-lnx.altera.com".  Do not reply.

Your Condor job 23.0 
	/ttcbatch/experiments/ichesal/condor/test/wrapper.bat
/experiments/ichesal/condor/test/no_sweep_parameter/adc_fir1
has exited normally with status 1.


Submitted at:        Wed Sep  1 16:00:24 2004
Completed at:        Wed Sep  1 16:05:33 2004
Real Time:           0 00:05:09

Virtual Image Size:  2 Kilobytes

Statistics from last run:
Allocation/Run time:     0 00:00:06
Remote User CPU Time:    0 00:00:00
Remote System CPU Time:  0 00:00:00
Total Remote CPU Time:   0 00:00:00

Statistics totaled from all runs:
Allocation/Run time:     0 00:00:06

Network:
    1.9 KB Run Bytes Received By Job
   66.1 KB Run Bytes Sent By Job


-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Questions about this message or Condor in general?
Email address of the local Condor administrator: ichesal@xxxxxxxxxx
The Official Condor Homepage is http://www.cs.wisc.edu/condor