
Re: [HTCondor-users] Email Error notication



Todd:

I am running Condor 8.5.8. I find that "notification = error" does not generate e-mail notifications for scripts that terminate "normally" with non-zero exit codes. This is true whether I set success_exit_code explicitly to 0 or leave it at its default. Your link suggests that success_exit_code is tied to the retry mechanism rather than to notifications, so I'm not sure I should expect it to.

I do find that your JobNotification hack works as advertised.

--
Tom Downes
Senior Scientist and Data Center Manager
Center for Gravitation, Cosmology and Astrophysics
University of Wisconsin-Milwaukee
414.229.2678

On Sun, Nov 20, 2016 at 12:36 PM, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:
On 11/15/2016 3:04 PM, Uchenna Ojiaku - NOAA Affiliate wrote:

> In the command reference manual it states this concerning error email
> notification:
> "If defined by /Error/, the owner will only be notified if the job
> terminates abnormally, or if the job is placed on hold because of a
> failure, and not by user request".
>
> I've run multiple jobs. In the job below, the log file shows a
> non-zero return value even though the job completed. *How do I get an error email
> notification when there is an "error" with the job, i.e. a non-zero value?*
>

Hi Uche,

As you discovered, when notification=Error, HTCondor sends email when there was an error launching the job (for instance, if the initial working directory or job executable is missing) or if the job exits with a signal.

If you want HTCondor to do something based upon a normal exit status code, you need to explicitly tell HTCondor which exit code(s) are considered "success" and which are considered failure.

In the upcoming HTCondor v8.5.8+ release, things are made more intuitive with the introduction of the "success_exit_code=X" macro in the job submit file. See https://is.gd/vsQvJk

In earlier versions of HTCondor, you can still achieve what you want via the power of ClassAds by replacing your "notification=error" line with one other line, although it is a bit non-obvious. Appendix A of the HTCondor Manual lists and describes many of the job ClassAd attributes, including JobNotification (see https://is.gd/PeDlhv). When you put "notification=complete" in your job submit file, condor_submit sets "JobNotification=2" in the job ClassAd, and when you put "notification=error" in the submit file, condor_submit sets "JobNotification=3". ClassAd attributes can be set to literals (like the integers 2 and 3), or to expressions that use a range of functions, including conditionals. So to send email even when a job runs OK but exits with a non-zero exit code, you can explicitly set JobNotification in your job submit file, as in the following example:

 executable = /bin/bash
 # Make notification=complete if ExitCode is non-zero, else make it error
 +JobNotification = IfThenElse(ExitCode=!=UNDEFINED && ExitCode=!=0, 2, 3)
 # So this job will not send email
 arguments = "-c 'exit 0'"
 queue
 # And this job will send email
 arguments = "-c 'exit 1'"
 queue
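
To make the three cases of that expression explicit, here is a minimal Python sketch of the same decision. It is not HTCondor code and nothing in it is an HTCondor API: the function name job_notification is hypothetical, and Python's None is assumed to stand in for a ClassAd UNDEFINED ExitCode (for example, when no exit code has been recorded for the job).

```python
from typing import Optional

# JobNotification values as described in the message above.
NOTIFY_COMPLETE = 2  # what "notification=complete" sets
NOTIFY_ERROR = 3     # what "notification=error" sets


def job_notification(exit_code: Optional[int]) -> int:
    """Mirror IfThenElse(ExitCode=!=UNDEFINED && ExitCode=!=0, 2, 3).

    None is this sketch's stand-in for a ClassAd UNDEFINED ExitCode
    (an assumption, not how HTCondor represents it).
    """
    if exit_code is not None and exit_code != 0:
        # Normal exit with a non-zero code: behave like notification=complete.
        return NOTIFY_COMPLETE
    # Exit code 0 or no exit code: fall back to notification=error.
    return NOTIFY_ERROR


for code in (None, 0, 1):
    print(code, job_notification(code))
```

With exit code 0 the expression evaluates to 3, so the job behaves as under notification=error and a clean exit sends nothing. With exit code 1 it evaluates to 2, so the job is treated like notification=complete and email goes out when it finishes.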

Hope the above helps. I realize it is non-obvious, which is why we made things easier starting in HTCondor v8.5.8. But I hope it is also instructive about the flexibility and power that ClassAds give end users and administrators. Details about the ClassAd language are in section 4.1 of the Manual.

regards
Todd
