
Re: [Condor-users] criteria for non-DAG job failures?



Nathan,

The expression you gave me does put jobs with non-zero return codes on hold, which is good. However, with "notification = error" set, I expected Condor to send me an email about these job failures, and that does not seem to happen.

Someone I discussed this with found this ticket: https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1976 It describes exactly the behavior I see. The issue is known, but the ticket lists it as fixed in v7.7.5, and I am on v7.8. Could this be a regression, or am I missing some other configuration settings?

Thank you,
Vlad


On Wed, Jun 27, 2012 at 11:26 AM, Nathan Panike <nwp@xxxxxxxxxxx> wrote:
On Wed, Jun 27, 2012 at 11:06:21AM -0400, Vlad wrote:
> Greetings,
>
> Condor documentation provides some details for what's considered to be a job failure for DAG submissions (e.g. http://research.cs.wisc.edu/condor/manual/v7.8/2_10DAGMan_Applications.html#SECTION003105000000000000000) and that seems to cover process exit codes.
>
> What about non-DAG (cluster) jobs? I use 'notification = error' and the empirical observation (using a very new v7.8 install) is that I do get emails when jobs crash as a result of SIGBUS, etc. However, if a job returns with a non-zero error code (e.g. non-zero return from main() in C/C++) there are no emails. Is it possible to change this behavior? Could this be a matter of changing the default Condor configuration or using the appropriate submit descriptor incantation?
>

Vlad,

For pool-wide configuration, you can use the following config line:

SYSTEM_PERIODIC_HOLD = ExitBySignal =?= True || ExitCode =!= 0

You could put a similar line in your submit file for per-job
configuration:

periodic_hold = ExitBySignal =?= True || ExitCode =!= 0
notification = Error
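For context, a fuller per-job submit file might look like the sketch below. It assumes a vanilla-universe job, and the executable and file names are placeholders. It also swaps in `on_exit_hold`, which is evaluated when the job exits (so `ExitCode` is defined by then), in place of a periodic expression; that substitution is an assumption, not part of the reply above.

```
# Hypothetical submit description: executable and file names are placeholders.
universe     = vanilla
executable   = my_job
output       = my_job.$(Cluster).$(Process).out
error        = my_job.$(Cluster).$(Process).err
log          = my_job.$(Cluster).log

# Evaluated when the job exits: keep it in the queue on hold if it was
# killed by a signal or returned a non-zero exit code.
on_exit_hold = (ExitBySignal =?= True) || (ExitCode =!= 0)

# Request e-mail on error; whether a hold triggers this e-mail depends
# on the Condor version (see the ticket cited earlier in the thread).
notification = Error
queue
```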

Nathan Panike
_______________________________________________
Condor-users mailing list
The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/