Re: [Condor-users] criteria for non-DAG job failures?
- Date: Tue, 3 Jul 2012 15:04:19 -0400
- From: Vlad <vlad@xxxxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] criteria for non-DAG job failures?
The expression you gave me had the effect that jobs with non-zero return codes get placed on hold. That is good. However, with "notification = error" I would have liked Condor to send me an email about such job failures, and that does not seem to happen.
Someone I discussed this with has found this: https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1976 The behavior described there is exactly what I get. The issue is known, but the ticket also shows it as fixed in v7.7.5, and I am on v7.8. Could this be a regression, or am I missing more configuration settings?
On Wed, Jun 27, 2012 at 11:26 AM, Nathan Panike <nwp@xxxxxxxxxxx> wrote:
On Wed, Jun 27, 2012 at 11:06:21AM -0400, Vlad wrote:
> Condor documentation provides some details for what's considered to be a job failure for DAG submissions (e.g. http://research.cs.wisc.edu/condor/manual/v7.8/2_10DAGMan_Applications.html#SECTION003105000000000000000) and that seems to cover process exit codes.
> What about non-DAG (cluster) jobs? I use 'notification = error' and the empirical observation (using a very new v7.8 install) is that I do get emails when jobs crash as a result of SIGBUS, etc. However, if a job returns with a non-zero error code (e.g. non-zero return from main() in C/C++) there are no emails. Is it possible to change this behavior? Could this be a matter of changing the default Condor configuration or using the appropriate submit descriptor incantation?
For pool-wide configuration, you can use the following config line:
SYSTEM_PERIODIC_HOLD = ExitBySignal =?= True || ExitCode =!= 0
You could put a similar line in your submit file for per-job
=?= True || ExitCode =!= 0
notification = Error
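The per-job line as archived appears to have lost its left-hand side. A minimal sketch of what a per-job equivalent could look like, assuming the `on_exit_hold` submit command is a suitable choice (the archived message does not show which command was actually suggested) and using a hypothetical executable name `my_job`:

```
# Hypothetical submit description file; my_job is a placeholder executable.
universe     = vanilla
executable   = my_job

# Assumption: on_exit_hold carries the per-job policy. It is evaluated when
# the job exits, and the job is put on hold if the expression is True.
# The expression itself is the one from the pool-wide example above.
on_exit_hold = ExitBySignal =?= True || ExitCode =!= 0

notification = Error
queue
```

A job held this way stays in the queue, so it can be listed with `condor_q -hold` and resubmitted with `condor_release` once the cause of the failure is understood.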