
Re: [HTCondor-users] Email Error notification



Hello,

I thought I'd take the liberty to explain this a bit further.

I'm having trouble getting an error notification email sent to my account when one of my jobs fails. I understand that when a job completes, I receive an email notification if I specified "complete" or "always" in my submit file.

When I specify "complete" or "always", I receive an automated email when the job completes. "condor_history" also lists the jobs as either completed or removed.

In the command reference manual it states this concerning error email notification:
"If defined by Error, the owner will only be notified if the job terminates abnormally, or if the job is placed on hold because of a failure, and not by user request".

I've run multiple jobs. In the job below, the log file reports a non-zero return value, yet the job is listed as completed. How do I get an error email notification when there is an "error" with the job, i.e. a non-zero return value?

Submit File:

Universe              = Vanilla
executable            = run_framework.pl
arguments             = "/data/data284/uchenna/testproj/test/ INTEL TEST2.cfg $(jobname).run.out"
log                   = $(jobname).log
output                = $(jobname).out
error                 = $(jobname).err
initialdir            = /data/data284/uchenna/testproj/test/
+IwdFlushNFSCache     = False
Request_cpus          = 2
Request_disk          = 20000
Request_memory        = 10000
should_transfer_files = YES
notify_user           = uchenna.ojiaku@xxxxxxxx
notification          = Error
getenv                = False
queue 1


Log File:

000 (2338.000.000) 11/15 15:10:28 Job submitted from host: <10.2.7.151:9618?addrs=10.2.7.151-9618&noUDP&sock=3972319_52bf_3>
...
001 (2338.000.000) 11/15 15:10:38 Job executing on host: <10.2.7.152:9618?addrs=10.2.7.152-9618&noUDP&sock=3262762_3d5a_2>
...
006 (2338.000.000) 11/15 15:10:38 Image size of job updated: 2
    0  -  MemoryUsage of job (MB)
    0  -  ResidentSetSize of job (KB)
...
005 (2338.000.000) 11/15 15:10:39 Job terminated.
    (1) Normal termination (return value 255)
        Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
    0  -  Run Bytes Sent By Job
    1576  -  Run Bytes Received By Job
    0  -  Total Bytes Sent By Job
    1576  -  Total Bytes Received By Job
    Partitionable Resources :    Usage  Request Allocated
       Cpus                 :                 2         2
       Disk (KB)            :       11    20000     65979

condor_history results:

ID       OWNER          SUBMITTED     RUN_TIME   ST COMPLETED   CMD
2339.0   uchenna.ojiaku 11/15 15:22   0+00:00:03 C  11/15 15:22 /data/data284/
2338.0   uchenna.ojiaku 11/15 15:10   0+00:00:02 C  11/15 15:10 /data/data284/
2337.0   uchenna.ojiaku 11/15 15:08   0+00:00:02 C  11/15 15:08 /data/data284/
2336.0   uchenna.ojiaku 11/15 13:47   0+00:00:02 C  11/15 13:47 /data/data284/
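
To illustrate what I mean by an "error": condor_history shows every job as C, and the non-zero return value only appears in the job log, on the "Normal termination (return value 255)" line of event 005. Below is a minimal Python sketch of how such jobs could be picked out of a log. The function name failed_jobs and the regular expressions are my own assumptions, based only on the log format shown above. This is not an HTCondor feature.

```python
import re

# Event header, e.g. "005 (2338.000.000) 11/15 15:10:39 Job terminated."
# (pattern is an assumption derived from the log excerpt in this message)
EVENT_HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.\d+\)")
# Detail line of a termination event, e.g. "(1) Normal termination (return value 255)"
RETURN_VALUE = re.compile(r"\(1\) Normal termination \(return value (\d+)\)")


def failed_jobs(log_text):
    """Return {"cluster.proc": return_value} for jobs with a non-zero return value."""
    failures = {}
    current_job = None
    for line in log_text.splitlines():
        header = EVENT_HEADER.match(line)
        if header:
            # Remember the job ID only while inside a "Job terminated" event (005).
            current_job = None
            if header.group(1) == "005":
                current_job = f"{int(header.group(2))}.{int(header.group(3))}"
            continue
        match = RETURN_VALUE.search(line)
        if match and current_job is not None:
            code = int(match.group(1))
            if code != 0:
                failures[current_job] = code
    return failures
```

Called on the contents of the log file above, this would flag job 2338.0 with return value 255. Jobs that returned 0 are left out.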


Regards,

Uche Ojiaku





On Tue, Nov 15, 2016 at 3:34 PM, Uchenna Ojiaku - NOAA Affiliate <uchenna.ojiaku@xxxxxxxx> wrote:
Hello,

I've set my email notification. But when there's an error in the log file from a job I don't get the email. I only get emails when the job completes and when the notification is set to "complete" or "always" in the submit file. I don't get an error email notification when I set it to "error". How can I fix this issue?

Regards,

Uche Ojiaku