
Re: [HTCondor-users] monitor for dagman job to finish






On Apr 29, 2014, at 5:11 PM, "R. Kent Wenger" <wenger@xxxxxxxxxxx> wrote:

> On Tue, 29 Apr 2014, Jiande Wang wrote:
> 
>> Thanks for your suggestion.
>> My goal is to run a real-time operational job. Once the DAGMan job is launched, I need some kind of signal inside my shell script (like a return code in a Unix shell) telling me that the whole set of jobs in the DAG has finished, so that I can run an error-checking step for my executable and run script. Email notification does not fit my purpose, for security reasons at the place where I work.
> 
> Ah, it helps to understand your problem a bit better.
> 
> There are two alternatives that come to mind:
> 
> 1. You can do condor_wait on the condor_dagman job.  The condor_dagman job doesn't exit from the queue until the whole workflow has finished.
The HTCondor manual says that condor_wait checks the log files of jobs submitted from an ordinary submit script. Do you mean it also works for the log file from DAGMan? I notice that submitting a DAGMan job automatically generates a log file with a name like xxxx.dagman.log. Can I run condor_wait on this log file for my purpose?
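
To make the question concrete, here is a minimal sketch of what I would try, written in Python rather than shell so the command construction is easy to check. It assumes that condor_wait accepts the automatically generated .dagman.log file and that a zero exit status means the condor_dagman job has left the queue; both are assumptions I have not verified. The function names and the file name workflow.dag are hypothetical.

```python
import subprocess


def build_wait_command(dag_file, timeout_s=None):
    """Build the condor_wait command line for a DAG's own log file.

    Assumption: condor_submit_dag writes the DAGMan job log to
    "<dag_file>.dagman.log", as observed in this thread.
    """
    cmd = ["condor_wait"]
    if timeout_s is not None:
        # -wait <seconds> makes condor_wait give up after the timeout.
        cmd += ["-wait", str(timeout_s)]
    cmd.append(dag_file + ".dagman.log")
    return cmd


def wait_for_dag(dag_file, timeout_s=None):
    """Block until condor_wait returns; True if it exited with status 0.

    Assumption: status 0 only means the condor_dagman job finished,
    not that every node succeeded, so error checking is still needed.
    """
    result = subprocess.run(build_wait_command(dag_file, timeout_s))
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical usage: wait up to one hour for workflow.dag.
    if wait_for_dag("workflow.dag", timeout_s=3600):
        print("DAGMan job has left the queue; run error checks now")
    else:
        print("condor_wait timed out or failed")
```

The calling script would then run its own error-checking step once wait_for_dag returns.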
> 
> 2. You can put a FINAL node into your DAG.  A FINAL node is always run at the end of the workflow, even if the workflow fails.  So you can do error checking, etc., in your FINAL node.

As I read the HTCondor manual, this method does not work for a DAG with parent/child relationships.
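
It is possible that I am misreading the restriction, and that it only means the FINAL node itself may not be listed in a PARENT/CHILD line. Under that assumption, a sketch of the DAG I have in mind is below, generated from Python so the layout can be checked. The node names A, B and CHECK, the submit files a.sub, b.sub and check.sub, and the script check.sh are all hypothetical. Passing $DAG_STATUS and $FAILED_COUNT to a POST script on the FINAL node is also an assumption on my part about how the error check could see the workflow result.

```python
def build_dag_text(final_submit="check.sub", post_script="check.sh"):
    """Return the text of a small DAG file with a FINAL node.

    Assumption: the FINAL node is declared with its own keyword and is
    deliberately left out of every PARENT/CHILD line.
    """
    lines = [
        "JOB A a.sub",
        "JOB B b.sub",
        # Ordinary dependency between the regular nodes.
        "PARENT A CHILD B",
        # FINAL node: meant to run at the end even if A or B fails.
        "FINAL CHECK " + final_submit,
        # Hypothetical error check that receives the DAG status macros.
        "SCRIPT POST CHECK " + post_script + " $DAG_STATUS $FAILED_COUNT",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write the sketch to a hypothetical file name for condor_submit_dag.
    with open("workflow.dag", "w") as handle:
        handle.write(build_dag_text())
```

If this layout is valid, the error checking could live in check.sh instead of in the script that launches the DAG.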
> 
> Kent Wenger
> CHTC Team