Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


[HTCondor-users] Detailled monitoring of a DAG

Date: Tue, 31 Aug 2021 12:33:23 +0200
From: Nicolas Arnaud <narnaud@xxxxxxxxxxxx>
Subject: [HTCondor-users] Detailled monitoring of a DAG


Dear all,

I have a DAG containing ~30 parallel "blocks", each made of 3-4 jobs connected by parent-child links. That DAG could be triggered automatically a dozen times per day or so and would run each time on different "live" data.

What (Python) framework/approach would you recommend to monitor in detail the running of each DAG instance? For example: which DAGs/blocks/jobs completed successfully or failed, how long each DAG/block/job took, why a particular job took that long (evictions, etc.). I would then use the individual DAG summary data to build long-term statistics, identify problems in my code or the software environment, etc.

All that information is available by combining the .dag and .dag.dagman.out files: are there existing tools that parse these and could be directly used for, or adapted to, this goal?
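
To illustrate the kind of parsing meant here, below is a minimal sketch in plain Python (not an existing HTCondor tool). It reads only the JOB and PARENT ... CHILD lines of a .dag file and groups the nodes into independent "blocks". The function names parse_dag and connected_blocks, the node names, and the submit file names are all hypothetical, and the sketch assumes the DAG uses no splices, sub-DAGs, or other commands.

```python
from collections import defaultdict


def parse_dag(text):
    """Return (jobs, children) from the text of a DAGMan input file.

    jobs maps node name -> submit file name.
    children maps parent node name -> set of child node names.
    Assumption: only JOB and PARENT ... CHILD ... lines matter here.
    """
    jobs = {}
    children = defaultdict(set)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        tokens = line.split()
        keyword = tokens[0].upper()
        if keyword == "JOB" and len(tokens) >= 3:
            jobs[tokens[1]] = tokens[2]
        elif keyword == "PARENT":
            upper = [t.upper() for t in tokens]
            if "CHILD" not in upper[1:]:
                continue  # malformed line, ignored in this sketch
            split = upper.index("CHILD", 1)
            for parent in tokens[1:split]:
                children[parent].update(tokens[split + 1:])
    return jobs, dict(children)


def connected_blocks(jobs, children):
    """Group nodes into blocks: sets of nodes linked by parent-child edges."""
    neighbours = {name: set() for name in jobs}
    for parent, kids in children.items():
        for kid in kids:
            neighbours.setdefault(parent, set()).add(kid)
            neighbours.setdefault(kid, set()).add(parent)
    seen = set()
    blocks = []
    for start in neighbours:
        if start in seen:
            continue
        stack, block = [start], set()
        while stack:  # depth-first walk over one block
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            block.add(node)
            stack.extend(neighbours[node] - seen)
        blocks.append(sorted(block))
    return blocks


# Hypothetical two-block DAG used only for illustration.
EXAMPLE_DAG = """\
# two independent blocks
JOB A1 a1.sub
JOB A2 a2.sub
JOB A3 a3.sub
JOB B1 b1.sub
JOB B2 b2.sub
PARENT A1 CHILD A2 A3
PARENT B1 CHILD B2
"""

jobs, children = parse_dag(EXAMPLE_DAG)
for block in connected_blocks(jobs, children):
    print(block)
```

This covers only the static structure of the DAG. Success/failure, durations, and evictions would still have to be extracted from the run-time output and matched to these node names.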


Thanks in advance for your advice,

Nicolas

--

============================================
= Nicolas ARNAUD                           =
=                                          =
= Laboratoire de physique des deux infinis =
= Irène Joliot-Curie (IJCLab)              =
= CNRS/IN2P3 & Université Paris-Saclay     =
=                                          =
= Virgo Experiment                         =
=                                          =
= European Gravitational Observatory (EGO) =
= Via E. Amaldi, 5                         =
= 56021 Santo Stefano a Macerata           =
= Cascina (PI) -- Italia                   =
= Tel: + 39 050 752 314                    =
============================================

Follow-Ups:
- Re: [HTCondor-users] Detailled monitoring of a DAG
  - From: Greg Thain
