Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

Re: [HTCondor-users] best practice for periodic metric script along jobs

Date: Tue, 18 Jun 2019 09:46:31 +0200
From: Joan Josep Piles-Contreras <jpiles@xxxxxxxxxxxxxxxx>
Subject: Re: [HTCondor-users] best practice for periodic metric script along jobs

Hi,

For that we use the update job info hooks (<Keyword>_HOOK_UPDATE_JOB_INFO) [1]. The default update period happens to be 5m, although that can be configured, and IIRC it's even run in the same environment as the job (although we don't use PID namespaces, so I don't know about that case). We also use job start and end hooks, which, depending on your use case, might be helpful as well.
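
A minimal sketch of what such an update hook could look like, for illustration only. It assumes the starter hands the hook a copy of the job ClassAd on standard input as plain "Name = Value" lines and ignores the hook's output, so the script appends a summary line to a log file of its own. The script name, the METRICS keyword, the log path, and the helper functions are all hypothetical, and the choice of ResidentSetSize as the logged attribute is just an example.

```shell
#!/bin/bash
# update_job_info.sh (hypothetical name): sketch of an update job info hook.
# Assumed execute-node configuration; the keyword METRICS is made up:
#   STARTER_JOB_HOOK_KEYWORD     = METRICS
#   METRICS_HOOK_UPDATE_JOB_INFO = /some/path/update_job_info.sh
#   STARTER_UPDATE_INTERVAL      = 300
# Assumption: the job ClassAd arrives on stdin as "Name = Value" lines.
METRICS_LOG="${METRICS_LOG:-/tmp/job_metrics.log}"   # hypothetical location

# Print the value of attribute $1 from a ClassAd read on stdin.
ad_attr() {
    sed -n "s/^$1 = //p" | head -n 1
}

# Read one ClassAd from stdin and append a one-line summary to the log.
log_update() {
    ad="$(cat)"
    cluster="$(printf '%s\n' "$ad" | ad_attr ClusterId)"
    proc="$(printf '%s\n' "$ad" | ad_attr ProcId)"
    rss="$(printf '%s\n' "$ad" | ad_attr ResidentSetSize)"
    printf '%s job=%s.%s rss_kb=%s\n' \
        "$(date +%s)" "$cluster" "$proc" "$rss" >> "$METRICS_LOG"
}

# Only consume stdin when invoked under the hook's own file name.
case "$0" in
    *update_job_info*) log_update ;;
esac
```

Because the hook's output is not fed back into the job ad under this assumption, anything that should end up in the job's summary would still have to be collected from the log afterwards, for example in a job exit hook.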


So far it has worked pretty well for us.

Best,

Joan

[1]:https://htcondor.readthedocs.io/en/v8_8_3/misc-concepts/hooks.html?highlight=%3CKeyword%3E_HOOK_UPDATE_JOB_INFO#work-fetching-hooks-invoked-by-htcondor


On 18/6/19 0:29, Todd Tannenbaum wrote:

On 6/17/2019 10:01 AM, Thomas Hartmann wrote:

Hi all,

I would like to ask, if there is some 'established best practice' to run
periodically a script along each job.

I.e., I would like to run a small metrics script periodically (~5m) for
each job, collect the output and add a summary of the metrics to the
job's summary.

I guess, it should work to start such a script as pre job process into
the background, loop/write the metrics in a separate file/pipe and
collect the metrics by a post job script.
But I wonder, if there is a more Condor way(?), e.g., a cron for each
starter (startd?) and storing the metrics in an extra job class ad (or
adding it to the job log with a grep'able identifier)?

Cheers,
    Thomas


Hi Thomas!

A quick thought: if you have control of the execute nodes involved,
you could set the config knobs

    USE_PID_NAMESPACES = True
    USER_JOB_WRAPPER = /some/path/monitor_my_jobs.sh

and monitor_my_jobs.sh could be:

    #!/bin/bash
    # Run my monitor script
    collect_metrics.sh &
    # Exec my actual job, keeping the same pid
    exec "$@"

and collect_metrics.sh could then monitor whatever you want.  The only
processes it would "see" would be the pids associated with the job
(which is what USE_PID_NAMESPACES=True does).  Every five minutes it
could publish metrics via
    condor_chirp set_job_attr_delayed <JobAttributeName> <AttributeValue>
which will cause the metrics to get published into the job classad so
they are visible in the history classad.  See "man condor_chirp".
Warning... the above was just the first idea I had, I didn't test it...
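
For illustration, collect_metrics.sh might look roughly like the sketch below. It is equally untested against a real pool and rests on several assumptions: condor_chirp is reachable on the job's PATH (CHIRP_CMD is a hypothetical override), the job is allowed to use chirp (depending on configuration this may need something like +WantIOProxy = true in the submit file), and delayed updates accept the attribute name (as far as I recall the default only allows names starting with "Chirp", hence ChirpMetricsNumProcs, which is a made-up name). The metric itself, a count of visible processes, is only a placeholder.

```shell
#!/bin/bash
# collect_metrics.sh: hypothetical sketch of the background monitor.
# Assumption: condor_chirp is on PATH inside the job; override via CHIRP_CMD.
CHIRP_CMD="${CHIRP_CMD:-condor_chirp}"
INTERVAL="${INTERVAL:-300}"   # seconds between samples (every five minutes)

# Count the processes visible in /proc. With PID namespaces enabled this
# should be only the processes that belong to the job.
count_procs() {
    n=0
    for d in /proc/[0-9]*; do
        [ -d "$d" ] && n=$((n + 1))
    done
    echo "$n"
}

# Publish one attribute (name $1, integer value $2) into the job ClassAd.
publish_metric() {
    "$CHIRP_CMD" set_job_attr_delayed "$1" "$2"
}

# Sample and publish forever; a failed publish must not stop the loop.
main() {
    while true; do
        publish_metric ChirpMetricsNumProcs "$(count_procs)" || true
        sleep "$INTERVAL"
    done
}

# Only start sampling inside an HTCondor job, where the starter sets
# _CONDOR_SCRATCH_DIR in the environment.
if [ -n "${_CONDOR_SCRATCH_DIR:-}" ]; then
    main
fi
```

Only integer values are published here; string values would need ClassAd-style quoting on the condor_chirp command line.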

But a question I have for you... what metrics would your script collect?
   HTCondor is already collecting info about memory, cpu, local disk
usage, and a few others... what other metrics are you interested in?

Thanks
Todd


--
Dr. Joan Josep Piles-Contreras
ZWE Scientific Computing
Max Planck Institute for Intelligent Systems
(p) +49 7071 601 1750

