
Re: [HTCondor-users] postCMD-like exit wrapper with access to job environment/ads?



FWIW, every CMS job that runs on the Grid chirps back a lot of
information, and it has never caused a problem.

Stefano


On 22/02/2022 17:13, Thomas Hartmann wrote:
Hi Mark,

many thanks for the suggestion. Yes, chirp might be promising for
exchanging information between the payload and the pre/postCmd (however,
I am not sure how much load it might put on the schedulers in cases with
many (failing) jobs).

Another idea might be to extract 'accounting' information from the
starter's/payload's cgroups. I have not tested it, but I would assume
that preCmd/postCmd processes would be started as processes under the
original job's cgroups (and not, like the starters, under the parent
Condor groups).
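
A minimal sketch of how the cgroup assumption above could be checked,
assuming a Linux execute node: it parses /proc/<pid>/cgroup so that the
output from a postCmd can be compared with the output from the payload.
The function names are made up for illustration, and whether a postCmd
really lands in the job's cgroup remains untested.

```python
from pathlib import Path


def parse_proc_cgroup(text):
    """Map controller name -> cgroup path from /proc/<pid>/cgroup content.

    Each line has the form 'hierarchy-ID:controller-list:path'.
    cgroup v2 entries have an empty controller list and are stored
    under the empty-string key.
    """
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split at most twice: the path itself may contain colons.
        _hierarchy_id, controllers, path = line.split(":", 2)
        for controller in controllers.split(","):
            result[controller] = path
    return result


def current_cgroups(pid="self"):
    """Read and parse the cgroup membership of a process (Linux only)."""
    return parse_proc_cgroup(Path(f"/proc/{pid}/cgroup").read_text())
```

If current_cgroups() reports the same paths from inside the payload and
from inside the postCmd, the accounting files of that cgroup could be
read from the postCmd as well.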

The aim would be to have each job on the cluster send a basic set of
metrics via Kafka to an ES DB, i.e., to inject such a postCmd into each
job via a transform on the schedulers.
(Probably it is not fully within Condor's philosophy to circumvent the
schedulers in this way... ;) )
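
As a rough sketch of what such a postCmd could send: the snippet below
serializes a few job ad attributes into a JSON message. The "timestamp"
key, the function names, and the injected send callable are assumptions
for illustration; the actual Kafka producer is deliberately left out.

```python
import json
import time


def build_metrics_message(job_ad, fields, now=None):
    """Serialize selected job ad attributes into a JSON message (bytes).

    job_ad is a plain dict; attributes missing from it become null.
    """
    record = {name: job_ad.get(name) for name in fields}
    # 'timestamp' is a hypothetical field name, not an HTCondor attribute.
    record["timestamp"] = int(now if now is not None else time.time())
    return json.dumps(record, sort_keys=True).encode("utf-8")


def forward(message, send):
    """Hand the message to any transport, e.g. a Kafka producer wrapper."""
    send(message)
```

Keeping the transport behind a callable means the script can be tested
on a worker node without a broker, e.g. with forward(msg, print).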

Cheers,
   Thomas

On 18/02/2022 00.20, Mark Coatsworth wrote:
Hi Thomas,

I think that PostCmd is still your best option here. As you mentioned,
this does not have access to the job's environment. However, you could
use condor_chirp to publish whatever information you need to the
PostArgs or PostEnv job attributes, and then access these from your
script.
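
A minimal sketch of the chirp side, assuming the job was submitted with
+WantIOProxy = true and that condor_chirp is on the payload's PATH. The
helper names are hypothetical; only the "condor_chirp set_job_attr
<name> <value>" call itself comes from HTCondor. The value has to be a
ClassAd expression, so strings are quoted here.

```python
import subprocess


def classad_literal(value):
    """Render a Python value as a ClassAd literal for condor_chirp."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def chirp_set_attr_argv(name, value):
    """Build the argv for 'condor_chirp set_job_attr <name> <value>'."""
    return ["condor_chirp", "set_job_attr", name, classad_literal(value)]


def publish(name, value):
    """Run condor_chirp from inside the job; raises if the call fails."""
    subprocess.run(chirp_set_attr_argv(name, value), check=True)
```

From the payload this would be called as, e.g., publish("PostEnv",
"EVENTS=42"). If scheduler load is a concern, condor_chirp also offers
set_job_attr_delayed, which should batch the change into the next
regular job update.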

What metrics are you looking for, and are they available in the job ad?
If so, another option could be to run your job in a DAGMan node with a
SCRIPT POST that uses condor_history to extract this information. Would
that work?
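
A minimal sketch of what such a POST script could do, assuming it is
handed the job ID (cluster.proc) as an argument and that condor_history
is on the PATH. It uses the -json, -limit and -attributes options of
condor_history; the function names are hypothetical.

```python
import json
import subprocess


def history_argv(job_id, attributes):
    """Build a condor_history call returning one ad as JSON."""
    return ["condor_history", "-limit", "1", "-json",
            "-attributes", ",".join(attributes), job_id]


def parse_history_json(output):
    """Return the first job ad from 'condor_history -json', or None."""
    output = output.strip()
    if not output:
        return None
    ads = json.loads(output)  # -json prints a JSON array of ads
    return ads[0] if ads else None


def job_metrics(job_id, attributes):
    """Query the history for one job and return its ad as a dict."""
    proc = subprocess.run(history_argv(job_id, attributes),
                          capture_output=True, text=True, check=True)
    return parse_history_json(proc.stdout)
```

For example, job_metrics("123.0", ["ExitCode", "RemoteWallClockTime"])
would return a dict with those two attributes if the job is found in
the history.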

Mark

On Tue, Feb 15, 2022 at 7:35 AM Thomas Hartmann <thomas.hartmann@xxxxxxx> wrote:

Hi all,

does somebody maybe have an idea or recipe for an admin script that is
run after each job's payload exits?
AFAIS, postCMD is run in its own environment and does not necessarily
have access to the actual job's environment and ads/attributes?

Alternatively, a post-cmd-like job wrapper extension might be difficult.
AFAIS, such a post-job cmd would have to be forked from the wrapper
(before the wrapper starts the actual payload and detaches/exits) and
would have to monitor/run in parallel to the payload's startd/process
until the end, right?

The background is that we would like to wrap up each job with a small
accounting script that collects basic job metrics and forwards these to
a local accounting DB (Kafka, ES,...).

Cheers,
   Thomas
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/



