
Re: [HTCondor-users] postCMD-like exit wrapper with access to job environment/ads?



Hi Mark,

many thanks for the suggestion - yes, chirp might be a promising way to
exchange information between the payload and the pre/postCmd. However, I am
not sure how much load it might put on the schedulers in cases with
many (failing) jobs.

Another idea might be to extract 'accounting' information from
the starter's/payload's cgroups. I have not tested it, but I would assume
that a preCmd/postCmd process would be started as a process under the
original job's cgroup (and not, like the starters, under the parent
Condor cgroup).
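
A minimal sketch of the cgroup idea, assuming a cgroup v2 (unified) hierarchy
mounted at /sys/fs/cgroup and assuming, untested as said above, that the
postCmd really runs inside the job's cgroup. The function names are made up
for illustration; cgroup v1 hosts use a different file layout and would need
other paths.

```python
from pathlib import Path


def parse_proc_cgroup(text):
    """Return the cgroup v2 path from /proc/<pid>/cgroup content, or None."""
    for line in text.splitlines():
        # cgroup v2 entries have the form "0::/some/path"
        parts = line.split(":", 2)
        if len(parts) == 3 and parts[0] == "0" and parts[1] == "":
            return parts[2]
    return None


def parse_flat_keyed(text):
    """Parse "key value" lines (as found in cpu.stat) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1].isdigit():
            stats[fields[0]] = int(fields[1])
    return stats


def read_own_cgroup_usage(root="/sys/fs/cgroup"):
    """Collect CPU and memory counters of the cgroup this process runs in."""
    rel = parse_proc_cgroup(Path("/proc/self/cgroup").read_text())
    if rel is None:
        return {}  # no cgroup v2 entry found
    base = Path(root) / rel.lstrip("/")
    usage = {}
    cpu = base / "cpu.stat"
    if cpu.is_file():
        usage.update(parse_flat_keyed(cpu.read_text()))
    mem = base / "memory.current"
    if mem.is_file():
        usage["memory_current_bytes"] = int(mem.read_text().strip())
    return usage
```

Called from the postCmd, read_own_cgroup_usage() would only report the job's
numbers if the assumption about the cgroup placement holds; otherwise it
reports whatever cgroup the script itself ended up in.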

The aim would be to have each job on the cluster send a basic set of metrics
via Kafka to an ES DB, i.e., to inject such a postCmd transform into each
job on the schedulers.
(Probably it is not fully within Condor's philosophy to circumvent the
schedulers in this way... ;) )
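
For illustration, a rough sketch of how such a script could turn job ad text
into a JSON record. It assumes the ad is available as plain "Attr = value"
lines (for example from a copy of the job ad in the scratch directory, which
I have not verified is readable from a postCmd), and the attribute selection
is only an example. The actual hand-over to Kafka is left out, since it
depends on the client library in use.

```python
import json


def parse_job_ad(text):
    """Parse "Attr = value" lines into a dict (simplified, not a full ClassAd parser)."""
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if not sep:
            continue  # skip lines that are not attribute assignments
        name = name.strip()
        value = value.strip()
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            ad[name] = value[1:-1]  # quoted string
        else:
            try:
                ad[name] = int(value)
            except ValueError:
                ad[name] = value  # keep floats/expressions as raw text
    return ad


def build_metrics_record(ad, wanted=("ClusterId", "ProcId", "Owner")):
    """Serialize the selected attributes (if present) as a JSON string."""
    return json.dumps({key: ad[key] for key in wanted if key in ad}, sort_keys=True)
```

The resulting string could then be handed to whatever Kafka producer is
available on the worker nodes.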

Cheers,
  Thomas

On 18/02/2022 00.20, Mark Coatsworth wrote:
> Hi Thomas,
> 
> I think that PostCmd is still your best option here. As you mentioned
> this does not have access to the job's environments. However you could
> use condor_chirp to publish whatever information you need to the
> PostArgs or PostEnv job attributes, then access these from your
> script?
> 
> What metrics information are you looking for, is this available in the
> job ad? If so, another option could be to run your job in a DAGMan
> node with a SCRIPT POST that uses condor_history to extract this
> information. Would that work?
> 
> Mark
> 
> On Tue, Feb 15, 2022 at 7:35 AM Thomas Hartmann <thomas.hartmann@xxxxxxx> wrote:
>>
>> Hi all,
>>
>> does anybody have an idea or recipe for an admin script that is run
>> after each job's payload exits?
>> AFAIS, postCMD runs in its own environment and does not
>> necessarily have access to the actual job's environment and ads/attributes?
>>
>> Alternatively, a post-cmd-like job wrapper extension might be difficult.
>> AFAIS such a post-job cmd would have to be forked from the wrapper
>> (before the wrapper starts the actual payload and detaches/exits) and
>> would have to monitor/run in parallel to the payload's startd/process
>> until the end, right?
>>
>> The background is that we would like to wrap up each job with a small
>> accounting script that collects basic job metrics and forwards these to
>> a local accounting DB (Kafka, ES,...).
>>
>> Cheers,
>>   Thomas
> 
> 
> 
