Hi Mark,

many thanks for the info and the suggestions! :-)

While in principle attractive, I am not sure the JobEventLog would work well. My general idea is a smallish tool to roughly approximate the energy consumption. Since our users do remote submits, I would need to inject a UserLog/JobEventLog file, as far as I see, and I am not sure how well that would scale with some of our users' more odd jobs.

Alternatively, I would aim for the general event log, which for the schedds should contain all of their jobs' state transitions. By forwarding the daemon events into a DB, or into something parsable with Spark, it should be possible to prepare a query for the job runs and fold them with the node stats (not sure about the core count).

While we are in principle writing the event logs as XML, I had to disable parsing them into JSON and forwarding these into ES due to load issues. Thus, I would be very interested in JSON as native event log output in the stable series (I noticed it in 8.9/9.1). :-)

Cheers and thanks,
Thomas

On 03/11/2021 21.55, Mark Coatsworth wrote:
> Hi Thomas,
>
> Unfortunately we don't have anything native in HTCondor that breaks
> down RemoteWallClockTime into individual runs like you're asking for.
>
> However, we just introduced a new feature that may help. In our
> upcoming HTCondor v9.4.0 release (shipping next month) we're adding a
> new attribute called LastRemoteWallClockTime. This just records the
> runtime for the last job execution.
>
> Do you think that by polling the job, or maybe using some clever
> scripts that run whenever a job iteration completes, you could grab
> the information from there?
>
> Another idea: I think all the information you're looking for is in the
> job event log (or global event log), which shows all the individual
> execution start/stop times. You could probably write a very simple
> Python script using our JobEventLog API to scrape this information.
> Would that get you what you need?
> Mark
>
> On Tue, Nov 2, 2021 at 11:23 AM Thomas Hartmann <thomas.hartmann@xxxxxxx> wrote:
>>
>> Hi all,
>>
>> is there a job class ad like RemoteWallClockTime (or CommittedTime)
>> that is broken down per individual run?
>>
>> Background is that I would like to calculate power usage statistics
>> for our users' jobs. Thus, we add a few benchmark values as
>> additional machine ads [1]. After injecting these machine ads into
>> the jobs via a transform, I can in principle calculate my stats with
>> them [2].
>>
>> However, unfortunately not all of our users' jobs are single job
>> runs. So, I would need to sum over all run iterations of a job, which
>> might have run on different nodes with different benchmark values.
>> But as far as I see, `RemoteWallClockTime` is the total wall time
>> over all job runs, where I would need the wall times broken down per
>> run [3].
>>
>> Is there a job ad that describes the wall time per run, or am I
>> probably overthinking? :-)
>>
>> Cheers,
>> Thomas
>>
>>
>> [1]
>> JobMachineAttrs = "HS06PerSlot HS06perWatt..."
>>
>>
>> [2]
>>> condor_history 151. -af "RemoteWallClockTime/60.0/60.0 * RequestCpus * MachineAttrHS06PerSlot0 / MachineAttrHS06perWatt0"
>> 0.1323784722222222
>>
>> [3]
>> RemoteWallClockTime0 * ... / MachineAttrHS06perWatt0
>> + RemoteWallClockTime1 * ... / MachineAttrHS06perWatt1
>> + ...
>>
>> _______________________________________________
>> HTCondor-users mailing list
>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/htcondor-users/
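P.S. For the archives, here is a minimal sketch of the per-run folding from [3]. It assumes per-run wall times can be recovered by pairing execute events with the next eviction/termination event (e.g. scraped from the event log); the function names, the event-tuple format, and the benchmark values are all illustrative, not any HTCondor API:

```python
# Sketch: fold per-run wall times with per-run benchmark machine ads,
# as in [3]. All names here are illustrative placeholders.

def pair_runs(events):
    """Pair each EXECUTE with the next EVICT/TERMINATE timestamp to get
    per-run wall times in seconds. events: list of (type, unix_ts)."""
    runs, start = [], None
    for etype, ts in events:
        if etype == "EXECUTE":
            start = ts
        elif etype in ("EVICT", "TERMINATE") and start is not None:
            runs.append(ts - start)
            start = None
    return runs

def energy_estimate(runs, request_cpus):
    """Sum over runs of hours * cpus * HS06PerSlot_i / HS06perWatt_i,
    mirroring the expression in [2]/[3]. runs: list of
    (wallclock_seconds, hs06_per_slot, hs06_per_watt) per run."""
    return sum(
        wall / 3600.0 * request_cpus * hs06_slot / hs06_watt
        for wall, hs06_slot, hs06_watt in runs
    )
```

The per-run HS06PerSlot/HS06perWatt values would come from the MachineAttr...N job ads, one pair per run, since each iteration may have landed on a node with different benchmark figures.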