
Re: [HTCondor-users] RemoteWallClockTime broken down per run?



Hi Thomas,

Unfortunately we don't have anything native in HTCondor that breaks
down RemoteWallClockTime into individual runs like you're asking for.

However, we just introduced a new feature that may help. In our
upcoming HTCondor v9.4.0 release (shipping next month) we're adding a
new job attribute called LastRemoteWallClockTime. It records the
wall-clock time of the most recent job execution only.

Do you think that by polling the job, or maybe using some clever
scripts that run whenever a job iteration completes, you could grab
the information from there?

Another idea: I think all the information you're looking for is in the
job event log (or global event log) which shows all the individual
execution start/stop times. You could probably write a very simple
Python script using our JobEventLog API to scrape this information.
Would that get you what you need?
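
As a rough sketch of the event log idea, assuming the Python bindings
(htcondor.JobEventLog) are available: the script below pairs each
execute event with the next evicted/terminated/shadow-exception event
for the same job and reports the wall time of every run together with
its execute host. Which event types count as the end of a run, the
"ExecuteHost" key, and the helper names per_run_wall_times and
read_events are assumptions for illustration, not something HTCondor
ships.

```python
from collections import defaultdict


def per_run_wall_times(events):
    """Pair start/stop events into per-run wall times.

    `events` is an iterable of (kind, job_id, timestamp, host) tuples in
    log order, where kind is "start" or "stop".
    Returns {job_id: [(host, seconds), ...]} with one entry per run.
    """
    open_runs = {}            # job_id -> (start timestamp, host)
    runs = defaultdict(list)  # job_id -> completed runs
    for kind, job_id, timestamp, host in events:
        if kind == "start":
            open_runs[job_id] = (timestamp, host)
        elif kind == "stop" and job_id in open_runs:
            started, run_host = open_runs.pop(job_id)
            runs[job_id].append((run_host, timestamp - started))
    return dict(runs)


def read_events(path):
    """Yield (kind, job_id, timestamp, host) tuples from a job event log."""
    # Imported lazily so the pairing logic above works without the bindings.
    import htcondor

    start_types = {htcondor.JobEventType.EXECUTE}
    # Assumption: these event types mark the end of one execution attempt.
    stop_types = {
        htcondor.JobEventType.JOB_EVICTED,
        htcondor.JobEventType.JOB_TERMINATED,
        htcondor.JobEventType.SHADOW_EXCEPTION,
    }
    # stop_after=0 reads the events already in the file, then stops.
    for event in htcondor.JobEventLog(path).events(stop_after=0):
        job_id = f"{event.cluster}.{event.proc}"
        if event.type in start_types:
            # Assumption: the execute event carries an "ExecuteHost" key.
            yield ("start", job_id, event.timestamp, event.get("ExecuteHost"))
        elif event.type in stop_types:
            yield ("stop", job_id, event.timestamp, None)
```

Calling per_run_wall_times(read_events("job.log")) on a job's event
log (the file name is a placeholder) would then give one (host,
seconds) pair per run, which could be combined with per-node benchmark
values. Runs that are still in progress are left out.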

Mark

On Tue, Nov 2, 2021 at 11:23 AM Thomas Hartmann <thomas.hartmann@xxxxxxx> wrote:
>
> Hi all,
>
> is there a job ClassAd attribute like RemoteWallClockTime (or
> CommittedTime) that is broken down per individual run?
>
> The background is that I would like to calculate power usage statistics
> for our users' jobs.
> To that end, we add a few benchmark values as additional machine ads [1].
> After injecting these machine ads via a transform into the jobs, I can in
> principle calculate my stats with these [2].
> Unfortunately, not all of our users' jobs are single runs. So
> I would need to sum over all run iterations of a job, which might have
> run on different nodes with different benchmark values.
> But AFAIS `RemoteWallClockTime` is the total wall time over all job runs,
> whereas I would need the wall times broken down per run [3].
>
> Is there a job ad attribute that describes the wall time per run, or am I
> probably overthinking?
>
> Cheers,
>   Thomas
>
>
> [1]
> JobMachineAttrs = "HS06PerSlot HS06perWatt..."
>
>
> [2]
> > condor_history 151. -af "RemoteWallClockTime/60.0/60.0 * RequestCpus *
> MachineAttrHS06PerSlot0 / MachineAttrHS06perWatt0"
> 0.1323784722222222
>
> [3]
>   RemoteWallClockTime0 * ... / MachineAttrHS06perWatt0
>   +
>   RemoteWallClockTime1 * ... / MachineAttrHS06perWatt1
>   +
>   ...



-- 
Mark Coatsworth
Systems Programmer
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin-Madison