Re: [HTCondor-devel] NRAO's "gantt chart" visualization of job runtimes in PATh Facility


Date: Mon, 06 Mar 2023 21:33:59 +0000
From: "Bockelman, Brian" <BBockelman@xxxxxxxxxxxxx>
Subject: Re: [HTCondor-devel] NRAO's "gantt chart" visualization of job runtimes in PATh Facility
Hi Greg,

Thanks for sharing.

How should I think about these plots?  Looking at the scale, I see 5-10 minutes of stage-in, 5-10 minutes of computing, and 1-2 minutes of stage-out.  Good, bad, indifferent?

Example:

I picked a job at random and I see ~2.5 minutes of CPU-time.

So, if these are 1% scale jobs to understand how the data movement works, then at full scale the stage-in/-out is about 2% of the overall throughput.

If these are full scale, then stage-in/-out is >50% of the throughput...
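
For anyone who wants to redo that back-of-the-envelope arithmetic, here is a rough sketch. It assumes the transfer time stays fixed while only the compute time grows with scale, and it uses the low end of the ranges read off the plot together with the ~2.5 minute CPU-time sample. The function name and the `compute_scale` parameter are made up for illustration; nothing here comes from NRAO's tooling.

```python
def transfer_fraction(stage_in_min, compute_min, stage_out_min, compute_scale=1.0):
    """Return the fraction of a job's wall time spent on file transfer.

    compute_scale multiplies the compute time only; transfer time is held
    fixed. That is an assumption of this sketch, not a measured fact.
    """
    transfer = stage_in_min + stage_out_min
    compute = compute_min * compute_scale
    return transfer / (transfer + compute)


# Case 1: the jobs in the plot are already full scale.
full = transfer_fraction(5, 2.5, 1)

# Case 2: the jobs are 1% scale, so full-scale compute is 100x longer.
scaled = transfer_fraction(5, 2.5, 1, compute_scale=100)

print(f"full scale: {full:.0%} transfer, 1% scale: {scaled:.1%} transfer")
```

With these inputs the full-scale case comes out well above 50% and the 1%-scale case lands between 2% and 3%. Using the upper end of the ranges instead shifts both numbers up somewhat.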

Brian

On Mar 6, 2023, at 3:20 PM, Greg Thain <gthain@xxxxxxxxxxx> wrote:


All:

On the NRAO call this morning, Kscott shared the visualization they are using to understand how their jobs run in the PATh Facility: how much time input and output transfers take compared to the job itself, and which jobs run on which machines and when.  I thought others might like to see it as well, so here is one screenshot.  Each horizontal line represents one particular machine; yellow represents file transfer time, and blue and red are different kinds of job time.


-greg

<yShqfWh0B4fpeoK4.png>

