[Gems-users] Trying to make Sense of Ruby's L2 miss numbers


Date: Fri, 09 Jan 2009 19:17:07 -0600
From: Philip Garcia <pcgarcia@xxxxxxxx>
Subject: [Gems-users] Trying to make Sense of Ruby's L2 miss numbers
Hello,
I was recently looking through my Ruby stats file after a simulation run, hoping simply to read off the number of L2 cache misses. However, once I started looking at the numbers, things seemed a bit off. I'm running a two-processor system (using the MOESI_CMP_directory_m protocol). Both "cpus" regularly exchange data, which might be responsible for some of the strangeness I'm seeing, but that explanation doesn't quite seem sufficient.

The raw L2 stats are as follows:

Total_misses: 6036454
total_misses: 6036454 [ 3219996 2816458 ]
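As a quick sanity check on the raw line above, the bracketed values can be summed and compared with the total. This is only a sketch: the helper name `check_breakdown` is hypothetical, and reading the bracketed numbers as a per-cache breakdown of the total is an assumption, not something the stats file states.

```python
import re

def check_breakdown(line):
    """Return (total, parts, parts_sum_matches_total) for a 'name: total [ a b ... ]' line."""
    match = re.match(r"\s*\w+:\s*(\d+)\s*\[([\d\s]*)\]", line)
    if match is None:
        raise ValueError(f"unrecognised stats line: {line!r}")
    total = int(match.group(1))
    # Assumption: the bracketed values are a per-cache breakdown of the total.
    parts = [int(tok) for tok in match.group(2).split()]
    return total, parts, sum(parts) == total

total, parts, consistent = check_breakdown("total_misses: 6036454 [ 3219996 2816458 ]")
print(total, parts, consistent)
```

The two bracketed values do sum to the reported total, so the L2 miss count is at least internally consistent.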

However, the memory controller reports that the total number of requests submitted to it is only 1,897,959, far fewer than the total number of misses. While it's possible that the remaining misses are satisfied by sharing between the caches, I'm unsure how that would affect the timing. To measure this, I've added histograms of the measured latencies for requests coming in from every L1 cache. For node 0 this gives the following latencies:

[binsize: 2 max: 187 count: 559652707 average: 2.23477 | standard deviation: 2.92835 | 0 553582652 0 0 0 0 0 0 3334660 0 66 2040058 293102 54 10 4757 54 0 3 112 0 417 0 5 2 1 120 0 5 112 0 474 19 13 6 6 0 0 6 0 7 3 0 1 147797 167476 56906 13032 3445 2621 965 332 243 170 195 176 131 131 134 143 132 140 145 137 162 131 222 333 404 171 62 14 11 3 1 1 1 0 0 0 1 1 0 0 0 0 1 0 0 1 2 3 5 1 ]
For node 1 we have the following latencies:
[binsize: 2 max: 202 count: 56302897 average: 2.88856 | standard deviation: 7.68617 | 0 55023311 0 0 0 0 0 0 310505 4 0 608329 410 4 0 14 0 1 0 0 0 0 1 0 3 0 0 10 0 7 0 2 7 0 101 0 0 0 0 0 0 0 0 0 105829 129676 72405 21235 3184 3054 3174 1735 930 315 404 309 308 271 276 134 135 132 168 147 193 296 295 778 10722 2070 777 131 358 196 120 59 69 40 47 10 14 6 12 3 11 2 8 1 7 2 7 5 99 18 3 0 2 2 3 0 0 1 ]
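Histogram lines like the two above can be picked apart with a few lines of Python. The sketch below is not part of GEMS; `parse_histogram` is a hypothetical helper, the sample string is the node 0 line truncated to its first 13 bins for brevity, and the mapping "bin i starts at latency i * binsize" is an assumption about how Ruby lays out its bins.

```python
import re

def parse_histogram(line):
    """Parse a Ruby-style histogram line into (header dict, list of bin counts)."""
    body = line.strip().strip("[]")
    parts = [p.strip() for p in body.split("|")]
    header = {}
    # Everything before the last '|' is "key: value" pairs; the rest is bin counts.
    for part in parts[:-1]:
        for key, value in re.findall(r"([A-Za-z ]+?):\s*([-\d.]+)", part):
            header[key.strip()] = float(value) if "." in value else int(value)
    bins = [int(tok) for tok in parts[-1].split()]
    return header, bins

# Node 0 line from above, truncated to the first 13 bins.
sample = ("[binsize: 2 max: 187 count: 559652707 average: 2.23477 | "
          "standard deviation: 2.92835 | "
          "0 553582652 0 0 0 0 0 0 3334660 0 66 2040058 293102 ]")

header, bins = parse_histogram(sample)

# Requests that did not complete in the first populated bin (bin 1).
beyond_first_bin = header["count"] - bins[1]

# Index of the next non-empty bin, and its lower latency edge under the
# assumed "bin i starts at i * binsize" layout.
next_bin = next(i for i in range(2, len(bins)) if bins[i] > 0)
next_bin_latency = next_bin * header["binsize"]

print(beyond_first_bin, next_bin, bins[next_bin], next_bin_latency)
```

For node 0, the header count minus bin 1 gives 6,070,055, and the next populated bin is bin 8 with 3,334,660 entries.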

This is where things get interesting: the total number of L1 misses is 6,070,055 for node 0 and 1,279,575 for node 1 (these are all requests that don't hit within the 2-cycle SEQUENCER_TO_CONTROLLER_LATENCY). If I subtract the first bin of L2 hits (3,334,660 and 310,505) from these numbers, on the assumption that these must be hits in the L2 cache (as they're at the minimum L2 cache latency), that leaves 2,735,395 memory requests for node 0 and 969,073 for node 1. Together, these don't add up to the total number of L2 cache misses at all, and they also don't match the number of requests submitted to the memory controller.
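The arithmetic in this paragraph can be laid out as a short Python sketch using only the figures quoted in this message; the variable names are illustrative. Plain subtraction of the node 1 figures as quoted yields 969,070 rather than 969,073, so one of the quoted node 1 numbers may be off by a few counts.

```python
# Figures quoted in this message.
l1_misses = {0: 6_070_055, 1: 1_279_575}     # requests beyond the 2-cycle bin
first_l2_bin = {0: 3_334_660, 1: 310_505}    # assumed to be plain L2 hits
l2_total_misses = 6_036_454                  # Ruby's reported L2 misses
memory_requests = 1_897_959                  # requests seen by the memory controller

# Requests per node that took longer than the minimum L2 latency.
beyond_l2 = {node: l1_misses[node] - first_l2_bin[node] for node in l1_misses}
combined = sum(beyond_l2.values())

print("beyond L2 per node:", beyond_l2)
print("combined:", combined)
print("reported L2 misses minus combined:", l2_total_misses - combined)
print("combined minus memory requests:", combined - memory_requests)
```

The combined figure falls between the two reported counters: well below the L2 miss total and well above the memory controller's request count.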

The only thing I'm unsure of that might affect these numbers is prefetch requests: offhand, I don't know whether prefetches can generate L2 misses and/or get counted in my latency histograms. Unless I'm completely confused about the latencies for L2 cache accesses that are found in another processor's L1 cache (they should be at least a couple of cycles longer than plain L2 cache accesses, as they require additional transitions), I don't quite see how Ruby is getting its total number of misses, or how the memory controller is getting its numbers.

Any help in deciphering these numbers would be greatly appreciated.

thanks,
Phil