Hello,
I was recently looking through my Ruby stats file after running a
simulation and just wanted to get the number of L2 cache misses from
my run. However, when I started looking at the numbers, things seemed
a bit off. I'm running a two-processor system (using the
MOESI_CMP_directory_m protocol). Both "cpus" regularly exchange data,
which might be responsible for some of the strangeness I'm seeing, but
it doesn't seem to account for all of it.
The raw L2 stats are as follows:
Total_misses: 6036454
total_misses: 6036454 [ 3219996 2816458 ]
However, the memory controller says that the total number of requests
submitted to it is only 1,897,959, far fewer than the total number of
misses. While it's possible that the others are just sharing requests,
I'm unsure how that would affect the timing. To measure this, I've
added histograms of the measured latencies for requests coming in from
every L1 cache. For node 0, this results in the following data:
[binsize: 2 max: 187 count: 559652707 average: 2.23477 | standard
deviation: 2.92835 | 0 553582652 0 0 0 0 0 0 3334660 0 66 2040058
293102 54 10 4757 54 0 3 112 0 417 0 5 2 1 120 0 5 112 0 474 19 13 6 6
0 0 6 0 7 3 0 1 147797 167476 56906 13032 3445 2621 965 332 243 170
195 176 131 131 134 143 132 140 145 137 162 131 222 333 404 171 62 14
11 3 1 1 1 0 0 0 1 1 0 0 0 0 1 0 0 1 2 3 5 1 ]
For node 1 we have the following latencies:
[binsize: 2 max: 202 count: 56302897 average: 2.88856 | standard
deviation: 7.68617 | 0 55023311 0 0 0 0 0 0 310505 4 0 608329 410 4 0
14 0 1 0 0 0 0 1 0 3 0 0 10 0 7 0 2 7 0 101 0 0 0 0 0 0 0 0 0 105829
129676 72405 21235 3184 3054 3174 1735 930 315 404 309 308 271 276 134
135 132 168 147 193 296 295 778 10722 2070 777 131 358 196 120 59 69
40 47 10 14 6 12 3 11 2 8 1 7 2 7 5 99 18 3 0 2 2 3 0 0 1 ]
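To make the histograms easier to read, here is a minimal Python sketch that maps bin positions to latency ranges. It assumes that bin i covers latencies from i * binsize to (i + 1) * binsize - 1 cycles, which is my reading of the output format rather than something I have confirmed in the Ruby source. The helper name nonzero_bins is made up for illustration, and the input is only the first 14 bins of the node 1 histogram above.

```python
def nonzero_bins(binsize, bins):
    """Return (low, high, count) for every non-empty bin.

    Assumption: bin i covers latencies i*binsize .. (i+1)*binsize - 1.
    """
    return [(i * binsize, (i + 1) * binsize - 1, count)
            for i, count in enumerate(bins) if count]


# First 14 bins of the node 1 histogram, copied from the message.
node1_leading_bins = [0, 55023311, 0, 0, 0, 0, 0, 0,
                      310505, 4, 0, 608329, 410, 4]

rows = nonzero_bins(2, node1_leading_bins)
for low, high, count in rows:
    print(f"{low:3d}-{high:3d} cycles: {count}")
```

Under that assumption, the large bin of 310,505 requests sits at 16 to 17 cycles, and the next large group of 608,329 at 22 to 23 cycles.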
This is where things get interesting: the total number of L1 misses
for node 0 is 6,070,055, and for node 1 it is 1,279,575 (these are all
requests that don't hit in the 2 cycle
SEQUENCER_TO_CONTROLLER_LATENCY). If I subtract the first bin of L2
hits from these numbers (3,334,660 and 310,505), which I'm assuming
have to be hits in the L2 cache (as they're at the minimum L2 cache
latency), that leaves 2,735,395 memory requests for node 0 and
969,073 for node 1. Together, these don't come close to the total
number of L2 cache misses, and they also don't match the number of
requests submitted to the memory controller.
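The comparison can be written out as a short Python sketch using only the figures quoted in this message. The variable names are made up for illustration, and treating the first large bin as "L2 hits" is the assumption stated above, not something the stats file says. Note that the straight subtraction for node 1 gives 969,070, slightly off from the 969,073 quoted above, so one of the node 1 figures may contain a transcription slip.

```python
# Figures copied from the stats quoted in this message.
l2_total_misses = 6036454
l2_misses_breakdown = [3219996, 2816458]   # the two bracketed values
mem_ctrl_requests = 1897959

l1_misses = {0: 6070055, 1: 1279575}
# Assumption: the first large histogram bin holds the L2 hits.
assumed_l2_hits = {0: 3334660, 1: 310505}

# The bracketed breakdown does add up to the reported total.
assert sum(l2_misses_breakdown) == l2_total_misses

# L1 misses that took longer than the minimum L2 latency.
beyond_l2 = {node: l1_misses[node] - assumed_l2_hits[node]
             for node in l1_misses}
inferred_total = sum(beyond_l2.values())

print("beyond L2, per node:     ", beyond_l2)
print("beyond L2, total:        ", inferred_total)
print("reported L2 misses:      ", l2_total_misses)
print("memory controller reqs:  ", mem_ctrl_requests)
print("L2 misses - inferred:    ", l2_total_misses - inferred_total)
print("inferred - mem ctrl reqs:", inferred_total - mem_ctrl_requests)
```

Either way, the inferred total lands between the two reported figures: well below the L2 miss count and well above the memory controller's request count.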
The only thing I'm unsure of that might affect these numbers is the
prefetch requests, as I don't know offhand whether prefetches can
generate L2 misses and/or get counted in my latency histograms. Unless
I'm completely confused about the latencies for L2 cache accesses
found in another processor's L1 cache (they should be at least a
couple of cycles longer than L2 cache accesses, as they should require
additional transitions), I don't quite see how Ruby is getting its
total number of misses, and how the memory controller is getting its
numbers.
Any help in deciphering these numbers would be greatly appreciated.
thanks,
Phil