Re: [Gems-users] Coherence misses


Date: Thu, 19 Mar 2009 14:15:18 -0400
From: Edward Lee <edwl202@xxxxxxxxx>
Subject: Re: [Gems-users] Coherence misses
On Thu, Mar 19, 2009 at 12:15 PM, Dan Gibson <degibson@xxxxxxxx> wrote:
You are using a broadcast protocol. The fact that a request is observed does not imply a state transition. For instance, in S state, an Other_GETS doesn't need to change the coherence state.

What you should look for is the data from the cache's - Transitions - section of the stats file. It will look something like this:
OM  Load  0 <--
OM  Ifetch  0 <--
OM  Store  0 <--
OM  L1_Replacement  0 <--
OM  Own_GETX  0 <--
OM  Fwd_GETX  0 <--
OM  Fwd_GETS  0 <--
OM  Ack  0 <--
OM  All_acks  0 <--

The format is:
[CurrentState] [messageType] [count] <--

You are looking for all the transitions that indicate a coherence miss, like:
S Other_GETX 45 <--
or
M Other_GETS 191 <--
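
As a rough illustration of the advice above, the counting could be scripted along these lines. This is only a sketch: it assumes every transition line follows the `[CurrentState] [messageType] [count] <--` format shown above, and the `COHERENCE_PAIRS` set is a hypothetical starting point that has to be adapted to the protocol in use (the names `parse_transitions` and `coherence_transitions` are made up for this example and are not part of GEMS).

```python
import re
from collections import defaultdict

# Matches lines such as "S  Other_GETX  45 <--" (the trailing arrow is optional).
TRANSITION_RE = re.compile(r"^\s*(\w+)\s+(\w+)\s+(\d+)(?:\s*<--)?\s*$")

# Assumption: which (state, event) pairs indicate a coherence miss depends on
# the protocol. These two are just the examples given in the message above.
COHERENCE_PAIRS = {("S", "Other_GETX"), ("M", "Other_GETS")}


def parse_transitions(text):
    """Return {(state, event): count} for every transition line in text."""
    counts = defaultdict(int)
    for line in text.splitlines():
        match = TRANSITION_RE.match(line)
        if match:
            state, event, count = match.groups()
            counts[(state, event)] += int(count)
    return dict(counts)


def coherence_transitions(counts, pairs=COHERENCE_PAIRS):
    """Sum the counts of the selected (state, event) pairs."""
    return sum(counts.get(pair, 0) for pair in pairs)


sample = """\
OM  Load  0 <--
S  Other_GETX  45 <--
M  Other_GETS  191 <--
"""

counts = parse_transitions(sample)
print(coherence_transitions(counts))
```

On a real stats file, the text of the Transitions section would be passed to `parse_transitions` in place of `sample`.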

Thanks again, Dan, for your prompt response. So, going by that, the total coherence misses from my ruby.stats file come to something like this:

S  Other_GETX  25446 +   M  Other_GETS  26792 = 52238 (total coherence misses)

But, doesn't
O  Other_GETX
and
M  Other_GETX
cause coherence misses as well?



I also notice that some of your instruction fetch stats appear to be zero. Is this intentional on your part?
If not, verify that you invoke Simics with the -stall flag, and that you issue 'instruction-fetch-mode instruction-fetch-trace' and 'istc-disable' to Simics before loading Ruby.

Here is a copy of the Simics Driver Transaction Stats. Instruction requests do appear to be 0.

Simics Driver Transaction Stats
----------------------------------
Insn requests: 0
Data requests: 20097728
Memory mapped IO register accesses: 6892964
Device initiated accesses: 0
Other initiated accesses: 0
Atomic load accesses: 7861
Exceptions: 10494
Non stallable accesses: 166754
Prefetches: 502504
Cache Flush: 0

However, I followed the guidelines from the wiki and the ISCA tutorial, as well as some suggestions from previous threads. My script looks like this:

Load warm-checkpoint (this checkpoint was actually created by loading a warm checkpoint and continuing until the first magic break, where the main computation starts)

@sys.path.append("../../../gen-scripts")
@import mfacet

istc-disable
dstc-disable
instruction-fetch-mode instruction-fetch-trace
cpu-switch-time 1
magic-break-enable
break-hap "Core_Magic_Instruction"

load-module ruby
ruby0.setparam g_NUM_PROCESSORS 8
ruby0.setparam g_MEMORY_SIZE_BYTES 2147483648
ruby0.setparam g_PROCS_PER_CHIP 1
ruby0.setparam g_NUM_L2_BANKS 16
ruby0.setparam L2_CACHE_NUM_SETS_BITS 13
ruby0.init

ruby0.load-caches fft-8p-caches.gz
ruby0.clear-stats

So, I am loading Ruby and making the necessary Simics changes only at this point, to speed up the simulation. Am I doing something wrong?

Regards,

Ed



On Thu, Mar 19, 2009 at 10:52 AM, Edward Lee <edwl202@xxxxxxxxx> wrote:
Thanks, Dan, for your reply. I assume you are referring to the "Chip Stats" section. I had looked at it, but I was a little confused.

I am using MOSI_SMP_bcast, which means there is only one SLICC controller for the L1 and L2 caches. Is this the reason I only see "L1Cache" under Chip Stats? Here is my output showing the L1Cache and Directory events. I think these are totals summed over the various cache states rather than per-state transition counts, but totals are fine for my purpose.

I just want to verify my understanding here, as I am not very confident in my interpretation:

I am not sure how to tell whether the caches are inclusive, but I believe the L2 cache is inclusive and, since only one controller is present, cache-to-cache transfers occur only between different L2s. So the stats below are effectively L2 stats, and for coherence misses I should look at the "L1Cache" stats, not the Directory stats.


Finally, I am thinking of coherence misses as --> (total of Other_*)
and of the percentage of coherence misses as --> (total of Other_*) / (total of all event counts in the cache stats)

 --- L1Cache ---
 - Event Counts -
Load  133041
Ifetch  0
Store  56336
L1_to_L2  176418
L2_to_L1D  101743
L2_to_L1I  0
L2_Replacement  9650
Own_GETS  82664
Own_GET_INSTR  0
Own_GETX  44650
Own_PUTX  5091
Other_GETS  578648
Other_GET_INSTR  0
Other_GETX  312550
Other_PUTX  0
Data  118632
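
For reference, the ratio proposed above (total of Other_* over the total of all event counts) can be computed from the L1Cache event counts with a short script. This is a sketch only: `parse_event_counts` is a hypothetical helper that assumes each line is `name count`, and the result is at best an upper bound, since in a broadcast protocol an observed Other_* request does not necessarily cause a state transition.

```python
def parse_event_counts(text):
    """Return {event_name: count} for lines of the form 'name  count'."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0]] = int(parts[1])
    return counts


# L1Cache event counts copied from the stats listing above.
L1_EVENTS = """\
Load  133041
Ifetch  0
Store  56336
L1_to_L2  176418
L2_to_L1D  101743
L2_to_L1I  0
L2_Replacement  9650
Own_GETS  82664
Own_GET_INSTR  0
Own_GETX  44650
Own_PUTX  5091
Other_GETS  578648
Other_GET_INSTR  0
Other_GETX  312550
Other_PUTX  0
Data  118632
"""

counts = parse_event_counts(L1_EVENTS)

# Requests observed from other processors (not necessarily state transitions).
other_total = sum(v for name, v in counts.items() if name.startswith("Other_"))
all_total = sum(counts.values())
ratio = other_total / all_total

print(f"Other_* total: {other_total}")
print(f"All events:    {all_total}")
print(f"Ratio:         {ratio:.4f}")
```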

.....


 --- Directory ---
 - Event Counts -
OtherAddress  0
GETS  82664
GET_INSTR  0
GETX  44650
PUTX_Owner  5091
PUTX_NotOwner  0

Regards,

Ed


On Thu, Mar 19, 2009 at 9:04 AM, Dan Gibson <degibson@xxxxxxxx> wrote:
Total_misses are L2 misses -- probably not what you want. Towards the bottom of the stats file, there should be a summary of protocol transitions. Depending on your protocol, you should be able to get a notion of how many 'coherence misses' there are.

Regards,
Dan

On Thu, Mar 19, 2009 at 12:22 AM, Edward Lee <edwl202@xxxxxxxxx> wrote:
Let me try to summarize what I am trying to do; maybe I can get some feedback this time.

I am running FFT on an 8-processor SMP target using the MOSI_SMP_bcast cache coherence protocol. I used warm caches and loaded Ruby for the main computation only. My goal is to measure the overhead of maintaining coherent caches, so I would like to separate the different types of cache misses, especially the coherence misses.

I got the ruby.stats file but I am not sure if I can use this output directly for what I need. I have the total misses as copied from my ruby.stats file like this:

Total_misses: 127314
total_misses: 127314 [ 22540 16951 16571 16564 12979 12803 12781 16125 ]
user_misses: 96467 [ 13183 12989 12742 12637 11184 11161 11144 11427 ]
supervisor_misses: 30847 [ 9357 3962 3829 3927 1795 1642 1637 4698 ]
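
As a quick sanity check on these numbers, the per-processor vectors can be parsed and compared against the totals. This is a minimal sketch that assumes the `name: total [ per-processor counts ]` layout shown above; `parse_miss_line` is a hypothetical helper, not something provided by GEMS.

```python
import re

# Matches e.g. "user_misses: 96467 [ 13183 12989 ... ]".
MISS_LINE_RE = re.compile(r"^(\w+):\s+(\d+)\s+\[([\d\s]*)\]\s*$")


def parse_miss_line(line):
    """Return (name, total, per_cpu_list), or None if the line does not match."""
    match = MISS_LINE_RE.match(line.strip())
    if match is None:
        return None
    name, total, per_cpu = match.groups()
    return name, int(total), [int(x) for x in per_cpu.split()]


# Lines copied from the stats excerpt above.
MISS_LINES = """\
total_misses: 127314 [ 22540 16951 16571 16564 12979 12803 12781 16125 ]
user_misses: 96467 [ 13183 12989 12742 12637 11184 11161 11144 11427 ]
supervisor_misses: 30847 [ 9357 3962 3829 3927 1795 1642 1637 4698 ]
"""

misses = {}
for raw in MISS_LINES.splitlines():
    parsed = parse_miss_line(raw)
    if parsed is not None:
        name, total, per_cpu = parsed
        misses[name] = (total, per_cpu)
        # Each total should equal the sum over the per-processor counts.
        print(name, total, sum(per_cpu) == total)

# User and supervisor misses should add up to the overall total.
user_plus_supervisor = misses["user_misses"][0] + misses["supervisor_misses"][0]
print(user_plus_supervisor == misses["total_misses"][0])
```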

I didn't paste the whole stats as it is quite large but my question is whether there is any information already existing in the ruby-stats file that can isolate different cache misses (global count is fine)? Or should I try to modify the profiler code to get this info?

Also, I have the number of misses, but I don't see the total number of accesses in that section. Would it be correct to use "Data requests" from the "Simics Driver Transaction Stats"? Note, however, that "Request missed" there shows 189346, which is larger than the miss counts shown above.

I would really appreciate any input on this.

Regards,

Ed



On Sun, Mar 15, 2009 at 12:09 AM, Edward Lee <edwl202@xxxxxxxxx> wrote:
Hi,

I am trying to isolate the cache misses according to their types. So, what would be the best way of differentiating cold, capacity and coherence misses?

Thanks,

Ed


_______________________________________________
Gems-users mailing list
Gems-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/gems-users
Use Google to search the GEMS Users mailing list by adding "site:https://lists.cs.wisc.edu/archive/gems-users/" to your search.





--
http://www.cs.wisc.edu/~gibson [esc]:wq!