Re: [Gems-users] [MOESI CMP TOKEN] Several Questions


Date: Wed, 30 Nov 2005 10:39:59 -0600
From: Dan Gibson <degibson@xxxxxxxx>
Subject: Re: [Gems-users] [MOESI CMP TOKEN] Several Questions
Those results do seem unusual. The 1MB vs. 2MB data seems to make sense, though the 512KB L2 result is quite strange. What are the working-set sizes for your applications? Very large or very small working sets can be less sensitive to cache size. It could be that the working sets overwhelm the L2 in all three configurations below. Try using a very large (~2x working set size) L2 cache.

As for the instruction count, consider this:
In multithreaded applications, there is some interthread interaction, through data sharing, cooperative caching, and synchronization. Changing the caches changes these interactions. Suppose a processor is spinning, waiting for a lock to be released. The length of the spin (and the number of instructions executed as a result of the spin) is influenced by the _other_ processors' cache performance (especially that of the thread holding the lock!).
I was initially confused by this behavior as well...it is subtle.
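
To make the spin-lock effect concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions, not how Ruby or Simics count instructions: the function names, the one-cycle-per-instruction rate, the 3-instruction spin loop, and the miss counts and penalty are all hypothetical values chosen for illustration.

```python
# Toy model (all parameters are illustrative assumptions, not Ruby/Simics values).

def critical_section_cycles(useful_instrs, misses, miss_penalty):
    """Cycles the lock holder needs: 1 cycle per instruction plus miss stalls."""
    return useful_instrs + misses * miss_penalty

def spin_instructions(hold_cycles, instrs_per_iter=3, cycles_per_iter=3):
    """Instructions a waiting processor executes while spinning for hold_cycles."""
    iterations = hold_cycles // cycles_per_iter
    return iterations * instrs_per_iter

USEFUL_INSTRS = 1000   # same useful work in both configurations
MISS_PENALTY = 200     # assumed cycles per L2 miss

for label, misses in [("smaller L2", 40), ("larger L2", 10)]:
    hold = critical_section_cycles(USEFUL_INSTRS, misses, MISS_PENALTY)
    total = USEFUL_INSTRS + spin_instructions(hold)
    print(f"{label}: holder takes {hold} cycles, "
          f"holder + waiter execute {total} instructions")
```

The useful work is identical in both runs, yet the total instruction count differs because the waiter's spin loop runs for as long as the holder is stalled on misses.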

Regards,
Dan

thethem wrote:

Thanks for responding so quickly, Dan!  Let me clarify...

> Actually, the count of Ruby_cycles is the performance metric in Ruby.
> It is true that they are proportional to the simulation time, but
> simulation time (host execution time) is proportional to simulated
> time (target execution time).

If this is true, then some of my results are confusing. For example, all else being equal, a larger L2 cache can result in a longer execution time even with fewer cache misses. This seems backwards to me. I'll list some parameters and results below. The only item that was changed from experiment to experiment was the L2 size:

CMP 1x8 (one chip 8 procs per chip)
2MB L2
     Ruby_cycles: 2388071860
     instruction_executed: 22174042603
     cycles_per_instruction: 0.861574

CMP 1x8 (one chip 8 procs per chip)
1MB L2
     Ruby_cycles: 2638412275
     instruction_executed: 24980575450
     cycles_per_instruction: 0.844948

CMP 1x8 (one chip 8 procs per chip)
512kB L2
     Ruby_cycles: 2470006953
     instruction_executed: 23467006962
     cycles_per_instruction: 0.842036
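
As a quick arithmetic check on the three listings above, the reported cycles_per_instruction values appear to match Ruby_cycles multiplied by the processor count and divided by the total instruction count. That relationship is inferred from the numbers themselves and is an assumption, not something confirmed from the Ruby source. A short Python sketch:

```python
# Figures copied from the three runs above; the formula is an inference.
NUM_PROCS = 8  # CMP 1x8

runs = {
    "2MB":   (2388071860, 22174042603, 0.861574),
    "1MB":   (2638412275, 24980575450, 0.844948),
    "512kB": (2470006953, 23467006962, 0.842036),
}

def per_processor_cpi(ruby_cycles, instructions, procs=NUM_PROCS):
    """Assumed relationship: total processor-cycles divided by total instructions."""
    return ruby_cycles * procs / instructions

for label, (cycles, instrs, reported) in runs.items():
    computed = per_processor_cpi(cycles, instrs)
    print(f"{label}: computed {computed:.6f}, reported {reported:.6f}")
```

If that reading is right, CPI stays nearly flat across the runs because cycles and instructions rise and fall together, so Ruby_cycles is the more useful number to compare.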


I'm confused about the difference in instruction count. Since they are executing the same number of instructions, according to Simics, shouldn't the instructions that Ruby sees be approximately the same from experiment to experiment?

As for Ruby cycles, I would expect the count to decrease for larger cache sizes (up to a point). The miss rate that Ruby is reporting makes sense, but I can't seem to figure out what's happening with the simulation time. The results above are from the Splash-2 benchmark Ocean. I have results from dbench2 which are similar.

Thanks for your time, Dan.

~Clay

Dan Gibson wrote:
Hello, Clay.

Let me answer your questions individually, below.

At 08:37 AM 11/30/2005 -0500, you wrote:

Hello everyone,

I've got several questions about the output from the Ruby module so I'll
just list them below.

-Is the L1 instruction cache assumed to be perfect?
If by "perfect" you mean zero-cycle latency, then yes. This is the default setting for Ruby. However, this can be turned off by editing ruby/config/rubyconfig.defaults, setting REMOVE_SINGLE_CYCLE_DCACHE_FAST_PATH to true, and then selecting L1 parameters in the same file to suit your needs.

The L1 is *not* infinite in size.



-Is there a performance metric in the Ruby output?  One can't use number
of cycles or CPI when comparing different cache implementations because
the number of instructions is different and the ruby cycle time appears
to be a function of the simulation time.
Actually, the count of Ruby_cycles is the performance metric in Ruby. It is true that they are proportional to the simulation time, but simulation time (host execution time) is proportional to simulated time (target execution time).



-Why is the instruction count different if the simulation starts from
the same checkpoint and terminates at the same flag?  It is, of course,
the same if you run the same model repeatedly.  However, if the cache
size is changed from 1MB L2 to, say, 8MB L2 then the instruction count
changes.
Are you reporting a value from Simics or from Ruby? Ruby_cycles is the measure of simulated time required, so it will definitely change with changing cache parameters. If you're talking about Ruby_cycles, that is the explanation.

Also, are you simulating multiprocessor or single-processor systems?



Thanks in advance,
Clay

Regards,
Dan
