Thanks for responding so quickly, Dan! Let me clarify...
> Actually, the count of Ruby_cycles is the performance metric in Ruby.
> It is true that they are proportional to the simulation time, but
> simulation time (host execution time) is proportional to simulated
> time (target execution time).
If this is true, then some of my results are confusing. For example, all
other parameters being equal, a larger L2 cache results in a longer
execution time even with fewer cache misses. That seems backwards to me.
I'll list some parameters and results below; the only item changed from
experiment to experiment was the L2 size:
CMP 1x8 (one chip, 8 procs per chip):

  L2 size   Ruby_cycles   instruction_executed   cycles_per_instruction
  2MB       2388071860    22174042603            0.861574
  1MB       2638412275    24980575450            0.844948
  512kB     2470006953    23467006962            0.842036
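As a sanity check on my numbers: the reported cycles_per_instruction appears
to be Ruby_cycles scaled up by the 8 processors and divided by
instruction_executed. A quick Python script (my own reconstruction, not
anything Ruby prints) reproduces all three values:

```python
# Check: CPI looks like aggregate cycles (Ruby_cycles x 8 procs)
# divided by total instructions executed across the chip.
runs = [
    # (L2 size, Ruby_cycles, instruction_executed, reported CPI)
    ("2MB",   2388071860, 22174042603, 0.861574),
    ("1MB",   2638412275, 24980575450, 0.844948),
    ("512kB", 2470006953, 23467006962, 0.842036),
]
NUM_PROCS = 8  # CMP 1x8

for l2, cycles, insns, reported_cpi in runs:
    cpi = cycles * NUM_PROCS / insns
    print(f"{l2}: computed CPI {cpi:.6f}, reported {reported_cpi}")
    assert abs(cpi - reported_cpi) < 1e-5
```

So the per-run CPI figures are at least internally consistent with the
cycle and instruction counts.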
I'm confused about the difference in instruction count. Since they are
executing the same number of instructions according to Simics, shouldn't
the instruction counts that Ruby sees be approximately the same from
experiment to experiment?
As for Ruby_cycles, I would expect it to decrease for larger cache sizes
(up to a point). The miss rate that Ruby is reporting makes sense, but
I can't seem to figure out what's happening with the simulation time. The
results above are from the SPLASH-2 benchmark Ocean; I have similar
results from dbench2.
Thanks for your time, Dan.
~Clay
Dan Gibson wrote:
Hello, Clay.
Let me answer your questions individually, below.
At 08:37 AM 11/30/2005 -0500, you wrote:
> Hello everyone,
> I've got several questions about the output from the Ruby module, so I'll
> just list them below.
> -Is the L1 instruction cache assumed to be perfect?
If by "perfect" you mean zero-cycle latency, then yes; this is the default
setting for Ruby. However, it can be turned off by editing
ruby/config/rubyconfig.defaults: set
REMOVE_SINGLE_CYCLE_DCACHE_FAST_PATH to true, then choose L1
parameters in the same file to suit your needs.
The L1 is *not* infinite in size.
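For example, the relevant line in rubyconfig.defaults would look something
like this (a sketch only; the surrounding L1 parameter names vary by GEMS
version, so check the file itself for the exact ones):

```
REMOVE_SINGLE_CYCLE_DCACHE_FAST_PATH: true
```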
> -Is there a performance metric in the Ruby output? One can't use number
> of cycles or CPI when comparing different cache implementations because
> the number of instructions is different and the ruby cycle time appears
> to be a function of the simulation time.
Actually, the count of Ruby_cycles is the performance metric in Ruby. It is
true that they are proportional to the simulation time, but simulation time
(host execution time) is proportional to simulated time (target execution
time).
> -Why is the instruction count different if the simulation starts from
> the same checkpoint and terminates at the same flag? It is, of course,
> the same if you run the same model repeatedly. However, if the cache
> size is changed from 1MB L2 to, say, 8MB L2 then the instruction count
> changes.
Are you reporting a value from Simics or from Ruby? Ruby_cycles is the
measure of simulated time required, so it will definitely change with
changing cache parameters. If you're talking about Ruby_cycles, that is the
explanation.
Also, are you simulating multiprocessor or single-processor systems?
> Thanks in advance,
> Clay
Regards,
Dan