Re: [Gems-users] [MOESI CMP TOKEN] Several Questions


Date: Wed, 30 Nov 2005 10:39:24 -0600
From: "Min Xu (Hsu)" <xu@xxxxxxxxxxx>
Subject: Re: [Gems-users] [MOESI CMP TOKEN] Several Questions
On Wed, 30 Nov 2005 thethem wrote :
> Thanks for responding so quickly, Dan!  Let me clarify...
> 
>  > Actually, the count of Ruby_cycles is the performance metric in Ruby.
>  > It is true that they are proportional to the simulation time, but
>  > simulation time (host execution time) is proportional to simulated
>  > time (target execution time).
> 
> If this is true then some of my results are confusing.  For example, all 
> else being equal, a larger L2 cache results in a longer execution time 
> even with fewer cache misses.  This seems backwards to me.  I'll list 
> some parameters and results below.  The only item that was changed from 
> experiment to experiment was the L2 size:
> 
> CMP 1x8 (one chip 8 procs per chip)
> 2MB L2
>       Ruby_cycles: 2388071860
>       instruction_executed: 22174042603
>       cycles_per_instruction: 0.861574

CPI is probably not a suitable performance metric for your
parallel workloads. For example, your smaller cache may cause
a processor to spin through more loop iterations while it
waits to acquire a busy lock (because the lock holder is taking
longer to finish its work). The spinning does no real work,
but it increases the number of instructions executed.

For more information, see

@Article(alameldeen:simulation-challenges:ieeecomputer:2003,
  author =       "Alaa R. Alameldeen and Milo M. K. Martin and Carl J. Mauer
                  and Kevin E. Moore and Min Xu and Daniel J. Sorin and Mark
                  D. Hill and David A. Wood",
  title =        "Simulating a \$2M Commercial Server on a \$2K PC",
  journal =      IEEECOMPUTER,
  month =        Feb,
  year =         2003,
  volume =       36,
  number =       2,
  pages =        "50-57",
  url =          "http://dx.doi.org/10.1109/MC.2003.1178046",
  topic =        "multifacet, wisconsin, workload, simulation",
)
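
As a rough illustration of why the two metrics can disagree, the sketch
below ranks the three runs quoted in this message by Ruby_cycles and by
CPI. The numbers are copied from the quoted output. The CPI formula
(Ruby_cycles times 8 processors, divided by instructions) is an
assumption inferred from those numbers, not taken from the Ruby source,
and the names `runs` and `cpi` are made up for this example.

```python
# Numbers copied from the three 1x8 CMP runs quoted in this message.
NUM_PROCS = 8  # one chip, 8 processors per chip

runs = {
    "2MB":   {"ruby_cycles": 2388071860, "instructions": 22174042603},
    "1MB":   {"ruby_cycles": 2638412275, "instructions": 24980575450},
    "512kB": {"ruby_cycles": 2470006953, "instructions": 23467006962},
}

def cpi(run, num_procs=NUM_PROCS):
    # Assumption: aggregate processor cycles divided by the total
    # instruction count. This reproduces the reported
    # cycles_per_instruction values for all three runs.
    return run["ruby_cycles"] * num_procs / run["instructions"]

# Rank the configurations from best (smallest) to worst under each metric.
by_cycles = sorted(runs, key=lambda name: runs[name]["ruby_cycles"])
by_cpi = sorted(runs, key=lambda name: cpi(runs[name]))

for name in by_cycles:
    r = runs[name]
    print(f"{name:>6}: cycles={r['ruby_cycles']}  "
          f"instructions={r['instructions']}  CPI={cpi(r):.6f}")

print("best-to-worst by Ruby_cycles:", by_cycles)
print("best-to-worst by CPI:        ", by_cpi)
```

With these inputs the 2MB run finishes in the fewest Ruby_cycles yet
has the highest CPI, because it also executes the fewest instructions.
That is consistent with extra spinning in the other two runs, although
these numbers alone do not prove that spinning is the cause.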

> 
> CMP 1x8 (one chip 8 procs per chip)
> 1MB L2
>       Ruby_cycles: 2638412275
>       instruction_executed: 24980575450
>       cycles_per_instruction: 0.844948
> 
> CMP 1x8 (one chip 8 procs per chip)
> 512kB L2
>       Ruby_cycles: 2470006953
>       instruction_executed: 23467006962
>       cycles_per_instruction: 0.842036
> 
> 
> I'm confused about the difference in instruction count.  Since they are 
> executing the same number of instructions, according to Simics, 
> shouldn't the instructions that Ruby sees be approximately the same from 
> experiment to experiment?
> 
> As for Ruby cycles, I would expect it to decrease for larger cache sizes 
> (up to a point).  The miss rate that Ruby is reporting makes sense, but 
> I can't seem to figure what's happening with the simulation time.  The 
> results above are from the Splash-2 benchmark, Ocean.  I have results 
> from dbench2 which are similar.
> 
> Thanks for your time, Dan.
> 
> ~Clay
> 
> Dan Gibson wrote:
> > Hello, Clay.
> > 
> > Let me answer your questions individually, below.
> > 
> > At 08:37 AM 11/30/2005 -0500, you wrote:
> > 
> >>Hello everyone,
> >>
> >>I've got several questions about the output from the Ruby module so I'll
> >>just list them below.
> >>
> >>-Is the L1 instruction cache assumed to be perfect?
> > 
> > 
> > If by "perfect" you mean zero-cycle latency, then yes. This is the default 
> > setting for Ruby. However, this can be turned off by editing the 
> > ruby/config/rubyconfig.defaults and setting 
> > REMOVE_SINGLE_CYCLE_DCACHE_FAST_PATH to true, and then selecting L1 
> > parameters in the same file to suit your needs.
> > 
> > The L1 is *not* infinite in size.
> > 
> > 
> > 
> >>-Is there a performance metric in the Ruby output?  One can't use number
> >>of cycles or CPI when comparing different cache implementations because
> >>the number of instructions is different and the ruby cycle time appears
> >>to be a function of the simulation time.
> > 
> > 
> > Actually, the count of Ruby_cycles is the performance metric in Ruby. It is 
> > true that they are proportional to the simulation time, but simulation time 
> > (host execution time) is proportional to simulated time (target execution 
> > time).
> > 
> > 
> > 
> >>-Why is the instruction count different if the simulation starts from
> >>the same checkpoint and terminates at the same flag?  It is, of course,
> >>the same if you run the same model repeatedly.  However, if the cache
> >>size is changed from 1MB L2 to, say, 8MB L2 then the instruction count
> >>changes.
> > 
> > 
> > Are you reporting a value from Simics or from Ruby? Ruby_cycles is the 
> > measure of simulated time required, so it will definitely change with 
> > changing cache parameters. If you're talking about Ruby_cycles, that is the 
> > explanation.
> > 
> > Also, are you simulating multiprocessor or single-processor 
> > systems?
> > 
> > 
> > 
> >>Thanks in advance,
> >>Clay
> > 
> > 
> > Regards,
> > Dan
> 