Re: [Gems-users] [MOESI CMP TOKEN] Several Questions

Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

Date:	Wed, 30 Nov 2005 10:39:59 -0600
From:	Dan Gibson <degibson@xxxxxxxx>
Subject:	Re: [Gems-users] [MOESI CMP TOKEN] Several Questions

Those results do seem unusual. The 1MB vs 2MB data seems to make sense, though the 512KB L2 size is quite strange. What are the working-set sizes for your applications? Very large or very small working sets can be less sensitive to cache size. It could be that the working sets overwhelm the L2 regardless of the three configurations below... try using a very large (~2x working set size) L2 cache.


As for the instruction count, consider this:

In multithreaded applications, there is some interthread interaction, through data sharing, cooperative caching, and synchronization. Changing the caches changes the interactions here... suppose a processor is spinning, waiting for a lock to release. The length of the spin (and the number of instructions executed as a result of the spin) is influenced by the _other_ processors' cache performance (especially that of the thread holding the lock!).
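
The spin-wait effect described above can be sketched with a toy model. This is only an illustration under made-up assumptions, not Ruby or Simics code: the function name, the base critical-section length, the miss latency, and the instructions-per-spin-iteration figure are all hypothetical. It shows that a waiting thread's instruction count grows with the number of cache misses the lock holder takes, even though the "useful" work is unchanged.

```python
def spin_instructions(holder_misses, miss_latency_cycles,
                      base_hold_cycles=100,
                      instr_per_spin_iter=3,
                      cycles_per_spin_iter=1):
    """Instructions a waiting thread executes while spinning on a held lock.

    All parameters are hypothetical values chosen for illustration.
    """
    # The lock is held longer when the holder misses in the cache more often.
    hold_cycles = base_hold_cycles + holder_misses * miss_latency_cycles
    # The waiter repeats its test loop once per spin iteration until release.
    spin_iterations = hold_cycles // cycles_per_spin_iter
    return spin_iterations * instr_per_spin_iter


USEFUL_INSTRUCTIONS = 1000  # fixed per-thread work, identical in every run

for misses in (0, 2, 5):
    spin = spin_instructions(misses, miss_latency_cycles=200)
    total = 2 * USEFUL_INSTRUCTIONS + spin  # holder + waiter + spin overhead
    print(f"holder misses={misses}: spin instructions={spin}, total={total}")
```

The useful work is the same in all three cases; only the spin overhead changes, which is why the total instruction count can differ between cache configurations that start from the same checkpoint.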

I was initially confused by this behavior as well...it is subtle.

Regards,
Dan

thethem wrote:

Thanks for responding so quickly, Dan!  Let me clarify...

> Actually, the count of Ruby_cycles is the performance metric in Ruby.
> It is true that they are proportional to the simulation time, but
> simulation time (host execution time) is proportional to simulated
> time (target execution time).
If this is true then some of my results are confusing. For example, all else being equal, a larger L2 cache results in a longer execution time even with fewer cache misses. This seems backwards to me. I'll list some parameters and results below. The only item that was changed from experiment to experiment was the L2 size:
CMP 1x8 (one chip 8 procs per chip)
2MB L2
     Ruby_cycles: 2388071860
     instruction_executed: 22174042603
     cycles_per_instruction: 0.861574

CMP 1x8 (one chip 8 procs per chip)
1MB L2
     Ruby_cycles: 2638412275
     instruction_executed: 24980575450
     cycles_per_instruction: 0.844948

CMP 1x8 (one chip 8 procs per chip)
512kB L2
     Ruby_cycles: 2470006953
     instruction_executed: 23467006962
     cycles_per_instruction: 0.842036
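
A quick arithmetic check on the three result sets above, written as a small Python sketch. The formula used here (number of processors times Ruby_cycles, divided by instruction_executed) is an inference from these three data points only, not a documented definition of Ruby's cycles_per_instruction; the dictionary layout and function name are likewise just for illustration.

```python
NUM_PROCS = 8  # "CMP 1x8": one chip, 8 processors per chip

# Values copied from the three runs listed above.
runs = {
    "2MB":   {"ruby_cycles": 2388071860, "instructions": 22174042603, "reported_cpi": 0.861574},
    "1MB":   {"ruby_cycles": 2638412275, "instructions": 24980575450, "reported_cpi": 0.844948},
    "512kB": {"ruby_cycles": 2470006953, "instructions": 23467006962, "reported_cpi": 0.842036},
}


def aggregate_cpi(ruby_cycles, instructions, num_procs=NUM_PROCS):
    """Assumed formula: total processor-cycles divided by total instructions."""
    return num_procs * ruby_cycles / instructions


for size, r in runs.items():
    cpi = aggregate_cpi(r["ruby_cycles"], r["instructions"])
    print(f"{size:>6} L2: Ruby_cycles={r['ruby_cycles']}  "
          f"computed CPI={cpi:.6f}  reported CPI={r['reported_cpi']}")
```

Under that assumption the computed values agree with the reported ones to six decimal places. Note also that Ruby_cycles and instruction_executed rise and fall together across the three runs (2MB lowest, 1MB highest), so the CPI stays nearly flat while both raw counts move.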
I'm confused about the difference in instruction count. Since they are executing the same number of instructions, according to Simics, shouldn't the instructions that Ruby sees be approximately the same from experiment to experiment?
As for Ruby cycles, I would expect it to decrease for larger cache sizes (up to a point). The miss rate that Ruby is reporting makes sense, but I can't seem to figure what's happening with the simulation time. The results above are from the Splash-2 benchmark, Ocean. I have results from dbench2 which are similar.
Thanks for your time, Dan.

~Clay

Dan Gibson wrote:
Hello, Clay.

Let me answer your questions individually, below.

At 08:37 AM 11/30/2005 -0500, you wrote:
> Hello everyone,
>
> I've got several questions about the output from the Ruby module so I'll
> just list them below.
>
> -Is the L1 instruction cache assumed to be perfect?

If by "perfect" you mean zero-cycle latency, then yes. This is the default setting for Ruby. However, this can be turned off by editing ruby/config/rubyconfig.defaults and setting REMOVE_SINGLE_CYCLE_DCACHE_FAST_PATH to true, and then selecting L1 parameters in the same file to suit your needs.
The L1 is *not* infinite in size.

> -Is there a performance metric in the Ruby output?  One can't use number
> of cycles or CPI when comparing different cache implementations because
> the number of instructions is different and the ruby cycle time appears
> to be a function of the simulation time.

Actually, the count of Ruby_cycles is the performance metric in Ruby. It is true that they are proportional to the simulation time, but simulation time (host execution time) is proportional to simulated time (target execution time).

> -Why is the instruction count different if the simulation starts from
> the same checkpoint and terminates at the same flag?  It is, of course,
> the same if you run the same model repeatedly.  However, if the cache
> size is changed from 1MB L2 to, say, 8MB L2 then the instruction count
> changes.

Are you reporting a value from Simics or from Ruby? Ruby_cycles is the measure of simulated time required, so it will definitely change with changing cache parameters. If you're talking about Ruby_cycles, that is the explanation.
Also, are you simulating multiprocessor or single-processor systems?

> Thanks in advance,
> Clay
Regards,
Dan