Re: [Gems-users] Low IPC in Opal


Date: Tue, 18 Nov 2008 13:22:43 -0600
From: "Dan Gibson" <degibson@xxxxxxxx>
Subject: Re: [Gems-users] Low IPC in Opal
Daniel,
Woo's SPLASH-2 paper suggests that FFT has a working set of roughly 8MB. How big are your caches? How do your cache miss rates compare to those in Figure 3 of the paper?

Also, what is your effective memory latency? A low hit rate and a high latency can still cripple big windows, if misses are data-dependent.
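
To put a rough number on the latency question, here is a back-of-the-envelope sketch. It takes the 512-entry window and width of 4 from the original message; the miss latency is a made-up value for illustration, not something measured, and the model assumes a single blocking miss at the head of the window with no overlap between misses. At best, fetch refills an empty window in window_size / width cycles, so any miss that takes longer than that leaves fetch stalled on a full window for the remainder.

```python
def window_fill_cycles(window_size: int, fetch_width: int) -> float:
    """Best-case cycles for fetch to fill an empty instruction window."""
    return window_size / fetch_width


def stall_cycles_per_miss(window_size: int, fetch_width: int, miss_latency: int) -> float:
    """Cycles fetch sits idle behind one blocking miss at the head of the window."""
    return max(0.0, miss_latency - window_fill_cycles(window_size, fetch_width))


# Window size and width come from the original message; the latency is hypothetical.
WINDOW_SIZE = 512
FETCH_WIDTH = 4
ASSUMED_MISS_LATENCY = 400  # hypothetical value, in cycles

fill = window_fill_cycles(WINDOW_SIZE, FETCH_WIDTH)
stall = stall_cycles_per_miss(WINDOW_SIZE, FETCH_WIDTH, ASSUMED_MISS_LATENCY)
print(f"window fills in {fill:.0f} cycles; one blocking miss stalls fetch for about {stall:.0f} cycles")
```

If the effective latency is well above the fill time, a few dependent misses can be enough to keep the window full most of the time, whatever its size.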

Regards,
Dan

On Tue, Nov 18, 2008 at 11:59 AM, Daniel Sánchez Pedreño <sanatox@xxxxxxxxx> wrote:
Dear list,

I am seeing a very low IPC when using Opal in GEMS 2.1: around 0.6 with a processor width of 4. For example, for the FFT application from SPLASH-2 running on a single processor, the results are:

Total number of instructions                         57450293
Total number of cycles                               89828116
number of continue calls                             57450293
Instruction per cycle:                             0.639558

I have also seen that the fetch stage is usually stalled by window-full events, which account for around 50% of the simulation's cycles:

Reasons for fetch stalls:
Fetch ready         :                        0   0.00%
Fetch i-cache miss  :                1,553,619   2.73%
Fetch squash        :                      167   0.00%
Fetch I-TLB miss    :                   20,351   0.04%
Window Full         :               43,860,074  77.14%
Fetch Barrier       :               11,421,297  20.09%
Write Buffer Full   :                        0   0.00%
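
The figures above can be cross-checked with a few lines of Python. This is only a sketch over the numbers pasted in this message. It assumes each stall count corresponds to one stalled cycle, which is an interpretation of the Opal output rather than something the output states. Under that reading, the percentage column is relative to the total number of fetch stalls, while the "around 50%" figure is Window Full relative to total cycles.

```python
# Counters copied from the Opal output above.
instructions = 57_450_293
cycles = 89_828_116

fetch_stalls = {
    "Fetch ready": 0,
    "Fetch i-cache miss": 1_553_619,
    "Fetch squash": 167,
    "Fetch I-TLB miss": 20_351,
    "Window Full": 43_860_074,
    "Fetch Barrier": 11_421_297,
    "Write Buffer Full": 0,
}

ipc = instructions / cycles
total_stalls = sum(fetch_stalls.values())

# Share of all fetch stalls (what the percentage column appears to report).
window_full_of_stalls = fetch_stalls["Window Full"] / total_stalls

# Share of all simulated cycles (assumes one stall event == one cycle).
window_full_of_cycles = fetch_stalls["Window Full"] / cycles

print(f"IPC                        : {ipc:.6f}")
print(f"Window Full / fetch stalls : {window_full_of_stalls:.2%}")
print(f"Window Full / total cycles : {window_full_of_cycles:.2%}")
```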

However, the ROB size in this experiment is 1024 entries, while the window size is 512. Additionally, the L1 cache miss ratio is 3%.

Finally, the Retire stage is stalled because of the following events:

Retire Not-Ready Stage Histogram
FETCH_STAGE   = 201993   ( 0.249%)
DECODE_STAGE   = 202944   ( 0.250%)
READY_STAGE   = 8931472   (11.022%)
EXECUTE_STAGE   = 35066395   (43.273%)
CACHE_MISS_STAGE   = 23505376   (29.006%)
CACHE_NOTREADY_STAGE   = 4287931   ( 5.291%)
COMPLETE_STAGE   = 8839216   (10.908%)
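
Grouping the histogram entries gives a quick view of how much of the retire stall time is tied to the memory system. The sketch below only re-adds the counts listed above. Treating the two CACHE_* stages together as "memory-related" is a grouping chosen here for illustration, not a category defined by Opal.

```python
# Counts copied from the Retire Not-Ready Stage Histogram above.
retire_not_ready = {
    "FETCH_STAGE": 201_993,
    "DECODE_STAGE": 202_944,
    "READY_STAGE": 8_931_472,
    "EXECUTE_STAGE": 35_066_395,
    "CACHE_MISS_STAGE": 23_505_376,
    "CACHE_NOTREADY_STAGE": 4_287_931,
    "COMPLETE_STAGE": 8_839_216,
}

total = sum(retire_not_ready.values())

# Illustrative grouping: both cache-related stages counted as memory stalls.
memory_related = (
    retire_not_ready["CACHE_MISS_STAGE"] + retire_not_ready["CACHE_NOTREADY_STAGE"]
)

memory_share = memory_related / total
execute_share = retire_not_ready["EXECUTE_STAGE"] / total

print(f"memory-related share : {memory_share:.1%}")
print(f"EXECUTE_STAGE share  : {execute_share:.1%}")
```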

So the question is: is this low IPC correct? Given these results, what can I do to increase performance?

Thank you.

--
http://www.cs.wisc.edu/~gibson [esc]:wq!