Re: [Gems-users] About memory latency for CMP


Date: Sat, 29 Jan 2011 21:34:32 -0500
From: Abdullah Kayi <apokayi@xxxxxxxxxxxxxx>
Subject: Re: [Gems-users] About memory latency for CMP

On 1/29/11 8:52 PM, "Byn Choi" <bynchoi1@xxxxxxxxxxxx> wrote:

>
>On Jan 29, 2011, at 6:05 PM, junli gu wrote:
>
>> Hey all:
>>
>>   I am simulating a 16-core CMP using Simics+Ruby. As I understand
>> it, the latency values are all in Ruby cycles, and 1 Ruby cycle
>> equals 2 CPU cycles. I am using the following default values:
>
>That depends on the SIMICS_RUBY_MULTIPLIER parameter, which is 4 by
>default, meaning Simics is advanced 4 cycles for every Ruby cycle. I
>personally use 1, since the processor modeled by Simics is a very
>simple single-issue, in-order, 5-stage processor.
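
To make the conversion concrete, here is a minimal sketch of how a Ruby-cycle
latency maps to Simics (CPU) cycles for a given multiplier. The helper name
ruby_to_cpu_cycles is hypothetical and not part of GEMS; it only restates the
arithmetic described above.

```python
# Hypothetical helper, not part of GEMS: restates the multiplier arithmetic.
def ruby_to_cpu_cycles(ruby_cycles: int, multiplier: int) -> int:
    """Convert a latency in Ruby cycles to Simics (CPU) cycles.

    Assumes Simics advances `multiplier` cycles per Ruby cycle, as
    SIMICS_RUBY_MULTIPLIER is described in this thread.
    """
    if multiplier < 1:
        raise ValueError("multiplier must be at least 1")
    return ruby_cycles * multiplier


MEMORY_LATENCY = 35  # Ruby cycles, from the defaults quoted in this thread

for multiplier in (1, 2, 4):
    cpu_cycles = ruby_to_cpu_cycles(MEMORY_LATENCY, multiplier)
    print(f"multiplier={multiplier}: {MEMORY_LATENCY} Ruby cycles = {cpu_cycles} CPU cycles")
```

With a multiplier of 2 this reproduces the 70 CPU cycles assumed in the
question; with the default of 4 the same setting corresponds to 140.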
>
>>
>> NULL_LATENCY:                      1    ; Shortest possible latency
>> ISSUE_LATENCY:                     2    ; Latency to send out a request to the interconnect
>> CACHE_LATENCY:                     1    ; Latency to source data from a cache to the interconnect
>> MEMORY_LATENCY:                    35   ; Latency to source data from a memory module to the interconnect
>> DIRECTORY_LATENCY:                 1    ; Latency of directory lookup
>> NETWORK_LINK_LATENCY:              1    ; Latency for a single node-to-node hop in the interconnect
>> SEQUENCER_TO_CONTROLLER_LATENCY:   8    ; Latency added by sequencer to requests to cache controller
>> TRANSITIONS_PER_RUBY_CYCLE:        32   ; Maximum transitions per cycle for all SLICC state machines
>> SEQUENCER_OUTSTANDING_REQUESTS:    20   ; Number of outstanding requests per sequencer
>>
>> My questions are:
>>
>> A) I believe the L2 cache latency and memory latency are supposed
>> to be 10 and 35 Ruby cycles, which means 20 and 70 CPU cycles. Am I
>> right?
>
>This depends on how far the L2 bank is from the requestor. The
>latency varies with the number of hops and routers that the request
>has to traverse.
>
>>
>> B) Are these numbers realistic? I mean, do they match the ones in
>> real products?
>
>The following paper has detailed latency numbers from the Intel
>Nehalem and AMD Shanghai chips.
>
>Comparing Cache Architectures and Coherency Protocols on x86-64
>Multicore SMP Systems (MICRO'09)
>
>>
>> C) For larger CMPs, such as 16 or even 32 cores, how should these
>> numbers change? I guess that with more cores the interconnection
>> latency and memory latency will also increase? Also, I am not sure
>> whether NETWORK_LINK_LATENCY: 1 is too small.
>
>The per-hop interconnection latency and memory latency (memory look up
>time) should remain unchanged here. Again, as mentioned in A), the
>overall (average) latency would increase due to increased
>interconnection diameter.
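
As a rough illustration of that last point, the sketch below computes the
mean hop count between distinct nodes and multiplies it by a fixed per-hop
link latency. It assumes a 2D mesh with dimension-ordered routing and uniform
traffic, and ignores router delay and contention; none of that comes from the
Ruby defaults. The function name mean_hops and the 4x4 and 4x8 layouts are
hypothetical choices for the example.

```python
from itertools import product


def mean_hops(rows: int, cols: int) -> float:
    """Mean Manhattan distance between distinct nodes of a rows x cols mesh.

    Assumption for illustration only: a 2D mesh where a message travels
    |dr| + |dc| links, with every source/destination pair equally likely.
    """
    nodes = list(product(range(rows), range(cols)))
    total_hops = 0
    pair_count = 0
    for (r1, c1), (r2, c2) in product(nodes, nodes):
        if (r1, c1) != (r2, c2):
            total_hops += abs(r1 - r2) + abs(c1 - c2)
            pair_count += 1
    return total_hops / pair_count


NETWORK_LINK_LATENCY = 1  # Ruby cycles per hop, from the defaults quoted above

for rows, cols in ((4, 4), (4, 8)):
    hops = mean_hops(rows, cols)
    one_way = hops * NETWORK_LINK_LATENCY
    print(f"{rows}x{cols} mesh: {hops:.2f} mean hops, about {one_way:.2f} Ruby cycles one way")
```

The per-hop cost stays the same in both cases; only the mean distance grows
with the larger layout.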


Also, note that NETWORK_LINK_LATENCY is not used if you are using
FILE_SPECIFIED as your topology. In that case, you need to check the
parameters inside the network configuration text file under
$GEMS/ruby/simple/Network_files.

Cheers,

Abdullah

