Re: [Gems-users] About memory latency for CMP


Date: Sat, 29 Jan 2011 19:52:24 -0600
From: Byn Choi <bynchoi1@xxxxxxxxxxxx>
Subject: Re: [Gems-users] About memory latency for CMP

On Jan 29, 2011, at 6:05 PM, junli gu wrote:

Hey all:

I am simulating a 16-core CMP using Simics+Ruby. First, my understanding is that the latency values are all in Ruby cycles, where 1 Ruby cycle equals 2 CPU cycles. I am simulating the 16-core CMP with the following default values:

That depends on the SIMICS_RUBY_MULTIPLIER parameter, which defaults to 4, meaning Simics is advanced 4 processor cycles for every Ruby cycle. I personally use 1, since the processor modeled by Simics is a very simple single-issue, in-order, 5-stage processor.
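
For concreteness, here is a tiny Python sketch of the cycle conversion; the parameter name comes from the Ruby config, and the arithmetic is simply a multiplication by the multiplier:

    # Relationship between Ruby cycles and Simics (processor) cycles.
    # SIMICS_RUBY_MULTIPLIER is how many processor cycles Simics is
    # advanced per Ruby cycle; 4 is the GEMS default mentioned above.
    SIMICS_RUBY_MULTIPLIER = 4

    def ruby_to_cpu_cycles(ruby_cycles, multiplier=SIMICS_RUBY_MULTIPLIER):
        """Convert a latency in Ruby cycles to processor cycles."""
        return ruby_cycles * multiplier

    print(ruby_to_cpu_cycles(35))      # 140 with the default multiplier of 4
    print(ruby_to_cpu_cycles(35, 1))   # 35 when the multiplier is set to 1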


NULL_LATENCY: 1                      ; Shortest possible latency
ISSUE_LATENCY: 2                     ; Latency to send out a request to the interconnect
CACHE_LATENCY: 1                     ; Latency to source data from a cache to the interconnect
MEMORY_LATENCY: 35                   ; Latency to source data from a memory module to the interconnect
DIRECTORY_LATENCY: 1                 ; Latency of directory lookup
NETWORK_LINK_LATENCY: 1              ; Latency for a single node-to-node hop in the interconnect
SEQUENCER_TO_CONTROLLER_LATENCY: 8   ; Latency added by sequencer to requests to cache controller
TRANSITIONS_PER_RUBY_CYCLE: 32       ; Maximum transitions per cycle for all SLICC state machines
SEQUENCER_OUTSTANDING_REQUESTS: 20   ; Number of outstanding requests per sequencer

My questions are:

A) I want to confirm the L2 cache latency and the memory latency. They are supposed to be 10 and 35 Ruby cycles, which would mean 20 and 70 CPU cycles. Am I right?

This depends on how far the L2 bank is located with respect to the requestor. The latency will vary with the number of hops and the number of routers that the request has to go through.
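
To make that concrete, here is a rough back-of-the-envelope model in Python that adds up the main components of an L2 hit using the parameter values above. It is only a sketch of the dominant terms; the real protocol has additional stages and overlap that are not modeled here:

    # Back-of-the-envelope estimate of an L2 hit latency in Ruby cycles,
    # using the parameter values from the config above.
    SEQUENCER_TO_CONTROLLER_LATENCY = 8
    ISSUE_LATENCY = 2
    NETWORK_LINK_LATENCY = 1
    CACHE_LATENCY = 1

    def l2_hit_latency(hops_to_bank):
        """Request to a remote L2 bank plus the data response back."""
        request  = (SEQUENCER_TO_CONTROLLER_LATENCY + ISSUE_LATENCY
                    + hops_to_bank * NETWORK_LINK_LATENCY)
        response = CACHE_LATENCY + hops_to_bank * NETWORK_LINK_LATENCY
        return request + response

    # A bank one hop away vs. a bank on the far corner of a 4x4 mesh (6 hops):
    print(l2_hit_latency(1))   # 13 Ruby cycles
    print(l2_hit_latency(6))   # 23 Ruby cycles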


B) Are these numbers realistic? That is, do they match the ones in real products?

The following paper has detailed latency numbers from the Intel Nehalem and AMD Shanghai chips.

Comparing Cache Architectures and Coherency Protocols on x86-64 Multicore SMP Systems (MICRO'09)


C) For larger core counts like 16 or even 32 cores, how should these numbers change? I guess that with more cores, the interconnect latency and memory latency will also increase? Also, I am not sure whether NETWORK_LINK_LATENCY: 1 is too small.

The per-hop interconnect latency and the memory latency (memory lookup time) should remain unchanged here. Again, as mentioned in A), the overall (average) latency would increase because the interconnect diameter grows with the number of cores.
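
As a rough illustration, the Python sketch below computes the average hop count in a k x k mesh, assuming dimension-ordered (XY) routing so the hop count equals the Manhattan distance. The mesh shape and routing policy are assumptions for the example, not something Ruby fixes; the point is only that average hop count grows with core count even though NETWORK_LINK_LATENCY per hop stays the same:

    from itertools import product

    # Average hop count between uniformly chosen (source, destination)
    # tile pairs in a k x k mesh, with hop count = Manhattan distance.
    def average_hops(k):
        tiles = list(product(range(k), repeat=2))
        dists = [abs(x1 - x2) + abs(y1 - y2)
                 for (x1, y1), (x2, y2) in product(tiles, repeat=2)]
        return sum(dists) / len(dists)

    print(average_hops(4))   # 2.5 average hops for a 16-tile (4x4) mesh
    print(average_hops(8))   # 5.25 average hops for a 64-tile (8x8) mesh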

Byn


Thank you in advance!

--
************************************************
Junli Gu--谷俊丽
Coordinated Science Lab
University of Illinois at Urbana-Champaign
************************************************
_______________________________________________
Gems-users mailing list
Gems-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/gems-users
Use Google to search the GEMS Users mailing list by adding "site:https://lists.cs.wisc.edu/archive/gems-users/" to your search.


---
Byn Choi
Ph.D. Candidate in Computer Science
University of Illinois, Urbana-Champaign
