On Jan 29, 2011, at 6:05 PM, junli gu wrote:
Hey all:
I am simulating a 16-core CMP using Simics+Ruby. First, I know that
the latency values are all in Ruby cycles, which means 1 Ruby cycle
equals 2 CPU cycles. I am using the default values, as follows:
That depends on the SIMICS_RUBY_MULTIPLIER parameter, which is 4 by
default, meaning Simics is advanced 4 times for every Ruby cycle. I
personally use 1, since the processor modeled by Simics is a very
simple single-issue, in-order, 5-stage processor.
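As a rough illustration of the conversion described above, here is a minimal Python sketch. It assumes that a latency in CPU (Simics) cycles is simply the Ruby-cycle latency times SIMICS_RUBY_MULTIPLIER; the function name is made up for this example and is not part of GEMS. The 10 and 35 Ruby-cycle figures are the L2 and memory latencies quoted in the question.

```python
def ruby_to_cpu_cycles(ruby_cycles: int, multiplier: int) -> int:
    """Convert a latency in Ruby cycles to Simics (CPU) cycles.

    Assumption: CPU cycles = Ruby cycles * SIMICS_RUBY_MULTIPLIER.
    """
    return ruby_cycles * multiplier


# Compare the multiplier assumed in the question (2), the default (4),
# and the value used in this reply (1).
for multiplier in (1, 2, 4):
    l2 = ruby_to_cpu_cycles(10, multiplier)
    mem = ruby_to_cpu_cycles(35, multiplier)
    print(f"multiplier={multiplier}: L2 ~ {l2} CPU cycles, memory ~ {mem} CPU cycles")
```

Under this assumption, the 20 and 70 CPU-cycle figures only hold when the multiplier is set to 2.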
NULL_LATENCY: 1                      ; Shortest possible latency
ISSUE_LATENCY: 2                     ; Latency to send out a request to the interconnect
CACHE_LATENCY: 1                     ; Latency to source data from a cache to the interconnect
MEMORY_LATENCY: 35                   ; Latency to source data from a memory module to the interconnect
DIRECTORY_LATENCY: 1                 ; Latency of directory lookup
NETWORK_LINK_LATENCY: 1              ; Latency for a single node-to-node hop in the interconnect
SEQUENCER_TO_CONTROLLER_LATENCY: 8   ; Latency added by sequencer to requests to cache controller
TRANSITIONS_PER_RUBY_CYCLE: 32       ; Maximum transitions per cycle for all SLICC state machines
SEQUENCER_OUTSTANDING_REQUESTS: 20   ; Number of outstanding requests per sequencer
My questions are:
A) I am positive about the L2 cache latency and memory latency. They
are supposed to be 10 and 35 Ruby cycles, which means 20 and 70 CPU
cycles. Am I right?
This depends on how far the L2 bank is located from the requestor.
The latency will vary depending on the number of hops and the number
of routers that the request has to go through.
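To make the hop dependence concrete, here is a toy Python sketch. It assumes a 4x4 mesh with row-major node IDs and dimension-order (Manhattan) routing, and it counts only the per-hop link latency. The mesh layout and the function names are assumptions for illustration; this is not how Ruby computes latency internally.

```python
NETWORK_LINK_LATENCY = 1  # Ruby cycles per hop, from the defaults listed above


def mesh_hops(src: int, dst: int, width: int) -> int:
    """Manhattan distance between two node IDs on a row-major 2D mesh."""
    sx, sy = src % width, src // width
    dx, dy = dst % width, dst // width
    return abs(sx - dx) + abs(sy - dy)


def one_way_network_latency(src: int, dst: int, width: int,
                            link_latency: int = NETWORK_LINK_LATENCY) -> int:
    """Link latency accumulated over all hops (router delay is ignored)."""
    return mesh_hops(src, dst, width) * link_latency


# Core 0 requesting from a neighboring, a diagonal, and the farthest L2 bank.
for bank in (1, 5, 15):
    latency = one_way_network_latency(0, bank, width=4)
    print(f"core 0 -> bank {bank}: {latency} Ruby cycles one way")
```

In this model the farthest bank on a 4x4 mesh is 6 hops away, so the network portion of an L2 access differs by several Ruby cycles between near and far banks.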
B) Are these numbers realistic? I mean, do they match the ones in
real products?
The following paper has detailed latency numbers from the Intel
Nehalem and AMD Shanghai chips.
Comparing Cache Architectures and Coherency Protocols on x86-64
Multicore SMP Systems (MICRO'09)
C) For large core counts like 16 or even 32 cores, how should these
numbers change? I guess when we have more cores the interconnection
latency and memory latency will also increase? Also, I am not sure
whether NETWORK_LINK_LATENCY: 1 is too small.
The per-hop interconnection latency and memory latency (memory lookup
time) should remain unchanged here. Again, as mentioned in A), the
overall (average) latency would increase due to the increased
interconnection diameter.
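The effect of a larger diameter can be estimated with a short Python sketch. It assumes a 2D mesh with Manhattan routing and uniformly random traffic between distinct nodes, and compares a 4x4 layout (16 cores) with an 8x4 layout (32 cores). The topology and traffic pattern are assumptions for illustration only.

```python
from itertools import product


def average_mesh_hops(width: int, height: int) -> float:
    """Average Manhattan distance over all ordered pairs of distinct nodes."""
    nodes = list(product(range(width), range(height)))
    total = 0
    pairs = 0
    for (ax, ay), (bx, by) in product(nodes, nodes):
        if (ax, ay) == (bx, by):
            continue  # skip a node talking to itself
        total += abs(ax - bx) + abs(ay - by)
        pairs += 1
    return total / pairs


for width, height in ((4, 4), (8, 4)):
    hops = average_mesh_hops(width, height)
    print(f"{width}x{height} mesh ({width * height} nodes): {hops:.2f} hops on average")
```

With the per-hop latency held fixed, the average hop count grows from about 2.67 on the 4x4 mesh to 4.0 on the 8x4 mesh, so the average network latency grows by the same factor.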
Byn
Thank you in advance!
--
************************************************
Junli Gu--谷俊丽
Coordinated Science Lab
University of Illinois at Urbana-Champaign
************************************************
---
Byn Choi
Ph.D. Candidate in Computer Science
University of Illinois, Urbana-Champaign