Re: [Gems-users] Simulation of the Communication Bottleneck in Cache Coherency Memory Access
What I meant by bandwidth-limiting behavior in the network: each network link is modeled as a physical channel. Only one packet can be transmitted across the link at a time (even though there are multiple virtual channels/networks sharing that physical channel), so all requests coming into a Directory (for example) will be serialized by that incoming link.
Getting a timestamp for a memory request is simple: call g_EventQueue_ptr->getTime()
in Sequencer::makeRequest().
Not sure how this fixes your problem, but you can certainly add a timestamp to a memory request message. There are also timestamps associated with entries in a MessageBuffer, which tell the time at which the message entered the buffer.
I'm not positive, but I think you'll find this is a little more involved than what you're proposing. The time at which a memory operation is initiated has little relation to when it arrives at a remote cache or directory. I'm not sure what you mean by "there's no timing relevant to synchronization within SLICC". Look at the generated code to see what the Wakeup method for a SLICC component actually does. (The timestamp of the message at the head of a MessageBuffer indicates whether it's "ready", and the message will not be removed from the queue until the global time is greater than or equal to that timestamp.)
Also, be aware that Ruby time and Simics time are not the same. Ruby keeps its own cycle count, accessible through the getTime() method mentioned above.
Thanks for confirming that what I'm seeing in the code is correct.
Sorry, I was describing the worst-case scenario for the behaviour and I
didn't make that clear. A fixed latency is added for each type of
transaction between memory components, so I assume that's what you
mean by the bandwidth-limiting behaviour.
I think I can resolve this issue in SLICC, but it requires
timestamps for each memory access. There's no timing relevant to
synchronisation within SLICC, only a latency calculation for the
memory access.
My current direction for solving the issue is to check whether the
interface between SIMICS and Ruby carries timestamp information
for when memory accesses occur. With that information it would
be easy to calculate the busy time for each port queue.
Greg, can you confirm whether Ruby can obtain the timestamp for
memory accesses from the SIMICS interface? It would hearten me
greatly if you could tell me it's possible, or speed things up if
it's not. I still haven't digested all the documentation
for the SIMICS model builder and how Ruby interfaces to SIMICS.
Regards,
John Shield
On 04/27/2011 06:08 PM, Greg Byrd wrote:
I agree with your description of the problem. I don't
think it's quite as severe as you describe, because there is
bandwidth-limiting behavior in the network. So you won't get a
burst of requests arriving simultaneously at the Directory (or
wherever). But I agree that "busy time" of the resource is not
modeled, and multiple actions can overlap.
The best way to fix this would be to change SLICC (or, more
precisely, the C++ code generated by SLICC) to include a notion
of busy time for a component. The components would not process
any in_port actions until the busy time has expired.
You can probably model this without changing SLICC by adding
a "delay" function to MessageBuffer, to increase the timestamps
of messages waiting in the queue, so that they are not "seen"
until the component is not busy. (Be careful to avoid
starvation on the incoming queues, since they are checked in a
fixed order by the Wakeup method.)
Firstly, can anyone confirm that the SLICC cache coherency
policies do not consider the wait time caused by other
accesses sent to the Directory system (as modelled in the
protocols)?
When going through the protocols, it appears that cache
messages do not compete for the resources of the Directory.
There are queues in the SLICC description, but the wait time
for earlier messages in a queue doesn't seem to make a
difference to the latency of later messages. This would
also be a problem for each individual cache, which can
process an unlimited number of external requests in the time
it takes to do one request.
To make the problem clear: all the components have
infinite parallel bandwidth, including the main memory.
This behaviour means that the additional latency caused by
multiple simultaneous messages is ignored. Ten messages
arriving at the Directory are processed with the same latency
as a single message, because the latency of ten messages
competing for communication bandwidth is never incurred.
Secondly, if what I'm seeing is correct (a lack of bottleneck
calculation), does anyone know a way this problem can be
fixed? I think I could fix it myself: if I could
access the relative timing of cache requests, then I could
add up the bottleneck latencies of the input queues as part
of the coherency protocol.
The communication bottleneck is the main problem research
needs to solve in the design of cache coherency
architectures. Without some kind of bottleneck behaviour
being modelled, the cache coherency results would be poor:
they would be for an infinite-bandwidth system, and would
not show the performance losses of a badly designed cache
coherency scheme.
Background of my own work:
I want to build some non-standard coherency policies to
relieve communication bottleneck problems. However, to do
this I need the simulation to model the bottleneck
problems. Furthermore, I was planning on adding a "main
memory" description to simulate the main memory bottleneck
and to allow for writeback (currently not supported in the
Ruby coherency protocols). Writeback is also necessary to fix
the modelling problem of infinite cache size in the SLICC
description.