Re: [Gems-users] Simulation of the Communication Bottleneck in Cache Coherency Memory Access


Date: Wed, 27 Apr 2011 19:21:56 +0200
From: John Shield <john.shield@xxxxxxxxxxx>
Subject: Re: [Gems-users] Simulation of the Communication Bottleneck in Cache Coherency Memory Access
Hi Greg,

Thanks for confirming that what I'm seeing in the code is correct. Sorry, I was stating the worst-case scenario for the behaviour and didn't specify that. A fixed latency is added for each type of transaction between memory components, so I assume that's what you mean by the bandwidth-limiting behaviour.

I think I can resolve this issue in SLICC, but it requires time stamps for each memory access. There's no timing relevant to synchronisation within SLICC, only a latency calculation for each memory access.

My current direction for solving the issue is checking whether the interface between SIMICS and Ruby contains time stamp information for when memory accesses occur. With time stamp information it would be straightforward to calculate the busy time for each port queue.
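To make the idea concrete, here is a minimal sketch of the busy-time calculation I have in mind, assuming arrival time stamps are available. The function name and types are illustrative only, not part of the Ruby or SIMICS API: given per-message arrival cycles at a port and a fixed service latency, it serialises the messages and reports when each one actually completes.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical sketch: serialise messages through one port.
// 'arrivals' holds arrival cycles in order; 'serviceLatency' is the
// fixed per-message latency. Returns the completion cycle of each message.
std::vector<long> completionTimes(const std::vector<long>& arrivals,
                                  long serviceLatency) {
    std::vector<long> done;
    long portFreeAt = 0;  // cycle at which the port finishes its current message
    for (long t : arrivals) {
        long start = std::max(t, portFreeAt);  // wait if the port is busy
        portFreeAt = start + serviceLatency;   // occupy the port
        done.push_back(portFreeAt);
    }
    return done;
}
```

For example, ten messages arriving together with a latency of 10 cycles would finish at cycles 10, 20, ..., 100 rather than all at cycle 10, which is exactly the bottleneck behaviour currently missing.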

Greg, can you confirm whether Ruby can obtain the time stamp for memory accesses from the SIMICS interface? It would hearten me greatly if you could tell me it's possible, or speed things up if it's not possible. I still haven't digested all the documentation for the SIMICS model builder and how Ruby interfaces to SIMICS.

Regards,

John Shield

On 04/27/2011 06:08 PM, Greg Byrd wrote:
I agree with your description of the problem.  I don't think it's quite as severe as you describe, because there is bandwidth-limiting behavior in the network.  So you won't get a burst of requests arriving simultaneously at the Directory (or wherever).  But I agree that "busy time" of the resource is not modeled, and multiple actions can overlap.

The best way to fix this would be to change SLICC (or, more precisely, the C++ code generated by SLICC) to include a notion of busy time for a component.  The components would not process any in_port actions until the busy time has expired.
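A minimal sketch of that busy-time notion, with hypothetical names (m_busyUntil, wakeup); the real change would live in the C++ that SLICC generates for each state machine:

```cpp
// Illustrative sketch only: a component that refuses to service its
// in_ports while its busy time has not yet expired.
class Component {
public:
    // Called each cycle; returns true if one in_port action was processed.
    bool wakeup(long currentCycle, long actionLatency) {
        if (currentCycle < m_busyUntil)
            return false;  // still busy: messages stay queued
        m_busyUntil = currentCycle + actionLatency;  // occupy the component
        return true;       // process exactly one in_port action
    }
private:
    long m_busyUntil = 0;  // first cycle at which the component is free
};
```

With this in place, a burst of requests would be drained one action-latency apart instead of all in the same cycle.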

You can probably model this without changing SLICC by adding a "delay" function to MessageBuffer, to increase the timestamps of messages waiting in the queue, so that they are not "seen" until the component is not busy.  (Be careful to avoid starvation on the incoming queues, since they are checked in a fixed order by the Wakeup method.)
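The delay idea might look something like the following sketch. Msg and delay() are illustrative names, not the real MessageBuffer interface; the point is only that pushing each queued message's visible time out to the end of the busy period hides it from the consumer until the component is free:

```cpp
#include <vector>

// Illustrative sketch of a MessageBuffer with a "delay" operation.
struct Msg { long readyTime; };  // cycle at which the message becomes visible

class MessageBuffer {
public:
    void enqueue(long readyTime) { m_queue.push_back({readyTime}); }

    // Delay every message that would become visible before 'busyUntil'.
    void delay(long busyUntil) {
        for (Msg& m : m_queue)
            if (m.readyTime < busyUntil)
                m.readyTime = busyUntil;
    }

    // True if any queued message is visible at cycle 'now'.
    bool isReady(long now) const {
        for (const Msg& m : m_queue)
            if (m.readyTime <= now) return true;
        return false;
    }
private:
    std::vector<Msg> m_queue;
};
```

As Greg notes, the starvation caveat matters: since Wakeup checks the in_ports in a fixed order, delaying one queue repeatedly while others are drained could starve it.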

...Greg
 



On Wed, Apr 27, 2011 at 11:58 AM, John Shield <john.shield@xxxxxxxxxxx> wrote:
Dear all,

I'm going to ask two things.

Firstly, can anyone confirm that the SLICC cache coherency policies do not consider the wait time caused by other accesses sent to the Directory system (as modelled in the protocols)?

When going through the protocols, it appears that cache messages do not compete for the resources of the Directory. There are queues in the SLICC description, but the wait time for earlier messages in the queue doesn't seem to make a difference to the latency of later messages. This would also be a problem for each individual cache, which can process an unlimited number of external requests in the time it takes to do one request.

To make the problem clear, all the components have infinite parallel bandwidth, including the main memory.

This behaviour means that the additional latency caused by multiple simultaneous messages is ignored. Ten messages arriving at the Directory are processed with the same latency as a single message, because the latency of ten messages competing for communication bandwidth is never incurred.

Secondly, if what I'm seeing is correct (a lack of bottleneck calculation), does anyone know a way this problem can be fixed? I think I could fix it myself if I could access the relative timing of cache requests; I could then add up the bottleneck latencies of the input queues as part of the coherency protocol.

The communication bottleneck is the main problem that research needs to solve in the design of cache coherency architectures. Without some kind of bottleneck behaviour being modelled, the cache coherency results would be poor: they would describe an infinite-bandwidth system, and the simulation would not model the performance losses of a badly designed cache coherency system.


Background of my own work:
I want to build some non-standard coherency policies to relieve communication bottleneck problems. However, to do this I need the simulation to model the bottleneck problems. Furthermore, I was planning to add a "main memory" description to simulate the main memory bottleneck and to allow for writeback (currently not supported in the Ruby coherency protocols). Writeback is also necessary to fix the modelling problem of infinite cache size in the SLICC description.


I would appreciate any assistance,

John Shield

_______________________________________________
Gems-users mailing list
Gems-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/gems-users
Use Google to search the GEMS Users mailing list by adding "site:https://lists.cs.wisc.edu/archive/gems-users/" to your search.

