Re: [Gems-users] Simulation of the Communication Bottleneck in Cache Coherency Memory Access


Date: Thu, 19 May 2011 21:21:10 +0200
From: John Shield <john.shield@xxxxxxxxxxx>
Subject: Re: [Gems-users] Simulation of the Communication Bottleneck in Cache Coherency Memory Access

Thanks Greg,

You saved me a fair bit of time in solving the device-busy problem.

(I took a bit of a detour to write my new cache policy in SLICC, hence the delay since my previous email.)

When I came to solve this problem, I found it had already been solved for the randomisation case.

---------------------------------------------------------------------------
----- Starting Line: 221 in ruby/buffers/MessageBuffer.C
----- ORIGINAL -------------------------------------------------------
    // No randomization
    arrival_time = current_time + delta;
----- MODIFICATION ------------------------------------------------
    // No randomization
    if (m_strict_fifo) {
      if (m_last_arrival_time < current_time) {
        m_last_arrival_time = current_time;
      }
      arrival_time = m_last_arrival_time + delta;
    } else {
      arrival_time = current_time + delta;
    }
---------------------------------------------------------------------------

The modification required to simulate device busy time is just a copy of the randomization case, found only 3 lines below in the MessageBuffer file.
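
For reference, the randomization case it mirrors reads roughly as follows (quoted from memory, so details such as the random_time() helper may differ slightly in your tree):

---------------------------------------------------------------------------
    // Randomization - ignore delta
    if (m_strict_fifo) {
      if (m_last_arrival_time < current_time) {
        m_last_arrival_time = current_time;
      }
      arrival_time = m_last_arrival_time + random_time();
    } else {
      arrival_time = current_time + random_time();
    }
---------------------------------------------------------------------------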

Regards,

John

On 04/27/2011 08:58 PM, John Shield wrote:
Thank you Greg,

That's very informative and does tell me that GEMS can do what I wish to do.

I already knew it was going to be very involved. I actually need to do a
fair bit more than I've mentioned here, so getting into the code was going
to be a given anyway.

Again thanks, I'm going to follow up on what you said.

Regards,

John

What I meant by bandwidth-limiting behavior in the network: each network
link is modeled as a physical channel. Only one packet can be transmitted
across the link at a time (even though there are multiple virtual
channels/networks using that physical channel). So all requests coming into
a Directory (for example) will be serialized by that incoming link.
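
Conceptually, the link behaves like the toy model below (my own standalone
illustration, not the actual network code):

---------------------------------------------------------------------------
#include <algorithm>
#include <cstdio>

// Toy model of one physical link shared by several virtual networks.
// Packets are serialized: each transmission waits until the link is
// free, so a burst of requests to a Directory spreads out in time.
struct Link {
  long free_at = 0;  // cycle at which the link next becomes idle

  // Returns the cycle at which a packet injected at 'now' finishes
  // crossing the link, given its transmission time in cycles.
  long transmit(long now, long cycles) {
    long start = std::max(now, free_at);
    free_at = start + cycles;
    return free_at;
  }
};

int main() {
  Link link;
  // Three requests issued in the same cycle still arrive one by one.
  for (int i = 0; i < 3; ++i)
    std::printf("request %d arrives at cycle %ld\n", i, link.transmit(0, 4));
  return 0;
}
---------------------------------------------------------------------------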

Getting a timestamp for a memory request is simple: call
g_EventQueue_ptr->getTime() in Sequencer::makeRequest().
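
For example (a sketch only: the issue-time field and its setter are
hypothetical additions to the request message, not existing GEMS API, and
the global may be spelled g_eventQueue_ptr in your tree):

---------------------------------------------------------------------------
    // Inside Sequencer::makeRequest(): record the Ruby cycle at issue.
    Time issue_time = g_eventQueue_ptr->getTime();

    // Hypothetical: carry the stamp on the outgoing request message so
    // that downstream controllers can compute queueing delay on arrival.
    msg.setIssueTime(issue_time);
---------------------------------------------------------------------------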

Not sure how this fixes your problem, but you can certainly add a timestamp
to a memory request message. There are also timestamps associated with
entries in a MessageBuffer, which tell the time at which the message entered
the buffer.

I'm not positive, but I think you'll find this is a little more involved
than what you're proposing. The time at which a memory operation is
initiated has little relation to when it arrives at a remote cache or
directory. I'm not sure what you mean by "there's no timing relevant to
synchronization within SLICC". Look at the generated code to see what the
Wakeup method for a SLICC component actually does. (The timestamp of the
message at the head of a MessageBuffer indicates whether it's "ready", and
the message will not be removed from the queue until the global time is
greater than or equal to that timestamp.)
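
The readiness test boils down to something like this (a paraphrase of
MessageBuffer from memory, not the literal generated code):

---------------------------------------------------------------------------
// A message is visible to the generated Wakeup() only once its arrival
// timestamp has been reached by the global Ruby clock.
bool MessageBuffer::isReady() const {
  return (m_prio_heap.size() > 0) &&
         (m_prio_heap.peekMin().m_time <= g_eventQueue_ptr->getTime());
}
---------------------------------------------------------------------------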

Also, be aware that Ruby time and Simics time are not the same. Ruby keeps
its own cycle count, accessible through the getTime() method mentioned
above.

...Greg


On Wed, Apr 27, 2011 at 1:21 PM, John Shield <john.shield@xxxxxxxxxxx> wrote:

Hi Greg,

Thanks for confirming that what I'm seeing in the code is correct. Sorry, I
was describing the worst-case scenario for the behaviour and I didn't
specify that. A fixed latency is added for each type of transaction between
memory components, so I assume that's what you mean by the
bandwidth-limiting behaviour.

I think I can resolve this issue in SLICC, but it requires timestamps for
each memory access. There's no timing relevant to synchronisation within
SLICC, only a latency calculation for the memory access.

My current direction for solving the issue is to check whether the
interface between SIMICS and Ruby contains timestamp information for when
memory accesses occur. With timestamp information, it would be easy to
calculate the busy time for each port queue.
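
The accounting I have in mind is simple once each access carries its issue
timestamp. A standalone sketch of the idea (the fixed 10-cycle service time
and the arrival pattern are arbitrary assumptions):

---------------------------------------------------------------------------
#include <algorithm>
#include <cstdio>

// Toy busy-time accounting for one port queue: each access occupies
// the port for 'service' cycles, so later accesses queue behind it.
int main() {
  long busy_until = 0;                    // cycle the port frees up
  const long service = 10;                // assumed service time
  const long arrivals[] = {0, 0, 3, 40};  // issue timestamps

  for (long t : arrivals) {
    long start = std::max(t, busy_until); // wait while the port is busy
    busy_until = start + service;
    std::printf("access issued at %ld completes at %ld\n", t, busy_until);
  }
  return 0;
}
---------------------------------------------------------------------------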

Greg, can you confirm whether Ruby can obtain the timestamp for memory
accesses from the SIMICS interface? It would hearten me greatly if you could
tell me it's possible, or speed things up if it's not. I still haven't
digested all the documentation for the SIMICS model builder and how Ruby
interfaces to SIMICS.

Regards,

John Shield


On 04/27/2011 06:08 PM, Greg Byrd wrote:

I agree with your description of the problem. I don't think it's quite as
severe as you describe, because there is bandwidth-limiting behavior in the
network. So you won't get a burst of requests arriving simultaneously at
the Directory (or wherever). But I agree that "busy time" of the resource
is not modeled, and multiple actions can overlap.

The best way to fix this would be to change SLICC (or, more precisely, the
C++ code generated by SLICC) to include a notion of busy time for a
component. The components would not process any in_port actions until the
busy time has expired.
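
In the generated C++ that might take the shape of a guard at the top of the
controller's wakeup routine (hypothetical: stock SLICC output has no
m_busy_until member, so this is only a sketch of the idea):

---------------------------------------------------------------------------
    // Hypothetical busy-time guard at the top of the generated Wakeup():
    Time now = g_eventQueue_ptr->getTime();
    if (now < m_busy_until) {
      // Re-schedule ourselves for when the component frees up, and
      // refuse to service any in_port until then.
      g_eventQueue_ptr->scheduleEvent(this, m_busy_until - now);
      return;
    }
---------------------------------------------------------------------------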

You can probably model this without changing SLICC by adding a "delay"
function to MessageBuffer, to increase the timestamps of messages waiting in
the queue, so that they are not "seen" until the component is no longer
busy. (Be careful to avoid starvation on the incoming queues, since they are
checked in a fixed order by the Wakeup method.)
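
A sketch of such a delay function, assuming the PrioHeap interface I
remember (extractMin/insert); naive bumping like this is exactly where the
starvation caveat above bites:

---------------------------------------------------------------------------
// Hypothetical MessageBuffer helper: push the head message's arrival
// time out to 'until' so it is not "seen" while the component is busy.
void MessageBuffer::delayHead(Time until) {
  MessageBufferNode node = m_prio_heap.extractMin();
  if (node.m_time < until) {
    node.m_time = until;
  }
  m_prio_heap.insert(node);
}
---------------------------------------------------------------------------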

  ...Greg




On Wed, Apr 27, 2011 at 11:58 AM, John Shield <john.shield@xxxxxxxxxxx> wrote:

Dear all,

I'm going to ask two things.

Firstly, can anyone confirm that the SLICC cache coherency policies do not
consider the wait time caused by other accesses sent to the Directory system
(as modelled in the protocols)?

When going through the protocols, it appears that cache messages do not
compete for the resources of the Directory. There are queues in the SLICC
description, but the wait time for earlier messages in the queue doesn't
seem to make a difference to the latency of later messages. This would also
be a problem for each individual cache, which can process an unlimited
number of external requests in the time it takes to do one request.

To make the problem clear: all the components, including the main memory,
have infinite parallel bandwidth.

This behaviour means that the additional latency caused by multiple
simultaneous messages is ignored. Ten messages arriving at the Directory
are processed with the same latency as a single message, because the latency
of ten messages competing for communication bandwidth doesn't occur.

Secondly, if what I'm seeing is correct (a lack of bottleneck calculation),
does anyone know a way this problem can be fixed? I think I could fix it
myself if I could access the relative timing of cache requests; then I could
add up the bottleneck latencies of the input queues as part of the coherency
protocol.

The communication bottleneck is the main problem research needs to solve in
the design of cache coherency architectures. Without some kind of bottleneck
behaviour being modelled, the cache coherency results would be poor: they
would describe an infinite-bandwidth system, and would not capture the
performance losses of a badly designed cache coherency system.


Background of my own work:
I want to build some non-standard coherency policies to relieve
communication bottleneck problems. However, to do this I need the simulation
to model the bottleneck problems. Furthermore, I was planning to add a
"main memory" description to simulate the main memory bottleneck and to
allow for writeback (currently not supported in the Ruby coherency
protocols). Writeback is also necessary to fix the modelling problem of
infinite cache size in the SLICC description.


I would appreciate any assistance,

John Shield

