Re: [Gems-users] State transitions per cycle and Instruction Profiling


Date: Tue, 28 Aug 2007 11:19:14 -0500 (CDT)
From: Mike Marty <mikem@xxxxxxxxxxx>
Subject: Re: [Gems-users] State transitions per cycle and Instruction Profiling

zz_recycle* is used as a crude hack so that the entire incoming request queue does not block when one request for a given cache block is outstanding (in a transient state). What recycle does is remove the blocked request from the head of the incoming queue and re-enqueue it towards the end. This allows other requests that are not blocked to be handled. If you want to model more realistic hardware for this functionality, you could do so (e.g., have a separate holding buffer for blocked requests). The reason why zz_recycle* and *TRANSITIONS_PER_RUBY_CYCLE don't play well together is that real hardware can have more efficient wakeup logic for blocked requests, whereas recycling keeps re-examining them.
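To make the interaction concrete, here is a minimal standalone sketch of the recycle idea. It is not Ruby or SLICC code: Request, IncomingQueue and run_cycle are hypothetical names, and the sketch assumes that a recycle consumes one transition slot just like a serviced request does.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <set>

// Hypothetical stand-ins for illustration; none of these names come from Ruby.
struct Request {
    std::uint64_t block_addr;  // cache block the request targets
    int id;                    // label used only to follow the ordering
};

struct IncomingQueue {
    std::deque<Request> q;

    // Move the head request to the tail so the requests behind it can be examined.
    void recycle() {
        assert(!q.empty());
        q.push_back(q.front());
        q.pop_front();
    }
};

// Examine up to max_transitions queue heads in one simulated cycle.
// A request whose block is in a transient state is recycled, any other
// request is serviced. Assumption: both cases use up one transition slot.
// Returns the number of requests actually serviced.
int run_cycle(IncomingQueue& in,
              const std::set<std::uint64_t>& transient_blocks,
              int max_transitions) {
    int serviced = 0;
    for (int t = 0; t < max_transitions && !in.q.empty(); ++t) {
        if (transient_blocks.count(in.q.front().block_addr) != 0) {
            in.recycle();      // blocked: push it to the back and move on
        } else {
            in.q.pop_front();  // not blocked: handle it
            ++serviced;
        }
    }
    return serviced;
}
```

With a small max_transitions, the slots spent recycling blocked requests come out of the same budget as useful work, so unblocked requests further back in the queue may not be reached in that cycle.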

In the past, I have limited the number of snoops/cycle by adding the notion of "busy banks". There are many ways to do this. One way is to build some kind of TimerTable structure used by each controller instance (one controller instance per cache bank). When a snoop occurs, you do something so that getState() returns a BUSY state for _all_ addresses. Then, when the bank is no longer busy X cycles later, a wakeup occurs which clears the global BUSY state.
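A rough sketch of the busy-bank idea, written as standalone C++ rather than as actual Ruby code. BankController, onSnoop, tick and the State enum are hypothetical names, and a single stored wakeup cycle stands in for whatever TimerTable-like structure you would really use.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical state set; a real protocol has many more states.
enum class State { Invalid, Shared, Modified, Busy };

// One instance per cache bank (hypothetical class, not part of Ruby).
class BankController {
public:
    explicit BankController(std::uint64_t busy_cycles)
        : busy_cycles_(busy_cycles) {}

    void setState(std::uint64_t addr, State s) { states_[addr] = s; }

    // A snoop at cycle `now` marks the whole bank busy for busy_cycles_ cycles.
    void onSnoop(std::uint64_t now) {
        busy_ = true;
        wakeup_cycle_ = now + busy_cycles_;
    }

    // Called once per simulated cycle; plays the role of the timer wakeup
    // that clears the bank-wide BUSY state.
    void tick(std::uint64_t now) {
        if (busy_ && now >= wakeup_cycle_) {
            busy_ = false;
        }
    }

    // While the bank is busy, every address reports Busy, so no transition
    // can fire for any block in this bank.
    State getState(std::uint64_t addr) const {
        if (busy_) {
            return State::Busy;
        }
        auto it = states_.find(addr);
        return it == states_.end() ? State::Invalid : it->second;
    }

private:
    std::uint64_t busy_cycles_;
    std::uint64_t wakeup_cycle_ = 0;
    bool busy_ = false;
    std::unordered_map<std::uint64_t, State> states_;
};
```

With busy_cycles set to X, the bank accepts at most one snoop every X cycles, which is one way to get a snoop rate limit without lowering the transitions-per-cycle setting.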

Yes, it is a good idea to add your own profiling logic, especially when playing with enabling/disabling the fast path. For a fast-path hit, a request will not reach the mandatory queue.
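As an illustration only, a hit/miss counter of the kind described could look roughly like this. ICacheProfile and its methods are hypothetical and are not the Ruby profiler interface. The point is just that fast-path hits have to be counted where the hit is detected, since they are never seen further down.

```cpp
#include <cstdint>

// Hypothetical request types for this sketch.
enum class RequestType { IFetch, Load, Store };

// Hypothetical instruction-cache counters, not the Ruby Profiler class.
struct ICacheProfile {
    std::uint64_t hits = 0;
    std::uint64_t misses = 0;

    // Call where the fast-path hit is detected, because such a request
    // never reaches the mandatory queue.
    void onFastPathHit(RequestType type) {
        if (type == RequestType::IFetch) {
            ++hits;
        }
    }

    // Call where misses are already being profiled.
    void onMiss(RequestType type) {
        if (type == RequestType::IFetch) {
            ++misses;
        }
    }

    // Fraction of instruction fetches that missed; 0 if none were seen.
    double missRate() const {
        const std::uint64_t total = hits + misses;
        return total == 0 ? 0.0 : static_cast<double>(misses) / static_cast<double>(total);
    }
};
```

If only onMiss() is ever called, missRate() returns 1.0, which matches the 100% instruction miss rate symptom.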

--Mike

Hi,
I have a couple of questions about Ruby:

I've seen that the variable Lx_CACHE_TRANSITIONS_PER_RUBY_CYCLE is set to 32 and that it is recommended to set it higher for a protocol that uses zz_recycle... actions. I would like to set it to a more realistic (smaller) value to limit the number of snoops/cycle, but I don't see why this is not desirable if I use the zz_recycle... actions.

The second question is about profiling instructions. I already have the values for the data cache with REMOVE_SINGLE_CYCLE_DCACHE_FAST_PATH=true, but it seems that for instructions the profiler only takes the misses into account, and I always get a 100% miss rate for the instruction cache. That's why I added a call to the counter in the doRequest function of the Sequencer. The call is activated when (hit && request.getType() == CacheRequestType_IFETCH) == true. If I understood correctly, these requests never go to the mandatory queue. Do you think this is correct?

Thank you for your help!

Enric





