Dear Users of GEMS,
I have a signal processing and computer science background and am new to
hardware simulation. I would like to hear from GEMS users whether GEMS
might be a good fit for my simulations. A short problem statement is
provided in the following paragraphs.
Specifically, I have a very large speech recognition system (more than 50
thousand words), with a serial version and a parallel version (using
pthreads) running on a CMP. When the serial code is not optimized, the
parallel code shows a runtime improvement. However, when the serial code is
optimized (for cache performance), the parallel version of the optimized
system shows a performance drop (on a dual-core AMD Opteron). Preliminary
investigations point to cache coherency and memory bandwidth. The
application has a memory footprint of around 1 GB, and a test case takes
around 2-3 minutes to run.
Given this problem, the question I am trying to answer is: what cache
architecture, interconnect, and cache-coherency protocol would make the
parallel system run faster than the serial system? After some preliminary
reading on GEMS, I find that I can use Ruby + Simics + Opal to simulate a
dual-core UltraSPARC III. Hence, I can first benchmark my system's baseline
performance (runtime, cache misses, etc.) on this hardware configuration.
Next, the performance of the parallel system can be benchmarked. I would
expect the performance to drop in a similar fashion, if not exactly the
same, to that observed on the dual-core AMD Opteron. Finally, changes can
be made to the cache design, etc., and benchmarked.
Does this plan sound reasonable?
Cheers,
Naveen Parihar
Ph.D. Student (www.ece.msstate.edu/~np1)