[Gems-users] gems suitability


Date: Sun, 16 Nov 2008 10:31:29 -0600 (CST)
From: Naveen Parihar <np1@xxxxxxxxxxxxxxx>
Subject: [Gems-users] gems suitability

Dear Users of GEMS,

I have a signal processing and computer science background and am new to hardware simulation. I would like to hear from GEMS users whether GEMS is a good fit for the simulations I have in mind. A short problem statement follows.

Specifically, I have a very large speech recognition system (more than 50 thousand words), with a serial version and a parallel version (using pthreads) that runs on a CMP. When the serial code is not optimized, the parallel code shows a runtime improvement. However, when the serial code is optimized for cache performance, the parallel version of the optimized system shows a performance drop (on a dual-core AMD Opteron). Preliminary investigations point to cache coherency and memory bandwidth. The memory footprint of the application is around 1 GB, and a test case takes around 2-3 minutes to run.

Given this problem, the question I am trying to answer is: what cache architecture, interconnect, and cache-coherency protocol would make the parallel system run faster than the serial system? After some preliminary reading on GEMS, I find that I can use Ruby + Simics + Opal to simulate a dual-core UltraSPARC III. I could therefore first benchmark the serial system's baseline performance (runtime, cache misses, etc.) on this hardware configuration, and then benchmark the parallel system. I would expect the performance to drop in a similar fashion, if not exactly the same way, to the drop observed on the dual-core AMD Opteron. Finally, changes could be made to the cache design, etc., and benchmarked again.
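To make the comparison step concrete, here is a rough sketch of how the runs could be tabulated once runtimes are available. It is only an illustration: the `Run` record, the configuration labels, and the numbers are hypothetical placeholders, not measurements and not anything provided by GEMS or Simics.

```python
from dataclasses import dataclass


@dataclass
class Run:
    config: str        # hypothetical label for a simulated cache/coherence setup
    serial_s: float    # runtime of the optimized serial code, in seconds
    parallel_s: float  # runtime of the pthreads version, in seconds


def speedup(run: Run) -> float:
    """Serial runtime divided by parallel runtime; above 1.0 the parallel code wins."""
    if run.serial_s <= 0 or run.parallel_s <= 0:
        raise ValueError("runtimes must be positive")
    return run.serial_s / run.parallel_s


def configs_where_parallel_wins(runs):
    """Return the labels of configurations whose speedup exceeds 1.0."""
    return [r.config for r in runs if speedup(r) > 1.0]


# Placeholder numbers only, chosen to show both outcomes.
runs = [
    Run("baseline", serial_s=100.0, parallel_s=125.0),
    Run("variant-a", serial_s=100.0, parallel_s=80.0),
]

for r in runs:
    print(f"{r.config}: speedup = {speedup(r):.2f}")
print("parallel wins on:", configs_where_parallel_wins(runs))
```

The same table could carry cache-miss counts per run, so that a change in speedup can be lined up against a change in miss behaviour for each configuration.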

Does this plan sound reasonable?

Cheers,
Naveen Parihar
Ph.D. Student (www.ece.msstate.edu/~np1)