Re: [Gems-users] gems suitability


Date: Sun, 16 Nov 2008 12:45:06 -0600
From: "Dan Gibson" <degibson@xxxxxxxx>
Subject: Re: [Gems-users] gems suitability
Hello Naveen,
Your problem sounds fascinating. I foresee one very challenging problem: a test case that requires 2-3 minutes of execution on a real host will run unbearably long (i.e., on the order of months) in GEMS, or in any other timing simulator. If you decide to go ahead and use GEMS, you're going to want to isolate a meaningful test case that runs in no more than a second or so. Bear in mind that setup and teardown need not be simulated with timing accuracy.
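
For a rough feel of the arithmetic behind that estimate, here is a minimal back-of-envelope sketch in Python. The slowdown factors (10,000x and 50,000x) are assumptions chosen only for illustration; they are not measured GEMS numbers, and the real factor depends on the configuration and workload. The function name is hypothetical as well.

```python
# Back-of-envelope estimate of host time needed for a timing simulation.
# NOTE: the slowdown factors below are illustrative assumptions, not
# measured GEMS figures.

SECONDS_PER_DAY = 24 * 60 * 60


def simulated_wall_clock_days(native_seconds, slowdown):
    """Days of host time needed to simulate `native_seconds` of real execution."""
    return native_seconds * slowdown / SECONDS_PER_DAY


if __name__ == "__main__":
    # 150 s is the midpoint of the 2-3 minute test case; 1 s is the
    # suggested size of an isolated test case.
    for slowdown in (10_000, 50_000):
        for native_seconds in (1, 150):
            days = simulated_wall_clock_days(native_seconds, slowdown)
            print(f"slowdown {slowdown:>6}x, native {native_seconds:>3} s "
                  f"-> {days:.2f} days of simulation")
```

Under these assumed factors, the full 2-3 minute run lands in the range of weeks to months, while a one-second test case finishes within a day.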

If you can indeed isolate a meaningful test case, then GEMS is a good platform to answer your questions about interconnect and cache configuration.

Regards,
Dan

On Sun, Nov 16, 2008 at 10:31 AM, Naveen Parihar <np1@xxxxxxxxxxxxxxx> wrote:

Dear Users of GEMS,

I have a signal processing and computer science background and am new to
hardware simulation. I would like to hear from GEMS users whether GEMS
might be a good fit for my simulations. A short problem statement is
provided in the following paragraphs.

Specifically, I have a very large speech recognition system (more than 50
thousand words), with a serial version and a parallel version (using
pthreads) running on a CMP. When the serial code is not optimized, the
parallel code shows a runtime improvement. However, when the serial code is
optimized (for cache performance), the parallel version of the optimized
system shows a performance drop (on a dual-core AMD Opteron). Preliminary
investigations point to cache coherency and memory bandwidth. The memory
footprint of the application is around 1 GB, and a test case takes around
2-3 minutes to run.

Now, given the problem, the question that I'm trying to answer is: which
cache architecture, interconnect, and cache-coherency protocol would make
the parallel system run faster than the serial system? After doing some
preliminary research/reading on GEMS, I find that I can use Ruby +
Simics + Opal to simulate a dual-core UltraSPARC III. Hence, I can
benchmark the baseline performance of my serial system (runtime, cache
misses, etc.) on this hardware configuration. Next, the performance of the
parallel system can be benchmarked. I would expect the performance to drop
in a fashion similar, if not identical, to the one observed on the
dual-core AMD Opteron. Finally, changes can be made to the cache design,
etc., and benchmarked.

Does this plan sound reasonable?

Cheers,
Naveen Parihar
Ph.D. Student (www.ece.msstate.edu/~np1)

--
http://www.cs.wisc.edu/~gibson [esc]:wq!