Re: [Gems-users] Question about execution slowdown due to Ruby


Date: Fri, 01 Dec 2006 16:48:12 -0600
From: Dan Gibson <degibson@xxxxxxxx>
Subject: Re: [Gems-users] Question about execution slowdown due to Ruby
Mark,

We've actually looked extensively at this.

Background: the 10x number is for a much earlier version of Simics (pre-2.0, if I am not mistaken) and an earlier version of Ruby, with cpu-switch-time 1000. 10x is about right for the timing of the Tourmaline functional simulator. Ruby is now much slower relative to Simics, for a few reasons:

First, Simics 3+ runs in "fast" mode by default, whereas Simics 2.X and earlier ran in "stall" mode by default. Because of that and other optimizations, Simics 3+ is significantly faster than Simics 2.X in standalone mode (e.g., without timing modules).

Second, I'm not familiar with the cashew target (not familiar at all, in fact). Something specific to that target might be creating slowdown.

Third, the majority of the added slowdown actually occurs *within Simics*, not within Ruby. We discovered this when profiling Ruby's execution time. Our theory is that Simics behaves differently when alien modules (e.g., Ruby) are installed. Moreover, Simics does not easily optimize large stall times. You can test this theory yourself by building a trivial timing module and examining its behavior. The best performance we ever attained with a trivial timing module was a slowdown of around 10x for Simics 2.0.x, around 30x for Simics 2.2.x, and around 100x for Simics 3.0, all with 0-cycle memory accesses.
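
As a rough illustration of the point above, the figures in this thread can be set side by side. The sketch below assumes that slowdown factors compose multiplicatively, which is a simplification and not something measured here. The two configurations are also not identical: the trivial-module figure is for 0-cycle accesses, while the Ruby figure quoted below is for 1-cycle cache latencies and cpu-switch-time 1. Treat the result as an order-of-magnitude estimate only.

```python
# Figures quoted in this thread (approximate; not new measurements).
TRIVIAL_MODULE_SLOWDOWN = 100.0          # Simics 3.0, trivial timing module, 0-cycle accesses
OBSERVED_RUBY_SLOWDOWN = (200.0, 300.0)  # Ruby + Simics 3.0-22, 1-cycle cache latencies


def residual_factor(observed, baseline):
    """Slowdown left after dividing out the baseline.

    Assumes the two slowdowns multiply, which is a modelling assumption.
    """
    if baseline <= 0:
        raise ValueError("baseline slowdown must be positive")
    return observed / baseline


low, high = (residual_factor(s, TRIVIAL_MODULE_SLOWDOWN)
             for s in OBSERVED_RUBY_SLOWDOWN)
print(f"Residual slowdown beyond a trivial timing module: {low:.0f}x to {high:.0f}x")
```

Under these assumptions, most of the observed 200x to 300x would already be present with any timing module attached, and only a small factor remains for Ruby itself.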

There are a few things you can do to improve your simulator performance:
1) Limit Simics' memory to just below the available memory of your host. set-memory-limit is the appropriate Simics command.
2) Set cpu-switch-time to something large. This sacrifices some fidelity.
3) Use smaller latencies -- host execution time is roughly proportional to target execution time.
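
The proportionality in item 3 can be turned into a quick estimate. This is a minimal sketch that assumes host time scales strictly linearly with target cycles. The function name and the example run (3600 s of host time for 1e9 target cycles) are hypothetical and not taken from any measurement in this thread.

```python
def scaled_host_time(host_seconds, target_cycles_before, target_cycles_after):
    """Estimate host time after a change in target cycle count.

    Assumes strict proportionality between host time and target cycles,
    which is only a rough rule of thumb.
    """
    if target_cycles_before <= 0:
        raise ValueError("target_cycles_before must be positive")
    return host_seconds * target_cycles_after / target_cycles_before


# Hypothetical run: 3600 s of host time to simulate 1e9 target cycles.
# If smaller latencies cut the target run to 2.5e8 cycles:
estimate = scaled_host_time(3600.0, 1_000_000_000, 250_000_000)
print(f"Estimated host time: {estimate:.0f} s")
```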


Regards,
Dan Gibson

Mark Gebhart wrote:
Hi,

I am running some tests using MOSI_SMP_bcast with a 4-processor target
machine using cashew-common.simics.  I have set the memory size in both Ruby
and the Simics config file to 4 GB.

I wrote a short vector add test program and compared the time to simulate on
Simics with and without Ruby loaded.  Ruby+Simics is between 2000 and 3000
times slower than Simics alone.  I changed all of the cache latencies to 1
cycle, and then Ruby+Simics was between 200 and 300 times slower than Simics
alone.  Are these slowdowns in line with what others have observed?  I saw
in an archived message on this list that perhaps a slowdown closer to 10x
would be expected.

I am using Simics 3.0-22 and GEMS 1.3.  My host machine is a 2.8 GHz P4 with
1 GB of RAM and nothing else running during simulation.  The Simics process
stays at 100% utilization throughout the execution.

I have compiled Ruby with the following optimization flags:
-O2 -finline-functions -DNO_VECTOR_BOUNDS_CHECKS -DMULTIFACET_NO_OPT_WARN

I pass the -stall flag to Simics and then issue the following commands:

read-configuration ../../checkpoints-u3/linux-a.out-sun-4p.check
instruction-fetch-mode instruction-fetch-trace
istc-disable
dstc-disable
cpu-switch-time 1
load-module ruby
ruby0.setparam g_NUM_PROCESSORS 4
ruby0.init
con0.input "/usr/mark/a.out\n"
c

Did I perhaps misconfigure something that is causing a large slowdown?  Any
insight or advice would be greatly appreciated.

Thanks,
Mark

_______________________________________________
Gems-users mailing list
Gems-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/gems-users
Use Google to search the GEMS Users mailing list by adding "site:https://lists.cs.wisc.edu/archive/gems-users/" to your search.


--
http://www.cs.wisc.edu/~gibson [esc]:wq!
