Re: [Gems-users] Barrnes CPI question


Date: Wed, 16 Apr 2008 10:39:39 -0500
From: Dan Gibson <degibson@xxxxxxxx>
Subject: Re: [Gems-users] Barrnes CPI question
Try simulating only the parallel section. I think that's why you're seeing unexpected results; it comes up a lot on this list. You'll also find that simulation time is noticeably reduced.

Regards,
Dan

KS Chow wrote:
The entire thing - but the results are really unexpected.

Thanks/Regards.

Chow Kian Sim
National University of Singapore

-----Original Message-----
From: gems-users-bounces@xxxxxxxxxxx [mailto:gems-users-bounces@xxxxxxxxxxx]
On Behalf Of Dan Gibson
Sent: Wednesday, 16 April, 2008 11:35 PM
To: Gems Users
Subject: Re: [Gems-users] Barrnes CPI question

Are you simulating the entire execution or just the parallel section? The entire execution includes thread creation and destruction, which could account for the slowdown.

Regards,
Dan

KS Chow wrote:
I looked at ruby_cycles:

1P: 36,420,634

2P: 64,956,443

Shouldn't it be going down? Or is there something wrong with my simulation?

Thanks/Regards.

Chow Kian Sim

National University of Singapore

From: gems-users-bounces@xxxxxxxxxxx [mailto:gems-users-bounces@xxxxxxxxxxx] On Behalf Of Dan Gibson
Sent: Wednesday, 16 April, 2008 10:56 PM
To: Gems Users
Subject: Re: [Gems-users] Barrnes CPI question

KS Chow wrote:

I have a question regarding the CPI I got from a ruby dump file.

I am comparing the CPI between two simulations, a single-core machine and a dual-core machine, with all the necessary configuration done in Simics and Ruby (not using Opal).

For the 1-core I ran Barnes with the param indicating 1 processor; the dual-core is run with the param indicating 2 processors.

1-core CPI = 3.27789

2-core CPI = 0.621735

The CPI difference could be due to spinning... see below.

Is it normal to have such a huge drop in CPI going from 1 to 2 cores? Both have a 16 MB L2 cache and an equal amount of L1.

I also noticed that the 1-core run executed 10,974,780 instructions vs 194,051,043 in the 2-core run. Why such a big difference in the number of instructions executed?

Is this caused by the number-of-processors param in Barnes?

SPLASH-2 synchronization is hot-spin intensive. Both CPUs are probably spending a lot of time spinning waiting for the other to set a flag or hit a barrier. Generally we prefer wall clock time (i.e. RUBY_CYCLES) rather than CPI as a performance metric for multiprocessor runs.
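
As a rough illustration of why cycle counts are a safer metric than CPI here, the following sketch compares the figures quoted in this thread. The dictionary and function names are made up for illustration, and nothing here assumes how Ruby itself derives CPI; it only compares the reported numbers.

```python
# Figures quoted in this thread (Barnes on 1 vs 2 simulated processors).
RUBY_CYCLES = {1: 36_420_634, 2: 64_956_443}
INSTRUCTIONS = {1: 10_974_780, 2: 194_051_043}
REPORTED_CPI = {1: 3.27789, 2: 0.621735}


def speedup(cycles, base=1, procs=2):
    """Speedup in simulated time; values below 1.0 mean a slowdown."""
    return cycles[base] / cycles[procs]


def instruction_growth(instructions, base=1, procs=2):
    """How many times more instructions the larger run executed."""
    return instructions[procs] / instructions[base]


if __name__ == "__main__":
    print(f"CPI ratio (2P/1P):          {REPORTED_CPI[2] / REPORTED_CPI[1]:.3f}")
    print(f"Instruction growth (2P/1P): {instruction_growth(INSTRUCTIONS):.2f}x")
    print(f"Speedup from ruby_cycles:   {speedup(RUBY_CYCLES):.3f}")
```

With these numbers the CPI falls by roughly a factor of five while the cycle count grows by roughly 1.8x. That is consistent with the extra instructions being spin-wait work rather than useful progress, so the lower CPI does not indicate a faster run.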

Thanks/Regards.

Chow Kian Sim

National University of Singapore

Regards,
Dan

------------------------------------------------------------------------


_______________________________________________
Gems-users mailing list
Gems-users@xxxxxxxxxxx <mailto:Gems-users@xxxxxxxxxxx>
https://lists.cs.wisc.edu/mailman/listinfo/gems-users
Use Google to search the GEMS Users mailing list by adding
"site:https://lists.cs.wisc.edu/archive/gems-users/" to your search.


--
http://www.cs.wisc.edu/~gibson <http://www.cs.wisc.edu/%7Egibson>
[esc]:wq!