Re: [Gems-users] Number of instructions executed per core


Date: Thu, 25 Feb 2010 14:02:43 -0600
From: Dan Gibson <degibson@xxxxxxxx>
Subject: Re: [Gems-users] Number of instructions executed per core
I believe what you are observing is inherent to the simulation, and to real executions, for the following reasons:

1. Lack of explicit locking does not imply lack of spinning. There are implicit barriers at the end of most OpenMP #pragmas, and looking at art's source code, I see that there are several #pragmas without barrier elision. Moreover, it is possible to spin in the OS, or in other processes.
2. It is entirely possible that something other than art is running on your cores. Look into pset_bind and processor_bind. With OpenMP, I'd recommend pset_create instead of explicit binding.
3. /Simics/ does not choose what code runs on which core. The operating system does that. Look for ways to affect the OS, not Simics.
4. I'm sure it's been said on this list before (because I have said it) that instruction count is a BAD metric for multithreaded code.
5. art on my serengeti target takes a LOT of TLB misses (one almost every iteration). I'm not sure if individual cores would react differently or not to TLB misses.
6. art uses a dynamically-scheduled parallel section. Load imbalance in those iterations would cause one core to lag or complete early.

Regards,
Dan

On Thu, Feb 25, 2010 at 1:51 PM, <ubaid001@xxxxxxx> wrote:

Since there are no mutex locks, no processor is spinning. I have only my benchmark running on my Simics target machine (Serengeti). Is there a possibility that the faulty core is running some other program rather than the art thread?

Also, is there any way in Simics to bind a thread to a particular processor,
so that I know for sure that all my processors are running the user thread?



Suhail



On Feb 25 2010, ubaid001@xxxxxxx wrote:

Hi,

I had brought up this issue earlier. I am running a SPEC OpenMP benchmark (art)
on a 4-core CMP system (OPAL + RUBY).

But there seems to be a huge difference in the number of instructions executed between one processor and the rest. I know that there are no mutex
locks in the code. In fact, I load Opal and Ruby only from the parallel section of the program.

One core either lags behind or leads the other processors, and this happens on every single simulation.

Can anyone shed more light on this?

Suhail

_______________________________________________
Gems-users mailing list
Gems-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/gems-users
Use Google to search the GEMS Users mailing list by adding "site:https://lists.cs.wisc.edu/archive/gems-users/" to your search.




--
http://www.cs.wisc.edu/~gibson [esc]:wq!