Re: [Gems-users] Time spent in synchronization


Date: Fri, 14 Dec 2007 13:05:02 -0500
From: Hemayet Hossain <hossain@xxxxxxxxxxxxxxxx>
Subject: Re: [Gems-users] Time spent in synchronization
Hi Dan,
I am not sure, but I recall that in Solaris, processor 0 can't be assigned to any processor set. If that is true, then with pset_bind() we can't bind any thread to processor 0, and we lose one CPU (which makes it hard to satisfy the 2^x processor-count requirement in many cases). Am I missing something?
Thanks,
Hemayet

Dan Gibson wrote:
Solaris options for processor binding are processor_bind() and 
pset_create()/pset_bind(). We generally prefer pset_bind(). On Linux, 
you should look at sched_setaffinity(). I honestly don't know how well 
sched_setaffinity() is enforced -- I wouldn't be surprised if it wasn't 
enforced at all.
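
For reference, the pset route looks roughly like this (a minimal sketch
only -- error handling is abbreviated, and bind_self_to_cpu is just an
illustrative name):

  /* Minimal sketch: create a one-CPU processor set and bind the calling
   * LWP to it (Solaris). Error handling abbreviated. */
  #include <sys/processor.h>
  #include <sys/procset.h>
  #include <sys/pset.h>

  int bind_self_to_cpu(processorid_t cpu)
  {
      psetid_t pset;

      if (pset_create(&pset) != 0)            /* new, empty processor set */
          return -1;
      if (pset_assign(pset, cpu, NULL) != 0)  /* move the target CPU into it */
          return -1;
      /* After this, the calling LWP may run only on 'cpu'. */
      return pset_bind(pset, P_LWPID, P_MYID, NULL);
  }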

Hemayet Hossain wrote:
Hi Soohong,
Can you please try some SPLASH-2 benchmarks, like lu or water? SPEC OMP 
may have less synchronization overhead.

I am not sure why a thread would migrate to another CPU once it has 
been bound to one. How did you detect that? BTW, I used the following 
code to bind the currently running thread to a processor (bindId).

  /* Needs <sys/processor.h> and <sys/procset.h> for processor_bind(). */
  LOCK(global->bindIdLock, (int)tid);
  bindId = global->bindId++;            /* claim the next processor id */
  UNLOCK(global->bindIdLock, (int)tid);

  /* Bind the calling LWP to bindId; if that CPU is unavailable, keep
     trying the next id until a bind succeeds. */
  while (processor_bind(P_LWPID, P_MYID, bindId, NULL) != 0) {
      LOCK(global->bindIdLock, (int)tid);
      bindId = global->bindId++;
      UNLOCK(global->bindIdLock, (int)tid);
  }


Thanks,
Hemayet
soohong p kim wrote:
Hemayet,

I have not seen any significant increase in synchronization overhead in my
Simics+Ruby setup (x86 Tango-based Linux target) for the SPEC OMP (OpenMP)
benchmarks.


BTW, based on my observation, threads migrated from one physical CPU core to
another in a single-CMP Simics target.  How did you handle thread-processor
affinity?  Could you tell us how you bound a thread to a specific CPU or
CPU core?  And did thread-CPU affinity impact synchronization overhead?

Soohong

Hemayet Hossain wrote:
... I have bound each thread to a specific processor (one-to-one)...
-----Original Message-----
From: gems-users-bounces@xxxxxxxxxxx [mailto:gems-users-bounces@xxxxxxxxxxx]
On Behalf Of Dan Gibson
Sent: Friday, December 14, 2007 3:58 AM
To: Gems Users
Subject: Re: [Gems-users] Time spent in synchronization

It's really hard to say whether it is reasonable without a lot of 
detail about the real machine and the workloads. However, if the workload 
1) fits in the cache (16MB seems like it might hold a lot of SPLASH-2) 
and 2) sees a lot of multicycle ALU operations, then it's reasonably 
likely that synchronization will start to dominate by effectively 
shortening non-synchronization time.

Unfortunately, it's rather hard to determine how much of an increase is 
really reasonable. You can try playing around with the target's cache sizes 
(perhaps sizing the shared cache to the size of a single private cache 
in your SunFire -- /usr/platform/sun4u/sbin/prtdiag -v should do the 
trick), but it is rather hard to make up for the IPC=1 core assumption.

Regards,
Dan

Hemayet Hossain wrote:
Hi Dan,
Thanks a lot. Yes, my real system is a 16-processor SunFire system. I 
agree with your explanation, but I was wondering whether the difference 
can be that large. I am seeing 12% -> 42%, 28% -> 75%, 54% -> 82%, and 
13% -> 60% for four different applications (from RUBY_CYCLES). Do you 
think this is reasonable?
Thanks once again,
Hemayet


Dan Gibson wrote:
I think that is the right approach for measuring simulation time. But is 
the increased synchronization time all that surprising? I see from your 
response to Mike that your real machine is a 16-processor v9-based 
system (sun4u -- is it a SunFire of some sort?), and your target is a 
16-processor CMP with a big shared L2.

Depending on the performance disparity between real and target machines, 
it might be correct to show increased synchronization time under 
simulation. (Recall that Simics+Ruby uses a very simple processor model 
(IPC=1 for all non-memory operations). This can artificially inflate the 
apparent performance of a processor by abstracting away pipeline 
details.)
Regards,
Dan

Hemayet Hossain wrote:
Hi Dan,
I have collected the time through RUBY_CYCLES as well. For that, I 
passed the bound processor id (kept in an array in the program) with 
magic calls at the start and finish of the lock/barrier calls, and used 
that id to keep track of which processor is in synchronization and 
which is not. Do you think my approach is wrong?
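
Schematically, the instrumentation looks like this (a sketch only: 
SYNC_BEGIN/SYNC_END are placeholder magic-call numbers, lock_t/acquire 
stand in for the real lock code, and the encodings Ruby actually 
intercepts are defined in the GEMS sources):

  /* MAGIC(n) comes from Simics's magic-instruction.h; n must be a
   * compile-time constant. */
  #include "magic-instruction.h"

  #define SYNC_BEGIN 1  /* placeholder: this proc enters synchronization */
  #define SYNC_END   2  /* placeholder: this proc leaves synchronization */

  void timed_lock(lock_t *l)
  {
      MAGIC(SYNC_BEGIN);  /* Ruby sees which processor executed this */
      acquire(l);
      MAGIC(SYNC_END);
  }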

Thanks,
Hemayet

Dan Gibson wrote:
gethrtime() is bogus under simulation. Solaris's view of time is 
horribly skewed under Simics alone or Simics+Ruby. Try measuring using 
RUBY_CYCLES instead.

Hemayet Hossain wrote:
Hi All,
I am simulating some SPLASH-2 benchmarks using Ruby with Simics 2.2.19 
(Solaris 10), and to characterize the time spent in synchronization I 
have instrumented the synchronization calls (locks and barriers). I 
have bound each thread to a specific processor (one-to-one) and collect 
the time by calling the high-resolution timer gethrtime(). In a 
real-machine run (16 processors) with 16 threads, I get around 19% of 
the time spent in synchronization for one program. If I run the same 
program in Simics without Ruby, I also get a similar percentage of time 
spent in synchronization.
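
Per call, the timing itself is simple (a sketch; BARRIER, barrier_t, and 
the sync_ns array are placeholder names for my harness's pieces):

  /* gethrtime() returns nanoseconds as an hrtime_t (Solaris). */
  #include <sys/time.h>

  #define MAX_PROCS 16
  static hrtime_t sync_ns[MAX_PROCS]; /* per-proc accumulated sync time */

  void timed_barrier(barrier_t *b, int procId)
  {
      hrtime_t t0 = gethrtime();      /* high-resolution timer */
      BARRIER(b);                     /* the real synchronization call */
      sync_ns[procId] += gethrtime() - t0;
  }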

But if I run the same program in Simics with Ruby, the time spent in 
synchronization is much higher (around 75% of the total). I have 
collected the time both from the program and from Ruby, and both report 
almost the same percentage. I am using a MESI_SCMP_directory-like 
protocol with 2-cycle L1 and 14-cycle L2 access latencies.

Does anyone have any idea what's going on? What's wrong with my setup? 
I would really appreciate your reply.
Thanks,
Hemayet

------------------------------------------------------------------------

_______________________________________________
Gems-users mailing list
Gems-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/gems-users
Use Google to search the GEMS Users mailing list by adding "site:https://lists.cs.wisc.edu/archive/gems-users/" to your search.
