Re: [Gems-users] Time spent in synchronization


Date: Sat, 15 Dec 2007 09:03:09 +0900
From: "soohong p kim" <spkim@xxxxxxxxxx>
Subject: Re: [Gems-users] Time spent in synchronization
Hi Hemayet,

I've created new instructions to support inter-thread synchronization and
wrote a Simics decoder module to decode them. From the decoder module, I
was able to find out the physical CPU ID that executes the current
instruction. Since I pass the thread ID as an operand for debugging
purposes, I noticed that instructions from a single thread were executed
on various CPU cores in the system.
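
For what it's worth, the bookkeeping behind that observation can be a small
migration check. The sketch below is illustrative only -- the function and
array names are hypothetical, not the actual Simics decoder API -- and it
assumes the decoder hands you the thread-ID operand plus the ID of the CPU
executing the instruction:

    #include <stdio.h>

    #define MAX_THREADS 64

    static int last_cpu[MAX_THREADS];   /* last CPU seen per thread */
    static int seen[MAX_THREADS];       /* 0 until the thread first appears */

    /* Hypothetical hook: called with the thread ID that was encoded as an
       operand and the physical CPU that decoded/executed the instruction. */
    static void note_exec(int thread_id, int cpu_id)
    {
        if (seen[thread_id] && last_cpu[thread_id] != cpu_id)
            fprintf(stderr, "thread %d migrated: cpu %d -> cpu %d\n",
                    thread_id, last_cpu[thread_id], cpu_id);
        last_cpu[thread_id] = cpu_id;
        seen[thread_id] = 1;
    }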

Thanks,
Soohong

Hemayet Hossain wrote:
> I am not sure why the thread would migrate to another CPU once it has
> been bound to one CPU. How did you detect that?


-----Original Message-----
From: gems-users-bounces@xxxxxxxxxxx [mailto:gems-users-bounces@xxxxxxxxxxx]
On Behalf Of Dan Gibson
Sent: Saturday, December 15, 2007 3:13 AM
To: Gems Users
Subject: Re: [Gems-users] Time spent in synchronization

No, you're not missing anything. It's actually not possible to always have
ALL CPUs to yourself -- one is always needed to service interrupts. You can
still run jobs on CPU 0, and you can still processor_bind() to CPU 0, but
you're correct that you can't dedicate it to a thread. You simply have to
rely on the common case for maintaining your 2^x conditions -- in the
absence of interrupts, etc., you will have all your CPUs, and CPU 0 will
almost always be used to run your one 'unbound' thread.
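
For reference, the pset route looks roughly like the sketch below (error
handling trimmed, and the helper name is illustrative, not from any GEMS or
Solaris header). The pset_assign() call is where a CPU that cannot leave
the default set -- e.g. CPU 0 on many systems -- will refuse:

    #include <sys/types.h>
    #include <sys/processor.h>   /* processorid_t */
    #include <sys/procset.h>     /* P_LWPID, P_MYID */
    #include <sys/pset.h>        /* pset_create(), pset_assign(), pset_bind() */
    #include <stdio.h>

    /* Create a one-CPU processor set and bind the calling LWP to it. */
    int bind_me_to_cpu(processorid_t cpu)
    {
        psetid_t pset;

        if (pset_create(&pset) != 0)
            return -1;
        if (pset_assign(pset, cpu, NULL) != 0) {   /* may fail for CPU 0 */
            perror("pset_assign");
            return -1;
        }
        return pset_bind(pset, P_LWPID, P_MYID, NULL);
    }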

Regards,
Dan

Hemayet Hossain wrote:
> Hi Dan,
> I am not sure, but I seem to recall that in Solaris, processor 0 can't be
> assigned to any processor set. If that is true, then using pset_bind() we
> can't bind any thread to processor 0, and we lose one CPU (which creates
> a problem for maintaining the 2^x CPU-count requirement in many cases).
> Am I missing something?
> Thanks,
> Hemayet
>
> Dan Gibson wrote:
>> Solaris options for processor binding are processor_bind() and 
>> pset_create()/pset_bind(). We generally prefer pset_bind(). On Linux, 
>> you should look at sched_setaffinity(). I honestly don't know how well 
>> sched_setaffinity() is enforced -- I wouldn't be surprised if it wasn't 
>> enforced at all.
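>>
>> For the Linux case, a minimal sched_setaffinity() sketch (the helper
>> name is illustrative only):
>>
>>     #define _GNU_SOURCE
>>     #include <sched.h>
>>
>>     /* Pin the calling thread to one CPU; returns 0 on success. */
>>     int pin_to_cpu(int cpu)
>>     {
>>         cpu_set_t mask;
>>
>>         CPU_ZERO(&mask);
>>         CPU_SET(cpu, &mask);
>>         return sched_setaffinity(0, sizeof(mask), &mask);  /* 0 == caller */
>>     }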
>>
>> Hemayet Hossain wrote:
>>   
>>> Hi Soohong,
>>> Can you please try some SPLASH-2 benchmarks like lu or water? SPEC OMP
>>> may have less synchronization overhead.
>>>
>>> I am not sure why the thread would migrate to another CPU once it has
>>> been bound to one CPU. How did you detect that? BTW, I used the
>>> following code to bind the current running thread to a processor
>>> (bindId).
>>>
>>>   /* processor_bind() needs <sys/processor.h> and <sys/procset.h>
>>>      (P_LWPID, P_MYID). Grab the next candidate CPU id under a lock: */
>>>   LOCK(global->bindIdLock, (int)tid);
>>>   bindId = global->bindId++;
>>>   UNLOCK(global->bindIdLock, (int)tid);
>>>
>>>   /* Bind the calling LWP; if the bind is refused, try the next CPU id. */
>>>   while (processor_bind(P_LWPID, P_MYID, bindId, NULL) != 0) {
>>>       LOCK(global->bindIdLock, (int)tid);
>>>       bindId = global->bindId++;
>>>       UNLOCK(global->bindIdLock, (int)tid);
>>>   }
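>>>
>>> (The retry loop simply advances to the next candidate CPU id whenever
>>> processor_bind() refuses the binding -- for example, an offline or
>>> non-existent processor id.)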
>>>
>>>
>>> Thanks,
>>> Hemayet
>>> soohong p kim wrote:
>>>> Hemayet,
>>>>
>>>> I have not seen any significant increase in synchronization overhead
>>>> in my Simics+Ruby setup (x86 + Tango-based Linux target) for the SPEC
>>>> OMP (OpenMP) benchmarks.
>>>>
>>>> BTW, based on my observation, threads migrated from one physical CPU
>>>> core to another in a single-CMP Simics target. How did you handle
>>>> thread-processor affinity? Could you tell us how you bound a thread to
>>>> a specific CPU or CPU core? And did thread-CPU affinity impact
>>>> synchronization overhead?
>>>>
>>>> Soohong
>>>>
>>>> Hemayet Hossain wrote:
>>>>> ... I have bound each thread to a specific processor (one-to-one) ...
>>>> -----Original Message-----
>>>> From: gems-users-bounces@xxxxxxxxxxx [mailto:gems-users-bounces@xxxxxxxxxxx]
>>>> On Behalf Of Dan Gibson
>>>> Sent: Friday, December 14, 2007 3:58 AM
>>>> To: Gems Users
>>>> Subject: Re: [Gems-users] Time spent in synchronization
>>>>
>>>> It's really hard to say whether it is reasonable or not without lots
>>>> of details about the real machine and the workloads. However, if the
>>>> workload 1) fits in the cache (16MB seems like it might hold a lot of
>>>> SPLASH-2) and 2) sees a lot of multicycle ALU operations, then it is
>>>> reasonably likely that synchronization will start to dominate by
>>>> effectively shortening non-synchronization time.
>>>>
>>>> Unfortunately, it's rather hard to determine how much of an increase
>>>> is really reasonable. You can try playing around with the target's
>>>> cache sizes (perhaps sizing the shared cache to the size of a single
>>>> private cache in your SunFire -- /usr/platform/sun4u/sbin/prtdiag -v
>>>> should do the trick), but it is rather hard to make up for the IPC=1
>>>> core assumption.
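>>>>
>>>> To put rough, purely illustrative numbers on that effect: suppose the
>>>> synchronization (waiting/communication) time stays about the same
>>>> while the IPC=1 cores and the large shared cache shrink the compute
>>>> portion by, say, 4x. Starting from 19% sync / 81% compute on the real
>>>> machine, the simulated sync fraction becomes 19 / (19 + 81/4), which
>>>> is roughly 48% -- the same order of jump you are seeing.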
>>>>
>>>> Regards,
>>>> Dan
>>>>
>>>> Hemayet Hossain wrote:
>>>>> Hi Dan,
>>>>> Thanks a lot. Yes, my real system is a 16-processor Sun Fire system.
>>>>> I agree with your explanation, but I was wondering whether the
>>>>> difference can be that much. For example, I am getting 12%->42%,
>>>>> 28%->75%, 54%->82%, and 13%->60% for four different applications
>>>>> (from RUBY_CYCLES). Do you think this is reasonable?
>>>>> Thanks once again,
>>>>> Hemayet
>>>>>
>>>>>
>>>>> Dan Gibson wrote:
>>>>>> I think that is the right approach for measuring simulation time.
>>>>>> But is the increased synchronization time all that surprising? I see
>>>>>> from your response to Mike that your real machine is a 16-processor
>>>>>> v9-based system (sun4u -- is it a SunFire of some sort?), and your
>>>>>> target is a 16-processor CMP with a big shared L2.
>>>>>>
>>>>>> Depending on the performance disparity between real and target
>>>>>> machines, it might be correct to show increased synchronization time
>>>>>> under simulation. (Recall that Simics+Ruby uses a very simple
>>>>>> processor model -- IPC=1 for all non-memory operations. This can
>>>>>> artificially inflate the apparent performance of a processor by
>>>>>> abstracting away pipeline details.)
>>>>>> Regards,
>>>>>> Dan
>>>>>>
>>>>>> Hemayet Hossain wrote:
>>>>>>> Hi Dan,
>>>>>>> I have collected the time through RUBY_CYCLES as well. For that, I
>>>>>>> passed the bound proc id (kept in an array in the program) with
>>>>>>> Magic calls at the start and finish of the lock/barrier calls, and
>>>>>>> used that proc id to keep track of which proc is in synchronization
>>>>>>> and which one is not. Do you think my approach is wrong?
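>>>>>>>
>>>>>>> Roughly, the instrumentation looks like the sketch below
>>>>>>> (simplified; it assumes the MAGIC(n) macro from Simics's
>>>>>>> magic-instruction.h, and the two service numbers are made up, not
>>>>>>> real GEMS codes; acquire_lock() stands in for whatever lock
>>>>>>> primitive the benchmark uses):
>>>>>>>
>>>>>>>   #include "magic-instruction.h"
>>>>>>>
>>>>>>>   #define SYNC_BEGIN 1040   /* hypothetical magic-call numbers */
>>>>>>>   #define SYNC_END   1041
>>>>>>>
>>>>>>>   /* Bracket a lock acquisition so the simulator can attribute the
>>>>>>>      waiting time between the two magic calls to synchronization. */
>>>>>>>   #define TIMED_LOCK(l)  do { MAGIC(SYNC_BEGIN); \
>>>>>>>                               acquire_lock(l);   \
>>>>>>>                               MAGIC(SYNC_END);   \
>>>>>>>                          } while (0)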
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Hemayet
>>>>>>>
>>>>>>> Dan Gibson wrote:
>>>>>>>> gethrtime() is bogus under simulation. Solaris's view of time is
>>>>>>>> horribly skewed under Simics alone or Simics+Ruby. Try measuring
>>>>>>>> using RUBY_CYCLES instead.
>>>>>>>>
>>>>>>>> Hemayet Hossain wrote:
>>>>>>>>> Hi All,
>>>>>>>>> I am simulating some SPLASH-2 benchmarks using Ruby with Simics
>>>>>>>>> 2.2.19 (Solaris 10), and to characterize the time spent in
>>>>>>>>> synchronization, I have instrumented the synchronization calls
>>>>>>>>> such as locks and barriers. I have bound each thread to a
>>>>>>>>> specific processor (one-to-one) and collect the time by calling
>>>>>>>>> the high-resolution timer gethrtime(). In a real-machine run
>>>>>>>>> (with 16 processors) and 16 threads, I get around 19% of the
>>>>>>>>> time spent on synchronization for a program. If I run the same
>>>>>>>>> program in Simics without Ruby, I also get a similar percentage
>>>>>>>>> of time spent in synchronization.
>>>>>>>>>
>>>>>>>>> But if I run the same program in Simics with Ruby, the time spent
>>>>>>>>> in synchronization is much higher (around 75% of the total). I
>>>>>>>>> have collected the time both from the programs and from Ruby, and
>>>>>>>>> both give almost the same percentage. I am using a
>>>>>>>>> MESI_SCMP_directory-like protocol with 2 cycles for L1 and 14
>>>>>>>>> cycles for L2 access.
>>>>>>>>>
>>>>>>>>> Does anyone have any idea what's going on? What is wrong with my
>>>>>>>>> setup? I would really appreciate your reply.
>>>>>>>>> Thanks,
>>>>>>>>> Hemayet

-- 
http://www.cs.wisc.edu/~gibson [esc]:wq!

_______________________________________________
Gems-users mailing list
Gems-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/gems-users
Use Google to search the GEMS Users mailing list by adding
"site:https://lists.cs.wisc.edu/archive/gems-users/"; to your search.

