No, you're not missing anything. It's actually not possible to dedicate
ALL CPUs -- one is always needed to service interrupts. You can
still run jobs on CPU 0, and you can still processor_bind() to CPU 0, but
you're correct that you can't dedicate it to a thread. You simply
have to rely on the common case for maintaining your 2^x conditions --
in the absence of interrupts, etc., you will have all your CPUs, and CPU
0 will almost always be available to run your one 'unbound' thread.
Regards,
Dan
Hemayet Hossain wrote:
Hi Dan,
I am not sure, but I believe I learned that in Solaris, processor 0 can't
be assigned to any processor set. If that is true, then using
pset_bind() we can't bind any thread to processor 0, and we lose one
CPU (which will create problems for maintaining the 2^x-CPU
requirement in many cases). Am I missing something?
Thanks,
Hemayet
Dan Gibson wrote:
The Solaris options for processor binding are processor_bind() and
pset_create()/pset_bind(); we generally prefer pset_bind(). On Linux,
you should look at sched_setaffinity(). I honestly don't know how strictly
sched_setaffinity() is enforced -- I wouldn't be surprised if it weren't
enforced at all.
Hemayet Hossain wrote:
Hi Soohong,
Can you please try some SPLASH-2 benchmarks like lu or water? SPEC OMP
may have less synchronization overhead.
I am not sure why a thread would migrate to another CPU once it has
been bound to one. How did you detect that? BTW, I used the following code
to bind the currently running thread to a processor (bindId).
/* Grab the next processor id under a lock shared by all threads. */
LOCK(global->bindIdLock, (int)tid);
bindId = global->bindId++;
UNLOCK(global->bindIdLock, (int)tid);
/* Bind this LWP to bindId; if the bind fails (e.g. the CPU is
   offline or unavailable), advance to the next id and retry. */
while (processor_bind(P_LWPID, P_MYID, bindId, NULL) != 0) {
    LOCK(global->bindIdLock, (int)tid);
    bindId = global->bindId++;
    UNLOCK(global->bindIdLock, (int)tid);
}
Thanks,
Hemayet
soohong p kim wrote:
Hemayet,
I have not seen any significant increase in synchronization overhead in my
Simics+Ruby setup (x86 + Tango-based Linux target) for SPEC OMP (OpenMP) benchmarks.
BTW, based on my observation, threads migrated from one physical CPU core to
another in a single-CMP Simics target. How did you handle thread-processor
affinity? Could you tell us how you bound a thread to a specific CPU or
CPU core? And did thread-CPU affinity impact synchronization overhead?
Soohong
Hemayet Hossain wrote:
... I have bound each thread to a specific processor (one-to-one)...
-----Original Message-----
From: gems-users-bounces@xxxxxxxxxxx [mailto:gems-users-bounces@xxxxxxxxxxx]
On Behalf Of Dan Gibson
Sent: Friday, December 14, 2007 3:58 AM
To: Gems Users
Subject: Re: [Gems-users] Time spent in synchronization
It's really hard to say whether it is reasonable without lots of
details about the real machine and the workloads. However, if the workload
1) fits in the cache (16MB seems like it might hold a lot of SPLASH-2)
and 2) sees a lot of multicycle ALU operations, then it's reasonably
likely that synchronization will start to dominate by effectively
shortening non-synchronization time.
Unfortunately, it's rather hard to determine how much of an increase is really
reasonable. You can try playing around with the target's cache sizes
(perhaps sizing the shared cache to match a single private cache
in your SunFire -- /usr/platform/sun4u/sbin/prtdiag -v should do the
trick), but it is rather hard to make up for the IPC=1 core assumption.
Regards,
Dan
Hemayet Hossain wrote:
Hi Dan,
Thanks a lot. Yes, my real system is a 16-processor SunFire system. I
agree with your explanation, but I was wondering whether the
difference could be that much. I am getting 12%->42%,
28%->75%, 54%->82%, and 13%->60% for four different applications (from
RUBY_CYCLES). Do you think this is reasonable?
Thanks once again,
Hemayet
Dan Gibson wrote:
I think that is the right approach for measuring simulation time. But is
the increased synchronization time all that surprising? I see from your
response to Mike that your real machine is a 16-processor v9-based
(sun4u -- is it a SunFire of some sort?), and your target is a
16-processor CMP with a big shared L2.
Depending on the performance disparity between real and target machines,
it might be correct to show increased synchronization time under
simulation. (Recall that Simics+Ruby uses a very simple processor model:
IPC=1 for all non-memory operations. This can artificially inflate the
apparent performance of a processor by abstracting away pipeline
details.)
Regards,
Dan
Hemayet Hossain wrote:
Hi Dan,
I have collected the time through RUBY_CYCLES as well. For that, I
passed the bound processor id (kept in an array in the program) with magic
calls at the start and finish of the lock/barrier calls, and used that proc
id to keep track of which processor is in synchronization and which one
is not. Do you think my approach is wrong?
Thanks,
Hemayet
Dan Gibson wrote:
gethrtime() is bogus under simulation. Solaris's view of time is
horribly skewed under Simics alone or Simics+Ruby. Try measuring using
RUBY_CYCLES instead.
Hemayet Hossain wrote:
Hi All,
I am simulating some SPLASH-2 benchmarks using Ruby with Simics 2.2.19
(Solaris 10), and to characterize the time spent in synchronization, I
have instrumented the synchronization calls such as locks and barriers. I
have bound each thread to a specific processor (one-to-one) and am
collecting the time by calling the high-resolution timer gethrtime(). In
a real-machine run (with 16 processors), for 16 threads I get around 19%
of time spent on synchronization for a program. If I run the same program
in Simics without Ruby, I also get a similar percentage of time spent in
synchronization.
But if I run the same program in Simics with Ruby, the time spent in
synchronization is much higher (around 75% of the total). I have
collected the time both from the programs and from Ruby, and both give
almost the same percentage. I am using a MESI_SCMP_directory-like
protocol with 2 cycles for L1 and 14 cycles for L2 access.
Does anyone have any idea what's going on? What's wrong with my setup? I
would really appreciate your reply.
Thanks,
Hemayet
_______________________________________________
Gems-users mailing list
Gems-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/gems-users
Use Google to search the GEMS Users mailing list by adding
"site:https://lists.cs.wisc.edu/archive/gems-users/" to your search.
--
http://www.cs.wisc.edu/~gibson [esc]:wq!