I ran a 4-thread simulation (3 spawned threads and 1 master thread) on 8 cores.
Previously, I used 3 bound threads and 1 unbound master thread.
The new simulation takes time to complete, but it seems that Polina’s
suggestion is working: the run has gone past the point where the error occurred.
As sjafri said, context switching must have been the problem. Now that
all the threads, including the master thread, are bound, there should be no
context switching.
Thanks, Polina and sjafri.
From: gems-users-bounces@xxxxxxxxxxx [mailto:gems-users-bounces@xxxxxxxxxxx] On Behalf Of Polina Dudnik
Sent: Saturday, March 21, 2009 7:02 PM
To: Gems Users
Subject: Re: [Gems-users] An assertion error from LogTM and thread binding
You are precisely right. And in my earlier email I mentioned that the
number of threads must be at least one fewer than the number of processors.
Then you won't have context switches. So you shouldn't be executing N threads
on N-1 processors, only N threads on >= N+1 processors.
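
To make the rule concrete, here is a minimal Python sketch (not part of the GEMS scripts). The helper name psrset_create_commands is hypothetical, and it assumes one single-processor set per bound thread, created with "psrset -c <cpu>" as in the barnes script quoted further down, with processors numbered from 0. It refuses any configuration that does not leave at least one processor outside the sets.

```python
def psrset_create_commands(num_threads, num_processors):
    """Build one 'psrset -c <cpu>' console command per thread to bind.

    Hypothetical helper: enforces threads <= processors - 1 so that at
    least one processor stays outside every processor set.
    """
    if num_threads < 1:
        raise ValueError("need at least one thread")
    if num_threads > num_processors - 1:
        raise ValueError(
            "%d threads need at least %d processors, got %d"
            % (num_threads, num_threads + 1, num_processors))
    # Assumption: processors are numbered 0..num_processors-1 and the first
    # num_threads of them each get their own single-processor set.
    return ["psrset -c %d\n" % cpu for cpu in range(num_threads)]


if __name__ == "__main__":
    # 4 threads (3 spawned + 1 master) on 8 processors satisfies the rule.
    for command in psrset_create_commands(4, 8):
        print(command.strip())
```

The returned strings have the same shape as the entries passed to console_commands in the script below; how each thread is then bound to its set is left out here.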
On Sat, Mar 21, 2009 at 8:59 PM, <sjafri@xxxxxxxxxx> wrote:
If N threads are being executed on (bound to) N-1 processors, there will be a
context switch. I have confirmed this through instruction traces.
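
The failure mode described in the quoted messages below can be illustrated with a toy Python model. This is only a sketch under stated assumptions, not GEMS code: the class ToyTransactionManager and the function run_scenario are hypothetical names, and the model assumes the transaction level is tracked per processor and that the simulator never learns about a thread moving between processors.

```python
class ToyTransactionManager:
    """Toy model: one transaction level per processor, not per software thread."""

    def __init__(self, num_processors):
        self.level = [0] * num_processors

    def begin(self, proc):
        # Entering a transaction raises the level of the issuing processor.
        self.level[proc] += 1

    def commit(self, proc):
        # Mirrors the spirit of the failing check: commit requires level >= 1.
        if self.level[proc] < 1:
            raise AssertionError(
                "commit on proc %d with level %d" % (proc, self.level[proc]))
        self.level[proc] -= 1


def run_scenario():
    """A thread begins on processor 0, is moved to processor 1, then commits."""
    manager = ToyTransactionManager(num_processors=2)
    manager.begin(0)       # the thread starts its transaction on processor 0
    # The OS now reschedules the thread onto processor 1. The manager is not
    # told, so processor 1 still has level 0 when the commit arrives.
    try:
        manager.commit(1)
    except AssertionError as error:
        return str(error)
    return "ok"


if __name__ == "__main__":
    print(run_scenario())
```

In this model, binding every thread to its own processor means begin and commit always arrive on the same processor, so the check cannot fail this way.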
Quoting Polina Dudnik <pdudnik@xxxxxxxxx>:
> In the sense that context switches should not occur within a transaction if
> everything is set up correctly.
>
> On Sat, Mar 21, 2009 at 8:55 PM, Polina Dudnik <pdudnik@xxxxxxxxx> wrote:
>
> > You shouldn't have a problem with context switches if the threads are bound
> > correctly and there are no system calls inside the transaction.
> >
> >
> > On Sat, Mar 21, 2009 at 8:53 PM, <sjafri@xxxxxxxxxx> wrote:
> >
> >> I think it's a context switch.
> >>
> >> LogTM-SE uses a transaction level to check whether a processor is currently
> >> executing a transaction. Suppose you are running a thread which is not
> >> executing transactional code; the transaction level would be < 1. Then if
> >> there is a context switch, LogTM would be unaware of it. Suppose further
> >> that the new thread is in the middle of a transaction. It would call commit
> >> and you would get the assertion violation that the transaction level is < 1.
> >>
> >>
> >>
> >>
> >>
> >> Quoting BYONG WU CHONG <bernard.chong@xxxxxxxx>:
> >>
> >> > Hi,
> >> >
> >> > I ran SPLASH2 barnes on GEMS TM simulator configured in EE LogTM. I got
> >> > this error.
> >> >
> >> > --------------------------- Error Msg Begin ---------------------------
> >> > commitTransaction ERROR NOT IN XACT proc =3 logical_proc = 3 xid = 0 isOpen = 0 tid = -1 pc = [0x13db4, line 0x13d80] level = 0 time = 31050951
> >> > simics-common: log_tm/TransactionInterfaceManager.C:243: void TransactionInterfaceManager::commitTransaction(int, int, bool): Assertion `m_transactionLevel[thread] >= 1' failed.
> >> > Abort (SIGABRT) in main thread
> >> > The simulation state has been corrupted. Simulation cannot continue.
> >> > Please restart Simics.
> >> > Starting command line. (May have skipped commands in script files.)
> >> > [cpu3] v:0x0000000000013db4 p:0x000e0813db4 magic (sethi 0x800, %g0)
> >> > Setting new inspection cpu: cpu3
> >> > Traceback (most recent call last):
> >> >   File "../../../gen-scripts/mfacet.py", line 308, in console_branch_internal
> >> >     wait_for_string(get_console(), __prompt)
> >> >   File "/uusoc/facility/res/arch/tools/corvus/simics-3.0.31/x86-linux/lib/python/text_console_common.py", line 10, in wait_for_string
> >> >     wait_for_obj_hap("Xterm_Break_String", obj, break_id)
> >> >   File "/uusoc/facility/res/arch/tools/corvus/simics-3.0.31/x86-linux/lib/python/cli_impl.py", line 3374, in wait_for_obj_hap
> >> >     return wait_for_hap_common([hap_name, name, idx0])
> >> >   File "/uusoc/facility/res/arch/tools/corvus/simics-3.0.31/x86-linux/lib/python/cli_impl.py", line 3352, in wait_for_hap_common
> >> >     raise SimExc_Break, "Script branch interrupted"
> >> > sim_core.SimExc_Break: Script branch interrupted
> >> > Exception in python branch
> >> > simics>
> >> > --------------------------- Error Msg End ---------------------------
> >> >
> >> > It sounds like commitTransaction() has been called without a prior call
> >> > to beginTransaction(). But how?
> >> > I checked that every UNLOCK macro is always called after a LOCK macro.
> >> >
> >> > I searched Google and found this year-old thread, which had no answer.
> >> > https://lists.cs.wisc.edu/archive/gems-users/2008-March/msg00124.shtml
> >> >
> >> > Since I cannot bind a working cpu to a processor set, it seems that an
> >> > unbound thread is running around, beginning a transaction here and
> >> > committing the same transaction there, causing trouble.
> >> >
> >> > Information about thread binding is here:
> >> > https://lists.cs.wisc.edu/archive/gems-users/2007-October/msg00049.shtml
> >> > According to this answer, I cannot bind the last thread to the last
> >> > remaining cpu.
> >> >
> >> > Should I bind the last thread to one of the N-1 processor sets? But as far
> >> > as I know, binding two threads to one cpu isn't a good idea.
> >> >
> >> > The Simics script I used for barnes simulation is this.
> >> >
> >> > --------------------------- barnes.simics Begin ---------------------------
> >> > @sys.path.append("../../../gen-scripts")
> >> > @cwd = os.getcwd()
> >> > @work_name = "barnes"
> >> > @binary = "BARNES_local"
> >> > @mb_dir = "benchmarks/SPLASH2/%s" % work_name
> >> > @lib_path = "../../../libs/Solaris_SPARC"
> >> > @import mfacet, tm_ee
> >> > @from mfacet import *
> >> >
> >> > @num_proc = SIM_number_processors()
> >> > #@if num_proc > 1:
> >> > # num_proc = num_proc - 1
> >> >
> >> >
> >> > # These commands are useful.
> >> > # "isainfo -b\n",
> >> > # "isalist\n",
> >> > # "psrinfo\n",
> >> >
> >> > magic-break-enable
> >> > @console_commands(("ulimit -c 0\n",
> >> >                    "psrset -c 0\n",
> >> >                    "psrset -c 1\n",
> >> >                    "psrset -c 2\n",
> >> >                    "mount /host\n",
> >> >                    "export LD_LIBRARY_PATH=%s\n" % lib_path,
> >> >                    "cd /host/%s/../../../%s \n" % (cwd, mb_dir),
> >> >                    "./%s < input%02d \n" % (binary, num_proc),
> >> >                    ), "#")
> >> > c
> >> > # Note that this is the first magic breakpoint
> >> > @tm_ee.start_TM()
> >> > @conf.sim.cpu_switch_time = 1
> >> > # This is the second magic breakpoint
> >> > c
> >> > @mfacet.run_sim_command("ruby0.dump-stats SPLASH2_%s_LLTM_%02d.stats" % (work_name, num_proc))
> >> > --------------------------- barnes.simics End ---------------------------
> >> >
> >> > Could someone help me understand why I am getting this assertion error?
> >> >
> >> > - Bernard
> >> >
> >>
> >>
> >> _______________________________________________
> >> Gems-users mailing list
> >> Gems-users@xxxxxxxxxxx
> >> https://lists.cs.wisc.edu/mailman/listinfo/gems-users
> >> Use Google to search the GEMS Users mailing list by adding
> >> "site:https://lists.cs.wisc.edu/archive/gems-users/" to your search.
> >>
> >>
> >
>