Re: [Gems-users] An assertion error from LogTM and thread binding


Date: Sat, 21 Mar 2009 21:02:24 -0400
From: Polina Dudnik <pdudnik@xxxxxxxxx>
Subject: Re: [Gems-users] An assertion error from LogTM and thread binding
You are precisely right. And in my earlier email I mentioned that the number of threads must be at least one fewer than the number of processors; then you won't have context switches. So you shouldn't be executing N threads on N-1 processors, only N threads on N+1 or more processors.
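
As a rough illustration of that rule, here is a small Python sketch (Python because the Simics script quoted later in this thread is driven from Python). The helper names max_bound_threads, psrset_commands and check_thread_count are hypothetical and are not part of GEMS or Simics. The sketch assumes processors are numbered from 0 and that the highest-numbered processor is the one left outside every processor set.

```python
def max_bound_threads(num_proc):
    """Largest number of threads that can each get a dedicated processor set.

    Assumption: one processor always stays outside every processor set,
    so at most num_proc - 1 threads can be bound one-to-one.
    """
    if num_proc < 1:
        raise ValueError("num_proc must be at least 1")
    return num_proc - 1


def psrset_commands(num_proc):
    """Console command strings creating one single-processor set per bindable CPU."""
    return ["psrset -c %d\n" % cpu for cpu in range(max_bound_threads(num_proc))]


def check_thread_count(num_threads, num_proc):
    """Return True when every thread can be bound to its own processor set."""
    return num_threads <= max_bound_threads(num_proc)
```

With four simulated processors, for example, psrset_commands(4) yields the three "psrset -c" lines for CPUs 0, 1 and 2, check_thread_count(3, 4) is True, and check_thread_count(4, 4) is False, which is the N threads on N-1 bound processors case to avoid.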

On Sat, Mar 21, 2009 at 8:59 PM, <sjafri@xxxxxxxxxx> wrote:
If N threads are executed on (bound to) N-1 processors, there will be a
context switch. I have confirmed this through instruction traces.
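
To make the failure mode concrete, here is a toy Python model of how an unnoticed context switch could trip the check. It is not the GEMS code: ToyTransactionTracker and run_scenario are hypothetical names, and the sketch assumes only what is described in this thread, namely that the transaction level is tracked per processor and that the simulator does not see a thread move between processors.

```python
class ToyTransactionTracker:
    """Per-processor transaction nesting counter (toy model, not GEMS code)."""

    def __init__(self, num_proc):
        self.level = [0] * num_proc

    def begin(self, proc):
        self.level[proc] += 1

    def commit(self, proc):
        # Mirrors the failed check `m_transactionLevel[thread] >= 1`.
        if self.level[proc] < 1:
            raise AssertionError(
                "commit on proc %d with level %d" % (proc, self.level[proc])
            )
        self.level[proc] -= 1


def run_scenario():
    tracker = ToyTransactionTracker(num_proc=4)
    # A thread begins a transaction while running on processor 2.
    tracker.begin(proc=2)
    # The thread is descheduled and resumes on processor 3, which was
    # running non-transactional code (level 0). The tracker is indexed
    # by processor, so it does not notice the move.
    try:
        tracker.commit(proc=3)
    except AssertionError as err:
        return str(err)
    return "no error"
```

Calling run_scenario() returns the error text for a commit on processor 3 at level 0, the same shape as the "level = 0" report in the original post. If the thread stays on processor 2, the commit succeeds.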


Quoting Polina Dudnik <pdudnik@xxxxxxxxx>:

> In the sense that context switches should not occur within a transaction if
> everything is set up correctly.
>
> On Sat, Mar 21, 2009 at 8:55 PM, Polina Dudnik <pdudnik@xxxxxxxxx> wrote:
>
> > You shouldn't have a problem with context switches if the threads are bound
> > correctly and there are no system calls inside the transaction.
> >
> >
> > On Sat, Mar 21, 2009 at 8:53 PM, <sjafri@xxxxxxxxxx> wrote:
> >
> >> I think it's a context switch.
> >>
> >> LogTM-SE uses a transaction level to check whether a processor is currently
> >> executing a transaction. Suppose you are running a thread that is not
> >> executing transactional code, so the transaction level is < 1. If there is
> >> then a context switch, LogTM would be unaware of it. Suppose further that
> >> the new thread is in the middle of a transaction. It would call commit, and
> >> you would get the assertion violation that the transaction level is < 1.
> >>
> >>
> >>
> >>
> >>
> >> Quoting BYONG WU CHONG <bernard.chong@xxxxxxxx>:
> >>
> >> > Hi,
> >> >
> >> > I ran SPLASH2 barnes on the GEMS TM simulator configured as EE LogTM and
> >> > got this error.
> >> >
> >> > --------------------------- Error Msg Begin ---------------------------
> >> > commitTransaction ERROR NOT IN XACT proc =3 logical_proc = 3 xid = 0 isOpen = 0 tid = -1 pc = [0x13db4, line 0x13d80] level = 0 time = 31050951
> >> > simics-common: log_tm/TransactionInterfaceManager.C:243: void TransactionInterfaceManager::commitTransaction(int, int, bool): Assertion `m_transactionLevel[thread] >= 1' failed.
> >> > Abort (SIGABRT) in main thread
> >> > The simulation state has been corrupted. Simulation cannot continue.
> >> > Please restart Simics.
> >> > Starting command line. (May have skipped commands in script files.)
> >> > [cpu3] v:0x0000000000013db4 p:0x000e0813db4  magic (sethi 0x800, %g0)
> >> > Setting new inspection cpu: cpu3
> >> > Traceback (most recent call last):
> >> >   File "../../../gen-scripts/mfacet.py", line 308, in console_branch_internal
> >> >     wait_for_string(get_console(), __prompt)
> >> >   File "/uusoc/facility/res/arch/tools/corvus/simics-3.0.31/x86-linux/lib/python/text_console_common.py", line 10, in wait_for_string
> >> >     wait_for_obj_hap("Xterm_Break_String", obj, break_id)
> >> >   File "/uusoc/facility/res/arch/tools/corvus/simics-3.0.31/x86-linux/lib/python/cli_impl.py", line 3374, in wait_for_obj_hap
> >> >     return wait_for_hap_common([hap_name, name, idx0])
> >> >   File "/uusoc/facility/res/arch/tools/corvus/simics-3.0.31/x86-linux/lib/python/cli_impl.py", line 3352, in wait_for_hap_common
> >> >     raise SimExc_Break, "Script branch interrupted"
> >> > sim_core.SimExc_Break: Script branch interrupted
> >> > Exception in python branch
> >> > simics>
> >> > --------------------------- Error Msg End ---------------------------
> >> >
> >> > It sounds like commitTransaction() has been called without a prior call to
> >> > beginTransaction(). But how?
> >> > I checked that every UNLOCK macro is always called after a LOCK macro.
> >> >
> >> > I searched Google and found this year-old thread, which had no answer.
> >> > https://lists.cs.wisc.edu/archive/gems-users/2008-March/msg00124.shtml
> >> >
> >> > Since I cannot bind a working cpu to a processor set, it seems that an
> >> > unbound thread is running around, beginning a transaction here and
> >> > committing the same transaction there, causing trouble.
> >> >
> >> > Information about thread binding is here:
> >> > https://lists.cs.wisc.edu/archive/gems-users/2007-October/msg00049.shtml
> >> > According to this answer, I cannot bind the last thread to the last
> >> > remaining cpu.
> >> >
> >> > Should I bind the last thread to one of the N-1 processor sets? But as far
> >> > as I know, binding two threads to one cpu isn't a good idea.
> >> >
> >> > This is the Simics script I used for the barnes simulation.
> >> >
> >> > --------------------------- barnes.simics Begin ---------------------------
> >> > @sys.path.append("../../../gen-scripts")
> >> > @cwd = os.getcwd()
> >> > @work_name = "barnes"
> >> > @binary = "BARNES_local"
> >> > @mb_dir = "benchmarks/SPLASH2/%s" % work_name
> >> > @lib_path = "../../../libs/Solaris_SPARC"
> >> > @import mfacet, tm_ee
> >> > @from mfacet import *
> >> >
> >> > @num_proc = SIM_number_processors()
> >> > #@if num_proc > 1:
> >> > #    num_proc = num_proc - 1
> >> >
> >> >
> >> > # These commands are useful.
> >> > # "isainfo -b\n",
> >> > # "isalist\n",
> >> > # "psrinfo\n",
> >> >
> >> > magic-break-enable
> >> > @console_commands(("ulimit -c 0\n",
> >> >                    "psrset -c 0\n",
> >> >                    "psrset -c 1\n",
> >> >                    "psrset -c 2\n",
> >> >                    "mount /host\n",
> >> >                    "export LD_LIBRARY_PATH=%s\n" % lib_path,
> >> >                    "cd /host/%s/../../../%s \n" % (cwd, mb_dir),
> >> >                    "./%s < input%02d \n" % (binary, num_proc),
> >> >                    ), "#")
> >> > c
> >> > # Note that this is the first magic breakpoint
> >> > @tm_ee.start_TM()
> >> > @conf.sim.cpu_switch_time = 1
> >> > # This is the second magic breakpoint
> >> > c
> >> > @mfacet.run_sim_command("ruby0.dump-stats SPLASH2_%s_LLTM_%02d.stats" % (work_name, num_proc))
> >> > --------------------------- barnes.simics End ---------------------------
> >> >
> >> > Could someone help me understand why I am getting this assertion error?
> >> >
> >> > - Bernard
> >> >
> >>
> >>
> >> _______________________________________________
> >> Gems-users mailing list
> >> Gems-users@xxxxxxxxxxx
> >> https://lists.cs.wisc.edu/mailman/listinfo/gems-users
> >> Use Google to search the GEMS Users mailing list by adding "site:
> >> https://lists.cs.wisc.edu/archive/gems-users/" to your search.
> >>
> >>
> >
>

