Re: [Gems-users] An assertion error from LogTM and thread binding


Date: Sat, 21 Mar 2009 23:34:51 -0600
From: BYONG WU CHONG <bernard.chong@xxxxxxxx>
Subject: Re: [Gems-users] An assertion error from LogTM and thread binding

I ran a 4-thread simulation (3 spawned threads and 1 master thread) on 8 cores.

Previously, I used 3 bound threads and 1 unbound master thread.

The new simulation is taking a while to complete, but Polina’s suggestion seems to be working: the run has progressed past the point where the error occurred.

As sjafri said, context switching must have been the problem. Now that all the threads, including the master thread, are bound, there should be no context switching.
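
For anyone who finds this thread later, here is a rough toy model of the failure sjafri describes in the quoted message below. It is only a sketch and not GEMS code: the class and function names are made up for illustration, and it assumes the transaction level is tracked per simulated processor, so it does not follow a thread that the OS moves to another processor.

```python
class SimulatedProcessor:
    """Hypothetical stand-in for one simulated CPU with its own transaction level."""

    def __init__(self) -> None:
        self.transaction_level = 0

    def begin_transaction(self) -> None:
        self.transaction_level += 1

    def commit_transaction(self) -> None:
        # Loosely mirrors the check `m_transactionLevel[thread] >= 1` from the log.
        if self.transaction_level < 1:
            raise AssertionError("commitTransaction: not in a transaction")
        self.transaction_level -= 1


def commit_after_migration() -> str:
    """An unbound thread begins on one CPU and commits on another."""
    cpu_a = SimulatedProcessor()
    cpu_b = SimulatedProcessor()
    cpu_a.begin_transaction()  # thread starts its transaction on cpu_a
    # A context switch moves the thread to cpu_b; the counter stays on cpu_a.
    try:
        cpu_b.commit_transaction()
    except AssertionError:
        return "assertion failed"
    return "committed"


def commit_when_bound() -> str:
    """A bound thread begins and commits on the same CPU."""
    cpu = SimulatedProcessor()
    cpu.begin_transaction()
    try:
        cpu.commit_transaction()
    except AssertionError:
        return "assertion failed"
    return "committed"
```

In this model the migrating thread trips the check while the bound thread commits cleanly, which matches what I saw before and after binding the master thread.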

Thanks Polina and sjafri.

From: gems-users-bounces@xxxxxxxxxxx [mailto:gems-users-bounces@xxxxxxxxxxx] On Behalf Of Polina Dudnik
Sent: Saturday, March 21, 2009 7:02 PM
To: Gems Users
Subject: Re: [Gems-users] An assertion error from LogTM and thread binding

You are precisely right. And in my earlier email I mentioned that the number of threads must be at least one fewer than the number of processors; then you won't have context switches. So you shouldn't be executing N threads on N-1 processors, only N threads on >=N+1 processors.
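
The rule of thumb above can be written as a one-line check. This is just an illustrative sketch: the function name is hypothetical, and it encodes only the thread-count condition, not whether the threads are actually bound or whether system calls occur inside a transaction.

```python
def avoids_context_switches(num_threads: int, num_processors: int) -> bool:
    """Return True when N threads run on at least N+1 processors."""
    return num_processors >= num_threads + 1


# The configurations discussed in this thread:
print(avoids_context_switches(4, 8))  # 4 threads on 8 cores
print(avoids_context_switches(4, 3))  # N threads on N-1 processors
```

The extra processor leaves room for whatever else the OS needs to schedule, so the bound threads are not displaced.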

On Sat, Mar 21, 2009 at 8:59 PM, <sjafri@xxxxxxxxxx> wrote:

If N threads are executed on (bound to) N-1 processors, there will be a
context switch. I have confirmed this through instruction traces.



Quoting Polina Dudnik <pdudnik@xxxxxxxxx>:

> In the sense that context switches should not occur within a transaction if
> everything is set up correctly.
>
> On Sat, Mar 21, 2009 at 8:55 PM, Polina Dudnik <pdudnik@xxxxxxxxx> wrote:
>
> > You shouldn't have a problem with context switches if the threads are bound
> > correctly and there are no system calls inside the transaction.
> >
> >
> > On Sat, Mar 21, 2009 at 8:53 PM, <sjafri@xxxxxxxxxx> wrote:
> >
> >> I think it's a context switch.
> >>
> >> LogTM-SE uses a transaction level to check whether a processor is currently
> >> executing a transaction. Suppose you are running a thread that is not
> >> executing transactional code, so the transaction level is < 1. If there is
> >> then a context switch, LogTM would be unaware of it. Suppose further that the
> >> new thread is in the middle of a transaction. It would call commit, and you
> >> would get the assertion violation that the transaction level is < 1.
> >>
> >>
> >>
> >>
> >>
> >> Quoting BYONG WU CHONG <bernard.chong@xxxxxxxx>:
> >>
> >> > Hi,
> >> >
> >> > I ran SPLASH2 barnes on the GEMS TM simulator configured as EE LogTM and got
> >> > this error.
> >> >
> >> > --------------------------- Error Msg Begin ---------------------------
> >> > commitTransaction ERROR NOT IN XACT proc =3 logical_proc = 3 xid = 0 isOpen = 0 tid = -1 pc = [0x13db4, line 0x13d80] level = 0 time = 31050951
> >> > simics-common: log_tm/TransactionInterfaceManager.C:243: void TransactionInterfaceManager::commitTransaction(int, int, bool): Assertion `m_transactionLevel[thread] >= 1' failed.
> >> > Abort (SIGABRT) in main thread
> >> > The simulation state has been corrupted. Simulation cannot continue.
> >> > Please restart Simics.
> >> > Starting command line. (May have skipped commands in script files.)
> >> > [cpu3] v:0x0000000000013db4 p:0x000e0813db4  magic (sethi 0x800, %g0)
> >> > Setting new inspection cpu: cpu3
> >> > Traceback (most recent call last):
> >> >   File "../../../gen-scripts/mfacet.py", line 308, in console_branch_internal
> >> >     wait_for_string(get_console(), __prompt)
> >> >   File "/uusoc/facility/res/arch/tools/corvus/simics-3.0.31/x86-linux/lib/python/text_console_common.py", line 10, in wait_for_string
> >> >     wait_for_obj_hap("Xterm_Break_String", obj, break_id)
> >> >   File "/uusoc/facility/res/arch/tools/corvus/simics-3.0.31/x86-linux/lib/python/cli_impl.py", line 3374, in wait_for_obj_hap
> >> >     return wait_for_hap_common([hap_name, name, idx0])
> >> >   File "/uusoc/facility/res/arch/tools/corvus/simics-3.0.31/x86-linux/lib/python/cli_impl.py", line 3352, in wait_for_hap_common
> >> >     raise SimExc_Break, "Script branch interrupted"
> >> > sim_core.SimExc_Break: Script branch interrupted
> >> > Exception in python branch
> >> > simics>
> >> > --------------------------- Error Msg End ---------------------------
> >> >
> >> > It sounds like commitTransaction() has been called without a prior call to
> >> > beginTransaction(). But how?
> >> > I checked that every UNLOCK macro is always called after a LOCK macro.
> >> >
> >> > I searched Google and found this year-old thread, which had no answer.
> >> > https://lists.cs.wisc.edu/archive/gems-users/2008-March/msg00124.shtml
> >> >
> >> > Since I cannot bind a working cpu to a processor set, it seems that an unbound
> >> > thread is running around, beginning a transaction here and committing the same
> >> > transaction there, which causes the trouble.
> >> >
> >> > Information about thread binding is here:
> >> > https://lists.cs.wisc.edu/archive/gems-users/2007-October/msg00049.shtml
> >> > According to this answer, I cannot bind the last thread to the last remaining
> >> > cpu.
> >> >
> >> > Should I bind the last thread to one of the N-1 processor sets? But as far as I
> >> > know, binding two threads to one cpu isn't a good idea.
> >> >
> >> > The Simics script I used for barnes simulation is this.
> >> >
> >> > --------------------------- barnes.simics Begin ---------------------------
> >> > @sys.path.append("../../../gen-scripts")
> >> > @cwd = os.getcwd()
> >> > @work_name = "barnes"
> >> > @binary = "BARNES_local"
> >> > @mb_dir = "benchmarks/SPLASH2/%s" % work_name
> >> > @lib_path = "../../../libs/Solaris_SPARC"
> >> > @import mfacet, tm_ee
> >> > @from mfacet import *
> >> >
> >> > @num_proc = SIM_number_processors()
> >> > #@if num_proc > 1:
> >> > #    num_proc = num_proc - 1
> >> >
> >> >
> >> > # These commands are useful.
> >> > # "isainfo -b\n",
> >> > # "isalist\n",
> >> > # "psrinfo\n",
> >> >
> >> > magic-break-enable
> >> > @console_commands(("ulimit -c 0\n",
> >> >                    "psrset -c 0\n",
> >> >                    "psrset -c 1\n",
> >> >                    "psrset -c 2\n",
> >> >                    "mount /host\n",
> >> >                    "export LD_LIBRARY_PATH=%s\n" % lib_path,
> >> >                    "cd /host/%s/../../../%s \n" % (cwd, mb_dir),
> >> >                    "./%s < input%02d \n" % (binary, num_proc),
> >> >                    ), "#")
> >> > c
> >> > # Note that this is the first magic breakpoint
> >> > @tm_ee.start_TM()
> >> > @conf.sim.cpu_switch_time = 1
> >> > # This is the second magic breakpoint
> >> > c
> >> > @mfacet.run_sim_command("ruby0.dump-stats SPLASH2_%s_LLTM_%02d.stats" %
> >> > (work_name, num_proc))
> >> > --------------------------- barnes.simics End ---------------------------
> >> >
> >> > Could someone help me understand why I am getting this assertion error?
> >> >
> >> > - Bernard
> >> >
> >>
> >>
> >> _______________________________________________
> >> Gems-users mailing list
> >> Gems-users@xxxxxxxxxxx
> >> https://lists.cs.wisc.edu/mailman/listinfo/gems-users
> >> Use Google to search the GEMS Users mailing list by adding "site:
> >> https://lists.cs.wisc.edu/archive/gems-users/" to your search.
> >>
> >>
> >
>

