[Gems-users] An assertion error from LogTM and thread binding


Date: Sat, 21 Mar 2009 18:20:18 -0600
From: BYONG WU CHONG <bernard.chong@xxxxxxxx>
Subject: [Gems-users] An assertion error from LogTM and thread binding

Hi,

 

I ran SPLASH2 barnes on the GEMS TM simulator configured as EE LogTM and got the following error.

 

--------------------------- Error Msg Begin ---------------------------

commitTransaction ERROR NOT IN XACT proc =3 logical_proc = 3 xid = 0 isOpen = 0 tid = -1 pc = [0x13db4, line 0x13d80] level = 0 time = 31050951

simics-common: log_tm/TransactionInterfaceManager.C:243: void TransactionInterfaceManager::commitTransaction(int, int, bool): Assertion `m_transactionLevel[thread] >= 1' failed.

Abort (SIGABRT) in main thread

The simulation state has been corrupted. Simulation cannot continue.

Please restart Simics.

Starting command line. (May have skipped commands in script files.)

[cpu3] v:0x0000000000013db4 p:0x000e0813db4  magic (sethi 0x800, %g0)

Setting new inspection cpu: cpu3

Traceback (most recent call last):

  File "../../../gen-scripts/mfacet.py", line 308, in console_branch_internal

    wait_for_string(get_console(), __prompt)

  File "/uusoc/facility/res/arch/tools/corvus/simics-3.0.31/x86-linux/lib/python/text_console_common.py", line 10, in wait_for_string

    wait_for_obj_hap("Xterm_Break_String", obj, break_id)

  File "/uusoc/facility/res/arch/tools/corvus/simics-3.0.31/x86-linux/lib/python/cli_impl.py", line 3374, in wait_for_obj_hap

    return wait_for_hap_common([hap_name, name, idx0])

  File "/uusoc/facility/res/arch/tools/corvus/simics-3.0.31/x86-linux/lib/python/cli_impl.py", line 3352, in wait_for_hap_common

    raise SimExc_Break, "Script branch interrupted"

sim_core.SimExc_Break: Script branch interrupted

Exception in python branch

simics>

--------------------------- Error Msg End ---------------------------

 

It looks like commitTransaction() was called without a prior call to beginTransaction(), but I do not see how.

I checked that every UNLOCK macro is always called after a LOCK macro.

 

I searched Google and found this year-old thread, which never got an answer.

https://lists.cs.wisc.edu/archive/gems-users/2008-March/msg00124.shtml

 

Since I cannot bind a working CPU to a processor set, it seems that an unbound thread is moving between CPUs, beginning a transaction on one and committing the same transaction on another, which causes the problem.
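
To make the suspected scenario concrete, here is a toy model in plain Python. It is not GEMS code. It assumes the simulator keeps one nesting counter per processor (which is what the `m_transactionLevel[thread]` assertion suggests) and that an unbound thread can be rescheduled between its begin and its commit. The function name `replay` and the event tuples are hypothetical.

```python
def replay(events, num_procs):
    """Replay (op, proc) events against per-processor nesting counters.

    Returns the (index, proc) pairs at which a commit arrived on a processor
    whose counter was below 1, i.e. where the assertion would have fired.
    """
    level = [0] * num_procs          # hypothetical stand-in for m_transactionLevel
    violations = []
    for index, (op, proc) in enumerate(events):
        if op == "begin":
            level[proc] += 1
        elif op == "commit":
            if level[proc] < 1:
                violations.append((index, proc))
            else:
                level[proc] -= 1
        else:
            raise ValueError("unknown op: %r" % (op,))
    return violations


# Bound thread: begin and commit are observed on the same processor.
bound = [("begin", 2), ("commit", 2)]
# Unbound thread: begins on processor 2, migrates, commits on processor 3.
migrated = [("begin", 2), ("commit", 3)]

print(replay(bound, 4))     # []
print(replay(migrated, 4))  # [(1, 3)]
```

In this model the LOCK/UNLOCK pairs are perfectly balanced in program order, yet the commit is still flagged because it lands on a processor that never saw the begin.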

 

Information about thread binding is here

https://lists.cs.wisc.edu/archive/gems-users/2007-October/msg00049.shtml

According to this answer, I cannot bind the last thread to the last remaining cpu.

 

Should I bind the last thread to one of the N-1 processor sets? As far as I know, though, binding two threads to one CPU isn't a good idea.
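
For reference, the N-1 processor sets could be generated for any processor count with a small helper instead of being written out by hand. This is only a sketch: `psrset_commands` is a hypothetical name, and it assumes that processor ids run from 0 to N-1 and that the highest-numbered processor is the one that has to stay outside any set.

```python
def psrset_commands(num_proc):
    """Return one 'psrset -c <id>' console command per processor except the last.

    Assumption: processor ids are 0 .. num_proc - 1, and the last processor
    is left outside any processor set.
    """
    if num_proc < 1:
        raise ValueError("num_proc must be at least 1")
    return ["psrset -c %d\n" % cpu for cpu in range(num_proc - 1)]


# With 4 processors this yields commands for processors 0, 1 and 2.
print(psrset_commands(4))
```

The returned strings have the same form as the "psrset -c" entries passed to console_commands in my script, so they could be spliced into that tuple.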

 

This is the Simics script I used for the barnes simulation.

 

--------------------------- barnes.simics Begin ---------------------------

@sys.path.append("../../../gen-scripts")
@cwd = os.getcwd()
@work_name = "barnes"
@binary = "BARNES_local"
@mb_dir = "benchmarks/SPLASH2/%s" % work_name
@lib_path = "../../../libs/Solaris_SPARC"
@import mfacet, tm_ee
@from mfacet import *

@num_proc = SIM_number_processors()
#@if num_proc > 1:
#    num_proc = num_proc - 1

# These commands are useful.
# "isainfo -b\n",
# "isalist\n",
# "psrinfo\n",

magic-break-enable
@console_commands(("ulimit -c 0\n",
                   "psrset -c 0\n",
                   "psrset -c 1\n",
                   "psrset -c 2\n",
                   "mount /host\n",
                   "export LD_LIBRARY_PATH=%s\n" % lib_path,
                   "cd /host/%s/../../../%s \n" % (cwd, mb_dir),
                   "./%s < input%02d \n" % (binary, num_proc),
                   ), "#")
c
# Note that this is the first magic breakpoint
@tm_ee.start_TM()
@conf.sim.cpu_switch_time = 1
# This is the second magic breakpoint
c
@mfacet.run_sim_command("ruby0.dump-stats SPLASH2_%s_LLTM_%02d.stats" % (work_name, num_proc))

--------------------------- barnes.simics End ---------------------------

 

Could someone help me understand why I am getting this assertion error?

 

- Bernard
