Re: [Gems-users] An assertion error from LogTM and thread binding


Date: Sat, 21 Mar 2009 20:49:31 -0400
From: Polina Dudnik <pdudnik@xxxxxxxxx>
Subject: Re: [Gems-users] An assertion error from LogTM and thread binding
How many processors are in your checkpoint, and how many threads are you trying to run? The number of threads should always be one fewer than the number of processors, because you need to reserve one processor for the main thread.
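
As a rough illustration of the rule above, here is a minimal Python sketch. The function name max_worker_threads is hypothetical and is not part of GEMS or Simics; it only encodes "threads = processors - 1".

```python
def max_worker_threads(num_processors):
    """Return the thread count suggested by the rule above.

    Hypothetical helper (not part of GEMS or Simics): one processor is
    reserved for the main thread, the rest are available for threads.
    """
    if num_processors < 2:
        raise ValueError("need at least 2 processors to reserve one for the main thread")
    return num_processors - 1


if __name__ == "__main__":
    for procs in (2, 4, 8, 16):
        print("%d processors -> %d threads" % (procs, max_worker_threads(procs)))
```

In a Simics script, the processor count could come from SIM_number_processors(), as it does in the quoted script.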

On Sat, Mar 21, 2009 at 8:20 PM, BYONG WU CHONG <bernard.chong@xxxxxxxx> wrote:

Hi,

 

I ran SPLASH2 barnes on the GEMS TM simulator configured as EE LogTM, and I got this error.

 

--------------------------- Error Msg Begin ---------------------------
commitTransaction ERROR NOT IN XACT proc =3 logical_proc = 3 xid = 0 isOpen = 0 tid = -1 pc = [0x13db4, line 0x13d80] level = 0 time = 31050951
simics-common: log_tm/TransactionInterfaceManager.C:243: void TransactionInterfaceManager::commitTransaction(int, int, bool): Assertion `m_transactionLevel[thread] >= 1' failed.
Abort (SIGABRT) in main thread
The simulation state has been corrupted. Simulation cannot continue.
Please restart Simics.
Starting command line. (May have skipped commands in script files.)
[cpu3] v:0x0000000000013db4 p:0x000e0813db4  magic (sethi 0x800, %g0)
Setting new inspection cpu: cpu3
Traceback (most recent call last):
  File "../../../gen-scripts/mfacet.py", line 308, in console_branch_internal
    wait_for_string(get_console(), __prompt)
  File "/uusoc/facility/res/arch/tools/corvus/simics-3.0.31/x86-linux/lib/python/text_console_common.py", line 10, in wait_for_string
    wait_for_obj_hap("Xterm_Break_String", obj, break_id)
  File "/uusoc/facility/res/arch/tools/corvus/simics-3.0.31/x86-linux/lib/python/cli_impl.py", line 3374, in wait_for_obj_hap
    return wait_for_hap_common([hap_name, name, idx0])
  File "/uusoc/facility/res/arch/tools/corvus/simics-3.0.31/x86-linux/lib/python/cli_impl.py", line 3352, in wait_for_hap_common
    raise SimExc_Break, "Script branch interrupted"
sim_core.SimExc_Break: Script branch interrupted
Exception in python branch
simics>
--------------------------- Error Msg End ---------------------------

 

It looks like commitTransaction() was called without a prior call to beginTransaction(). But how?

I checked that every UNLOCK macro is called only after a matching LOCK macro.

 

I searched Google and found this year-old thread, which received no answer.

https://lists.cs.wisc.edu/archive/gems-users/2008-March/msg00124.shtml

 

Since I cannot assign every working CPU to a processor set, it seems that an unbound thread is moving around, beginning a transaction on one CPU and committing the same transaction on another, which causes the problem.
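
To make this hypothesis concrete, here is a toy Python model. It is only a sketch and not GEMS code: it assumes one nesting counter per processor, and the class ToyTransactionLevels and the function run are hypothetical names. It shows how a commit on a different processor than the begin would trip a check like `m_transactionLevel[thread] >= 1`.

```python
class ToyTransactionLevels:
    """Toy model only: one nesting counter per processor (an assumption,
    not the actual GEMS TransactionInterfaceManager)."""

    def __init__(self, num_processors):
        self.level = [0] * num_processors

    def begin(self, proc):
        # Entering a transaction raises the nesting level on this processor.
        self.level[proc] += 1

    def commit(self, proc):
        # Same condition as the failed assertion: level must be >= 1.
        if self.level[proc] < 1:
            raise RuntimeError("commit on proc %d while not in a transaction" % proc)
        self.level[proc] -= 1


def run(begin_proc, commit_proc, num_processors=4):
    """Begin on one processor, commit on another, and report the outcome."""
    levels = ToyTransactionLevels(num_processors)
    levels.begin(begin_proc)
    try:
        levels.commit(commit_proc)
        return "ok"
    except RuntimeError as exc:
        return str(exc)


if __name__ == "__main__":
    print(run(3, 3))  # thread stays on cpu3
    print(run(2, 3))  # thread migrates from cpu2 to cpu3 mid-transaction
```

If the model matches what happens in the simulator, only the migrating case fails, and it fails on the processor where the commit runs, with a level of 0 as in the error message.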

 

Information about thread binding is here

https://lists.cs.wisc.edu/archive/gems-users/2007-October/msg00049.shtml

According to this answer, I cannot bind the last thread to the last remaining cpu.

 

Should I bind the last thread to one of the N-1 processor sets? As far as I know, though, binding two threads to one CPU isn't a good idea.

 

This is the Simics script I used for the barnes simulation.

 

--------------------------- barnes.simics Begin ---------------------------
@sys.path.append("../../../gen-scripts")
@cwd = os.getcwd()
@work_name = "barnes"
@binary = "BARNES_local"
@mb_dir = "benchmarks/SPLASH2/%s" % work_name
@lib_path = "../../../libs/Solaris_SPARC"
@import mfacet, tm_ee
@from mfacet import *

@num_proc = SIM_number_processors()
#@if num_proc > 1:
#    num_proc = num_proc - 1

# These commands are useful.
# "isainfo -b\n",
# "isalist\n",
# "psrinfo\n",

magic-break-enable
@console_commands(("ulimit -c 0\n",
                   "psrset -c 0\n",
                   "psrset -c 1\n",
                   "psrset -c 2\n",
                   "mount /host\n",
                   "export LD_LIBRARY_PATH=%s\n" % lib_path,
                   "cd /host/%s/../../../%s \n" % (cwd, mb_dir),
                   "./%s < input%02d \n" % (binary, num_proc),
                   ), "#")
c
# Note that this is the first magic breakpoint
@tm_ee.start_TM()
@conf.sim.cpu_switch_time = 1
# This is the second magic breakpoint
c
@mfacet.run_sim_command("ruby0.dump-stats SPLASH2_%s_LLTM_%02d.stats" % (work_name, num_proc))
--------------------------- barnes.simics End ---------------------------
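
The script above hard-codes three "psrset -c" commands while taking the thread count from SIM_number_processors(). A minimal sketch of deriving both from the processor count might look like the following. The helper psrset_commands and its reserved parameter are hypothetical, and the choice to leave the highest-numbered processor unbound is an assumption based on the script, not a documented GEMS requirement.

```python
def psrset_commands(num_processors, reserved=1):
    """Build one 'psrset -c <id>' console command per bindable processor.

    Hypothetical helper: `reserved` processors (the highest-numbered ones)
    are left out of any processor set.
    """
    usable = num_processors - reserved
    if usable < 1:
        raise ValueError("no processors left to bind after reserving %d" % reserved)
    return ["psrset -c %d\n" % cpu for cpu in range(usable)]


if __name__ == "__main__":
    num_proc = 4  # in Simics this would come from SIM_number_processors()
    commands = psrset_commands(num_proc)
    threads = len(commands)  # one thread per processor set
    for command in commands:
        print(repr(command))
    print("threads to run:", threads)
```

With four processors this reproduces the three psrset lines in the script and gives a thread count of three, which is the value the commented-out "num_proc = num_proc - 1" lines would have produced.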

 

Could someone help me understand why I am getting this assertion error?

 

- Bernard


_______________________________________________
Gems-users mailing list
Gems-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/gems-users
Use Google to search the GEMS Users mailing list by adding "site:https://lists.cs.wisc.edu/archive/gems-users/" to your search.


