Hi again,
looking a bit more at the code, it seems that the problem is that
restartTransactionCallback finds the wrong values for new_xact_level and
probably new_log_size. Maybe something goes wrong during the abort?
Kind regards,
Kostis
Hi Jayaram,
I have sent an email with the output file, but it is probably too big to
go through, so it is waiting for a moderator's confirmation. I could upload it
somewhere else if you like, to avoid sending it to everyone on the list.
In the meantime we are trying to find the problem ourselves, and we are
stuck on the following case. Thread 0 on proc 1 starts a CLOSED
transaction and the output is:
41181831 1 [1,0] ADD XACT FRAME oldLogFramePointer: [0x2d9020, line 0x2d9000] newLogFramePointer: [0x2d9020, line 0x2d9000] 1
41181831 1 [1,0] BEGIN XACT: TID 0 XID 10 XACT_LEVEL: 1 PC: [0x137fc, line 0x137c0]
If we understand Ruby's code correctly, at this point the
TransactionVersionManager will call beginTransaction, which will call
takeCheckpoint, which will execute
m_registers[thread=0][transactionLevel-1 = 0]->takeCheckpoint()
Later this thread decides to abort:
41183071 1 [1,0] SETTING ABORT FLAG ADDR = [0x38002218, line 0x38002200] PC = [0x13880, line 0x13880] NPC = [0x13884, line 0x13880]
41183074 1 [1,0] ISOLATE XACT STORE [0x3b7e3740, line 0x3b7e3740] XACT LEVEL: 1 PC = [0x13880, line 0x13880]
41183077 2 [2,0] ISOLATE XACT STORE [0x38002200, line 0x38002200] XACT LEVEL: 1 PC = [0x12dcc, line 0x12dc0]
41183077 2 [2,0] LOGGING STORE: [0x2ae200, line 0x2ae200] 1 PC = [0x12dcc, line 0x12dc0]
**** Log. proc. num: 2: m_logSize: 1632 m_maxLogSize: 781
41183077 2 [2,0] ADD UNDO LOG ENTRY: [0x2ae200, line 0x2ae200] [0x38002200, line 0x38002200] LogAddress: [0x3a163c, line 0x3a1600] 1
41183082 2 [2,0] ISOLATE XACT STORE [0x38002200, line 0x38002200] XACT LEVEL: 1 PC = [0x12dd0, line 0x12dc0]
41183082 2 [2,0] LOGGING STORE: [0x2ae200, line 0x2ae200] 0 PC = [0x12dd0, line 0x12dc0]
41183091 2 [2,0] ISOLATE XACT LOAD VA: [0xfeffbec0, line 0xfeffbec0] PA: [0x3c543ec0, line 0x3c543ec0] XACT LEVEL: 1 PC = [0x13244, line 0x13240]
41183091 1 [1,0] TRAP TO HANDLER: TID: 0 TRAP_TYPE 1 TRAP ADDRESS 0x38002218 NUM_RETRIES 0 LOG_SIZE 1360 XACT_LEVEL 1 XACT_LOWEST_CONFLICT_LEVEL 1 Handler Address = [0x1b39c, line 0x1b380] PC = [0x100707c, line 0x1007040]
41183091 1 [1,0] Begin ESCAPE ACTION - ESCAPE DEPTH: 1 PC [0x100707c, line 0x1007040]
Begin exposed action for thread 0 of proc 1 PC [0x1b39c, line 0x1b380]
41183092 1 [1,0] Begin ESCAPE ACTION - ESCAPE DEPTH: 2 PC [0x1b39c, line 0x1b380]
which will release isolation accordingly and restart the transaction.
End exposed action for thread 0 of proc 1 PC [0x1b3dc, line 0x1b3c0]
41194048 1 [1,0] END ESCAPE ACTION - ESCAPE DEPTH: 1 PC [0x1b3dc, line 0x1b3c0]
Restart transaction for thread 0 of proc 1
restartTransactionCallback proc = 1 thread = 0 time = 41194049
41194049 1 [1,0] END ESCAPE ACTION - ESCAPE DEPTH: 0 PC [0x1b3e4, line 0x1b3c0]
1 [1,0] TID 0 RESTART TRANSACTION AT XACT LEVEL: 1 LOG_SIZE: 1360
Segmentation fault (SIGSEGV) in main thread
So, according to the debug output, thread 0 restarts its transaction
and the new xact level is 1. So
TransactionInterfaceManager::restartTransactionCallback executes:
getXactVersionManager()->restartTransaction(thread = 0, new_xact_level = 1)
which in turn calls:
m_registers[0][1]->restoreCheckpoint()
This causes the segmentation fault, because the original transaction took its
checkpoint in m_registers[0][0]!
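To make the suspected mismatch concrete, here is a minimal C++ model of the two code paths described above. It is a sketch under stated assumptions, not GEMS code: `VersionManagerModel`, `slotAt` and the lazily allocated one-slot-per-nesting-level table are hypothetical simplifications. Only the indexing (`transactionLevel-1` when the checkpoint is taken, `new_xact_level` when it is restored) is taken from the trace.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for Ruby's RegisterState: it only records whether a
// checkpoint has been taken.
struct RegisterState {
  bool saved = false;
  void takeCheckpoint() { saved = true; }
  bool restoreCheckpoint() const { return saved; }
};

// Hypothetical simplified version manager: m_registers[thread][slot] holds the
// register checkpoint for one nesting level. Slots are allocated lazily, so an
// unused slot is a null pointer (consistent with `this=0x0` in the gdb trace).
class VersionManagerModel {
 public:
  VersionManagerModel(int threads, int max_levels) : m_registers(threads) {
    for (auto& row : m_registers) row.resize(max_levels);  // all slots start empty
  }

  // Begin path as described in the thread: level 1 is stored in slot 0.
  void beginTransaction(int thread, int xact_level) {
    assert(xact_level >= 1);
    auto& slot = m_registers[thread][xact_level - 1];
    if (!slot) slot = std::make_unique<RegisterState>();
    slot->takeCheckpoint();
  }

  // Returns whatever a restart would find in the given slot (may be null).
  const RegisterState* slotAt(int thread, int slot) const {
    return m_registers[thread][slot].get();
  }

 private:
  std::vector<std::vector<std::unique_ptr<RegisterState>>> m_registers;
};
```

In this model, calling `beginTransaction(0, 1)` leaves `slotAt(0, 1)` null (the slot the restart path dereferences in the trace), while `slotAt(0, 0)` holds the checkpoint. If the real code behaves the same way, restoring from `new_xact_level - 1` (or storing at `transactionLevel`) would make the two paths agree; which side should change depends on how Ruby intends nesting levels to be numbered.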
It seems too elementary to be a real bug, so I guess we are missing
something in the code.
Kind regards,
Kostis
The segmentation fault seems to occur because Ruby does not find the register
checkpoint for the processor that is trying to restart its transaction...
#0 RegisterState::restoreCheckpoint (this=0x0, m_proc=1) at
/home/users/anastop/gems/gems-2.1//common/Vector.h:92
#1 0x00002aaab066bc5d in
TransactionVersionManager::restartTransaction
(this=0xa341340, thread=0, xact_level=1) at
Can you get more debug output by setting XACT_DEBUG and XACT_DEBUG_LEVEL?
Jayaram
Konstantinos Nikas wrote:
The code we are running is a transactional workload that we have
developed, and we set it up according to the directions provided in the
wiki (binding threads, calling set_transaction_registers, etc.).
The protocol is MESI_CMP_filter_directory, as it is the only one LogTM
can use (at least in the latest version of GEMS).
Kind regards,
Kostis
What benchmark are you running and what protocol?
Polina
On Thu, Feb 5, 2009 at 12:47 PM, Konstantinos Nikas
<knikas@xxxxxxxxxxxxxxxxx <mailto:knikas@xxxxxxxxxxxxxxxxx>> wrote:
Hi all,
we have an 8-core CMP and a transactional workload which only uses 2
threads. We bind the 2 threads to 2 specific processors (always avoiding
core 0). When we set XACT_LOG_BUFFER_SIZE=2048 everything works fine.
For smaller values (0, 256, 1024), though, the simulation fails.
At first we used to get the following warning messages:
45936462 2 [2,0] endEscapeAction WARNING escape depth < 1. Depth = 0
Searching the mailing list, we came across a post which suggested adding
a beginEscapeAction() call to hardwareAbort(). We included this in our
code and the warning messages went away. However, the simulations still
fail with a segmentation fault. Gdb reported the following:
#0 RegisterState::restoreCheckpoint (this=0x0, m_proc=1) at
/home/users/anastop/gems/gems-2.1//common/Vector.h:92
#1 0x00002aaab066bc5d in
TransactionVersionManager::restartTransaction
(this=0xa341340, thread=0, xact_level=1) at
/home/users/anastop/gems/gems-2.1//common/Vector.h:109
#2 0x00002aaab0656b89 in
TransactionInterfaceManager::restartTransactionCallback
(this=0xa341230,
thread=0) at log_tm/TransactionInterfaceManager.C:751
#3 0x00002aaaad20fb70 in ?? () from
/home/simics/academic/simics-3.0.31/amd64-linux/lib/sparc-u3.so
#4 0x00002aaaad1aed99 in ?? () from
/home/simics/academic/simics-3.0.31/amd64-linux/lib/sparc-u3.so
#5 0x00002aaaad1aec9a in ?? () from
/home/simics/academic/simics-3.0.31/amd64-linux/lib/sparc-u3.so
#6 0x00002b1b49bc2eaf in SIM_continue () from
/home/simics/academic/simics-3.0.31/amd64-linux/bin/libsimics-common.so
#7 0x00002b1b49b83a9c in ?? () from
/home/simics/academic/simics-3.0.31/amd64-linux/bin/libsimics-common.so
#8 0x00002b1b4aaf739c in PyCFunction_Call (func=0x2aaaaab26560,
arg=0x2aaaac9f6a50, kw=0x0) at /home/packages/python-2.4.2 .......
Any ideas? Or suggestions on how to debug more efficiently?
Kind regards,
Kostis
PS: A similar situation occurs when we run the same 2 threads on a
4-core machine. It works fine for XACT_LOG_BUFFER_SIZE = 0, 256, 1024 and 2048,
but fails for size = 32!
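On the question of debugging more efficiently: one option is to route the checkpoint lookup through a checked helper, so that a missing entry is reported with the processor, thread and slot (and which slots do hold a checkpoint) instead of surfacing as a null `this` inside `restoreCheckpoint`. This is a hypothetical sketch: `findCheckpoint`, `restoreCheckpointChecked` and the plain pointer table are not part of GEMS, and the table is assumed to be indexed as `[thread][slot]` like `m_registers`.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for Ruby's RegisterState.
struct RegisterState {
  void restoreCheckpoint() {}
};

using CheckpointTable = std::vector<std::vector<RegisterState*>>;

// Hypothetical checked lookup: returns regs[thread][slot], or prints the context
// and returns nullptr if the index is out of range or the slot is empty.
RegisterState* findCheckpoint(const CheckpointTable& regs, int proc, int thread, int slot) {
  const bool thread_ok = thread >= 0 && thread < static_cast<int>(regs.size());
  const bool slot_ok =
      thread_ok && slot >= 0 && slot < static_cast<int>(regs[thread].size());
  if (!slot_ok || regs[thread][slot] == nullptr) {
    std::fprintf(stderr, "proc %d thread %d: no register checkpoint in slot %d\n",
                 proc, thread, slot);
    // List the slots of this thread that do hold a checkpoint, if any.
    if (thread_ok) {
      for (std::size_t i = 0; i < regs[thread].size(); ++i)
        if (regs[thread][i] != nullptr)
          std::fprintf(stderr, "  checkpoint present in slot %zu\n", i);
    }
    return nullptr;
  }
  return regs[thread][slot];
}

// In builds without NDEBUG, stops at a named assertion (after printing the
// context above) instead of dereferencing a null pointer.
void restoreCheckpointChecked(const CheckpointTable& regs, int proc, int thread, int slot) {
  RegisterState* rs = findCheckpoint(regs, proc, thread, slot);
  assert(rs != nullptr && "missing register checkpoint");
  rs->restoreCheckpoint();
}
```

With a table where only slot 0 of thread 0 is filled, a lookup of slot 1 prints the failing slot together with "checkpoint present in slot 0", which points directly at an off-by-one in the level-to-slot mapping rather than leaving only a gdb frame with `this=0x0`.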
_______________________________________________
Gems-users mailing list
Gems-users@xxxxxxxxxxx <mailto:Gems-users@xxxxxxxxxxx>
https://lists.cs.wisc.edu/mailman/listinfo/gems-users
Use Google to search the GEMS Users mailing list by adding
"site:https://lists.cs.wisc.edu/archive/gems-users/" to your search.