Hi Jayaram,
Thanks for that info.
Actually, I found that the problem is not the size of occupy_stack:
the compiler sees that occupy_stack is never used and eliminates it.
This happens with both gcc and Sun Studio's C compiler at the O2 (and
higher) optimization levels. So the solution is either to specify a
lower optimization level or to make sure the elimination doesn't
happen. We opted for the second option, so we pass occupy_stack's
address as input to an external dummy function (which does nothing). In
this case the compiler creates a big stack frame for tm_trap_handler
and everything seems to work fine (at least the configuration that used
to crash now finishes correctly :-) ).
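A minimal sketch of the workaround described above, not the actual
handler code. The names handler_sketch, touch_stack and stack_sink, and
the 4096-byte size, are assumptions made up for illustration. Because
this example is a single file, the dummy function records the address in
a volatile variable so the optimizer cannot prove the buffer is unused;
in our real build the dummy function simply lives in a separate source
file.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed buffer size, chosen only for illustration. */
enum { OCCUPY_STACK_BYTES = 4096 };

/* Volatile sink: a store to it is an observable side effect, so the
 * compiler must materialize the address that is written here. */
volatile uintptr_t stack_sink = 0;

/* Hypothetical stand-in for the external dummy function. In a real
 * build it would sit in its own translation unit and do nothing. */
void touch_stack(void *p)
{
    stack_sink = (uintptr_t)p;
}

/* Hypothetical handler skeleton: the local buffer pads the frame so
 * that the handler's own locals sit well away from the caller's data. */
size_t handler_sketch(void)
{
    char occupy_stack[OCCUPY_STACK_BYTES];

    /* Letting the address escape keeps the buffer from being
     * eliminated as dead at O2 and above. */
    touch_stack(occupy_stack);

    return sizeof occupy_stack;
}
```

Whether the compiler reserves the whole buffer still depends on the
compiler and its flags, so it is worth checking the generated assembly
for the expected frame size.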
Kind regards,
Kostis
Kostis,
Great job on tracking down the bug. There is a dummy 'occupy_stack'
variable in the software handlers that is supposed to ensure that the
software handler runs way up in the stack and reduce the chance of it
interfering with transactional execution. Unfortunately in this case the
buffer provided by occupy_stack doesn't seem to be sufficient.
Increasing it could work...
Ideally, the handlers will run off their own stack but we haven't
implemented that.
Jayaram
Konstantinos Nikas wrote:
Hi,
I think I have now found out what is happening (although I have no clue
whether it is "normal"). Thread 0 starts a transaction and I see the
following:
41510444 1 [1,0] ISOLATE XACT STORE [0x1f247ec0, line 0x1f247ec0] XACT LEVEL: 1 PC = [0x13228, line 0x13200]
41510444 1 [1,0] LOGGING STORE: [0xff0fbec0, line 0xff0fbec0] 1 PC = [0x13228, line 0x13200]
41510444 1 [1,0] ADD UNDO LOG ENTRY: [0xff0fbec0, line 0xff0fbec0] [0x1f247ec0, line 0x1f247ec0] LogAddress: [0x2d9174, line 0x2d9140] 1
The transaction moves on and at some point it needs to abort. The
software handler kicks in to unroll the log and undoes the log entries,
which include the 64 bytes that start at 0xFF0FBEC0.
However, for some reason, during this invocation of the software
handler %fp + 0x44 = 0xFF0FBECC, which lies inside that 64-byte line.
This stack slot is used to store the value needed to access the right
threadTransContext structure. When the line is restored in
tm_unroll_log_entry, this value is lost and the software handler saves
the new xact_level in the wrong location.
In the previous invocations of the software handler, %fp + 0x44 =
0xFF0FBDFC, which lies outside the line being undone. This means that
the handler stores the new values of xact_level and xact_log_size in the
right location, and the transaction can be correctly restarted.
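The overlap can be checked with a little address arithmetic. The sketch
below assumes 64-byte lines aligned on 64-byte boundaries, which matches
the address/line pairs in the trace above; the helper names line_base
and in_logged_line are made up for illustration.

```c
#include <stdint.h>

/* Assumed line size: 64 bytes, aligned on a 64-byte boundary. */
enum { LINE_BYTES = 64 };

/* Round an address down to the start of the line that contains it. */
uint32_t line_base(uint32_t addr)
{
    return addr & ~(uint32_t)(LINE_BYTES - 1);
}

/* Nonzero if addr falls inside the line that starts at logged_line. */
int in_logged_line(uint32_t addr, uint32_t logged_line)
{
    return line_base(addr) == logged_line;
}
```

With these helpers, in_logged_line(0xFF0FBECC, 0xFF0FBEC0) is true, so
the failing invocation keeps its %fp + 0x44 slot inside the line that
gets undone, while in_logged_line(0xFF0FBDFC, 0xFF0FBEC0) is false,
because 0xFF0FBDFC belongs to the line starting at 0xFF0FBDC0.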
Obviously, there shouldn't be any conflicts between the addresses used
by the software handler and those included inside a transaction. I have
followed all the instructions for preparing the workloads and hopefully
I haven't missed anything.
Any workarounds?
Kind regards,
Kostis