I'm trying to use transactions in place of locks in the Barnes application
from the splash2 suite, with the logTM code. When I run the converted
application with 2 application threads on a 2-CPU Simics configuration, it
runs ok. But when I try the same binary on a 4-CPU configuration with 4
application threads, the Barnes application crashes. Neither the simulated
system nor Simics itself crashes. At this point, I'm able to save and copy
out the core file from the crash. Examining the core file (offline, on a
different system) does not offer any clues; the stack itself seems to be
corrupted.
I use GEMS 1.3 and Simics 3.0.15.
I turned on logTM debug messages, but they didn't turn out to be very useful.
I'm now trying to capture and analyze cache debug messages as well as the
logTM ones.
But I'd like to make sure that I'm converting the Barnes code to
transactions correctly.
Below is the transactional code from Barnes (load.C) that causes the crash
(the commented-out ALOCK and AULOCK calls are the old locking code):
..............
(loadtree() function)
if (*qptr == NULL) {
    /* lock the parent cell */
    //ALOCK(CellLock->CL, ((cellptr) mynode)->seqnum % MAXLOCK);
    BEGIN_TRANSACTION(3);
    if (*qptr == NULL) {
        le = InitLeaf((cellptr) mynode, ProcessId);
        Parent(p) = (nodeptr) le;
        Level(p) = l;
        ChildNum(p) = le->num_bodies;
        ChildNum(le) = kidIndex;
        Bodyp(le)[le->num_bodies++] = p;
        *qptr = (nodeptr) le;
        //printf("Xn 3 Level %d\n", l);
        flag = FALSE;
    }
    //AULOCK(CellLock->CL, ((cellptr) mynode)->seqnum % MAXLOCK);
    COMMIT_TRANSACTION(3);
    /* unlock the parent cell */
}
if (flag && *qptr && (Type(*qptr) == LEAF)) {
    /* reached a "leaf"? */
    //ALOCK(CellLock->CL, ((cellptr) mynode)->seqnum % MAXLOCK);
    BEGIN_TRANSACTION(4);
    /* lock the parent cell */
    if (Type(*qptr) == LEAF) { /* still a "leaf"? */
        le = (leafptr) *qptr;
        if (le->num_bodies == MAX_BODIES_PER_LEAF) {
            *qptr = (nodeptr) SubdivideLeaf(le, (cellptr) mynode, l,
                                            ProcessId);
        }
        else {
            Parent(p) = (nodeptr) le;
            Level(p) = l;
            ChildNum(p) = le->num_bodies;
            Bodyp(le)[le->num_bodies++] = p;
            flag = FALSE;
        }
        //printf("Xn 4 Level %d\n", l);
    }
    //AULOCK(CellLock->CL, ((cellptr) mynode)->seqnum % MAXLOCK);
    COMMIT_TRANSACTION(4);
    /* unlock the node */
}
............
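For reference, the first region above has a check, begin, re-check, update,
commit shape. Below is a stripped-down, single-threaded sketch of just that
shape. The macros in it are hypothetical host-side stand-ins that only check
that every begin is paired with a commit of the same id; they are not the
Ruby magic calls and say nothing about how logTM detects conflicts. The
function install_if_empty and the int-pointer slot are likewise made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; NOT the Ruby magic calls. */
static int open_id = -1;  /* id of the currently open transaction, -1 if none */
static int begins  = 0;   /* number of BEGIN_TRANSACTION calls seen */
static int commits = 0;   /* number of COMMIT_TRANSACTION calls seen */

#define BEGIN_TRANSACTION(id) \
    do { assert(open_id == -1); open_id = (id); ++begins; } while (0)
#define COMMIT_TRANSACTION(id) \
    do { assert(open_id == (id)); open_id = -1; ++commits; } while (0)

/* Same shape as the first region: test, begin, re-test, update, commit. */
static int install_if_empty(int **slot, int *leaf)
{
    int installed = 0;
    if (*slot == NULL) {
        BEGIN_TRANSACTION(3);
        if (*slot == NULL) {   /* re-check inside the transaction */
            *slot = leaf;
            installed = 1;
        }
        COMMIT_TRANSACTION(3);
    }
    return installed;
}
```

With these stand-ins, a code path that leaves the region between the begin
and the commit trips an assert on the next begin, which is one cheap way to
rule out unbalanced regions before looking at the simulator.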
Here is the definition I have for the transaction macros:
#define NEW_RUBY_MAGIC_CALL( service )                  \
    __asm__ __volatile__                                \
        ( "sethi %1, %%g0 !magic service\n\t"           \
          : /* no outputs */                            \
          : "r" (0), "i" ((service)<<16)                \
          : "l0" /* clobber register */                 \
        );
#define BEGIN_TRANSACTION(id)  NEW_RUBY_MAGIC_CALL((id + 20))
#define COMMIT_TRANSACTION(id) NEW_RUBY_MAGIC_CALL((id + 40))
............
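As a sanity check on the macros above, here is a minimal host-side sketch of
the arithmetic they perform: the service number is the id plus 20 (begin) or
40 (commit), shifted left by 16 to form the sethi immediate. The helper names
(magic_immediate, begin_immediate, commit_immediate) are hypothetical and not
part of GEMS. The only hardware fact assumed is that SPARC sethi takes a
22-bit immediate, which leaves 6 bits for the service number after the shift.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets copied from BEGIN_TRANSACTION / COMMIT_TRANSACTION above. */
enum { BEGIN_OFFSET = 20, COMMIT_OFFSET = 40 };

/* Mirrors (service)<<16 from NEW_RUBY_MAGIC_CALL.  Assumption: the value
 * must fit sethi's 22-bit immediate, so service must be below 64. */
static uint32_t magic_immediate(unsigned service)
{
    assert(service < 64u);
    return (uint32_t)service << 16;
}

/* Immediate produced by BEGIN_TRANSACTION(id). */
static uint32_t begin_immediate(unsigned id)
{
    return magic_immediate(id + BEGIN_OFFSET);
}

/* Immediate produced by COMMIT_TRANSACTION(id). */
static uint32_t commit_immediate(unsigned id)
{
    return magic_immediate(id + COMMIT_OFFSET);
}
```

Under these assumptions, ids 3 and 4 map to service numbers 23/43 and 24/44,
all of which fit in the 6-bit field, so the ids used in load.C do not overflow
the immediate.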
I'm hoping someone who has done this before can tell me whether there's
something obviously wrong in the way I convert to transactions.
Thanks for any help.
- Deepa