Re: [Gems-users] logtm/tourmaline with ultrasparc II


Date: Wed, 06 Dec 2006 11:21:32 -0600
From: Dan Gibson <degibson@xxxxxxxx>
Subject: Re: [Gems-users] logtm/tourmaline with ultrasparc II
RE: LogTM-Solaris10-USIII+

We are now running our own internal version of LogTM (and tourmaline) on Solaris 10-based checkpoints -- I know of no special effort to port LogTM to Solaris 10. I suspect everything would work just fine.

RE: OS-Aborts
The trick is that Solaris (and probably Linux) doesn't like to be aborted while in the OS -- it's nontrivial to roll back OS events, like TLB actions. Notably, disabling interrupts does not disable traps (e.g., TLB and register window events), so aborts in the OS are rare, but can occur.

RE: Register save/restore
That said, we discovered which registers to save/restore by trial and error. When the save/restore yielded correct results for abort-intensive benchmarks, we moved on to other parts of the implementation. The implementation of the save/restore in the simulator is less important than the overall concept of a register checkpoint, so we saw no need to make an elegant, suits-all-needs solution. You will probably have to repeat the trial-and-error process with your own abort-intensive microbenchmark for US-II processors.
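A minimal sketch of the register checkpoint concept, for illustration only: save a chosen subset of registers at transaction begin and write them back on abort. `MockCpu` and `RegisterCheckpoint` are hypothetical names invented for this sketch; they are not the Simics or GEMS interfaces, and the real code in RegisterStateWindowed.C may be organized quite differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a simulated CPU's control registers, indexed by
// register number. This is NOT the Simics or GEMS interface.
struct MockCpu {
    std::map<int, std::uint64_t> regs;

    // Registers that were never written read as zero in this mock.
    std::uint64_t read(int n) const {
        auto it = regs.find(n);
        return it == regs.end() ? 0 : it->second;
    }
    void write(int n, std::uint64_t v) { regs[n] = v; }
};

// Snapshot of a chosen subset of registers: save() at transaction begin,
// restore() at transaction abort.
class RegisterCheckpoint {
public:
    explicit RegisterCheckpoint(std::vector<int> numbers)
        : m_numbers(std::move(numbers)) {}

    void save(const MockCpu& cpu) {
        m_values.clear();
        for (int n : m_numbers) m_values.push_back(cpu.read(n));
    }

    void restore(MockCpu& cpu) const {
        // restore() is only meaningful after a save().
        assert(m_values.size() == m_numbers.size());
        for (std::size_t i = 0; i < m_numbers.size(); ++i)
            cpu.write(m_numbers[i], m_values[i]);
    }

private:
    std::vector<int> m_numbers;           // which registers to checkpoint
    std::vector<std::uint64_t> m_values;  // values captured by save()
};
```

Registers left out of the list passed to the constructor are untouched by restore(), which is the knob the trial-and-error process turns: add or remove register numbers until the abort-intensive microbenchmark behaves.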

The original code is OK for US-III+, despite the unmapped registers, since (obviously) the simulation does not crash due to save/restore... hence saving/restoring nonexistent registers seems to have no ill effects.

Running with Ruby drastically distorts tick and tick_cmpr anyway, so we do not make any special efforts on their behalf.

I would expect the UltraSPARC-II manual to be a much more useful resource than the SPARCv9 manual; you can probably find it somewhere on Sun's website. The US-II is a superset of SPARC, so you'll want to have a look at the right documentation. You might also find the Solaris Internals book useful, or the Solaris source code at opensolaris.org.

I also would not be surprised at all if US-II processors broke Opal extensively, though I have no direct experience in the matter. Bagel users: Does Opal run on Bagel??

Regards,
Dan Gibson

郭锐 wrote:
Thanks very much for your reply.

Solaris is now freely available from Sun for research purposes. You can
make your own USIII+/Solaris checkpoints if you want from the ISOs of
Solaris and the scripts that come with Simics. However... if you're
determined to use your Suse7.3/USII checkpoints, read on.
I'm far more familiar with Linux than Solaris. In fact, I have never gotten my
hands on Solaris before. So my preferred solution is to make the Suse7.3/USII
checkpoints work, if I can.
True. The general idea is to perform a "context save" on transaction
begin and a "context restore" on transaction abort.
If your transaction model is to execute the transaction speculatively, as in TCC, I would say that a complete save/restore is needed. But that is of course not the case in LogTM, where every modification is steady (it stands even with an abort).

Notably, you shouldn't do anything with various control registers... but there is a
tricky issue that arises during a context restore to a processor that is
currently executing within the OS.
Do you mean that tricky things only occur when an abort happens in kernel
mode?
But remember that you have turned interrupts off, so one can only jump into the
kernel by making system calls. And I'm not sure I didn't make any syscall in
the transaction region. How did the kernel panic come about?
Here's the original code in "RegisterStateWindowed.C" that hardcoded the
control register numbers:
 for(i=0; i < 126 ; i++){
   if((i <= 31) ||
      (i == 39 || (i == 43)) ||  // tick/stick
      (i == 45) ||               // pstate
      (i >= 53 && i <= 57) ||    // invalid
      (i >= 63 && i <= 67) ||    // invalid
      (i >= 73 && i <= 77) ||    // invalid
      (i >= 83 && i <= 87) ||    // invalid
      //(i >= 91 && i <= 95) ||    // window state
      (i >= 100 && i <= 110) || // interrupt status (just added)
      (i >= 111 && i <= 119))   // interrupt address
     {
       continue;
     }
   m_controlRegisterNumbers.insertAtBottom(i);
 }

After comparing the register numbering for the two processors in Simics, I
modified it to this:
 for(i=0; i < 111 ; i++){
   if((i <= 31) ||
      (i == 39) ||               // tick
      (i == 43) ||               // pstate
      (i >= 51 && i <= 55) ||    // invalid
      (i >= 61 && i <= 65) ||    // invalid
      (i >= 71 && i <= 75) ||    // invalid
      (i >= 81 && i <= 85) ||    // invalid
      (i >= 97 && i <= 102) || // interrupt status (just added)
      (i >= 103 && i <= 106))   // interrupt address
     {
       continue;
     }
   m_controlRegisterNumbers.insertAtBottom(i);
 }

I'm not quite sure of this modification, because I'm a stranger to SPARC. It
does work sometimes, but it can also cause a kernel panic or deadlock in my
benchmark. Can anybody check it for me?
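One possible way to make the two loops above easier to compare and check is to write the skip list as a table of inclusive ranges and build the save list from it. The sketch below is only an illustration: `SkipRange`, `kUs2Skip`, `isSkipped` and `buildSaveList` are hypothetical names, a plain `std::vector` stands in for `m_controlRegisterNumbers`, and the ranges are copied unchanged from the modified USII loop, so they carry the same uncertainty.

```cpp
#include <cassert>
#include <vector>

// One inclusive range of register numbers to leave out of the checkpoint.
struct SkipRange {
    int first;
    int last;
    const char* why;
};

// Ranges and comments copied from the modified USII loop in this message.
// They are the poster's guesses, not values verified against a manual.
static const std::vector<SkipRange> kUs2Skip = {
    {0, 31, "i <= 31"},
    {39, 39, "tick"},
    {43, 43, "pstate"},
    {51, 55, "invalid"},
    {61, 65, "invalid"},
    {71, 75, "invalid"},
    {81, 85, "invalid"},
    {97, 102, "interrupt status"},
    {103, 106, "interrupt address"},
};

// True if reg falls inside any range of the table.
bool isSkipped(int reg, const std::vector<SkipRange>& skip) {
    for (const SkipRange& r : skip) {
        if (reg >= r.first && reg <= r.last) return true;
    }
    return false;
}

// Equivalent of the for-loop: every number in [0, limit) that is not skipped.
std::vector<int> buildSaveList(int limit, const std::vector<SkipRange>& skip) {
    assert(limit >= 0);
    std::vector<int> out;
    for (int i = 0; i < limit; ++i) {
        if (!isSkipped(i, skip)) out.push_back(i);
    }
    return out;
}
```

Calling `buildSaveList(111, kUs2Skip)` yields the same register numbers as the modified loop; a second table for US-III+ could then be compared against it entry by entry.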


Hey Users! Anyone out there using USII's and LogTM or Tourmaline?

If you're unsure about which registers to save/restore on USII, you can
always take an educated guess. That is, unfortunately, probably the best
way to figure it out -- this isn't *exactly* the same as a context
switch, after all.
I have no idea in what situations a control register is used. I made the
modification based on a comparison between the register numbering of the two
processors, and on the original code.
But I just realized that maybe even the original code is not reliable --
the register usage pattern should be OS dependent. What bad news!

Should I leave all the privileged registers alone, considering that transactions are currently user-space only and interrupts are disabled? How did you choose the set of control registers to save/restore? By guessing?
What's the relation between tick and tick_cmpr? The latter is undocumented
in the SPARCv9 manual.
And these are not documented either:
softint                       94
upa_config                    95
ecache_error_enable           96
asynchronous_fault_status     97
asynchronous_fault_address    98
out_intr_data0                99
out_intr_data1               100
out_intr_data2               101
intr_dispatch_status         102
in_intr_data0                103
in_intr_data1                104
in_intr_data2                105
intr_receive                 106
serial_id                    107
pic                          108
pcr                          109
mid                          110

PS: I wonder why the original code could work: it tries to read
registers 124 and 125, which don't exist even in UltraSPARC III+.

That's a good question indeed. The fact that Simics doesn't kill
execution with an error suggests there might be something in slots 124
and 125 after all... though I cannot imagine what. What does
SIM_register_name return for those values?
It should return '(NULL)' and raise exception number 6, based on my experiment
on USII.
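The probe discussed here (ask for the name of every slot and see which ones come back empty) can be sketched roughly as below. The simulator call is deliberately abstracted away: `NameLookup` and `mappedRegisters` are hypothetical names, and the callback is an assumed stand-in that returns a null pointer for an unmapped slot. How the real Simics query reports an unmapped register (null name, exception, or both) has to be checked against the Simics documentation.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Assumed stand-in for the simulator's "register number -> name" query.
// It returns nullptr when the slot is not mapped to any register.
using NameLookup = std::function<const char*(int)>;

// Collect (number, name) for every slot in [0, limit) that has a name.
std::vector<std::pair<int, std::string>> mappedRegisters(
        int limit, const NameLookup& lookup) {
    assert(limit >= 0);
    std::vector<std::pair<int, std::string>> out;
    for (int i = 0; i < limit; ++i) {
        const char* name = lookup(i);
        if (name != nullptr) out.emplace_back(i, name);
    }
    return out;
}
```

Running such a probe against both processor models and comparing the two result lists would show directly which slots exist on one model but not the other.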

One unrelated question: will Opal suffer from a similar problem? I don't understand the retirement code clearly, especially the parts that handle traps.
And one more: I think the assertion failure bugs (the two in
SimicsProcessor::hitCallBack and one in isReady(request)) have been fixed according to the release notes of
GEMS1.3.
So why do I still run into them? Sorry for asking this before investigating it
myself.
G.R.



--
http://www.cs.wisc.edu/~gibson [esc]:wq!
