Re: [Gems-users] livelock/deadlock after loading opal.


Date: Tue, 11 Oct 2005 09:26:43 -0600
From: Liqun Cheng <liqun.cheng@xxxxxxxxx>
Subject: Re: [Gems-users] livelock/deadlock after loading opal.
Luke,

I haven't tested the big benchmarks in SPLASH-2 yet, but all the small ones seem to suffer from the same problem. Since the big ones take around 8-10 hours with Ruby alone, I assume they might take days with Opal and Ruby.

Is there a way to make sure Opal is making progress? With Ruby alone, I can simply press Ctrl-C, dump the stats, and continue. But if I break into Opal with Ctrl-C, neither "continue" nor "opal.sim-step" works.
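
One rough way to tell a livelock from slow progress is to scan the captured console output for a repeating pattern of "patch PC" addresses. The sketch below is only a heuristic and is not part of Opal or Simics: the function names parse_patch_pcs and repeating_period are made up for illustration, and the log format is assumed to match the "patch PC: 0x... 0x..." lines quoted later in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collect the first address from every line of the form
// "patch PC: 0x44c7e0 0x409800". All other lines (including "patch NPC:")
// are ignored.
std::vector<std::uint64_t> parse_patch_pcs(const std::string& log) {
    std::vector<std::uint64_t> pcs;
    std::istringstream in(log);
    std::string line;
    const std::string tag = "patch PC:";
    while (std::getline(in, line)) {
        if (line.compare(0, tag.size(), tag) != 0) continue;
        std::istringstream fields(line.substr(tag.size()));
        std::uint64_t addr = 0;
        if (fields >> std::hex >> addr) pcs.push_back(addr);
    }
    return pcs;
}

// Hypothetical helper: return the smallest period p (1 <= p <= max_period)
// such that the last repeats * p addresses are one p-long block repeated
// `repeats` times. Returns 0 if no such period is found.
std::size_t repeating_period(const std::vector<std::uint64_t>& pcs,
                             std::size_t max_period, std::size_t repeats) {
    if (repeats < 2) return 0;
    for (std::size_t p = 1; p <= max_period; ++p) {
        const std::size_t window = p * repeats;
        if (window > pcs.size()) break;  // not enough history for this period
        const std::size_t start = pcs.size() - window;
        bool periodic = true;
        for (std::size_t i = start + p; i < pcs.size() && periodic; ++i)
            periodic = (pcs[i] == pcs[i - p]);
        if (periodic) return p;
    }
    return 0;
}
```

Calling repeating_period(parse_patch_pcs(log), 1000, 3) on the saved output returns a nonzero period when the tail of the log is the same address sequence three times over. A nonzero result is only a hint, since a tight loop in the benchmark itself could produce the same pattern.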

Did you write Opal according to the MAI, or is it possible to drive Ruby with other OoO modules provided by Simics?

thanks
Legion

On 10/11/05, Luke Yen <lyen@xxxxxxxxxxx> wrote:
Legion,

  Setting PSEQ_MAX_UNCHECKED to a large number is not recommended, since
Opal is not 100% functionally correct, and the longer you delay checking
with Simics, the higher the probability of going too far down the wrong
path due to a functional error.
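
To put a rough number on that trade-off, here is a toy model, not Opal code. It assumes, purely for illustration, that each committed instruction has the same independent probability p of being functionally wrong; real errors are unlikely to be independent, and the value of p is not known from this thread. Under that assumption the chance of at least one error in n unchecked instructions is 1 - (1 - p)^n.

```cpp
#include <cmath>

// Toy model (assumption, not Opal's behavior): every committed instruction is
// functionally wrong with independent probability p. Returns the probability
// that at least one of n unchecked instructions is wrong.
double prob_diverged(double p, unsigned n) {
    return 1.0 - std::pow(1.0 - p, static_cast<double>(n));
}
```

With a made-up p of 1e-4, prob_diverged(1e-4, 10) is about 0.1% while prob_diverged(1e-4, 1000) is about 9.5%, which is the sense in which a larger PSEQ_MAX_UNCHECKED raises the risk.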

  Is this a problem with all of the SPLASH benchmarks or just specific
ones?

  Luke

On Mon, 10 Oct 2005, Liqun Cheng wrote:

> Luke,
> I made the changes you suggested, and varied PSEQ_MAX_UNCHECKED from 10 to
> 1000. Opal made some progress, but still ran into infinite loops halfway
> through the benchmarks. It printed out a never-ending stream. The number of
> distinct addresses in each loop seems to equal the value of
> PSEQ_MAX_UNCHECKED.
> Could I just set this number to infinity to avoid Opal checking with Simics?
>  thanks
> Legion
>
>
>
> On 9/21/05, Luke Yen <lyen@xxxxxxxxxxx> wrote:
> >
> > I am not sure about the problem with your test program, but in regards
> > to the Splash benchmarks not finishing with Opal, I believe I have found a
> > solution (with the help of another GEMS user experiencing a similar
> > problem).
> >
> > We believe that there may be a case when running with Opal that a
> > processor is stalling when it is supposed to be processing an interrupt.
> > I believe this can be solved by increasing the number of instructions
> > committed in Opal before checking with Simics. Here are the steps to try
> > this solution out:
> >
> > 1) Inside opal/system/pstate.C, in simcontinue(), change the "1"
> > argument of SIM_time_post_cycle to "numsteps", like this:
> >
> > SIM_time_post_cycle( m_cpu[proc], numsteps, Sim_Sync_Processor,
> > pstate_breakpoint_handler, (void *) this );
> >
> > 2) In the Opal config file (opal/config/config.defaults), change
> > PSEQ_MAX_UNCHECKED from 1 to a higher value, say 10.
> >
> > 3) Recompile Opal and rerun the benchmarks (using the new Opal config
> > parameter).
> >
> > I believe this should solve your problem. Let us know if it doesn't
> > work.
> >
> > Luke
> >
> > On Wed, 21 Sep 2005, Liqun Cheng wrote:
> >
> > > Hi
> > >
> > > The splash2 benchmarks cannot finish after I load the Opal module. I have
> > > also tried a tiny "Hello world" program, but the problem remains. I added
> > > two magic instructions: one before printf, and one after printf. Opal and
> > > Ruby are loaded at the callback of the first magic instruction, and
> > > everything is done according to the quickstart page, but Simics still
> > > hasn't reached the second magic instruction after a whole night on a
> > > dual-core Xeon 2.0 GHz box with 4 GB of memory. So I suspect there is
> > > something wrong with Opal.
> > >
> > > BTW, everything is fine on Simics and Simics+Ruby, although I did get
> > > some warnings at opal0.init():
> > > pstate_t: warning: control register #0 == "(null)" has simics name "g0".
> > > pstate_t: warning: control register #1 == "(null)" has simics name "g1".
> > > pstate_t: warning: control register #2 == "(null)" has simics name "g2".
> > > .....................
> > > Is this the problem?
> > >
> > > What's more, Opal continuously reports something like
> > > patch PC: 0x44c7e0 0x409800
> > > patch NPC: 0x533764 0x409804
> > > patch PC: 0x533768 0x409800
> > > patch NPC: 0x44c678 0x409804
> > > patch PC: 0x44c6dc 0x409800
> > > patch NPC: 0x44c4c4 0x409804
> > > [Turbo] Trampoline found at block start.
> > > [Turbo] Trampoline found at block start.
> > > [Turbo] Trampoline found at block start.
> > > [Turbo] Trampoline found at block start.
> > > [Turbo] Trampoline found at block start.
> > > [Turbo] Trampoline found at block start.
> > > [Turbo] Trampoline found at block start.
> > > ............
> > >
> > > Any advice is appreciated!
> > > Legion
> > >
> >
>
