Re: [Gems-users] livelock/deadlock after loading opal.


Date: Mon, 10 Oct 2005 22:19:59 -0600
From: Liqun Cheng <liqun.cheng@xxxxxxxxx>
Subject: Re: [Gems-users] livelock/deadlock after loading opal.
Luke,
 
I made the changes you suggested and varied PSEQ_MAX_UNCHECKED from 10 to 1000. Opal made some progress, but still ran into infinite loops halfway through the benchmarks. It printed out a never-ending stream of output. The number of independent addresses in each loop seems to be the same as the value of PSEQ_MAX_UNCHECKED.
Could I just set this number to infinity to avoid Opal checking with Simics?
 
thanks
Legion

 

 
On 9/21/05, Luke Yen <lyen@xxxxxxxxxxx> wrote:
  I am not sure about the problem with your test program, but with regard
to the Splash benchmarks not finishing with Opal, I believe I have found a
solution (with the help of another GEMS user experiencing a similar
problem).

We believe that, when running with Opal, there may be a case where a
processor stalls when it is supposed to be processing an interrupt.
I believe this can be solved by increasing the number of instructions
committed in Opal before checking with Simics.  Here are the steps to try
this solution out:

1)  Inside opal/system/pstate.C, in simcontinue(), change the "1"
argument in the call to SIM_time_post_cycle to "numsteps", like this:

   SIM_time_post_cycle( m_cpu[proc], numsteps, Sim_Sync_Processor,
                      pstate_breakpoint_handler, (void *) this );

2) In the Opal config file (opal/config/config.defaults), change
PSEQ_MAX_UNCHECKED from 1 to a higher value, say 10.

3) Recompile Opal and rerun the benchmarks (using the new Opal config
parameter).

  I believe this should solve your problem.  Let us know if it doesn't
work.

Luke
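
One way to picture why step 2 changes how often Opal stops to check with Simics is the toy model below. It is only a sketch under assumptions, not Opal source: run_toy_model, CheckStats, and the loop structure are hypothetical names invented for illustration, and the model assumes that one check happens after every max_unchecked committed instructions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy model (NOT Opal code): counts how many times a timing
// model would pause to compare its state with a functional simulator.
struct CheckStats {
    std::uint64_t committed = 0;  // instructions committed by the timing model
    std::uint64_t checks = 0;     // number of synchronization checks performed
};

// Assumption: a check is triggered once max_unchecked instructions have been
// committed since the previous check, plus one final check for any remainder.
CheckStats run_toy_model(std::uint64_t total_instructions,
                         std::uint64_t max_unchecked) {
    assert(max_unchecked > 0);  // a limit of zero would never make progress

    CheckStats stats;
    std::uint64_t unchecked = 0;  // commits since the last check

    for (std::uint64_t i = 0; i < total_instructions; ++i) {
        ++stats.committed;
        ++unchecked;
        if (unchecked >= max_unchecked) {
            ++stats.checks;  // hand control back to the functional simulator
            unchecked = 0;
        }
    }
    if (unchecked > 0) {
        ++stats.checks;  // final check covering the leftover instructions
    }
    return stats;
}
```

Under this model, 1000 committed instructions cause 1000 checks with a limit of 1, 100 checks with a limit of 10, and a single check with a limit of 1000. Whether the real Opal check interval behaves exactly this way is not confirmed by this thread.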

On Wed, 21 Sep 2005, Liqun Cheng wrote:

> Hi
>
> The splash2 benchmarks cannot finish after I load the Opal module. I have
> also tried a tiny "Hello world" program, but the problem remains. I added two
> magic instructions: one before printf, and one after printf. Opal and Ruby
> are loaded at the callback of the first magic instruction, and everything is
> performed according to the quickstart page, but Simics still hasn't
> reached the second magic instruction after a whole night on a dual-core Xeon
> 2.0 GHz box with 4 GB memory. So I suspect there is something wrong with Opal.
>
> BTW, everything is fine on Simics and Simics+Ruby, although I did get some
> warnings at opal0.init():
> pstate_t: warning: control register #0 == "(null)" has simics name "g0".
> pstate_t: warning: control register #1 == "(null)" has simics name "g1".
> pstate_t: warning: control register #2 == "(null)" has simics name "g2".
> .....................
> Is this the problem?
>
> What's more, Opal continuously reports something like
> patch PC: 0x44c7e0 0x409800
> patch NPC: 0x533764 0x409804
> patch PC: 0x533768 0x409800
> patch NPC: 0x44c678 0x409804
> patch PC: 0x44c6dc 0x409800
> patch NPC: 0x44c4c4 0x409804
> [Turbo] Trampoline found at block start.
> [Turbo] Trampoline found at block start.
> [Turbo] Trampoline found at block start.
> [Turbo] Trampoline found at block start.
> [Turbo] Trampoline found at block start.
> [Turbo] Trampoline found at block start.
> [Turbo] Trampoline found at block start.
> ............
>
> Any advice is appreciated!
> Legion
>
