Re: [Gems-users] Change in cpu frequency


Date: Thu, 16 Aug 2007 19:54:00 -0500
From: "Niket Agarwal (niketa@xxxxxxxxxxxxx)" <niketa@xxxxxxxxxxxxx>
Subject: Re: [Gems-users] Change in cpu frequency
Since the topology is a 2D torus and the buffer size is 2, I think the deadlock is occurring within a virtual network.
I suspect you are not setting the link weights to ensure that the routing is e-cube. With the default routing, cyclic dependencies can occur when buffers are finite, hence the deadlock.

I also suspect that with infinite buffering a deadlock might or might not occur. Because of the cyclic dependency, messages might get stuck and may or may not get out of the loop. You might get away with no deadlock and prolonged execution time.

I would suggest trying e-cube routing.
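
For readers unfamiliar with the term, here is a minimal sketch of what e-cube (dimension-order) routing means on a 2D torus: a message first travels along X until its column matches the destination, and only then travels along Y. This is an illustration only and not GEMS code; in Ruby the routing is steered through the link weights in the topology, as noted above. The function names, the 4x4 grid size, and the (x, y) coordinate scheme are all assumptions made for the example. Note also that on a torus the wraparound links within one ring can still form a cycle on their own, which is commonly broken with extra virtual channels.

```python
def step_toward(cur: int, dst: int, size: int) -> int:
    """Move one hop along a ring of `size` nodes, taking the shorter way round."""
    forward = (dst - cur) % size  # hops needed if we travel in the +1 direction
    if forward == 0:
        return cur
    # Ties (exactly half the ring) go in the +1 direction so the choice is deterministic.
    if forward <= size - forward:
        return (cur + 1) % size
    return (cur - 1) % size


def ecube_path(src, dst, width=4, height=4):
    """Return the (x, y) nodes visited when X is fully resolved before Y.

    width/height default to a hypothetical 4x4 torus (16 nodes).
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:          # phase 1: route in the X dimension only
        x = step_toward(x, dst[0], width)
        path.append((x, y))
    while y != dst[1]:          # phase 2: route in the Y dimension only
        y = step_toward(y, dst[1], height)
        path.append((x, y))
    return path


if __name__ == "__main__":
    print(ecube_path((0, 0), (3, 1)))
    print(ecube_path((1, 2), (3, 0)))
```

Because every message finishes its X hops before taking any Y hop, no message ever turns from Y back into X, which removes the turn-induced cycles that arbitrary shortest-path routing allows.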

- Niket



> I ran the ocean checkpoint today and got the deadlock problem again. The following is what Ruby reports:
> Warning: in fn virtual void Sequencer::wakeup() in system/Sequencer.C:105: Possible Deadlock detected
> Warning: in fn virtual void Sequencer::wakeup() in system/Sequencer.C:105: Possible Deadlock detected
> Warning: in fn virtual void Sequencer::wakeup() in system/Sequencer.C:106: request is [CacheMsg: Address=[0x79460d1, line 0x79460c0] Type=LD ProgramCounter=[0x10487e0, line 0x10487c0] AccessMode=SupervisorMode Size=1 Prefetch=No Version=0 Aborted=0 Time=116385848 ]
> Warning: in fn virtual void Sequencer::wakeup() in system/Sequencer.C:106: request is [CacheMsg: Address=[0x79460d1, line 0x79460c0] Type=LD ProgramCounter=[0x10487e0, line 0x10487c0] AccessMode=SupervisorMode Size=1 Prefetch=No Version=0 Aborted=0 Time=116385848 ]
> Warning: in fn virtual void Sequencer::wakeup() in system/Sequencer.C:107: m_chip_ptr->getID() is 13
> Warning: in fn virtual void Sequencer::wakeup() in system/Sequencer.C:107: m_chip_ptr->getID() is 13
> Warning: in fn virtual void Sequencer::wakeup() in system/Sequencer.C:108: m_version is 0
> Warning: in fn virtual void Sequencer::wakeup() in system/Sequencer.C:108: m_version is 0
> Warning: in fn virtual void Sequencer::wakeup() in system/Sequencer.C:109: keys.size() is 1
> Warning: in fn virtual void Sequencer::wakeup() in system/Sequencer.C:109: keys.size() is 1
> Warning: in fn virtual void Sequencer::wakeup() in system/Sequencer.C:110: current_time is 116439510
> Warning: in fn virtual void Sequencer::wakeup() in system/Sequencer.C:110: current_time is 116439510
> Warning: in fn virtual void Sequencer::wakeup() in system/Sequencer.C:111: request.getTime() is 116385848
> Warning: in fn virtual void Sequencer::wakeup() in system/Sequencer.C:111: request.getTime() is 116385848
> Warning: in fn virtual void Sequencer::wakeup() in system/Sequencer.C:112: current_time - request.getTime() is 53662
> Warning: in fn virtual void Sequencer::wakeup() in system/Sequencer.C:112: current_time - request.getTime () is 53662
> Warning: in fn virtual void Sequencer::wakeup() in system/Sequencer.C:113: *m_readRequestTable_ptr is [ [0x79460c0, line 0x79460c0]=[CacheMsg: Address=[0x79460d1, line 0x79460c0] Type=LD ProgramCounter=[0x10487e0, line 0x10487c0] AccessMode=SupervisorMode Size=1 Prefetch=No Version=0 Aborted=0 Time=116385848 ] ]
> Warning: in fn virtual void Sequencer::wakeup() in system/Sequencer.C:113: *m_readRequestTable_ptr is [ [0x79460c0, line 0x79460c0]=[CacheMsg: Address=[0x79460d1, line 0x79460c0] Type=LD ProgramCounter=[0x10487e0, line 0x10487c0] AccessMode=SupervisorMode Size=1 Prefetch=No Version=0 Aborted=0 Time=116385848 ] ]
> Fatal Error: in fn virtual void Sequencer::wakeup() in system/Sequencer.C:114: Aborting
> Fatal Error: in fn virtual void Sequencer::wakeup() in system/Sequencer.C:114: Aborting
> ***  Simics getting shaky, switching to 'safe' mode.
> ***  Simics (main thread) received an abort signal, probably an assertion.
>
> I am sorry I cannot provide any detailed information. I configured Ruby like this:
> Protocol: MOESI_CMP_directory
> Topology: TORUS_2D. I set PROCS_PER_CHIP to 1 but reduced the inter-chip latency to make all 16 processors communicate as if on a single chip.
> FINITE_BUFFER_SIZE=2. I also set the on-chip link bandwidth to 64B/cycle and the off-chip bandwidth to 10B/cycle.
> Again, I also added some counters to record the message delay cycles.
>
> I hope you can give me some hint about this. Thank you !
>
> Lide
>
>
> On 8/16/07, Dan Gibson <degibson@xxxxxxxx> wrote:
>
> I would agree with your evaluation that the 75 MHz checkpoints are OK.
>
> As for the deadlock, all I can say is that Ruby is a very complicated
> piece of code, and minor changes can have major impacts... I know that
> unmodified Ruby runs Apache/Jbb/OLTP/Zeus for a LONG time without any
> deadlock concerns here at Wisconsin. Which protocol are you using, by
> the way? Protocol bugs have been discovered like this in the past...
>
> Regards,
> Dan
>
> Lide Duan wrote:
> > Thank you both for the replies !
> >
> > So usually we don't care much about the actual cpu frequency in Simics
> > as long as the related parameters in GEMS are set to reasonable values
> > which are measured in cycles, right? If so, I would like to use the
> > original checkpoints instead of the 5ghz ones.
> >
> > About the deadlock problem, the jbb and barnes workloads (I mean the
> > original checkpoints at 75mhz) were running fine with my modified
> > ruby, but I did encounter a possible deadlock when running the ocean
> > checkpoint. I don't know if it was due to my modification to ruby, but
> > I did nothing but add some counters to measure something, which
> > should not affect the behavior of ruby. I will keep looking, and post
> > the problem if it comes again, but thank you anyway.
> >
> > Lide
> >
> > On 8/15/07, Niket Agarwal (niketa@xxxxxxxxxxxxx) <niketa@xxxxxxxxxxxxx> wrote:
> >
> >     What topology are you using, and what finite buffer size do
> >     you have? I might be able to help you out with the deadlock.
> >
> >     - Niket
> >
> >     > GEMS is pretty much divorced from what Simics thinks is "one
> >     cycle". Simics generally uses an artificially low CPU speed to
> >     effectively speed up I/O -- recall that Simics models a CPI 1.0
> >     processor anyway when no timing models are installed. Ruby, on the
> >     other hand, maintains its own notion of time, namely every
> >     [SIMICS_RUBY_MULTIPLIER] Simics cycles is equivalent to one Ruby
> >     cycle, modulo some transient behaviors. Thus, a 75MHz and a 5GHz
> >     processor both observe the same *relative* memory system latencies
> >     with Ruby, when measured in processor cycles.
> >     > To put it another way, changing Simics's MHz doesn't affect Ruby.
> >     >
> >     >
> >     > That said, let me address your other issues (long simulation
> >     time and apparent deadlock). The long simulation time is
> >     undoubtedly an artifact of the changing of the relative I/O speed.
> >     In order to simulate a "disk access", Simics has to run about 100x
> >     longer when simulating a 5 GHz processor than it does w/ 75 MHz.
> >     Hence, we generally opt for a "slow" CPU speed to improve
> >     simulator performance due to compulsory I/O behavior. Since all
> >     latencies are relative, changing Simics's CPU speed only affects
> >     I/O, not memory latency. As for the deadlock concern, I don't know
> >     what might be causing that. Whereabouts in Ruby's code is the
> >     error arising?
> >     > Regards,
> >     > Dan
> >     >
> >     > Lide Duan wrote:
> >     >
> >     >
> >     >     I ran into some problems when trying to change the cpu
> >     frequency of my checkpoints. Basically, what I have done is this: I
> >     added some counters to the ruby network code to record the delay
> >     cycles of each kind of message, and dumped the results every so
> >     many ruby cycles. I ran the modified ruby on some
> >     checkpoints, e.g. SPECjbb2005, barnes, and ocean. The results
> >     seemed to be reasonable: some of the messages encountered some
> >     delays, and the delay cycles would increase if I reduce the
> >     bandwidth of the links or the finite buffer size. However, these
> >     checkpoints were created with cpu frequency at 75MHZ which was too
> >     low for the modern machines. I recreated some checkpoints with cpu
> >     frequency at 5GHZ, and supposed that the delay cycles would be
> >     much larger than those checkpoints at 75MHZ due to the much higher
> >     frequency. However, strange things happened. For the jbb_5ghz
> >     checkpoint, I have run it for several days with ruby loaded, but
> >     it never reached the first magic instruction, which was used to
> >     end the ruby warm up phase and
> >     start the real workload. For barnes_5ghz, I got the warm up
> >     checkpoint, but the simulation results from that were quite
> >     strange: the delay cycles of the messages were almost zero, and the
> >     simulation also stopped soon with a "Possible Deadlock detected"
> >     complaint from GEMS. I am pretty confused because I didn't make
> >     more modification to ruby after changing the checkpoints, the only
> >     difference is the cpu frequency of the checkpoints. So I am
> >     wondering is there any restriction on the cpu frequency that GEMS
> >     can support? I didn't find anything related in ruby configuration.
> >     How does GEMS deal with the cpu frequency? Also, what's the
> >     reasonable value of cpu frequency for current research? Is 75MHZ
> >     too low or 5GHZ too high?
> >     >
> >     >     Thanks,
> >     >     Lide
> >     >
> >     ------------------------------------------------------------------------
> >     >
> >     >     _______________________________________________
> >     >     Gems-users mailing list
> >     >     Gems-users@xxxxxxxxxxx
> >     >     https://lists.cs.wisc.edu/mailman/listinfo/gems-users
> >     >     Use Google to search the GEMS Users mailing list by adding
> >     "site:https://lists.cs.wisc.edu/archive/gems-users/" to your search.
> >     >
> >     >
> >     >
> >     > --
> >     > http://www.cs.wisc.edu/~gibson [esc]:wq!
> >     >
> >     >
> >
>
> --
> http://www.cs.wisc.edu/~gibson [esc]:wq!