Actually, the lack of instruction timing is just a corollary of the
lack of the stalling model. The "stalling" model is, as far as we can
tell without a lot of direct input from Virtutech, a significantly
different operating mode within Simics. Obviously, Simics is still
"stallable" without it, but probably lacks some features... after all,
we *are* hooking into a codebase that we do not actually control.
Thomas De Schampheleire wrote:
So the difference of about an hour and a half would be due solely
to the instruction cache?
The conclusion stays the same, right? The results without -stall are
not correct, or at least not useful.
Would compiling ruby with optimization flags have a lot of influence?
Which improvement factor can I approximately expect?
Ruby should compile with architecture and generic optimizations
in place already. We've done performance profiling, and a lot of the
execution time (90%+ in many cases) actually occurs within the Simics
executable, not within Ruby. We conclude that Simics behaves
differently (slower certainly, perhaps more conservatively?) when it
interfaces with a stalling memory timer. You can test this observation
yourself with the PERFECT_MEMORY_SYSTEM flags, which effectively turn
Ruby into a near-zero-execution-time model.
Regards,
Dan
Thanks, Thomas
On 3/22/07, Dan Gibson <degibson@xxxxxxxx> wrote:
Aha! The difference between Ruby/Stall and Ruby/NoStall becomes clear:
Ruby/Stall:
L1I_cache cache stats:
L1I_cache_total_misses: 234
L1I_cache_total_demand_misses: 234
L1I_cache_total_prefetches: 0
L1I_cache_total_sw_prefetches: 0
L1I_cache_total_hw_prefetches: 0
L1I_cache_misses_per_transaction: 234
L1I_cache_misses_per_instruction: 9.00187e-05
L1I_cache_instructions_per_misses: 11108.8
Ruby/NoStall:
L1I_cache cache stats:
L1I_cache_total_misses: 0
L1I_cache_total_demand_misses: 0
L1I_cache_total_prefetches: 0
L1I_cache_total_sw_prefetches: 0
L1I_cache_total_hw_prefetches: 0
L1I_cache_misses_per_transaction: 0
L1I_cache_misses_per_instruction: 0
L1I_cache_instructions_per_misses: NaN
It would seem that failing to specify -stall causes the instruction
hierarchy to become unstallable (aka perfect). This behaviour is
actually somewhat like the x86 target...
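
As a rough illustration of the comparison above, here is a minimal Python sketch that parses the two L1I excerpts quoted in this message and lists the counters that differ. The helper names (parse_stats, differing) are made up for this example and are not part of Ruby or Simics; only the numbers come from the dumps.

```python
# Excerpts copied from the Ruby/Stall and Ruby/NoStall dumps quoted above.
STALL = """\
L1I_cache_total_misses: 234
L1I_cache_total_demand_misses: 234
L1I_cache_total_prefetches: 0
L1I_cache_misses_per_instruction: 9.00187e-05
L1I_cache_instructions_per_misses: 11108.8
"""

NOSTALL = """\
L1I_cache_total_misses: 0
L1I_cache_total_demand_misses: 0
L1I_cache_total_prefetches: 0
L1I_cache_misses_per_instruction: 0
L1I_cache_instructions_per_misses: NaN
"""


def parse_stats(text):
    """Turn 'name: value' lines into a dict of floats (hypothetical helper)."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip():
            # float() accepts "NaN", so the NoStall entry parses as well.
            stats[name.strip()] = float(value.strip())
    return stats


def differing(a, b):
    """Return {name: (a_value, b_value)} for counters present in both runs that differ."""
    out = {}
    for key in sorted(a.keys() & b.keys()):
        # NaN compares unequal to everything, so NaN vs. a number is reported.
        if a[key] != b[key]:
            out[key] = (a[key], b[key])
    return out


stall = parse_stats(STALL)
nostall = parse_stats(NOSTALL)
changed = differing(stall, nostall)

for name, (with_stall, without_stall) in changed.items():
    print(f"{name}: stall={with_stall} nostall={without_stall}")
```

Every L1I miss counter drops to zero without -stall, while the prefetch counter is zero in both runs and therefore is not reported.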
Regards,
Dan
Thomas De Schampheleire wrote:
Hey,
I already replied, but the message is awaiting moderator approval
because it had a large attachment. I already sent you the short stats,
which I think are most important.
----------
Hey,
First of all sorry for my last post which seemed to include the rest
of the digest as well.
I ran a simulation for 750000 cycles, using a fixed seed and a
cpu-switch-time of 1, both with and without the -stall flag.
I attached the complete log. The log includes the Simics commands from
my script, so I think you will have no problem reading it. It first
has the nostall experiment, including dump-stats and dump-short-stats,
followed by the stall experiment, again with dump-stats and then
dump-short-stats.
The Ruby_cycles for both experiments are the same; however, the virtual
time is clearly different. I am not yet familiar with the stats Ruby
outputs, but I hope this is of some use to you.
Thanks, Thomas
---
SHORT Profiler Stats (for "fast" simulation)
--------------
Virtual_time_in_seconds: 16.95
Virtual_time_in_minutes: 0.2825
Virtual_time_in_hours: 0.00470833
Virtual_time_in_days: 0.00470833
Ruby_current_time: 375000
Ruby_start_time: 1
Ruby_cycles: 374999
Total_misses: 419
total_misses: 419 [ 338 27 27 27 ]
user_misses: 0 [ 0 0 0 0 ]
supervisor_misses: 419 [ 338 27 27 27 ]
instruction_executed: 2653888 [ 470693 727765 727780 727650 ]
cycles_per_instruction: 0.565207 [ 0.796696 0.515275 0.515264 0.515356 ]
misses_per_thousand_instructions: 0.157882 [ 0.71809 0.0370999 0.0370991 0.0371058 ]
transactions_started: 0 [ 0 0 0 0 ]
transactions_ended: 0 [ 0 0 0 0 ]
instructions_per_transaction: 0 [ 0 0 0 0 ]
cycles_per_transaction: 0 [ 0 0 0 0 ]
misses_per_transaction: 0 [ 0 0 0 0 ]
L1D_cache cache stats:
L1D_cache_total_misses: 419
L1D_cache_total_demand_misses: 419
L1D_cache_total_prefetches: 0
L1D_cache_total_sw_prefetches: 0
L1D_cache_total_hw_prefetches: 0
L1D_cache_misses_per_transaction: 419
L1D_cache_misses_per_instruction: 0.000157882
L1D_cache_instructions_per_misses: 6333.85
L1D_cache_request_type_LD: 80.1909%
L1D_cache_request_type_ST: 19.3317%
L1D_cache_request_type_ATOMIC: 0.477327%
L1D_cache_access_mode_type_SupervisorMode: 419 100%
L1D_cache_request_size: [binsize: log2 max: 8 count: 419 average: 6.05489 | standard deviation: 2.55747 | 0 19 55 88 257 ]
L1I_cache cache stats:
L1I_cache_total_misses: 0
L1I_cache_total_demand_misses: 0
L1I_cache_total_prefetches: 0
L1I_cache_total_sw_prefetches: 0
L1I_cache_total_hw_prefetches: 0
L1I_cache_misses_per_transaction: 0
L1I_cache_misses_per_instruction: 0
L1I_cache_instructions_per_misses: NaN
L1I_cache_request_size: [binsize: log2 max: 0 count: 0 average: NaN |standard deviation: NaN | 0 ]
L2_cache cache stats:
L2_cache_total_misses: 419
L2_cache_total_demand_misses: 419
L2_cache_total_prefetches: 0
L2_cache_total_sw_prefetches: 0
L2_cache_total_hw_prefetches: 0
L2_cache_misses_per_transaction: 419
L2_cache_misses_per_instruction: 0.000157882
L2_cache_instructions_per_misses: 6333.85
L2_cache_request_type_LD: 80.1909%
L2_cache_request_type_ST: 19.3317%
L2_cache_request_type_ATOMIC: 0.477327%
L2_cache_access_mode_type_SupervisorMode: 419 100%
L2_cache_request_size: [binsize: log2 max: 8 count: 419 average: 6.05489 | standard deviation: 2.55747 | 0 19 55 88 257 ]
---------
SHORT Profiler Stats (for stalled simulation)
--------------
Virtual_time_in_seconds: 67.87
Virtual_time_in_minutes: 1.13117
Virtual_time_in_hours: 0.0188528
Virtual_time_in_days: 0.0188528
Ruby_current_time: 375000
Ruby_start_time: 1
Ruby_cycles: 374999
Total_misses: 484
total_misses: 484 [ 355 43 43 43 ]
user_misses: 0 [ 0 0 0 0 ]
supervisor_misses: 484 [ 355 43 43 43 ]
instruction_executed: 2599462 [ 456314 714418 714451 714279 ]
cycles_per_instruction: 0.577041 [ 0.8218 0.524901 0.524877 0.525004 ]
misses_per_thousand_instructions: 0.186192 [ 0.777973 0.0601889 0.0601861 0.0602006 ]
transactions_started: 0 [ 0 0 0 0 ]
transactions_ended: 0 [ 0 0 0 0 ]
instructions_per_transaction: 0 [ 0 0 0 0 ]
cycles_per_transaction: 0 [ 0 0 0 0 ]
misses_per_transaction: 0 [ 0 0 0 0 ]
L1D_cache cache stats:
L1D_cache_total_misses: 250
L1D_cache_total_demand_misses: 250
L1D_cache_total_prefetches: 0
L1D_cache_total_sw_prefetches: 0
L1D_cache_total_hw_prefetches: 0
L1D_cache_misses_per_transaction: 250
L1D_cache_misses_per_instruction: 9.61739e-05
L1D_cache_instructions_per_misses: 10397.8
L1D_cache_request_type_LD: 67.2%
L1D_cache_request_type_ST: 32%
L1D_cache_request_type_ATOMIC: 0.8%
L1D_cache_access_mode_type_SupervisorMode: 250 100%
L1D_cache_request_size: [binsize: log2 max: 8 count: 250 average: 5.62 | standard deviation: 2.85035 | 0 19 55 33 143 ]
L1I_cache cache stats:
L1I_cache_total_misses: 234
L1I_cache_total_demand_misses: 234
L1I_cache_total_prefetches: 0
L1I_cache_total_sw_prefetches: 0
L1I_cache_total_hw_prefetches: 0
L1I_cache_misses_per_transaction: 234
L1I_cache_misses_per_instruction: 9.00187e-05
L1I_cache_instructions_per_misses: 11108.8
L1I_cache_request_type_IFETCH: 100%
L1I_cache_access_mode_type_SupervisorMode: 234 100%
L1I_cache_request_size: [binsize: log2 max: 4 count: 234 average: 4 | standard deviation: 0 | 0 0 0 234 ]
L2_cache cache stats:
L2_cache_total_misses: 484
L2_cache_total_demand_misses: 484
L2_cache_total_prefetches: 0
L2_cache_total_sw_prefetches: 0
L2_cache_total_hw_prefetches: 0
L2_cache_misses_per_transaction: 484
L2_cache_misses_per_instruction: 0.000186193
L2_cache_instructions_per_misses: 5370.78
L2_cache_request_type_LD: 34.7107%
L2_cache_request_type_ST: 16.5289%
L2_cache_request_type_ATOMIC: 0.413223%
L2_cache_request_type_IFETCH: 48.3471%
L2_cache_access_mode_type_SupervisorMode: 484 100%
L2_cache_request_size: [binsize: log2 max: 8 count: 484 average: 4.83678 | standard deviation: 2.20154 | 0 19 55 267 143 ]
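
For anyone wanting to sanity-check the two short dumps above, the following Python sketch recomputes a few of the derived figures from the raw counters. The values are copied from the dumps; the dictionary layout and function names are invented for this example, and the processor count of 4 is inferred from the four-entry per-processor vectors rather than stated anywhere in the logs.

```python
import math

NUM_PROCS = 4  # assumption: inferred from the four-entry per-processor vectors

# Raw counters copied from the two SHORT Profiler Stats dumps.
nostall = {
    "virtual_s": 16.95, "ruby_cycles": 374999, "instructions": 2653888,
    "total_misses": 419, "l1d": 419, "l1i": 0, "l2": 419,
}
stall = {
    "virtual_s": 67.87, "ruby_cycles": 374999, "instructions": 2599462,
    "total_misses": 484, "l1d": 250, "l1i": 234, "l2": 484,
}


def aggregate_cpi(run):
    """Cycles summed over all processors, divided by total instructions."""
    return run["ruby_cycles"] * NUM_PROCS / run["instructions"]


def misses_per_kilo_instr(run):
    """Total misses per thousand executed instructions."""
    return 1000.0 * run["total_misses"] / run["instructions"]


# Ratio of the reported Virtual_time_in_seconds values (stall vs. no stall).
slowdown = stall["virtual_s"] / nostall["virtual_s"]
print(f"virtual time ratio (stall / nostall): {slowdown:.2f}")

for label, run in (("nostall", nostall), ("stall", stall)):
    print(f"{label}: cpi={aggregate_cpi(run):.6f} "
          f"mpki={misses_per_kilo_instr(run):.6f} "
          f"L1D+L1I==L2: {run['l1d'] + run['l1i'] == run['l2']}")

# Share of L2 misses caused by instruction fetches in the stalled run.
ifetch_share = stall["l1i"] / stall["l2"]
print(f"stall IFETCH share of L2 misses: {100 * ifetch_share:.4f}%")

# The recomputed values agree with the figures printed in the dumps.
assert math.isclose(aggregate_cpi(nostall), 0.565207, rel_tol=1e-4)
assert math.isclose(misses_per_kilo_instr(stall), 0.186192, rel_tol=1e-4)
```

Under these assumptions the stalled run reports about four times the virtual time for the same number of Ruby cycles, and its 234 extra L1I misses account for the IFETCH share listed in the L2 request types.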
On 3/15/07, Dan Gibson <degibson@xxxxxxxx> wrote:
Good work. I look forward to the remainder of the data.
_______________________________________________
Gems-users mailing list
Gems-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/gems-users
Use Google to search the GEMS Users mailing list by adding "site:https://lists.cs.wisc.edu/archive/gems-users/" to your search.
--
http://www.cs.wisc.edu/~gibson [esc]:wq!