I've seen similar issues reported in the past, but could not find a
good answer. I would appreciate it if someone could explain the
results below.
I'm running SPLASH-2 programs in a 16-core Ruby+Opal simulation
setup. Some of the important Ruby parameters are as follows:
protocol: MOESI_SMP_directory
simics_version: Simics 3.0.31
OPAL_RUBY_MULTIPLIER: 2
L1_CACHE_ASSOC: 2 // 2KB
L1_CACHE_NUM_SETS_BITS: 4
L2_CACHE_ASSOC: 4 // 16KB
L2_CACHE_NUM_SETS_BITS: 6
g_NUM_PROCESSORS: 16
g_NUM_L2_BANKS: 16
g_NUM_MEMORIES: 16
g_NUM_CHIPS: 16
g_NETWORK_TOPOLOGY: FILE_SPECIFIED // 4x4 mesh
g_GARNET_NETWORK: true
g_DETAIL_NETWORK: true
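As a sanity check on the cache parameters above, the capacities in the comments can be derived from associativity and set count (this sketch assumes Ruby's default 64-byte cache block; adjust if your config differs):

```python
# Derive cache capacity from Ruby's L1/L2 parameters.
# Assumption: 64-byte cache blocks (Ruby's default block size).

BLOCK_SIZE_BYTES = 64

def cache_size_kb(assoc, num_sets_bits, block_size=BLOCK_SIZE_BYTES):
    """Total capacity in KB: ways * sets * block size."""
    num_sets = 1 << num_sets_bits
    return assoc * num_sets * block_size // 1024

# L1: 2-way, 2^4 = 16 sets -> 2 * 16 * 64 B = 2 KB
print("L1:", cache_size_kb(assoc=2, num_sets_bits=4), "KB")
# L2: 4-way, 2^6 = 64 sets -> 4 * 64 * 64 B = 16 KB
print("L2:", cache_size_kb(assoc=4, num_sets_bits=6), "KB")
```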
When I grep for SupervisorMode access types in the Ruby stats files,
almost all SPLASH-2 programs show an excessively high supervisor/user
access ratio, as shown below.
BARNES/ruby.BARNES.16k.16p.16m.stats: L1D_cache_access_mode_type_SupervisorMode: 266621884 85.2083%
BARNES/ruby.BARNES.16k.16p.16m.stats: L1I_cache_access_mode_type_SupervisorMode: 15695183 82.9426%
BARNES/ruby.BARNES.16k.16p.16m.stats: L2_cache_access_mode_type_SupervisorMode: 145082595 83.6042%
CHOLESKY/ruby.CHOLESKY.tk15.O.16p.16m.stats: L1D_cache_access_mode_type_SupervisorMode: 12393234 76.1279%
CHOLESKY/ruby.CHOLESKY.tk15.O.16p.16m.stats: L1I_cache_access_mode_type_SupervisorMode: 10213059 88.0104%
CHOLESKY/ruby.CHOLESKY.tk15.O.16p.16m.stats: L2_cache_access_mode_type_SupervisorMode: 5567772 82.6523%
FFT/ruby.FFT.64k.16p.16m.stats: L1D_cache_access_mode_type_SupervisorMode: 4967929 87.4088%
FFT/ruby.FFT.64k.16p.16m.stats: L1I_cache_access_mode_type_SupervisorMode: 353405 90.6429%
FFT/ruby.FFT.64k.16p.16m.stats: L2_cache_access_mode_type_SupervisorMode: 2537530 92.726%
FMM/ruby.FMM.16k.16p.16m.stats: L1D_cache_access_mode_type_SupervisorMode: 285541702 76.1547%
FMM/ruby.FMM.16k.16p.16m.stats: L1I_cache_access_mode_type_SupervisorMode: 9056962 74.9888%
FMM/ruby.FMM.16k.16p.16m.stats: L2_cache_access_mode_type_SupervisorMode: 109348107 93.9731%
LUcon/ruby.FMM.16k.16p.16m.stats: L1D_cache_access_mode_type_SupervisorMode: 285541702 76.1547%
LUcon/ruby.FMM.16k.16p.16m.stats: L1I_cache_access_mode_type_SupervisorMode: 9056962 74.9888%
LUcon/ruby.FMM.16k.16p.16m.stats: L2_cache_access_mode_type_SupervisorMode: 109348107 93.9731%
OCEAN_CONTIGUOUS/ruby.OCEAN_CONTIGUOUS.258.16p.16m.stats: L1D_cache_access_mode_type_SupervisorMode: 175987256 91.1572%
OCEAN_CONTIGUOUS/ruby.OCEAN_CONTIGUOUS.258.16p.16m.stats: L1I_cache_access_mode_type_SupervisorMode: 12561588 85.7416%
OCEAN_CONTIGUOUS/ruby.OCEAN_CONTIGUOUS.258.16p.16m.stats: L2_cache_access_mode_type_SupervisorMode: 91914021 89.7628%
RADIOSITY/ruby.RADIOSITY.room.16p.16m.stats: L1D_cache_access_mode_type_SupervisorMode: 208131190 94.4257%
RADIOSITY/ruby.RADIOSITY.room.16p.16m.stats: L1I_cache_access_mode_type_SupervisorMode: 3574778 28.1978%
RADIOSITY/ruby.RADIOSITY.room.16p.16m.stats: L2_cache_access_mode_type_SupervisorMode: 99881703 97.481%
RADIX/ruby.RADIX.1M.16p.16m.stats: L1D_cache_access_mode_type_SupervisorMode: 8949117 63.3698%
RADIX/ruby.RADIX.1M.16p.16m.stats: L1I_cache_access_mode_type_SupervisorMode: 711276 93.5587%
RADIX/ruby.RADIX.1M.16p.16m.stats: L2_cache_access_mode_type_SupervisorMode: 4191291 67.1359%
RAYTRACE/ruby.RAYTRACE.car.16p.16m.stats: L1D_cache_access_mode_type_SupervisorMode: 5046186 42.2043%
RAYTRACE/ruby.RAYTRACE.car.16p.16m.stats: L1I_cache_access_mode_type_SupervisorMode: 791814 8.57802%
RAYTRACE/ruby.RAYTRACE.car.16p.16m.stats: L2_cache_access_mode_type_SupervisorMode: 2442022 31.3737%
VOLREND/ruby.VOLREND.head.16p.16m.stats: L1D_cache_access_mode_type_SupervisorMode: 7130501 9.01353%
VOLREND/ruby.VOLREND.head.16p.16m.stats: L1I_cache_access_mode_type_SupervisorMode: 5619630 37.7412%
VOLREND/ruby.VOLREND.head.16p.16m.stats: L2_cache_access_mode_type_SupervisorMode: 3889090 71.005%
WATER-SPATIAL/ruby.WATER-SPATIAL.512.16p.16m.stats: L1D_cache_access_mode_type_SupervisorMode: 10752580 53.7889%
WATER-SPATIAL/ruby.WATER-SPATIAL.512.16p.16m.stats: L1I_cache_access_mode_type_SupervisorMode: 1149841 30.0857%
WATER-SPATIAL/ruby.WATER-SPATIAL.512.16p.16m.stats: L2_cache_access_mode_type_SupervisorMode: 5518409 91.074%
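For anyone who wants to reproduce the ratio check, here is a small sketch that pulls the SupervisorMode counts and percentages out of a stats file. It assumes only the `<name>: <count> <percent>%` line format shown above; it is not part of Ruby itself.

```python
import re

# Parse "<cache>_cache_access_mode_type_SupervisorMode: <count> <pct>%"
# lines as printed in Ruby stats files.
LINE_RE = re.compile(
    r"(?P<cache>\w+)_cache_access_mode_type_SupervisorMode:\s+"
    r"(?P<count>\d+)\s+(?P<pct>[\d.]+)%"
)

def parse_supervisor_lines(text):
    """Return {cache_name: (supervisor_access_count, supervisor_percent)}."""
    result = {}
    for m in LINE_RE.finditer(text):
        result[m.group("cache")] = (int(m.group("count")),
                                    float(m.group("pct")))
    return result

sample = """
L1D_cache_access_mode_type_SupervisorMode: 4967929 87.4088%
L1I_cache_access_mode_type_SupervisorMode: 353405 90.6429%
L2_cache_access_mode_type_SupervisorMode: 2537530 92.726%
"""
for cache, (count, pct) in parse_supervisor_lines(sample).items():
    print(f"{cache}: {count} supervisor accesses ({pct}%)")
```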
Then I turned on the PROFILE_HOT_LINES flag and got the following stats for FFT.
Hot Data Blocks
---------------
Total_entries_block_address: 75145
Total_data_misses_block_address: 2474143
total | load store atomic | user supervisor | sharing | touched-by
block_address | 7.99368 % [0x300c0c0, line 0x300c0c0] 197775 | 197648 79 48 | 0 197775 | 0 | 16
block_address | 7.98301 % [0x1b8920c0, line 0x1b8920c0] 197511 | 197349 115 47 | 0 197511 | 0 | 16
block_address | 7.90132 % [0x1b0620c0, line 0x1b0620c0] 195490 | 195252 126 112 | 0 195490 | 0 | 16
block_address | 7.82303 % [0x1b3c40c0, line 0x1b3c40c0] 193553 | 193443 73 37 | 0 193553 | 0 | 16
block_address | 5.06996 % [0x1b0780c0, line 0x1b0780c0] 125438 | 125354 58 26 | 0 125438 | 0 | 16
block_address | 4.83691 % [0xb6e080, line 0xb6e080] 119672 | 119672 0 0 | 0 119672 | 0 | 16
....
Hot Instructions
----------------
Total_entries_pc_address: 3889
Total_data_misses_pc_address: 2474143
total | load store atomic | user supervisor | sharing | touched-by
pc_address | 30.4843 % [0x1055454, line 0x1055440] 754224 | 754224 0 0 | 0 754224 | 0 | 16
pc_address | 20.6657 % [0x1055458, line 0x1055440] 511298 | 511298 0 0 | 0 511298 | 0 | 16
pc_address | 19.1072 % [0x10554b4, line 0x1055480] 472740 | 472740 0 0 | 0 472740 | 0 | 16
pc_address | 5.30531 % [0x1053f50, line 0x1053f40] 131261 | 131261 0 0 | 0 131261 | 0 | 16
......
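To quantify how concentrated the misses are, the per-entry percentages from the profiler output can simply be summed. This sketch assumes the `pc_address | <pct> % ...` layout shown above (the sample lines are abbreviated copies of it):

```python
import re

# Sum the per-entry miss percentages from Ruby's PROFILE_HOT_LINES
# output to see how concentrated misses are in a handful of PCs.
PCT_RE = re.compile(r"\|\s+([\d.]+)\s+%")

def concentration(profile_lines):
    """Total percent of misses covered by the listed hot entries."""
    total = 0.0
    for line in profile_lines:
        m = PCT_RE.search(line)
        if m:
            total += float(m.group(1))
    return total

hot_pcs = [
    "pc_address | 30.4843 % [0x1055454, line 0x1055440] 754224 | ...",
    "pc_address | 20.6657 % [0x1055458, line 0x1055440] 511298 | ...",
    "pc_address | 19.1072 % [0x10554b4, line 0x1055480] 472740 | ...",
    "pc_address | 5.30531 % [0x1053f50, line 0x1053f40] 131261 | ...",
]
print(f"Top 4 PCs cover {concentration(hot_pcs):.1f}% of data misses")
# -> Top 4 PCs cover 75.6% of data misses
```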
It seems that the OS executes just a few instructions on a few data
blocks so heavily that the statistics for each benchmark program are
almost meaningless. Any idea what's happening inside the OS?
Thanks,
Ikhwan