Re: [Gems-users] L1 hit latency


Date: Sat, 11 Nov 2006 11:11:51 -0600 (CST)
From: Mike Marty <mikem@xxxxxxxxxxx>
Subject: Re: [Gems-users] L1 hit latency
make sure the REMOVE_SINGLE_CYCLE_DCACHE_FAST_PATH flag is set to "true"

--Mike


> Hi,
> I am trying to increase the L1 hit latency in the MSI_MOSI_CMP_directory
> protocol. I did what Mike said in the following post:
> "
>  The L1_RESPONSE_LATENCY, like most of the specified latencies, is specific
> to an individual protocol.  Adjusting the L1 hit latency is unfortunately
> not at all straightforward.  By default, the L1 hit latency is always 1
> cycle.  This can be changed by turning off "fast path hits", controlled by
> the REMOVE_SINGLE_CYCLE_DCACHE_FAST_PATH flag.  A fast path hit is where the
> Ruby sequencer (ruby/sequencer.C) directly checks the permissions in the L1
> caches before actually issuing a request to Ruby.  If you turn this off, the
> L1 hit latency can be controlled by the SEQUENCER_TO_CONTROLLER_LATENCY
> parameter.
>
> Sorry this is confusing...hopefully we can clean this up in the future.
>
> --Mike
>  "
>
> However I notice nearly no change in Ruby_cycles when I increase
> SEQUENCER_TO_CONTROLLER_LATENCY from 2 to 11.
>
> The following are my other parameters. (I have unrealistically set some
> delays to 1 to minimize their effect.)
>
>
>
> Ruby Configuration
> ------------------
> protocol: MSI_MOSI_CMP_directory
> simics_version: Simics 3.0.22
> compiled_at: 02:46:55, Nov 11 2006
> RUBY_DEBUG: false
> g_RANDOM_SEED: 1
> g_DEADLOCK_THRESHOLD: 50000
> g_FORWARDING_ENABLED: false
> RANDOMIZATION: false
> g_SYNTHETIC_DRIVER: false
> g_SYNTHETIC_GENERATOR: locks
> g_DETERMINISTIC_DRIVER: false
> g_FILTERING_ENABLED: false
> g_DISTRIBUTED_PERSISTENT_ENABLED: true
> g_DYNAMIC_TIMEOUT_ENABLED: true
> g_RETRY_THRESHOLD: 1
> g_FIXED_TIMEOUT_LATENCY: 300
> g_trace_warmup_length: 1000000
> g_bash_bandwidth_adaptive_threshold: 0.75
> g_tester_length: 0
> g_synthetic_locks: 2048
> g_deterministic_addrs: 1
> g_SpecifiedGenerator: DetermInvGenerator
> g_callback_counter: 0
> g_NUM_COMPLETIONS_BEFORE_PASS: 0
> g_think_time: 5
> g_hold_time: 5
> g_wait_time: 5
> PROTOCOL_DEBUG_TRACE: true
> DEBUG_FILTER_STRING: none
> DEBUG_VERBOSITY_STRING: none
> DEBUG_START_TIME: 0
> DEBUG_OUTPUT_FILENAME: none
> SIMICS_RUBY_MULTIPLIER: 2
> OPAL_RUBY_MULTIPLIER: 2
> TRANSACTION_TRACE_ENABLED: false
> USER_MODE_DATA_ONLY: false
> PROFILE_HOT_LINES: false
> PROFILE_ALL_INSTRUCTIONS: false
> PRINT_INSTRUCTION_TRACE: false
> BLOCK_STC: false
> PERFECT_MEMORY_SYSTEM: false
> PERFECT_MEMORY_SYSTEM_LATENCY: 0
> DATA_BLOCK: false
> REMOVE_SINGLE_CYCLE_DCACHE_FAST_PATH: true
> g_SIMICS: true
> L1_CACHE_ASSOC: 4
> L1_CACHE_NUM_SETS_BITS: 8
> L2_CACHE_ASSOC: 8
> L2_CACHE_NUM_SETS_BITS: 10
> g_MEMORY_SIZE_BYTES: 4294967296
> g_DATA_BLOCK_BYTES: 64
> g_PAGE_SIZE_BYTES: 4096
> g_NUM_PROCESSORS: 8
> g_NUM_L2_BANKS: 4
> g_NUM_MEMORIES: 4
> g_PROCS_PER_CHIP: 8
> g_NUM_CHIPS: 1
> g_NUM_CHIP_BITS: 0
> g_MEMORY_SIZE_BITS: 32
> g_DATA_BLOCK_BITS: 6
> g_PAGE_SIZE_BITS: 12
> g_NUM_PROCESSORS_BITS: 3
> g_PROCS_PER_CHIP_BITS: 3
> g_NUM_L2_BANKS_BITS: 2
> g_NUM_L2_BANKS_PER_CHIP_BITS: 2
> g_NUM_L2_BANKS_PER_CHIP: 4
> g_NUM_MEMORIES_BITS: 2
> g_NUM_MEMORIES_PER_CHIP: 4
> g_MEMORY_MODULE_BITS: 24
> g_MEMORY_MODULE_BLOCKS: 16777216
> MAP_L2BANKS_TO_LOWEST_BITS: true
> DIRECTORY_CACHE_LATENCY: 1
> NULL_LATENCY: 1
> ISSUE_LATENCY: 2
> CACHE_RESPONSE_LATENCY: 1
> L2_RESPONSE_LATENCY: 22
> L1_RESPONSE_LATENCY: 1
> COLLECTOR_REQUEST_LATENCY: 1
> MEMORY_RESPONSE_LATENCY_MINUS_2: 118
> DIRECTORY_LATENCY: 1
> NETWORK_LINK_LATENCY: 1
> COPY_HEAD_LATENCY: 1
> ON_CHIP_LINK_LATENCY: 1
> RECYCLE_LATENCY: 1
> L2_RECYCLE_LATENCY: 1
> TIMER_LATENCY: 10000
> TBE_RESPONSE_LATENCY: 1
> PERIODIC_TIMER_WAKEUPS: true
> LOG_BASE: 4294967296
> RETRY_LATENCY: 100
> RESTART_DELAY: 1000
> PROFILE_EXCEPTIONS: false
> PROFILE_XACT: false
> XACT_NUM_CURRENT: 0
> XACT_LAST_UPDATE: 0
> L1_REQUEST_LATENCY: 1
> L2_REQUEST_LATENCY: 1
> SINGLE_ACCESS_L2_BANKS: true
> SEQUENCER_TO_CONTROLLER_LATENCY: 11
> L1CACHE_TRANSITIONS_PER_RUBY_CYCLE: 32
> L2CACHE_TRANSITIONS_PER_RUBY_CYCLE: 32
> DIRECTORY_TRANSITIONS_PER_RUBY_CYCLE: 32
> COLLECTOR_TRANSITIONS_PER_RUBY_CYCLE: 32
> g_SEQUENCER_OUTSTANDING_REQUESTS: 16
> NUMBER_OF_TBES: 128
> NUMBER_OF_MATES: 4
> NUMBER_OF_L1_TBES: 32
> NUMBER_OF_L2_TBES: 32
> FINITE_BUFFERING: false
> FINITE_BUFFER_SIZE: 3
> PROCESSOR_BUFFER_SIZE: 10
> PROTOCOL_BUFFER_SIZE: 32
> TSO: false
> g_MASK_PREDICTOR_CONFIG: AlwaysBroadcast
> g_TOKEN_REISSUE_THRESHOLD: 2
> g_PERSISTENT_PREDICTOR_CONFIG: None
> g_NETWORK_TOPOLOGY: PT_TO_PT
> g_CACHE_DESIGN: NUCA
> g_endpoint_bandwidth: 1000
> g_adaptive_routing: true
> NUMBER_OF_VIRTUAL_NETWORKS: 5
> FAN_OUT_DEGREE: 4
> g_PRINT_TOPOLOGY: true
> g_NUM_DNUCA_BANK_SETS: 32
> g_NUM_DNUCA_BANK_SET_BITS: 0
> g_NUM_BANKS_IN_BANK_SET_BITS: 0
> g_NUM_BANKS_IN_BANK_SET: 0
> PERFECT_DNUCA_SEARCH: true
> g_NUCA_PREDICTOR_CONFIG: NULL
> ENABLE_MIGRATION: false
> ENABLE_REPLICATION: false
> COLLECTOR_HANDLES_OFF_CHIP_REQUESTS: false
> XACT_LENGTH: 0
> XACT_SIZE: 0
>
> By tracking down a specific trace in tester.exec, I noticed that
> L1_REQUEST_LATENCY and L1_RESPONSE_LATENCY are the delays between the L1 and
> the L2 and have nothing to do with the L1 hit latency itself. Is this
> correct? (I have tried increasing these two anyway, but I still didn't
> notice much difference in performance.)
>
> Am I missing something here?
>
> One more thing. As one of the previous posts noted, I tried to get the L1
> miss rate by commenting out the following line in system/Sequencer.C:
>
>   // if (!REMOVE_SINGLE_CYCLE_DCACHE_FAST_PATH) {
>        g_system_ptr->getProfiler()->addPrimaryStatSample(msg, m_chip_ptr->getID());
>
> But the reported miss rates are pretty high on the Splash2 benchmarks (more
> than 90%!). Is it possible that this is the source of my problem with the L1
> hit latency? If so, what should I do, and how should I measure the actual
> miss rate?
>
>
>
> Thanks in advance,
>
> Mojtaba
>