Greetings,
For a CMP, the default abisko-common script creates N memory maps, one for
every board (at most 4 processors per board):
simics> phys_mem.map
base                object    fn  offs        length
0x0000000000000000  memory    0   0x0         0x20000000
0x0000000800000000  memory    0   0x20000000  0x20000000
0x000007fff07ffff0  simicsfs  0   0x0         0x10
To correct this error, modify
$SIMICS/targets/serengeti/serengeti-6800-system.include so that all memory is
allocated on board 0:
simics> phys_mem.map
base                object    fn  offs  length
0x0000000000000000  memory    0   0x0   0x40000000
0x000007fff07ffff0  simicsfs  0   0x0   0x10
#####
$board = 0
$cpus_left = $num_cpus
$cpus = (min 4 $cpus_left)
$cpubrd[$board] = ( $create_function num_cpus = $cpus
                                     cpu_frequency = $freq_mhz
                                     memory_megs = ($megs_per_cpu * $num_cpus))
$system.connect ("cpu-slot" + $board) $cpubrd[$board]
$board += 1
$cpus_left -= 4
while $cpus_left > 0 {
#####
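
For reference, the effect of this change on the per-board memory sizes can be
sketched in a few lines of Python. This is only an illustration and not part of
the Simics scripts: the name board_layout is made up, and it assumes that the
unmodified script gives each board $megs_per_cpu times the number of CPUs on
that board, and that the modified script leaves the remaining boards without
memory (the rest of the loop is not shown above).

```python
def board_layout(num_cpus, megs_per_cpu, all_on_board0, cpus_per_board=4):
    """Return a list of (board, cpus, memory_megs) tuples.

    Assumption: the stock script sizes each board's memory from its own CPUs;
    the modified one puts everything on board 0 and nothing elsewhere.
    """
    layout = []
    board = 0
    cpus_left = num_cpus
    while cpus_left > 0:
        cpus = min(cpus_per_board, cpus_left)
        if all_on_board0:
            megs = megs_per_cpu * num_cpus if board == 0 else 0
        else:
            megs = megs_per_cpu * cpus
        layout.append((board, cpus, megs))
        board += 1
        cpus_left -= cpus
    return layout


print(board_layout(8, 128, all_on_board0=False))
print(board_layout(8, 128, all_on_board0=True))
```

With 8 CPUs and 128 MB per CPU (values chosen only to match the sizes in the
maps above, 0x20000000 = 512 MB and 0x40000000 = 1024 MB) the first call gives
two boards with 512 MB each, and the second a single 1024 MB region on board 0.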
Many thanks,
Pau
Quoting pauerola@xxxxxxxxxx:
>
> Our error was triggered by this check:
>   // no DMA & IO
>   if (IS_DEV_MEM_OP(mem_trans->s.ini_type) ||
>       IS_OTH_MEM_OP(mem_trans->s.ini_type) ||
>       mem_trans->s.physical_address >
>           uinteger_t(RubyConfig::memorySizeBytes())
>      ) {
>     return true;
>   }
> The reason is that our Simics checkpoints have a memory space of 1 GB * #cores,
> but we had configured Ruby with only 4 GB of memory. Solaris allocates processes
> across the whole memory space, so some memory accesses were wrongly interpreted
> as DMA:
>
> [possible_cache_miss] PhAddr 87E583D84, LgAddr FFBFFD84, ini_ptr 4 Address 87E,583,D84 > Ruby 100,000,000 ? Unh nDup Ruby -> 0 latency
> [possible_cache_miss] PhAddr 7ECECC74, LgAddr 10C74, ini_ptr 4 Address 7ECECC74 > Ruby 100000000 ? FHan nDup Ruby -> 0 latency
> [possible_cache_miss] PhAddr 7ECECC78, LgAddr 10C78, ini_ptr 4 Address 7ECECC78 > Ruby 100000000 ? FHan nDup Ruby -> 0 latency
> [possible_cache_miss] PhAddr 7ECECC7C, LgAddr 10C7C, ini_ptr 4 Address 7E,CEC,C7C > Ruby 100,000,000 ? FHan nDup Ruby -> 0 latency
> [possible_cache_miss] PhAddr 7ECECC80, LgAddr 10C80, ini_ptr 4 Address 7ECECC80 > Ruby 100000000 ? FHan nDup Ruby -> 0 latency
> [possible_cache_miss] PhAddr 7ECECC84, LgAddr 10C84, ini_ptr 4 Address 7ECECC84 > Ruby 100000000 ? FHan nDup Ruby -> 0 latency
>
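
To make the effect of that check concrete, here is a small Python sketch of the
same comparison. It is not Ruby code: matches_no_dma_io_filter is a made-up
name, the two DMA/IO tests are reduced to a single boolean argument, and the
4 GB limit (0x100000000) is the value printed after "Ruby" in the log lines
above.

```python
RUBY_MEMORY_SIZE_BYTES = 0x100000000  # 4 GB, as printed in the log ("Ruby 100000000")


def matches_no_dma_io_filter(physical_address, is_dev_or_other_op=False,
                             memory_size=RUBY_MEMORY_SIZE_BYTES):
    """Mirror the quoted condition: True means the early 'return true' is taken."""
    return is_dev_or_other_op or physical_address > memory_size


# One data address and one instruction address of core 4, taken from the log.
for addr in (0x87E583D84, 0x7ECECC74):
    verdict = "filtered out" if matches_no_dma_io_filter(addr) else "passed on"
    print(f"{addr:#x}: {verdict}")
```

The data address 0x87E583D84 lies above the limit and is filtered out, while the
instruction address 0x7ECECC74 lies below it and is passed on, which appears to
match the Unh and FHan flags in the log.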
> Many thanks,
> Pau
>
>
> Quoting pauerola@xxxxxxxxxx:
>
> >
> > [ Follows thread
> > https://lists.cs.wisc.edu/archive/gems-users/2008-October/msg00088.shtml ]
> >
> >
> > Greetings,
> >
> > We don't have a solution to the problem described previously yet, but we
> > have more information, obtained with different profiles. Any guidance or
> > idea would be greatly appreciated.
> >
> > We have launched a simple matrix loop on cores 0, 2, 4 and 7 (with pbind)
> > on an 8-core CMP. The main assembler code is
> >
> > ld [%fp-40],%f6
> > ld [%fp-8],%l3
> > mov %i2,%l4
> > sll %l4,10,%l2
> > sll %l4,3,%l1
> > sub %l2,%l1,%l2
> > sll %l4,4,%l1
> > sub %l2,%l1,%l6
> > add %l6,%i4,%l0
> > sll %l0,2,%l1
> > ld [%l3+%l1],%f5
> > ld [%fp-12],%l3
> > sll %i4,10,%l2
> > sll %i4,3,%l1
> > sub %l2,%l1,%l2
> > sll %i4,4,%l1
> > sub %l2,%l1,%l0
> > mov %i3,%l2
> > add %l0,%l2,%l0
> > sll %l0,2,%l1
> > ld [%l3+%l1],%f4
> > fmuls %f5,%f4,%f4
> > fadds %f6,%f4,%f4
> > st %f4,[%fp-40]
> > add %i4,10,%i4
> > cmp %i4,1000
> > bl .L189
> > nop
> >
> > that can be seen executed in core 2 and in core 4 with the simics tracer
> >
> > **************
> > *** CORE 2 ***
> > **************
> > inst: [ 645] CPU 2 <v:0x0000000000010c6c> <p:0x0080f590c6c> cb04c011 ld [%l3 + %l1], %f5
> > data: [ 137] CPU 2 <v:0x00000000003070c8> <p:0x00078bb30c8> FP Read 4 bytes 0x0
> > inst: [ 649] CPU 2 <v:0x0000000000010c70> <p:0x0080f590c70> e607bff4 lduw [%fp + -12], %l3
> > data: [ 140] CPU 2 <v:0x00000000ffbffd84> <p:0x008004a7d84> Nrml Read 4 bytes 0x3f17c0
> > inst: [ 653] CPU 2 <v:0x0000000000010c74> <p:0x0080f590c74> a52f200a sll %i4, 10, %l2
> > inst: [ 657] CPU 2 <v:0x0000000000010c78> <p:0x0080f590c78> a32f2003 sll %i4, 3, %l1
> > inst: [ 661] CPU 2 <v:0x0000000000010c7c> <p:0x0080f590c7c> a4248011 sub %l2, %l1, %l2
> > inst: [ 665] CPU 2 <v:0x0000000000010c80> <p:0x0080f590c80> a32f2004 sll %i4, 4, %l1
> > inst: [ 669] CPU 2 <v:0x0000000000010c84> <p:0x0080f590c84> a0248011 sub %l2, %l1, %l0
> > inst: [ 673] CPU 2 <v:0x0000000000010c88> <p:0x0080f590c88> a416c000 or %i3, %g0, %l2
> > inst: [ 677] CPU 2 <v:0x0000000000010c8c> <p:0x0080f590c8c> a0040012 add %l0, %l2, %l0
> > inst: [ 681] CPU 2 <v:0x0000000000010c90> <p:0x0080f590c90> a32c2002 sll %l0, 2, %l1
> > inst: [ 685] CPU 2 <v:0x0000000000010c94> <p:0x0080f590c94> c904c011 ld [%l3 + %l1], %f4
> > data: [ 149] CPU 2 <v:0x0000000000787a88> <p:0x0086d833a88> FP Read 4 bytes 0x0
> > inst: [ 689] CPU 2 <v:0x0000000000010c98> <p:0x0080f590c98> 89a14924 fmuls %f5, %f4, %f4
> > inst: [ 693] CPU 2 <v:0x0000000000010c9c> <p:0x0080f590c9c> 89a18824 fadds %f6, %f4, %f4
> > inst: [ 697] CPU 2 <v:0x0000000000010ca0> <p:0x0080f590ca0> c927bfd8 st %f4, [%fp + -40]
> > data: [ 150] CPU 2 <v:0x00000000ffbffd68> <p:0x008004a7d68> FP Write 4 bytes 0x0
> > inst: [ 701] CPU 2 <v:0x0000000000010ca4> <p:0x0080f590ca4> b807200a add %i4, 10, %i4
> > inst: [ 705] CPU 2 <v:0x0000000000010ca8> <p:0x0080f590ca8> 80a723e8 cmp %i4, 1000
> > inst: [ 709] CPU 2 <v:0x0000000000010cac> <p:0x0080f590cac> 06bfffe6 bl 0x10c44
> > inst: [ 713] CPU 2 <v:0x0000000000010cb0> <p:0x0080f590cb0> 01000000 nop
> > inst: [ 717] CPU 2 <v:0x0000000000010c44> <p:0x0080f590c44> cd07bfd8 ld [%fp + -40], %f6
> > data: [ 152] CPU 2 <v:0x00000000ffbffd68> <p:0x008004a7d68> FP Read 4 bytes 0x0
> > inst: [ 721] CPU 2 <v:0x0000000000010c48> <p:0x0080f590c48> e607bff8 lduw [%fp + -8], %l3
> > data: [ 155] CPU 2 <v:0x00000000ffbffd88> <p:0x008004a7d88> Nrml Read 4 bytes 0x20eb8
> > inst: [ 725] CPU 2 <v:0x0000000000010c4c> <p:0x0080f590c4c> a8168000 or %i2, %g0, %l4
> > inst: [ 730] CPU 2 <v:0x0000000000010c50> <p:0x0080f590c50> a52d200a sll %l4, 10, %l2
> > inst: [ 735] CPU 2 <v:0x0000000000010c54> <p:0x0080f590c54> a32d2003 sll %l4, 3, %l1
> > inst: [ 739] CPU 2 <v:0x0000000000010c58> <p:0x0080f590c58> a4248011 sub %l2, %l1, %l2
> > inst: [ 743] CPU 2 <v:0x0000000000010c5c> <p:0x0080f590c5c> a32d2004 sll %l4, 4, %l1
> > inst: [ 747] CPU 2 <v:0x0000000000010c60> <p:0x0080f590c60> ac248011 sub %l2, %l1, %l6
> > inst: [ 751] CPU 2 <v:0x0000000000010c64> <p:0x0080f590c64> a005801c add %l6, %i4, %l0
> > inst: [ 755] CPU 2 <v:0x0000000000010c68> <p:0x0080f590c68> a32c2002 sll %l0, 2, %l1
> > inst: [ 759] CPU 2 <v:0x0000000000010c6c> <p:0x0080f590c6c> cb04c011 ld [%l3 + %l1], %f5
> > data: [ 160] CPU 2 <v:0x00000000003070f0> <p:0x00078bb30f0> FP Read 4 bytes 0x0
> > inst: [ 763] CPU 2 <v:0x0000000000010c70> <p:0x0080f590c70> e607bff4 lduw [%fp + -12], %l3
> > data: [ 162] CPU 2 <v:0x00000000ffbffd84> <p:0x008004a7d84> Nrml Read 4 bytes 0x3f17c0
> >
> > **************
> > *** CORE 4 ***
> > **************
> > inst: [ 2] CPU 4 <v:0x0000000000010cb0> <p:0x0007ececcb0> 01000000 nop
> > inst: [ 4] CPU 4 <v:0x0000000000010c44> <p:0x0007ececc44> cd07bfd8 ld [%fp + -40], %f6
> > data: [ 1] CPU 4 <v:0x00000000ffbffd68> <p:0x0087e583d68> FP Read 4 bytes 0x0
> > inst: [ 6] CPU 4 <v:0x0000000000010c48> <p:0x0007ececc48> e607bff8 lduw [%fp + -8], %l3
> > data: [ 2] CPU 4 <v:0x00000000ffbffd88> <p:0x0087e583d88> Nrml Read 4 bytes 0x20eb8
> > inst: [ 8] CPU 4 <v:0x0000000000010c4c> <p:0x0007ececc4c> a8168000 or %i2, %g0, %l4
> > inst: [ 10] CPU 4 <v:0x0000000000010c50> <p:0x0007ececc50> a52d200a sll %l4, 10, %l2
> > inst: [ 12] CPU 4 <v:0x0000000000010c54> <p:0x0007ececc54> a32d2003 sll %l4, 3, %l1
> > inst: [ 14] CPU 4 <v:0x0000000000010c58> <p:0x0007ececc58> a4248011 sub %l2, %l1, %l2
> > inst: [ 16] CPU 4 <v:0x0000000000010c5c> <p:0x0007ececc5c> a32d2004 sll %l4, 4, %l1
> > inst: [ 18] CPU 4 <v:0x0000000000010c60> <p:0x0007ececc60> ac248011 sub %l2, %l1, %l6
> > inst: [ 20] CPU 4 <v:0x0000000000010c64> <p:0x0007ececc64> a005801c add %l6, %i4, %l0
> > inst: [ 22] CPU 4 <v:0x0000000000010c68> <p:0x0007ececc68> a32c2002 sll %l0, 2, %l1
> > inst: [ 24] CPU 4 <v:0x0000000000010c6c> <p:0x0007ececc6c> cb04c011 ld [%l3 + %l1], %f5
> > data: [ 4] CPU 4 <v:0x000000000029ccf0> <p:0x0087f424cf0> FP Read 4 bytes 0x0
> > inst: [ 26] CPU 4 <v:0x0000000000010c70> <p:0x0007ececc70> e607bff4 lduw [%fp + -12], %l3
> > data: [ 6] CPU 4 <v:0x00000000ffbffd84> <p:0x0087e583d84> Nrml Read 4 bytes 0x3f17c0
> > inst: [ 28] CPU 4 <v:0x0000000000010c74> <p:0x0007ececc74> a52f200a sll %i4, 10, %l2
> > inst: [ 30] CPU 4 <v:0x0000000000010c78> <p:0x0007ececc78> a32f2003 sll %i4, 3, %l1
> > inst: [ 32] CPU 4 <v:0x0000000000010c7c> <p:0x0007ececc7c> a4248011 sub %l2, %l1, %l2
> > inst: [ 34] CPU 4 <v:0x0000000000010c80> <p:0x0007ececc80> a32f2004 sll %i4, 4, %l1
> > inst: [ 36] CPU 4 <v:0x0000000000010c84> <p:0x0007ececc84> a0248011 sub %l2, %l1, %l0
> > inst: [ 38] CPU 4 <v:0x0000000000010c88> <p:0x0007ececc88> a416c000 or %i3, %g0, %l2
> > inst: [ 40] CPU 4 <v:0x0000000000010c8c> <p:0x0007ececc8c> a0040012 add %l0, %l2, %l0
> > inst: [ 42] CPU 4 <v:0x0000000000010c90> <p:0x0007ececc90> a32c2002 sll %l0, 2, %l1
> > inst: [ 44] CPU 4 <v:0x0000000000010c94> <p:0x0007ececc94> c904c011 ld [%l3 + %l1], %f4
> > data: [ 9] CPU 4 <v:0x0000000000484ae8> <p:0x0087f60cae8> FP Read 4 bytes 0x0
> > inst: [ 46] CPU 4 <v:0x0000000000010c98> <p:0x0007ececc98> 89a14924 fmuls %f5, %f4, %f4
> > inst: [ 48] CPU 4 <v:0x0000000000010c9c> <p:0x0007ececc9c> 89a18824 fadds %f6, %f4, %f4
> > inst: [ 50] CPU 4 <v:0x0000000000010ca0> <p:0x0007ececca0> c927bfd8 st %f4, [%fp + -40]
> > data: [ 10] CPU 4 <v:0x00000000ffbffd68> <p:0x0087e583d68> FP Write 4 bytes 0x0
> > inst: [ 52] CPU 4 <v:0x0000000000010ca4> <p:0x0007ececca4> b807200a add %i4, 10, %i4
> > inst: [ 54] CPU 4 <v:0x0000000000010ca8> <p:0x0007ececca8> 80a723e8 cmp %i4, 1000
> > inst: [ 56] CPU 4 <v:0x0000000000010cac> <p:0x0007ececcac> 06bfffe6 bl 0x10c44
> > inst: [ 58] CPU 4 <v:0x0000000000010cb0> <p:0x0007ececcb0> 01000000 nop
> > inst: [ 60] CPU 4 <v:0x0000000000010c44> <p:0x0007ececc44> cd07bfd8 ld [%fp + -40], %f6
> > data: [ 13] CPU 4 <v:0x00000000ffbffd68> <p:0x0087e583d68> FP Read 4 bytes 0x0
> >
> >
> > These executions seem correct, but when we look at the Ruby debugger we
> > see that only core 2 does its work. Core 4 executes ifetches, but no data
> > is loaded (addresses 0x87xxxxxxx).
> >
> > **************
> > *** CORE 2 ***
> > **************
> > 323 0 2 L1Cache Exclusive_Data IS>M_W [0x78bb3040, line 0x78bb3040]
> > 373 0 2 L1Cache Use_Timeout M_W>M [0x78bb3040, line 0x78bb3040]
> > 383 0 2 L1Cache Load I>IS [0x78bb3080, line 0x78bb3080]
> > 698 0 2 L1Cache Exclusive_Data IS>M_W [0x78bb3080, line 0x78bb3080]
> > 730 0 2 L1Cache Load I>IS [0x78bb30c0, line 0x78bb30c0]
> > 748 0 2 L1Cache Use_Timeout M_W>M [0x78bb3080, line 0x78bb3080]
> > 1048 0 2 L1Cache Exclusive_Data IS>M_W [0x78bb30c0, line 0x78bb30c0]
> > 1098 0 2 L1Cache Use_Timeout M_W>M [0x78bb30c0, line 0x78bb30c0]
> > 1108 0 2 L1Cache Load I>IS [0x78bb3100, line 0x78bb3100]
> > 1425 0 2 L1Cache Exclusive_Data IS>M_W [0x78bb3100, line 0x78bb3100]
> > 1457 0 2 L1Cache Load I>IS [0x78bb3140, line 0x78bb3140]
> > 1475 0 2 L1Cache Use_Timeout M_W>M [0x78bb3100, line 0x78bb3100]
> > 1771 0 2 L1Cache Exclusive_Data IS>M_W [0x78bb3140, line 0x78bb3140]
> > 1821 0 2 L1Cache Use_Timeout M_W>M [0x78bb3140, line 0x78bb3140]
> > 1831 0 2 L1Cache Load I>IS [0x78bb3180, line 0x78bb3180]
> > 2145 0 2 L1Cache Exclusive_Data IS>M_W [0x78bb3180, line 0x78bb3180]
> > 2170 0 2 L1Cache Store I>IM [0x79357740, line 0x79357740]
> > 2195 0 2 L1Cache Use_Timeout M_W>M [0x78bb3180, line 0x78bb3180]
> > 2487 0 2 L1Cache Exclusive_Data IM>OM [0x79357740, line 0x79357740]
> > 2488 0 2 L1Cache All_acks OM>MM_W [0x79357740, line 0x79357740]
> > 2511 0 2 L1Cache Load I>IS [0x78bb2200, line 0x78bb2200]
> > 2538 0 2 L1Cache Use_Timeout MM_W>MM [0x79357740, line 0x79357740]
> > 2825 0 2 L1Cache Exclusive_Data IS>M_W [0x78bb2200, line 0x78bb2200]
> > 2857 0 2 L1Cache Load I>IS [0x78bb2240, line 0x78bb2240]
> > 2875 0 2 L1Cache Use_Timeout M_W>M [0x78bb2200, line 0x78bb2200]
> > ...
> >
> > **************
> > *** CORE 4 ***
> > **************
> > 339 0 4 L1Cache Exclusive_Data IS>M_W [0x7ececc80, line 0x7ececc80]
> > 345 0 4 L1Cache Ifetch I>IS [0x7ececc40, line 0x7ececc40]
> > 389 0 4 L1Cache Use_Timeout M_W>M [0x7ececc80, line 0x7ececc80]
> > 659 0 4 L1Cache Exclusive_Data IS>M_W [0x7ececc40, line 0x7ececc40]
> > 709 0 4 L1Cache Use_Timeout M_W>M [0x7ececc40, line 0x7ececc40]
> > 722 0 4 L1Cache Ifetch I>IS [0x7ececcc0, line 0x7ececcc0]
> > 1038 0 4 L1Cache Exclusive_Data IS>M_W [0x7ececcc0, line 0x7ececcc0]
> > 1047 0 4 L1Cache Ifetch I>IS [0x7ececc00, line 0x7ececc00]
> > 1088 0 4 L1Cache Use_Timeout M_W>M [0x7ececcc0, line 0x7ececcc0]
> > 1364 0 4 L1Cache Exclusive_Data IS>M_W [0x7ececc00, line 0x7ececc00]
> > 1369 0 4 L1Cache Load I>IS [0x7ececd00, line 0x7ececd00]
> > 1414 0 4 L1Cache Use_Timeout M_W>M [0x7ececc00, line 0x7ececc00]
> > 1683 0 4 L1Cache Exclusive_Data IS>M_W [0x7ececd00, line 0x7ececd00]
> > 1733 0 4 L1Cache Use_Timeout M_W>M [0x7ececd00, line 0x7ececd00]
> >
> >
> > Profiling Sequencer.C confirms that core 2 behaves correctly and that
> > core 4 only issues ifetches, over and over.
> >
> > **************
> > *** CORE 2 ***
> > **************
> > Version 2, Address 78BB3118, Hit/Miss h
> > Version 2, Address 78BB3140, Hit/Miss h
> > Version 2, Address 78BB3168, Hit/Miss h
> > Version 2, Address 78BB3190, Hit/Miss h
> > Version 2, Address 79357A90, Hit/Miss M
> > Version 2, Address 78BB2218, Hit/Miss h
> > Version 2, Address 78BB2240, Hit/Miss h
> > Version 2, Address 78BB2268, Hit/Miss h
> > Version 2, Address 78BB2290, Hit/Miss M
> > Version 2, Address 78BB22B8, Hit/Miss h
> > Version 2, Address 78BB22E0, Hit/Miss h
> > Version 2, Address 78BB2308, Hit/Miss h
> > Version 2, Address 78BB2330, Hit/Miss h
> > Version 2, Address 78BB2358, Hit/Miss h
> > Version 2, Address 78BB2380, Hit/Miss h
> > Version 2, Address 78BB23A8, Hit/Miss h
> > Version 2, Address 78BB23D0, Hit/Miss h
> > Version 2, Address 78BB23F8, Hit/Miss h
> > Version 2, Address 78BB2420, Hit/Miss h
> > Version 2, Address 78BB2448, Hit/Miss h
> >
> > **************
> > *** CORE 4 ***
> > **************
> > Version 4, Address 7ECECC44, Hit/Miss h
> > Version 4, Address 7ECECC48, Hit/Miss h
> > Version 4, Address 7ECECC4C, Hit/Miss h
> > Version 4, Address 7ECECC50, Hit/Miss h
> > Version 4, Address 7ECECC54, Hit/Miss h
> > Version 4, Address 7ECECC58, Hit/Miss h
> > Version 4, Address 7ECECC5C, Hit/Miss h
> > Version 4, Address 7ECECC60, Hit/Miss h
> > Version 4, Address 7ECECC64, Hit/Miss h
> > Version 4, Address 7ECECC68, Hit/Miss h
> > Version 4, Address 7ECECC6C, Hit/Miss h
> > Version 4, Address 7ECECC70, Hit/Miss h
> > Version 4, Address 7ECECC74, Hit/Miss h
> > Version 4, Address 7ECECC78, Hit/Miss h
> > Version 4, Address 7ECECC7C, Hit/Miss h
> > Version 4, Address 7ECECC80, Hit/Miss h
> > Version 4, Address 7ECECC84, Hit/Miss h
> > Version 4, Address 7ECECC88, Hit/Miss h
> > Version 4, Address 7ECECC8C, Hit/Miss h
> > Version 4, Address 7ECECC90, Hit/Miss h
> > Version 4, Address 7ECECC94, Hit/Miss h
> > Version 4, Address 7ECECC98, Hit/Miss h
> > Version 4, Address 7ECECC9C, Hit/Miss h
> > Version 4, Address 7ECECCA0, Hit/Miss h
> > Version 4, Address 7ECECCA4, Hit/Miss h
> > Version 4, Address 7ECECCA8, Hit/Miss h
> > Version 4, Address 7ECECCAC, Hit/Miss h
> > Version 4, Address 7ECECCB0, Hit/Miss h
> > Version 4, Address 7ECECC44, Hit/Miss h
> > Version 4, Address 7ECECC48, Hit/Miss h
> > Version 4, Address 7ECECC4C, Hit/Miss h
> > Version 4, Address 7ECECC50, Hit/Miss h
> >
> >
> > It seems that core 4 loads instructions but never executes the code... we
> > are really confused, so any idea would be valuable.
> > If you would like to repeat our experiment in your environment, you can use
> > the Simics script below to create our matrix.c code, compile it (modify your
> > compiler path if you need to) and create a checkpoint on which to then run
> > Ruby.
> >
> > Many thanks,
> > Pau
> >
> > ##############
> > ### matrix ###
> > ##############
> > con0.input "\n"
> > c 10000000
> >
> > con0.input "echo \"#include <stdlib.h>\" >> matrix.c\n"
> > c 10000000
> > con0.input "echo \"#include <stdio.h>\" >> matrix.c\n"
> > c 10000000
> >
> > con0.input "echo \"#define N 1000 \" >> matrix.c\n"
> > c 10000000
> > con0.input "echo \"#define step 10 \" >> matrix.c\n"
> > c 10000000
> >
> > con0.input "echo \"int main(int argc, char** argv)\" >> matrix.c\n"
> > c 10000000
> > con0.input "echo \"{\" >> matrix.c\n"
> > c 10000000
> > con0.input "echo \" float *A, *B, *C;\" >> matrix.c\n"
> > c 10000000
> > con0.input "echo \" register int i, j, k, w;\" >> matrix.c\n"
> > c 10000000
> > con0.input "echo \" register float s;\" >> matrix.c\n"
> > c 10000000
> >
> > con0.input "echo \" A = (float*) malloc(sizeof(float)*N*N);\" >> matrix.c\n"
> > c 10000000
> > con0.input "echo \" B = (float*) malloc(sizeof(float)*N*N);\" >> matrix.c\n"
> > c 10000000
> > con0.input "echo \" C = (float*) malloc(sizeof(float)*N*N);\" >> matrix.c\n"
> > c 10000000
> >
> > con0.input "echo \" for (w=0; w<100000; w++) {\" >> matrix.c\n"
> > con0.input "echo \" for (i=0; i<N; i++) {\" >> matrix.c\n"
> > c 10000000
> > con0.input "echo \" for (j=0; j<N; j+=step) {\" >> matrix.c\n"
> > c 10000000
> > con0.input "echo \" s = (float)0;\" >> matrix.c\n"
> > c 10000000
> > con0.input "echo \" for (k = 0; k < N; k+=step) {\" >> matrix.c\n"
> > c 10000000
> > con0.input "echo \" s += ( A[i*N+k] * B[k*N+j] );\" >> matrix.c\n"
> > c 10000000
> > con0.input "echo \" }\" >> matrix.c\n"
> > c 10000000
> > con0.input "echo \" C[i*N+j] = s;\" >> matrix.c\n"
> > c 10000000
> > con0.input "echo \" }\" >> matrix.c\n"
> > c 10000000
> > con0.input "echo \" }\" >> matrix.c\n"
> > c 10000000
> > con0.input "echo \" }\" >> matrix.c\n"
> > c 10000000
> >
> > con0.input "echo \" return 0;\" >> matrix.c\n"
> > c 10000000
> > con0.input "echo \"}\" >> matrix.c\n"
> > c 10000000
> >
> > ### modify your compiler path
> > ### con0.input "/opt/SUNWspro/prod/bin/cc matrix.c -o matrix.rr\n"
> > con0.input "cc matrix.c -o matrix.rr\n"
> > c 100000000
> >
> > con0.input "cp matrix.rr matrixA.rr\n"
> > c 10000000
> > con0.input "cp matrix.rr matrixB.rr\n"
> > c 10000000
> > con0.input "cp matrix.rr matrixC.rr\n"
> > c 10000000
> > con0.input "cp matrix.rr matrixD.rr\n"
> > c 10000000
> >
> > con0.input "/usr/bin/nice --50 ./matrixA.rr A &\n"
> > c 10000000
> > con0.input "PIDBIND=`pgrep matrixA.rr`\n"
> > c 10000000
> > con0.input "pbind -b 2 $PIDBIND\n"
> > c 10000000
> >
> > con0.input "/usr/bin/nice --50 ./matrixB.rr B &\n"
> > con0.input "PIDBIND=`pgrep matrixB.rr`\n"
> > c 10000000
> > con0.input "pbind -b 4 $PIDBIND\n"
> > c 10000000
> >
> > con0.input "/usr/bin/nice --50 ./matrixC.rr C &\n"
> > con0.input "PIDBIND=`pgrep matrixC.rr`\n"
> > c 10000000
> > con0.input "pbind -b 0 $PIDBIND\n"
> > c 10000000
> >
> > con0.input "/usr/bin/nice --50 ./matrixD.rr D &\n"
> > con0.input "PIDBIND=`pgrep matrixD.rr`\n"
> > c 10000000
> > con0.input "pbind -b 6 $PIDBIND\n"
> > c 10000000
> >
> > run
> >
> >
> >
> >
>
>
>