Hello,
I'm simulating a simple memory benchmark with 2 processors (L1 cache: 64K, L2 cache: 128K). The memory latency (DIRECTORY_LATENCY) is 120 cycles. Each iteration of my benchmark reads 512K of data, so I'd expect the Ruby cycle count to be at least data_size / cache_line_size * DIRECTORY_LATENCY * iterations. For example, with 512K of data, a 64B cache line size and 100 iterations, that comes to at least 98,304,000 cycles. But the cycle count I get is much smaller. Is there any optimization in Ruby that might cause this?
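The estimate above can be reproduced with a small Python sketch. The function name is hypothetical (it is not part of Ruby), and the bound assumes that every cache line read goes all the way to memory and that no two misses overlap in time:

```python
def min_expected_cycles(data_bytes, line_bytes, directory_latency, iterations):
    """Naive lower bound on cycles.

    Assumes (for illustration only) that every cache line read misses to
    memory and that misses are fully serialized, never overlapped.
    """
    lines_per_iteration = data_bytes // line_bytes  # 512K / 64B = 8192 lines
    return lines_per_iteration * directory_latency * iterations


# Values from the post: 512K of data, 64B lines, latency 120, 100 iterations.
estimate = min_expected_cycles(512 * 1024, 64, 120, 100)
print(f"{estimate:,}")  # 98,304,000
```

If the measured count is well below this, at least one of the two assumptions does not hold for the simulated system.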
I've modified the Ruby profiler to show cache events for each processor. The number of Other_GETS events on processor1 is larger than the number of loads on processor2, and vice versa. Isn't this unusual? Any ideas what might cause it?
Thanks!
Dave