Hi Dan,
I made a very
stupid mistake. I assumed the simulation result with the default xy routing was
correct, so I only reran our algorithm twice.
A moment ago, we ran the xy routing again. This time the
performance of xy routing was much better, and the results now show a 3.17% packet
latency reduction and a 2.8% Ruby cycles reduction.
I think I should
run the simulation many more times and use the averaged results, which would be more convincing.
Thanks again,
Regards,
Fbz
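
A minimal sketch of the averaging step mentioned above, assuming each run's Ruby cycle count has been copied out of the simulator output by hand. The numbers below are hypothetical placeholders, not measured results, and the helper names (summarize, percent_reduction) are made up for illustration. The 1.96 factor is a normal approximation; with only a handful of runs the true interval is wider.

```python
import math
import statistics

def summarize(samples):
    """Return (mean, approximate 95% half-width) for repeated runs."""
    mean = statistics.mean(samples)
    # Standard error of the mean; stdev() needs at least two samples.
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    # Normal-approximation factor; a Student-t factor would be larger
    # for small sample counts.
    return mean, 1.96 * sem

def percent_reduction(baseline, candidate):
    """Reduction of candidate relative to baseline, in percent."""
    return 100.0 * (baseline - candidate) / baseline

# Hypothetical Ruby cycle counts (arbitrary units), NOT real data.
xy_runs = [1000.0, 1040.0, 980.0, 1020.0]
new_runs = [975.0, 990.0, 960.0, 985.0]

xy_mean, xy_ci = summarize(xy_runs)
new_mean, new_ci = summarize(new_runs)

print(f"xy routing : {xy_mean:.1f} +/- {xy_ci:.1f}")
print(f"new routing: {new_mean:.1f} +/- {new_ci:.1f}")
print(f"reduction  : {percent_reduction(xy_mean, new_mean):.2f}%")
```

If the two intervals overlap heavily, the difference between the routing algorithms is probably within run-to-run noise.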
From: gems-users-bounces@xxxxxxxxxxx
[mailto:gems-users-bounces@xxxxxxxxxxx] On Behalf Of
Dan Gibson
Sent: August 26, 2010 1:00
To: Gems Users
Subject: Re: [Gems-users] Re: Average packet latency VS. Ruby cycles
A colleague
(Alaa Alemeldeen) reminded me of Amdahl's law just now -- to quote Puzak:
"Everyone knows Amdahl's Law, but quickly forgets it"
In other words, my suggested #2 probably isn't feasible.
Your workload would have to be *really* network-bound to see that kind of
speedup for a 7% reduction in average packet latency, even with multiple
traversals. You should really look into whether some timing transient is
causing one execution to take a different dynamic path.
Regards,
Dan
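
The Amdahl's law argument above can be sketched numerically. This is a simplified model, not anything taken from GEMS: it assumes runtime splits into a network-bound share that scales linearly with average packet latency and a remainder that does not change. The function names are hypothetical.

```python
def runtime_reduction(network_fraction, latency_reduction):
    """Fractional runtime reduction when only the network-bound share speeds up.

    network_fraction: share of runtime spent waiting on the network (0..1)
    latency_reduction: fractional reduction in average packet latency
    """
    return network_fraction * latency_reduction

def required_network_fraction(target_reduction, latency_reduction):
    """Network-bound share needed to reach a target runtime reduction."""
    return target_reduction / latency_reduction

# Figures from the thread: 7% lower packet latency, 11% fewer Ruby cycles.
needed = required_network_fraction(0.11, 0.07)
print(f"required network-bound fraction: {needed:.2f}")

# Even a fully network-bound workload gains at most the latency reduction.
print(f"best case reduction: {runtime_reduction(1.0, 0.07):.2%}")
```

Under this model the required fraction is above 1, so a 7% latency reduction cannot account for an 11% runtime reduction on its own. Multiple traversals per miss do not change that: if every traversal gets 7% faster, the total network time still falls by only 7%.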
2010/8/25 fubinzhang <fubinzhang@xxxxxxxxx>
Thanks Dan,
that really helps.
Regards,
Fbz
2. Many protocols cause multiple traversals
of the network -- 7% gain each time sums to greater than 11%.
On Tue, Aug 24, 2010 at 9:04 PM, fubinzhang <fubinzhang@xxxxxxxxx>
wrote:
Hi all,
Recently, we simulated a new routing algorithm on
GEMS with SPLASH-2,
and I am puzzled by the results.
For example, for the application LU-non-contiguous, our routing algorithm
reduces the average packet latency by 7% compared with the default xy
routing.
However, the Ruby cycles are reduced by as much as 11%.
I remembered that Dan had pointed out the performance variability of parallel
applications, so I ran this application twice, but got similar results.
How does this happen? Can a 7% packet latency reduction lead to an 11% execution
time reduction? Is that reasonable?
Thanks in advance.
P.S.
Network Topology: 4x4 mesh
Cache Protocol: MSI_MOSI_CMP_directory
VC per VN: 1
VC buffers: 4
Flow control: We modified the flow control to set an output unit free as soon
as the tail flit has been sent. The VCallocator then selects an output unit
once it is free and has credit.
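
A rough sketch of the flow-control rule described above, written as a standalone toy model. It does not use any GEMS or Garnet code; OutputUnit, send_flit, and select_output_unit are hypothetical names, and the initial credit count of 4 is only an assumed mapping from "VC buffers: 4".

```python
from dataclasses import dataclass

@dataclass
class OutputUnit:
    free: bool = True   # True when no packet currently holds this unit
    credits: int = 4    # assumed: one credit per downstream VC buffer

    def allocate(self):
        """Reserve the unit for a packet."""
        self.free = False

    def send_flit(self, is_tail):
        """Consume one credit; release the unit as soon as the tail leaves."""
        self.credits -= 1
        if is_tail:
            self.free = True

def select_output_unit(units):
    """Return the index of the first unit that is free and has credit."""
    for index, unit in enumerate(units):
        if unit.free and unit.credits > 0:
            return index
    return None

units = [OutputUnit(), OutputUnit()]
chosen = select_output_unit(units)
units[chosen].allocate()
units[chosen].send_flit(is_tail=False)  # head flit: unit stays reserved
units[chosen].send_flit(is_tail=True)   # tail flit: unit is free again
print(chosen, units[chosen].free, units[chosen].credits)
```

In this toy model the unit becomes selectable again right after the tail flit is sent, provided it still has credit, which is the behavior the modification aims for.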
Regards,
Fbz.
--
http://www.cs.wisc.edu/~gibson
[esc]:wq!
_______________________________________________
Gems-users mailing list
Gems-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/gems-users
Use Google to search the GEMS Users mailing list by adding "site:https://lists.cs.wisc.edu/archive/gems-users/"
to your search.