SLIDE 16 Compensated Dot Product
[Figure: four performance plots, one per microarchitecture: Intel Skylake, Intel Haswell, AMD Steamroller, Intel Knights Corner.
X-axis: array size, elements (1K to 16M). Y-axis: cycles per element.
Series: dot product; compensated dot product; compensated dot product with FPADDRE.]
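For readers unfamiliar with the term, the following is a minimal sketch of a compensated dot product in the textbook style built on the error-free transformations TwoSum and TwoProduct. It is not the implementation benchmarked on this slide: the function names are illustrative, the product error is obtained with Dekker/Veltkamp splitting rather than a fused multiply-add, and the FPADDRE variant is not modeled.

```python
def two_sum(a, b):
    """Return (s, e) with s = fl(a + b) and s + e == a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def split(a):
    """Veltkamp split of a double into high and low halves, a == hi + lo."""
    c = 134217729.0 * a  # 2**27 + 1; assumes no overflow
    hi = c - (c - a)
    lo = a - hi
    return hi, lo


def two_product(a, b):
    """Return (p, e) with p = fl(a * b) and p + e == a * b exactly (Dekker)."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = al * bl - (((p - ah * bh) - al * bh) - ah * bl)
    return p, e


def naive_dot(x, y):
    """Plain dot product with one rounding per multiply and per add."""
    s = 0.0
    for xi, yi in zip(x, y):
        s += xi * yi
    return s


def compensated_dot(x, y):
    """Dot product that also accumulates the rounding errors of each step."""
    s = 0.0  # running sum of rounded products
    c = 0.0  # running sum of the captured rounding errors
    for xi, yi in zip(x, y):
        p, ep = two_product(xi, yi)  # error of the multiplication
        s, es = two_sum(s, p)        # error of the addition
        c += ep + es
    return s + c


if __name__ == "__main__":
    x = [1e16, 1.0, -1e16]
    y = [1.0, 1.0, 1.0]
    print(naive_dot(x, y))        # the 1.0 term is lost to cancellation
    print(compensated_dot(x, y))  # the lost term is recovered
```

The extra arithmetic in two_sum and two_product is the overhead that separates the compensated curves from the plain dot product curve in the plots.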
- M. Dukhan et al. (Georgia Tech)
Wanted: FPADDRE Instruction PMMA’16 16 / 20