LogCA: A High-Level Performance Model for Hardware Accelerators
Muhammad Shoaib Bin Altaf* David A. Wood University of Wisconsin-Madison
Everything should be made as simple as possible, but not simpler—Albert Einstein
*Now at AMD Research, Austin TX
M7: Next Generation SPARC, Hotchips-26, 2014; Power8, Hotchips-25, 2013
[Figure: execution time (ms) vs. block size (bytes) for the host and the accelerator, annotated with the break-even point and the regions where the host outperforms and where the accelerator outperforms.]
[Figure: Advanced Encryption Standard (AES) speedup vs. offloaded data (bytes) for UltraSPARC T2, SPARC T4, and GPU, with the break-even points marked.]
[Figure: system model showing the host, the interface, and the accelerator.]
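[Figure: timeline across the host, interface, and accelerator.]
Unaccelerated time on the host: $T_0(g) = C_0(g)$
Accelerated time $T_1(g)$ includes the interface latency $L_1(g)$ and the accelerated computation $C_1(g) = \frac{C_0(g)}{A}$
Gain: $\lim_{g \to \infty} \text{Speedup} = A$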
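A minimal sketch of the timing model, to make the speedup curve concrete. It assumes the forms $T_0(g) = C \cdot g^{\beta}$ and $T_1(g) = o + L \cdot g + C \cdot g^{\beta} / A$; the constant overhead `o`, the per-byte latency `L`, the exponent `beta`, and every numeric value below are assumptions for illustration and are not taken from the slides.

```python
def unaccelerated_time(g, C, beta):
    """T0(g) = C0(g), assumed here to be C * g**beta."""
    return C * g ** beta


def accelerated_time(g, o, L, C, A, beta):
    """T1(g): assumed overhead o, latency L * g, and computation C0(g) / A."""
    return o + L * g + unaccelerated_time(g, C, beta) / A


def speedup(g, o, L, C, A, beta):
    """Speedup(g) = T0(g) / T1(g)."""
    return unaccelerated_time(g, C, beta) / accelerated_time(g, o, L, C, A, beta)


if __name__ == "__main__":
    # Hypothetical parameters for a super-linear kernel (beta = 2).
    params = dict(o=100.0, L=1.0, C=10.0, A=20.0, beta=2.0)
    for g in (1, 16, 256, 4096, 65536):
        print(g, round(speedup(g, **params), 3))
```

With these assumed parameters the speedup is below 1 at small granularities, where the overhead dominates, and approaches A as g grows.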
[Figure: speedup(g) vs. granularity (bytes), annotated with the acceleration A.]
[Figure: speedup vs. granularity (bytes) for a simple interface and a complex interface; the granularity $g_1$ is labeled large or small depending on the interface.]
$\lim_{g \to \infty} \text{Speedup} < \frac{C}{L}$ (linear algorithms)
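A short derivation of the bound for linear algorithms, as a sketch. It assumes the host time grows as $C \cdot g$ and the accelerated time has the form $o + L \cdot g + C \cdot g / A$, with $o$ a constant overhead; these forms are assumptions and are not spelled out on the slide.

```latex
\[
\text{Speedup}(g) = \frac{C\,g}{o + L\,g + \dfrac{C\,g}{A}}
\qquad\Longrightarrow\qquad
\lim_{g \to \infty} \text{Speedup}(g) = \frac{C}{L + \dfrac{C}{A}} < \frac{C}{L}
\]
```

The inequality holds because $C/A > 0$, so the latency term caps the speedup no matter how large $A$ is.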
[Figure, two panels: speedup(g) vs. granularity (bytes). The first panel marks A and $C/L$. The second panel marks A and g, with the labels "Sub-linearly" and "Linearly".]
[Figure: speedup vs. granularity (bytes), marking the speedup levels 1 and A/2.]
Conditions shown: $\frac{C}{L} \geq 1$ and $\frac{C}{L} \geq A$
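The link between these conditions and the speedup levels 1 and A/2 can be checked with a small sketch for a linear kernel. It assumes $\text{Speedup}(g) = C g / (o + L g + C g / A)$ and solves for the granularity that reaches a target speedup `s`; the function name and all parameter values are hypothetical.

```python
def granularity_for_speedup(s, o, L, C, A):
    """Granularity g at which an assumed linear kernel reaches speedup s.

    Solves C*g / (o + L*g + C*g/A) = s for g. Returns None when the
    target speedup is unreachable at any finite granularity.
    """
    denom = C - s * L - s * C / A
    if denom <= 0:
        return None
    return s * o / denom


if __name__ == "__main__":
    A = 10.0
    # Hypothetical case with C/L = 50 > A: both levels are reachable.
    print(granularity_for_speedup(1.0, 100.0, 1.0, 50.0, A))
    print(granularity_for_speedup(A / 2, 100.0, 1.0, 50.0, A))
    # Hypothetical case with C/L = 5 < A: A/2 is never reached.
    print(granularity_for_speedup(A / 2, 100.0, 1.0, 5.0, A))
```

For `s = A/2` the denominator simplifies to `(C - A*L) / 2`, so under these assumptions a finite solution exists only when C/L is strictly greater than A.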
[Figure: speedup vs. granularity (bytes) for the baseline LogCA curve and the variants L_0.1x, C_10x, and A_10x, with A and $C/L$ marked.]
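A sketch of this kind of sensitivity sweep, reading L_0.1x as latency scaled to 0.1x, C_10x as computational index scaled by 10x, and A_10x as acceleration scaled by 10x. That reading, the model form $\text{Speedup}(g) = C g^{\beta} / (o + L g + C g^{\beta} / A)$, and the baseline values are all assumptions for illustration.

```python
def speedup(g, o, L, C, A, beta=1.0):
    """Assumed model: T0 = C * g**beta, T1 = o + L*g + T0 / A."""
    t0 = C * g ** beta
    return t0 / (o + L * g + t0 / A)


# Hypothetical baseline for a latency-bound linear kernel (C/L < A).
BASE = dict(o=100.0, L=1.0, C=5.0, A=10.0)

VARIANTS = {
    "LogCA": BASE,
    "L_0.1x": {**BASE, "L": BASE["L"] * 0.1},
    "C_10x": {**BASE, "C": BASE["C"] * 10},
    "A_10x": {**BASE, "A": BASE["A"] * 10},
}


def sweep(granularities):
    """Return {variant name: [speedup at each granularity]}."""
    return {
        name: [speedup(g, **params) for g in granularities]
        for name, params in VARIANTS.items()
    }


if __name__ == "__main__":
    gs = [2 ** k for k in range(4, 21, 4)]
    for name, curve in sweep(gs).items():
        print(name, [round(s, 2) for s in curve])
```

For this assumed latency-bound baseline, reducing L or raising C lifts the large-granularity speedup more than raising A does, which is the kind of comparison the plotted variants make.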
PCIe Crypto Accelerator, UltraSPARC T2, SPARC T3, SPARC T4 engine, SPARC T4 instructions
Source: http://www.medarcade.com/