Becoming More Tolerant: Designing FPGAs for Variable Supply Voltage
Ibrahim Ahmed Linda Shen Vaughn Betz
Technology Scaling: Transforming the World
Packing ever more computations on a single chip
[a] N. Jones. How to stop data centres from gobbling up the world's electricity. Nature, 561:163-166, September 2018.
[b] A. Shehabi et al. United States Data Center Energy Usage Report. Lawrence Berkeley National Laboratory, Berkeley, California, 2016.
Nominal Vdd not scaling
FPGA         Range (V)
Arria 10     0.85-0.9
Stratix 10   0.8-0.94
Agilex       0.6-1
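To give a rough sense of what these supply ranges mean for power, the sketch below applies the standard first-order CMOS dynamic power relation (P proportional to a*C*Vdd^2*f) to the endpoints listed above. This is an illustration only, not a result from the talk: it ignores leakage and the lower clock frequency achievable at reduced Vdd, and the function name is a hypothetical helper.

```python
def relative_dynamic_power(vdd, vdd_ref, f_ratio=1.0):
    """First-order dynamic power at vdd relative to vdd_ref.

    Assumes P ~ activity * capacitance * Vdd^2 * frequency, with activity
    and capacitance unchanged; f_ratio is f(vdd) / f(vdd_ref).
    """
    return (vdd / vdd_ref) ** 2 * f_ratio


# Supply ranges (V) as listed on the slide: (lowest, highest).
ranges = {
    "Arria 10": (0.85, 0.9),
    "Stratix 10": (0.8, 0.94),
    "Agilex": (0.6, 1.0),
}

for device, (v_low, v_high) in ranges.items():
    ratio = relative_dynamic_power(v_low, v_high)
    print(f"{device}: dynamic power at {v_low} V is about "
          f"{ratio:.2f}x that at {v_high} V (same clock frequency)")
```

In practice the clock frequency also drops at the low end of the range, so `f_ratio` would be below 1 and the real ratio depends on the measured delay at each voltage.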
Representative FPGA tile: Logic Cluster (LC), Basic Logic Element (BLE)
Routing Logic
9-input two-stage multiplexer
Inputs: I0 to I8
Legend: SRAM cell storing 1, SRAM cell storing 0
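The selection logic of a 9-input two-stage multiplexer can be sketched as follows. The sketch assumes a 3x3 organization (three first-stage groups of three inputs that share one set of one-hot SRAM bits, followed by a 3-input second stage); the grouping of I0-I2, I3-I5 and I6-I8 and the function name are assumptions made for illustration, since the figure itself is not recoverable from the text.

```python
def two_stage_mux(inputs, stage1_sel, stage2_sel):
    """Select one of 9 inputs using two sets of one-hot SRAM bits.

    inputs:     values on I0..I8
    stage1_sel: 3 one-hot SRAM bits shared by all first-stage groups
    stage2_sel: 3 one-hot SRAM bits choosing which group output passes
    """
    if len(inputs) != 9:
        raise ValueError("expected 9 inputs")
    if sum(stage1_sel) != 1 or sum(stage2_sel) != 1:
        raise ValueError("select bits must be one-hot")

    # Assumed grouping: (I0, I1, I2), (I3, I4, I5), (I6, I7, I8).
    groups = [inputs[i:i + 3] for i in range(0, 9, 3)]

    # First stage: every group passes the input whose shared SRAM bit is 1.
    position = stage1_sel.index(1)
    stage1_out = [group[position] for group in groups]

    # Second stage: pass the output of the group whose SRAM bit is 1.
    return stage1_out[stage2_sel.index(1)]


# Route I7: position 1 within its group, group 2.
print(two_stage_mux(list(range(9)), [0, 1, 0], [0, 0, 1]))
```

Under this assumed organization the multiplexer needs six SRAM cells (3 + 3), whereas a single-stage one-hot multiplexer would need nine.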
Tree-based 6-input LUT multiplexer
SRAM cells
A routing MUX connects one of the LC inputs to a LUT input
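A tree-based 6-input LUT can be modeled as 64 SRAM cells feeding six levels of 2:1 multiplexers, one level per LUT input. The sketch below is a behavioral model only; which LUT input drives which tree level is an assumption here (the first input is placed nearest the SRAM cells), and the function name is illustrative.

```python
def lut6(sram, inputs):
    """Evaluate a 6-input LUT as a tree of 2:1 multiplexers.

    sram:   64 configuration bits (the truth table)
    inputs: 6 input bits; inputs[0] is assumed to control the level
            of 2:1 muxes closest to the SRAM cells
    """
    if len(sram) != 64 or len(inputs) != 6:
        raise ValueError("expected 64 SRAM bits and 6 inputs")

    level = list(sram)
    for bit in inputs:
        # Each 2:1 mux passes the odd-indexed value when its select bit is 1.
        level = [level[i + 1] if bit else level[i]
                 for i in range(0, len(level), 2)]
    return level[0]


# Configure the LUT as a 6-input AND: only the last SRAM cell stores 1.
and6 = [0] * 63 + [1]
print(lut6(and6, [1, 1, 1, 1, 1, 1]))
print(lut6(and6, [1, 1, 1, 1, 1, 0]))
```

With this ordering the tree returns `sram[k]`, where `k` is the integer whose bit `i` is `inputs[i]`. In the figure's terms, signals on the inputs nearest the SRAM cells pass through more multiplexer levels before reaching the output.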
Setup to measure path delays
Measuring different types of paths on Stratix V
LUT delay is more sensitive to Vdd
Routing delay increases with increasing Vdd above nominal
Gate boosted pass transistors
LUT
Conventional LUT (baseline) decode LUT
Local MUX
Vddl Vddh
FPGAs at nominal and below
lower area-delay than baseline at nominal
lower power than baseline
GB LUT and TG LUT power by 35% and 25%, respectively
when input A toggles
when B or C toggles
higher energy
ED2 at 0.8 V
ED2 at 0.6 V
baseline
largest Fmax
Vnom-optimization flow: BLIF -> VPR (architecture file @ 0.8 V) -> .place, .route -> STA at 0.6 V and STA at 1.0 V -> CP delay at each voltage
Vused-optimization flow: BLIF -> VPR (architecture file @ 0.6 V, @ 0.7 V, or @ 1 V) -> .place, .route per voltage -> CP delay per voltage
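The difference between the two flows can be sketched in a few lines. This is a structural outline only: `place_and_route` and `sta` are hypothetical callables standing in for a VPR run with a given architecture file and for static timing analysis at a given voltage; they are not real VPR APIs, and the default voltages simply mirror the values shown in the flow diagram.

```python
def vnom_flow(circuit, place_and_route, sta, v_nom=0.8, v_used=(0.6, 1.0)):
    """Place and route once at nominal Vdd, then re-time at each used Vdd.

    Returns {voltage: critical-path delay} for the single implementation.
    """
    implementation = place_and_route(circuit, v_nom)
    return {v: sta(implementation, v) for v in v_used}


def vused_flow(circuit, place_and_route, sta, v_used=(0.6, 0.7, 1.0)):
    """Place and route separately with the architecture file of each used Vdd.

    Returns {voltage: critical-path delay}, one implementation per voltage.
    """
    return {v: sta(place_and_route(circuit, v), v) for v in v_used}
```

Comparing the two result dictionaries at a common voltage shows how much critical-path delay is recovered by optimizing the placement and routing for the voltage actually used rather than for nominal.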
Two-stage routing multiplexer
Tree-based 6-input LUT multiplexer
Block Load Input src
contributes ~78% of the FPGA active power.