CDA 4253 FPGA System Design FPGA Architectures
Hao Zheng Dept of Comp Sci & Eng U of South Florida
How to make HW (not SW) reconfigurable?
– Change structure
– Change connections among components
– Change logic functions of components
[Figure: PAL and PLA structures. Source: Wikipedia]
floating gate
SRAM Transistor
[Figure: 2-input LUT built from four SRAM cells; the inputs select one cell for each input combination 00, 01, 10, 11]
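The LUT in the figure can be modeled in a few lines. This is a minimal Python sketch, assuming the two inputs are read as a binary address with the first input as the high bit (so cell 0 holds the output for 00 and cell 3 the output for 11); the class name `LUT2` is hypothetical. Reprogramming the four SRAM bits changes the logic function without changing any wiring.

```python
class LUT2:
    """Hypothetical model of a 2-input LUT: four SRAM bits, one per input combination."""

    def __init__(self, sram_bits):
        # sram_bits[k] is the stored output for input combination k (00, 01, 10, 11)
        if len(sram_bits) != 4 or any(bit not in (0, 1) for bit in sram_bits):
            raise ValueError("a 2-input LUT needs exactly four 0/1 SRAM bits")
        self.sram = list(sram_bits)

    def __call__(self, a, b):
        # The inputs act as the select lines of a 4:1 mux over the SRAM cells
        return self.sram[(a << 1) | b]


# Same hardware, two different configurations
and_gate = LUT2([0, 0, 0, 1])
xor_gate = LUT2([0, 1, 1, 0])

for a in (0, 1):
    for b in (0, 1):
        print(f"{a}{b}: AND={and_gate(a, b)} XOR={xor_gate(a, b)}")
```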
– Size of the truth table grows exponentially with the # of inputs
– A LUT has the same number of rows as the truth table, so LUT size also grows exponentially with the # of inputs
– A LUT with i inputs and o outputs needs $o \cdot 2^i$ SRAM bits (i = # of inputs, o = # of outputs)
– Example: 64-input combinational logic with 1 output would require $2^{64}$ SRAM bits
– So, how do FPGAs implement logic with many inputs?
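The SRAM cost formula can be checked directly. A small Python sketch; the function name `lut_sram_bits` is hypothetical, and it simply evaluates o × 2^i for a few LUT sizes.

```python
def lut_sram_bits(num_inputs: int, num_outputs: int) -> int:
    """SRAM bits for a LUT with i inputs and o outputs: one bit per output per truth-table row."""
    return num_outputs * (2 ** num_inputs)


for i in (2, 4, 6, 64):
    print(f"{i:>2}-input, 1-output LUT: {lut_sram_bits(i, 1):,} SRAM bits")
```

The jump from 64 bits (6 inputs) to about 1.8 × 10^19 bits (64 inputs) is why large functions are split across many small LUTs instead of being stored in one table.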
– Divide the circuit into smaller circuits that fit in LUTs (same # of inputs and outputs as the LUTs)
– Example: 2-input LUTs
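As an illustration of decomposition, the Python sketch below builds a 4-input AND from three 2-input LUTs arranged as a tree and checks it against the 4-input function for all 16 input combinations. The helper `lut` and the choice of AND as the example are assumptions for illustration; the slide's own example circuit is not recoverable from the text.

```python
from itertools import product


def lut(truth_table):
    """Return a function modeling a LUT; truth_table[k] is the output for input address k.
    Assumption: the first argument is the most significant address bit."""
    def evaluate(*inputs):
        address = 0
        for bit in inputs:
            address = (address << 1) | bit
        return truth_table[address]
    return evaluate


AND2 = [0, 0, 0, 1]  # truth table of a 2-input AND

# Three 2-input LUTs: two on the first level, one combining their outputs
lut_a, lut_b, lut_c = lut(AND2), lut(AND2), lut(AND2)


def and4_decomposed(x0, x1, x2, x3):
    return lut_c(lut_a(x0, x1), lut_b(x2, x3))


# Exhaustive check against the original 4-input function
for bits in product((0, 1), repeat=4):
    assert and4_decomposed(*bits) == int(all(bits))
print("4-input AND from three 2-input LUTs: OK")
```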
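– Each LUT implements 1 full adder
– Use efficient connections between LUTs for carry signals
[Figure: 2-bit ripple-carry adder. A 3-in, 2-out LUT takes A(0), B(0), Cin(0) and produces S(0) and Cout(0); Cout(0) drives Cin(1) of a second 3-in, 2-out LUT, which takes A(1), B(1) and produces S(1) and Cout(1). Each LUT output passes through an FF and a 2x1 mux.]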
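The adder on this slide can be sketched in Python, assuming each 3-in, 2-out LUT stores the pair (sum, carry) for every combination of (A, B, Cin). That is 8 rows × 2 outputs = 16 SRAM bits per LUT. The names `FULL_ADDER_LUT` and `ripple_carry_add` are hypothetical, and the FFs and muxes from the figure are omitted.

```python
# Hypothetical LUT contents: row index = a*4 + b*2 + cin, entry = (sum, carry out)
FULL_ADDER_LUT = [(a ^ b ^ c, (a & b) | (a & c) | (b & c))
                  for a in (0, 1) for b in (0, 1) for c in (0, 1)]


def full_adder_lut(a, b, cin):
    return FULL_ADDER_LUT[(a << 2) | (b << 1) | cin]


def ripple_carry_add(a_bits, b_bits, cin=0):
    """a_bits, b_bits: lists of bits, least significant first (A(0), A(1), ...)."""
    sum_bits = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder_lut(a, b, carry)  # Cout(k) feeds Cin(k+1)
        sum_bits.append(s)
    return sum_bits, carry


# 2-bit example: 3 + 1 = 4, i.e. S(1)S(0) = 00 with Cout(1) = 1
s, cout = ripple_carry_add([1, 1], [1, 0])
print(s, cout)  # [0, 0] 1
```

Because the carry must ripple through every LUT, real FPGAs add dedicated carry paths between neighboring LUTs rather than routing carries through the general interconnect.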
The routing network must be flexible enough to accommodate various circuit implementations.
SRAM
[Figure: programmable connections with Flexibility = 2 and Flexibility = 3; dots represent possible connections, each set by an SRAM cell]
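The flexibility trade-off can be sketched as follows. This Python model is hypothetical: it assumes each logic-block pin can reach `flexibility` of the routing tracks (pin p reaches tracks p, p+1, ... modulo the track count) and that every possible connection (a dot in the figure) is one switch controlled by one SRAM cell. Higher flexibility gives more routing choices but costs more SRAM bits and switches.

```python
class ConnectionBox:
    """Hypothetical model: one SRAM-controlled switch per possible pin-to-track connection."""

    def __init__(self, num_pins, num_tracks, flexibility):
        if flexibility > num_tracks:
            raise ValueError("flexibility cannot exceed the number of tracks")
        # Assumption: pin p may reach tracks p, p+1, ..., p+F-1 (mod num_tracks)
        self.allowed = {p: [(p + k) % num_tracks for k in range(flexibility)]
                        for p in range(num_pins)}
        # One SRAM cell per dot, initially 0 (switch open)
        self.sram = {(p, t): 0 for p, tracks in self.allowed.items() for t in tracks}

    def config_bits(self):
        return len(self.sram)

    def connect(self, pin, track):
        if (pin, track) not in self.sram:
            raise ValueError(f"pin {pin} has no switch to track {track}")
        self.sram[(pin, track)] = 1  # program the SRAM cell to close the switch

    def tracks_of(self, pin):
        return [t for t in self.allowed[pin] if self.sram[(pin, t)]]


box_f2 = ConnectionBox(num_pins=4, num_tracks=4, flexibility=2)
box_f3 = ConnectionBox(num_pins=4, num_tracks=4, flexibility=3)
print(box_f2.config_bits(), box_f3.config_bits())  # 8 12
box_f3.connect(0, 2)        # allowed: with F = 3, pin 0 reaches tracks 0, 1, 2
print(box_f3.tracks_of(0))  # [2]
```

With F = 2, pin 0 reaches only tracks 0 and 1, so `box_f2.connect(0, 2)` raises `ValueError`.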
Most designs exhibit locality of connections, which motivates a hierarchical routing architecture.
[Figure: basic logic element: a 3-in, 1-out LUT, an FF, and a 2x1 mux]
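A behavioral Python sketch of the logic element, assuming the 2x1 mux is controlled by one configuration bit that chooses between the LUT output (combinational) and the FF output (registered). The class `LogicElement` and its methods are hypothetical, and the FF is modeled by an explicit `clock()` call.

```python
class LogicElement:
    """Hypothetical model: 3-input, 1-output LUT + D flip-flop + 2x1 output mux."""

    def __init__(self, lut_bits, use_ff):
        if len(lut_bits) != 8:
            raise ValueError("a 3-input LUT needs 8 SRAM bits")
        self.lut_bits = list(lut_bits)
        self.use_ff = use_ff  # configuration bit driving the 2x1 mux select
        self.q = 0            # flip-flop state after reset

    def lut(self, a, b, c):
        return self.lut_bits[(a << 2) | (b << 1) | c]

    def output(self, a, b, c):
        # 2x1 mux: registered output if configured, else the LUT output directly
        return self.q if self.use_ff else self.lut(a, b, c)

    def clock(self, a, b, c):
        # At a clock edge the FF captures the current LUT output
        self.q = self.lut(a, b, c)


MAJORITY = [0, 0, 0, 1, 0, 1, 1, 1]  # 1 when at least two inputs are 1
comb = LogicElement(MAJORITY, use_ff=False)
reg = LogicElement(MAJORITY, use_ff=True)
print(comb.output(1, 1, 0))  # 1: LUT output passes straight through the mux
print(reg.output(1, 1, 0))   # 0: FF still holds its reset value
reg.clock(1, 1, 0)
print(reg.output(0, 0, 0))   # 1: FF captured the LUT output at the clock edge
```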
[Figure: configuration bitstream (……0101000100100010010101) and the configuration memory cells it programs]
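To make the idea of configuration concrete, here is a deliberately simplified Python sketch. It assumes a made-up format in which the bitstream is nothing but the concatenated truth tables of identical LUTs. Real bitstream formats are vendor-specific and also carry headers, addressing, and error checking, none of which is modeled. The function `load_bitstream` and the 12-bit example are hypothetical.

```python
def load_bitstream(bitstream: str, num_luts: int, lut_inputs: int):
    """Hypothetical format: concatenated truth tables, 2**lut_inputs bits per LUT."""
    bits_per_lut = 2 ** lut_inputs
    if len(bitstream) != num_luts * bits_per_lut or set(bitstream) - {"0", "1"}:
        raise ValueError("bitstream length or content does not match the fabric")
    # Slice the stream into one SRAM-bit list per LUT
    return [[int(ch) for ch in bitstream[i * bits_per_lut:(i + 1) * bits_per_lut]]
            for i in range(num_luts)]


# Made-up 12-bit stream for three 2-input LUTs: AND, OR, XOR
config = load_bitstream("0001" "0111" "0110", num_luts=3, lut_inputs=2)
print(config)  # [[0, 0, 0, 1], [0, 1, 1, 1], [0, 1, 1, 0]]
```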
Configuration memory cost: 4-6 T per SRAM bit (1T SRAM) + configuration circuitry + error detection/correction + security features
[Figures: 6T SRAM, 4T SRAM. Source: https://en.wikipedia.org/wiki/Static_random-access_memory]
Temporal Computation (inputs x, A, B, C; registers t1, t2):
t1 = x
t2 = t1 * A
t2 = t2 + B
t2 = t2 * t1
y = t2 + C
Spatial Computation (inputs x, A, B, C; output y):
$y = Ax^2 + Bx + C$
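Both styles compute the same polynomial. The Python sketch below shows that equivalence. It is behavioral only: `temporal` mirrors the step-by-step sequence that reuses a few registers and operators, while `spatial` writes the whole expression at once, the way a circuit with one operator per operation would evaluate it. The function names are hypothetical.

```python
def temporal(x, A, B, C):
    # Sequential steps reusing registers t1, t2 (one operation per step)
    t1 = x
    t2 = t1 * A
    t2 = t2 + B
    t2 = t2 * t1
    return t2 + C


def spatial(x, A, B, C):
    # All operations laid out at once: y = A*x^2 + B*x + C
    return A * x * x + B * x + C


print(temporal(2, 3, 4, 5), spatial(2, 3, 4, 5))  # 25 25
```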
DS099-1_01_032703
[Figure: FPGA components: logic fabric, on-chip block RAM, DSP slices, precise low-jitter clocking, and high-performance serial I/O connectivity (transceiver technology)]
Comparison criteria: Logic Cells; Block RAM; DSP Slices; Peak DSP Perf.; Transceivers; Transceiver Performance; Memory Performance; I/O Pins; I/O Voltages
Family positioning: Lowest Power and Cost; Industry’s Best Price/Performance; Industry’s Highest System Performance; Maximum Capability
[Figure: CLB containing Slice(0) and Slice(1), both connected to a switch matrix; each slice has its own carry chain (CIN, COUT)]
UG474_c1_01_071910
LUT Slice
[Figure: slice with four LUTs; two F7 MUXes each combine a pair of LUT outputs, and an F8 MUX combines the two F7 MUX outputs]
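The F7 and F8 MUXes extend a slice beyond 6-input functions through Shannon expansion. The Python sketch below builds an arbitrary 8-input function from four 6-input LUTs, two F7 MUXes, and one F8 MUX. It assumes the two extra inputs (called input 6 and input 7 here) drive the F7 and F8 select lines. The helper names and exact pairing of LUTs are hypothetical, so consult the vendor's CLB user guide for the real wiring.

```python
from itertools import product


def make_lut6(func, upper_bits):
    """Hypothetical helper: 64-entry truth table of func with inputs 6 and 7
    fixed to upper_bits; bit k of the entry index carries input k (k = 0..5)."""
    return [func(tuple((addr >> k) & 1 for k in range(6)) + upper_bits)
            for addr in range(64)]


def read_lut(table, low6):
    """Look up a LUT entry using inputs 0..5 as the address."""
    return table[sum(bit << k for k, bit in enumerate(low6))]


def build_8input(func):
    """Arbitrary 8-input function from four 6-input LUTs, two F7 MUXes, one F8 MUX."""
    luts = {(i6, i7): make_lut6(func, (i6, i7)) for i6 in (0, 1) for i7 in (0, 1)}

    def evaluate(bits):
        low6, i6, i7 = bits[:6], bits[6], bits[7]
        # F7 MUXes: input 6 chooses between two LUTs that share the same input-7 value
        f7_a = read_lut(luts[(1, 0)], low6) if i6 else read_lut(luts[(0, 0)], low6)
        f7_b = read_lut(luts[(1, 1)], low6) if i6 else read_lut(luts[(0, 1)], low6)
        # F8 MUX: input 7 chooses between the two F7 outputs
        return f7_b if i7 else f7_a
    return evaluate


parity8 = build_8input(lambda b: sum(b) % 2)
majority8 = build_8input(lambda b: int(sum(b) >= 5))
for bits in product((0, 1), repeat=8):
    assert parity8(bits) == sum(bits) % 2
    assert majority8(bits) == int(sum(bits) >= 5)
print("8-input functions from 4 LUTs + 2 F7 MUXes + 1 F8 MUX: OK")
```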
WP405_06_013012
[Figure: 6-input LUT with two outputs, O6 and O5, each feeding a register (D, Q, CE, CLK, S/R)]
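A 6-input LUT with outputs O6 and O5 can be sketched as a 64-bit table split into two 5-input halves. This Python model assumes O5 reads the lower 32 bits using inputs A1-A5, and O6 uses A6 to pick between the halves. With A6 tied to 1, one LUT then provides two independent 5-input functions of the same five inputs. The class `DualOutputLUT6` and helper `pack_two_5input` are hypothetical; check the vendor's CLB documentation for the exact semantics.

```python
class DualOutputLUT6:
    """Hypothetical model: 64-bit INIT; O5 = lower 5-input half, O6 = half selected by A6."""

    def __init__(self, init: int):
        if not 0 <= init < 2 ** 64:
            raise ValueError("INIT must be a 64-bit value")
        self.init = init

    def outputs(self, a1, a2, a3, a4, a5, a6):
        low5 = a1 | (a2 << 1) | (a3 << 2) | (a4 << 3) | (a5 << 4)
        o5 = (self.init >> low5) & 1                 # always reads the lower half
        o6 = (self.init >> (low5 | (a6 << 5))) & 1   # A6 selects the half
        return o6, o5


def pack_two_5input(f_lower, g_upper):
    """Place f in INIT bits 0..31 and g in bits 32..63 (bit k of the address = A(k+1))."""
    init = 0
    for addr in range(32):
        bits = [(addr >> k) & 1 for k in range(5)]
        init |= f_lower(*bits) << addr
        init |= g_upper(*bits) << (addr + 32)
    return init


def and5(*b):
    return int(all(b))


def xor5(*b):
    return sum(b) % 2


lut = DualOutputLUT6(pack_two_5input(and5, xor5))
print(lut.outputs(1, 0, 0, 0, 0, 1))  # (1, 0): O6 = XOR of A1..A5, O5 = AND of A1..A5
```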
$F = A_0A_1A_3 + A_1A_2\bar{A}_3 + \bar{A}_0\bar{A}_1\bar{A}_2$
Implement F using 4-input LUTs, 3-input LUTs, or 2-input LUTs
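Two of these mappings can be checked in Python. With a 4-input LUT, F fits in one LUT whose 16 SRAM bits are its truth table. With 3-input LUTs, one possible decomposition (a Shannon expansion on $A_3$, chosen here for illustration) uses three LUTs: the cofactor for $A_3 = 0$, the cofactor for $A_3 = 1$, and a 2x1 mux, which is itself a 3-input function. The address ordering and all function names are assumptions.

```python
from itertools import product


def F(a0, a1, a2, a3):
    """F = A0 A1 A3 + A1 A2 ~A3 + ~A0 ~A1 ~A2 (inputs are 0/1 ints)."""
    return (a0 & a1 & a3) | (a1 & a2 & (1 - a3)) | ((1 - a0) & (1 - a1) & (1 - a2))


# One 4-input LUT: 16 SRAM bits; assumption: address bit k carries input Ak
LUT4_BITS = [F(*((k >> i) & 1 for i in range(4))) for k in range(16)]


def cofactor_a3_0(a0, a1, a2):  # F with A3 = 0: A1 A2 + ~A0 ~A1 ~A2
    return (a1 & a2) | ((1 - a0) & (1 - a1) & (1 - a2))


def cofactor_a3_1(a0, a1, a2):  # F with A3 = 1: A0 A1 + ~A0 ~A1 ~A2
    return (a0 & a1) | ((1 - a0) & (1 - a1) & (1 - a2))


def mux_lut(sel, x0, x1):       # third 3-input LUT: a 2x1 mux selected by A3
    return x1 if sel else x0


def F_three_luts(a0, a1, a2, a3):
    return mux_lut(a3, cofactor_a3_0(a0, a1, a2), cofactor_a3_1(a0, a1, a2))


for a0, a1, a2, a3 in product((0, 1), repeat=4):
    k = a0 | (a1 << 1) | (a2 << 2) | (a3 << 3)
    assert LUT4_BITS[k] == F(a0, a1, a2, a3) == F_three_luts(a0, a1, a2, a3)
print("F as one 4-input LUT or three 3-input LUTs: OK")
```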
Device | Logic Cells | CLB Slices(1) | CLB Max Distributed RAM (Kb) | DSP48E1 Slices(2) | Block RAM(3) 18 Kb | Block RAM(3) 36 Kb | Block RAM(3) Max (Kb) | CMTs(4) | PCIe(5) | GTPs | XADC Blocks | Total I/O Banks(6) | Max User I/O(7)
XC7A15T | 16,640 | 2,600 | 200 | 45 | 50 | 25 | 900 | 5 | 1 | 4 | 1 | 5 | 250
XC7A35T | 33,280 | 5,200 | 400 | 90 | 100 | 50 | 1,800 | 5 | 1 | 4 | 1 | 5 | 250
XC7A50T | 52,160 | 8,150 | 600 | 120 | 150 | 75 | 2,700 | 5 | 1 | 4 | 1 | 5 | 250
XC7A75T | 75,520 | 11,800 | 892 | 180 | 210 | 105 | 3,780 | 6 | 1 | 8 | 1 | 6 | 300
XC7A100T | 101,440 | 15,850 | 1,188 | 240 | 270 | 135 | 4,860 | 6 | 1 | 8 | 1 | 6 | 300
XC7A200T | 215,360 | 33,650 | 2,888 | 740 | 730 | 365 | 13,140 | 10 | 1 | 16 | 1 | 10 | 500
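The columns of the device table are related in simple ways, and a short Python check makes this explicit using numbers copied from the table. In every row, logic cells = 6.4 × slices, 18 Kb blocks = 2 × 36 Kb blocks, and Max (Kb) = 36 × the number of 36 Kb blocks. Reading 6.4 as "4 LUTs per slice × a 1.6 logic-cell factor" is an assumption not stated on the slide. The name `ARTIX7` is hypothetical.

```python
# (device, logic cells, slices, 18 Kb blocks, 36 Kb blocks, max block RAM in Kb)
ARTIX7 = [
    ("XC7A15T", 16_640, 2_600, 50, 25, 900),
    ("XC7A35T", 33_280, 5_200, 100, 50, 1_800),
    ("XC7A50T", 52_160, 8_150, 150, 75, 2_700),
    ("XC7A75T", 75_520, 11_800, 210, 105, 3_780),
    ("XC7A100T", 101_440, 15_850, 270, 135, 4_860),
    ("XC7A200T", 215_360, 33_650, 730, 365, 13_140),
]

for name, cells, slices, b18, b36, bram_kb in ARTIX7:
    cells_ok = cells * 10 == slices * 64   # logic cells = 6.4 x slices (integer arithmetic)
    blocks_ok = b18 == 2 * b36             # two 18 Kb blocks per 36 Kb block
    capacity_ok = bram_kb == 36 * b36      # Max (Kb) = 36 Kb x number of 36 Kb blocks
    print(f"{name:9} cells/slice={cells / slices:.1f} {cells_ok} {blocks_ok} {capacity_ok}")
```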