CDA 4253 FPGA System Design: FPGA Architectures, Hao Zheng

CDA 4253 FPGA System Design
FPGA Architectures
Hao Zheng
Dept of Comp Sci & Eng, U of South Florida

How to Make HW Reconfigurable
• Not SW
• Change structure
  – Change connections among components
  – Change logic functions of components

History – Simple Programmable Logic
[Figures: PLA and PAL structures. Source: Wikipedia]

History – Complex Programmable Logic
• Built on top of SPL
• Suitable for small-scale applications
• Coarse-grained programmability

FPGAs – Generic Architecture
Also include common fixed logic blocks for higher performance:
• On-chip memory
• DSP/Multiplier
• Fast arithmetic logic
• Microprocessors
• Communication logic

Programming Technologies

Programming Technologies: Fuses

Programming Technologies: Anti-fuses

Programming Technologies: FLASH (floating gate)

Programming Technologies: SRAM
SRAM cell value | Transistor switch
0               | Open
1               | Closed

Static RAM Cell

Basic Logic Elements (BLEs)
The basic component that can be programmed to implement logic functions and provide storage.

Lookup Tables (LUTs)
[Figure: a 2-input LUT; inputs x and y select one of four SRAM cells at addresses 00, 01, 10, 11]
Commercial FPGAs
• Xilinx: 6-LUT
• Altera: 6-LUT
• Microsemi: 4-LUT
An x-input LUT can be programmed into any one of 2^(2^x) functions.
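The 2^(2^x) count follows from the LUT holding one independent SRAM bit per input combination. A minimal Python sketch of that arithmetic is below; the function name `count_lut_functions` is made up for illustration and is not from the slides.

```python
from itertools import product

def count_lut_functions(num_inputs):
    """Number of distinct truth tables an n-input, 1-output LUT can hold."""
    rows = 2 ** num_inputs   # one SRAM bit per input combination
    return 2 ** rows         # each SRAM bit is independently 0 or 1

# Cross-check for 2 inputs by enumerating every possible 4-bit SRAM content.
all_2lut_configs = list(product((0, 1), repeat=4))

print(count_lut_functions(2), len(all_2lut_configs))
print(count_lut_functions(4))
print(count_lut_functions(6) == 2 ** 64)
```

For a 2-input LUT this gives 16 functions, which include AND, OR, NAND, NOR, XOR, and XNOR.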

LUT = Programmable Truth Table
x y | z      LUT address | SRAM bit
0 0 | A      00          | A
0 1 | B      01          | B
1 0 | C      10          | C
1 1 | D      11          | D
Also called a function generator.
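As a rough illustration of the programmable truth table idea, the Python sketch below models a LUT as a list of SRAM bits addressed by its inputs. The class `LUT` and helper `program` are hypothetical names chosen for this example, and the model ignores timing and electrical detail.

```python
from itertools import product

class LUT:
    """An n-input, 1-output lookup table modeled as a list of 2**n SRAM bits."""

    def __init__(self, n_inputs, sram_bits):
        if len(sram_bits) != 2 ** n_inputs:
            raise ValueError("an n-input LUT needs exactly 2**n SRAM bits")
        self.n_inputs = n_inputs
        self.sram = list(sram_bits)

    def read(self, *inputs):
        # The inputs form the SRAM address; the first input is the MSB,
        # so (x, y) = (1, 0) selects address 0b10.
        address = 0
        for bit in inputs:
            address = (address << 1) | bit
        return self.sram[address]

def program(n_inputs, func):
    """Fill the SRAM by evaluating func on every input combination."""
    bits = [func(*combo) for combo in product((0, 1), repeat=n_inputs)]
    return LUT(n_inputs, bits)

and_lut = program(2, lambda x, y: x & y)
xor_lut = program(2, lambda x, y: x ^ y)

print(and_lut.sram)        # SRAM contents at addresses 00, 01, 10, 11
print(xor_lut.read(1, 0))  # output z for x = 1, y = 0
```

Changing the function only changes the stored bits; the read path stays the same, which is why the same hardware can act as any gate.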

AND
x y | z      LUT address | SRAM bit
0 0 | 0      00          | 0
0 1 | 0      01          | 0
1 0 | 0      10          | 0
1 1 | 1      11          | 1

OR
x y | z      LUT address | SRAM bit
0 0 | 0      00          | 0
0 1 | 1      01          | 1
1 0 | 1      10          | 1
1 1 | 1      11          | 1

NAND
x y | z      LUT address | SRAM bit
0 0 | 1      00          | 1
0 1 | 1      01          | 1
1 0 | 1      10          | 1
1 1 | 0      11          | 0

NOR
x y | z      LUT address | SRAM bit
0 0 | 1      00          | 1
0 1 | 0      01          | 0
1 0 | 0      10          | 0
1 1 | 0      11          | 0

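XNOR and XOR
[Figure: two 2-input LUTs with inputs x, y and output z; the SRAM cells at addresses 00, 01, 10, 11 are left blank to be filled in for XNOR and for XOR]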

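z = y + x and z = y
[Figure: two 2-input LUTs with inputs x, y and output z; the SRAM cells at addresses 00, 01, 10, 11 are left blank to be filled in for each function]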

Features of LUTs
• A LUT is a piece of RAM.
  – Can be configured as distributed RAM in Xilinx devices.
  – Can be configured as shift registers.
• An n-LUT can implement any n-input logic function.
  – Logic minimization should reduce the number of inputs, not the number of logical operators.
• All logic functions implemented by an n-LUT have the same propagation delay.

Look-up-tables (LUTs)
• Why aren't FPGAs just one big LUT?
  – The size of a truth table grows exponentially with the # of inputs
    • 3 inputs = 8 rows, 4 inputs = 16 rows, 5 inputs = 32 rows, etc.
  – A LUT has the same number of rows as the truth table it implements
  – So LUTs grow exponentially with the # of inputs
• Number of SRAM bits in a LUT = 2^i * o
  – i = # of inputs, o = # of outputs
  – Example: 64-input combinational logic with 1 output would require 2^64 SRAM bits
    • About 1.84 x 10^19 SRAM bits
• Large LUT -> long latency
• Clearly, it is not feasible to use large LUTs
  – So, how do FPGAs implement logic with many inputs?
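The growth in SRAM bits can be checked with a few lines of Python. This is only a sketch of the 2^i * o formula above; the function name `lut_sram_bits` is an assumption made for the example.

```python
def lut_sram_bits(num_inputs, num_outputs=1):
    """SRAM bits needed by one LUT: 2**i rows, o bits per row."""
    return (2 ** num_inputs) * num_outputs

for i in (3, 4, 5, 6, 64):
    print(i, lut_sram_bits(i))

# The 64-input case in scientific notation, for comparison with the slide.
print(f"{lut_sram_bits(64):.2e}")
```

Going from a 6-input LUT to a 64-input LUT raises the bit count from 64 to roughly 1.84 x 10^19, which is why large functions are split across many small LUTs instead.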

Look-up-tables (LUTs)
• Map circuits onto multiple LUTs
  – Divide the circuit into smaller circuits that fit in LUTs (each with no more inputs and outputs than a LUT provides)
  – Example: 2-input LUTs
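One way to picture this mapping is to split a 4-input function across three 2-input LUTs. The Python sketch below does this for f(a, b, c, d) = (a AND b) OR (c AND d); the function and the helper name `lut2` are chosen for illustration and do not come from the slides.

```python
from itertools import product

def lut2(table):
    """Return a 2-input LUT as a function; table lists outputs for 00, 01, 10, 11."""
    return lambda a, b: table[(a << 1) | b]

AND2 = [0, 0, 0, 1]
OR2 = [0, 1, 1, 1]

# First level: two LUTs each handle one pair of inputs.
lut_ab = lut2(AND2)
lut_cd = lut2(AND2)
# Second level: one LUT combines the two intermediate results.
lut_out = lut2(OR2)

def mapped(a, b, c, d):
    return lut_out(lut_ab(a, b), lut_cd(c, d))

# Compare the mapped network with the original function on all 16 inputs.
matches = all(
    mapped(a, b, c, d) == ((a & b) | (c & d))
    for a, b, c, d in product((0, 1), repeat=4)
)
print(matches)
```

The intermediate signals between the two levels are what the routing fabric has to carry from one LUT to the next.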

Sequential Logic
[Figure: a LUT, a flip-flop (FF), and a MUX]
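A common way to combine these three parts is to let the flip-flop capture the LUT output and let the MUX choose between the combinational and the registered value. The Python sketch below assumes that arrangement; the class `BLE` and the configuration bit `use_ff` are hypothetical names for this example.

```python
class BLE:
    """Sketch of a logic element: LUT -> optional flip-flop -> 2x1 output MUX."""

    def __init__(self, lut_bits, use_ff):
        self.lut_bits = list(lut_bits)  # SRAM contents of the LUT
        self.use_ff = use_ff            # configuration bit driving the MUX select
        self.ff = 0                     # flip-flop state, assumed to reset to 0

    def lut_out(self, *inputs):
        address = 0
        for bit in inputs:
            address = (address << 1) | bit
        return self.lut_bits[address]

    def clock(self, *inputs):
        # On a clock edge the flip-flop captures the current LUT output.
        self.ff = self.lut_out(*inputs)

    def output(self, *inputs):
        # The MUX selects the registered or the combinational value.
        return self.ff if self.use_ff else self.lut_out(*inputs)

XOR2 = [0, 1, 1, 0]
comb = BLE(XOR2, use_ff=False)
reg = BLE(XOR2, use_ff=True)

print(comb.output(1, 0))   # combinational: follows the inputs immediately
print(reg.output(1, 0))    # registered: still the reset value before any clock
reg.clock(1, 0)
print(reg.output(0, 0))    # registered: holds the value captured at the clock edge
```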

Configurable Logic Blocks
A number of BLEs are grouped with a local interconnect network in order to implement functions with a large number of inputs and multiple outputs.
• More efficient for implementing logic functions that share inputs and outputs.
• Saves routing resources.

Configurable Logic Blocks (CLBs)
Example: Ripple-carry adder
  – Each LUT implements 1 full adder
  – Use efficient connections between LUTs for carry signals
[Figure: two 3-in, 2-out LUTs with inputs A(0), B(0), Cin(0) and A(1), B(1), Cin(1); each LUT output passes through an FF and a 2x1 MUX to produce S(0), Cout(0) and S(1), Cout(1)]
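The ripple-carry example can be sketched in Python by programming one 3-input, 2-output LUT as a full adder and chaining copies of it through the carry signal. The names `FULL_ADDER_LUT` and `ripple_carry_add` are assumptions for this illustration, and the flip-flops and MUXes from the figure are left out.

```python
from itertools import product

# One 3-input, 2-output LUT programmed as a full adder.
# Address = (a, b, cin) with a as the MSB; each entry stores (sum, cout).
FULL_ADDER_LUT = [
    (a ^ b ^ cin, (a & b) | (cin & (a ^ b)))
    for a, b, cin in product((0, 1), repeat=3)
]

def full_adder(a, b, cin):
    return FULL_ADDER_LUT[(a << 2) | (b << 1) | cin]

def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two little-endian bit lists; returns (sum_bits, carry_out)."""
    sums = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        # The carry out of one LUT feeds the carry in of the next.
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry

# 3 + 1 with 2-bit operands, least significant bit first.
print(ripple_carry_add([1, 1], [1, 0]))
```

The carry chain is the critical path here, which is why the slide calls for efficient connections between LUTs for carry signals.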

Programmable Interconnect

FPGA Routing Architectures
Must be flexible to accommodate various circuit implementations.

Connection Boxes
[Figure: programmable switches controlled by SRAM cells]

Connection Boxes
• Flexibility – the number of wires a CLB input/output can connect to
[Figure: CLBs connected to routing wires with Flexibility = 2 and Flexibility = 3; dots represent possible connections]

Switch Boxes
[Figure: switch box whose connections are controlled by SRAM cells]

Segmented Routing
• Short wires: many, local connections
• Long wires: few, low latency, carrying global signals
• Dedicated long wires for clock/reset signals
• Optimal routing should use a minimal number of programmable connections

Hierarchical Routing Architecture
Most designs display locality of connections, which motivates a hierarchical routing architecture.

Configuration

FPGA Configuration
How to get a bitstream into the FPGA?
[Figure: unconfigured logic block with a 3-in, 1-out LUT, an FF, and a 2x1 MUX]

FPGA Configuration
[Figure: bitstream ……0101000100100010010101 being loaded into the FPGA]

FPGA Configuration – After
[Figure: the configuration cells of the LUTs, MUXes, and routing switches filled with bits from the bitstream]
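The loading step can be pictured as shifting the bitstream serially into a chain of configuration cells. The Python sketch below assumes a single shift chain and a made-up 9-bit layout (8 LUT bits for a 3-input LUT plus 1 MUX select bit); real devices define their own bitstream formats and loading mechanisms, so this is only a toy model.

```python
def load_bitstream(bitstream, num_cells):
    """Shift bits serially into a chain of configuration cells."""
    cells = [0] * num_cells
    for ch in bitstream:
        # Each new bit enters cell 0 and pushes older bits down the chain.
        cells = [int(ch)] + cells[:-1]
    return cells

# Hypothetical layout: cells 0-7 hold the LUT contents, cell 8 the MUX select.
cells = load_bitstream("101101000", 9)
lut_bits, mux_select = cells[:8], cells[8]

print(lut_bits, mux_select)
```

Because the first bit shifted in travels the farthest, the cells end up holding the bitstream in reverse order.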

Configuration Comes at a Cost
[Figure labels: 1T; 6T SRAM; 4-6T SRAM; 4T SRAM]
+ Configuration circuitry
+ Error detection/correction
+ Security features
Source: https://en.wikipedia.org/wiki/Static_random-access_memory

FPGA Design Flow

FPGA CAD Flow
• Input:
  – A circuit (netlist)
• Output:
  – FPGA configuration bitstream
• Main (Algorithmic) Stages:
  – Logic synthesis/optimization
  – Technology mapping
  – Packing/placement
  – Routing
  – Bitstream generation

Computing Technologies

HW, SW, and FPGA
• Traditional approaches to computation: HW & SW
• HW (ASICs)
  – Fixed on a particular application
  – Efficient: performance, silicon area, power
  – Higher cost per application
• SW (microprocessors)
  – Used in many applications
  – Less efficient: performance, silicon area, power
  – Lower cost per application

HW, SW, and FPGA
• Field Programmable Gate Arrays (FPGAs)
  – Spatial computing: similar to HW
  – Reprogrammable: similar to SW
  – Faster than SW and more flexible than HW
  – Harder to program than SW
  – Less efficient than HW: performance, power consumption & silicon area

Temporal vs Spatial Computing (SW vs. HW)
y = Ax^2 + Bx + C
Temporal Computation (one operation per step):
  t1 = x
  t2 = t1 * A
  t2 = t2 + B
  t2 = t2 * t1
  y = t2 + C
Spatial Computation:
  [Figure: dataflow graph in which x, A, B, and C feed multipliers (*) and adders (+) that produce Y]
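The two styles can be contrasted in a short Python sketch. `temporal` follows the step sequence from the slide; `spatial` is an assumed reading of the dataflow figure with three multipliers and two adders, written as separate operators whose first two products do not depend on each other. Python still runs both sequentially, so this only illustrates the structure, not the speed.

```python
def temporal(x, A, B, C):
    # One operation per step, reusing temporaries, as on a processor.
    t1 = x
    t2 = t1 * A
    t2 = t2 + B
    t2 = t2 * t1
    y = t2 + C
    return y

def spatial(x, A, B, C):
    # Each line stands for its own hardware operator.
    x_sq = x * x        # multiplier 1
    bx = B * x          # multiplier 2, independent of multiplier 1
    ax_sq = A * x_sq    # multiplier 3
    return ax_sq + bx + C   # two adders

print(temporal(3, 2, 5, 7), spatial(3, 2, 5, 7))
```

In hardware, the independent multipliers can work at the same time, while the temporal version needs one step per operation.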

Why Is SW Slower?
• Generality:
  – The instruction set may not provide the operations your program needs
  – Processors provide hardware that may not be useful in every program or in every cycle of a given program: multipliers, dividers
• Instruction Memory
  – Program instructions and intermediate results are stored in memory.
  – Accessing memory is very slow.
• Bit Width Mismatches
  – General-purpose processors have a fixed bit width, and all computations are performed on that many bits

SW or FPGA?
• CPUs – cheaper, faster, sequential, fixed data format
  – Sequential, control-oriented applications
• FPGA – costlier, slower, parallel, custom data operations
  – Applications with data parallelism
• FPGA wins if
  (programming + exec time)_FPGA <= (compilation + exec time)_CPU
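The break-even condition can be written as a one-line check. In the Python sketch below, the function name `fpga_wins` and all timing numbers are invented for illustration; they are not measurements from the slides.

```python
def fpga_wins(fpga_programming_s, fpga_exec_s, cpu_compile_s, cpu_exec_s):
    """True when the FPGA's total time does not exceed the CPU's total time."""
    return fpga_programming_s + fpga_exec_s <= cpu_compile_s + cpu_exec_s

# Hypothetical short job: the FPGA's long setup time dominates.
print(fpga_wins(fpga_programming_s=3600, fpga_exec_s=1,
                cpu_compile_s=60, cpu_exec_s=100))

# Hypothetical long job: the FPGA's faster execution pays back the setup time.
print(fpga_wins(fpga_programming_s=3600, fpga_exec_s=1000,
                cpu_compile_s=60, cpu_exec_s=100000))
```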

How about ASIC HW?
• Dedicated -> not programmable.
• Takes a long time and high cost to design and develop (a typical processor takes a handful of years to design, with design teams of a few hundred engineers)
  – High non-recurring engineering (NRE) cost -> very expensive!
• Justification for the high cost: high-volume applications, or applications where high performance matters most

ASIC vs FPGA

ASIC vs FPGA
• Time-to-Market – FPGA is 6-12 months shorter
• Cost – FPGA is much less expensive in low-volume applications
• Development time – FPGA is shorter, as there is no need to fabricate
• Power consumption – ASIC is better, with no need to run SRAMs
• Debug and Verification – FPGA is easier, with direct testing in-device

Instance-Specific Design
• ASIC targets a particular application
• ASIC is more efficient than FPGA in that application
• FPGA can be more efficient if it is customized to particular instances of an application
  – Encryption design for a specific password: reduced area/power, higher performance
• Customizations
  – Data width
  – Constant folding
  – Function adaptation
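Constant folding can be illustrated with a multiplier whose one operand is known in advance: the general multiplier collapses into a few shifts and adds. The Python sketch below uses this example, which is not from the slides; `specialize_multiplier` is a hypothetical helper name.

```python
def specialize_multiplier(constant):
    """Build a multiply-by-constant function from shifts and adds only."""
    # One shift per 1-bit in the constant; 0-bits need no hardware at all.
    shifts = [i for i in range(constant.bit_length()) if (constant >> i) & 1]

    def multiply(x):
        return sum(x << s for s in shifts)

    return multiply, shifts

times_ten, shifts = specialize_multiplier(10)   # 10 = 0b1010

print(shifts)          # shift amounts used by the specialized circuit
print(times_ten(7))
```

A circuit specialized this way only works for that one constant, so a new constant means a new configuration, which is the trade-off that reprogrammability makes acceptable.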

Applications
• Low-cost customizable digital circuitry
  – Can be used to make any type of digital circuit.
  – Rapid product development with design software. Upgradable.
• High-performance computing
  – Complex algorithms are off-loaded to an FPGA co-processor.
  – Application-specific hardware
  – FPGAs are inherently parallel and can implement very efficient hardware algorithms: typical speed increase is x10 - x100.
• Evolvable hardware
  – Hardware can change its own circuitry.
  – Neural networks.
• Digital Signal Processing

Reading
• Paper at http://www.cse.usf.edu/~haozheng/teaching/cda4253/
  FPGA Architectures: An Overview
  Sections 2.1, 2.2, 2.3, 2.4 (skip 2.4.1.1, 2.4.2.2, 2.4.2.3); skim 2.6

Xilinx 7-Series Devices

Xilinx FPGA Architecture
[Figure: DS099-1_01_032703]
