18.1
Unit 18: Field Programmable Gate Arrays (FPGAs)
Implementing Logic Functions with Memories
18.2
HARDWARE IMPLEMENTATION TARGETS
18.3
Processing Logic Approaches
- Recall HW/SW designs sit on a continuum
- Suppose I want to implement: F = (X+Y)*(A+B)
- Custom Hardware (Faster, Less Power)
– Logic that directly implements a specific task
– Example above may use separate adders and a multiplier unit
- General Purpose (GP) Processor/Microcontroller (Design Time, Cost)
– Logic designed to execute SW instructions
– Provides basic processing resources that are reused by each instruction
- What if I want to perform: (X*Y) + (A*B)
– What's easiest to redesign?
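[Figure: Custom HW implementation. Two adders compute X+Y and A+B; a multiplier combines them to produce F.]
[Figure: Computing system continuum, from application-specific hardware (no software) to a processor executing software. Axis labels: flexibility, design time, performance, cost.]
[Figure: GP processor implementation of F = (X+Y)*(A+B). A CPU with control logic and shared + and * units runs the instruction store contents ADD T,X,Y; ADD S,A,B; MUL F,T,S on data in memory.]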
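The redesign question can be made concrete with a small Python sketch of the general-purpose approach. The instruction mnemonics follow the slide; the `run` helper, the second program, and the sample data values are assumptions made for illustration only.

```python
# Minimal sketch: a "processor" that reuses the same + and * units for any program.
# run(), OPS, and the data values are hypothetical names/values, not from the slides.
OPS = {"ADD": lambda a, b: a + b, "MUL": lambda a, b: a * b}

def run(program, data):
    """Execute three-operand instructions of the form 'OP DST,SRC1,SRC2'."""
    mem = dict(data)  # copy so the caller's data is left untouched
    for instr in program:
        op, operands = instr.split(None, 1)
        dst, src1, src2 = (name.strip() for name in operands.split(","))
        mem[dst] = OPS[op](mem[src1], mem[src2])
    return mem

data = {"X": 2, "Y": 3, "A": 4, "B": 5}
sum_then_multiply = ["ADD T,X,Y", "ADD S,A,B", "MUL F,T,S"]  # (X+Y)*(A+B)
multiply_then_sum = ["MUL T,X,Y", "MUL S,A,B", "ADD F,T,S"]  # (X*Y)+(A*B)

f1 = run(sum_then_multiply, data)["F"]
f2 = run(multiply_then_sum, data)["F"]
print(f1, f2)
```

Switching to the new function only changes the instruction list. The custom hardware version would need one adder and two multipliers instead of two adders and one multiplier, which means a new circuit.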
18.4
Progression of HW Logic Density
- Our ability to design hardware components with greater numbers of gates/transistors has increased exponentially
- Small Scale Integrated (SSI) Circuits
– 1960’s and 1970’s
– A few gates on a chip (74LS00 has 4 NAND gates)
- Medium Scale Integrated (MSI) Circuits
– 1970’s
– Around a hundred gates per chip (4-bit adder)
- Large Scale Integrated (LSI) Circuits
- Very Large Scale Integrated (VLSI) Circuits
– 100’s of millions of gates
18.5
ASICs
- An Application Specific Integrated Circuit (ASIC) is another name for a typical "chip"
- Computer engineers determine the gates and their interconnections that perform a specific task/application
– Start with a high level "behavioral" description
– Use CAD software tools to refine that to logic gates
– Use CAD software tools to refine that to transistors, where each should be located on the surface of the chip, and how they should be wired together
– From there the chip is fabricated and mass-produced
- Design process is expensive, and once fabricated the design cannot be changed (but it is fast and uses less power)
In an ASIC design, a unique chip is manufactured that implements our design, at which point the HW design is fixed and cannot be changed (example: Pentium, etc.)
18.6
ASICs
18.7
Motivation for Reconfigurable Logic
- Could we get some of the benefits of both hardware (speed/power) AND software (flexible/reusable)?
- Yes…enter Field Programmable Gate Arrays (FPGAs)
– Have prebuilt, generic hardware constructs that can be configured and interconnected for one design and then reconfigured and interconnected later for another design
- Let's learn more about the secret ingredient to FPGAs…memories!
[Figure: Computing system continuum. Application-specific hardware (no software / custom chip) at one end, microcontroller/processor executing software at the other, and reconfigurable hardware (FPGAs) in between.]
- FPGAs have “logic resources” on them that we can configure to implement our specific design. We can then reconfigure them to implement another design.
18.8
Where are FPGAs Used
- Datacenters
– Bing search engine
– Real-time data analytics
– Compression and encryption
– High-frequency trading
- Robots and Rovers
– JPL and the Mars Rovers
- Telecom
- Aerospace
18.9
USING MEMORIES TO BUILD COMBINATIONAL CIRCUITS
18.10
MEMORY BASICS
Dimensions and Operations
18.11
Memories
- Memories store (write) and retrieve (read) data
– Read-Only Memories (ROM’s): Can only retrieve data (contents are initialized and then cannot be changed)
– Read-Write Memories (RWM’s): Can retrieve data and change the contents to store new data
18.12
ROM’s
- Memories are just tables of data with rows and columns
- When data is read, one entire row of data is read out
- The row to be read is selected by putting a binary number on the address inputs
[Figure: 8x4 ROM with address inputs A2 A1 A0, rows 0 to 7, and data outputs D3 D2 D1 D0]
18.13
ROM’s
- Example
– Address = 4₁₀ = 100₂ is provided as input
– ROM outputs data in that row (1101 bin.)
[Figure: 8x4 ROM with address inputs A2 A1 A0 = 100 (4₁₀); row 4 is output on D3 D2 D1 D0 = 1101]
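A ROM read behaves like indexing into a fixed list. In the minimal Python sketch below, only the row 4 value (1101) comes from the example above; the other rows and the names `ROM` and `rom_read` are made-up placeholders.

```python
# Hypothetical 8x4 ROM contents. Only row 4 (1101) is taken from the slide;
# every other row is an arbitrary placeholder value.
ROM = [0b0000, 0b0001, 0b0010, 0b0011, 0b1101, 0b0101, 0b0110, 0b0111]

def rom_read(a2, a1, a0):
    """Combine the address bits into a row number and return that row."""
    address = (a2 << 2) | (a1 << 1) | a0
    return ROM[address]

row = rom_read(1, 0, 0)        # address 100 (binary) = 4
bits = format(row, "04b")      # D3 D2 D1 D0 as a bit string
print(bits)
```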
18.14
Memory Dimensions
- Memories are named by their dimensions:
– Rows x Columns
- n rows and m columns => n x m ROM
- n rows => log₂(n) address bits …or… 2^k rows => k address bits
- m cols => m data outputs
[Figure: ROM with address inputs A(n-1) … A1 A0, rows 0 to 2^n - 1, and data outputs D(m-1) … D0]
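The row-count to address-bit relationship can be checked with a few lines of Python. This is only a sketch; `address_bits` is an illustrative helper name, and it rounds up when the row count is not a power of two.

```python
def address_bits(rows):
    """Number of address bits needed to select one of `rows` rows (ceil of log2)."""
    return max(1, (rows - 1).bit_length())

# An 8x4 memory needs 3 address bits; a 256x8 memory needs 8.
for rows, cols in [(8, 4), (256, 8), (1024, 16)]:
    print(f"{rows}x{cols} memory: {address_bits(rows)} address bits, {cols} data outputs")
```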
18.15
RWM’s
- Writable memories provide a set of data inputs for write data (as opposed to the data outputs for read data)
- A control signal R/W (1 = READ / 0 = WRITE) is provided to tell the memory what operation the user wants to perform
[Figure: 8x4 RWM with address inputs A2 A1 A0, data inputs DI3 DI2 DI1 DI0, data outputs DO3 DO2 DO1 DO0, and an R/W control input]
18.16
RWM’s
- Write example
– Address = 3₁₀ = 011₂
– DI = 12₁₀ = 1100₂
– R/W = 0 => Write op.
- Data in row 3 is overwritten with the new value of 1100₂.
[Figure: 8x4 RWM with A2 A1 A0 = 011, DI3 DI2 DI1 DI0 = 1100, and R/W = 0; row 3 is overwritten and the data outputs DO3 DO2 DO1 DO0 are shown as ? ? ? ? during the write]
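The write example can be mimicked with a small Python model. This is a behavioral sketch only: the `RWM` class and its `access` method are hypothetical names, and returning `None` during a write is an assumption standing in for the "?" outputs in the figure.

```python
class RWM:
    """Behavioral sketch of a rows x cols read/write memory (not a hardware model)."""

    def __init__(self, rows, cols):
        self.cols = cols
        self.cells = [0] * rows  # assumption: contents start at all zeros

    def access(self, address, rw, data_in=0):
        """rw = 1 reads the addressed row; rw = 0 overwrites it with data_in."""
        if rw == 0:
            self.cells[address] = data_in & ((1 << self.cols) - 1)
            return None  # data outputs are not meaningful during a write
        return self.cells[address]

mem = RWM(rows=8, cols=4)
mem.access(0b011, rw=0, data_in=0b1100)   # write 1100 into row 3
value = mem.access(0b011, rw=1)           # read row 3 back
print(format(value, "04b"))
```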
18.17
USING MEMORIES TO BUILD COMBINATIONAL FUNCTIONS
Look-up tables…
18.18
Memories as Look-Up Tables
- One major application of memories in digital design is to use them as LUT’s (Look-Up Tables) to implement logic functions
– This is the core technology used by FPGAs (Field-Programmable Gate Arrays)
- Idea: Use a memory to hold the truth table of a function and feed the inputs of the function to the address inputs to "look-up" the answer
18.19
Implementing Functions w/ Memories
[Figure: Truth table of an arbitrary logic function F(X, Y, Z) beside an 8x1 memory. X, Y, Z drive address inputs A2, A1, A0 and data output D0 is F. Rows 0 to 7 hold the F column of the truth table.]
- X, Y, Z inputs “look up” the correct answer
- Use a memory with the same dimensions as the 'output' side of the truth table. It's almost TOO easy.
18.20
Implementing Functions w/ Memories
[Figure: Truth table of a multi-bit function (one's count) with inputs X, Y, Z and outputs C, S beside an 8x2 memory. X, Y, Z drive A2, A1, A0; D1 is C and D0 is S. Example: inputs 1, 0, 1 look up 1+0+1 = 10.]
- Use a memory with the same dimensions as the 'output' side of the truth table. It's almost TOO easy.
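The one's count look-up can be sketched in Python by filling an 8x2 table with the truth-table outputs and indexing it with the inputs. The names `ONES_COUNT` and `lookup` are illustrative, and the bit order (X on A2, Y on A1, Z on A0) follows the figure.

```python
# 8x2 memory: row number = address formed from X Y Z, entry = (C, S).
# C is the high bit and S the low bit of the number of 1s among the inputs.
ONES_COUNT = []
for address in range(8):
    count = bin(address).count("1")
    ONES_COUNT.append((count >> 1, count & 1))

def lookup(x, y, z):
    """Feed X, Y, Z to address inputs A2, A1, A0 and read out (D1, D0) = (C, S)."""
    return ONES_COUNT[(x << 2) | (y << 1) | z]

c, s = lookup(1, 0, 1)   # 1+0+1 = 2 = binary 10
print(c, s)
```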
18.21
3-bit Squaring Circuit
- Q: What size memory would you use to build our 3-bit squaring circuit?
- A: 8x6 memory
- Q: What would you connect to the address inputs of the memory?
- A: A[2:0]
- Q: What bits would you program into row 5 of the memory?
- A: 011001 (i.e. 25 = 5²)

Memory Contents to build 3-bit Squaring Circuit

     Inputs       Outputs
A    A2 A1 A0     B5 B4 B3 B2 B1 B0    B = A²
0    0  0  0      0  0  0  0  0  0     0
1    0  0  1      0  0  0  0  0  1     1
2    0  1  0      0  0  0  1  0  0     4
3    0  1  1      0  0  1  0  0  1     9
4    1  0  0      0  1  0  0  0  0     16
5    1  0  1      0  1  1  0  0  1     25
6    1  1  0      1  0  0  1  0  0     36
7    1  1  1      1  1  0  0  0  1     49
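The memory contents for the squaring circuit can also be generated rather than typed by hand. A short Python sketch (the name `SQUARE_ROM` is illustrative, not from the slides):

```python
# One row per input value A (0 to 7); each row holds A*A as six bits B5..B0.
SQUARE_ROM = [format(a * a, "06b") for a in range(8)]

for a, row in enumerate(SQUARE_ROM):
    print(a, row)
```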
18.22
4x4 Multiplier Example
Determine the dimensions of the memory that would be necessary to implement a 4x4-bit unsigned multiplier with inputs X[3:0] and Y[3:0] and outputs P[??:0]
- Question: How many bits are needed for P?
- Question: What are the contents of the numbered rows?
- Example: X3X2X1X0 = 0010, Y3Y2Y1Y0 = 0001, P = X * Y = 2 * 1 = 2 = 00000010
[Figure: ROM with address inputs A7..A4 = X3..X0 and A3..A0 = Y3..Y0, and data outputs P7..P0]

Row    Address (X3..X0 Y3..Y0)    P7..P0      Product
2      0000 0010                  00000000    0000 * 0010 = 0
20     0001 0100                  00000100    0001 * 0100 = 4
39     0010 0111                  00001110    0010 * 0111 = 14
255    1111 1111                  11100001    1111 * 1111 = 225
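The multiplier memory can be built and checked with a few lines of Python. This is a sketch under the address layout shown in the figure (X in the upper four address bits, Y in the lower four); `build_multiplier_rom` is a hypothetical helper name.

```python
def build_multiplier_rom(bits=4):
    """Return the ROM contents: row address = X concatenated with Y, entry = X * Y."""
    rows = 1 << (2 * bits)          # 2^(4+4) = 256 rows for a 4x4 multiplier
    mask = (1 << bits) - 1
    return [((addr >> bits) & mask) * (addr & mask) for addr in range(rows)]

ROM = build_multiplier_rom()
product_bits = max(ROM).bit_length()   # width needed for the largest product, 15 * 15

print(len(ROM), "rows x", product_bits, "columns")
for row in (2, 20, 39, 255):
    print(row, format(ROM[row], "08b"), ROM[row])
```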
18.23
Implementing Functions w/ Memories
- To implement a function w/ n variables and m outputs
- Just place the output truth table values in the memory
- Memory will have dimensions: 2^n rows and m columns
– Still does not scale terribly well (i.e. n inputs requires a memory w/ 2^n rows)
– But it is easy, and since we can change the contents of memories it allows us to create "reconfigurable" logic
– This idea is at the heart of FPGAs
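A quick calculation shows how fast the memory grows with the number of inputs. The Python sketch below uses an illustrative helper, `lut_size`, and example sizes chosen here rather than taken from the slides.

```python
def lut_size(n_inputs, m_outputs):
    """Rows, columns, and total bits of a memory implementing n inputs and m outputs."""
    rows = 2 ** n_inputs
    return rows, m_outputs, rows * m_outputs

# Each extra input doubles the number of rows.
for n, m in [(3, 2), (8, 8), (16, 16), (32, 32)]:
    rows, cols, total = lut_size(n, m)
    print(f"{n} inputs, {m} outputs -> {rows} x {cols} memory = {total} bits")
```

This growth is one reason large functions are split across many small look-up tables rather than placed in one big memory.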
18.24
FPGAS
18.25
Basis of FPGA’s
- Memories provide a universal way to implement any combinational logic function
– 2^n x m memory can implement a function of n variables and m outputs
- If we use RWM (read/write memory) rather than ROM’s we can change what function the memory implements
- Memories are referred to as Look-up Tables (LUT’s)
[Figure: Full adder implementation. An 8x2 memory with address inputs A2 A1 A0 driven by X, Y, Cin and data outputs D1 = Cout, D0 = S.]
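The point that a writable memory can change function is easy to show in Python: the same 8x2 table is filled first with full adder outputs and then with a different function. The names `program_lut` and `and_or` are hypothetical, and the second function is an arbitrary choice for illustration.

```python
def program_lut(func, n_inputs=3):
    """Fill a 2^n-row memory with func's outputs, one row per input combination."""
    rows = []
    for addr in range(2 ** n_inputs):
        bits = [(addr >> i) & 1 for i in reversed(range(n_inputs))]  # A2, A1, A0
        rows.append(func(*bits))
    return rows

def full_adder(x, y, cin):
    total = x + y + cin
    return (total >> 1, total & 1)   # (Cout, S) stored as (D1, D0)

def and_or(x, y, z):
    return (x & y & z, x | y | z)    # arbitrary replacement function

lut = program_lut(full_adder)
adder_result = lut[0b101]            # X=1, Y=0, Cin=1 -> Cout=1, S=0

lut = program_lut(and_or)            # "reconfigure" the same memory
other_result = lut[0b101]
print(adder_result, other_result)
```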
18.26
Configurable Logic Blocks (CLB’s)
- The memory allows for any combinational function
- Provided D-FF’s allow designs with sequential logic
– “Bypass” mux selects the pure combinational output of the LUT or the sequential/registered/D-FF output
- Blue boxes indicate configurable bits that control the operation and function of the logic
[Figure: CLB containing an 8x2 memory (address inputs A2 A1 A0, data outputs D1 D0) that implements any 3-input / 2-output combinational function, a D flip-flop (CLK, D, Q) on each output for use if sequential logic is needed, and a bypass mux on each output that selects between the LUT output and the flip-flop output]
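A CLB can be modeled in a few lines of Python to show how the LUT, the flip-flops, and the bypass muxes interact. This is a simplified behavioral sketch; the `CLB` class, its method names, and the toggle example are assumptions for illustration and do not describe any vendor's actual block.

```python
class CLB:
    """Sketch of a CLB: 8x2 LUT, two D flip-flops, and a bypass mux per output."""

    def __init__(self, lut_rows, registered=(False, False)):
        self.lut = list(lut_rows)      # 8 rows of (D1, D0) configuration bits
        self.registered = registered   # bypass-mux setting: True selects the flip-flop
        self.ff = [0, 0]               # flip-flop state for D1 and D0

    def comb(self, a2, a1, a0):
        """Pure combinational LUT output for the given address inputs."""
        return self.lut[(a2 << 2) | (a1 << 1) | a0]

    def outputs(self, a2, a1, a0):
        """Outputs after the bypass muxes."""
        d = self.comb(a2, a1, a0)
        return tuple(self.ff[i] if self.registered[i] else d[i] for i in range(2))

    def clock(self, a2, a1, a0):
        """Rising clock edge: flip-flops capture the current LUT outputs."""
        self.ff = list(self.comb(a2, a1, a0))

# Example: D0 = NOT A0, registered and fed back to A0, gives a toggling bit.
clb = CLB([(0, 1 - (a & 1)) for a in range(8)], registered=(False, True))
seq = []
for _ in range(4):
    q = clb.outputs(0, 0, 0)[1]   # registered output depends only on flip-flop state
    seq.append(q)
    clb.clock(0, 0, q)            # feed Q back into A0
print(seq)
```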
18.27
Routing & Switch Matrices
- Inputs and outputs of neighboring CLB’s connect to a “switch matrix” (SM)
- Switch matrix is simply composed of muxes that allow us to “route” inputs and outputs to another CLB or further away
[Figure: Grid of switch matrices (SM), each surrounded by four CLBs; each CLB connects to its SM with 3 inputs and 2 outputs]
18.28
Routing & Switch Matrices
- Suppose we want the connection shown in green and purple. What select values would be used?
[Figure: Switch matrix (SM) surrounded by four CLBs, with connections to/from the N, E, S, and W switch matrices. Routing muxes choose among signals labeled A through L. Select values shown: 11₁₀ = 1011₂ and 1₁₀ = 0001₂.]
18.29
Place and Route
- ASIC: Find where each gate should be placed on the chip and how to route the wires that connect to it
– Direct connections can be faster
- FPGA: Determine which LUT’s should be used and how to route through switch matrices
– Added delay to go through the routing muxes
[Figure: ASIC vs. FPGA. The FPGA side shows switch matrices (SM), each connected to four CLBs with 3 inputs and 2 outputs per CLB.]
18.30
Exercise
- Find the configuration bits to build a 3-bit free-running (always enabled) counter
[Figure: Two CLBs (CLB 1 and CLB 2), each with an 8x2 memory (A2 A1 A0 in, D1 D0 out), two D flip-flops, and bypass muxes, connected through a switch matrix (SM) with links to/from the N and E switch matrices. The counter is drawn as a 3-bit register fed by a chain of half adders (HA) with a carry-in of 1.]

CLB 1 memory (input Q0; outputs Co and Q0* = Q0+1):

Row    Co   Q0*
0      0    1
1      1    0
2-7    d    d

CLB 2 memory (inputs Q2 Q1 Co; outputs Q2* Q1*):

Row    Q2 Q1 Co    Q2* Q1*
0      0  0  0     0   0
1      0  0  1     0   1
2      0  1  0     0   1
3      0  1  1     1   0
4      1  0  0     1   0
5      1  0  1     1   1
6      1  1  0     1   1
7      1  1  1     0   0

Routing select values shown: A = 000, B = 001, D = 011, E = 100, X = XXX (don't care)
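One possible set of LUT contents for this exercise can be checked by simulation. The Python sketch below assumes a half adder on Q0 in one CLB (outputs Co and Q0*) and a 2-bit incrementer with carry-in Co in the other (outputs Q2* and Q1*). Routing and bypass-mux settings are not modeled, and `LUT1`, `LUT2`, and `step` are illustrative names.

```python
# CLB 1: rows indexed by Q0; entry = (Co, Q0*). Unused rows are don't-cares and omitted.
LUT1 = {0: (0, 1), 1: (1, 0)}

# CLB 2: rows indexed by (Q2, Q1, Co); entry = (Q2*, Q1*).
LUT2 = [(0, 0), (0, 1), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1), (0, 0)]

def step(q2, q1, q0):
    """One clock edge: look up the next state in the two LUTs."""
    co, q0_next = LUT1[q0]
    q2_next, q1_next = LUT2[(q2 << 2) | (q1 << 1) | co]
    return q2_next, q1_next, q0_next

state = (0, 0, 0)
counts = []
for _ in range(9):
    counts.append(state[0] * 4 + state[1] * 2 + state[2])
    state = step(*state)
print(counts)
```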
18.31
ASIC’s vs. FPGA’s
- ASIC’s
– Faster
– Handles Larger Designs
– More Expensive
– Less Flexible (Cannot be reconfigured to perform a new hardware function)
- FPGA’s
– Slower (extra logic to make it reconfigurable)
– Smaller Designs
– Less Expensive
– Extremely Flexible
18.32
Modern FPGA's
- SoC design (Xilinx Kintex [KU115])
– Quad-Core ARM cores
– DDR3 SDRAM Memory Interface
– ~800 I/O Pins
– ~15M gate equivalent FPGA fabric
- ~1M D-FFs + 552K LUTs
- 1968 dedicated DSP "slices" 18x18 multiply + adder