Unit 18 Field Programmable Gate Arrays (FPGAs) Implementing Logic Functions with Memories



slide-1
SLIDE 1

18.1

Unit 18

Field Programmable Gate Arrays (FPGAs) Implementing Logic Functions with Memories

slide-2
SLIDE 2

18.2

HARDWARE IMPLEMENTATION TARGETS

slide-3
SLIDE 3

18.3

Processing Logic Approaches

  • Recall that HW/SW designs sit on a continuum
  • Suppose I want to implement: F = (X+Y)*(A+B)
  • Custom Hardware (Faster, Less Power)
      – Logic that directly implements a specific task
      – The example above may use separate adders and a multiplier unit
  • General Purpose (GP) Processor/Microcontroller (Design Time, Cost)
      – Logic designed to execute SW instructions
      – Provides basic processing resources that are reused by each instruction
  • What if I want to perform (X*Y) + (A*B) instead?
      – Which is easiest to redesign?

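[Figure: Custom HW Implementation. Inputs X, Y and A, B feed two adders (+) whose sums feed a multiplier (*) that produces F.]

[Figure: Computing System Continuum, from Application Specific Hardware (no software) to Processor Executing Software. Axes: Flexibility, Design Time, Performance, Cost.]

[Figure: GP Proc. Implementation of (X+Y)*(A+B). A processor with CPU control, shared + and * units, data in memory, and an instruction store holding: ADD T,X,Y; ADD S,A,B; MUL F,T,S.]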
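The trade-off on this slide can be sketched in a few lines of Python. This is only an illustration: `custom_hw`, `run_program`, and the tuple instruction format are hypothetical names invented here, while the three-instruction sequence is the one shown in the instruction store.

```python
# Hypothetical sketch: the same F = (X+Y)*(A+B) computed two ways.

def custom_hw(x, y, a, b):
    """Fixed datapath: two adders feeding one multiplier."""
    return (x + y) * (a + b)

def run_program(program, regs):
    """Toy general-purpose processor: shared + and * units reused by each instruction."""
    ops = {"ADD": lambda p, q: p + q, "MUL": lambda p, q: p * q}
    regs = dict(regs)  # copy so the caller's values are untouched
    for op, dst, src1, src2 in program:
        regs[dst] = ops[op](regs[src1], regs[src2])
    return regs

# Instruction sequence from the slide.
PROGRAM = [("ADD", "T", "X", "Y"), ("ADD", "S", "A", "B"), ("MUL", "F", "T", "S")]
# Switching to (X*Y)+(A*B) only means storing different instructions.
PROGRAM_2 = [("MUL", "T", "X", "Y"), ("MUL", "S", "A", "B"), ("ADD", "F", "T", "S")]

inputs = {"X": 1, "Y": 2, "A": 3, "B": 4}
print(run_program(PROGRAM, inputs)["F"])    # (1+2)*(3+4)
print(run_program(PROGRAM_2, inputs)["F"])  # (1*2)+(3*4)
```

Changing the function on the processor is a software edit, whereas `custom_hw` would need its adders and multiplier rewired.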

slide-4
SLIDE 4

18.4

Progression of HW Logic Density

  • Our ability to design hardware components with greater numbers of gates/transistors has increased exponentially
  • Small Scale Integrated (SSI) Circuits
      – 1960s and 1970s
      – A few gates on a chip (74LS00 has 4 NAND gates)
  • Medium Scale Integrated (MSI) Circuits
      – 1970s
      – Around a hundred gates per chip (4-bit adder)
  • Large Scale Integrated (LSI) Circuits
  • Very Large Scale Integrated (VLSI) Circuits
      – 100s of millions of gates

slide-5
SLIDE 5

18.5

ASICs

  • Application Specific Integrated Circuit (ASIC) is another name for a typical "chip"
  • Computer engineers determine the gates and their interconnections that perform a specific task/application
      – Start with a high-level "behavioral" description
      – Use CAD software tools to refine that to logic gates
      – Use CAD software tools to refine that to transistors, where each should be located on the surface of the chip, and how they should be wired together
      – From there the chip is fabricated and mass-produced
  • The design process is expensive, and once fabricated the design cannot be changed (but it is fast and uses less power)

In an ASIC design, a unique chip is manufactured that implements our design, at which point the HW design is fixed and cannot be changed (example: Pentium, etc.)

slide-6
SLIDE 6

18.6

ASICs

slide-7
SLIDE 7

18.7

Motivation for Reconfigurable Logic

  • Could we get some of the benefits of both hardware (speed/power) AND software (flexible/reusable)?
  • Yes: enter Field Programmable Gate Arrays (FPGAs)
      – They have prebuilt, generic hardware constructs that can be configured and interconnected for one design, then reconfigured and interconnected later for another design
  • Let's learn more about the secret ingredient of FPGAs: memories!

[Figure: Computing System Continuum, from Application Specific Hardware (no software / custom chip) to Microcontroller/Processor Executing Software, with Reconfigurable Hardware (FPGAs) in between.]

FPGAs have “logic resources” on them that we can configure to implement our specific design. We can then reconfigure them to implement another design.

slide-8
SLIDE 8

18.8

Where are FPGAs Used

  • Datacenters
      – Bing search engine
      – Real-time data analytics
      – Compression and encryption
      – High-frequency trading
  • Robots and Rovers
      – JPL and the Mars Rovers
  • Telecom
  • Aerospace

slide-9
SLIDE 9

18.9

USING MEMORIES TO BUILD COMBINATIONAL CIRCUITS

slide-10
SLIDE 10

18.10

MEMORY BASICS

Dimensions and Operations

slide-11
SLIDE 11

18.11

Memories

  • Memories store (write) and retrieve (read) data
      – Read-Only Memories (ROM’s): Can only retrieve data (contents are initialized and then cannot be changed)
      – Read-Write Memories (RWM’s): Can retrieve data and change the contents to store new data

slide-12
SLIDE 12

18.12

ROM’s

  • Memories are just tables of data with rows and columns
  • When data is read, one entire row of data is read out
  • The row to be read is selected by putting a binary number on the address inputs

[Figure: ROM with address inputs A2, A1, A0, rows 0 to 7, and data outputs D3, D2, D1, D0.]

slide-13
SLIDE 13

18.13

ROM’s

  • Example
      – Address = 4₁₀ = 100₂ is provided as input
      – ROM outputs the data in that row (1101 bin.)

[Figure: ROM with address inputs A2 A1 A0 = 100₂ = 4₁₀ selecting row 4; the data in row 4 (1101) is output on D3, D2, D1, D0.]
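A ROM read is just a table lookup, which a short Python sketch makes concrete. The `ROM` class is a hypothetical model written for this example, and only row 4 (1101) comes from the slide; the other rows are made-up filler.

```python
class ROM:
    """Read-only table: contents are fixed at construction time."""

    def __init__(self, rows, width):
        self.width = width          # number of data outputs (columns)
        self._rows = tuple(rows)    # a tuple, so rows cannot be reassigned

    def read(self, address_bits):
        """Return the whole row selected by a binary address string such as '100'."""
        row = self._rows[int(address_bits, 2)]
        return format(row, f"0{self.width}b")

# Only row 4 (1101) is taken from the slide; every other row is invented filler.
rom = ROM([0b0000, 0b0001, 0b0010, 0b0011, 0b1101, 0b0101, 0b0110, 0b0111], width=4)
print(rom.read("100"))  # address 4 selects row 4
```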

slide-14
SLIDE 14

18.14

Memory Dimensions

  • Memories are named by their dimensions:
      – Rows x Columns
  • n rows and m columns => n x m ROM
  • n rows => log₂(n) address bits
    ...or... 2ᵏ rows => k address bits
  • m cols => m data outputs

[Figure: ROM with rows numbered 0, 1, 2, ..., 2ⁿ-2, 2ⁿ-1, address inputs Aₙ₋₁ ... A1, A0, and data outputs Dₘ₋₁ ... D0.]
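The row-count rule can be checked with a small helper. The function names `address_bits` and `describe` are illustrative, not part of the slides, and the helper assumes the row count is a power of two.

```python
def address_bits(rows):
    """Number of address inputs for a memory with `rows` rows (a power of two)."""
    if rows < 1 or rows & (rows - 1):
        raise ValueError("row count must be a power of two")
    return rows.bit_length() - 1   # log2(rows) for powers of two

def describe(rows, cols):
    """Name a memory by its dimensions and list its pin counts."""
    return f"{rows}x{cols} memory: {address_bits(rows)} address bits, {cols} data outputs"

print(describe(8, 4))    # the 8x4 memories used in this unit
print(describe(256, 8))
```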

slide-15
SLIDE 15

18.15

RWM’s

  • Writable memories provide a set of data inputs for write data (as opposed to the data outputs for read data)
  • A control signal R/W (1 = READ / 0 = WRITE) is provided to tell the memory what operation the user wants to perform

[Figure: 8x4 RWM with address inputs A2, A1, A0, rows 0 to 7, data inputs DI3 to DI0, data outputs DO3 to DO0, and an R/W control input.]

slide-16
SLIDE 16

18.16

RWM’s

  • Write example
      – Address = 3₁₀ = 011₂
      – DI = 12₁₀ = 1100₂
      – R/W = 0 => Write op.
  • Data in row 3 is overwritten with the new value of 1100₂.

[Figure: 8x4 RWM with A2 A1 A0 = 011, DI3 to DI0 = 1100, and R/W = 0. Row 3 now holds 1 1 0 0; the data outputs DO3 to DO0 are shown as ? ? ? ? during the write.]
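The write example can be replayed with a toy model of an 8x4 RWM. The `RWM` class and its `access` method are hypothetical names for this sketch; returning `None` during a write is an assumption standing in for the "? ? ? ?" outputs in the figure.

```python
class RWM:
    """Toy 8x4 read/write memory with a single R/W control (1 = READ, 0 = WRITE)."""

    def __init__(self, rows=8, width=4):
        self.width = width
        self.rows = [0] * rows

    def access(self, address, rw, data_in=0):
        if rw == 1:                       # READ: drive the selected row onto the outputs
            return self.rows[address]
        # WRITE: overwrite the selected row with the data inputs (truncated to the width)
        self.rows[address] = data_in & ((1 << self.width) - 1)
        return None                       # outputs treated as undefined during a write

mem = RWM()
mem.access(0b011, rw=0, data_in=0b1100)         # Address = 3, DI = 12, R/W = 0
print(format(mem.access(0b011, rw=1), "04b"))   # read row 3 back
```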

slide-17
SLIDE 17

18.17

USING MEMORIES TO BUILD COMBINATIONAL FUNCTIONS

Look-up tables…

slide-18
SLIDE 18

18.18

Memories as Look-Up Tables

  • One major application of memories in digital design is to use them as LUT’s (Look-Up Tables) to implement logic functions
      – This is the core technology used by FPGAs (Field-Programmable Gate Arrays)
  • Idea: Use a memory to hold the truth table of a function and feed the inputs of the function to the address inputs to "look up" the answer

slide-19
SLIDE 19

18.19

Implementing Functions w/ Memories

[Figure: Truth table of an arbitrary logic function F(X, Y, Z) next to an 8x1 memory (rows 0 to 7, address inputs A2, A1, A0, data output D0) that holds the F column. X, Y, Z drive the address inputs and D0 produces F.]

X, Y, Z inputs “look up” the correct answer.

Use a memory with the same dimensions as the 'output' side of the truth table. It's almost TOO easy.
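As a rough sketch of the idea, the truth table's output column can be generated and stored as the memory contents. The slide's function values are not recoverable here, so a 3-input majority function is used as a stand-in; `build_lut`, `lookup`, and `majority` are hypothetical names, and treating X as the most significant address bit is an assumption.

```python
from itertools import product

def build_lut(func, n_inputs):
    """Fill a 2^n x 1 memory with the output column of func's truth table."""
    # product() enumerates inputs in truth-table order: 000, 001, ..., 111.
    return [func(*bits) & 1 for bits in product((0, 1), repeat=n_inputs)]

def lookup(lut, *bits):
    """Feed the function inputs to the address lines (first input = MSB)."""
    address = 0
    for bit in bits:
        address = (address << 1) | bit
    return lut[address]

def majority(x, y, z):
    """Stand-in for the slide's arbitrary function: 1 when two or more inputs are 1."""
    return (x & y) | (x & z) | (y & z)

lut = build_lut(majority, 3)
print(lut)                   # contents of the 8x1 memory, row 0 first
print(lookup(lut, 1, 0, 1))  # X=1, Y=0, Z=1
```

No gates were designed at any point: swapping in a different `func` changes the stored bits but not the hardware shape.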

slide-20
SLIDE 20

18.20

Implementing Functions w/ Memories

[Figure: Truth table of a multi-bit function (one's count) with inputs X, Y, Z and outputs C, S, next to an 8x2 memory (rows 0 to 7, address inputs A2, A1, A0) whose data outputs D1 and D0 produce C and S. Example: inputs 1, 0, 1 give 1+0+1 = 10₂.]

Use a memory with the same dimensions as the 'output' side of the truth table. It's almost TOO easy.
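A minimal sketch of the one's-count memory, assuming the three inputs form the row address and the 2-bit count is stored as (C, S) in each row. The name `ONES_COUNT_ROM` is made up for this example.

```python
# 8x2 memory contents: each row holds the number of 1s in its own 3-bit address.
ONES_COUNT_ROM = [bin(address).count("1") for address in range(8)]

for address, count in enumerate(ONES_COUNT_ROM):
    c, s = count >> 1, count & 1   # D1 = C (carry), D0 = S (sum)
    print(f"{address:03b} -> C={c} S={s}")

# Slide example: inputs 1, 0, 1 select row 101.
print(format(ONES_COUNT_ROM[0b101], "02b"))
```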

slide-21
SLIDE 21

18.21

3-bit Squaring Circuit

  • Q: What size memory would you use to build our 3-bit squaring circuit?
  • A: 8x6 memory
  • Q: What would you connect to the address inputs of the memory?
  • A: A[2:0]
  • Q: What bits would you program into row 5 of the memory?
  • A: 011001 (i.e. 25 = 5²)

      Inputs         Outputs
  A | A2 A1 A0 | B5 B4 B3 B2 B1 B0 | B=A²
  0 |  0  0  0 |  0  0  0  0  0  0 |  0
  1 |  0  0  1 |  0  0  0  0  0  1 |  1
  2 |  0  1  0 |  0  0  0  1  0  0 |  4
  3 |  0  1  1 |  0  0  1  0  0  1 |  9
  4 |  1  0  0 |  0  1  0  0  0  0 | 16
  5 |  1  0  1 |  0  1  1  0  0  1 | 25
  6 |  1  1  0 |  1  0  0  1  0  0 | 36
  7 |  1  1  1 |  1  1  0  0  0  1 | 49

Memory Contents to build 3-bit Squaring Circuit
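The memory contents for the squaring circuit can also be generated rather than filled in by hand. This is a small sketch; `SQUARE_ROM` is an illustrative name.

```python
# 8x6 memory: row A holds A squared, so A[2:0] on the address inputs looks up B = A^2.
SQUARE_ROM = [a * a for a in range(8)]

for a, b in enumerate(SQUARE_ROM):
    print(f"row {a} ({a:03b}): {b:06b}  = {b}")

# Six output bits are enough because the largest entry, 49, is below 2**6 = 64.
print(max(SQUARE_ROM).bit_length())
```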

slide-22
SLIDE 22

18.22

4x4 Multiplier Example

Determine the dimensions of the memory that would be necessary to implement a 4x4-bit unsigned multiplier with inputs X[3:0] and Y[3:0] and outputs P[??:0]

  • Question: How many bits are needed for P?
  • Question: What are the contents of the numbered rows?
  • Example: X3X2X1X0 = 0010, Y3Y2Y1Y0 = 0001, P = X * Y = 2 * 1 = 2 = 00000010

[Figure: ROM with address inputs A7 to A4 = X3 to X0 and A3 to A0 = Y3 to Y0, and data outputs P7 to P0.]

  Row | Address  | X * Y       | P   | P7..P0
    2 | 00000010 | 0000 * 0010 |   0 | 00000000
   20 | 00010100 | 0001 * 0100 |   4 | 00000100
   39 | 00100111 | 0010 * 0111 |  14 | 00001110
  255 | 11111111 | 1111 * 1111 | 225 | 11100001
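One way to check the numbered rows is to build the whole multiplier table in Python. The sketch assumes the address layout in the figure (X on A7 to A4, Y on A3 to A0); `MULT_ROM` and `multiply` are hypothetical names.

```python
# 256-row memory: the address is X (upper 4 bits) followed by Y (lower 4 bits).
MULT_ROM = [(address >> 4) * (address & 0xF) for address in range(256)]

def multiply(x, y):
    """Look up X * Y by forming the 8-bit address from the two 4-bit inputs."""
    return MULT_ROM[(x << 4) | y]

for row in (2, 20, 39, 255):
    print(f"row {row:3d} = {row:08b}: P = {MULT_ROM[row]:3d} = {MULT_ROM[row]:08b}")

# The largest product is 15 * 15 = 225, so P needs 8 bits.
print(max(MULT_ROM).bit_length())
```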

slide-23
SLIDE 23

18.23

Implementing Functions w/ Memories

  • To implement a function w/ n variables and m outputs
  • Just place the output truth table values in the memory
  • Memory will have dimensions: 2ⁿ rows and m columns
      – Still does not scale terribly well (i.e. n inputs require a memory w/ 2ⁿ rows)
      – But it is easy, and since we can change the contents of memories it allows us to create "reconfigurable" logic
      – This idea is at the heart of FPGAs
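The scaling problem is easy to see numerically. The helper below simply evaluates rows x columns; `lut_bits` is an illustrative name and the input widths are arbitrary sample points.

```python
def lut_bits(n_inputs, m_outputs):
    """Storage bits for a memory with 2^n rows and m columns."""
    return (2 ** n_inputs) * m_outputs

# Each extra input doubles the number of rows.
for n in (3, 8, 16, 32):
    print(f"n = {n:2d}: {2 ** n} rows, {lut_bits(n, 1)} bits per output")
```

This exponential growth is one reason a large function is split across many small memories rather than stored in one big one.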

slide-24
SLIDE 24

18.24

FPGAS

slide-25
SLIDE 25

18.25

Basis of FPGA’s

  • Memories provide a universal way to implement any combinational logic function
      – A 2ⁿ x m memory can implement a function of n variables and m outputs
  • If we use RWM’s (read/write memories) rather than ROM’s we can change what function the memory implements
  • Memories are referred to as Look-up Tables (LUT’s)

[Figure: Full Adder Implementation. An 8x2 memory (rows 0 to 7) with X, Y, Cin on address inputs A2, A1, A0 and Cout, S on data outputs D1, D0.]
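The point about RWM’s can be sketched as a LUT whose rows are rewritten to change its function. The `LUT` class, `program` method, and the second function `and_or` are all made up for this illustration; only the full adder comes from the slide.

```python
class LUT:
    """Toy 8x2 read/write memory used as a look-up table."""

    def __init__(self, n_inputs=3, n_outputs=2):
        self.mask = (1 << n_outputs) - 1
        self.rows = [0] * (1 << n_inputs)

    def program(self, func):
        """Rewrite every row so that row `address` holds func(address)."""
        for address in range(len(self.rows)):
            self.rows[address] = func(address) & self.mask

    def read(self, address):
        return self.rows[address]

def full_adder(address):
    # Address bits are X, Y, Cin; their sum is the 2-bit value (Cout, S).
    return bin(address).count("1")

def and_or(address):
    # Arbitrary second function: D1 = AND of the inputs, D0 = OR of the inputs.
    return ((address == 0b111) << 1) | (address != 0)

lut = LUT()
lut.program(full_adder)
print(format(lut.read(0b110), "02b"))  # X=1, Y=1, Cin=0 as a full adder
lut.program(and_or)
print(format(lut.read(0b110), "02b"))  # same inputs, different function
```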

slide-26
SLIDE 26

18.26

Configurable Logic Blocks (CLB’s)

  • The memory allows for any combinational function
  • Provided D-FF’s allow designs with sequential logic
      – “Bypass” mux selects the pure combinational output of the LUT or the sequential/registered/D-FF output
  • Blue boxes indicate configurable bits that control the operation and function of the logic

[Figure: CLB containing an 8x2 memory (address inputs A0, A1, A2; data outputs D1, D0; rows 0 to 7) for any 3-input / 2-output combinational function, a D-FF (D, Q, CLK) on each output for when sequential logic is needed, and a bypass mux on each output.]
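A behavioral sketch of one CLB may help: a LUT, two D-FFs, and a per-output bypass choice. The `CLB` class is hypothetical, and the meaning of the configuration bit (`True` selects the registered output) is an assumption, since the slide does not give the mux polarity.

```python
class CLB:
    """Toy CLB: 8x2 LUT, one D-FF per output, and a bypass mux per output."""

    def __init__(self, lut_rows, registered=(False, False)):
        self.lut_rows = list(lut_rows)   # 8 rows, each a (D1, D0) pair
        self.registered = registered     # config bits: True = take the D-FF output (assumed polarity)
        self.ff = [0, 0]                 # D-FF state for D1 and D0

    def outputs(self, address):
        comb = self.lut_rows[address]
        return tuple(self.ff[i] if self.registered[i] else comb[i] for i in range(2))

    def clock(self, address):
        """Rising clock edge: each D-FF captures its LUT output."""
        self.ff = list(self.lut_rows[address])

# Full-adder contents: (Cout, S) for address bits X, Y, Cin.
rows = [(bin(a).count("1") >> 1, bin(a).count("1") & 1) for a in range(8)]
clb = CLB(rows, registered=(False, True))   # D1 combinational, D0 registered

print(clb.outputs(0b111))  # D0 still shows the flip-flop's reset value
clb.clock(0b111)
print(clb.outputs(0b000))  # D1 follows the new inputs, D0 holds the captured bit
```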

slide-27
SLIDE 27

18.27

Routing & Switch Matrices

  • Inputs and outputs of neighboring CLB’s connect to a “switch matrix” (SM)
  • Switch matrix is simply composed of muxes that allow us to “route” inputs and outputs to another CLB or further away

[Figure: Grid of switch matrices (SM), each connected to four neighboring CLBs through 3-wire and 2-wire connections.]

slide-28
SLIDE 28

18.28

Routing & Switch Matrices

  • Suppose we want the connections shown in green and purple; what select values would be used?

[Figure: Switch Matrix (SM) surrounded by four CLBs, with connections to / from the N, E, S, and W switch matrices. Routing mux inputs are labeled A through L; the highlighted muxes use select values 11₁₀ = 1011₂ and 1₁₀ = 0001₂.]
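A routing mux is just a selection by index, as this short sketch shows. It assumes the mux inputs A through L are numbered 0 through 11 in alphabetical order, which the extracted figure suggests but does not state outright; `mux` and `MUX_INPUTS` are illustrative names.

```python
def mux(inputs, select_bits):
    """Route the input chosen by a binary select string to the mux output."""
    return inputs[int(select_bits, 2)]

# Assumed ordering: input A is select value 0, B is 1, ..., L is 11.
MUX_INPUTS = list("ABCDEFGHIJKL")

print(mux(MUX_INPUTS, "1011"))  # select = 11
print(mux(MUX_INPUTS, "0001"))  # select = 1
```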

slide-29
SLIDE 29

18.29

Place and Route

  • ASIC: Find where each gate should be placed on the chip and how to route the wires that connect to it
      – Direct connections can be faster
  • FPGA: Determine which LUT’s should be used and how to route through switch matrices
      – Added delay to go through the routing muxes

[Figure: ASIC layout compared with an FPGA grid of switch matrices (SM) and CLBs.]

slide-30
SLIDE 30

18.30

Exercise

  • Find the configuration bits to build a 3-bit free-running (always enabled) counter

[Figure: The counter drawn as a 3-bit register (Q2, Q1, Q0) fed by a chain of half adders (HA) with a constant 1 input, then mapped onto CLB 1 and CLB 2 connected through a Switch Matrix (SM). Each CLB has an 8x2 memory (address inputs A0, A1, A2; data outputs D1, D0), two D-FFs (D, Q, CLK), and bypass muxes. Routing mux selects shown: A = 000, B = 001, D = 011, E = 100, unused = XXX.]

CLB 1 memory contents, Q0* = Q0 + 1 (rows 2 to 7 are don't cares, d):

  Q0 | Co Q0*
   0 |  0  1
   1 |  1  0

CLB 2 memory contents:

  Q2 Q1 Co | Q2* Q1*
   0  0  0 |  0   0
   0  0  1 |  0   1
   0  1  0 |  0   1
   0  1  1 |  1   0
   1  0  0 |  1   0
   1  0  1 |  1   1
   1  1  0 |  1   1
   1  1  1 |  0   0
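The exercise can be sanity-checked with a small simulation of two LUTs feeding three flip-flops. This is a sketch under stated assumptions: CLB 1 is taken to compute (Co, Q0*) from Q0, CLB 2 to compute (Q2*, Q1*) from Q2, Q1, Co, and Co to pass combinationally from CLB 1 to CLB 2. The names `CLB1_LUT`, `CLB2_LUT`, and `step` are invented here.

```python
# CLB 1: address = Q0, row = (Co, Q0*). Don't-care rows are omitted.
CLB1_LUT = {0: (0, 1), 1: (1, 0)}

# CLB 2: address = Q2 Q1 Co, row = (Q2*, Q1*), i.e. the 2-bit value Q2Q1 plus Co.
CLB2_LUT = [(0, 0), (0, 1), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1), (0, 0)]

def step(q2, q1, q0):
    """One clock edge: both LUTs are read, then the three D-FFs capture the results."""
    co, q0_next = CLB1_LUT[q0]                       # Co is used combinationally (bypassed)
    q2_next, q1_next = CLB2_LUT[(q2 << 2) | (q1 << 1) | co]
    return q2_next, q1_next, q0_next

state = (0, 0, 0)
counts = []
for _ in range(9):
    counts.append(state[0] * 4 + state[1] * 2 + state[2])
    state = step(*state)
print(counts)  # the count sequence, including the wrap from 7 back to 0
```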

slide-31
SLIDE 31

18.31

ASIC’s vs. FPGA’s

  • ASIC’s
      – Faster
      – Handles Larger Designs
      – More Expensive
      – Less Flexible (cannot be reconfigured to perform a new hardware function)
  • FPGA’s
      – Slower (extra logic to make it reconfigurable)
      – Smaller Designs
      – Less Expensive
      – Extremely Flexible

slide-32
SLIDE 32

18.32

Modern FPGA's

  • SoC design (Xilinx Kintex [KU115])
      – Quad-Core ARM cores
      – DDR3 SDRAM Memory Interface
      – ~800 I/O Pins
      – ~15M gate equivalent FPGA fabric
          • ~1M D-FFs + 552K LUTs
          • 1968 dedicated DSP "slices" (18x18 multiply + adder)