SoC Design, Lecture 4: Programmable ASICs (PowerPoint presentation)

SLIDE 1

SoC Design

Lecture 4: Programmable ASICs

Shaahin Hessabi
Department of Computer Engineering
Sharif University of Technology

SLIDE 2

Programmability Comparison

Processors
  • All programmability in the program (instructions) stored in memory
ASICs
  • No programmability
FPGAs
  • Device-wide (re)programmability

SLIDE 3

Programmable Logic Devices

[Figure: PLA, PROM, and PAL structures; legend: fixed connection, programmable connection]

SLIDE 4

How to Program PLDs?

[Figure: PLA NOR structure (one plane shown); labels: inputs, AND plane, OR plane, outputs, +5V]
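To show what the two planes compute at the logic level, here is a minimal Python sketch of a PLA as two programmable connection matrices. The encoding (one entry per input literal, `None` for "not connected") and all names are hypothetical choices for illustration, not a vendor format, and the model ignores the NOR-based circuit itself.

```python
from typing import List, Optional, Sequence

# One entry per input in a programmed product term (hypothetical encoding):
#   1    -> the true literal x_i is connected to the product line
#   0    -> the complemented literal x_i' is connected
#   None -> input i is not part of this product term
ProductTerm = Sequence[Optional[int]]


def and_plane(inputs: Sequence[int], terms: Sequence[ProductTerm]) -> List[int]:
    """Evaluate every programmed product term (the AND plane)."""
    products = []
    for term in terms:
        # The term is 1 only if every connected literal is satisfied.
        value = all(lit is None or x == lit for x, lit in zip(inputs, term))
        products.append(int(value))
    return products


def or_plane(products: Sequence[int], or_matrix: Sequence[Sequence[int]]) -> List[int]:
    """Each output ORs the product terms connected to it (the OR plane)."""
    return [int(any(p and c for p, c in zip(products, row))) for row in or_matrix]


def pla(inputs: Sequence[int], terms: Sequence[ProductTerm],
        or_matrix: Sequence[Sequence[int]]) -> List[int]:
    """Full PLA: AND plane followed by OR plane."""
    return or_plane(and_plane(inputs, terms), or_matrix)


# Hypothetical program for inputs (a, b, c):
#   P0 = a·b, P1 = b'·c;  out0 = P0 + P1, out1 = P1
TERMS = [(1, 1, None), (None, 0, 1)]
OR_MATRIX = [(1, 1), (0, 1)]
```

With this program, `pla((1, 1, 0), TERMS, OR_MATRIX)` gives `[1, 0]`. In the same terms, a PAL keeps `TERMS` programmable but fixes `OR_MATRIX`, while a PROM fixes the AND plane to all minterms and programs only the OR plane.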

SLIDE 5

How to expand PLD architecture?

Increase the number of inputs/outputs in a conventional PLD?
  • Problems:
    – n times the number of inputs and outputs requires n² times as much chip area, which is too costly
    – Logic gets slower as the number of inputs to the AND array increases
Solution: multiple PLDs (i.e., a CPLD) with a relatively small programmable interconnect.
  • Less general than a single large PLD, but software (a “fitter”) can partition the design into the smaller PLD blocks.
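The n² figure follows if the number of product terms grows along with the inputs and outputs, which is an assumption made here for illustration. A PLA with I inputs, P product terms, and O outputs has about 2·I·P crosspoints in the AND plane (each input appears in true and complemented form) and P·O in the OR plane. The sketch below uses crosspoint count as a rough proxy for area; the base dimensions are made up.

```python
def pla_crosspoints(inputs: int, products: int, outputs: int) -> int:
    """Programmable crosspoints in a PLA: AND plane (true and complemented
    inputs times product terms) plus OR plane (product terms times outputs)."""
    return 2 * inputs * products + products * outputs


# Hypothetical base size, then every dimension scaled by n.
base = pla_crosspoints(inputs=16, products=48, outputs=8)
for n in (1, 2, 4):
    scaled = pla_crosspoints(16 * n, 48 * n, 8 * n)
    print(f"n={n}: {scaled} crosspoints, {scaled / base:.0f}x the base area")
```

Doubling every dimension quadruples the crosspoint count (1920 to 7680 for this made-up base size), and scaling by 4 multiplies it by 16.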

SLIDE 6

CPLD vs. FPGA

CPLD architecture:
  • Small number of large PLDs on a single chip
  • Programmable interconnect between PLDs
FPGA architecture:
  • Much larger number of smaller programmable logic blocks
  • Embedded in a sea of programmable interconnect

SLIDE 7

Benefits of FPGAs over ASICs and Processors

Processors
  • Slow
  • Power hungry
ASICs
  • Very expensive
  • Long production cycles
  • Upgradeability is a major problem
FPGAs
  • Ideal case: combine the best sides of hardware and software…
  • …unfortunately, ideal cases rarely exist!

SLIDE 8

FPGAs

  • FPGAs are closer to “programmable ASICs”: large emphasis on interconnection routing
  • Timing is difficult to predict: multiple hops vs. the fixed delay of a CPLD’s switch matrix
  • But more “scalable” to large sizes
  • FPGA programmable logic blocks have only a few inputs and 1 or 2 flip-flops, but there are many more of them compared to the number of macrocells in a CPLD
Key questions, after the chip has been fabricated:
  • How to make logic blocks programmable?
  • How to connect the wires?

SLIDE 9

FPGA Technologies

Static RAM cells:
  • The programmable connections are made using pass transistors, transmission gates, or multiplexers that are controlled by SRAM cells.
  • Advantage: allows fast in-circuit reconfiguration.
  • Disadvantage: the large chip area required by the SRAM cells.
Anti-fuse:
  • An anti-fuse resides in a high-impedance state and can be programmed into a low-impedance, or "fused", state.
  • Less expensive than the RAM technology; one-time programmable (OTP).
EPROM/EEPROM transistors:
  • Can be reprogrammed without external storage of the configuration.
  • EPROM transistors cannot be re-programmed in-circuit.

SLIDE 10

SRAM

Static RAM cells are used for three purposes:
  1. As lookup tables (LUTs) for implementing logic.
  2. As embedded RAM blocks (for buffer storage, etc.).
  3. As control for routing and configuration switches.
Advantages:
  • Easily changeable (even dynamic reconfiguration)
  • Good density
  • Tracks the latest SRAM technology (moving even faster than technology for logic)
  • Flexible: good not only for FSMs but also for arithmetic circuits
Disadvantages:
  • Volatile
  • Generally high power
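Purpose 1 can be sketched directly: a k-input LUT is a 2^k-bit SRAM holding the function's truth table, and the logic inputs serve as the read address. The Python below treats input 0 as the least significant address bit, a convention chosen here for illustration; real devices define their own bit ordering, and the function names are hypothetical.

```python
from typing import Callable, Sequence


def make_lut_config(func: Callable[..., int], k: int) -> int:
    """Pack the truth table of a k-input Boolean function into a 2**k-bit integer.
    Bit `addr` holds func evaluated on the input bits of `addr`
    (input 0 = least significant address bit)."""
    config = 0
    for addr in range(2 ** k):
        bits = [(addr >> i) & 1 for i in range(k)]
        if func(*bits):
            config |= 1 << addr
    return config


def lut_read(config: int, inputs: Sequence[int]) -> int:
    """Use the inputs as an address into the stored truth table."""
    addr = sum(bit << i for i, bit in enumerate(inputs))
    return (config >> addr) & 1


# Hypothetical example: a 4-input LUT programmed as 4-input XOR (parity).
XOR4 = make_lut_config(lambda a, b, c, d: a ^ b ^ c ^ d, 4)
```

Changing the function only means writing a different 16-bit value (0x6996 for XOR here), and every function takes the same read path, which is why the delay does not depend on the logic implemented.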

SLIDE 11

SRAM Programming Technology

  • The pass gate: making a connection between two wire segments
  • The multiplexer: connecting the state of the SRAM cells to the select lines
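The multiplexer option can be made concrete with a small model: a routing multiplexer choosing among N candidate wires needs ceil(log2 N) SRAM cells on its select lines, whereas pass-gate switching uses one SRAM cell per switch. This Python class is a hypothetical bookkeeping sketch of that idea, not a model of any particular device.

```python
import math
from typing import List, Sequence


class RoutingMux:
    """N-input routing multiplexer whose select lines are driven by SRAM cells."""

    def __init__(self, num_inputs: int) -> None:
        self.num_inputs = num_inputs
        # Number of SRAM configuration cells needed for the select lines.
        self.num_config_bits = max(1, math.ceil(math.log2(num_inputs)))
        self.config: List[int] = [0] * self.num_config_bits

    def load(self, selected_input: int) -> None:
        """Configuration time: write the binary code of the chosen wire."""
        if not 0 <= selected_input < self.num_inputs:
            raise ValueError("no such input wire")
        self.config = [(selected_input >> i) & 1 for i in range(self.num_config_bits)]

    def output(self, wires: Sequence[int]) -> int:
        """Run time: pass through the wire picked by the stored SRAM bits."""
        select = sum(bit << i for i, bit in enumerate(self.config))
        return wires[select]
```

For example, a 6-input mux needs 3 SRAM cells; after `load(4)` it passes wire 4. Selecting one of 6 wires with pass gates would instead use 6 switches, each with its own SRAM cell.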

SLIDE 12

SRAM-Controlled Switches

[Figure: logic cells joined by routing switches, each switch controlled by an SRAM cell]

SLIDE 13

Anti-fuse Technology

Anti-fuse: normally an open circuit
  • A programming current through it (about 5 mA) causes a large power dissipation in a small area, which melts a thin insulating dielectric between polysilicon and diffusion electrodes and forms a thin, permanent, and resistive silicon link
  • The process cannot be reversed
    – OTP technology
    – Radiation hard
Modified CMOS process
  • Actel: 3 extra masks:
    1. n-type anti-fuse diffusion
    2. anti-fuse polysilicon
    3. thicker-than-normal gate oxide (for the high-voltage transistors that handle 18 V to program the anti-fuses)

SLIDE 14

Merits of using OTP FPGAs

1. An anti-fuse exhibits less RC delay than a pass transistor, giving higher speed.
2. Once programmed, interconnections inside the FPGA are available immediately on power-up:
  • No delay to reload interconnection information from a memory.
  • No need for additional circuitry to ensure proper loading.

SLIDE 15

EPROM Transistor

  • With a high (>12 V) programming voltage, VPP, applied to the drain, electrons gain enough energy to “jump” onto the floating gate (gate1).
  • Electrons stuck on gate1 raise the threshold voltage so that the transistor is always off for normal operating voltages.
  • UV light provides enough energy for the electrons stuck on gate1 to “jump” back to the bulk, allowing the transistor to operate normally.

SLIDE 16

EPROM Technology

  • Used in both SPLD and CPLD devices
  • Transistors between two wires implement wired-AND functions:
    – An input to the AND plane can drive a product wire LOW through an EPROM transistor, if that input is part of the corresponding product term.
    – For inputs not involved in a product term, the appropriate EPROM transistors are programmed as permanently off.
  • EEPROM: program and erase electrically.
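The wired-AND behavior fits in a few lines of Python. In this idealized logic-level sketch (hypothetical names), the product line is pulled HIGH by default, and each EPROM transistor that is not programmed permanently off pulls it LOW whenever the literal on its gate is 1. Which literal (true or complemented input) drives each gate is a modeling assumption: driving a transistor from x' makes the term require x = 1.

```python
from typing import Sequence, Tuple

# A connection is (input_index, use_complement): the EPROM transistor at that
# crosspoint is left active; every other transistor is programmed permanently off.
Connection = Tuple[int, bool]


def product_line(inputs: Sequence[int], connections: Sequence[Connection]) -> int:
    """Idealized wired-AND product line with a pull-up to logic 1.

    Each active transistor pulls the line LOW when the literal on its gate
    is 1, so the line stays HIGH only if every connected literal is 0.
    """
    for index, use_complement in connections:
        literal = 1 - inputs[index] if use_complement else inputs[index]
        if literal == 1:
            return 0  # a transistor conducts: the line is pulled LOW
    return 1  # no transistor conducts: the pull-up keeps the line HIGH


# Product term a·b' for inputs (a, b): transistors driven by a' and by b.
TERM_A_AND_NOT_B = [(0, True), (1, False)]
```

Only the input combination a = 1, b = 0 leaves every active transistor off, so the line computes a·b'. An unused product line (no active transistors) simply stays HIGH.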

SLIDE 17

Characteristics of FPGA Technology

Programming technology | Process technology | Chip area | Reprogrammable (RP) | Volatile
Static RAM | CMOS | Large | In-circuit | Yes
PLICE anti-fuse | CMOS+ | Small, but large programming transistor | No | No
ViaLink anti-fuse | CMOS+ | Small, but large programming transistor | No | No
EPROM | UVCMOS | Small | Out of circuit | No
EEPROM | EECMOS | 2× EPROM | Out of circuit | No

SLIDE 18

FPGA Architectures

Logic block:
  • How are functions implemented: fixed functions or programmable?
  • Support complex functions: need fewer blocks, but bigger, so fewer fit on the chip.
  • Support simple functions: need more blocks, but smaller, so more fit on the chip.
Interconnect:
  • How are logic blocks arranged?
  • How many wires will be needed between them?
  • Are wires evenly distributed across the chip?
  • Programmability slows wires down: are some wires specialized for long distances?
  • How many inputs/outputs must be routed to/from each logic block?
  • What utilization are we willing to accept? 50%? 20%? 90%?

SLIDE 19

Functional Units

  • RAM blocks (Xilinx): implement the function's truth table
  • Multiplexers (Actel): build Boolean functions using muxes
  • Logic gates, flip-flops: such as carry chains; used for high-performance computations

SLIDE 20

Programmable Switch Elements

Used in connecting:
  • The I/O of functional units to the wires
  • A horizontal wire to a vertical wire
  • Two wire segments to form a longer wire segment

SLIDE 21

Applications of FPGAs

  • Implementation of random logic:
    – Easier changes at system level (one device is modified).
    – Can eliminate the need for full-custom chips.
  • Prototyping:
    – Get more/better/faster debugging done than with simulation.
  • Reconfigurable hardware:
    – One hardware block used to implement more than one function.
    – Functions must be mutually exclusive in time.
    – Can greatly reduce cost while enhancing flexibility.
    – RAM-based option only.
  • Special-purpose computation engines:
    – Hardware dedicated to solving one problem (or class of problems).
    – Accelerators attached to general-purpose computers.

SLIDE 22

Anti-Fuse Based FPGAs

Actel FPGA Architecture
  • Logic Module
  • Interconnect

SLIDE 23

Actel FPGA Architecture

  • Actel uses a fine-grain architecture; i.e., LMs are close to the size of the base cell of an MGA (masked gate array)
  • Matched to the small anti-fuse programming technology
  • A simple LM reduces performance, but allows fast and robust place-and-route
  • Allows you to use almost all (>90%) of the FPGA
  • Synthesis can map logic efficiently to a fine-grain architecture

SLIDE 24

ACT 1 Logic Module

The ACT architecture:
(a) Organization of the basic logic cells (LMs).
(b) The ACT 1 Logic Module. The ACT 1 family uses just one type of LM; the ACT 2 and ACT 3 FPGA families both use two different types of LM.
(c) An example LM implementation using pass transistors (without any buffering).
(d) An example logic macro, F = A · B + B' · C + D: connect logic signals to some or all of the LM inputs, and the remaining inputs to VDD or GND.

SLIDE 25

Shannon’s Expansion Theorem

Use the Shannon expansion theorem to expand F with respect to (wrt) a variable (A):
  F = A·F|(A='1') + A'·F|(A='0')
Example: F = A'·B + A·B·C' + A'·B'·C = A·(B·C') + A'·(B + B'·C)
  • F|(A='1') = B·C' is the cofactor of F wrt A, or F_A
If we expand F wrt B: F = B·(A' + A·C') + B'·(A'·C)
Eventually we reach the unique canonical form, which uses only minterms: F = C·(A'·B + A'·B') + C'·(A·B + A'·B)
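These expansions are easy to check exhaustively. The Python sketch below computes a cofactor by forcing one input to a constant, rebuilds F from its two cofactors for each variable in turn, and lists the minterms of the example F; the helper names are hypothetical.

```python
from itertools import product
from typing import Callable

BoolFn = Callable[..., int]


def F(a: int, b: int, c: int) -> int:
    """Example from this slide: F = A'·B + A·B·C' + A'·B'·C."""
    na, nb, nc = 1 - a, 1 - b, 1 - c
    return (na & b) | (a & b & nc) | (na & nb & c)


def cofactor(f: BoolFn, var: int, value: int) -> BoolFn:
    """f with its input number `var` forced to `value` (the cofactor)."""
    def g(*args: int) -> int:
        fixed = list(args)
        fixed[var] = value
        return f(*fixed)
    return g


def shannon(f: BoolFn, var: int) -> BoolFn:
    """Rebuild f as x·f|(x='1') + x'·f|(x='0'), where x is input number `var`."""
    f1, f0 = cofactor(f, var, 1), cofactor(f, var, 0)
    return lambda *args: (args[var] & f1(*args)) | ((1 - args[var]) & f0(*args))


ALL_INPUTS = list(product((0, 1), repeat=3))  # every (a, b, c) combination
MINTERMS = [x for x in ALL_INPUTS if F(*x)]
```

`MINTERMS` comes out as (a, b, c) = (0, 0, 1), (0, 1, 0), (0, 1, 1), and (1, 1, 0), i.e. A'·B'·C, A'·B·C', A'·B·C, and A·B·C', the same four minterms grouped in the canonical form on this slide.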

SLIDE 26

Shannon’s Expansion Theorem (cont’d)

Another example: F = (A·B) + (B'·C) + D
  • Expand F wrt B: F = B·(A + D) + B'·(C + D) = B·F2 + B'·F1
  • F is a 2:1 MUX, with B selecting between two inputs: F|(B='1') and F|(B='0')
    – F also describes the output of the ACT 1 LM
  • Now we need to split up F1 and F2. Expand F2 wrt A, and F1 wrt C:
    – F2 = A + D = (A·1) + (A'·D)
    – F1 = C + D = (C·1) + (C'·D)
  • A, B, C connect to the select lines, and '1' and D are the inputs of the MUXes in the ACT 1 LM
  • Connections: A0 = D, A1 = '1', B0 = D, B1 = '1', SA = C, SB = A, S0 = '0', and S1 = B
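The connection list can be checked against a behavioral model of the ACT 1 LM, using the structure given for it in this lecture: ACT 1 LM = MUX[WHEEL1, WHEEL2, OR(S0, S1)], with WHEEL1 = MUX(A0, A1, SA) and WHEEL2 = MUX(B0, B1, SB). The Python sketch assumes each 2:1 MUX passes its first data input when its select is 0; this is a logic-level model, not the pass-transistor circuit.

```python
from itertools import product


def mux2(d0: int, d1: int, sel: int) -> int:
    """2:1 multiplexer: d0 when sel is 0, d1 when sel is 1."""
    return d1 if sel else d0


def act1_lm(a0: int, a1: int, sa: int, b0: int, b1: int, sb: int,
            s0: int, s1: int) -> int:
    """Behavioral ACT 1 LM: two input MUXes (function wheels) feeding an
    output MUX whose select is OR(S0, S1)."""
    wheel1 = mux2(a0, a1, sa)
    wheel2 = mux2(b0, b1, sb)
    return mux2(wheel1, wheel2, s0 | s1)


def f_macro(a: int, b: int, c: int, d: int) -> int:
    """Target function F = A·B + B'·C + D."""
    return (a & b) | ((1 - b) & c) | d


def f_from_lm(a: int, b: int, c: int, d: int) -> int:
    """Connections: A0=D, A1='1', B0=D, B1='1', SA=C, SB=A, S0='0', S1=B."""
    return act1_lm(a0=d, a1=1, sa=c, b0=d, b1=1, sb=a, s0=0, s1=b)


ALL_ABCD = list(product((0, 1), repeat=4))
```

With these connections WHEEL1 produces C + D (= F1), WHEEL2 produces A + D (= F2), and B picks between them, so comparing `f_from_lm` with `f_macro` over all 16 rows of `ALL_ABCD` confirms the mapping.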

SLIDE 27

Multiplexer Logic as Function Generators

  • The 16 logic functions of 2 variables:
    – There are 10 functions that we can implement using just one 2:1 MUX
    – 6 functions are useful: INV, BUF, AND, OR, AND1-1, NOR1-1
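The count can be reproduced by brute force. Assuming, as stated for the function wheel in this lecture, that each MUX input (A0, A1, SA) may be tied to A, B, '0', or '1', the Python sketch below tries all 4³ = 64 connections and collects the distinct truth tables. It finds 12 functions; the slide's figure of 10 matches if the constants '0' and '1' are not counted, which is our reading rather than something the slide states.

```python
from itertools import product

# Signals that may drive each MUX input, as functions of (a, b).
SOURCES = {
    "A": lambda a, b: a,
    "B": lambda a, b: b,
    "0": lambda a, b: 0,
    "1": lambda a, b: 1,
}

INPUT_ROWS = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (a, b) rows of the truth table


def mux_truth_table(a0: str, a1: str, sa: str) -> tuple:
    """Truth table of MUX(A0, A1, SA) over the four (a, b) rows."""
    rows = []
    for a, b in INPUT_ROWS:
        sel = SOURCES[sa](a, b)
        rows.append(SOURCES[a1](a, b) if sel else SOURCES[a0](a, b))
    return tuple(rows)


def reachable_functions() -> dict:
    """Map each distinct truth table to one connection that produces it."""
    found = {}
    for a0, a1, sa in product(SOURCES, repeat=3):
        found.setdefault(mux_truth_table(a0, a1, sa), (a0, a1, sa))
    return found


functions = reachable_functions()
constants = {(0, 0, 0, 0), (1, 1, 1, 1)}
non_constant = [t for t in functions if t not in constants]
```

The four two-variable functions a single 2:1 MUX cannot produce this way are NAND, NOR, XOR, and XNOR.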

SLIDE 28

ACT 1 LM as a Boolean Function Generator

(a) A 2:1 MUX viewed as a function wheel. (b) The ACT 1 LM is two function wheels, an OR gate, and a 2:1 MUX.
  • A 2:1 MUX is a function wheel that can generate BUF, INV, AND-11, AND1-1, OR, AND
  • WHEEL(A, B) = MUX(A0, A1, SA)
    – Each of the inputs (A0, A1, and SA) may be A, B, '0', or '1'
  • ACT 1 LM = MUX[WHEEL1, WHEEL2, OR(S0, S1)]

SLIDE 29

ACT 2 and ACT 3 Logic Modules

  • ACT 1 requires 2 LMs per flip-flop, joined by interconnect of unknown capacitance
  • ACT 2 and ACT 3 use two types of LMs:
    – ACT 2 C-Module (combinational) is similar to the ACT 1 LM, but can implement five-input logic functions
    – ACT 2 S-Module (sequential module) contains a C-Module and a sequential element

SLIDE 30

Interconnect

  • Anti-fuses join the wire segments within each channel into longer wire segments
  • Horizontal segments vary in length from four columns of LMs to the entire row of modules (long lines)
  • If the Logic Module at the end of a net is less than two rows away from the driver module, a connection requires two anti-fuses, a vertical track, and two horizontal segments
  • If the modules are more than two rows apart, a connection requires a long vertical track together with another vertical track (the output stub) and two horizontal tracks, with 4 anti-fuses

SLIDE 31

Routing Channels

  • Fixed channel widths (tracks)
  • Channel → track → segment
  • Segment length?
    – Long: carries the signal farther with fewer “concatenation” switches, but might waste track
    – Short: good for local connections, slow for longer connections

SLIDE 32

Switch Boxes

  • Ideally, provide switches for all possible connections
  • Trade-off:
    – Too many switches: large area; complex to program
    – Too few switches: cannot route signals
  • One possible solution: [figure]

SLIDE 33

RC Delay in Antifuse Connections

A four-antifuse connection:
  • L0: output stub
  • L1 and L3: horizontal tracks
  • L2: long vertical track (LVT)
  • L4: input stub
Each antifuse is modeled by a resistance, and each interconnect segment by a capacitance.
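A common way to estimate the delay of such an RC chain is the Elmore delay; the slide does not name a delay model, so this choice is ours. If antifuse i (resistance R_i) feeds segment i (capacitance C_i), the delay to the far end is the sum over antifuses of R_i times all the capacitance downstream of it. The Python sketch ignores the driver's own resistance and uses made-up component values.

```python
from typing import Sequence


def elmore_delay(resistances: Sequence[float], capacitances: Sequence[float]) -> float:
    """Elmore delay from the driven node to the end of an RC chain.

    capacitances[0] is the segment at the driver (here the output stub L0);
    resistances[i] is the antifuse between segment i and segment i + 1.
    Driver resistance is ignored, so capacitances[0] adds no delay.
    """
    if len(capacitances) != len(resistances) + 1:
        raise ValueError("need one more segment than antifuses")
    delay = 0.0
    for i, r in enumerate(resistances):
        downstream = sum(capacitances[i + 1:])  # everything this antifuse must charge
        delay += r * downstream
    return delay


# Hypothetical values: four 500-ohm antifuses, segments L0..L4 of 50 fF each.
R_ANTIFUSE = 500.0
C_SEGMENT = 50e-15
t = elmore_delay([R_ANTIFUSE] * 4, [C_SEGMENT] * 5)
print(f"{t * 1e12:.1f} ps")
```

With equal values the four antifuses contribute 4 + 3 + 2 + 1 = 10 RC products (250 ps for these made-up numbers), so the delay grows roughly with the square of the number of antifuses in a path.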

SLIDE 34

SRAM Based FPGAs

Xilinx FPGA Architecture
  • Configurable Logic Block (CLB)
  • Interconnect
  • Multipliers and DSP Blocks
  • On-Chip RAM

SLIDE 35

Xilinx FPGA Structure

  • Fixed arrays of logic function cells (CLBs), connectable by a system of pass transistors driven by static RAM cells

SLIDE 36

Virtex II

  • 0.13 μm, 8-layer metal CMOS process
  • Cu power distribution and interconnect
  • Up to 10 million system gates
  • >100,000 LUTs and flip-flops
  • >1000 BlockRAMs and multipliers
  • >200 MHz clock rate, multi-Gbps serial I/O
  • On-chip PowerPC with cache
  • 3-Gigabit serial I/O

SLIDE 37

Simplified CLB Structure

  • Two slices in each Virtex CLB, 4 slices in each Virtex-II CLB
  • Two buffers associated with each CLB, accessible by all CLB outputs
  • Fast dedicated carry logic runs vertically up
  • CLB propagation delay is fixed (LUT access time) and independent of the logic function

SLIDE 38

Shift Register LUT

Dynamically addressable shift register (SRL)
Ultra-efficient programmable delay for balancing pipelined designs
Can also be used for simple FIFOs
Maximum delay of 16 clock cycles in one LUT, up to 128 in one CLB
Can be read asynchronously by toggling address lines
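Behaviourally, an SRL16 is a 16-bit shift register whose output tap is selected by a 4-bit address, so the delay can be changed without re-routing. A Python sketch under that assumption; the class and method names are hypothetical:

```python
class SRL16:
    """Behavioural model of a 16-bit addressable shift-register LUT."""

    DEPTH = 16

    def __init__(self) -> None:
        self.bits = [0] * self.DEPTH  # bits[0] holds the most recent input

    def clock(self, data_in: int, enable: int = 1) -> None:
        """On a clock edge, shift one bit in when enabled."""
        if enable:
            self.bits = [data_in & 1] + self.bits[:-1]

    def read(self, address: int) -> int:
        """Asynchronous read: address k gives a delay of k + 1 clock edges."""
        if not 0 <= address < self.DEPTH:
            raise ValueError("address must be 0..15")
        return self.bits[address]


# Use the SRL16 as a programmable 5-cycle delay line (address = delay - 1).
srl = SRL16()
stream = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
delayed = []
for bit in stream:
    srl.clock(bit)
    delayed.append(srl.read(4))
print(delayed)  # [0, 0, 0, 0, 1, 0, 1, 1, 0, 0]
```

Changing the argument to `read` retunes the delay on the fly, which is what makes the SRL useful for balancing pipeline stages.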


slide-39
SLIDE 39

SRL16 Applications

1...16-bit shift register in one LUT
Up to 128 bits in one Virtex-II CLB
FIFO, pseudo-random number generator (LFSR)
Pulse generator and clock divider
Pattern generator, state machine

…

Website: http://support.xilinx.com/support/techxclusives/SRL16-techxclusive2.htm
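As a concrete instance of the LFSR application, a shift register with XOR feedback from selected taps produces a pseudo-random sequence. A minimal Python sketch, assuming a 4-bit Fibonacci LFSR with feedback from the two most significant bits (one maximal-length choice, corresponding to x^4 + x^3 + 1 in one common convention); the width, taps, and seed are illustrative:

```python
def lfsr4_states(seed: int = 0b0001):
    """Yield successive states of a 4-bit Fibonacci LFSR."""
    state = seed & 0xF
    if state == 0:
        raise ValueError("the all-zero state locks up an XOR LFSR")
    while True:
        yield state
        feedback = ((state >> 3) ^ (state >> 2)) & 1  # XOR of the top two bits
        state = ((state << 1) | feedback) & 0xF       # shift left, insert feedback


gen = lfsr4_states()
period = [next(gen) for _ in range(15)]
print(len(set(period)))  # 15: every non-zero 4-bit state appears once
```

In an FPGA the shift part maps naturally onto an SRL16 and the feedback XOR onto an ordinary LUT.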


slide-40
SLIDE 40

Dedicated Fast Carry

64-bit adders would require 128 levels of logic
Expensive, complex carry schemes would be needed to preserve performance without using carry logic
Virtex minimizes the carry propagation delay
– <50 ps per bit, includes routing
Fast adders, accumulators, and counters
– 24-bit operation at up to 300 MHz in Virtex-II
– 64-bit operation at up to 190 MHz in Virtex-II
Fully synchronous operation
Same speed for add/subtract, accumulate, or count
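A ripple-carry adder shows why the carry path dominates: bit i cannot settle until the carry from bit i-1 arrives, so the worst-case path grows linearly with width. The Python sketch below models the chain bit by bit, then applies the slide's <50 ps/bit figure as an upper bound; treating the carry delay as exactly additive per bit is a simplifying assumption:

```python
def ripple_carry_add(a: int, b: int, width: int, carry_in: int = 0):
    """Add two unsigned width-bit numbers bit by bit, as a carry chain does.

    Returns (sum, carry_out). Bit i needs the carry produced by bit i-1,
    so the carry ripples through all `width` positions in the worst case.
    """
    total, carry = 0, carry_in & 1
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i                # sum bit
        carry = (x & y) | (x & carry) | (y & carry)  # majority = carry out
    return total, carry


MAX_PS_PER_BIT = 50  # slide figure: < 50 ps per bit, including routing

for width, f_mhz in ((24, 300), (64, 190)):
    chain_ps = width * MAX_PS_PER_BIT  # worst-case ripple bound
    period_ps = 1_000_000 / f_mhz      # clock period at the quoted rate
    print(f"{width}-bit: carry chain < {chain_ps} ps, period {period_ps:.0f} ps")
```

Under these assumptions the 64-bit chain (under 3.2 ns) fits inside the roughly 5.3 ns period at 190 MHz, leaving time for the LUTs and register overhead.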


slide-41
SLIDE 41

Interconnect

Vertical lines and horizontal lines run between CLBs
General-purpose interconnect joins switch boxes
Long lines run across the entire chip
PIPs (programmable interconnect points) are programmable pass transistors that connect the CLB inputs and outputs to the routing network
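One way to picture the routing fabric is as a graph whose edges are PIPs: each PIP is a pass transistor switched by one configuration bit, and a net is routed when the enabled PIPs form a path from a CLB output to a CLB input. A toy Python sketch; the node names and the tiny fabric are hypothetical:

```python
from collections import deque

# Each PIP joins two routing nodes and conducts only if its configuration bit is 1.
PIPS = {
    ("CLB0.out", "H0"): 1,
    ("H0", "SWBOX0"): 1,
    ("SWBOX0", "V3"): 1,
    ("V3", "CLB5.in"): 1,
    ("H0", "LONG0"): 0,  # present in silicon but left unprogrammed
    ("LONG0", "CLB9.in"): 1,
}


def connected(pips, source, sink) -> bool:
    """Breadth-first search over PIPs whose configuration bit is set."""
    adjacency = {}
    for (a, b), bit in pips.items():
        if bit:  # a pass transistor conducts in both directions
            adjacency.setdefault(a, []).append(b)
            adjacency.setdefault(b, []).append(a)
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            return True
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


print(connected(PIPS, "CLB0.out", "CLB5.in"))  # True
print(connected(PIPS, "CLB0.out", "CLB9.in"))  # False: the H0-LONG0 PIP is off
```

Each PIP on the path adds a pass transistor in series, which is one reason routed delay depends on how many switches a net crosses.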


slide-42
SLIDE 42

Interconnect Delay


slide-43
SLIDE 43

Multipliers and DSP Blocks

Multiplying 2 numbers: extremely resource-intensive and complex to implement in digital circuitry
– More than 2000 operations for a single multiply of two 32-bit numbers
[Figure: 4-bit by 4-bit multiplier]
FPGAs have prebuilt multipliers to save on LUT and FF usage
DSP48 slices integrate a 25-bit by 18-bit multiplier with adder circuitry (MAC)
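One way to see where a figure like 2000 comes from, assuming a simple array multiplier built from AND gates and one-bit adder cells: an n x n multiply forms n^2 partial-product bits (one AND gate each) and sums them with n(n-1) adder cells, giving 2n^2 - n = 2016 bit-level operations for n = 32. The Python sketch forms the partial products explicitly and counts the AND gates; the adder-cell count is the textbook array-multiplier formula, not a measurement of any FPGA mapping:

```python
def array_multiply(a: int, b: int, width: int):
    """Unsigned shift-and-add multiply built from explicit partial products.

    Returns (product, and_gates), where and_gates counts the one-bit ANDs
    used to form the partial-product matrix.
    """
    product, and_gates = 0, 0
    for i in range(width):          # one row per multiplier bit
        b_i = (b >> i) & 1
        row = 0
        for j in range(width):      # one AND gate per matrix cell
            row |= (((a >> j) & 1) & b_i) << j
            and_gates += 1
        product += row << i         # rows are summed by adder cells
    return product, and_gates


def array_adder_cells(width: int) -> int:
    """Full/half adder cells in a textbook n x n array multiplier."""
    return width * (width - 1)


product, ands = array_multiply(13, 11, 4)
print(product, ands, array_adder_cells(4))  # 143 16 12
print(32 * 32 + array_adder_cells(32))      # 2016 bit-level operations
```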

Device           #Multipliers   Type
Virtex-II 1000   40             18x18
Virtex-II 3000   96             18x18
Spartan-3 1000   24             18x18
Spartan-3 2000   40             18x18
Virtex-5 LX30    32             DSP48 slices
Virtex-5 LX50    48             DSP48 slices
Virtex-5 LX85    48             DSP48 slices
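A DSP48 slice chains its multiplier into an adder used as an accumulator, which is the multiply-accumulate (MAC) pattern at the heart of filters and dot products. A behavioural Python sketch, assuming signed 25-bit and 18-bit operands (as stated on this slide) and a 48-bit wrap-around accumulator; the 48-bit width is an assumption suggested by the DSP48 name, so check the device user guide for the exact datapath:

```python
A_BITS, B_BITS, ACC_BITS = 25, 18, 48  # ACC_BITS is an assumption


def wrap_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def fits_signed(value: int, bits: int) -> bool:
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def dsp_mac(samples, coefficients) -> int:
    """Multiply-accumulate, one product per clock, like an FIR tap loop."""
    acc = 0
    for x, h in zip(samples, coefficients):
        if not (fits_signed(x, A_BITS) and fits_signed(h, B_BITS)):
            raise ValueError("operand too wide for the 25 x 18 multiplier")
        acc = wrap_signed(acc + x * h, ACC_BITS)
    return acc


print(dsp_mac([3, -5, 100], [4, 6, -7]))  # -718
```

Done in LUTs, each of these products would need the large partial-product array estimated above; in a DSP48 slice it is one hardened operation per clock.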

slide-44
SLIDE 44

On-Chip RAM

Can implement datasets as arrays using flip-flops
Large arrays become expensive for FPGA logic resources
Embedded block RAM: useful for storing datasets or passing values between parallel loops
A 100-element array of 32-bit numbers:
– > 30% of FFs in a Virtex-II 1000 FPGA
– Less than 1 percent of the embedded block RAM

Device           Total RAM (kbits)   Block size (kbits)
Virtex-II 3000   1728                16 (18 with parity)
Virtex-II 1000   720                 16 (18 with parity)
Spartan-3 1000   432                 16 (18 with parity)
Spartan-3 2000   720                 16 (18 with parity)
Virtex-5 LX30    1152                36
Virtex-5 LX50    1728                36
Virtex-5 LX85    3456                36
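The two percentages follow from simple arithmetic. A Python check, using the 10,240 flip-flops and 720 kbits of block RAM these slides list for the Virtex-II 1000, and assuming 1 kbit = 1,024 bits:

```python
ARRAY_BITS = 100 * 32           # 100 elements x 32 bits = 3,200 bits

V2_1000_FLIP_FLOPS = 10_240     # Virtex-II 1000 flip-flop count
V2_1000_BRAM_BITS = 720 * 1024  # 720 kbits of block RAM

ff_share = 100 * ARRAY_BITS / V2_1000_FLIP_FLOPS
bram_share = 100 * ARRAY_BITS / V2_1000_BRAM_BITS

print(f"as flip-flops: {ff_share:.2f}% of the FFs")     # 31.25%
print(f"in block RAM:  {bram_share:.2f}% of the BRAM")  # 0.43%
```

The same 3,200 bits cost about 72 times more of the device when held in flip-flops, which is why large arrays belong in block RAM.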


slide-45
SLIDE 45

Simplified I/O Block Structure

Fast I/O drivers
Separate registers for input, output, and 3-state control
Async/Sync set or reset
Common clock and separate clock enables improve usability
Configure as FF or latch
Programmable slew rate and adjustable input delay
Selectable I/O standards
– Output drive, input threshold
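The difference between the asynchronous and synchronous set/reset options, and the role of the clock enable, can be seen in a small sample-by-sample model. A Python sketch under simplifying assumptions: flip-flop mode only (latch mode is not modeled), an active-high reset to 0, and one sample per half clock period; the function name is hypothetical:

```python
def simulate_iob_register(samples, async_reset: bool):
    """Simulate an IOB flip-flop sample by sample.

    Each sample is (clk, d, ce, sr). The register loads d on a rising
    clock edge when ce is 1. With async_reset, sr clears q at once; with
    a synchronous reset, sr only takes effect on a rising edge.
    """
    q, prev_clk, trace = 0, 0, []
    for clk, d, ce, sr in samples:
        rising = clk == 1 and prev_clk == 0
        if async_reset and sr:
            q = 0
        elif rising:
            if sr:
                q = 0
            elif ce:
                q = d
        prev_clk = clk
        trace.append(q)
    return trace


# (clk, d, ce, sr): load a 1, assert reset between edges, then an edge with ce=0.
SAMPLES = [(0, 1, 1, 0), (1, 1, 1, 0), (0, 1, 1, 1),
           (1, 1, 1, 1), (0, 1, 1, 0), (1, 1, 0, 0)]
print(simulate_iob_register(SAMPLES, async_reset=True))   # [0, 1, 0, 0, 0, 0]
print(simulate_iob_register(SAMPLES, async_reset=False))  # [0, 1, 1, 0, 0, 0]
```

The two traces differ only at the third sample, where the asynchronous reset acts between clock edges; the final edge shows the clock enable holding the register even though d is 1.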


slide-46
SLIDE 46

FPGA Resources for Various Families

Device           Gates       Flip-Flops   LUTs     Multipliers   Block RAM (kb)
Virtex-II 1000   1 million   10,240       10,240   40            720
Virtex-II 3000   3 million   28,672       28,672   96            1,728
Spartan-3 1000   1 million   15,360       15,360   24            432
Spartan-3 2000   2 million   40,960       40,960   40            720
Virtex-5 LX30    n/a         19,200       19,200   32            1,152
Virtex-5 LX50    n/a         28,800       28,800   48            1,728
Virtex-5 LX85    n/a         51,840       51,840   48            3,456

Even hardwired processor cores
– 2 PowerPC cores on Xilinx Virtex-4
Ethernet MACs, PLLs, bit-file encryption, ADC/DAC, etc.

slide-47
SLIDE 47

FPGA to ASIC Conversion

Conversion benefits:
– Multi-chip integration.
– Reliable high-volume production capacity.
– Lower cost.
– Greater choice of output buffer types and strengths.
– Greater pin-out flexibility; might even be able to reduce the pin count to less than that of the PLD or FPGA.
– More robust routing resources.
– ASICs usually deliver higher performance and lower power.


slide-48
SLIDE 48

FPGA to ASIC Conversion (cont’d)

Risks:
Cannot use some specialized circuits.
– RAM initialization and configuration logic are expensive to implement in an ASIC.
Need to redesign the system board if the replacement chip doesn't exactly mimic the original FPGA characteristics.
FPGA design may contain asynchronous components, which are unlikely to work in an ASIC migration.
Functional verification is often incomplete for FPGA designs.


slide-49
SLIDE 49

Conversion Flow

FPGA netlist: Verilog, EDIF, VHDL, or XNF
Netlist translation: each logic function in the netlist is mapped into an ASIC equivalent.
Design analysis: boundary scan, power analysis, DRC, pin-pad selection, and package finalization.
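For LUT-based FPGAs, one core step of netlist translation is turning each LUT's stored truth table into gates. A minimal Python sketch, assuming the LUT contents arrive as a 16-bit INIT-style integer (bit i is the output for input pattern i) and emitting an unoptimized sum of products; real conversion tools also minimize the logic and map it onto the target cell library, which this sketch does not attempt:

```python
def lut4_to_sop(init: int, names=("I0", "I1", "I2", "I3")) -> str:
    """Turn a 4-input LUT truth table into a sum-of-products expression.

    Bit i of `init` is the LUT output when input Ik equals bit k of i.
    """
    if init == 0:
        return "0"
    if init == 0xFFFF:
        return "1"
    products = []
    for i in range(16):
        if (init >> i) & 1:  # this input pattern is a minterm
            literals = [
                name if (i >> k) & 1 else f"~{name}"
                for k, name in enumerate(names)
            ]
            products.append("(" + " & ".join(literals) + ")")
    return " | ".join(products)


print(lut4_to_sop(0x8000))  # (I0 & I1 & I2 & I3)
print(lut4_to_sop(0x0006))  # (I0 & ~I1 & ~I2 & ~I3) | (~I0 & I1 & ~I2 & ~I3)
```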


slide-50
SLIDE 50

Conversion Flow (cont’d)

Static Timing Analysis:
– Exhaustive timing path analysis reporting chip-level clock-to-out times, register-to-register delays, and setup- and hold-time requirements.
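Register-to-register analysis comes down to finding the longest and shortest combinational paths between two flip-flops and comparing them with the clock period and the setup/hold windows. A toy Python sketch with a hypothetical netlist and made-up delays in picoseconds; a real STA tool also models clock skew, wire loads, and process corners:

```python
def path_delay(graph, src, dst, pick):
    """Combine edge delays over every src->dst path of an acyclic graph.

    graph maps a node to a list of (successor, delay_ps). pick is max for
    the setup (slowest-path) check and min for the hold (fastest-path) check.
    """
    memo = {}

    def walk(node):
        if node == dst:
            return 0
        if node not in memo:
            best = None
            for succ, delay in graph.get(node, []):
                rest = walk(succ)
                if rest is not None:  # succ actually reaches dst
                    total = delay + rest
                    best = total if best is None else pick(best, total)
            memo[node] = best
        return memo[node]

    return walk(src)


# Hypothetical logic between flip-flops FF_A and FF_B (delays in ps).
NETLIST = {
    "FF_A": [("g1", 400), ("g2", 300)],
    "g1": [("g3", 500)],
    "g2": [("g3", 200)],
    "g3": [("FF_B", 600)],
}
CLK_TO_Q, SETUP, HOLD, PERIOD = 300, 200, 100, 2500  # ps (400 MHz clock)

slowest = path_delay(NETLIST, "FF_A", "FF_B", max)
fastest = path_delay(NETLIST, "FF_A", "FF_B", min)
setup_slack = PERIOD - (CLK_TO_Q + slowest + SETUP)
hold_slack = (CLK_TO_Q + fastest) - HOLD
print(slowest, fastest, setup_slack, hold_slack)  # 1500 1100 500 1300
```

Positive slack on both checks means this register pair meets timing; a negative setup slack would force a slower clock or a shorter path.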

Test Bench Generation
– ASIC devices are fully customized and require a custom test program.
– If fault coverage improvement is necessary, use ATPG (automatic test pattern generation), or insert an internal scan chain and combine it with ATPG.
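Scan insertion makes internal state controllable and observable: in test mode the flip-flops are stitched into one shift register, a pattern (typically produced by ATPG) is shifted in, one functional clock captures the logic's response, and the result is shifted out for comparison. A behavioural Python sketch; the 3-bit chain and the inverter "logic" are hypothetical:

```python
class ScanChain:
    """Flip-flops that can shift serially (test mode) or capture (functional)."""

    def __init__(self, length: int) -> None:
        self.regs = [0] * length

    def shift(self, bit_in: int) -> int:
        """One scan clock: bit_in enters regs[0]; regs[-1] leaves on scan-out."""
        scan_out = self.regs[-1]
        self.regs = [bit_in & 1] + self.regs[:-1]
        return scan_out

    def capture(self, comb_logic) -> None:
        """One functional clock: load the logic's response to the current state."""
        self.regs = [bit & 1 for bit in comb_logic(self.regs)]


def apply_scan_test(chain: ScanChain, pattern, comb_logic):
    """Shift a pattern in, capture once, shift the response out."""
    if len(pattern) != len(chain.regs):
        raise ValueError("pattern length must match the chain length")
    for bit in reversed(pattern):  # pattern[0], shifted in last, lands in regs[0]
        chain.shift(bit)
    chain.capture(comb_logic)
    unloaded = [chain.shift(0) for _ in range(len(chain.regs))]
    return list(reversed(unloaded))  # restore regs[0]..regs[-1] order


def invert_all(state):
    """Stand-in combinational logic: invert every state bit."""
    return [1 - bit for bit in state]


response = apply_scan_test(ScanChain(3), [1, 1, 0], invert_all)
print(response)  # [0, 0, 1]
```

A tester compares `response` against the value the fault-free logic should produce; a mismatch points to a defect in the logic feeding those flip-flops.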


slide-51
SLIDE 51

Designing with Conversion in Mind

Use synchronous circuits when possible.
Use one external clock for the chip.
– Don't gate it or generate additional internal clocks from it.
Use the FFs that the FPD vendor provides.
– Don't synthesize additional registers or latches from random logic gates.
Consider all possible values in decoding logic and state machines, and provide a path from each unused option to a known initial state.
For fault-tolerant or otherwise-redundant circuits, let your conversion vendor know.
– Otherwise, the netlist conversion may automatically eliminate extra logic.
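The state-machine advice can be made concrete: a 3-state machine encoded in 2 bits leaves one code unused, and if the state register ever held that code (for example after a power-up glitch) the machine should still return to a known state. A Python sketch of such a next-state function; the states and inputs are hypothetical:

```python
IDLE, LOAD, RUN = 0b00, 0b01, 0b10  # 0b11 is a legal register value but unused


def next_state(state: int, start: int, done: int) -> int:
    """Next-state logic with an explicit recovery path for unused codes."""
    if state == IDLE:
        return LOAD if start else IDLE
    if state == LOAD:
        return RUN
    if state == RUN:
        return IDLE if done else RUN
    return IDLE  # any other code (here 0b11) recovers to the known initial state


# Every possible 2-bit state, under every input, leads to a defined state.
for code in range(4):
    for start in (0, 1):
        for done in (0, 1):
            assert next_state(code, start, done) in (IDLE, LOAD, RUN)
print(next_state(0b11, 0, 0))  # 0 (IDLE)
```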


slide-52
SLIDE 52

Designing with Conversion in Mind (cont’d)

Use VHDL or Verilog instead of schematics or another programmable-logic-centric alternative.
Use standards as much as possible.
– Don't rely on any proprietary circuits.
Simulate your FPGA design to have a base set of functional test vectors to run against the ASIC conversion.
Consider prototyping with a fine-grained programmable-logic architecture with an FPGA that provides abundant routing resources.
– Most closely mimics the final NAND gate-based ASIC.
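The simulate-first advice amounts to recording input/output pairs from the FPGA design once and replaying them against the converted netlist. A Python sketch in which two hypothetical models stand in for the FPGA simulation and the ASIC netlist simulation: a 4-bit counter with synchronous clear, with a bug deliberately injected into the second model so a mismatch is possible:

```python
import random


def fpga_counter(count: int, clear: int, enable: int) -> int:
    """Reference model: 4-bit counter with synchronous clear and enable."""
    if clear:
        return 0
    return (count + enable) & 0xF


def asic_counter(count: int, clear: int, enable: int) -> int:
    """Stand-in for the converted netlist; clear is ignored when enable is 0 (bug)."""
    if clear and enable:
        return 0
    return (count + enable) & 0xF


def record_vectors(model, cycles: int, seed: int = 1):
    """Drive random stimulus through the reference model, logging every cycle."""
    rng = random.Random(seed)
    count, vectors = 0, []
    for _ in range(cycles):
        clear, enable = int(rng.random() < 0.1), rng.randint(0, 1)
        nxt = model(count, clear, enable)
        vectors.append(((count, clear, enable), nxt))
        count = nxt
    return vectors


def replay(model, vectors):
    """Return the cycle indices at which the model under test disagrees."""
    return [i for i, (inputs, expected) in enumerate(vectors)
            if model(*inputs) != expected]


golden = record_vectors(fpga_counter, cycles=200)
print(replay(fpga_counter, golden))  # []: the reference reproduces its own vectors
print(replay(asic_counter, golden))  # cycle indices where the conversion differs
```

Because the vectors are captured once from the FPGA design, the same set can be rerun after every netlist change without re-deriving expected values.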
