SoC Design
Lecture 4: Programmable ASICs
Shaahin Hessabi
Department of Computer Engineering
Sharif University of Technology
Programmability Comparison
Processors
- All programmability is in the program (instructions) stored in memory
ASICs
- No programmability
FPGAs
- Device-wide (re)programmability
Sharif University of Technology, Slide 2 of 52, Programmable ASICs
Programmable Logic Devices
[Figure: PLA, PROM, and PAL array structures; legend: fixed connection, programmable connection]
How to Program PLDs?
[Figure: PLA NOR structure (one plane shown); labels: +5V, Inputs, AND Plane, OR Plane, Outputs]
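A PLA computes sum-of-products logic: the programmed connections in the AND plane decide which inputs take part in each product term, and the OR plane decides which product terms feed each output. The following is a minimal behavioral sketch only. The function name `pla_eval`, the data layout, and the example programming are illustrative assumptions, and the model ignores the electrical NOR-NOR structure shown in the figure.

```python
def pla_eval(inputs, and_plane, or_plane):
    """Behavioral PLA model (illustrative, not an electrical model).

    inputs:    dict of input name -> 0/1
    and_plane: list of product terms; each term maps an input name to the
               value it must have (1 = true literal, 0 = complemented
               literal). Inputs not listed are left unconnected.
    or_plane:  dict of output name -> list of product-term indices to OR.
    """
    products = [
        int(all(inputs[name] == want for name, want in term.items()))
        for term in and_plane
    ]
    return {
        out: int(any(products[i] for i in terms))
        for out, terms in or_plane.items()
    }


# Hypothetical programming: X = A·B + A'·C and Y = A'·C (the term is shared).
AND_PLANE = [{"A": 1, "B": 1}, {"A": 0, "C": 1}]
OR_PLANE = {"X": [0, 1], "Y": [1]}

result = pla_eval({"A": 0, "B": 1, "C": 1}, AND_PLANE, OR_PLANE)
```

Sharing the product term A'·C between two outputs is what distinguishes a PLA from a structure with a fixed OR plane.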
How to expand PLD architecture?
Increase the number of inputs/outputs in a conventional PLD?
Problems:
- n times the number of inputs and outputs requires n^2 times as much chip area: too costly
- Logic gets slower as the number of inputs to the AND array increases
Solution: multiple PLDs (i.e., a CPLD) with a relatively small programmable interconnect.
- Less general than a single large PLD, but a software “fitter” can partition the design into smaller PLD blocks.
CPLD vs. FPGA
CPLD architecture:
- Small number of large PLDs on a single chip
- Programmable interconnect between PLDs
FPGA architecture:
- Much larger number of smaller programmable logic blocks
- Embedded in a sea of programmable interconnect
Benefits of FPGAs over ASICs and Processors
Processors
- Slow
- Power hungry
ASICs
- Very expensive
- Long production cycles
- Upgradeability a major problem
FPGAs
- Ideal case: combine the best sides of hardware and software…
- …unfortunately, ideal cases rarely exist!
FPGAs
- FPGAs are closer to “programmable ASICs”: large emphasis on interconnection routing
  - Timing is difficult to predict: multiple hops vs. the fixed delay of a CPLD’s switch matrix.
  - But more “scalable” to large sizes.
- FPGA programmable logic blocks have only a few inputs and 1 or 2 flip-flops, but there are a lot more of them compared to the number of macrocells in a CPLD.
Key questions, both to be answered after the chip has been fabricated:
- How to make logic blocks programmable?
- How to connect the wires?
FPGA Technologies
Static RAM cells:
- The programmable connections are made using pass transistors, transmission gates, or multiplexers that are controlled by SRAM cells.
- Advantage: allows fast in-circuit reconfiguration.
- Disadvantage: size of the chip required by the RAM technology.
Anti-fuse:
- The anti-fuse resides in a high-impedance state and can be programmed into a low-impedance or "fused" state.
- Less expensive than the RAM technology.
- One-Time Programmable (OTP).
EPROM/EEPROM transistors:
- Can be reprogrammed without external storage of configuration.
- EPROM transistors cannot be re-programmed in-circuit.
SRAM
- Static RAM cells are used for three purposes:
  1. As lookup tables (LUTs) for implementing logic.
  2. As embedded RAM blocks (for buffer storage, etc.).
  3. As control to routing and configuration switches.
- Advantages:
  - Easily changeable (even dynamic reconfiguration)
  - Good density
  - Tracks the latest SRAM technology (moving even faster than technology for logic)
  - Flexible: good not only for FSMs, but also for arithmetic circuits
- Disadvantages:
  - Volatile
  - Generally high power
SRAM Programming Technology
- The pass gate: making a connection between two wire segments
- The multiplexer: connecting the state of the SRAM cells to the select lines
SRAM-Controlled Switches
[Figure: SRAM cells controlling the switches that connect logic cells]
Anti-fuse Technology
- Anti-fuse: normally an open circuit
- A programming current through it (about 5 mA) causes a large power dissipation in a small area, which melts a thin insulating dielectric between polysilicon and diffusion electrodes and forms a thin, permanent, and resistive silicon link
- The process cannot be reversed
  - OTP technology
  - Radiation hard
- Modified CMOS process
- Actel: 3 extra masks:
  1. n-type anti-fuse diffusion
  2. anti-fuse polysilicon
  3. thicker than normal gate oxide (for the high-voltage transistors that handle 18 V to program the anti-fuses)
Merits of using OTP FPGAs
1. An anti-fuse exhibits less RC delay than a pass transistor: higher speed
2. Once programmed, interconnections inside the FPGA are available immediately on power-up
   - No time delay to reload interconnection information from a memory
   - No need for additional circuitry to ensure proper loading.
EPROM Transistor
- With a high (>12 V) programming voltage, VPP, applied to the drain, electrons gain enough energy to “jump” onto the floating gate (gate1)
- Electrons stuck on gate1 raise the threshold voltage so that the transistor is always off for normal operating voltages
- UV light provides enough energy for the electrons stuck on gate1 to “jump” back to the bulk, allowing the transistor to operate normally
EPROM Technology
- Used in both SPLD and CPLD devices
- Transistors between two wires implement wired-AND functions.
  - An input to the AND plane can drive a product wire LOW through an EPROM transistor if that input is part of the corresponding product term.
  - For inputs not involved in a product term, the appropriate EPROM transistors are programmed as permanently off.
- EEPROM: program and erase electrically.
Characteristics of FPGA Technology
Programming technology   Process technology   Chip area                             Reprogrammable (RP)   Volatile
Static RAM               CMOS                 Large                                 In-circuit            Yes
PLICE Anti-Fuse          CMOS+                Small, large programming transistor   No                    No
ViaLink Anti-Fuse        CMOS+                Small, large programming transistor   No                    No
EPROM                    UVCMOS               Small                                 Out of circuit        No
EEPROM                   EECMOS               2x EPROM                              Out of circuit        No
FPGA Architectures
Logic block:
- How are functions implemented? Fixed functions or programmable?
  - Support complex functions: need fewer blocks, but they are bigger, so fewer fit on a chip.
  - Support simple functions: need more blocks, but they are smaller, so more fit on a chip.
Interconnect:
- How are logic blocks arranged?
- How many wires will be needed between them?
- Are wires evenly distributed across the chip?
- Programmability slows wires down: are some wires specialized for long distances?
- How many inputs/outputs must be routed to/from each logic block?
- What utilization are we willing to accept? 50%? 20%? 90%?
Functional Units
- RAM blocks (Xilinx): implement the function truth table
- Multiplexers (Actel): build Boolean functions using muxes
- Logic gates, flip-flops: such as carry chains; used for high-performance computations
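The RAM-block approach can be illustrated with a small model: a k-input lookup table is just 2^k stored bits addressed by the inputs, so "programming" the block means writing a truth table. This is a sketch under assumptions; the class name `LUT`, the choice of the first input as the most significant address bit, and the example function are illustrative and not taken from any vendor's documentation.

```python
from itertools import product


class LUT:
    """k-input lookup table: 2**k configuration bits addressed by the inputs."""

    def __init__(self, k, func):
        self.k = k
        # "Configuration": store the truth table of func, one bit per address.
        self.bits = [int(bool(func(*vals))) for vals in product((0, 1), repeat=k)]

    def read(self, *inputs):
        # Assumed convention: the first input is the most significant address bit.
        address = 0
        for bit in inputs:
            address = (address << 1) | bit
        return self.bits[address]


def example_function(a, b, c, d):
    # F = A·B + B'·C + D, the example function used in the ACT 1 slides.
    return (a & b) | ((1 - b) & c) | d


lut = LUT(4, example_function)
value = lut.read(1, 1, 0, 0)  # A=1, B=1, C=0, D=0
```

Because every function of k inputs is one table read, the delay of such a block does not depend on which function is stored.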
Programmable Switch Elements
Used in connecting:
- The I/O of functional units to the wires
- A horizontal wire to a vertical wire
- Two wire segments to form a longer wire segment
Applications of FPGAs
- Implementation of random logic:
  - Easier changes at system level (one device is modified).
  - Can eliminate the need for full-custom chips.
- Prototyping:
  - Get more/better/faster debugging done than with simulation.
- Reconfigurable hardware:
  - One hardware block used to implement more than one function.
  - Functions must be mutually exclusive in time.
  - Can greatly reduce cost while enhancing flexibility.
  - RAM-based option only.
- Special-purpose computation engines:
  - Hardware dedicated to solving one problem (or class of problems).
  - Accelerators attached to general-purpose computers.
Anti-Fuse Based FPGAs
Actel
- FPGA Architecture
- Logic Module
- Interconnect
Actel FPGA Architecture
- Actel uses a fine-grain architecture; i.e., LMs are close to the size of the base cell of an MGA
  - Matched to small anti-fuse programming technology
  - A simple LM reduces performance, but allows fast and robust place-and-route
  - Allows you to use almost all (>90%) of the FPGA
  - Synthesis can map logic efficiently to a fine-grain architecture
ACT 1 Logic Module
The ACT architecture:
(a) Organization of the basic logic cells (LM)
(b) The ACT 1 Logic Module. The ACT 1 family uses just one type of LM. ACT 2 and ACT 3 FPGA families both use two different types of LM
(c) An example LM implementation using pass transistors (without any buffering).
F = A · B + B' · C + D
(d) An example logic macro. Connect logic signals to some or all of the LM inputs, the remaining inputs to VDD or GND
Shannon’s Expansion Theorem
- Use the Shannon expansion theorem to expand F with respect to (wrt) a variable (A): F = A·F|(A='1') + A'·F|(A='0')
- Example: F = A'·B + A·B·C' + A'·B'·C = A·(B·C') + A'·(B + B'·C)
  - F|(A='1') = B·C' is the cofactor of F wrt A, or F_A
  - If we expand F wrt B: F = B·(A' + A·C') + B'·(A'·C)
- Eventually we reach the unique canonical form, which uses only minterms: F = C·(A'·B + A'·B') + C'·(A·B + A'·B)
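The expansion can be checked exhaustively, since a three-variable function has only eight input combinations. The sketch below computes cofactors by fixing one argument and confirms that recombining them reproduces F for each choice of variable. The helper names `cofactor` and `shannon` are illustrative assumptions.

```python
from itertools import product


def F(a, b, c):
    # The slide's example: F = A'·B + A·B·C' + A'·B'·C
    return ((1 - a) & b) | (a & b & (1 - c)) | ((1 - a) & (1 - b) & c)


def cofactor(f, index, value):
    """Return f with the argument at position `index` fixed to `value`."""
    def fixed(*args):
        args = list(args)
        args[index] = value
        return f(*args)
    return fixed


def shannon(f, index):
    """Rebuild f as x·f|(x=1) + x'·f|(x=0) for the variable at `index`."""
    f1, f0 = cofactor(f, index, 1), cofactor(f, index, 0)

    def expanded(*args):
        x = args[index]
        return (x & f1(*args)) | ((1 - x) & f0(*args))
    return expanded


ALL_INPUTS = list(product((0, 1), repeat=3))

# Expansion wrt A, B, or C must agree with F on every input combination.
expansion_holds = all(
    shannon(F, i)(*v) == F(*v) for i in range(3) for v in ALL_INPUTS
)
# The cofactor wrt A=1 should equal B·C', as stated on the slide.
cofactor_a1_is_bc_not = all(
    cofactor(F, 0, 1)(a, b, c) == (b & (1 - c)) for a, b, c in ALL_INPUTS
)
```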
Shannon’s Expansion Theorem (cont’d)
- Another example: F = (A·B) + (B'·C) + D
- Expand F wrt B: F = B·(A + D) + B'·(C + D) = B·F2 + B'·F1
  - F: a 2:1 MUX, with B selecting between 2 inputs: F|(B='1') and F|(B='0')
  - F also describes the output of the ACT 1 LM
- Now we need to split up F1 and F2
- Expand F2 wrt A, and F1 wrt C:
  - F2 = A + D = (A·1) + (A'·D)
  - F1 = C + D = (C·1) + (C'·D)
- A, B, C connect to the select lines, and '1' and D are the inputs of the MUXes in the ACT 1 LM
- Connections: A0=D, A1='1', B0=D, B1='1', SA=C, SB=A, S0='0', and S1=B
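The connection list can be verified by modeling the ACT 1 LM as three 2:1 MUXes and an OR gate and comparing against F over all 16 input combinations. This is a behavioral sketch. It assumes that a select value of 1 picks the second data input of each MUX and that OR(S0, S1) = 1 selects the B-side MUX, which is the polarity implied by F = B·F2 + B'·F1. The function names are illustrative.

```python
from itertools import product


def mux(d0, d1, sel):
    """2:1 MUX: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def act1_lm(a0, a1, sa, b0, b1, sb, s0, s1):
    """Behavioral ACT 1 Logic Module: two input MUXes, an OR gate, an output MUX."""
    wheel1 = mux(a0, a1, sa)   # A-side MUX
    wheel2 = mux(b0, b1, sb)   # B-side MUX
    return mux(wheel1, wheel2, s0 | s1)


def f_target(a, b, c, d):
    # F = (A·B) + (B'·C) + D
    return (a & b) | ((1 - b) & c) | d


def f_mapped(a, b, c, d):
    # Connections from the slide:
    # A0=D, A1='1', B0=D, B1='1', SA=C, SB=A, S0='0', S1=B
    return act1_lm(a0=d, a1=1, sa=c, b0=d, b1=1, sb=a, s0=0, s1=b)


mapping_is_correct = all(
    f_mapped(*v) == f_target(*v) for v in product((0, 1), repeat=4)
)
```

With these connections the A-side MUX computes F1 = C + D, the B-side MUX computes F2 = A + D, and B chooses between them.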
Multiplexer Logic as Function Generators
- The 16 logic functions of 2 variables:
  - There are 10 functions that we can implement using just one 2:1 MUX
  - 6 functions are useful: INV, BUF, AND, OR, AND1-1, NOR1-1
ACT 1 LM as a Boolean Function Generator
(a) A 2:1 MUX viewed as a function wheel
(b) The ACT 1 LM is two function wheels, an OR gate, and a 2:1 MUX
- A 2:1 MUX is a function wheel that can generate BUF, INV, AND-11, AND1-1, OR, AND
- WHEEL(A, B) = MUX(A0, A1, SA)
- Each of the inputs (A0, A1, and SA) may be A, B, '0', or '1'
- ACT 1 LM = MUX[WHEEL1, WHEEL2, OR(S0, S1)]
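The function-wheel idea lends itself to a brute-force check: try every assignment of A, B, '0', or '1' to the three MUX pins and collect the distinct truth tables that result. The sketch below assumes that no inverted inputs are available and that a select value of 1 picks A1. Under those assumptions it finds 12 truth tables including the constants 0 and 1, which leaves 10 non-constant functions; that is one way to read the count of 10 quoted in these slides. All names are illustrative.

```python
from itertools import product

ROWS = list(product((0, 1), repeat=2))  # all (A, B) combinations

# Signals that may be wired to each MUX pin.
SOURCES = {
    "A": lambda a, b: a,
    "B": lambda a, b: b,
    "'0'": lambda a, b: 0,
    "'1'": lambda a, b: 1,
}


def truth_table(f):
    """Truth table of f as a tuple over (A, B) = 00, 01, 10, 11."""
    return tuple(f(a, b) for a, b in ROWS)


def wheel_functions():
    """Map each reachable truth table to one pin assignment that produces it."""
    found = {}
    for (n0, f0), (n1, f1), (ns, fs) in product(SOURCES.items(), repeat=3):
        table = truth_table(
            lambda a, b, f0=f0, f1=f1, fs=fs: f1(a, b) if fs(a, b) else f0(a, b)
        )
        found.setdefault(table, f"MUX(A0={n0}, A1={n1}, SA={ns})")
    return found


functions = wheel_functions()
non_constant = {t for t in functions if t not in ((0, 0, 0, 0), (1, 1, 1, 1))}
```

The four two-variable functions that never appear are NAND, NOR, XOR, and XNOR, each of which would need an inverted signal on a data pin.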
ACT 2 and ACT 3 Logic Modules
- ACT 1 requires 2 LMs per FF, with unknown interconnect capacitance
- ACT 2 and ACT 3 use two types of LMs:
  - The ACT 2 C-Module (combinational) is similar to the ACT 1 LM, but can implement five-input logic functions
  - The ACT 2 S-Module (sequential module) contains a C-Module and a sequential element
Interconnect
- Anti-fuses join wire segments within each channel into longer wire segments
  - Horizontal segments vary in length from four columns of LMs to the entire row of modules (long lines)
- If the Logic Module at the end of a net is less than two rows away from the driver module, a connection requires two anti-fuses, a vertical track, and two horizontal segments
- If the modules are more than two rows apart, a connection requires a long vertical track together with another vertical track (the output stub) and two horizontal tracks, with 4 anti-fuses.
Routing Channels
- Fixed channel widths (tracks)
- Channel -> track -> segment
- Segment length?
  - Long: carries the signal farther with fewer “concatenation” switches, but might waste track
  - Short: suits local connections, slow for longer connections
Switch Boxes
- Ideally, provide switches for all possible connections
- Trade-off:
  - Too many switches:
    - Large area
    - Complex to program
  - Too few switches:
    - Cannot route signals
- One possible solution: [Figure: example switch-box arrangement]
RC Delay in Antifuse Connections
A four-antifuse connection:
- L0: output stub
- L1 and L3: horizontal tracks
- L2: long vertical track (LVT)
- L4: input stub
Each antifuse is modeled by a resistance, and each interconnect segment is modeled by a capacitance.
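With antifuses as resistors and segments as capacitors, the connection becomes an RC ladder. A common first-order estimate for such a ladder is the Elmore delay, in which each resistance is multiplied by the total capacitance downstream of it. The slide does not name this method, so treat it as an assumed technique. The sketch below ignores the driver resistance and the capacitance of the output stub L0, and every component value is hypothetical.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay of an RC ladder, in seconds.

    resistances[i] is the series resistance of stage i (ohms) and
    capacitances[i] is the capacitance at the node just after it (farads).
    Each resistance contributes R times all capacitance downstream of it.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    delay = 0.0
    downstream = 0.0
    # Walk from the far end toward the driver, accumulating capacitance.
    for r, c in zip(reversed(resistances), reversed(capacitances)):
        downstream += c
        delay += r * downstream
    return delay


# Hypothetical values: four antifuses of 500 ohms each, followed by
# segments L1, L2 (long vertical track), L3, and L4 (input stub).
ANTIFUSE_OHMS = [500.0, 500.0, 500.0, 500.0]
SEGMENT_FARADS = [0.2e-12, 1.0e-12, 0.2e-12, 0.05e-12]

estimated_delay = elmore_delay(ANTIFUSE_OHMS, SEGMENT_FARADS)
```

The model makes the cost of the four-antifuse path visible: the first antifuse sees the capacitance of every later segment, so a large long-vertical-track capacitance is charged through more than one series resistance.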
SRAM Based FPGAs
Xilinx
- FPGA Architecture
- Configurable Logic Block (CLB)
- Interconnect
- Multipliers and DSP Blocks
- On-Chip RAM
Xilinx FPGA Structure
- Fixed arrays of logical function cells (CLBs) connectable by a system of pass transistors driven by static RAM cells
Virtex-II
- 0.13 μm, 8-layer metal CMOS process
- Cu power distribution and interconnect
- Up to 10 million system gates
- >100,000 LUTs and flip-flops
- >1000 BlockRAMs and multipliers
- >200 MHz clock rate, multi-Gbps serial I/O
- On-chip PowerPC with cache
- 3 Gigabit Serial I/O
Simplified CLB Structure
- Two slices in each Virtex CLB, 4 slices in each Virtex-II CLB
- Two buffers associated with each CLB, accessible by all CLB outputs
- Fast dedicated carry logic runs vertically up
- CLB propagation delay is fixed (LUT access time) and independent of the logic function
Shift Register LUT
- Dynamically addressable shift register (SRL)
  - Ultra-efficient programmable delay for balancing pipelined designs
  - Can also be used for simple FIFOs
- Maximum delay of 16 clock cycles in one LUT, up to 128 in one CLB
- Can be read asynchronously by toggling address lines
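The behavior described above can be modeled as a 16-entry shift register whose output tap is chosen by a 4-bit address: writes happen on the clock, while reads depend only on the address. The sketch below is a behavioral model under assumptions. The class name `SRL16Model` and its method names are illustrative, and the convention that address 0 gives a one-cycle delay and address 15 a sixteen-cycle delay is assumed rather than taken from the slide.

```python
class SRL16Model:
    """Behavioral model of a 16-bit dynamically addressable shift register."""

    DEPTH = 16

    def __init__(self):
        self.bits = [0] * self.DEPTH

    def clock(self, data_in):
        """Clock edge: shift every stored bit one position and load data_in."""
        self.bits = [data_in] + self.bits[:-1]

    def read(self, address):
        """Combinational read: the address selects which stage is visible."""
        if not 0 <= address < self.DEPTH:
            raise ValueError("address must be in 0..15")
        return self.bits[address]


# A programmable delay: a single 1 followed by zeros, observed at tap 15.
srl = SRL16Model()
srl.clock(1)
for _ in range(15):
    srl.clock(0)
delayed_bit = srl.read(15)  # the 1 loaded 16 clock edges ago
```

Changing the address changes the delay without touching the stored data, which is what makes the structure useful for balancing pipeline stages.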
SRL16 Applications
- 1...16-bit shift register in one LUT
  - Up to 128 bits in one Virtex-II CLB
- FIFO, pseudo-random number generator (LFSR)
- Pulse generator and clock divider
- Pattern generator, state machine
- …
Website: http://support.xilinx.com/support/techxclusives/SRL16-echxclusive2.htm

Dedicated Fast Carry
Without dedicated carry logic, 64-bit adders would require 128 levels of logic
– Expensive, complex carry schemes would be needed to preserve performance
Virtex minimizes the carry propagation delay
– <50 ps per bit, includes routing
Fast adders, accumulators, and counters
– 24-bit operation at up to 300 MHz in Virtex-II
– 64-bit operation at up to 190 MHz in Virtex-II
Fully synchronous operation
– Same speed for add/subtract, accumulate, or count
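
To make the carry-chain figures concrete, here is a minimal Python sketch of a ripple-carry adder together with a rough bound on the carry propagation time. The function names are hypothetical, and the delay estimate simply multiplies the width by the slide's "<50 ps per bit"; it ignores every other delay in the path.

```python
def ripple_add(a, b, width=64):
    """Add two unsigned integers bit by bit, passing the carry along the chain."""
    carry = 0
    total = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        total |= (x ^ y ^ carry) << i            # sum bit of stage i
        carry = (x & y) | (carry & (x ^ y))      # carry into stage i + 1
    return total, carry


def carry_chain_delay_ns(width, ps_per_bit=50):
    """Upper bound on carry propagation, assuming at most ps_per_bit per stage."""
    return width * ps_per_bit / 1000


if __name__ == "__main__":
    print(ripple_add(2**64 - 1, 1))       # all-ones plus one ripples through every stage
    print(carry_chain_delay_ns(64), "ns carry chain for 64 bits")
    print(1000 / 190, "ns clock period at 190 MHz")
```

Under these assumptions a 64-bit carry chain takes at most 3.2 ns, which fits inside the roughly 5.26 ns clock period at 190 MHz and leaves the remainder for the other delays in the path.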

Interconnect
Vertical lines and horizontal lines run between CLBs
General-purpose interconnect joins switch boxes
Long lines run across the entire chip.
PIPs are programmable pass transistors that connect the CLB inputs and outputs to the routing network

Interconnect Delay

Multipliers and DSP Blocks
Multiplying 2 numbers: extremely resource-intensive and complex to implement in digital circuitry
– More than 2000 operations for a single multiplication of 32-bit numbers
[Figure: 4-bit by 4-bit multiplier]
FPGAs have prebuilt multipliers to save on LUT and FF usage
– DSP48 slices integrate a 25-bit by 18-bit multiplier with adder circuitry (MAC)

Device           # Multipliers   Type
Virtex-II 1000   40              18x18
Virtex-II 3000   96              18x18
Spartan-3 1000   24              18x18
Spartan-3 2000   40              18x18
Virtex-5 LX30    32              DSP48 slices
Virtex-5 LX50    48              DSP48 slices
Virtex-5 LX85    48              DSP48 slices
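
One way to arrive at an operation count of this size is to count the gate-level work in a plain array multiplier: one AND per partial-product bit, plus one single-bit adder cell per bit of every row that has to be summed. The Python sketch below does that counting. The function name is hypothetical, and the counting convention is an assumption, not necessarily the one behind the slide's figure.

```python
def array_multiply(a, b, n):
    """Multiply two n-bit unsigned numbers and count array-multiplier operations.

    Returns (product, and_ops, add_ops).
    """
    and_ops = 0
    add_ops = 0
    acc = 0
    for i in range(n):
        # Partial-product row i: AND every bit of a with bit i of b.
        row = 0
        for j in range(n):
            bit = ((a >> j) & 1) & ((b >> i) & 1)
            row |= bit << j
            and_ops += 1
        if i == 0:
            acc = row                 # the first row needs no addition
        else:
            acc += row << i           # integer add stands in for n one-bit adder cells
            add_ops += n
    return acc, and_ops, add_ops


if __name__ == "__main__":
    product, and_ops, add_ops = array_multiply(0xFFFFFFFF, 0xFFFFFFFF, 32)
    print(and_ops, add_ops, and_ops + add_ops)
```

For n = 32 this convention gives 1024 AND operations and 992 adder cells, 2016 in total, which matches the "more than 2000" order of magnitude; for the 4-bit by 4-bit case it gives 16 and 12.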

On-Chip RAM
Can implement datasets as arrays using flip-flops
– Large arrays become expensive for FPGA logic resources
Embedded block RAM: useful for storing datasets or passing values between parallel loops
A 100-element array of 32-bit numbers:
– > 30% of FFs in a Virtex-II 1000 FPGA
– Less than 1 percent of the embedded block RAM

Device           Total RAM (kbits)   Block size (kbits)
Virtex-II 1000   720                 16
Virtex-II 3000   1728                16
Spartan-3 1000   432                 16
Spartan-3 2000   720                 16
Virtex-5 LX30    1152                36
Virtex-5 LX50    1728                36
Virtex-5 LX85    3456                36
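
The two percentages for the 100-element array can be checked with a few lines of arithmetic. The sketch below uses the Virtex-II 1000 figures quoted in this lecture (10,240 flip-flops and 720 kbits of block RAM) and assumes 1 kbit = 1024 bits; the variable names are illustrative only.

```python
ARRAY_ELEMENTS = 100
BITS_PER_ELEMENT = 32
VIRTEX2_1000_FLIP_FLOPS = 10_240       # from the lecture's resource table
VIRTEX2_1000_BLOCK_RAM_KBITS = 720     # from the lecture's resource table

array_bits = ARRAY_ELEMENTS * BITS_PER_ELEMENT           # one flip-flop per stored bit
ff_share = array_bits / VIRTEX2_1000_FLIP_FLOPS
bram_share = array_bits / (VIRTEX2_1000_BLOCK_RAM_KBITS * 1024)  # assumes 1 kbit = 1024 bits

print(f"{array_bits} bits: {ff_share:.1%} of flip-flops, {bram_share:.2%} of block RAM")
```

The array needs 3200 bits, which is a little over 31% of the flip-flops but well under half a percent of the block RAM.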

Simplified I/O Block Structure
Fast I/O drivers
Separate registers for input, output, 3-state control
Async/Sync set or reset
Common clock and separate clock enables improve usability
Configure as FF or latch
Programmable slew rate and adjustable input delay
Selectable I/O standards
Output drive, input threshold

FPGA Resources for Various Families

Device           Gates       Flip-Flops   LUTs     Multipliers   Block RAM (kb)
Virtex-II 1000   1 million   10,240       10,240   40            720
Virtex-II 3000   3 million   28,672       28,672   96            1,728
Spartan-3 1000   1 million   15,360       15,360   24            432
Spartan-3 2000   2 million   40,960       40,960   40            720
Virtex-5 LX30    -           19,200       19,200   32            1,152
Virtex-5 LX50    -           28,800       28,800   48            1,728
Virtex-5 LX85    -           51,840       51,840   48            3,456

Even hardwired processor cores
– 2 PowerPC cores on Xilinx Virtex-4
Ethernet MACs, PLLs, bit-file encryption, ADC/DAC, etc.

FPGA to ASIC Conversion
Conversion benefits:
– Multi-chip integration.
– Reliable high-volume production capacity.
– Lower cost.
– Greater choice of output buffer types and strengths.
– Greater pin-out flexibility; might even be able to reduce the pin count to less than that of the PLD or FPGA.
– More robust routing resources.
ASICs usually deliver higher performance and lower power.

FPGA to ASIC Conversion (cont’d)
Risks:
– Cannot use some specialized circuits.
  – RAM initialization and configuration logic are expensive to implement in an ASIC.
– Need to redesign the system board if the replacement chip doesn't exactly mimic the original FPGA characteristics.
– FPGA design may contain asynchronous components, which are unlikely to work in an ASIC migration.
– Functional verification is often incomplete for FPGA designs.

Conversion Flow
FPGA netlist: Verilog, EDIF, VHDL, or XNF
Netlist translation: each logic function in the netlist is mapped into an ASIC equivalent.
Design analysis: boundary scan, power analysis, DRC, pin-pad selection, and package finalization.
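
As a rough illustration of what netlist translation involves, the Python sketch below expands a LUT's truth table into an unminimized sum of products that could then be mapped onto ASIC gates. It is not any vendor's translator: the function names are hypothetical, and the convention that bit k of the initialization value is the output for input combination k is an assumption. Real conversion tools also minimize the logic and map it to library cells.

```python
def lut_to_minterms(init, inputs=4):
    """Return the input combinations for which the LUT outputs 1.

    Assumes bit k of `init` is the output when the inputs, read as a
    binary number, equal k.
    """
    return [k for k in range(2 ** inputs) if (init >> k) & 1]


def minterms_to_expression(minterms, names=("a", "b", "c", "d")):
    """Write the function as an unminimized sum of products; names[0] is the LSB."""
    terms = []
    for k in minterms:
        literals = [n if (k >> i) & 1 else f"~{n}" for i, n in enumerate(names)]
        terms.append("(" + " & ".join(literals) + ")")
    return " | ".join(terms) if terms else "0"


if __name__ == "__main__":
    # 0x8000 sets only entry 15, i.e. a 4-input AND.
    print(minterms_to_expression(lut_to_minterms(0x8000)))
    # 0x6 on a 2-input LUT sets entries 1 and 2, i.e. XOR.
    print(minterms_to_expression(lut_to_minterms(0x6, inputs=2), names=("a", "b")))
```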

Conversion Flow (cont’d)
Static Timing Analysis:
– Exhaustive timing path analysis reporting chip-level clock-to-out times, register-to-register delays, and set-up and hold-time requirements.
Test Bench Generation
– ASIC devices are fully customized and require a custom test program.
– If fault coverage improvement is necessary, use ATPG, or insert an internal scan chain and combine it with ATPG.
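
The register-to-register checks that static timing analysis performs can be sketched as simple slack calculations. The Python below is a simplified illustration with hypothetical names and delay values: it assumes zero clock skew and a single logic delay per path, whereas a real tool uses the maximum delay for the set-up check and the minimum delay for the hold check.

```python
from dataclasses import dataclass


@dataclass
class RegToRegPath:
    name: str
    clk_to_q_ns: float   # launch register clock-to-output delay
    logic_ns: float      # combinational and routing delay between the registers
    setup_ns: float      # set-up requirement of the capture register
    hold_ns: float       # hold requirement of the capture register


def setup_slack(path, clock_period_ns):
    """Time to spare before the next clock edge; negative means a violation."""
    return clock_period_ns - (path.clk_to_q_ns + path.logic_ns + path.setup_ns)


def hold_slack(path):
    """Margin by which new data arrives after the hold window closes."""
    return path.clk_to_q_ns + path.logic_ns - path.hold_ns


def failing_paths(paths, clock_period_ns):
    """Names of paths that violate either the set-up or the hold requirement."""
    return [p.name for p in paths
            if setup_slack(p, clock_period_ns) < 0 or hold_slack(p) < 0]


if __name__ == "__main__":
    paths = [
        RegToRegPath("fast", clk_to_q_ns=0.5, logic_ns=4.0, setup_ns=0.5, hold_ns=0.25),
        RegToRegPath("slow", clk_to_q_ns=0.5, logic_ns=6.0, setup_ns=0.5, hold_ns=0.25),
    ]
    print(failing_paths(paths, clock_period_ns=5.0))
```

With a 5 ns clock the first path just meets timing (zero slack) and the second misses set-up by 1.5 ns, so only the second is reported.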

Designing with Conversion in Mind
Use synchronous circuits when possible.
– Use one external clock for the chip.
– Don't gate it or generate additional internal clocks from it.
Use the FFs that the FPD vendor provides.
– Don't synthesize additional registers or latches from random logic gates.
Consider all possible values in decoding logic and state machines and provide a path from each unused option to a known initial state.
For fault-tolerant or otherwise-redundant circuits, let your conversion vendor know.
– Otherwise, the netlist conversion may automatically eliminate extra logic.
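
The advice about unused state encodings can be illustrated with a small next-state function. This is a sketch in Python with hypothetical state names: three states are encoded in two bits, which leaves one encoding unused, and that encoding is explicitly routed back to the initial state instead of being left undefined.

```python
IDLE, LOAD, RUN = 0b00, 0b01, 0b10   # 0b11 is an unused encoding


def next_state(state, start):
    """Next-state logic that defines a transition for every 2-bit encoding."""
    if state == IDLE:
        return LOAD if start else IDLE
    if state == LOAD:
        return RUN
    if state == RUN:
        return IDLE
    # Unused encoding (0b11): recover to the known initial state.
    return IDLE


if __name__ == "__main__":
    for code in range(4):
        for start in (0, 1):
            print(f"state={code:02b} start={start} -> {next_state(code, start):02b}")
```

Because every encoding has a defined successor, a register that powers up in or is disturbed into the unused code returns to `IDLE` on the next clock, in the FPGA and in the converted ASIC alike.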

Designing with Conversion in Mind (cont’d)
Use VHDL or Verilog instead of schematics or another programmable-logic-centric alternative.
Use standards as much as possible.
– Don't rely on any proprietary circuits.
Simulate your FPGA design, to have a base set of functional test vectors to run against the ASIC conversion.
Consider prototyping with a fine-grained programmable-logic architecture with an FPGA that provides abundant routing resources.
– Most closely mimics the final NAND gate-based ASIC.