ASIC Layout Overview Design flow Back-end process FPGA - - PowerPoint PPT Presentation



SLIDE 1

ASIC Layout

SLIDE 2

Overview

  • Design flow
  • Back-end process
  • FPGA design process
  • Conclusions


SLIDE 3

ASIC Design flow


Source: http://www.ami.ac.uk

SLIDE 4

What is Backend?

  • Physical Design:

1. Floorplanning: architect’s job
2. Placement: builder’s job
3. Routing: electrician’s job

SLIDE 5

Input for Layout Tools

Libraries:

  • Physical Libraries (LEF/OA): cell boundaries, pins, routing rules
  • Timing Libraries (*.lib)

Optional Input Files:

  • Floorplan File
  • IO File
  • Scan Definition File

Input:

  • Verilog Gate Level Netlist
  • Timing Constraint files, for all modes (*.sdc)

Optional Libraries:

  • Technology Files (Cap Tables, QRC Tech file)
  • SI Libraries (*.cdb)

SLIDE 6

Import Design Procedure – Global Definition File

File - Import Design

  • Verilog netlist file(s)
  • OA flow: reference and custom libraries of standard cells, IOs, custom blocks, RAMs …
  • or LEF files (LEF/DEF flow)
  • MMMC (Multi Mode Multi Corner) view file: links timing libraries, RC corners, and constraints per view
  • Power/ground (special) net definitions
  • CPF: Common Power Format (low-power design/power islands)

Commands:

source <myfile>.globals
init_design

SLIDE 7

Structure of a Die

  • Silicon die is mounted inside a chip package.
  • A die consists of a logic core inside a power ring.
  • Special power pads are used for the VDD and VSS (Core and Pad).


SLIDE 8

The Design Implementation Flow

SLIDE 9

Floorplanning

  • Floorplanning is a very important step in layout design.
  • Important objectives:
      • Chip size
      • Aspect ratio
      • Placement of basic building blocks
      • IO placement
  • The definition of chip size and aspect ratio, along with the placement of the building blocks (memories, hard macros), strongly affects chip routability and final performance.
  • The pads should be placed so that they meet the minimum pitch requirements defined by the packaging methodology.

SLIDE 10

Placement and Routing

  • Placement
      • Defines the position of each cell from the netlist
      • Placement is performed in the defined rows
      • The target is to place connected cells in neighboring positions to reduce the timing penalty
  • Routing
      • Makes the connections between the cells (and IOs)
      • Metal lines are used for the routing
      • The objective is to reduce the interconnect length (reducing line capacitance, i.e. interconnect delay)
      • Global and local routing

SLIDE 11

Back-end Design decisions

  • Core-limited and pad-limited designs
      • The design size can be defined either by the core size or by the pad size.
      • In general, the design complexity is defined by the number of gates (reflected in the core area).
      • However, the pads are disproportionately large, so when there are many of them they can define the chip area (pad-limited design).
      • The opposite case is a core-limited design.
  • The aspect ratio of the chip has to be chosen so that it does not impair chip routability and so that it fits the package. An aspect ratio of 1.0 gives a square chip, which is the optimal shape with respect to placement and routing.
  • The size of the power rings depends on the estimated power consumption of the chip. Since the power pads are usually distributed evenly on all four sides of the chip, the maximum current flow through the power rings is ¼ of the total estimated current.
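The ¼ rule for the power rings can be turned into a quick back-of-the-envelope calculation. The sketch below is only an illustration: the power figure, supply voltage and current-density limit are made-up numbers, and the function names are hypothetical, not taken from any layout tool.

```python
def ring_current_per_side(total_power_w, supply_v, sides=4):
    """Current one ring segment carries if power pads sit evenly on all sides."""
    total_current_a = total_power_w / supply_v
    return total_current_a / sides


def min_ring_width_um(current_a, max_ma_per_um):
    """Minimum metal width for a current and an assumed current-density limit."""
    return current_a * 1000.0 / max_ma_per_um


# Assumed example values: 0.6 W core power at 1.2 V, 1 mA per um of metal width.
i_side = ring_current_per_side(total_power_w=0.6, supply_v=1.2)
width = min_ring_width_um(i_side, max_ma_per_um=1.0)
print(f"{i_side * 1000:.0f} mA per side, ring width >= {width:.0f} um")
```

The real limit per micrometre of width comes from the electromigration rules of the chosen technology, so the last parameter has to be replaced with the value from the design rule manual.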

SLIDE 12

Placement

  • ASIC placement is performed in rows.
  • Routing can be performed in both directions, horizontal and vertical.
  • The chip size strongly depends on the chosen core (row) utilization. A typical value of core utilization is 75%. If the chip contains complex logic requiring extensive routing, the user should consider relaxing the core utilization. If the chip logic is relatively simple, the user may try to tighten the utilization value in order to reduce the chip size.
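The effect of core utilization on chip size can be estimated with simple arithmetic. This is a minimal sketch, assuming the total standard-cell area is known from synthesis; the function name and the example area are hypothetical.

```python
import math


def core_dimensions(cell_area_um2, utilization=0.75, aspect_ratio=1.0):
    """Return (width, height) in um of a core holding the cells at the given utilization.

    aspect_ratio is height / width; 1.0 gives a square core.
    """
    core_area = cell_area_um2 / utilization
    width = math.sqrt(core_area / aspect_ratio)
    return width, width * aspect_ratio


# Assumed example: 750,000 um^2 of standard cells.
w_typ, h_typ = core_dimensions(750_000.0, utilization=0.75)
w_relaxed, h_relaxed = core_dimensions(750_000.0, utilization=0.60)
print(f"75% utilization: {w_typ:.0f} x {h_typ:.0f} um")
print(f"60% utilization: {w_relaxed:.0f} x {h_relaxed:.0f} um")
```

Relaxing the utilization from 75% to 60% leaves more room for routing at the price of a larger core.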

SLIDE 13

Objectives of Placement Process

  • Placing each individual cell in the rows
  • Reducing the placement distance between connected cells
  • Performing high-density placement
  • Reducing the timing overhead and power consumption
  • Addressing the routing challenges (avoiding routing congestion)
  • Timing-driven placement tries to fulfil the timing constraints while performing placement. It is coupled with trial routing and RC extraction to estimate the effects of the placement choices.

SLIDE 14

Placement Algorithms

  • Two general types of algorithms:
      • Constructive placement
      • Iterative placement improvement
  • Constructive placement methods
      • Min-cut algorithm or eigenvalue method
      • Start with a constructed solution, followed by iterative improvement
  • The min-cut placement method uses successive application of partitioning:
      • Cut the area into two pieces.
      • Swap the cells to minimize the cost.
      • Repeat the process, cutting smaller pieces, until all the logic cells are placed.
  • The eigenvalue placement algorithm uses the cost matrix or weighted connectivity matrix.

Figure (min-cut placement): (a) Divide the chip into bins using a grid. (b) Merge all connections to the center of each bin. (c) Make a cut and swap cells between bins to minimize the cost. (d) Throw out all the edges that are not inside the piece. (e) Repeat the process and continue with the individual bins.

Source: Application-Specific Integrated Circuits - Michael J. S. Smith
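One cut-and-swap step of the min-cut method can be sketched as follows. This is a simplified greedy illustration, not the algorithm from the cited book or from any placement tool: it performs a single bisection, uses the number of cut nets as the cost, and all cell and net names are invented.

```python
from itertools import product


def cut_size(nets, left):
    """Number of nets that have cells on both sides of the cut."""
    return sum(1 for net in nets if 0 < sum(c in left for c in net) < len(net))


def min_cut_bisect(cells, nets):
    """Split cells into two halves, then swap pairs while the cut gets smaller."""
    half = len(cells) // 2
    left, right = set(cells[:half]), set(cells[half:])
    improved = True
    while improved:
        improved = False
        best = cut_size(nets, left)
        for a, b in product(sorted(left), sorted(right)):
            trial = (left - {a}) | {b}
            if cut_size(nets, trial) < best:
                left, right = trial, (right - {b}) | {a}
                improved = True
                break  # restart the search from the improved partition
    return left, right


cells = ["u1", "u2", "u3", "u4", "u5", "u6"]
nets = [("u1", "u4"), ("u4", "u5"), ("u2", "u3"), ("u3", "u6"), ("u1", "u5")]
left, right = min_cut_bisect(cells, nets)
print(sorted(left), sorted(right), "cut nets:", cut_size(nets, left))
```

A full min-cut placer would apply the same step recursively to each half until every bin holds a single cell.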

SLIDE 15

Iterative Placement

  • Further improvements are made based on an initial placement.
      • Selection criteria decide which cells should be moved.
      • Measurement criteria decide whether to move the selected cells.
  • Several exchange methods:
      • pairwise interchange
      • force-directed interchange
      • force-directed relaxation
      • force-directed pairwise relaxation
  • All methods are based on selecting a pair of cells to be exchanged.
  • First the cell under examination is selected; then an exchange with other, randomly chosen cells is evaluated based on the cost criteria. The candidates for the pair can be limited by Manhattan distance.

Figure: (a) Swapping two cells. (b) Swapping more cells gives better results but is more complex. (c) A one-neighborhood. (d) A two-neighborhood.

Source: Application-Specific Integrated Circuits - Michael J. S. Smith
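Pairwise interchange can be illustrated with a small sketch. It assumes two-pin nets, total Manhattan wirelength as the cost, and a deterministic sweep over all pairs instead of random selection; the names and the four-cell example are hypothetical.

```python
from itertools import combinations


def wirelength(pos, nets):
    """Total Manhattan length of two-pin nets; pos maps cell -> (x, y)."""
    return sum(abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])
               for a, b in nets)


def pairwise_interchange(pos, nets, max_dist=2):
    """Swap pairs of cells within max_dist while the swap lowers the wirelength."""
    pos = dict(pos)  # work on a copy of the initial placement
    improved = True
    while improved:
        improved = False
        for a, b in combinations(sorted(pos), 2):
            dist = abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])
            if dist > max_dist:
                continue  # pair is outside the allowed neighborhood
            before = wirelength(pos, nets)
            pos[a], pos[b] = pos[b], pos[a]
            if wirelength(pos, nets) < before:
                improved = True
            else:
                pos[a], pos[b] = pos[b], pos[a]  # undo the swap
    return pos


start = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (3, 0)}
nets = [("a", "d"), ("b", "c")]
result = pairwise_interchange(start, nets)
print(wirelength(start, nets), "->", wirelength(result, nets))
```

The `max_dist` parameter plays the role of the neighborhood limit: a larger value examines more pairs per pass at a higher run-time cost.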

SLIDE 16

Clock synthesis

  • A clock network needs to be implemented to drive all sink elements (flip-flops, latches, etc.) from the same source line.
  • The clock network consists of a large number of buffers, inverters, and clock gates.
  • The objective is to reduce the phase difference between the clock at the different clock sinks (clock skew).
  • An additional goal is to reduce the clock latency (which depends on the clock tree complexity and interconnect delay).
  • The clock is a significant source of power consumption (~50% in modern designs), so reducing it is also an objective.
  • Many sinks also use the falling edge of the clock, so balancing the rise and fall times is an important objective.
  • The clock tree is defined in the clock tree definition file.

SLIDE 17

Clock trees

  • A path from the clock source to clock sinks


Figure source: vlsi.pro

SLIDE 18

Concept of Clock Tree

Figure labels: clock pad, clock tree, sub-trees

SLIDE 19

Clock Skew

  • Clock skew is the maximum difference in the arrival time of a clock signal at two different sinks (flip-flops, latches, etc.).
  • Clock skew can lead to a performance drop or to the need to fix hold-time violations (by adding buffers), which costs additional power and area.
  • Clock skew should be minimized.

Figure source: vlsi.pro
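Given the clock arrival time at each sink, skew and latency follow directly from the definition above. The sink names and arrival times in this sketch are invented for illustration.

```python
def clock_skew(arrival_ns):
    """Return (skew, latency): max arrival difference and latest arrival, in ns."""
    latest = max(arrival_ns.values())
    earliest = min(arrival_ns.values())
    return latest - earliest, latest


# Assumed clock arrival times at three sinks, in nanoseconds.
arrivals = {"ff_a/CK": 0.42, "ff_b/CK": 0.47, "latch_c/G": 0.39}
skew, latency = clock_skew(arrivals)
print(f"skew = {skew:.2f} ns, latency = {latency:.2f} ns")
```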

SLIDE 20

Clock Gating and CTS

  • Clock gating is often used as a methodology for reducing power consumption.
      • The clock network uses ~50% of the power budget.
      • By switching off the network when it is not needed, the consumption can be dramatically reduced.
  • Clock gating needs to be taken into consideration during CTS (clock tree synthesis).
      • Clock gates are part of the clock tree and contribute to the skew.
      • Clock tree balancing is required between non-gated and gated subtrees.

SLIDE 21

Routing

  • The goal of routing is to minimize the interconnect delay.
      • Routing is performed automatically using the different available layers of metal connections.
      • Design rules need to be fulfilled (minimum spacing etc.).
      • Different types of routing (trial, clock routing, final routing) are used depending on the design phase.
      • Global routing: first phase of the final routing, connecting blocks
      • Detailed routing: final routing of all interblock connections

SLIDE 22

Manhattan Routing Algorithm

  • Motivated by the streets of New York
      • Straight connections in the horizontal and vertical directions
      • Specific metal layers only for the vertical or only for the horizontal direction
      • Avoiding interconnection problems
      • Routing channels defined
  • Manhattan distance
      • Sum of the distances in the X-axis and Y-axis directions
  • There are now much more advanced algorithms

Figure labels: Pin A, Pin B, Pin C, Pin D; Metal 1, Metal 2
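The Manhattan distance and a simple L-shaped route between two pins can be written in a few lines. The coordinates are arbitrary grid units and the function names are hypothetical; which metal layer carries which direction depends on the technology and is not modelled here.

```python
def manhattan_distance(p, q):
    """Sum of the distances in the X and Y directions between points p and q."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def l_route(p, q):
    """One L-shaped route: a horizontal segment followed by a vertical one."""
    corner = (q[0], p[1])
    return [("H", p, corner), ("V", corner, q)]


pin_a, pin_b = (0, 0), (3, 4)
segments = l_route(pin_a, pin_b)
length = sum(manhattan_distance(start, end) for _, start, end in segments)
print(segments, "length:", length)
```

The length of the L-shaped route equals the Manhattan distance between the two pins, which is why that distance serves as a lower bound for wirelength in this routing style.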

SLIDE 23

Left-Edge Routing Algorithm

Source: Application-Specific Integrated Circuits - Michael J. S. Smith
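A minimal sketch of the basic left-edge idea for channel routing: each net occupies a horizontal interval, the nets are sorted by their left edge, and each track is filled from left to right with nets that do not overlap. Vertical constraints between nets are ignored here, and the net names and intervals are invented.

```python
def left_edge(intervals):
    """Assign net intervals (name -> (left, right)) to horizontal tracks."""
    remaining = sorted(intervals.items(), key=lambda item: item[1][0])
    tracks = []
    while remaining:
        track, last_right, leftover = [], None, []
        for name, (left, right) in remaining:
            if last_right is None or left > last_right:
                track.append(name)       # fits to the right of the previous net
                last_right = right
            else:
                leftover.append((name, (left, right)))  # try on a later track
        tracks.append(track)
        remaining = leftover
    return tracks


nets = {"n1": (1, 4), "n2": (2, 6), "n3": (5, 8), "n4": (7, 9), "n5": (3, 5)}
tracks = left_edge(nets)
for number, track in enumerate(tracks, start=1):
    print(f"track {number}: {track}")
```

In this example three nets overlap at column 3, so three tracks is the minimum, which is what the sketch produces.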

SLIDE 24

Verification

  • Timing verification
  • Power verification
  • LVS (layout vs schematics)
  • DRC (Design rule check)

optDesign Final Non-SI Timing Summary

+--------------------+---------+---------+---------+---------+
|     Setup mode     |   all   | reg2reg |reg2cgate| default |
+--------------------+---------+---------+---------+---------+
|           WNS (ns):|  0.000  |  0.000  |  0.815  |  0.000  |
|           TNS (ns):|  0.000  |  0.000  |  0.000  |  0.000  |
|    Violating Paths:|    0    |    0    |    0    |    0    |
|          All Paths:|  4906   |  3787   |   38    |  1143   |
+--------------------+---------+---------+---------+---------+

+--------------------+---------+---------+---------+---------+
|     Hold mode      |   all   | reg2reg |reg2cgate| default |
+--------------------+---------+---------+---------+---------+
|           WNS (ns):|  0.003  |  0.003  |  0.009  |  8.622  |
|           TNS (ns):|  0.000  |  0.000  |  0.000  |  0.000  |
|    Violating Paths:|    0    |    0    |    0    |    0    |
|          All Paths:|  4906   |  3787   |   38    |  1143   |
+--------------------+---------+---------+---------+---------+

SLIDE 25

Timing Verification in Backend Design

  • Timing verification after synthesis was based on the cell delay and an assumed interconnect delay (wireload model).
  • After layout, the real interconnect delay can be estimated.
  • Based on routing information (length and type of the metal lines between two pins), the parasitics can be calculated.
  • Two important parameters: R (resistance) and C (capacitance) of the line
  • Interconnect delay:

td = R * C

Figure source: Application-Specific Integrated Circuits - Michael J. S. Smith
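As a worked example of td = R * C, the sketch below derives R and C of a single wire from its geometry and multiplies them. It is a lumped estimate only; the sheet resistance and capacitance per micrometre are assumed illustrative values, not data from any technology file.

```python
def wire_rc_delay_ps(length_um, width_um, sheet_res_ohm_sq, cap_ff_per_um):
    """Lumped estimate td = R * C for one wire segment, in picoseconds."""
    r_ohm = sheet_res_ohm_sq * length_um / width_um  # R = Rsheet * L / W
    c_ff = cap_ff_per_um * length_um                 # C grows linearly with L
    return r_ohm * c_ff * 1e-3                       # ohm * fF = 1e-3 ps


# Assumed values: 0.1 ohm/square, 0.2 fF/um, wire width 0.2 um.
td_1mm = wire_rc_delay_ps(1000, 0.2, 0.1, 0.2)
td_2mm = wire_rc_delay_ps(2000, 0.2, 0.1, 0.2)
print(f"1 mm: {td_1mm:.0f} ps, 2 mm: {td_2mm:.0f} ps")
```

Because both R and C grow with the wire length, doubling the length quadruples the estimated delay, which is why placement tries to keep connected cells close together.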

SLIDE 26

Power Verification

  • Power-related issues are very important in the verification process:
      • Power consumption
      • IR drop
      • Ground bounce
      • EMI
      • Substrate noise
      • Crosstalk

SLIDE 27

DRC & LVS

  • During the Design Rule Check (DRC) verification step, it is verified whether all manufacturer rules have been followed.
  • LVS includes extraction of the schematic from the final layout and comparison with the original netlist which was the input for the layout.
      • The expected result is a full match.
      • A mismatch could indicate problems: shorts, opens, parametric mismatch, etc.

SLIDE 28

Full Back-End Flow

  • Technology and IP setup (libraries, memory/hard macro IP, PDK)
  • Loading of input data (verilog netlist, constraints)
  • Floorplanning
  • Power planning
  • Placement
  • Initial verification and IPO
  • Clock tree insertion
  • Post-CTS verification and IPO
  • Routing
  • Post-Routing Verification and IPO
  • Timing Closure and ECO (Engineering Change Order)
  • Power/Voltage verification
  • DRC
  • LVS
  • Design for Manufacturability (Metal fillers etc)


SLIDE 29

Field-Programmable Gate Arrays (FPGAs)

  • FPGAs are already fabricated chips which can be fully functionally programmed after production.
      • Programming is done by writing into the configuration memory after power-on.
      • The configuration memory is SRAM or Flash.
  • FPGAs consist of configurable logic blocks (CLBs), which can be individually programmed using programmable LUTs and memory blocks.
  • Routing (interconnect) between the CLBs is also programmable, using configurable routing elements.
  • FPGAs are in general less power-efficient and offer lower performance, but NRE costs are reduced to a minimum.
      • Today’s FPGAs contain specialized blocks (embedded processors, DSP) which make them more optimal.

SLIDE 30

Basic Architecture

  • The basic architecture of an FPGA contains elements which can be fully programmed:
      • CLBs
      • Memory
      • IOs
      • Interconnect
      • Clocking

Source figure: Xilinx. Example: Spartan 2

SLIDE 31

Configurable Logic Block (CLB)

  • CLBs enable full functional programmability:
      • programmable lookup tables (LUTs) for arbitrary combinational functions
      • selectable/programmable sequential cells for the targeted distributed memory function
      • multiplexers for interconnecting the correct function

Source figure: Xilinx. Example: Spartan 6
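How a LUT implements an arbitrary combinational function can be shown with a small software model. This is a generic sketch of a 4-input lookup table, not a model of any specific Xilinx primitive; `make_lut` and the bit ordering (first input is the least significant address bit) are assumptions made for the example.

```python
from itertools import product


def make_lut(init, num_inputs=4):
    """Return a function that reads its output from the truth-table bits of init."""
    def lut(*inputs):
        assert len(inputs) == num_inputs
        # The inputs form the address into the table; inputs[0] is the LSB.
        index = sum(bit << i for i, bit in enumerate(inputs))
        return (init >> index) & 1
    return lut


# The same lookup structure "programmed" as two different functions.
and4 = make_lut(0x8000)  # only address 15 (all inputs 1) returns 1
xor4 = make_lut(0x6996)  # parity of the four inputs

for a, b, c, d in product((0, 1), repeat=4):
    assert and4(a, b, c, d) == (a & b & c & d)
    assert xor4(a, b, c, d) == (a ^ b ^ c ^ d)
print("both truth tables match")
```

Only the 16-bit configuration value differs between the two functions, which is the sense in which the logic is programmed by writing configuration memory.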

SLIDE 32

I/O Block

  • IO pads in FPGAs are fully reconfigurable and support:
      • different IO directions (I, O, IO)
      • single-ended/differential signalling
      • different interface standards (CMOS, TTL, LVDS)
      • different power supplies (3.3V, 2.5V, 1.8V, 1.5V, 1.2V)
      • pull-ups and pull-downs, with and without registering

Source figure: Xilinx. Example: Spartan 6

SLIDE 33

FPGA Clocking

  • Clocking in FPGAs is also programmable, based on DCMs (Digital Clock Managers), which can be programmed in frequency/phase and aligned with other clock sources.
  • The clock driver is routed to all relevant sinks: CLBs, memory, IOs.

Source figure: Xilinx. Example: Spartan 6

SLIDE 34

FPGA Design Flow

  • The design flow corresponds to the one for ASICs, but with a different implementation:
      • Synthesis: translation of HDL into FPGA components
      • Place: placing the netlist into the CLBs of the FPGA
      • Route: programming the interconnect to execute the function

Source figure: eet.com

SLIDE 35

FPGA Pros and Cons

Pros

  • Reduced NRE costs: no mask costs, reduced design costs
  • Reduced design time: no need to wait for chip samples
  • Easy correction: only reprogramming is needed

Cons

  • High unit costs: one FPGA can cost as much as ~10k€
  • Higher power consumption
  • Reduced performance

Today’s FPGAs are much more optimized, integrating multiprocessors, DSPs, interfaces, etc. on chip.

SLIDE 36

Example: Xilinx Zynq UltraScale+

  • Example of an optimized FPGA platform
  • Multi-core ARM system implemented on chip
  • Large memory resources
  • Advanced connectivity (USB, PCIe, CAN, SATA, etc.)
  • Real-time support
  • Combination with programmable logic
  • Support for high-speed serial interfaces

Source figure: Xilinx

SLIDE 37

Conclusions

  • The process of designing ASICs was analysed here in detail.
  • The main steps include synthesis, back-end, and timing verification.
  • During the practical part we will analyze the steps using software CAD tools.
  • The FPGA flow is similar to the ASIC flow.