SLIDE 1
ASIC Layout
SLIDE 2 Overview
- Design flow
- Back-end process
- FPGA design process
- Conclusions
SLIDE 3
ASIC Design flow
Source: http://www.ami.ac.uk
SLIDE 4 What is Backend?
1. Floorplanning: Architect’s job
2. Placement: Builder’s job
3. Routing: Electrician’s job
SLIDE 5 Input for Layout Tools
Libraries:
- Physical Libraries (LEF/OA)
- Cell boundaries, pins, routing rules
- Timing Libraries (*.lib)
Optional Input Files:
- Floorplan File
- IO File
- Scan Definition File
Input:
- Verilog Gate Level Netlist
- Timing Constraint files, for all modes (*.sdc)
Optional Libraries:
- Technology Files (Cap Tables, QRC Tech file)
- SI Libraries (*.cdb)
SLIDE 6 Import Design Procedure – Global Definition File
File - Import Design
- Verilog Netlist File(s)
- OA-Flow: Reference and Custom Libraries of Standard Cells, IOs, Custom Blocks, Rams …
- or LEF files (LEF/DEF Flow)
- Specify MMMC (Multi Mode Multi Corner) view file: links timing libraries, RC corners, and constraints per view
- Power/Ground (Special) net definitions
- CPF: Common Power Format (Low-Power Design/Power Islands)
Command:
source <myfile>.globals
init_design
SLIDE 7 Structure of a Die
- Silicon die is mounted inside a chip package.
- A die consists of a logic core inside a power ring.
- Special power pads are used for the VDD and VSS (Core and Pad).
SLIDE 8
The Design Implementation Flow
SLIDE 9 Floorplanning
- Floorplanning is a very important step in layout design.
- Important objectives:
  - Chip size
  - Aspect ratio
  - Placement of basic building blocks
  - IO placement
- The definition of chip size and aspect ratio, along with the placement of its building blocks (memories, hard macros), strongly affects the chip routability and the final performance.
- The pads should be placed so that they meet the minimum pitch requirements defined by the packaging methodology.
SLIDE 10 Placement and Routing
- Placement
  - Defines the position of each cell from the netlist
  - Placement is performed in the defined rows
  - The target is to place connected cells in neighboring positions to reduce the timing penalty
- Routing
  - Creates the connections between the cells (and IOs)
  - Metal lines are used for the routing
  - The objective is to reduce the interconnection length (reducing line capacitance, i.e. interconnection delay)
SLIDE 11 Back-end Design decisions
- Core-limited and pad-limited designs
  - The design size can be defined either by the core size or by the pad size.
  - In general, the design complexity is defined by the number of gates (reflected in the core area).
  - However, the pads are disproportionately large, so when there are many of them they can define the chip area (pad-limited design).
  - The opposite case is a core-limited design.
- The aspect ratio of the chip has to be chosen such that it does not affect the chip routability and that it matches the packaging. An aspect ratio of 1.0 defines a square chip. This shape is optimal with respect to placement and routing.
- The size of the power rings depends on the estimated power consumption of the chip. Since the power pads are usually distributed evenly on all four sides of the chip, the maximum current flow through the power rings is ¼ of the total estimated current.
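The two decisions on this slide can be illustrated with a small sketch. It assumes a square die with the pads spread evenly over four sides, and all names and parameters (chip_limit, pad_pitch_um, pad_height_um, and so on) are hypothetical values chosen for illustration, not taken from any tool or library.

```python
import math


def chip_limit(core_area_um2, n_pads, pad_pitch_um, pad_height_um):
    """Classify a square die as core-limited or pad-limited.

    Assumption: pads are distributed evenly over the four sides and
    every side also needs room for the pad ring at both corners.
    """
    # Side length dictated by the core plus the pad ring on both ends.
    core_side = math.sqrt(core_area_um2) + 2 * pad_height_um
    # Side length dictated by fitting a quarter of the pads at minimum pitch.
    pads_per_side = math.ceil(n_pads / 4)
    pad_side = pads_per_side * pad_pitch_um + 2 * pad_height_um
    kind = "pad-limited" if pad_side > core_side else "core-limited"
    return kind, max(core_side, pad_side)


def ring_current_a(total_power_w, vdd_v):
    """Rule of thumb from the slide: a ring carries 1/4 of the total current."""
    total_current = total_power_w / vdd_v
    return total_current / 4
```

A design with few gates but many pads ends up pad-limited in this model, while a large core with few pads is core-limited; the ring current estimate would then feed into the choice of ring width.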
SLIDE 12 Placement
- ASIC placement is performed in rows
- Routing can be performed in both directions: horizontal and vertical
- The chip size strongly depends on the chosen core (row) utilization.
  - A typical value of core utilization is 75%.
  - If the chip contains complex logic requiring excessive routing, the user should consider relaxing the core utilization.
  - If the chip logic is relatively simple, the user may try to tighten the utilization value in order to reduce the chip size.
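The relation between utilization and core size can be sketched as follows. This is a minimal sketch under the assumption that utilization is simply total standard-cell area divided by core area, and that the aspect ratio is height divided by width; the function name and parameters are hypothetical.

```python
import math


def core_dimensions(cell_area_um2, utilization=0.75, aspect_ratio=1.0):
    """Return (width, height) of the core in micrometres.

    Assumptions: utilization = cell area / core area,
    aspect_ratio = height / width.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    core_area = cell_area_um2 / utilization
    width = math.sqrt(core_area / aspect_ratio)
    height = aspect_ratio * width
    return width, height
```

Relaxing the utilization (for example from 0.75 to 0.60) enlarges the core for the same cell area, which leaves more room for routing; tightening it shrinks the core.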
SLIDE 13 Objectives of Placement Process
- Placing each individual cell in the rows
- Reducing the placement distance between connected cells
- Performing high-density placement
- Reducing the timing overhead and power consumption
- Addressing the routing challenges (avoiding routing congestion)
- Timing-driven placement tries to fulfil the timing constraints while performing placement
  - It is coupled with trial routing and RC extraction to estimate the effects of the placement choices
SLIDE 14 Placement Algorithms
- Two general types of algorithms:
  - Constructive placement
  - Iterative placement improvement
- Constructive placement methods
  - Min-cut algorithm or eigenvalue method
  - Start with a constructed solution, followed by iterative improvement
- The min-cut placement method uses successive application of partitioning:
  - Cut the area into two pieces.
  - Swap the cells to minimize the cost.
  - Repeat the process, cutting smaller pieces until all the logic cells are placed.
- The eigenvalue placement algorithm uses the cost matrix or weighted connectivity matrix.
Figure (min-cut placement): (a) Divide the chip into bins using a grid. (b) Merge all connections to the center of each bin. (c) Make a cut and swap cells between bins to minimize the cost. (d) Throw out all the edges that are not inside the piece. (e) Repeat the process and continue down to the individual bins.
Source: Application-Specific Integrated Circuits - Michael J. S. Smith
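The cut, swap and repeat steps of min-cut placement can be sketched in a few lines. This is a deliberately simplified illustration: it places cells into a single row of slots, uses greedy pairwise swaps rather than a production partitioner, and all names (cut_size, bisect, min_cut_place) are hypothetical.

```python
def cut_size(nets, left):
    """Count the nets that have cells on both sides of the cut."""
    left = set(left)
    return sum(1 for net in nets if 0 < sum(c in left for c in net) < len(net))


def bisect(cells, nets):
    """Cut the cells into two halves, then swap pairs while the cut shrinks."""
    half = len(cells) // 2
    left, right = list(cells[:half]), list(cells[half:])
    improved = True
    while improved:
        improved = False
        best = cut_size(nets, left)
        for i in range(len(left)):
            for j in range(len(right)):
                left[i], right[j] = right[j], left[i]      # trial swap
                cost = cut_size(nets, left)
                if cost < best:
                    best, improved = cost, True            # keep the swap
                else:
                    left[i], right[j] = right[j], left[i]  # undo the swap
    return left, right


def min_cut_place(cells, nets):
    """Return the cells in left-to-right slot order."""
    if len(cells) <= 1:
        return list(cells)
    inside = set(cells)
    # Throw out the parts of each net that are not inside this piece.
    local = [[c for c in net if c in inside] for net in nets]
    local = [net for net in local if len(net) >= 2]
    left, right = bisect(cells, local)
    # Repeat the process on the two smaller pieces.
    return min_cut_place(left, nets) + min_cut_place(right, nets)


order = min_cut_place(["a", "b", "c", "d"], [["a", "c"], ["b", "d"]])
```

In this small example the connected pairs end up in neighboring slots. A real placer would work on a 2-D grid of bins and account for connections to cells outside the current piece, which this sketch ignores.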
SLIDE 15 Iterative Placement
- Starting from an initial placement, further improvements are made
  - Selection criteria decide which cells should be moved.
  - Measurement criteria decide whether to move the selected cells.
- Methods: pairwise interchange, force-directed interchange, force-directed relaxation, and force-directed pairwise relaxation.
- All methods are based on selecting a pair of cells to be exchanged.
- First the cell to be examined is selected; then an exchange with each of the other candidate cells is evaluated against the cost criteria. The set of candidate pairs can be limited through the Manhattan distance.
Figure: (a) Swapping two cells. (b) Swapping more cells provides better results but is more complex. (c) A one-neighborhood. (d) A two-neighborhood.
Source: Application-Specific Integrated Circuits - Michael J. S. Smith
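Pairwise interchange can be illustrated with a short sketch. It assumes half-perimeter wirelength as the cost criterion and an optional Manhattan-distance limit on candidate pairs; both are choices made for this example, and the names (wirelength, pairwise_interchange, radius) are hypothetical.

```python
def wirelength(pos, nets):
    """Total half-perimeter wirelength; pos maps cell -> (x, y) slot."""
    total = 0
    for net in nets:
        xs = [pos[c][0] for c in net]
        ys = [pos[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def pairwise_interchange(pos, nets, radius=None):
    """Swap pairs of cells as long as a swap lowers the wirelength.

    radius limits candidate pairs to a Manhattan neighborhood
    (None means every pair is considered).
    """
    pos = dict(pos)  # work on a copy
    improved = True
    while improved:
        improved = False
        for a in pos:
            for b in pos:
                if a == b:
                    continue
                dist = abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])
                if radius is not None and dist > radius:
                    continue
                before = wirelength(pos, nets)
                pos[a], pos[b] = pos[b], pos[a]          # trial swap
                if wirelength(pos, nets) < before:
                    improved = True                      # keep the swap
                else:
                    pos[a], pos[b] = pos[b], pos[a]      # undo the swap
    return pos


start = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (3, 0)}
nets = [("a", "d"), ("b", "c")]
final = pairwise_interchange(start, nets)
```

Because only improving swaps are accepted, the procedure can stop in a local minimum; a smaller radius makes each pass cheaper but restricts which improvements can be found.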
SLIDE 16 Clock synthesis
- A clock network needs to be implemented to drive all sink elements (flip-flops, latches, etc.) from the same source line
- The clock network consists of a large number of buffers, inverters and clock gates
- The objective is to reduce the phase difference between the clock at the different clock sinks (clock skew)
- An additional goal is to reduce the clock latency (which depends on the clock tree complexity and interconnection delay)
- The clock is a significant source of power consumption (~50% in modern designs), so reducing it is an objective
- Many sinks also use the falling edge of the clock
  - An important objective is balancing the rise and fall times.
- The clock tree is defined in the clock tree definition file
SLIDE 17 Clock trees
- A path from the clock source to clock sinks
Figure source: vlsi.pro
SLIDE 18
Concept of Clock Tree
Figure labels: Clock pad, Clock tree, Sub trees
SLIDE 19 Clock Skew
- Clock skew is the maximum difference in the arrival time of a clock signal at two different sinks (flip-flops, latches etc.).
- Clock skew can lead to a performance drop or to the need for hold-time fixing (adding buffers), which results in additional power and area
- Clock skew should be minimized
Figure source: vlsi.pro
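The skew definition above translates directly into a calculation over the clock arrival times at the sinks. The sketch below assumes the arrival times are already known (for example from a timing report); the sink names and values are made up for illustration.

```python
def clock_skew(arrival_ns):
    """Maximum difference in clock arrival time between any two sinks."""
    times = list(arrival_ns.values())
    return max(times) - min(times)


def clock_latency(arrival_ns):
    """Longest delay from the clock source to any sink."""
    return max(arrival_ns.values())


# Hypothetical arrival times in nanoseconds, keyed by sink name.
arrivals = {"ff1": 1.20, "ff2": 1.35, "latch1": 1.10}
skew = clock_skew(arrivals)
latency = clock_latency(arrivals)
```

Balancing the clock tree aims to bring the earliest and latest arrival times together, which reduces the value returned by clock_skew.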
SLIDE 20 Clock Gating and CTS
- Clock gating is often used as a methodology for reducing power consumption
  - The clock network uses ~50% of the power budget
  - By switching off the network when it is not needed, the consumption can be dramatically reduced
- Clock gating needs to be taken into consideration during CTS (clock tree synthesis)
  - Clock gates are part of the clock tree and contribute to the skew
  - Clock tree balancing is required between non-gated and gated subtrees
SLIDE 21 Routing
- The goal of routing is to minimize the interconnect delay
  - Routing is performed automatically using the different available layers of metal connections
  - Design rules need to be fulfilled (minimum spacing etc.)
  - Different types of routing (trial, clock routing, final routing) depending on the design phase
  - Global routing: first phase of the final routing, connecting blocks
  - Detailed routing: final routing of all interblock connections
SLIDE 22 Manhattan Routing Algorithm
- Motivated by the streets of New York
  - Straight connections in the horizontal and vertical directions
  - Specific metal lines only for the vertical or only for the horizontal direction
  - Avoiding interconnection problems
  - Routing channels defined
- The distance is the sum of the distances in the X-axis and Y-axis directions
- There are now much more advanced algorithms
Figure labels: Pin A, Pin B, Pin C, Pin D, Metal 1, Metal 2
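The Manhattan distance and a simple two-segment route can be sketched as follows. The assignment of Metal 1 to horizontal and Metal 2 to vertical segments is an assumption made for this example, and the function names are hypothetical.

```python
def manhattan(a, b):
    """Sum of the X and Y distances between two pins (x, y)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def l_route(a, b):
    """Connect two pins with one horizontal and one vertical segment.

    Assumption: Metal 1 runs horizontally, Metal 2 vertically,
    with a via at the corner point.
    """
    corner = (b[0], a[1])
    return [("Metal 1", a, corner), ("Metal 2", corner, b)]


pin_a, pin_b = (0, 0), (3, 4)
segments = l_route(pin_a, pin_b)
length = sum(manhattan(start, end) for _, start, end in segments)
```

The total length of the two segments equals the Manhattan distance between the pins, which is why this metric is used as the wirelength estimate for this routing style.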
SLIDE 23
Left-Edge Routing Algorithm
Source: Application-Specific Integrated Circuits - Michael J. S. Smith
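A minimal sketch of the left-edge idea: horizontal net segments in a channel are sorted by their left edge and packed into tracks so that segments on the same track do not overlap. This version uses first-fit track selection and ignores vertical constraints between nets, which is a simplification; the segment names and coordinates are made up.

```python
def left_edge(segments):
    """Assign (name, left, right) segments to tracks.

    Returns a list of tracks, each a list of non-overlapping segments.
    Simplification: vertical constraints between nets are ignored.
    """
    tracks = []
    for name, left, right in sorted(segments, key=lambda s: s[1]):
        for track in tracks:
            # The segment fits if it starts after the last one on the track ends.
            if track[-1][2] < left:
                track.append((name, left, right))
                break
        else:
            tracks.append([(name, left, right)])  # open a new track
    return tracks


channel = [("A", 0, 3), ("B", 1, 5), ("C", 4, 7), ("D", 6, 9)]
tracks = left_edge(channel)
```

In this example no more than two segments overlap at any column, so two tracks are enough.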
SLIDE 24 Verification
- Timing verification
- Power verification
- LVS (Layout vs. Schematic)
- DRC (Design Rule Check)
optDesign Final Non-SI Timing Summary
+--------------------+---------+---------+---------+---------+
|     Setup mode     |   all   | reg2reg |reg2cgate| default |
+--------------------+---------+---------+---------+---------+
|           WNS (ns):|  0.000  |  0.000  |  0.815  |  0.000  |
|           TNS (ns):|  0.000  |  0.000  |  0.000  |  0.000  |
|    Violating Paths:|    0    |    0    |    0    |    0    |
|          All Paths:|  4906   |  3787   |   38    |  1143   |
+--------------------+---------+---------+---------+---------+
+--------------------+---------+---------+---------+---------+
|     Hold mode      |   all   | reg2reg |reg2cgate| default |
+--------------------+---------+---------+---------+---------+
|           WNS (ns):|  0.003  |  0.003  |  0.009  |  8.622  |
|           TNS (ns):|  0.000  |  0.000  |  0.000  |  0.000  |
|    Violating Paths:|    0    |    0    |    0    |    0    |
|          All Paths:|  4906   |  3787   |   38    |  1143   |
+--------------------+---------+---------+---------+---------+
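The rows of such a summary can be reproduced from a list of path slacks. The sketch below assumes the common convention that WNS is the smallest slack, TNS is the sum of all negative slacks, and a path violates when its slack is negative; how a particular tool reports these values may differ, and the function name is hypothetical.

```python
def timing_summary(slacks_ns):
    """Summarize path slacks (in ns) in the style of a timing report.

    Assumed conventions: WNS = worst (smallest) slack,
    TNS = sum of negative slacks, violating = slack below zero.
    """
    negative = [s for s in slacks_ns if s < 0]
    return {
        "WNS": min(slacks_ns),
        "TNS": sum(negative),
        "Violating Paths": len(negative),
        "All Paths": len(slacks_ns),
    }


clean = timing_summary([0.815, 0.2, 1.0])
failing = timing_summary([-0.1, 0.3, -0.25])
```

A design has met timing for a given mode when the number of violating paths is zero, as in the report above.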
SLIDE 25 Timing Verification in Backend Design
- Timing verification after synthesis was possible based on the cell delay and an assumed interconnect delay (wireload model)
- After layout the real interconnect delay can be estimated
- Based on routing information (length, types of metal lines between two pins) the parasitics can be calculated
- Two important parameters: R (resistance) and C (capacitance) of the line
- Interconnect delay
td = R * C
Figure source: Application-Specific Integrated Circuits - Michael J. S. Smith
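As a worked example of td = R * C, the sketch below derives R and C from the wire length using per-unit-length values. It assumes a simple lumped model in which both R and C scale linearly with length; the parameter names and the numbers in the usage lines are hypothetical and do not come from any technology file.

```python
def wire_delay_ns(length_um, r_ohm_per_um, c_ff_per_um):
    """Lumped interconnect delay td = R * C, returned in nanoseconds."""
    r_ohm = r_ohm_per_um * length_um            # total line resistance
    c_farad = c_ff_per_um * length_um * 1e-15   # total line capacitance
    return r_ohm * c_farad * 1e9                # seconds -> nanoseconds


# Hypothetical values: 0.1 ohm/um and 0.2 fF/um.
short_wire = wire_delay_ns(1000, 0.1, 0.2)
long_wire = wire_delay_ns(2000, 0.1, 0.2)
```

Since both R and C grow with length in this model, doubling the wire length quadruples the delay, which is one reason placement tries to keep connected cells close together.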
SLIDE 26 Power Verification
- Power-related issues are very important in the verification process:
  - Power consumption
  - IR drop
  - Ground bounce
  - EMI
  - Substrate noise
  - Crosstalk
SLIDE 27 DRC & LVS
- During the Design Rule Check (DRC) verification step, it is verified whether all manufacturer rules have been followed
- LVS includes extraction of the schematic from the final layout and comparison with the original netlist which was the input for the layout
  - The expected result is a full match
  - A mismatch could indicate problems: shorts, opens, parametric mismatch etc.
SLIDE 28 Full Back-End Flow
- Technology and IP setup (libraries, memory/hard macro IP, PDK)
- Loading of input data (Verilog netlist, constraints)
- Floorplanning
- Power planning
- Placement
- Initial verification and IPO
- Clock tree insertion
- Post-CTS verification and IPO
- Routing
- Post-Routing Verification and IPO
- Timing Closure and ECO (Engineering Change Order)
- Power/Voltage verification
- DRC
- LVS
- Design for Manufacturability (Metal fillers etc)
SLIDE 29 Field-Programmable Gate Arrays (FPGAs)
- FPGAs are already-fabricated chips which can be fully functionally programmed after production
  - Programming is done by writing into the configuration memory after power-on
  - The configuration memory is SRAM or Flash
- FPGAs consist of configurable logic blocks (CLBs), which can be individually programmed using programmable LUTs and memory blocks
- Routing (interconnect) between the CLBs is also programmable using configurable routing elements
- FPGAs are in general less power efficient and offer lower performance, but NRE costs are reduced to a minimum
  - Today FPGAs contain specialized blocks (embedded processors, DSP) which make them more efficient
SLIDE 30
Basic Architecture
Source figure: Xilinx
Example: Spartan 2
- The basic architecture of an FPGA contains elements which can be fully programmed:
  - CLBs
  - Memory
  - IOs
  - Interconnect
  - Clocking
SLIDE 31
Configurable Logic Block (CLB)
Source figure: Xilinx
Example: Spartan 6
- CLBs enable full functional programmability
  - Programmable lookup tables (LUTs) for arbitrary combinational functions
  - Selectable/programmable sequential cell for targeted distributed memory function
  - Use of multiplexers for interconnecting the correct function
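How a lookup table implements an arbitrary combinational function can be shown with a small software model. This is an illustrative sketch, not a description of any vendor's primitive: the class name, the bit ordering of the configuration word, and the example configurations are assumptions.

```python
class LUT:
    """Software model of an n-input lookup table.

    Assumption: bit i of init holds the output for the input pattern
    whose binary value is i (first input is the least significant bit).
    """

    def __init__(self, n_inputs, init):
        self.n_inputs = n_inputs
        self.init = init  # the configuration bits ("truth table")

    def __call__(self, *inputs):
        if len(inputs) != self.n_inputs:
            raise ValueError("wrong number of inputs")
        index = sum(bit << i for i, bit in enumerate(inputs))
        return (self.init >> index) & 1


# The same hardware structure becomes a different gate
# just by loading different configuration bits.
and2 = LUT(2, 0b1000)
xor2 = LUT(2, 0b0110)
```

Reprogramming the FPGA amounts to rewriting such configuration bits, which is why the function of a CLB can be changed after fabrication.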
SLIDE 32
I/O Block
Source figure: Xilinx
Example: Spartan 6
- IO pads in FPGAs are fully reconfigurable
  - Support for different IO directions (I, O, IO)
  - Single-ended / differential
  - Different interface standards (CMOS, TTL, LVDS)
  - Different power supplies (3.3V, 2.5V, 1.8V, 1.5V, 1.2V)
  - Pull-ups, pull-downs, with and without registering
SLIDE 33 FPGA Clocking
Example: Spartan 6
- Clocking in FPGAs is also programmable
  - Based on DCMs (Digital Clock Managers), which can be programmed in frequency/phase and aligned with other clock sources
- The clock driver is routed to all relevant sinks: CLBs, memory, IOs
Source figure: Xilinx
SLIDE 34 FPGA Design Flow
Source figure eet.com
- The design flow corresponds to the one for ASICs, but with a different implementation
  - Synthesis: translation of HDL into components of the FPGA
  - Place: placing the netlist into the CLBs of the FPGA
  - Route: programming the interconnects to execute the function
SLIDE 35
FPGA Pros and Cons
- Pros
  - Reduced NRE costs: no mask costs, reduced design costs
  - Reduced design time: no need to wait for chip samples
  - Easy correction: only reprogramming needed
- Cons
  - High unit costs: a single FPGA can cost as much as ~10k€
  - Higher power consumption
  - Reduced performance
- Today’s FPGAs are much more optimized
  - Integrating multiprocessors on chip, DSPs, interfaces etc.
SLIDE 36
Example: Xilinx Zynq UltraScale+
- Example of an optimized FPGA platform
  - Multi-core ARM system implemented on chip
  - Large memory resources
  - Advanced connectivity (USB, PCIe, CAN, SATA, etc.)
  - Real-time support
  - Combined with programmable logic
  - Support for high-speed serial interfaces
Source figure: Xilinx
SLIDE 37 Conclusions
- The process of designing ASICs was analyzed here in detail.
- The main steps include synthesis, back-end design and timing verification.
- During the practical part we will analyze the steps using software CAD tools.
- The FPGA flow is similar to the ASIC flow.