Synthesis and optimization of domino logic

Min Zhao and Sachin Sapatnekar
Department of Electrical Engineering
University of Minnesota
Minneapolis, MN 55455

Outline
- Introduction to domino logic
- Domino logic synthesis flow
- Technology mapping of domino logic
- Timing-driven static-domino partitioning

Basics of domino logic
[Figure: domino gate schematic and clock waveform. Labels: clk, out, inputs x, y, z; precharge and evaluation phases; d: dynamic node; timing annotations T_c,f, T_c,r, T_c,f + P.]

Advantages of domino logic
- Speed advantages
  - Reduced fighting (contention) during transitions
  - Fewer transistors per gate, hence lower capacitive load
- Area advantages
  - Consists mainly of NMOS transistors
  - N+4 transistors per N-input gate instead of 2N
- Therefore, domino logic is widely used in high-performance circuit design.
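The area claim can be checked with a few lines of arithmetic. The sketch below assumes N is the number of gate inputs and uses only the two formulas quoted on the slide (2N for static CMOS, N+4 for domino); the function names are illustrative, and device sizing, keepers, and duplicated logic are ignored.

```python
def static_cmos_transistors(n_inputs: int) -> int:
    """Static CMOS: one NMOS and one PMOS transistor per input (2N)."""
    return 2 * n_inputs


def domino_transistors(n_inputs: int) -> int:
    """Domino: N pull-down transistors plus 4 overhead transistors (N+4)."""
    return n_inputs + 4


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, static_cmos_transistors(n), domino_transistors(n))
```

Under these formulas the two styles break even at N = 4, so the transistor-count advantage only appears for gates with more than four inputs.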

Disadvantages of domino logic
- Disadvantages
  - Non-inverting nature may require logic duplication
  - Strict timing constraints
  - Charge sharing and noise susceptibility
  - High clock routing overhead
- Automated techniques that account for these issues are needed for domino circuit design.

Domino logic synthesis flow
[Figure: flow chart. Box labels, in extraction order:]
- Logic description (BLIF, Verilog)
- Timing constraints
- Technology-independent optimization
- Clocking strategy
- Partitioning: static-domino, and between clock phases
- Parameterized library technology mapping
- Timing verification and optimization
- Library layout synthesizer
- Noise verification and optimization
- Physical design

Technology mapping of domino logic

What is technology mapping?
- Implement the input network with gates from a library.
[Figure: example network with nodes a, b, c, d, e, f, g, h]

Parameterized library
- Large NMOS pull-down network in each domino gate
  - Small short-circuit current and small driven load
  - No complementary part
  - The delay overhead of the inverter may offset the advantage of fast switching speeds in small gates.
- The number of library gates increases dramatically with the length (s) and width (p) of the gate.
  - (s,p) = (3,6): 6877; (4,4): 3503; (4,6): 222943
- A parameterized library is therefore used for technology mapping of domino logic.

Problem definition
- A parameterized library
  - A collection of gates that satisfy the constraints on the width and height of the pull-down (pull-up) implementation of a gate
  - Cell layout is produced on the fly
- Technology mapping of domino logic
  - Given
    - An optimized Boolean network
    - A constraint on the width and height of domino gates
  - Find
    - A minimum-cost implementation of the network nodes in domino logic

General technology mapping algorithm
- A dynamic programming algorithm is applied.
- At each network node
  - Pattern matching
  - Cost calculation for each possible match
- The computational cost of matching will be large if the library is large.

Parameterized library mapping algorithm
- Starting point
  - Given an arbitrarily optimized network
  - It is first unated
  - Then mapped into a two-input AND-OR DAG
  - Then the DAG is decomposed into trees.
- Complexity
  - Space complexity: O(WHN)
  - Time complexity: O(W^2 H^2 N)
  - W: maximum number of parallel chains
  - H: maximum number of series transistors
  - N: number of nodes in the tree
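The slides do not say how the network is unated. One common way, shown here only as a sketch, is to push inversions down to the primary inputs with De Morgan's laws so that only AND/OR nodes and (possibly complemented) input literals remain. The nested-tuple expression format and the `unate` name are assumptions made for this illustration.

```python
def unate(expr, negate=False):
    """Push NOTs to the leaves using De Morgan's laws.

    expr is either a primary-input name (str), ("not", e),
    ("and", e1, e2) or ("or", e1, e2). A complemented input
    is written with a trailing apostrophe, e.g. "a'".
    """
    if isinstance(expr, str):
        return expr + "'" if negate else expr
    op = expr[0]
    if op == "not":
        # An inversion flips the polarity requested of the operand.
        return unate(expr[1], not negate)
    left = unate(expr[1], negate)
    right = unate(expr[2], negate)
    if negate:
        # De Morgan: NOT(a AND b) = a' OR b', NOT(a OR b) = a' AND b'.
        op = "or" if op == "and" else "and"
    return (op, left, right)


if __name__ == "__main__":
    print(unate(("not", ("and", "a", ("or", "b", "c")))))
```

In a DAG, a node that is needed in both polarities has to be built twice under this scheme, which is one source of the logic duplication listed among the disadvantages of domino logic.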

Subsolutions
- Subsolution space at each node
  [Figure: W-by-H grid of {S,P} subsolutions; the marked example has S = 2, P = 3, with S ≤ H and P ≤ W]
- Each stored subsolution is optimal for its subtree under the specified constraints.
- Physically,
  - {S,P} (S ≥ 1 & P ≥ 1) represents a segment of a domino pull-down whose height and width are S and P
  - {1,1} represents a complete domino gate or a PI.

Basic operations
- OR operation: S = max(S_l, S_r), P = P_l + P_r
- AND operation: S = S_l + S_r, P = max(P_l, P_r)
- PI / gate formation operation: S = 1, P = 1
  - A gate formation operation corresponds to a situation where the structure collected so far is converted to a domino gate with an output at that network node.
[Figure: gate formation at an AND node; clocked domino gates with PI inputs]
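The three basic operations translate directly into small functions on (S, P) pairs. This is a minimal sketch; the function names and the tuple representation are assumptions, while the formulas are the ones on the slide.

```python
from typing import Tuple

Shape = Tuple[int, int]  # (S, P): series height, parallel width


def or_op(left: Shape, right: Shape) -> Shape:
    """OR places the two segments in parallel: widths add, height is the max."""
    (sl, pl), (sr, pr) = left, right
    return (max(sl, sr), pl + pr)


def and_op(left: Shape, right: Shape) -> Shape:
    """AND places the two segments in series: heights add, width is the max."""
    (sl, pl), (sr, pr) = left, right
    return (sl + sr, max(pl, pr))


def gate_formation() -> Shape:
    """A finished gate (or a PI) looks like a single input to its parent."""
    return (1, 1)


def within_limits(shape: Shape, h_max: int, w_max: int) -> bool:
    """Check the library constraints S <= H and P <= W."""
    s, p = shape
    return s <= h_max and p <= w_max


if __name__ == "__main__":
    print(or_op((2, 1), (1, 2)), and_op((2, 1), (1, 2)))
```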

Node data structure
- Store the optimal subsolutions for all possible [height, width] combinations from [1,1] to [H,W].
- Each optimal subsolution can be represented as {S, P, C, {S_l, P_l}, {S_r, P_r}}
  - S (1 ≤ S ≤ H) is the maximum height of the current solution.
  - P (1 ≤ P ≤ W) is the maximum width of the current solution.
  - C is the cost.
  - {S_l, P_l}, {S_r, P_r} are the subsolutions of the left and right child whose combination provides the minimal cost of subsolution {S,P}.

Node data calculations
- The {S, P} (S ≥ 1 & P ≥ 1) subsolution at a parent node is obtained by combining optimal subsolutions at the child nodes.
- The {1, 1} subsolution at a node is obtained from the subsolution of the same node whose cost is minimal.
- The procedure consists of
  - Node constraint functions
  - Node cost functions
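One possible encoding of the per-node table is sketched below. The class and method names (`Subsolution`, `NodeTable`, `offer`, `cheapest`) are hypothetical; the fields follow the {S, P, C, {S_l, P_l}, {S_r, P_r}} record described above, and a dictionary keyed by (S, P) stands in for the H-by-W array.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Shape = Tuple[int, int]  # (S, P)


@dataclass(frozen=True)
class Subsolution:
    s: int                         # height, 1 <= s <= H
    p: int                         # width, 1 <= p <= W
    cost: int                      # C, here a transistor count
    left: Optional[Shape] = None   # {S_l, P_l} chosen at the left child
    right: Optional[Shape] = None  # {S_r, P_r} chosen at the right child


class NodeTable:
    """Keeps the cheapest subsolution seen so far for each (S, P)."""

    def __init__(self, h_max: int, w_max: int) -> None:
        self.h_max, self.w_max = h_max, w_max
        self.entries: Dict[Shape, Subsolution] = {}

    def offer(self, cand: Subsolution) -> bool:
        """Store cand if it is legal and cheaper than the current entry."""
        if not (1 <= cand.s <= self.h_max and 1 <= cand.p <= self.w_max):
            return False
        current = self.entries.get((cand.s, cand.p))
        if current is None or cand.cost < current.cost:
            self.entries[(cand.s, cand.p)] = cand
            return True
        return False

    def cheapest(self) -> Subsolution:
        """The minimum-cost entry, i.e. the source of the {1,1} subsolution."""
        return min(self.entries.values(), key=lambda e: e.cost)
```

Because at most H*W entries are kept per node, a table like this is consistent with the O(WHN) space bound quoted earlier.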

Node cost functions
- Here, cost is area, measured as the number of transistors.
- Literal operation: C = C + 1
  - A literal operation corresponds to a primary input or to a situation where a new domino structure is started after a gate formation operation.
- OR/AND operation: C = Literal(C_l) + Literal(C_r)
- Gate formation operation: C = C_min + 4
  - The minimal-cost solution, C_min, is the minimal value out of all H*W optimal subsolutions.
  - The '4' accounts for two clock control transistors plus an inverter.

Node mapping algorithm
    for each valid [height, width] subsolution {S_l, P_l} of the left child {
        for each valid [height, width] subsolution {S_r, P_r} of the right child {
            {S, P} = node constraint functions({S_l, P_l}, {S_r, P_r});
            if {S, P} is within the constraints (H, W) {
                C = node cost functions(C_l, C_r);
                if (C < C[S, P]_min) then C[S, P]_min = C;
                if (C < C_min) then C_min = C;
            }
        }
    }
    C[1, 1] = gate formation(C_min)
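The loop above can be sketched as a recursive pass over a two-input AND/OR tree. Several details here are assumptions rather than the authors' implementation: trees are nested tuples with input names as leaves, a PI has cost 0, and `Literal()` is read as "add one transistor when the child subsolution is {1,1} (a PI or a finished gate), otherwise add nothing". Backpointers are omitted for brevity.

```python
GATE_OVERHEAD = 4  # two clock transistors plus an inverter, per the slide


def literal(shape, cost):
    # Assumed reading of Literal(): a PI or finished gate used as an
    # input of a new pull-down segment costs one extra transistor.
    return cost + 1 if shape == (1, 1) else cost


def map_node(node, h_max, w_max):
    """Return {(S, P): minimal cost} for the subtree rooted at node.

    node is a PI name (str) or ("and" | "or", left, right).
    Assumes h_max >= 2 and w_max >= 2 so every node has a legal shape.
    """
    if isinstance(node, str):
        return {(1, 1): 0}
    op, left, right = node
    left_tab = map_node(left, h_max, w_max)
    right_tab = map_node(right, h_max, w_max)
    table = {}
    for (sl, pl), cl in left_tab.items():
        for (sr, pr), cr in right_tab.items():
            # Node constraint functions.
            if op == "or":
                s, p = max(sl, sr), pl + pr
            else:
                s, p = sl + sr, max(pl, pr)
            if s > h_max or p > w_max:
                continue
            # Node cost function.
            c = literal((sl, pl), cl) + literal((sr, pr), cr)
            if c < table.get((s, p), float("inf")):
                table[(s, p)] = c
    # Gate formation: close the cheapest segment into a domino gate.
    table[(1, 1)] = min(table.values()) + GATE_OVERHEAD
    return table


if __name__ == "__main__":
    tree = ("and", ("or", "a", "b"), "c")
    print(map_node(tree, 4, 4))
```

For the tree (a OR b) AND c with H = W = 4, the root table is {(2,2): 3, (2,1): 8, (1,1): 7}: a single gate with three pull-down transistors plus the overhead of four is cheaper than forming a separate gate for the OR. The two nested loops visit at most (WH)^2 pairs per node, in line with the O(W^2 H^2 N) time bound stated earlier.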

An example
- Of all (S,P) mapping subsolutions for the children, only those with minimal cost are stored.
[Figure: AND/OR tree over primary inputs, with each node annotated by its {S, P, C} subsolutions and the minimum C_min = 8 marked. Rules shown in the figure:]
- AND node: C = C_l + C_r, P = max(P_l, P_r), S = S_l + S_r
- OR node: C = C_l + C_r, P = P_l + P_r, S = max(S_l, S_r)
- Gate formation: C = C_min + 4, S = 1, P = 1

Wide domino gate
- A NAND or NOR gate can be used to replace the inverter.
  - Breaks up large stacks of series transistors into parallel chains

Wide AND/OR domino gate mapping
- An enlarged subsolution space is used.
  [Figure: subsolution space extended to 2W by 2H and divided into regions a, b, and c]
- Region a: standard domino gate mapping
- Region b: wide AND domino gate mapping
- Region c: wide OR domino gate mapping

Dual-monotonic gate
- A common dual-monotonic XOR gate
  [Figure: clocked dual-rail gate with inputs a, b and their complements; outputs O = a XOR b and O = a XNOR b]
- The presence of an XOR/XNOR function decomposes the input network into small mapping trees, which causes a larger area and delay cost.

Dual-monotonic gate mapping
- Recognize the XOR/XNOR logic of the network by pattern matching.
- Perform technology mapping on the AND/OR/XOR/XNOR subject network, mapping AND/OR nodes to standard domino gates and XOR/XNOR nodes to dual-monotonic gates.
- Permitted mapping scheme
  [Figure: permitted connections among XOR/XNOR nodes, AND/OR nodes, and other nodes]

Implementation and results (1)
- Execution time: < 10 seconds
- Comparison with another domino mapper

  Circuit   Our approach     Prasad et al.    Reduction
            #trans/#level    #trans/#level    %
  c8        289/6            328/7            13.5%
  i6        890/2            890/3            0%
  C880      1056/9           1499/7           42.0%

- Comparison of various mapping methods

  Circuit   Basic mapping    Wide AND/OR gate   Dual-mono gate
            #trans/#level    #trans/#level      #trans/#level
  C1355     1824/9           1824/9             1360/7
  C1908     1978/18          1965/18            1588/14
  k2        2884/16          2738/15            2884/16

Experimental results
- Domino mapping vs. static mapping

  Circuit   Domino           SIS: 44-3.genlib   Reduction   Dup-ratio
            #trans/#levels   #trans/#levels     %           %
  i6        761/3            1194/5             36.3%       13%
  C1355     1360/7           1378/20            1.3%        77%
  C3540     4002/20          3140/34            -27.5%      92%

Partitioning: Motivation
- Use domino gates to speed up parts of the circuit; the remainder is implemented in static CMOS.
- Domino logic is typically multiphase.
- General clocking strategy
  [Figure: CLK waveform with latches on ph1 and ph2 separating static logic and domino chains; one domino chain is evaluated in ph2 and precharged in ph1, the other is evaluated in ph1 and precharged in ph2]

Another consideration
- Observation: duplication cost can be reduced by proper partitioning.
- An example
  [Figure: AND (*) / OR (+) network with two candidate cuts, CUT A and CUT B, separating the static and domino regions]
- In addition to the partitioning cost, the implementation cost varies with the partition.

Problem definition
- Static-domino partitioning problem
  - Given
    - An optimized combinational circuit
    - The delay specification on the output of the network
  - Implement the nodes with domino + static logic
    - Minimize the cost while meeting the delay specs
    - Satisfy the precedence constraint that no static logic gate is permitted to fan out to a domino gate
- Two-way domino partitioning
  - Partition the domino implementation into two phases, with inverters permitted between the phases.
