
Regular Fabrics for Retiming & Pipelining over Global Interconnects



  1. Regular Fabrics for Retiming & Pipelining over Global Interconnects
     Jason Cong
     Computer Science Department, University of California, Los Angeles
     cong@cs.ucla.edu
     http://cadlab.cs.ucla.edu/~cong
     FCRP Interconnect Workshop, June 28, 2002
     DUSD(Labs)

  2. Overarching GSRC Research Emphasis [Jan Rabaey, June 2002]
     - A broadened focus on application-oriented embedded systems under tight cost, PDA, and time-to-market constraints
     - Founded on one basic principle: "From Ad-Hoc System-on-a-Chip Design to Disciplined, Platform-Based Design"

  3. The Discipline of Platform-Based Design
     [Figure: layered platform stack. Labels, top to bottom: Application; Programming Model (Models/Estimators; Kernels/Benchmarks); Architecture(s); Architectural Platform; Microarchitecture(s); Functional Blocks, Interconnect; Cycle-speed, power, area; Circuit Fabric(s); Silicon Implementation Platform; Manufacturing Interface; Basic device & interconnect structures; Delay, variation, SPICE models; Silicon Implementation]

  4. The Discipline of Platform-Based Design
     [Figure: the same platform stack with theme labels overlaid. Stack labels, top to bottom: Application; Programming Model (Models/Estimators; Kernels/Benchmarks); Architecture(s); Architectural Platform; Microarchitecture(s); Functional Blocks, Interconnect; Cycle-speed, power, area; Circuit Fabric(s); Silicon Implementation Platform; Manufacturing Interface; Basic device & interconnect structures; Delay, variation, SPICE models; Silicon Implementation. Overlaid labels: Comp and Comm Based Design; Programmable Systems; Calibrating Achievable Design; Test, Verification, Energy&Power; Constructive Fabrics]

  5. From Architecture to Silicon Implementation Platform
     - Different targets employ different intermediate platforms, hence different layers of regularity and design-space constraints
     - Design space may actually be smaller than with large steps!
     - Large-step predictions/abstractions may misguide the optimizations
     [Figure: steps from Architecture to Silicon Implementation. Labels: Logic Regularity; Component Regularity and Reuse; Regular Fabrics; Geometrical Regularity; Constructive Fabrics]
     [Source: Larry Pileggi]

  6. Sample Work from the GSRC Fabric Theme
     - Bob Brayton: Topologically Constrained Logic Synthesis
     - Malgorzata Marek-Sadowska: Interconnecting Regular Fabrics
     - Wojtek Maly: Geometrical Regularity
     - Herman Schmit: Regular Communication Fabrics
     - Jason Cong: Regular Fabrics for Retiming and Pipelining over Global Interconnects

  7. Motivation: How Far Can We Go in Each Clock Cycle
     - NTRS'97 0.07 um technology
     - 5 GHz across-chip clock
     - 620 mm^2 die (24.9 mm x 24.9 mm)
     - IPEM BIWS estimations
       - Buffer size: 100x
       - Driver/receiver size: 100x
     - From corner to corner: 7 clock cycles
     [Figure: regions of the die reachable in 1 to 7 clock cycles; axis ticks at 0, 7.52, 15.04, 22.56, and 24.9 mm]
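
The corner-to-corner figure can be checked with a short calculation. This is only a sketch under two assumptions that the slide does not state: that the 7.52 mm spacing of the figure's axis ticks is the distance a signal covers in one clock cycle, and that wires are routed along Manhattan (x plus y) paths. The names `REACH_PER_CYCLE_MM` and `cycles_to_reach` are hypothetical.

```python
import math

CHIP_SIDE_MM = 24.9          # from the slide: 24.9 mm x 24.9 mm die
REACH_PER_CYCLE_MM = 7.52    # ASSUMPTION: taken from the figure's axis tick spacing


def cycles_to_reach(dx_mm, dy_mm, reach_mm=REACH_PER_CYCLE_MM):
    """Clock cycles needed to cover a Manhattan distance (assumed routing model)."""
    return math.ceil((dx_mm + dy_mm) / reach_mm)


# Corner to corner means travelling one full side in x and one in y.
corner_to_corner = cycles_to_reach(CHIP_SIDE_MM, CHIP_SIDE_MM)
print(corner_to_corner)  # 7
```

Under these assumptions the result agrees with the 7 clock cycles quoted on the slide.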

  8. Solutions
     - Fully asynchronous designs
     - GALS (globally asynchronous, locally synchronous) designs
     - Latency-insensitive designs
     - Synchronous designs with multi-cycle communications
       - Much better understood
       - Supported by the current tool set
       - More energy efficient?

  9. Need of Considering Retiming during Placement - Retiming/pipelining on global interconnects
     - Multiple clock cycles are needed to cross the chip
     - Proper placement allows retiming to hide global interconnect delays.
     [Figure: two placements of nodes a, b, c, d. Labels in slide order: Placement 1; Placement 2; d(v)=1, WL=6, d(e) ∝ WL (under both placements); Before retiming, φ = 4.0; Before retiming, φ = 5.0; Better Initial Placement!!; After retiming, φ = 3.0]

  10. Need of Considering Retiming during Placement - Retiming/pipelining on global interconnects
      - Multiple clock cycles are needed to cross the chip
      - Proper placement allows retiming to hide global interconnect delays.
      [Figure: two placements of nodes a, b, c, d. Labels in slide order: Placement 1; Placement 2; d(v)=1, WL=6, d(e) ∝ WL (under both placements); Before retiming, φ = 4.0; Before retiming, φ = 5.0; Better Initial Placement!!; After retiming, φ = 3.0; After retiming, φ = 4.0]

  11. Difficulties
      - How to consider retiming/pipelining over global interconnects
        - Flip-flop boundaries are not fixed during placement, so static timing analysis is difficult
        - Use the concepts of c-retiming and sequential timing analysis (Seq-TA)
      - How to handle the high complexity of the combined problem
        - Use the multi-level optimization technique

  12. Static Timing Analysis (STA)
      - Sequential circuit example: PI: a, b. PO: g. Suppose d(v)=1, d(e)=2, and clock cycle φ = 11.
      - Transform the circuit into a DAG for static timing analysis. Topological order: a, b, g, f, c, d, e.
      - The arrival time (AT) and required time (RT) of each node are computed in linear time.

      Node   a   b   g   f   c   d   e
      AT     1   1   3   3   3   6   9
      RT     9   9  11   9   3   6   9

      [Figure: sequential circuit over nodes a, b, c, d, e, f, g and its DAG form]
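
The forward (AT) and backward (RT) passes can be sketched in Python. The slide's circuit drawing did not survive extraction, so the topology below is an assumption chosen to reproduce the AT and RT values listed on the slide: a combinational chain c → d → e, with g, f, and c driven from flip-flop outputs and a, b, e, and f driving flip-flop inputs. Flip-flop outputs are assumed to launch at time 0 and flip-flop inputs to be required by φ. All identifiers are hypothetical.

```python
from collections import defaultdict

NODE_DELAY = 1.0   # d(v) from the slide
EDGE_DELAY = 2.0   # d(e) from the slide
PHI = 11.0         # clock cycle from the slide

# ASSUMED topology (reconstructed, not taken from the lost figure).
TOPO_ORDER = ["a", "b", "g", "f", "c", "d", "e"]
COMB_EDGES = [("c", "d"), ("d", "e")]   # combinational edges of the DAG
PRIMARY_INPUTS = {"a", "b"}
PRIMARY_OUTPUTS = {"g"}
FF_DRIVEN = {"g", "f", "c"}             # fed from a flip-flop output
DRIVES_FF = {"a", "b", "e", "f"}        # feeds a flip-flop input


def static_timing():
    preds, succs = defaultdict(list), defaultdict(list)
    for u, v in COMB_EDGES:
        preds[v].append(u)
        succs[u].append(v)

    # Forward pass: latest arrival at each node's output.
    at = {}
    for v in TOPO_ORDER:
        inputs = [at[u] + EDGE_DELAY for u in preds[v]]
        if v in PRIMARY_INPUTS:
            inputs.append(0.0)
        if v in FF_DRIVEN:
            inputs.append(EDGE_DELAY)   # flip-flop launches at t = 0
        at[v] = max(inputs) + NODE_DELAY

    # Backward pass: latest allowed time at each node's output.
    rt = {}
    for v in reversed(TOPO_ORDER):
        limits = [rt[s] - NODE_DELAY - EDGE_DELAY for s in succs[v]]
        if v in PRIMARY_OUTPUTS:
            limits.append(PHI)
        if v in DRIVES_FF:
            limits.append(PHI - EDGE_DELAY)
        rt[v] = min(limits)
    return at, rt


at, rt = static_timing()
slack = {v: rt[v] - at[v] for v in TOPO_ORDER}
```

Each node and edge is visited once per pass, which is where the linear running time comes from. Under the assumed topology the zero-slack nodes are c, d, and e.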

  13. Continuous Retiming (c-retiming) and Sequential Arrival Time (SAT)
      - Definition [Pan et al., TCAD98]: given a clock period $\phi$, transform circuit $C$ into an edge-weighted, vertex-weighted graph $G$.
      - Label vertex $v$ as $l(v)$ = the weight of the longest path from the PIs to $v$:
        $l(v) = \max_{u} \{\, l(u) - \phi \cdot w(u,v) + d(u,v) + d(v) \,\}$.
        $l(v)$ is also called SAT($v$).
      - Theorem: $C$ can be retimed to $\phi + \max\{d(v)\}$ iff $l(\text{POs}) \le \phi$.
      - Relation to retiming: $r(v) = \lceil l(v)/\phi \rceil - 1$.
      - Complexity is $O(VE)$.
      - Example (edges a → c and b → c, with edge weights $w_l(u,v) = d(e(u,v)) - \phi \cdot w(u,v)$):
        $d(a) = d(b) = 1$, $d(a,c) = d(b,c) = 2$, $\phi = 5$, $w(a,c) = 1$, $w(b,c) = 0$, $l(a) = 7$, $l(b) = 3$, so
        $l(c) = \max\{7 + 2 - 5 \cdot 1 + 1,\; 3 + 2 + 1\} = 6$.
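
The worked example on this slide can be reproduced with a few lines of Python. This sketch evaluates one application of the labelling rule and the retiming relation; the function names are hypothetical, and d(c) = 1 is assumed because the slide's arithmetic adds 1 for the node delay of c.

```python
import math


def sat_label(fanins, phi, d_v):
    """One labelling step: max over fanins of l(u) - phi*w(u,v) + d(u,v), plus d(v).

    fanins is a list of (l_u, w_uv, d_uv) tuples, one per incoming edge.
    """
    return max(l_u - phi * w_uv + d_uv for l_u, w_uv, d_uv in fanins) + d_v


def retiming_value(l_v, phi):
    """r(v) = ceil(l(v) / phi) - 1, as stated on the slide."""
    return math.ceil(l_v / phi) - 1


PHI = 5.0
l_c = sat_label(
    fanins=[
        (7.0, 1, 2.0),  # from a: l(a)=7, w(a,c)=1, d(a,c)=2
        (3.0, 0, 2.0),  # from b: l(b)=3, w(b,c)=0, d(b,c)=2
    ],
    phi=PHI,
    d_v=1.0,            # ASSUMPTION: d(c)=1, implied by the slide's arithmetic
)
print(l_c)                        # 6.0
print(retiming_value(l_c, PHI))   # 1
```

The path through a contributes 7 + 2 - 5 + 1 = 5 and the path through b contributes 3 + 2 + 1 = 6, so b determines l(c).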

  14. Continuous Retiming (c-retiming) and Sequential Arrival Time (SAT)
      - Sequential circuit with d(v)=1, d(e)=2. Is φ = 4.5 possible?
      - The retiming graph is not a DAG; the edge weights shown in the figure are 2, -2.5, and -7.

      Iter#   a    b    c      d      e      f     g
      0       0    0    -∞     -∞     -∞     -∞    -∞
      1       0    0    -1.5   -∞     -∞     -∞    -∞
      2       0    0    -1.5   1.5    1.5    -∞    -∞
      3       0    0    -1.5   1.5    4.5    0     0
      4       0    0    -1.5   1.5    4.5    0     0
      5       0    0    -1.5   1.5    4.5    0     0

      - Cycle time 4.5 is possible because l(g) ≤ 4.5.
      [Figure: sequential circuit, retiming graph, and retimed circuit over nodes a to g]
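
The iterative labelling can be sketched as a Bellman-Ford-style longest-path relaxation. The circuit below is a hypothetical five-node example, not the slide's circuit (whose drawing did not survive), and the function names are illustrative. It keeps the slide's delays d(v)=1 and d(e)=2 and starts PIs at 0 and all other nodes at -∞. If the labels are still changing after |V| passes, the graph has a positive-weight cycle and the clock period is taken to be infeasible.

```python
import math

NODE_DELAY = 1.0  # d(v), as on the slide
EDGE_DELAY = 2.0  # d(e), as on the slide

# HYPOTHETICAL circuit: (source, sink, flip-flop count w on the edge).
NODES = ["a", "c", "d", "e", "g"]
EDGES = [("a", "c", 1), ("c", "d", 0), ("d", "e", 0), ("e", "c", 2), ("e", "g", 1)]
PIS, POS = {"a"}, {"g"}


def sequential_arrival_times(phi):
    """Return the labels l(v), or None if they do not converge in |V| passes."""
    label = {v: (0.0 if v in PIS else -math.inf) for v in NODES}
    for _ in range(len(NODES)):
        changed = False
        for u, v, w in EDGES:
            if label[u] == -math.inf:
                continue  # u not reached from a PI yet
            candidate = label[u] - phi * w + EDGE_DELAY + NODE_DELAY
            if candidate > label[v]:
                label[v] = candidate
                changed = True
        if not changed:
            return label
    return None  # positive cycle: labels grow without bound


def is_feasible(phi):
    labels = sequential_arrival_times(phi)
    return labels is not None and all(labels[p] <= phi for p in POS)


print(sequential_arrival_times(4.5))
print(is_feasible(4.5), is_feasible(4.0))  # True False
```

In this example the loop c → d → e → c carries a total delay of 9 and two flip-flops, so φ = 4.5 is the smallest period for which the loop's weight is not positive. At φ = 4.0 the labels keep increasing and the check reports the period as infeasible.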
