
Grid-to-Ports Clock Routing for High Performance Microprocessor Designs - PowerPoint PPT Presentation

  1. Grid-to-Ports Clock Routing for High Performance Microprocessor Designs
     Haitong Tian #, Wai-Chung Tang #, Evangeline F.Y. Young # and C.N. Sze *
     # Department of Computer Science and Engineering, The Chinese University of Hong Kong
     * IBM Austin Research Laboratory
     ISPD ’11, Santa Barbara, USA, March 28, 2011

  2. Outline
     - Introduction
     - Problem Formulation
     - Routing Algorithm
     - Experimental Results
     - Conclusion

  3. Clock Distribution Categories
     - Clock distribution is a very important issue
     - Buffered and unbuffered trees
       - Used in various ASICs
       - Supported by many physical design tools
       - See Tsay TCAD’93, Xi DAC’95
     - Non-tree structure with crosslinks
       - Intended for reducing clock skews
       - See Rajaram DAC’04, TCAD’06
     - Grid and buffered trees
       - High performance processors
       - Clock structures are sometimes designed manually
       - See Shelar ISPD’09, TCAD’10, Guru VLSI Circuits’10

  4. High Performance Clock Distribution
     - Clock network in high performance microprocessors
       - Distributed as global grid followed by buffered trees
       - See Shelar ISPD’09, TCAD’10, Guru VLSI Circuits’10
     - This paper focuses on the post-grid clock distribution area
     [Figure: post-grid clock distribution. Labels: External Clock, PLL, Grid Buffer, Clock Grid, Post-grid Clock Distribution Network, Regional Clock buffers, Local Clock buffers]

  5. Post-grid Clock Distribution
     - In our modeling
       - Entire chip divided into several layout areas
       - Each layout area contains many blocks
       - Each block contains standard cells and/or macros
     - Each layout area contains
       - 100s-1000s clock ports
     - Grid wires reserved for clock routing
       - Typically upper metal layers
     [Figure: layout region. Labels: Global Grid, Grid Buffer, Blocks, Reserved Tracks, Port, Sequential, Local Clock Buffer, Layout Region, Reserved multilayer tracks]

  6. Motivations
     - Clock distribution of microprocessor:
       - Crucial importance
       - Major source of power dissipation
     - High capacitance usage
       - 18.1% of total clock capacitance [1]
       - See Pham Solid State Circuits’06
     - Manually designed in practice
       - Hard to satisfy delay/slew constraints
       - Time to market
       - See Shelar ISPD’09, TCAD’10
     [1]: D. Pham, T. Aipperspach, D. Boerstler, M. Bolliger, R. Chaudhry, D. Cox, P. Harvey, P. Harvey, H. Hofstee, C. Johns, et al. Overview of the architecture, circuit design, and physical implementation of a first-generation cell processor. IEEE Journal of Solid-State Circuits, 41(1):179–196, Jan. 2006.

  7. Outline
     - Introduction
     - Problem Formulation
     - Routing Algorithm
     - Experimental Results
     - Conclusion

  8. Problem Formulation
     - Input
       - A set of reserved tracks
       - Locations and capacitances of ports P
       - Different types of wires on each metal layer
       - Delay limit D, slew limit S
     - Output
       - A clock network (may be non-tree structures)
     - Objective
       - Connecting every port to the source
       - Satisfying delay and slew constraints
       - Minimizing capacitance usage
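The formulation on slide 8 can be written compactly as a constrained minimization. This is only a restatement under assumed notation: $N$ (the routed clock network), $\mathrm{Cap}(N)$ (its total capacitance), $d_N(p)$ (delay from the source to port $p$) and $s_N(p)$ (slew at port $p$) are symbols introduced here for illustration. Only $P$, $D$ and $S$ come from the slide, and where exactly slew is measured is not stated there.

```latex
\begin{aligned}
\min_{N}\quad & \mathrm{Cap}(N) \\
\text{s.t.}\quad & \text{every } p \in P \text{ is connected to the source in } N, \\
& d_N(p) \le D \quad \forall p \in P, \\
& s_N(p) \le S \quad \forall p \in P.
\end{aligned}
```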

  9. Post-grid Clock Routing
     [Figure: 3D plot; axis label: Layer (ticks 3 to 7); the two planar axes run from 0 to 1500 and from 0 to 1800]

  10. Outline
     - Introduction
     - Problem Formulation
     - Routing Algorithm
     - Experimental Results
     - Conclusion

  11. Overall Algorithm
     - Critical ports
       - Ports with large capacitance or far away from the source
     - Path expansion algorithm
       - Elmore-delay driven
       - Expanding in some selected directions
     - Post-processing
       - Wire replacement
       - Topology refinement
     - Iterations
       - The overall algorithm is repeatedly invoked
       - May fail when number of iterations > K (user specified)

  12. Delay-driven Path Expansion Algorithm
     - Basic steps
       - Simultaneously expand from all ports
       - Select the path with the minimum Elmore delay to further expand
       - Connect the ports to the source once the path reaches the source grid
       - Check delay/slew constraints
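The basic steps on slide 12 can be illustrated with a small best-first search keyed on Elmore delay. This is a minimal sketch under stated assumptions, not the authors' implementation: it expands from a single port only (the slides expand from all ports at once and merge paths), it ignores slew, and the graph, the function names `undirected` and `expand_to_source`, and all resistance and capacitance values are hypothetical. Extending a path by one upstream segment with resistance r and capacitance c adds r * (c/2 + downstream capacitance) to the delay.

```python
import heapq

def undirected(segments):
    """Build an adjacency map from (u, v, resistance, capacitance) wire segments."""
    graph = {}
    for u, v, r, c in segments:
        graph.setdefault(u, []).append((v, r, c))
        graph.setdefault(v, []).append((u, r, c))
    return graph

def expand_to_source(graph, port, port_cap, sources):
    """Best-first expansion from one port toward any source node.

    Returns (elmore_delay, path) for the minimum-delay simple path,
    or None if no source is reachable.
    """
    # Heap entries: (delay so far, capacitance downstream of the path head, path)
    heap = [(0.0, port_cap, [port])]
    while heap:
        delay, cap, path = heapq.heappop(heap)
        node = path[-1]
        if node in sources:
            return delay, path
        for nxt, r, c in graph.get(node, []):
            if nxt in path:  # keep paths simple (no revisits)
                continue
            # The new segment drives half of its own capacitance plus everything downstream.
            new_delay = delay + r * (c / 2.0 + cap)
            heapq.heappush(heap, (new_delay, cap + c, path + [nxt]))
    return None

# Hypothetical values in arbitrary units.
graph = undirected([
    ("P1", "C1", 2.0, 4.0),
    ("C1", "S1", 1.0, 2.0),
    ("P1", "S1", 10.0, 1.0),
])
result = expand_to_source(graph, "P1", 3.0, {"S1"})
print(result)  # (18.0, ['P1', 'C1', 'S1'])
```

Because every extension adds a positive delay term, the first path popped that ends at a source has the smallest Elmore delay among all simple paths. The search keeps whole paths on the heap, which is fine for a toy graph but would not scale to a real track grid.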

  13. A Routing Example
     - Initially, the heap is empty
     - First iteration (simultaneously expand from all ports)
       - Heap = {(P1,P2); (P1,C1); (P2,P3); (P2,P1); (P3,P2); (P3,C2)}
     - Second iteration (P1,P2)
       - Heap = {(P3,C2); (P1,P2,P3); (P1,C1); (P2,P3); (P2,P1); (P3,P2)}
     - Third iteration (P3,C2)
       - Heap = {(P3,C2,S2); (P1,P2,P3); (P1,C1); (P2,P3); (P2,P1); (P3,P2)}

  14. A Routing Example
     - Fourth iteration (identify chain paths)
       - Heap = {(P1,P2,P3); (P1,C1); (P2,P3); (P2,P1)}
       - Chain path = {(P1,P2,P3); (P2,P3)}
     - Fifth iteration (P2,P3)
       - Heap = {(P1,P2,P3); (P1,C1)}
       - Chain path = {(P1,P2,P3); (P1,P2)}
     - Sixth iteration (P1,P2)
       - Heap = {}, chain path = {}
     - Final result

  15. Post-processing Techniques
     - Wire replacement
       - Two types of wires
         - Capacitance/resistance tradeoff
       - Procedures
         - Identify port P_l with the largest Elmore delay
         - Replace wires in a bottom-up style
         - Check delay/slew constraints
     - Example
       - Port with largest delay: P5
       - Replace edge P1C1
       - Replace edge P4C2
       - Replace edge P2P3, P3C1
       - Replace P5C3, C3C2, C2C1, C1S1
     [Figure: example network with source S1, nodes C1 to C3 and ports P1 to P5, shown over successive replacement steps]
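The capacitance/resistance tradeoff behind wire replacement on slide 15 can be seen on a single source-to-port chain. The sketch below is illustrative only and rests on several assumptions: it ignores side branches and slew, it accepts a replacement only when the delay actually drops (the slides do not state this rule), and the names `chain_delay` and `replace_bottom_up` and all numeric values are hypothetical.

```python
def chain_delay(edges, load_cap):
    """Elmore delay of a chain of (resistance, capacitance) segments listed source to port."""
    delay, downstream = 0.0, load_cap
    for r, c in reversed(edges):  # walk from the port up to the source
        delay += r * (c / 2.0 + downstream)
        downstream += c
    return delay

def replace_bottom_up(edges, load_cap, alt_edges, limit):
    """Try the alternative wire type on each segment, starting at the port end.

    A replacement is kept only if it lowers the delay; the loop stops
    as soon as the delay limit is met.
    """
    edges = list(edges)
    best = chain_delay(edges, load_cap)
    for i in range(len(edges) - 1, -1, -1):
        if best <= limit:
            break
        trial = edges[:i] + [alt_edges[i]] + edges[i + 1:]
        trial_delay = chain_delay(trial, load_cap)
        if trial_delay < best:
            edges, best = trial, trial_delay
    return edges, best

# Hypothetical wire types: the alternative halves resistance but adds capacitance.
narrow = [(4.0, 2.0), (4.0, 2.0)]
wide = [(2.0, 3.0), (2.0, 3.0)]
print(chain_delay(narrow, 1.0))                    # 24.0
print(replace_bottom_up(narrow, 1.0, wide, 18.0))  # ([(2.0, 3.0), (4.0, 2.0)], 17.0)
```

In this toy case, widening the segment next to the port raises the delay from 24 to 25 because it adds load for the upstream segment, while widening the segment next to the source brings it down to 17. That is one way to see why the choice of wire type is a tradeoff and not a uniform upgrade.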

  16. Post-processing Techniques
     - Topology refinement
       - Procedures
         - Disconnect a port P
         - Expand P towards all directions
         - Select paths with smaller capacitance
         - Check delay/slew constraints
     - Example
       - Elmore delay: P5 > P4 > P6 > P2 > P1 > P3 > P7
       - Sequentially process all the ports
     [Figure: example network with sources S1 and S2, nodes C1 to C5 and ports P1 to P7, shown over successive refinement steps]

  17. Non-tree Extensions
     - A small number of ports have exceptionally large capacitances
       - The delay of such a port's shortest path exceeds the delay limit D
     - Procedures
       - Establish a shortest path for p
       - Find a second shortest path
       - Target delay not met? Add all useful crosslinks
       - Target delay not met? Do the same thing for the parent node of p
     - Example
       - Connect p to S1
       - Find a second source S2
       - Add crosslinks
       - Find a third source S3
       - Add crosslinks

  18. Outline
     - Introduction
     - Problem Formulation
     - Routing Algorithm
     - Experimental Results
     - Conclusion

  19. Experiment Setup
     - Environment
       - Implemented in C++
       - Run on Linux server
       - Intel Pentium 4 3.2GHz
       - 2GB RAM
     - Delay setup: 5ps
     - Slew setup: input: 10ps; output: 15ps
     - Benchmarks
       - 3 test cases are provided by industry
       - 11 test cases are from ISPD 2010 Clock Network Synthesis Contest
     - Comparisons
       - Compared with TG, which was proposed by Shelar in ISPD’09, TCAD’10

  20. Tree Growing Algorithm
     - Proposed in R. Shelar ISPD’09, TCAD’10
     - Expand from the source
     - Delay/slew constraints
     - Greedy expansion from the source
       - Edges with the smallest capacitance will be added into the network
     - Example
       - Add S1C1, S2C2
       - Add C2P3
       - Add C1P1
       - Add P3P2
     [Figure: example network with sources S1 and S2, nodes C1 and C2 and ports P1 to P3, shown over successive growth steps]
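The greedy expansion described on slide 20 resembles a Prim-style growth from the sources, where the cheapest frontier edge by capacitance is added next. The sketch below is a rough illustration of that idea only and is not Shelar's TG implementation: it omits the delay/slew checks, and the function names `undirected_caps` and `grow_tree` and the edge capacitances are hypothetical, chosen so that the edges come out in the order shown in the slide's example.

```python
import heapq

def undirected_caps(segments):
    """Build an adjacency map from (u, v, capacitance) wire segments."""
    graph = {}
    for u, v, c in segments:
        graph.setdefault(u, []).append((v, c))
        graph.setdefault(v, []).append((u, c))
    return graph

def grow_tree(graph, sources, ports):
    """Grow a forest from the sources, always adding the smallest-capacitance frontier edge."""
    in_tree = set(sources)
    heap = [(c, s, n) for s in sources for n, c in graph.get(s, [])]
    heapq.heapify(heap)
    remaining = set(ports) - in_tree
    added = []
    while heap and remaining:
        c, u, v = heapq.heappop(heap)
        if v in in_tree:  # edge would close a cycle, skip it
            continue
        in_tree.add(v)
        remaining.discard(v)
        added.append((u, v, c))
        for n, c2 in graph.get(v, []):
            if n not in in_tree:
                heapq.heappush(heap, (c2, v, n))
    return added

# Hypothetical capacitances in arbitrary units.
graph = undirected_caps([
    ("S1", "C1", 1.0), ("S2", "C2", 2.0), ("C2", "P3", 3.0),
    ("C1", "P1", 4.0), ("P3", "P2", 5.0), ("P1", "P2", 6.0),
])
added = grow_tree(graph, ["S1", "S2"], ["P1", "P2", "P3"])
print([(u, v) for u, v, _ in added])
# [('S1', 'C1'), ('S2', 'C2'), ('C2', 'P3'), ('C1', 'P1'), ('P3', 'P2')]
```

A growth rule like this looks only at the capacitance of the next edge, so it has no direct view of the delay from the source to the port it eventually reaches.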

  21. Comparisons: capacitance
     - Without post-processing: 18.3% improvement
       [Figure: Capacitance (without post-processing techniques); y-axis: capacitance (fF), 0 to 25000; x-axis: test cases 1 to 14; series: TG, Ours]
     - With topology refinement: 24.6% improvement
       [Figure: Capacitance (with topology refinement); y-axis: capacitance (fF), 0 to 25000; x-axis: test cases 1 to 14; series: TG, Ours]
