Seeing the Forest and the Trees: Steiner Wirelength Optimization in Placement - PowerPoint PPT Presentation


  1. Seeing the Forest and the Trees: Steiner Wirelength Optimization in Placement
     Jarrod A. Roy, James F. Lu and Igor L. Markov
     University of Michigan, Ann Arbor

     Outline
     • Motivation
     • Why current placement tools are outdated
     • Analysis of placement objectives
     • A naïve attempt at optimization
     • Our placement framework
     • New techniques
     • Empirical results
     • Conclusions

     Motivation (1)
     • Place-and-route
       • Pivotal step in any design flow
       • Closely related to physical synthesis
       • Is becoming harder every year
         • Greater scale, “boulders and dust”, fixed obstacles
         • Novel design techniques require P&R support
         • Heavily affected by variability
     • P&R in tool flows
       • Single step for designers?
       • P&R implemented as separate point tools
       • Very little interaction/communication
       • Use different optimization objectives

     Motivation (2)
     • The HPWL (half-perimeter wirelength) objective is hopelessly outdated: it does not account for
       • Routing demand of multi-pin nets
       • Detours around obstacles
       • Vias
       • Impact of buffers on delay (and where buffers can be inserted)
     • Our goal: reduce the gap between placement and routing by replacing the HPWL objective with realistic routes
     • Empirical results: consistent improvement over all published P&R results
       • Routability, routed wirelength, via counts
     • Compared to Silicon Ensemble (Cadence): 26% better routed WL, 3% fewer vias

  2. HPWL vs. Steiner Tree WL vs. MST WL
     [Figure: one net measured by half-perimeter wirelength, Steiner (tree) wirelength, and Minimum Spanning Tree (MST) wirelength]
     • HPWL ≤ Steiner Tree WL ≤ MST WL
     • HPWL > rWL? Internal cell wiring is not counted in rWL
     • MST WL: most accurate on average
     • Steiner WL: best fidelity

     Computing Steiner Trees
     • Computing HPWL takes linear time, MST superlinear (P log P), but Steiner trees are NP-hard
     • Steiner Tree tools we evaluate:
       • Batched Iterated 1-Steiner (BI1ST) [Kahng,Robins 1992]
         • Slow (n^3)
         • Very accurate, even for 20+ pins
       • FastSteiner [Kahng,Mandoiu,Zelikovsky 2003]
         • Faster but less accurate than BI1ST
       • FLUTE [Chu 2004, 2005]
         • Very fast
         • Optimal lookup tables for ≤ 9 pins
         • Less accurate for 10+ pins

     Optimizing Steiner Tree Length
     • Simple experiment
       • Take a floorplanner that uses Sim. Annealing (we used Parquet)
       • Consider the wirelength term in its objective function
       • Replace the HPWL computation with Min. Steiner-tree length (we used FLUTE)
     • Empirical observations
       • Slow-down (even for 3-pin nets): expected
       • Did not improve StWL: very surprising result!

     Outline
     • Motivation
     • Why current placement tools are outdated
     • Analysis of placement objectives
     • A naïve attempt at optimization
     • Our placement framework
     • New techniques
     • Empirical results
     • Conclusions
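As a rough illustration of the inequality HPWL ≤ Steiner Tree WL ≤ MST WL, the Python sketch below computes the two outer bounds for a single net. The names `hpwl` and `mst_wl` are hypothetical, and the MST uses a simple quadratic Prim's algorithm for brevity rather than a P log P method; none of the Steiner-tree tools named above (BI1ST, FastSteiner, FLUTE) is called.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def hpwl(pins: List[Point]) -> float:
    """Half-perimeter of the pins' bounding box (one linear pass per axis)."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def mst_wl(pins: List[Point]) -> float:
    """Rectilinear (Manhattan) MST length via a simple O(P^2) Prim's algorithm."""
    n = len(pins)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    dist = [float("inf")] * n  # cheapest known connection of each pin to the tree
    dist[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Pick the closest pin that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: dist[i])
        in_tree[u] = True
        total += dist[u]
        # Relax the connection cost of the remaining pins.
        for v in range(n):
            if not in_tree[v]:
                d = abs(pins[u][0] - pins[v][0]) + abs(pins[u][1] - pins[v][1])
                if d < dist[v]:
                    dist[v] = d
    return total


net = [(0, 0), (4, 0), (2, 3)]
lower, upper = hpwl(net), mst_wl(net)
```

For the pins (0,0), (4,0), (2,3) the lower bound is 7 and the upper bound is 9. Because a 3-pin net can always be routed within its half-perimeter, the Steiner length here equals the HPWL of 7, so the gap to the MST comes entirely from the missing Steiner point.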

  3. Existing Placement Framework
     [Figure: placement bins 1-4, end-case placement, pins of one net propagated]
     • Consider placement bins
     • Partition them
       • Use min-cut bisection
       • Place end-cases optimally
     • Propagate terminals before partitioning
       • Terminals: fixed cells or cells outside current bin
       • Assigned to one of the partitions
       • Save runtime: a 20-pin net may “propagate” into a 3-pin net
       • “Inessential nets”: fixed terminals in both partitions (can be entirely ignored)
     • Traditional min-cut placement tracks HPWL

     Better Modeling of HPWL by Net Weights in Min-cut
     • Introduced in Theto placer [Selvakkumaran 2004]
     • Refined in [Chen 2005]
     • Shown to accurately track HPWL
     • Uses three net costs
       • w_left: HPWL when all cells on the left side (a)
       • w_right: HPWL when all cells on the right side (b)
       • w_cut: HPWL when cells on both sides (c)
     • In min-cut partitioning, represents each net with 1 or 2 hyper-edges
     Figure from [Chen,Chang,Lin 2005]

     Key Observation
     • For bisection, cost of each net is characterized by 3 cases
       • Cost of net when cut: w_cut
       • Cost of net when entirely in left partition: w_left
       • Cost of net when entirely in right partition: w_right
     • In our work, we compute these costs using realistic routes
     • Can/should account for both X and Y components of cost
     • Real difficulty in data structures!
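The three per-net costs can be sketched in a few lines of Python. This is a minimal sketch under an assumption the slides do not state: all movable pins assigned to a partition are modeled at that partition's center. The names `net_costs`, `left_center`, `right_center`, and the `evaluator` hook are hypothetical; passing a Steiner-tree length function instead of `hpwl` would correspond to computing the costs "using realistic routes".

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def hpwl(pins: List[Point]) -> float:
    """Half-perimeter wirelength of a set of pin locations."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def net_costs(
    terminals: List[Point],
    left_center: Point,
    right_center: Point,
    evaluator: Callable[[List[Point]], float] = hpwl,
) -> Tuple[float, float, float]:
    """Return (w_left, w_right, w_cut) for one net and a vertical cutline.

    Assumption (not from the slides): movable pins sit at the center of the
    partition they are assigned to; `terminals` are the net's fixed pins.
    """
    w_left = evaluator(terminals + [left_center])    # net entirely on the left
    w_right = evaluator(terminals + [right_center])  # net entirely on the right
    w_cut = evaluator(terminals + [left_center, right_center])  # net is cut
    return w_left, w_right, w_cut


# One fixed terminal above the bin, partition centers at x=2 and x=6.
costs = net_costs([(0, 5)], (2, 2), (6, 2))
```

With the terminal at (0,5), the call returns (5, 9, 9): placing the net on the right costs as much as cutting it, so under this cost model only a left-side assignment saves wirelength.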

  4. Our Contributions
     • Optimization of Steiner WL
       • In global placement (runtime penalty ~25%)
       • In detail placement
     • Whitespace allocation to tame congestion
     • Empirical evaluation of ROOSTER
       • No violations on 16 IBMv2 benchmarks (easy + hard)
       • Consistent improvements of published results
         • 4-10% by routed wirelength
         • 10-15% by via counts
       • Vs Cadence: 26% better rWL, 3% fewer vias

     Optimizing Steiner WL During Global Placement
     • Recall: each net can be modeled by 3 numbers
       • This has only been applied to HPWL optimization
     • We calculate w_top, w_bottom, w_cut using a Steiner-tree evaluator
       • For each net, before partitioning starts
     • The bottleneck is still in partitioning → can afford a fast Steiner-tree evaluator

     Net Weights from Steiner Trees
     [Figure: Steiner trees for w_top, w_bottom, w_cut]
     • For horizontal cutlines: w_top, w_bottom, w_cut
     • For vertical cutlines: w_left, w_right, w_cut
     • Optimal tree may look very different for each cost
       • Recompute tree from scratch each time

     Net Weights from Steiner Trees
     [Figure: Steiner trees for w_top, w_bottom, w_cut]
     • Pitfall: cannot propagate terminals!
       • Nets that were inessential are now essential
       • Must consider all pins of each net
     • More accurate modeling, but potentially much slower

  5. New Data Structure for Global Placement
     • For each net, two pointsets with multiplicities
       • Unique locations of fixed & movable pins
       • At top placement layers, very few unique pin positions (except for fixed I/O pins)
       • Avoid repetitive/expensive re-computation
     • Maintain the number of pins at each location
       • Sorted by (x,y) to enable batched linear-time operations
       • Easy detection of duplicates; binary search
       • Fast maintenance when pins get reassigned to partitions (or move)
     • Facilitates efficient computation of the 3 costs
     • If net has large number (> 20) of unique locations, resort to HPWL

     Pointsets in Action
     • Consider a net with 4 movable pins
     [Figure: pin multiplicities at each unique location as the bins are partitioned]

     Improvement in Global Placement
     • Results depend on the Steiner tree evaluator
     • Surprisingly, running 2 or 3 evaluators and picking min wirelength is worse than using a single evaluator
     • Quality of Steiner-tree evaluation for 9+ pins matters
       • But for 20+ unique locations use HPWL (also tried MST)
     • We choose FastSteiner (versus BI1ST and FLUTE)
     • Impact of changes to global placement
       • Results consistent across IBMv2 benchmarks
       • Steiner WL ↓ 2.9%, HPWL ↑ 1.3%, runtime ↑ 27%

     Optimizing Steiner WL in Detail Placement
     • We leverage the speed of FLUTE with two sliding-window optimizers
       • Exhaustive enumeration for 4-5 cells in a single row
         [Figure: * * 1 2 3 4 5 * * becomes * * 3 2 5 4 1 * *]
       • Interleaving by dynamic programming (5-8 cells)
         [Figure: * * 1 2 3 4 A B C D * * becomes * * 1 A 2 B 3 4 C D * *]
         • Explores an exponential solution space in polynomial time
         • Fast but not always optimal
       • Details in Appendix B of our ISPD`06 paper
     • Steiner WL ↓ 0.69%, routed WL ↓ 1.39%
     • [global + detail] runtime ↑ 11.83%
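The pointset with multiplicities described for global placement could look roughly like the Python sketch below. It is an illustration under assumptions, not the authors' implementation: the class name `PointSet`, its methods, and the `max_unique` parameter are hypothetical, and parallel sorted lists with `bisect` stand in for whatever structure ROOSTER actually uses.

```python
import bisect
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def hpwl(pins: List[Point]) -> float:
    """Half-perimeter wirelength, used here as the fallback evaluator."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


class PointSet:
    """Unique pin locations sorted by (x, y), each with a pin count."""

    def __init__(self) -> None:
        self._points: List[Point] = []  # sorted unique locations
        self._counts: List[int] = []    # number of pins at each location

    def add(self, p: Point) -> None:
        i = bisect.bisect_left(self._points, p)  # binary search
        if i < len(self._points) and self._points[i] == p:
            self._counts[i] += 1  # duplicate location: bump multiplicity
        else:
            self._points.insert(i, p)
            self._counts.insert(i, 1)

    def remove(self, p: Point) -> None:
        i = bisect.bisect_left(self._points, p)
        if i == len(self._points) or self._points[i] != p:
            raise KeyError(p)
        self._counts[i] -= 1
        if self._counts[i] == 0:  # last pin left this location
            del self._points[i]
            del self._counts[i]

    def move(self, old: Point, new: Point) -> None:
        """Reassign one pin, e.g. when its cell lands in another partition."""
        self.remove(old)
        self.add(new)

    def count(self, p: Point) -> int:
        i = bisect.bisect_left(self._points, p)
        found = i < len(self._points) and self._points[i] == p
        return self._counts[i] if found else 0

    def unique(self) -> List[Point]:
        return list(self._points)

    def wirelength(self, evaluator: Callable[[List[Point]], float],
                   max_unique: int = 20) -> float:
        """Evaluate on unique locations only; fall back to HPWL for big nets."""
        pts = self.unique()
        if not pts:
            return 0.0
        return hpwl(pts) if len(pts) > max_unique else evaluator(pts)


ps = PointSet()
for _ in range(4):      # a net with 4 movable pins, all at the bin center
    ps.add((5, 5))
ps.move((5, 5), (2, 8))  # one pin is reassigned to another partition
```

After the move the set holds two unique locations, (2,8) with one pin and (5,5) with three, so a tree evaluator is called on 2 points instead of 4. That is the saving the slides point to at the top placement layers, where most pins share a handful of positions.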

  6. Congestion-based Cutline Shifting
     • Non-uniform whitespace allocation
       • Performed during global placement
       • Uses progressive top-down congestion estimates
     • Main idea: after each min-cut, shift the cutline to balance congestion
       • Area constraints must always be met
       • More whitespace to the more congested bin
     [Figure: cutline shifting; bins labeled with whitespace (WS) 15%, 15%, 10%, 20% and congestion 150, 150, 100, 200]
     • Compared to WSA [Li 2004], no need for legalization, reduces #vias
     • Technical difficulty: maintain congestion estimates efficiently over a slicing floorplan (not a grid)

     Empirical Results: IBMv2
     ROOSTER: Rigorous Optimization Of Steiner Trees Eases Routing

     Published results:
     Placer          Routed WL Ratio   Via Ratio        Routes with Violation
     ROOSTER         1.000             1.000            0/16
     mPL-R+WSA       1.055             1.156            0/16
     APlace 1.0      1.042             1.119            1/8
     Capo 9.2        1.056             Not published    0/16
     Dragon 3.01     1.107             Not published    1/16
     FengShui 2.6    1.093             Not published    7/16

     Most recent results:
     Placer          Routed WL Ratio   Via Ratio        Routes with Violation
     mPL-R+WSA       1.007             1.069            0/16
     APlace 2.04     0.968             1.073            2/16
     FengShui 5.1    1.097             1.230            10/16

     ROOSTER with several detail placers: IBMv2
     Placer                     Routed WL Ratio   Via Ratio   Routes with Violation
     ROOSTER                    1.000             1.000       0/16
     ROOSTER+WSA                0.990             1.004       0/16
     ROOSTER+Dragon 4.0 DP      1.041             1.089       2/16
     ROOSTER+FengShui 5.1 DP    1.114             1.248       16/16

     AmoebaPlace vs. ROOSTER: IWLS 2005 benchmarks
     • http://iwls.org/iwls2005/benchmarks.html
     • All IWLS placements routed with NanoRoute

                     Rooster              AmoebaPlace
     Benchmark       rWL       Vias       rWL       Vias
     aes_core        1.271     126645     1.657     131049
     ethernet        6.145     413323     7.745     471800
     mem_ctrl        0.890     89153      1.224     90067
     pci_bridge32    1.176     115675     1.598     117326
     usb_funct       0.860     85329      1.106     85739
     vga_lcd         24.447    1076178    25.405    1083504
     Ratio           1.000     1.000      1.265     1.032

     Viols, in source order (column assignment unclear): aes_core 1, 1; ethernet 2, 1; mem_ctrl 0, 0; pci_bridge32 0, 2; usb_funct 0, 0; vga_lcd 1, 2
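The cutline-shifting idea (more whitespace to the more congested bin, area constraints always met) can be illustrated with a small Python sketch. The proportional split used here is an assumption made for illustration: the slides do not give the allocation rule, and the function name `shifted_cutline` and all of its parameters are hypothetical.

```python
def shifted_cutline(
    x_lo: float,
    x_hi: float,
    height: float,
    area_left: float,
    area_right: float,
    cong_left: float,
    cong_right: float,
) -> float:
    """Return the x-coordinate of a vertical cutline after shifting.

    Assumed rule (not from the slides): the bin's whitespace is divided
    between the two child bins in proportion to their congestion estimates.
    """
    bin_area = (x_hi - x_lo) * height
    whitespace = bin_area - area_left - area_right
    if whitespace < 0:
        raise ValueError("cells do not fit in the bin")
    total_cong = cong_left + cong_right
    # With no congestion information, split the whitespace evenly.
    share_left = 0.5 if total_cong == 0 else cong_left / total_cong
    # Each child keeps at least its own cell area, so area constraints hold.
    left_area = area_left + whitespace * share_left
    return x_lo + (x_hi - x_lo) * left_area / bin_area


# Bin spans x in [0, 100] with unit height; each side holds 35 units of cells.
even = shifted_cutline(0, 100, 1, 35, 35, 150, 150)
skewed = shifted_cutline(0, 100, 1, 35, 35, 100, 200)
```

With equal congestion the cutline stays at the midpoint, x = 50. When the right child is twice as congested, the cutline moves to about x = 45, leaving the right child 20 of the 30 units of whitespace. Because the shift happens inside a top-down partitioning flow, no separate legalization step is implied by this sketch.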
