A Performance-Driven Standard-Cell Placer Based on a Modified Force-Directed Algorithm



SLIDE 1

A Performance-Driven Standard-Cell Placer Based on a Modified Force-Directed Algorithm*

Yih-Chih Chou, Youn-Long Lin
Department of Computer Science
National Tsing Hua University
Hsin-Chu, Taiwan, R.O.C.

* Supported in part by the National Science Council, R.O.C.

SLIDE 2

Outline

Motivation
Graph Model
Proposed Approach
Illustrated Example
Experimental Flow
Experimental Results
Conclusions and Future Work

SLIDE 3

Motivation

Force-Directed Iterative Refinement

Traditional approach:
  Overlaps have to be resolved; convergence problems

This work:
  We allow overlapping to get a relative placement
  Move all cells until force equilibrium is reached

Path Delay Constraint for Performance-Driven Placement

Traditional approach:
  Handled indirectly by distributing timing slack among the nets along the path (Zero-Slack)

This work:
  We introduce a pseudo link between the start and end points of each path

SLIDE 4

Preliminaries

Core cells: $C = \{c_1, c_2, \ldots\}$
I/O pad cells: $P = \{p_1, p_2, \ldots\}$
Nets: $N = \{n_1, n_2, \ldots\}$

[Figure: example circuit with I/O pads p1-p4, core cells c1-c6, and a CLK signal]

SLIDE 5

Graph Model

[Figure: two drawings of the example with I/O pads p1-p4 and core cells c1-c6]

Node: I/O or core cell
Normal (solid) link: cell connectivity
Pseudo (dashed) link: one per path

Force Definition

$$
f(c_i, c_j) =
\begin{cases}
(x_i - x_j)^2 + (y_i - y_j)^2 & \text{normal link} \\
\left((x_i - x_j)^2 + (y_i - y_j)^2\right)\,\mathrm{Delay}(\mathrm{path}_{i \to j}) & \text{pseudo link}
\end{cases}
$$
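The force definition on this slide can be written as a small function. This is only a sketch: the function name, the `kind` strings, and the tuple representation of cell positions are illustrative assumptions, not names from the presentation.

```python
def link_force(ci, cj, kind, path_delay=None):
    """Value of f(c_i, c_j) for one link, following the slide's definition.

    ci, cj     -- (x, y) positions of the two cells (assumed representation)
    kind       -- "normal" or "pseudo" (hypothetical labels)
    path_delay -- delay of the path the pseudo link stands for
    """
    dx = ci[0] - cj[0]
    dy = ci[1] - cj[1]
    dist_sq = dx * dx + dy * dy          # squared Euclidean distance
    if kind == "normal":
        return dist_sq
    if kind == "pseudo":
        if path_delay is None:
            raise ValueError("a pseudo link needs the delay of its path")
        return dist_sq * path_delay      # distance term scaled by path delay
    raise ValueError(f"unknown link kind: {kind!r}")
```

For two cells at the same distance, the pseudo-link value is the normal-link value multiplied by the path delay, so links on slow paths pull harder.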

SLIDE 6

Proposed Approach

Step 1:
  Floorplan; fix I/O pad locations
  Construct the graph; add a pseudo link between each path's starting and ending points
  Put all core cells at the chip center

Step 2:
  Iteratively move core cells until all reach force-equilibrium positions
  Cells are vertically aligned to cell rows
  Horizontal overlapping is allowed

Step 3:
  Form cell rows starting with the topmost and bottommost
  Re-balance the remaining cells and iterate
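A minimal sketch of the Step 2 iteration follows. It assumes each link's force value is treated as a quadratic energy to be minimized, so a cell's equilibrium position is the weighted average of its neighbours (weight 1 for a normal link, the path delay for a pseudo link). The function name, data layout, and stopping rule are assumptions; the vertical alignment to cell rows is left out, and this is not presented as the authors' implementation.

```python
def relax_to_equilibrium(pos, fixed, links, tol=1e-6, max_iters=10_000):
    """Move every non-fixed cell to the weighted average of its neighbours
    until no coordinate changes by more than `tol`.

    pos   -- dict: cell name -> (x, y)
    fixed -- set of names that must not move (the I/O pads)
    links -- list of (a, b, weight) with positive weights
    """
    neighbours = {}
    for a, b, w in links:
        neighbours.setdefault(a, []).append((b, w))
        neighbours.setdefault(b, []).append((a, w))

    pos = dict(pos)                      # leave the caller's dict untouched
    for _ in range(max_iters):
        biggest_move = 0.0
        for cell, nbrs in neighbours.items():
            if cell in fixed:
                continue
            total_w = sum(w for _, w in nbrs)
            x = sum(w * pos[n][0] for n, w in nbrs) / total_w
            y = sum(w * pos[n][1] for n, w in nbrs) / total_w
            biggest_move = max(biggest_move,
                               abs(x - pos[cell][0]),
                               abs(y - pos[cell][1]))
            pos[cell] = (x, y)           # overlaps are allowed at this stage
        if biggest_move < tol:
            break
    return pos
```

With two pads at (0, 0) and (10, 0) and a chain pad-c1-c2-pad of unit-weight links, both cells start at the chip center and settle at one third and two thirds of the way between the pads.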

SLIDE 7

Step 1: All 6 cells at chip center

Stages: Graph Construction, Force-Equilibrium Cell Positioning, Cell Row Formation

[Figure: core cells c1-c6 at the chip center, I/O pads P1-P4, cell rows R1-R3]

SLIDE 8

Completion of Step 2: overlapping exists

[Figure: core cells c1-c6 at force-equilibrium positions, I/O pads P1-P4, cell rows R1-R3]

SLIDE 9

Step 3: Form Rows R1 and R3

[Figure: core cells c1-c6, I/O pads P1-P4, cell rows R1-R3]

SLIDE 10

Re-balancing the remaining 2 cells in the middle

[Figure: core cells c1-c6 and I/O pads P1-P4]

SLIDE 11

Final placement after forming Row R2

[Figure: core cells c1-c6 and I/O pads P1-P4]

SLIDE 12

Make Step 2 run faster

The force associated with a pseudo link is much stronger than that with a normal link
  First, we let only the flip-flops move
  Then, we let all cells move
15% reduction in total CPU time
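The two-phase schedule on this slide can be sketched as two calls to the same relaxation routine with different sets of movable cells. Everything here is an illustrative assumption: the names `relax` and `two_phase`, the weighted-average update standing in for the force-equilibrium step, and the fixed number of sweeps.

```python
def relax(pos, movable, links, sweeps=200):
    """Repeatedly move each cell in `movable` to the weighted average of the
    cells it is linked to. links: (a, b, weight) triples."""
    pos = dict(pos)
    for _ in range(sweeps):
        for cell in movable:
            wsum = xsum = ysum = 0.0
            for a, b, w in links:
                if cell == a or cell == b:
                    other = b if cell == a else a
                    wsum += w
                    xsum += w * pos[other][0]
                    ysum += w * pos[other][1]
            if wsum > 0:
                pos[cell] = (xsum / wsum, ysum / wsum)
    return pos


def two_phase(pos, flip_flops, core_cells, links):
    """Phase 1 moves only the flip-flops; phase 2 moves every core cell."""
    pos = relax(pos, flip_flops, links)
    return relax(pos, core_cells, links)
```

Since pseudo links connect path start and end points, the first phase lets the cells carrying the strongest forces settle before the remaining cells start to move.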

SLIDE 13

Experimental Flow

RTL in Verilog HDL
Synthesis (Synopsys DA)
Floorplan (Cadence SEDSM)
Placement: Ours (FDP), or CT / CT+PBO
Wrap Route (Cadence SEDSM)
RC Extraction (Cadence HyperExtract)
Cell/Net Delay Calc (Cadence SEDSM)
Path Analysis (Synopsys DesignTime)

Library: Artisan library for TSMC 0.18µm CMOS process
CT+PBO: Commercial Tool (CT) with Placement-Based Optimization (PBO)

SLIDE 14

Benchmark Characteristics

Benchmark    # cells   # nets   # I/O   Area (µm²)
64bMAC         27043    27458     417      1210814
a518k         191592   209354     153      8230874
a259k          95765   104683     153      4392336
VP2            10063    10542     323       657251
32bMAC          8655     8941     213       362695
sdram_rdr       4125     4559      95       365698
matrix          3375     3603     119       227405

Available from: http://www.cs.nthu.edu.tw/~ylin/placement.htm

SLIDE 15

Quality and Run Time Comparison

             CT              Ours             Ours (From CT)   Ours (From CT+PBO)   CT+PBO
Benchmark    Delay   CPU     CPU     Impr. %  CPU     Impr. %  CPU     Impr. %      CPU      Impr. %
a518k        14.26   46313   41349   17.1     38455   16.7     36276   18.8         110688    8.7
64bMAC        4.97     509     486   14.1       430   14.5       395   17.1           1802    8.5
a259k        12.35   13196   11906   17.4     10834   17.7      9914   19.1          33122   11.3
VP2          13.66     276     229   13.9       207   13.3       211   15.0            662    5.8
32bMAC        4.85     154     150   16.2       139   16.9       122   18.5            578    9.3
sdram_rdr     2.81      35      56    5.9        51    6.2        53    7.4            116    3.4
matrix        8.42       6       9    7.3         8    7.1         8    6.9             10    6.5
total                60489   54185            50124            46979                146908

Delay in ns, CPU in s, running on a Sun UltraSparc80
PBO area overhead: 3.81%
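The "Impr. %" figures appear to be the relative delay reduction against the commercial tool (CT). That reading is an assumption, but it reproduces the a518k entry: CT reports 14.26 ns, and the later slides list 11.82 ns for the proposed placer on the same benchmark. The function name below is hypothetical.

```python
def improvement_percent(baseline_delay_ns, new_delay_ns):
    """Relative delay reduction, in percent, against a baseline delay."""
    return (baseline_delay_ns - new_delay_ns) / baseline_delay_ns * 100.0


# a518k: CT delay 14.26 ns, proposed placer 11.82 ns
print(round(improvement_percent(14.26, 11.82), 1))  # prints 17.1
```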

SLIDE 16

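Delay (ns) and CPU (s) for pseudo-link weights α = 1, 1.5, 2, 2.5, 3

             α = 2
Benchmark    Delay(ns)  CPU(s)
a518k        11.82      41349
64bMAC        4.27        486
a259k        10.20      11906
VP2          11.76        229
32bMAC        4.06        150
sdram_rdr     2.64         56
matrix        7.81          9

[Table residue: values for α = 1, 1.5, 2.5, and 3 as (Delay, CPU) pairs; the slide text does not preserve which pair belongs to which α]

a518k:      (12.86, 42310)  (12.14, 43356)  (12.94, 45163)  (12.05, 42031)
64bMAC:     (4.50, 507)  (4.33, 625)  (4.38, 531)  (4.21, 492)
a259k:      (11.15, 11989)  (10.83, 13802)  (11.03, 12697)  (10.29, 12064)
VP2:        (12.45, 241)  (12.09, 358)  (12.35, 263)  (11.89, 218)
32bMAC:     (4.29, 156)  (4.19, 171)  (4.35, 163)  (4.30, 146)
sdram_rdr:  (2.64, 60)  (2.69, 63)  (2.88, 58)  (2.61, 56)
matrix:     (7.80, 8)  (7.79, 11)  (7.90, 9)  (7.85, 9)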

How much should we weigh the pseudo link (α)?

α = 2 is a good choice

Force Definition

$$
f(c_i, c_j) =
\begin{cases}
(x_i - x_j)^2 + (y_i - y_j)^2 & \text{normal link} \\
\left((x_i - x_j)^2 + (y_i - y_j)^2\right)\,\mathrm{Delay}(\mathrm{path}_{i \to j}) & \text{pseudo link}
\end{cases}
$$

SLIDE 17

Is the Pseudo Link Indeed Effective? How Much is Needed?

             Add no link       Add link for all paths
Benchmark    Delay    CPU      Delay    CPU
a518k        14.59   16953     11.82   41349
64bMAC        5.04     272      4.27     486
a259k        12.65    5834     10.20   11906
VP2          13.63     163     11.76     229
32bMAC        4.79      87      4.06     150
sdram_rdr     3.15      46      2.64      56
matrix        8.53       7      7.81       9
total                23362             54185

[Table residue: the two remaining series are for links added only to paths longer than 90% of the longest and only to paths longer than 50% of the longest; the slide text does not preserve which series belongs to which threshold]

             Series A          Series B
Benchmark    Delay    CPU      Delay    CPU
a518k        13.10   21088     12.41   32666
64bMAC        4.83     335      4.31     403
a259k        11.43    6310     10.61    9525
VP2          12.69     179     12.11     208
32bMAC        4.43      98      4.26     125
sdram_rdr     3.01      45      2.79      46
matrix        8.82       7      8.28       8
total                28062             42981

Run-Time/Quality Tradeoff

Delay in (ns) and CPU in (sec)
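The link-selection policies compared on this slide can be expressed as one threshold rule. The sketch below assumes "longer" refers to path delay and that a pseudo link is added when a path's delay exceeds a given fraction of the longest path delay; the function name and the dictionary layout are hypothetical.

```python
def paths_needing_links(path_delays, fraction):
    """Return the paths whose delay exceeds `fraction` times the longest delay.

    path_delays -- dict: (start_cell, end_cell) -> path delay
    fraction    -- 0.0 selects every path with positive delay,
                   1.0 selects none (the comparison is strict)
    """
    if not path_delays:
        return []
    longest = max(path_delays.values())
    return [path for path, delay in path_delays.items()
            if delay > fraction * longest]
```

Under this rule, `fraction=0.9` and `fraction=0.5` correspond to the two threshold columns, while `fraction=0.0` and `fraction=1.0` correspond to "all paths" and "no link".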

SLIDE 18

Quality is Essentially Independent of Initial Placement

             Chip Center          Random               From CT
Benchmark    Delay(ns)  CPU(s)    Delay(ns)  CPU(s)    Delay(ns)  CPU(s)
a518k        11.82      41349     11.85      43160     11.88      38455
64bMAC        4.27        486      4.30        607      4.25        430
a259k        10.20      11906     10.22      12389     10.16      10834
VP2          11.76        229     11.75        211     11.84        207
32bMAC        4.06        150      4.13        192      4.03        139
sdram_rdr     2.64         56      2.66         53      2.64         51
matrix        7.81          9      7.79         11      7.82          8
total                   54185                56623                50124

SLIDE 19

Conclusions and Future Work

Force-directed performance-driven placement
Model path delay constraint directly with pseudo link
Integrated into an industrial flow
Significant timing improvement
Computationally efficient
Quality is independent of initial placement; a good initial placement helps a little in run time

Future work
  ECO capability (buffer insertion)
  Handle macros and preplaced blocks