
UC San Diego / VLSI CAD Laboratory

Construction of Realistic Gate Sizing Benchmarks With Known Optimal Solutions

Andrew B. Kahng, Seokhyeong Kang VLSI CAD LABORATORY, UC San Diego

International Symposium on Physical Design March 27th, 2012


Outline

 Background and Motivation
 Benchmark Generation
 Experimental Framework and Results
 Conclusions and Ongoing Work


Gate Sizing in VLSI Design

 Gate sizing
– Essential for power, delay and area optimization
– Tunable parameters: gate width, gate length and threshold voltage
– The sizing problem arises in all phases of the RTL-to-GDS flow

 Common heuristics/algorithms
– LP, Lagrangian relaxation, convex optimization, DP, sensitivity-based gradient descent, ...

  • 1. Which heuristic is better?
  • 2. How suboptimal is a given sizing solution?

→ A systematic and quantitative comparison is required


Suboptimality of Sizing Heuristics

 Eyechart*
– Built from three basic topologies (chain, mesh, star), optimally sized with DP
– Allows suboptimality of heuristics to be evaluated
– Not realistic: Eyechart circuits differ in topology from real designs – large depth (650 stages) and small Rent parameter (0.17)

 More realistic benchmarks are required, along with an automated generation flow

*Gupta et al., "Eyecharts: Constructive Benchmarking of Gate Sizing Heuristics", DAC 2010.


Our Work: Realistic Benchmark Generation w/ Known Optimal Solution

  • 1. Propose benchmark circuits with known optimal solutions
  • 2. Make the benchmarks resemble real designs
– in gate count, path depth, Rent parameter and net degree
  • 3. Assess suboptimality of standard gate sizing approaches

→ Automated benchmark generation flow


Outline

 Background and Motivation
 Benchmark Considerations and Generation
 Experimental Framework and Results
 Conclusions and Ongoing Work


Benchmark Considerations

 Realism vs. tractability of analysis – opposing goals

 To construct realistic benchmarks: use design characteristic parameters
– # primary ports, path depth, fanin/fanout distribution

 To enable known optimal solutions
– Library simplification as in Gupta et al. 2010: slew-independent library

Example (JPEG Encoder):
– Fanin distribution: 25% 1-input, 60% 2-input, 15% 3-or-more-input
– Path depth: 72
– Avg. net degree: 1.84
– Rent parameter: 0.72

(Figure: fanin and fanout distribution histograms.)
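The characteristic parameters can be computed from any gate-level netlist. A minimal sketch, assuming a hypothetical representation (one fanin count per gate instance, one pin count per net):

```python
from collections import Counter

def characterize(gates, nets):
    """Compute the fanin distribution and average net degree of a netlist.

    gates: list of fanin counts, one per gate instance (hypothetical format).
    nets:  list of net degrees (number of pins on each net).
    """
    fanin_hist = Counter(gates)
    n = len(gates)
    fanin_dist = {k: cnt / n for k, cnt in sorted(fanin_hist.items())}
    avg_net_degree = sum(nets) / len(nets)
    return fanin_dist, avg_net_degree

# Toy netlist: 25% 1-input, 60% 2-input, 15% 3-input gates (as for JPEG)
gates = [1] * 5 + [2] * 12 + [3] * 3
nets = [2, 2, 3, 1, 2]
dist, deg = characterize(gates, nets)
print(dist)  # {1: 0.25, 2: 0.6, 3: 0.15}
print(deg)   # 2.0
```

Path depth and the Rent parameter need the actual connectivity graph (and a recursive partitioner for Rent), so they are not shown here.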


Benchmark Generation

 Input parameters
  • 1. timing budget T
  • 2. depth of data path K
  • 3. number of primary ports N
  • 4. fanin/fanout distributions fid(i), fod(j)

 Constraints
– T must be larger than the minimum delay of a K-stage chain

 Generation flow
  • 1. construct N chains with depth K
  • 2. attach connection cells (C)
  • 3. connect chains

→ netlist with N*K + C cells
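The three-step flow can be sketched in a few lines. Everything here is illustrative, not the paper's implementation: `generate_benchmark`, its argument format, and the simplification of counting one connection cell per open fanin port (the actual flow attaches connection cells to open fanouts and enforces the timing budget T):

```python
import random

def generate_benchmark(N, K, fid, seed=0):
    """Simplified sketch of the generation flow: build N chains of depth K,
    drawing each gate's fanin from the distribution fid, then count the
    connection cells needed for the open fanin ports.

    fid: dict mapping fanin count -> probability (hypothetical format).
    """
    rng = random.Random(seed)
    fanins, probs = zip(*fid.items())
    chains = [[rng.choices(fanins, probs)[0] for _ in range(K)]
              for _ in range(N)]
    # One input of each gate continues its chain; the remaining
    # (fanin - 1) inputs are open and receive a connection cell each.
    C = sum(f - 1 for chain in chains for f in chain)
    return chains, N * K + C   # N*K chain cells + C connection cells

chains, total_cells = generate_benchmark(N=4, K=10,
                                         fid={1: 0.25, 2: 0.6, 3: 0.15})
```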


Benchmark Generation: Construct Chains

  • 1. Construct N chains, each with depth K (N*K cells)
  • 2. Assign gate instances according to fid(i)
  • 3. Assign fanout counts to output ports according to fod(j)

 Assignment strategy: arranged or random

(Figure: fanin/fanout assignment across chain stages – arranged vs. random assignment.)


Benchmark Generation: Find Optimal Solution with DP

  • 1. Attach connection cells to all open fanouts, to connect chains while keeping the optimal solution
  • 2. Perform dynamic programming with timing budget T

→ Optimal solution is achievable w/ a slew-independent library

Benchmark Generation: Solving a Chain Optimally (Example)

Example: a three-stage inverter chain (INV1 → INV2 → INV3) with delay budget Dmax = 8.

Library (slew-independent):

size   | input cap | leakage power | delay @ load 3 | delay @ load 6
Size 1 | 3         | 5             | 3              | 4
Size 2 | 6         | 10            | 1              | 2

Loads of 3 and 6 are the input capacitances of a Size-1 and a Size-2 gate, respectively. For each stage and each possible load, the DP tabulates the minimum power attainable under every delay budget (per-stage budget/power/size tables shown in the slide figure). With the full budget of 8, the minimum total leakage power is 25, giving the optimized chain: size 2, size 1, size 1.


Benchmark Generation: Connect Chains

  • 1. Run STA and find the arrival time at each gate
  • 2. Connect each connection cell to an open fanin port
– connect only if timing constraints are satisfied
– connection cells do not change the optimal chain solution
  • 3. Tie unconnected ports to logic high (VDD) or logic low
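The connection rule can be sketched as follows. `connect_or_tie` and its data format are hypothetical, but the check mirrors the rule above: a connection is made only when it meets the sink's required time, so the optimal chain solution is untouched:

```python
def connect_or_tie(conn_cells, open_fanins):
    """Greedy sketch of steps 2-3: attach each connection cell to the first
    open fanin port whose timing constraint it satisfies; leftover ports
    are tied to logic high or low.

    conn_cells:  list of (name, arrival_time, delay)  -- hypothetical STA data
    open_fanins: list of (name, required_time)
    """
    edges = []
    remaining = list(open_fanins)
    for cname, arrival, delay in conn_cells:
        for i, (fname, required) in enumerate(remaining):
            if arrival + delay <= required:   # cannot create a violation
                edges.append((cname, fname))  # optimal solution preserved
                del remaining[i]
                break
    ties = [fname for fname, _ in remaining]  # tie these to VDD/GND
    return edges, ties

edges, ties = connect_or_tie([("c1", 2, 1), ("c2", 5, 1)],
                             [("f1", 4), ("f2", 3)])
```

Here `c1` reaches `f1` with slack to spare, while `c2` arrives too late for either port, so its target `f2` is tied off instead.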


Benchmark Generation: Generated Netlist

 Generated output:
– benchmark circuit of N*K + C cells w/ optimal solution
– chains are connected to each other → various topologies

(Schematic of generated netlist, N = 10, K = 20.)


Outline

 Background and Motivation
 Benchmark Generation
 Experimental Framework and Results
 Conclusions and Ongoing Work


Experimental Setup

 Delay and power model (library)
– LP: linear increase in power – gate sizing context
– EP: exponential increase in power – Vt or gate-length context

 Heuristics compared
– Two commercial tools (BlazeMO, Cadence Encounter)
– UCLA sizing tool
– UCSD sensitivity-based leakage optimizer

 Realistic benchmarks: six open-source designs

 Suboptimality calculation:

Suboptimality = (power_heuristic − power_opt) / power_opt
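The metric in code (the power values in the usage line are made-up illustrations):

```python
def suboptimality(power_heuristic, power_opt):
    """Relative excess power of a heuristic solution over the known optimum."""
    return (power_heuristic - power_opt) / power_opt

# A heuristic that spends 112.8 units against an optimum of 100.0:
print(f"{suboptimality(112.8, 100.0):.1%}")  # 12.8%
```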


Generated Benchmark - Complexity

 Complexity (suboptimality) of generated benchmarks: chain-only vs. connected-chain topologies

(Charts: suboptimality, 0%–20%, of a commercial tool and a greedy optimizer on chain-only vs. connected benchmarks, labeled [library]-[N]-[k].)

– Chain-only: avg. 2.1%
– Connected-chain: avg. 12.8%


Generated Benchmark - Connectivity

 Problem complexity and circuit connectivity
  • 1. Arranged assignment: improves connectivity (larger fanin → later stage, larger fanout → earlier stage)
  • 2. Random assignment: improves diversity of topology

arranged | random | unconnected | subopt.
100%     | 0%     | 0.00%       | 2.60%
75%      | 25%    | 0.00%       | 6.80%
50%      | 50%    | 0.25%       | 10.30%
25%      | 75%    | 0.75%       | 11.20%
0%       | 100%   | 17.00%      | 7.70%


Suboptimality w.r.t. Parameters

 For different numbers of chains (N = 40 to 640) and numbers of stages (K = 20 to 100)

(Plots: suboptimality, 8%–14%, and runtime in minutes, log scale, of Comm, Greedy and SensOpt vs. number of chains and number of stages.)

→ Total # paths increases significantly with N and K


Suboptimality w.r.t. Parameters (2)

 For different average net degrees (1.2 to 2.4) and different delay constraints (0.4 to 1.1 ns)

(Plots: suboptimality and runtime in minutes, log scale, of Comm, Greedy and SensOpt vs. average net degree and timing constraint.)


Generated Realistic Benchmarks

 Target benchmarks
– SASC, SPI, AES, JPEG, MPEG (from OpenCores)
– EXU (from OpenSPARC T1)

 Characteristic parameters of real and generated benchmarks:

design | data depth | # instances | Rent param. (real) | net degree (real) | Rent param. (generated) | net degree (generated)
SASC   | 20 | 624    | 0.858 | 2.06 | 0.865 | 2.06
SPI    | 33 | 1092   | 0.880 | 1.81 | 0.877 | 1.80
EXU    | 31 | 25560  | 0.858 | 1.91 | 0.814 | 1.90
AES    | 23 | 23622  | 0.810 | 1.89 | 0.820 | 1.88
JPEG   | 72 | 141165 | 0.721 | 1.84 | 0.831 | 1.84
MPEG   | 33 | 578034 | 0.848 | 1.59 | 0.848 | 1.60


Suboptimality of Heuristics

 Suboptimality w.r.t. known optimal solutions for the generated realistic benchmarks
– Vt-swap context (EP library): up to 52.2%, avg. 16.3%
– Gate sizing context (LP library): up to 43.7%, avg. 25.5%

(Charts: suboptimality, 0%–60%, of Comm1, Comm2, Greedy and SensOpt on eyechart, SASC, SPI, AES, EXU, JPEG and MPEG, with the EP and LP libraries. Greedy results for MPEG are missing.)


Comparison w/ Real Designs

 Suboptimality versus one specific heuristic (SensOpt): real designs with a real delay/leakage library (TSMC 65nm)

(Charts: suboptimality, −10% to 50%, of Comm1, Comm2 and Greedy relative to SensOpt on SASC, SPI, AES, EXU, JPEG and MPEG, alongside the suboptimality from our benchmarks.)

→ Actual suboptimality will be greater!

 Discrepancy: simplified delay model, reduced library set, ...


Conclusions

 A new benchmark generation technique for gate sizing → constructs realistic circuits with known optimal solutions

 Our benchmarks enable systematic and quantitative study of common sizing heuristics

 Common sizing methods are suboptimal on realistic benchmarks by up to 52.2% (Vt assignment) and 43.7% (sizing)

 http://vlsicad.ucsd.edu/SIZING/


Ongoing Work

 Analyze discrepancies between real and artificial benchmarks

 Handle more realistic delay models
– Use a realistic delay library in the context of realistic benchmarks with tight upper bounds

 Alternate approach to netlist generation
– (1) cut nets in a real design and find an optimal solution → (2) reconnect the nets while keeping the optimal solution


Thank you