Clustering-based signal merging in STA (Anton Belov, Adrian Wrixon, et al.), PowerPoint presentation

SLIDE 1

Clustering-based signal merging in STA

Anton Belov, Adrian Wrixon, Maurice Keller, Himanshu Dadheech Synopsys Inc. TAU 2019 Monterey, CA, USA

SLIDE 2

Introduction: GBA-PBA accuracy gap (1/2)

[Figure: number of endpoints (#endpoints) vs. slack for GBA and PBA]

  • GBA (Graph-Based Analysis)

– Timing values for the entire circuit are computed in a BFS sweep => runtime/memory are linear in circuit size.
– Timing properties are worst-cased (“merged”) at points of convergence => slacks are pessimistic.

  • PBA (Path-Based Analysis)

– Timing values are computed one path at a time => runtime/memory can be exponential, since the number of paths can grow exponentially with circuit size.
– No convergence => no merging => slacks are accurate.
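The pessimism of GBA merging can be seen at a single toy convergence point. This is a minimal sketch under assumed conditions, not how any particular STA engine computes delays: signals are hypothetical 2-D (arrival time, slew) pairs, there is one downstream stage whose delay grows linearly with input slew (the `base` and `k_slew` values are made up), and all numbers are illustrative. PBA times each path separately; GBA worst-cases arrival and slew independently, producing a late and slow combination that no real path has.

```python
# Toy illustration of GBA merge pessimism (hypothetical numbers, in ns).

def stage_delay(slew, base=0.10, k_slew=0.50):
    """Delay of one downstream stage; base and k_slew are assumed values."""
    return base + k_slew * slew

# Two paths converge at a node; each carries its own (arrival time, slew).
path_signals = {
    "p1": (1.00, 0.05),  # late but sharp transition
    "p2": (0.80, 0.40),  # early but slow transition
}

def pba_endpoint_arrival(signals):
    """PBA: time each path separately (no merging) and report the worst."""
    return max(at + stage_delay(slew) for at, slew in signals.values())

def gba_endpoint_arrival(signals):
    """GBA: merge at the convergence point by worst-casing each dimension."""
    merged_at = max(at for at, _ in signals.values())
    merged_slew = max(slew for _, slew in signals.values())
    return merged_at + stage_delay(merged_slew)

pba = pba_endpoint_arrival(path_signals)
gba = gba_endpoint_arrival(path_signals)
print(f"PBA = {pba:.3f} ns, GBA = {gba:.3f} ns, pessimism = {gba - pba:.3f} ns")
# prints: PBA = 1.125 ns, GBA = 1.300 ns, pessimism = 0.175 ns
```

Because slack is required time minus arrival time, the later GBA arrival translates directly into a pessimistic GBA slack.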

SLIDE 3

Introduction: GBA-PBA accuracy gap (1/2)

[Figure: number of endpoints (#endpoints) vs. slack for GBA and PBA, with the GBA-PBA gap marked]

  • GBA-PBA gap

– Difference between GBA and PBA slacks.
– A large GBA-PBA gap is a problem: slower and more memory-intensive PBA-based signoff; slower and less optimal ECO.
– The GBA-PBA gap is getting worse …
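The gap can be computed per endpoint and summarized as the share of endpoints whose gap falls below a threshold, which is how a cumulative plot of the gap is read. In this sketch the endpoint names and slack values are invented, and the gap is taken as PBA slack minus GBA slack so that it is non-negative.

```python
# GBA-PBA gap per endpoint (hypothetical slacks in ns, for illustration only).
gba_slack = {"ep1": -0.20, "ep2": -0.05, "ep3": 0.10, "ep4": -0.30}
pba_slack = {"ep1": -0.12, "ep2": -0.03, "ep3": 0.18, "ep4": -0.02}

def gba_pba_gap(gba, pba):
    """Gap = PBA slack - GBA slack; non-negative because GBA is pessimistic."""
    return {ep: pba[ep] - gba[ep] for ep in gba}

def fraction_below(gaps, threshold):
    """Fraction of endpoints whose gap is below `threshold` (one point of a CDF)."""
    return sum(g < threshold for g in gaps.values()) / len(gaps)

gaps = gba_pba_gap(gba_slack, pba_slack)
print({ep: round(g, 2) for ep, g in gaps.items()})
print(fraction_below(gaps, 0.05))  # 0.25
```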

SLIDE 4

Introduction: GBA-PBA accuracy gap (2/2)

Reading the plot: 40% of endpoints have a gap < 0.05 ns; the lower the curve, the larger the GBA-PBA gap.

Industrial design block, 16 nm, 0.545 V; 100K endpoints analysed.

  • The GBA-PBA gap increases with the number of signal dimensions

– Each dimension contributes merge pessimism
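Why each extra dimension adds merge pessimism can be sketched with a simple linear model. All of the following are assumptions for illustration: each signal is a vector of dimension values already scaled to ns, the endpoint arrival is a non-negative weighted sum of those values (the weights in `sens` are made up), and merging takes the componentwise maximum. The loop reports the merge pessimism when only the first k dimensions are tracked.

```python
# Merge pessimism vs. number of tracked signal dimensions (hypothetical model).

# Signals at a convergence node; columns: arrival, slew, SI window, AOCV term,
# all pre-scaled to ns-equivalent values (illustrative numbers only).
signals = [
    [1.00, 0.05, 0.00, 0.00],
    [0.90, 0.20, 0.05, 0.00],
    [0.85, 0.05, 0.20, 0.10],
]
# Assumed non-negative sensitivity of the endpoint arrival to each dimension.
sens = [1.0, 0.5, 0.5, 0.5]

def est_arrival(sig, k):
    """Endpoint-arrival estimate using only the first k dimensions."""
    return sum(s * x for s, x in zip(sens[:k], sig[:k]))

def merge_pessimism(k):
    """Estimate for the componentwise-max merged signal minus the worst real one."""
    merged = [max(sig[i] for sig in signals) for i in range(k)]
    return est_arrival(merged, k) - max(est_arrival(sig, k) for sig in signals)

for k in range(1, 5):
    print(k, round(merge_pessimism(k), 3))
```

Under these assumptions the pessimism cannot decrease when a dimension is added: the merged estimate grows by the weighted maximum of the new dimension, while the worst unmerged estimate grows by at most that much.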

SLIDE 5

Introduction: GBA-PBA accuracy gap (2/2)

➢ Problem: how can the accuracy of GBA be improved?

– Accuracy is lost in merging.
– Approach 1: improve the quality of merging.
– Approach 2: do less merging.

SLIDE 6

Multiple-Signal Propagation

  • Proposed in the early 2000s (Blaauw et al., ICCAD 2000; Lee et al., ICCAD 2001)

– Dominance: $S_1$ dominates $S_2$ at node $n$ if $\mathrm{at}(S_1) \ge \mathrm{at}(S_2)$ everywhere in the fanout of $n$.
– In some cases it is possible to detect dominance.
– In some cases it is possible to construct an accurate bounding signal.
– When neither is possible => propagate multiple signals.
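The dominance test and a bounding signal can be sketched for the classic 2-D case (arrival time, slew). This is a minimal sketch, not the method of the cited papers: it assumes every downstream delay is non-decreasing in input slew, so componentwise dominance is a sufficient (but not necessary) way to establish the dominance condition on the slide, and it uses the componentwise maximum as the bounding signal, which is always safe but only accurate when one signal nearly dominates the others. The signal values are hypothetical.

```python
from typing import List, NamedTuple

class Signal(NamedTuple):
    at: float    # arrival time (ns)
    slew: float  # transition time (ns)

def dominates(s1: Signal, s2: Signal) -> bool:
    """Sufficient (not necessary) dominance test. ASSUMES every downstream delay
    is non-decreasing in input slew: then a signal that is at least as late and
    at least as slow keeps at(s1) >= at(s2) everywhere in the fanout."""
    return s1.at >= s2.at and s1.slew >= s2.slew

def bounding_signal(signals: List[Signal]) -> Signal:
    """Simplest safe bounding signal: worst-case each dimension (pessimistic)."""
    return Signal(max(s.at for s in signals), max(s.slew for s in signals))

def prune_dominated(signals: List[Signal]) -> List[Signal]:
    """Drop every signal dominated by another (keeping one copy of duplicates);
    the survivors are the ones that must be propagated unmerged."""
    kept = []
    for i, s in enumerate(signals):
        dominated = any(
            j != i and dominates(t, s) and (t != s or j < i)
            for j, t in enumerate(signals)
        )
        if not dominated:
            kept.append(s)
    return kept

signals = [Signal(1.0, 0.10), Signal(0.9, 0.05), Signal(0.8, 0.30)]
survivors = prune_dominated(signals)
print(survivors)                   # [Signal(at=1.0, slew=0.1), Signal(at=0.8, slew=0.3)]
print(bounding_signal(survivors))  # Signal(at=1.0, slew=0.3)
```

Here the second signal is dropped because the first is both later and slower; the remaining two must either be propagated separately or replaced by the (pessimistic) bound.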

SLIDE 7

Multiple-Signal Propagation

[Figure: signal dimensions in modern STA: arrival time, slew, arrival window (SI), waveform, logic depth (AOCVM), distance (AOCVM)]

➢ Problem: old techniques do not translate to modern STA

– Signals were assumed to be 2-D: arrival time and slew.
– In modern STA, signals are k-D (k-dimensional).
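A small example suggests why dominance-based pruning stops paying off as k grows. Under the same assumption as before (the endpoint arrival is non-decreasing in every dimension, so componentwise dominance is a sufficient test), the hypothetical signals below prune to a single survivor when only arrival time and slew are tracked, but none of them dominates another once two more invented dimensions are added, so all three would have to be propagated.

```python
from typing import List, Sequence, Tuple

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Componentwise dominance in k dimensions. ASSUMES the endpoint arrival is
    non-decreasing in every dimension, so this is a sufficient test only."""
    return all(x >= y for x, y in zip(a, b))

def non_dominated(signals: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Signals that no other signal dominates (one copy of duplicates kept)."""
    return [
        s for i, s in enumerate(signals)
        if not any(j != i and dominates(t, s) and (t != s or j < i)
                   for j, t in enumerate(signals))
    ]

# Hypothetical 4-D signals: (arrival, slew, SI window, AOCV term), ns-scaled.
signals = [
    (1.0, 0.2, 0.0, 0.0),
    (0.9, 0.1, 0.3, 0.0),
    (0.8, 0.1, 0.0, 0.4),
]

for k in (2, 4):
    projected = [s[:k] for s in signals]
    print(f"{k}-D: {len(non_dominated(projected))} signal(s) must be kept")
# prints:
# 2-D: 1 signal(s) must be kept
# 4-D: 3 signal(s) must be kept
```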

SLIDE 8

Multiple-Signal Propagation

➢ In this paper: focus on multiple-signal propagation

– How to maximize accuracy within a given runtime/memory budget?

SLIDE 9

Clustering-based Signal Merging

  • Propagate multiple signals, but control resources

– merge-width (mw) = the maximum number of unmerged signals per node

[Figure: merge-width (budget) = k; k signals and k signals converge at a node (2k signals) and are merged back to k signals]
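The merge-width budget can be sketched as a cap applied whenever signals from several fanins meet at a node. Everything here is hypothetical: signals are (arrival, slew) pairs, the arc delay ignores slew, and `enforce_merge_width` uses a naive stand-in rule (keep the mw-1 latest signals, merge the rest into one pessimistic bound) rather than the clustering-based merging the talk proposes.

```python
from typing import List, Tuple

Signal = Tuple[float, float]  # hypothetical 2-D signal: (arrival time, slew) in ns

def bound(signals: List[Signal]) -> Signal:
    """Pessimistic (safe) merge: worst-case each dimension."""
    return (max(s[0] for s in signals), max(s[1] for s in signals))

def enforce_merge_width(signals: List[Signal], mw: int) -> List[Signal]:
    """Keep at most mw (>= 1) unmerged signals at a node.
    Stand-in rule (hypothetical, NOT the talk's clustering): keep the mw-1
    latest signals as they are and merge all remaining ones into one bound."""
    if len(signals) <= mw:
        return list(signals)
    ordered = sorted(signals, key=lambda s: s[0], reverse=True)
    return ordered[:mw - 1] + [bound(ordered[mw - 1:])]

def propagate(fanins: List[List[Signal]], delay: float, mw: int) -> List[Signal]:
    """Collect the signals from all fanins of a node, add an assumed
    slew-independent arc delay, then enforce the merge-width budget."""
    arriving = [(at + delay, slew) for fanin in fanins for at, slew in fanin]
    return enforce_merge_width(arriving, mw)

mw = 2
fanin_a = [(1.00, 0.10), (0.90, 0.30)]  # k = mw signals
fanin_b = [(0.95, 0.20), (0.70, 0.05)]  # k = mw signals
out = propagate([fanin_a, fanin_b], delay=0.10, mw=mw)  # 2k -> k signals
print(out)
```

Even this naive rule is safe, since each bound dominates the signals it replaces; what a better merging strategy can improve is how much accuracy each forced merge costs.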

SLIDE 10

Clustering-based Signal Merging

  • When forced to merge, partition the signals into mw clusters (subsets), while controlling accuracy:

– Metric: accuracy-loss = difference in endpoint arrival time between unmerged and merged signals.

– Infeasible to compute exactly, but it can be estimated heuristically.

– Translate all dimensions into arrival times; sensitivity is important.
– Find the partition that minimizes the overall accuracy-loss.
– Each cluster is merged pessimistically (safe).
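The clustering step can be sketched as follows. The slides do not give the algorithm, so this is one plausible heuristic under stated assumptions: each dimension is translated into endpoint arrival time with fixed, made-up linear sensitivities (`SENS`); the accuracy-loss of a cluster is estimated as how far its merged (componentwise-max) signal's estimate exceeds that of its worst member; and a greedy agglomerative pass repeatedly joins the two clusters whose union adds the least loss until mw clusters remain. Greedy joining is not guaranteed to find the minimum-loss partition.

```python
from itertools import combinations

# Assumed linear sensitivities that translate (arrival, slew, SI window)
# into endpoint arrival time; hypothetical values.
SENS = (1.0, 0.6, 0.4)

def est(sig):
    """Estimated endpoint arrival of one signal under the linear model."""
    return sum(s * x for s, x in zip(SENS, sig))

def bound(cluster):
    """Pessimistic (safe) merge of a cluster: componentwise maximum."""
    return tuple(max(col) for col in zip(*cluster))

def loss(cluster):
    """Heuristic accuracy-loss: merged estimate minus the worst member's estimate."""
    return est(bound(cluster)) - max(est(s) for s in cluster)

def cluster_merge(signals, mw):
    """Greedy agglomerative clustering into at most mw clusters (one possible
    heuristic). Returns the merged signals and the total estimated loss."""
    clusters = [[s] for s in signals]
    while len(clusters) > mw:
        def added_loss(pair):
            a, b = clusters[pair[0]], clusters[pair[1]]
            return loss(a + b) - loss(a) - loss(b)
        i, j = min(combinations(range(len(clusters)), 2), key=added_loss)
        joined = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(joined)
    return [bound(c) for c in clusters], sum(loss(c) for c in clusters)

signals = [
    (1.00, 0.10, 0.00),  # late, sharp
    (0.98, 0.15, 0.00),  # late, sharp
    (0.80, 0.40, 0.00),  # early, slow
    (0.82, 0.38, 0.05),  # early, slow, some SI
]
merged, total_loss = cluster_merge(signals, mw=2)
print(merged, round(total_loss, 3))
```

With these inputs the two late-but-sharp signals and the two slow signals end up in separate clusters; joining a sharp signal with a slow one would pair a late arrival with a large slew and cost far more estimated accuracy.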

SLIDE 11

Experimental results: GBA-PBA gap closure

  • 12 blocks
  • 1M-3M instances
  • 7-20 nm, CCS
  • SI, Waveform, POCV and AOCV

➢ Observations:

  • 20-60% gap closure (~35% on average)
  • Sensitive to merge-width, but not always
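The slides report gap closure as a percentage without defining it. Assuming the usual reading (the share of the baseline GBA-PBA gap that the improved GBA removes, here summed over endpoints), it could be computed as below; the endpoint names and slacks are invented for illustration.

```python
def gap_closure(gba, pba, new_gba):
    """ASSUMED definition: fraction of the total GBA-PBA gap (summed over
    endpoints) removed by the improved GBA analysis. Gap = PBA - GBA slack."""
    baseline = sum(pba[ep] - gba[ep] for ep in gba)
    remaining = sum(pba[ep] - new_gba[ep] for ep in gba)
    return (baseline - remaining) / baseline

# Hypothetical slacks (ns): baseline GBA, PBA, and GBA with multiple-signal
# propagation and clustering-based merging.
gba = {"ep1": -0.20, "ep2": -0.05}
pba = {"ep1": -0.10, "ep2": 0.05}
new_gba = {"ep1": -0.15, "ep2": 0.00}

print(f"gap closure = {gap_closure(gba, pba, new_gba):.0%}")  # gap closure = 50%
```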

SLIDE 12

Experimental results: PBA-based signoff

  • PBA-based signoff requires computation of the worst PBA path for each violating endpoint

– “exhaustive PBA”

  • The GBA-PBA accuracy gap has a large impact on the performance of exhaustive PBA
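One way to see why the gap matters for exhaustive PBA is a bounded path-enumeration sketch for a single violating endpoint. This is an assumed scheme for illustration, not the tool's implementation: paths are visited in order of their (pessimistic) GBA slack, each visited path is re-timed with PBA, and enumeration stops as soon as the next path's GBA slack is no worse than the worst PBA slack found so far. Stopping is safe because a path's PBA slack is never below its GBA slack. The path slacks are invented.

```python
from typing import List, Tuple

Path = Tuple[float, float]  # (GBA slack of the path, PBA slack of the path), ns

def worst_pba_slack(paths: List[Path]) -> Tuple[float, int]:
    """Bounded enumeration for one endpoint. Relies on GBA pessimism:
    pba_slack >= gba_slack for every path. Returns (worst PBA slack,
    number of paths re-timed)."""
    worst = float("inf")
    retimed = 0
    for gba, pba in sorted(paths, key=lambda p: p[0]):
        if gba >= worst:   # no remaining path can have a worse PBA slack
            break
        retimed += 1       # a real tool would re-time the path here
        worst = min(worst, pba)
    return worst, retimed

# Hypothetical endpoint whose paths have a large GBA-PBA gap...
loose = [(-0.30, -0.05), (-0.25, -0.10), (-0.20, 0.02), (-0.12, -0.08), (-0.05, 0.01)]
# ...and the same PBA slacks with tighter (less pessimistic) GBA slacks.
tight = [(-0.07, -0.05), (-0.12, -0.10), (0.00, 0.02), (-0.09, -0.08), (0.00, 0.01)]

print(worst_pba_slack(loose))  # (-0.1, 4)
print(worst_pba_slack(tight))  # (-0.1, 1)
```

With identical PBA slacks, the tighter GBA slacks let the search stop after re-timing one path instead of four, so a smaller GBA-PBA gap means less PBA work per violating endpoint.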
SLIDE 13

Experimental results: PBA-based signoff

Averages across all designs:

  Merge-width   Runtime x-factor   Memory penalty
  mw = 2        3.08x              17.6%
  mw = 3        3.21x              29.9%
  mw = 5        3.37x              47.9%
  mw = 10       3.32x              81.2%

➢ Observations

– Highlight: 11.70x speedup, 11.8% memory penalty.
– 3 designs with 4-5x speedup and under 20% memory penalty.
– Lowlight: 1.06x speedup, 30.6% memory penalty.
– The optimal merge-width and the benefits are design/technology dependent.

SLIDE 14

Summary

  • GBA-PBA accuracy gap is a (growing) problem

– Signoff and ECO are impacted

  • Possible solution: multiple-signal propagation with clustering-based merging

– Experimental results are encouraging.
– Ripe for heuristics.
– Ripe for machine learning.

SLIDE 15

Summary

Thank you!
