From Recovering Time to Timing Recovery: Some Challenges for the TAU Community
Andrew B. Kahng
Depts. of CSE and ECE, UC San Diego
abk@ucsd.edu
http://vlsicad.ucsd.edu/~abk


SLIDE 1

From Recovering Time to Timing Recovery: Some Challenges for the TAU Community

Andrew B. Kahng

  • Depts. of CSE and ECE, UC San Diego
  • abk@ucsd.edu
  • http://vlsicad.ucsd.edu/~abk

SLIDE 2

TAU-2016 Keynote: “In Search of Lost Time”

  • “Recovering Time”: machine learning, optimization, margin reduction, …

SLIDE 3

Agenda

  • Motivations

SLIDE 4

Design Crises: Cost, Expertise, Unpredictability

  • Quality: also not scaling
  • Design Capability Gap
  • Available density: 2x/node
  • Realizable density: 1.6x/node
  • Figure: UCSD / 2013 ITRS
  • Design cost: not scaling
  • Design, process roadmaps not coupled
  • Figure: Andreas Olofsson, DARPA, ISPD-2018 keynote

SLIDE 5

Design is Too Difficult !

  • Tools and flows have steadily increased in complexity
  • Modern P&R tool: 10000+ commands/options
  • Hard to design with latest tools in latest technologies
  • Even harder to predict quality, schedule
  • Expert users required
  • Increased cost and risk not good for industry !
  • Still have “CAD” mindset more than “DA” mindset
  • Again: assumes expert users

How do we escape this “local minimum” ?

SLIDE 6

IDEA: No-Humans, 24-Hours

  • A. Olofsson, DARPA, ISPD-2018 keynote

  • Part of DARPA Electronics Resurgence Initiative
  • Traditional focus: ultimate quality
  • New focus = ultimate ease of use
  • No humans, 24-hour TAT = “equivalent scaling”
  • Overarching goal: designer access to silicon
SLIDE 7

DARPA IDEA and POSH Programs, 2018-2022

https://vlsicad.ucsd.edu/NEWS18/dac_v5_DISTAR.pdf

SLIDE 8

theopenroadproject.org

SLIDE 9

OpenROAD: A New Design Paradigm

Quality → Schedule → Cost Mindsets

  • Achieve predictability from the user’s POV
  • Use cloud/parallel to recover solution quality
  • Focus on reducing time and effort = schedule, cost

Machine Learning is CENTRAL to this

24 hours, no humans – no PPA loss

Design complexity handled via:
  • Extreme partitioning
  • Parallel optimization
  • Machine learning of tools, flows
  • Restricted layout

SLIDE 10

The OpenROAD Project

  • Initial target: digital IC flow “RTL to GDS”
  • Open source
  • No-human-in-loop
  • Limited “knobs”, restricted field of use
  • Must replace intelligent humans (partition, floorplan, …)
SLIDE 11

Agenda

  • Motivations
  • OpenROAD + Initial Target

SLIDE 12

Initial Target: RTL-to-GDS Layout Generation

Flow: Verilog + .lib, .sdc, .lef → Logic Synthesis → Floorplan/PDN → Placement → Clock Tree Synthesis → Global and Detailed Routing → Layout Finishing → GDSII

  • Inputs: .v, .sdc, .lib, .lef
  • .def, .spef in point tools
  • config files required
  • pre-characterizations required
  • Outputs: post-route .def, timing/power estimates

  • V1.0 release: June 2020
SLIDE 13

Placement
https://github.com/abk-openroad/RePlAce

  • RePlAce features
  • Timing-driven (OpenSTA timer integrated)
  • Mixed-size (macros + cells)
  • Electrostatics analogy in analytic placement

  • RePlAce used in:
  • Physical synthesis
  • Floorplanning
  • Clock tree synthesis
  • Traditional standard-cell placement
  • BSD-3 License

Placement: .def from FP/PDN (+ .v, .sdc, .lef, .lib) → placed .def

SLIDE 14

RePlAce: Routability-Driven Placement

  • Global routing during routability-driven global placement

Routability-driven loop

SLIDE 15

  • OpenSTA: open-sourced static timing analysis tool
  • Developer: James Cherry (Parallax Software)
  • Tested with ASAP7, GF14, TSMC16, ST28, etc.
  • GPLv3 license

Static Timing Analysis
https://github.com/abk-openroad/OpenSTA

SLIDE 16

aes_cipher_top (28nm, 12T, clkp=1000ps)

                      WNS (ps)   TNS (ps)   #viol.
  Signoff STA              61        289        7
  OpenSTA (Arnoldi)        57        314        9

Figures: Reg-to-Reg and Reg-to-Out/In-to-Reg slack correlations; WNS, TNS at 28nm.
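The WNS / TNS / #viol. columns in the tables above follow directly from per-endpoint slacks. A minimal sketch with made-up slack values (not the aes_cipher_top data):

```python
# Illustrative computation of WNS, TNS, and violation count from endpoint
# slacks (in ps). Negative slack = timing violation.
def summarize_slacks(slacks_ps):
    """Return (WNS, TNS, #violating endpoints) for a list of slacks."""
    violations = [s for s in slacks_ps if s < 0]
    wns = min(slacks_ps) if slacks_ps else 0.0   # worst negative slack
    tns = sum(violations)                        # total negative slack
    return wns, tns, len(violations)

slacks = [120.0, -61.0, -95.0, 30.0, -133.0]
wns, tns, nviol = summarize_slacks(slacks)
print(wns, tns, nviol)
```

Tables in the deck report WNS/TNS as magnitudes; the sketch keeps the conventional negative sign.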

SLIDE 17

Coyote (16nm, 9T, clkp=2000ps)

                 WNS (ns)   TNS (ns)    #viol.
  Signoff STA       0.660   1758.004     8096
  OpenSTA           0.603   1219.239     6926

Figure: slack, WNS, TNS correlation at 16nm.

SLIDE 18

Challenges for the TAU Community

  • #1. Help improve open-source STA engine
  • In particular: OpenSTA
  • Delay calculation, SI analysis, advanced timing models, MCMM, …
  • Priorities = ?
  • Will revisit:

                 WNS (ns)   TNS (ns)    #viol.
  Signoff STA       0.660   1758.004     8096
  OpenSTA           0.603   1219.239     6926

SLIDE 19

The OpenROAD Project

  • Initial target: digital IC flow “RTL to GDS”
  • Open source
  • No-human-in-loop
  • Limited “knobs”, restricted field of use
  • Must replace intelligent humans (partition, floorplan, …)
SLIDE 20

Agenda

  • Motivations
  • OpenROAD + Initial Target
  • Machine Learning

SLIDE 21

ML in IC Design: Not Like Chess or Cat Pics

  • Getting to self-driving IC design: not so obvious
  • Do recent ML successes transfer well?
  • 3-week SP&R&Opt run is NOT like playing chess!
  • Design lives in a {servers, licenses, schedule} box
  • Distributions of outcomes matter → cloud, parallel
  • A “stack of models” is mandatory: predictions of downstream outcomes are also optimization objectives

  • Still uncharted road to self-driving tools and flows
  • How do we overcome “small, expensive data” challenges?
  • Standards: learning comes from {design + tool + technology}, all of which are highly proprietary

  • Need mechanisms for IP-preserving sharing of data and models
SLIDE 22

4 Stages of ML to Recover Time, Effort

  • 1. Mechanization and Automation
  • 2. Orchestration of Search and Optimization
  • 3. Pruning via Predictors and Models
  • 4. From Reinforcement Learning through Intelligence

Huge space of tool, command, option trajectories through design flow

SLIDE 23

Stage 3. Modeling and Prediction

  • Prediction of tool- and design-specific outcomes over longer and longer subflows
  • Wiggling of longer and longer ropes
  • Enables pruning and termination → avoid wasted design resources
  • Simple way to think about it: “identify doomed X”
  • Doomed floorplan, Opt run, DRoute run, …
  • Allocate resources elsewhere
  • Better outcome within given resource budget
  • Complementary dream: new heuristics and tools that are inherently more predictable and modelable → lessen chaos
  • Ensembles might be modeled/predicted
  • Prediction requirement might be relaxed: “get user into a ballpark”?

SLIDE 24

Generic Need: Predicting Doomed Runs

  • NOTE: “doomed” is often w.r.t. timing, or due to fear of timing!!!
  • Picture: progressions of #DR violations in commercial router
  • Simple approach: track and project metrics as time series
  • Can use Markov decision process (MDP): “GO” vs. “STOP” strategy card to terminate “doomed runs” early
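The simple track-and-project approach above can be sketched in a few lines: fit a trend to the violation-count time series and STOP when the projection at the iteration budget stays above an acceptable level. The budget, threshold, and violation histories below are hypothetical illustrations, not data from the talk:

```python
# Sketch: project the #DR-violations trend to decide GO vs. STOP early.
# Thresholds and iteration budget are made-up illustrative values.
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def go_or_stop(viol_history, budget_iters, max_final_viol=0):
    """GO if the projected violation count at the budget is acceptable."""
    xs = list(range(len(viol_history)))
    slope, intercept = fit_line(xs, viol_history)
    projected = slope * (budget_iters - 1) + intercept
    return "GO" if projected <= max_final_viol else "STOP"

healthy = [900, 500, 260, 120, 50]      # converging run
doomed  = [900, 850, 830, 820, 815]     # plateauing run
print(go_or_stop(healthy, budget_iters=12),
      go_or_stop(doomed, budget_iters=12))
```

A real MDP “strategy card” would replace the linear projection with learned state-transition statistics; the decision interface stays the same.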

SLIDE 25

Obtaining Golden From Non-Golden
ML shifts the accuracy-cost tradeoff curve (for free)!

SLIDE 26

(Old) Example: ML-based Timer Correlation

Flow: artificial circuits → train / validate / test → models (path slack, setup time, stage, cell, wire delays). If error > threshold on real designs, outlier data points drive incremental model updates; the one-time model then applies to new designs.

Figure: T1 vs. T2 path slack scatter, before vs. after ML modeling: divergence reduced from 123 ps to 31 ps (~4× reduction). [DATE14, SLIP15]
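The one-time + incremental loop above can be sketched as: fit a correlation model mapping timer T1 slack to signoff T2 slack, flag points whose error exceeds a threshold, and refit with those outliers included. The slack pairs and threshold are invented for illustration:

```python
# Sketch of the one-time + incremental timer-correlation loop.
# Data values and the 0.05 ns threshold are hypothetical.
def fit(points):
    """Least-squares y = a*x + b over (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx
    return a, my - a * mx

def correlate(train, fresh, threshold):
    a, b = fit(train)                      # ONE-TIME: artificial circuits
    outliers = [(x, y) for x, y in fresh
                if abs((a * x + b) - y) > threshold]
    if outliers:                           # INCREMENTAL: refit with outliers
        a, b = fit(train + outliers)
    return a, b, outliers

# (T1 slack, T2 slack) pairs in ns: training circuits, then real paths.
train = [(0.1, 0.12), (0.2, 0.22), (0.3, 0.32), (0.4, 0.42)]
fresh = [(0.5, 0.52), (0.6, 0.75)]
a, b, outliers = correlate(train, fresh, threshold=0.05)
```

The published work uses richer models than a line fit; the loop structure (train once, retrain on outliers) is the point here.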

SLIDE 27

Lately: Predicting PBA from GBA

  • PBA (Path-Based Analysis) is less pessimistic than GBA (Graph-Based Analysis)
  • But PBA can have MUCH more expensive runtime!
  • ML task: predict PBA timing from GBA timing
  • → Improved quality of results in P&R, optimization
  • → Less-expensive timing analysis usable earlier in flow

Figure: GBA mode vs. PBA mode. [ICCD18]

SLIDE 28

Bigram- and CART-based Modeling

  • Reduced GBA pessimism vs. PBA
  • Bigram-based path modeling
  • Classification and regression tree (CART) approach
  • Model based on 13 bigram parameters

https://vlsicad.ucsd.edu/Publications/Conferences/361/c361.pdf
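To make the CART idea concrete, here is a one-split regression stump: exhaustively search (feature, threshold) pairs to best separate the GBA-PBA pessimism gap. The two features and all numbers are invented stand-ins for the paper's 13 bigram parameters:

```python
# Minimal regression-tree (CART) flavor: pick the one split minimizing
# total squared error of the target. Features/targets are hypothetical.
def best_stump(X, y):
    """Return (feature, threshold, left mean, right mean) of the best split."""
    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best = None
    for feat in range(len(X[0])):
        for t in sorted({row[feat] for row in X}):
            left  = [yi for row, yi in zip(X, y) if row[feat] <= t]
            right = [yi for row, yi in zip(X, y) if row[feat] > t]
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, feat, t,
                        sum(left) / len(left) if left else 0.0,
                        sum(right) / len(right) if right else 0.0)
    return best[1:]

# Rows: [stages along path, a bigram-style score]; y: GBA pessimism (ps).
X = [[4, 0.1], [5, 0.2], [9, 0.8], [10, 0.9]]
y = [5.0, 6.0, 20.0, 22.0]
f, t, lmean, rmean = best_stump(X, y)
```

A full CART recurses on each side of the split; one level is enough to show the mechanism.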

SLIDE 29

Lately: Reduce #Corners in STA and Opt

  • Want all the benefits of STA at N corners, but want to pay for analysis at only M << N corners
  • “Missing corner prediction” (matrix completion) saves runtime, licenses
  • “Primary corners” methodology → errors caught at signoff cause iteration

[DATE19]

SLIDE 30

“Missing Corners” = Matrix Completion

  • STA at relatively few known corners → reasonably accurate prediction of timing at all unknown corners
  • PCA: low-dimensional modeling problem
  • Predicting missing delay values = matrix completion problem
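The matrix-completion framing can be sketched with rank-1 alternating least squares: rows are timing paths, columns are corners, and STA is run only at observed entries. The rank-1 model and toy delay values are illustrative assumptions; the published approach uses richer low-dimensional models:

```python
# Sketch of "missing corners" as rank-1 matrix completion via alternating
# least squares. M[i][j] is a delay, or None if that corner was skipped.
def complete_rank1(M, iters=50):
    rows, cols = len(M), len(M[0])
    u = [1.0] * rows   # per-path factor
    v = [1.0] * cols   # per-corner factor
    for _ in range(iters):
        for i in range(rows):            # solve u with v fixed (observed only)
            num = sum(M[i][j] * v[j] for j in range(cols) if M[i][j] is not None)
            den = sum(v[j] ** 2 for j in range(cols) if M[i][j] is not None)
            u[i] = num / den
        for j in range(cols):            # solve v with u fixed (observed only)
            num = sum(M[i][j] * u[i] for i in range(rows) if M[i][j] is not None)
            den = sum(u[i] ** 2 for i in range(rows) if M[i][j] is not None)
            v[j] = num / den
    return [[u[i] * v[j] for j in range(cols)] for i in range(rows)]

# Toy rank-1 data: delay(path, corner) = path_factor * corner_factor.
M = [[10.0, 20.0, 30.0, 40.0],
     [20.0, 40.0, 60.0, 80.0],
     [30.0, 60.0, 90.0, None]]          # corner never analyzed for path 2
X = complete_rank1(M)                   # X[2][3] predicts the missing corner
```

Because the toy data is exactly rank-1, the missing entry converges to the value consistent with the other corners.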

SLIDE 31

Recent: Strong Design-Independent Models

Figure: error vs. #corners for a model trained on initial artificial testcases vs. one trained on richer artificial testcases, evaluated on megaboom (990K instances, 350K FF): 10× improvement!!

SLIDE 32

Recent: “ML-LEAK” (leakage recovery predictor)

  • ML to predict how much leakage will be recovered if user runs {Tweaker, Tempus ECO, PTSI ECO, homegrown script, …}
  • Gives expectation of post-recovery power
  • Beneficial to methodology team when trying out various DOEs
  • Saves time for implementation team: skip leakage recovery if it won’t help
  • Blended model of design- and instance-level predictions gives best results

Figure: actual vs. predicted percentage change in leakage power after recovery; for the design shown, actual recovery was 0.076% vs. a 1% model prediction.
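The “blended model” bullet can be illustrated as a weighted combination of a design-level estimate and the mean of instance-level estimates. The 0.6/0.4 blend weight, the 5% skip cutoff, and all percentages are hypothetical, not from the talk:

```python
# Hedged sketch of a blended leakage-recovery predictor: combine a
# design-level estimate with aggregated instance-level estimates.
# The blend weight and cutoff below are made-up illustrative values.
def predict_recovery(design_level_pct, instance_level_pcts, w=0.6):
    """Predicted fractional leakage recovered if an ECO recovery tool runs."""
    instance_mean = sum(instance_level_pcts) / len(instance_level_pcts)
    return w * design_level_pct + (1 - w) * instance_mean

pred = predict_recovery(0.10, [0.02, 0.04, 0.30])
run_recovery = pred >= 0.05   # skip the recovery step if benefit is too small
```

A trained blend would learn the weight (or a gating function) from data rather than fix it.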

SLIDE 33

Recent: STA Modeling → Project Optimization

  • TAU16 keynote: “pack tapeouts into design center” (ACM TODAES ’17)
  • Today: “pack signoff STA runs into compute”
  • Peak memory mismatch: job dies, tapeout schedule compromised
  • Runtimes poorly estimated: tapeout schedule compromised
  • Poor packing: tapeout schedule compromised
  • Two optimizations:
  • ML to predict runtime, memory as function of resources (server, cores, cache, RAM, contentiousness, timer knobs, design, corner, …)
  • Scheduling/packing optimization (robust, incremental, …)
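The packing side of the two optimizations above can be sketched with first-fit-decreasing bin packing: given ML-predicted peak memory per signoff STA run, assign runs to servers so no job exceeds server RAM. The job names, memory figures, and server size are illustrative assumptions:

```python
# Sketch: pack signoff STA runs onto servers using predicted peak memory,
# first-fit-decreasing, so no run dies from a memory mismatch.
def pack_jobs(pred_mem_gb, server_ram_gb):
    """Return a list of servers, each a list of (job, predicted GB)."""
    servers = []
    for job, mem in sorted(pred_mem_gb.items(), key=lambda kv: -kv[1]):
        for s in servers:                          # first server that fits
            if sum(m for _, m in s) + mem <= server_ram_gb:
                s.append((job, mem))
                break
        else:                                      # no fit: open a new server
            servers.append([(job, mem)])
    return servers

# Hypothetical predicted peak memory per corner run (GB).
jobs = {"corner_ss": 180.0, "corner_ff": 120.0,
        "corner_tt": 60.0, "corner_lt": 40.0}
servers = pack_jobs(jobs, server_ram_gb=256.0)
```

A robust scheduler would also pad predictions by their error bars and repack incrementally as runs finish; this shows only the core packing step.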
SLIDE 34

Runtime, Memory Predictors: Not Trivial (!)

  • Extensive DOEs ongoing (e.g., tool phases, contentiousness, run-to-run variation, …); interest/guidance from industry

Figures: runtime and memory distributions.

SLIDE 35

“Challenges for the TAU Community”

  • #2. “TAU in service to …” a world of needed models
  • Timing analysis is a means to an end!
  • One stage’s model is another stage’s optimization objective
  • Compact LLE derates: diffusion breaks, gate cuts, coloring/mask order, … [ASP-DAC19] SDB-DDB: https://vlsicad.ucsd.edu/Publications/Conferences/366/c366.pdf
  • Compact dynamic IR drop impacts [DATE19 M1 power stapling]
  • #3. TAU introspection
  • “Features that ML models would want to use, provided by domain experts”
  • Optimization trajectories, timing graph topology, switching windows
  • (+ when layout info/costs available: congestion, legalization, etc.)
  • Contexts: leakage reduction, DVD fix, … (during next runs of block)
  • Customers want more: “Timing opt tools typically stop and report reasons why they can’t make further fixes or optimizations. It would be helpful if tools can continue to try out other options and present what-if results, i.e., automatically explore solution space w.r.t. power, performance, runtime (e.g., cell displacement and additional ECO cycles).”

SLIDE 36

Agenda

  • Motivations
  • Initial Target
  • Machine Learning
  • Infrastructure for ML: METRICS

SLIDE 37

ML in IC Design Requires Infrastructure!

  • Support for ML in IC design
  • Standards for model encapsulation, model application, and IP preservation when models are shared
  • Standard ML platform for EDA modeling
  • Design metrics collection, (design-specific) modeling, prediction of tool/flow outcomes
  • This recalls “METRICS”: http://vlsicad.ucsd.edu/GSRC/metrics
  • Datasets to support ML
  • Real designs, artificial designs and “eyecharts”
  • Shared training data, e.g., analysis correlation, post-route DRV prediction, sizer move trajectories and outcomes, …
  • Challenges and incentives: “Kaggle for ML in IC design”

SLIDE 38

“METRICS” [DAC00, ISQED01]

  • METRICS (1999; DAC00, ISQED01): “Measure to Improve”
  • Goal #1: Predict outcome
  • Goal #2: Find sweet spot (field of use) of tool, flow
  • Goal #3: Dial in design-specific tool, flow knobs

http://vlsicad.ucsd.edu/GSRC/metrics

SLIDE 39

Original METRICS Architecture

  • Instrumentation of design tools:
  • Wrapper scripts to extract data from outputs and logfiles
  • Callable API codes that allow direct interaction from within the design tools
  • METRICS server: central data collection (Oracle8i)
  • Data mining process: analyzes existing data to improve existing design flow (CUBIST, etc.)

SLIDE 40

A Proposed METRICS 2.0 Architecture

White paper, WOSET-2018 woset.org

SLIDE 41

METRICS 2.0 Dictionary → Standard Naming

  • JSON & MongoDB enable learning across the flow through cross-referencing
  • Currently: sharing draft privately
  • https://github.com/The-OpenROAD-Project/METRICS-2.0
  • Collaboration welcome! → email to abk@ucsd.edu

Example: tool1 emits {"net_name":"n123", "length":45}, tool2 emits {"net_name":"n123", "parasitics":5}; both land in MongoDB, joined by the standard key.
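The cross-referencing the dictionary enables can be shown in a few lines: two tools emit records under standard names, and the shared "net_name" key joins them. A plain dict stands in for MongoDB here:

```python
import json

# Sketch of METRICS-style cross-referencing: records from two tools,
# keyed by the standard "net_name" field, merge into one document.
# (A dict stands in for the MongoDB collection.)
tool1_out = '{"net_name": "n123", "length": 45}'
tool2_out = '{"net_name": "n123", "parasitics": 5}'

db = {}
for record in (json.loads(tool1_out), json.loads(tool2_out)):
    db.setdefault(record["net_name"], {}).update(record)

# db["n123"] now holds both tools' metrics for the same net.
```

This is exactly why standard naming matters: without an agreed key and field names, downstream learning cannot join per-tool data.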

SLIDE 42

METRICS 2.0++ (Grid, Federated, …)

  • METRICS2.0 can open entirely new worlds
  • METRICS + Grid Computing
  • Privacy-preserving Federated ML
SLIDE 43

Idea: Federated Learning (with METRICS) !!!

  • Centralized
  • Has storage and computation needs on server
  • Exposure of METRICS to public domain
  • Federated
  • Light server; distributed, spare-cycle-aware training
  • Data remains private

Figure: client/server topologies, centralized vs. federated.
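The federated alternative above can be sketched with FedAvg-style aggregation: each site trains on its private METRICS data and shares only model weights, which the server averages weighted by sample count. The weight vectors and counts are toy values:

```python
# Minimal FedAvg-style sketch: clients share only model weights (never raw
# METRICS data); the server averages them, weighted by local sample count.
def fed_avg(client_updates):
    """client_updates: list of (weight list, num local samples)."""
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [sum(w[k] * n for w, n in client_updates) / total
            for k in range(dim)]

# Two design houses contribute updates without exposing proprietary data.
global_w = fed_avg([([1.0, 2.0], 100), ([3.0, 6.0], 300)])
```

Real federated training iterates this broadcast/train/average loop and adds privacy mechanisms (e.g., secure aggregation) on top.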

SLIDE 44

“Challenges for the TAU Community”

  • #4. Contribute to METRICS 2.0 names, semantics in timing and optimization space (see #3)
  • #5. Contribute to development of standard methods to generate data for machine learning in/around IC design tools: artificial data, eyechart data, mutant data, obfuscated data, …
  • E.g., with provable privacy-preserving attributes, industry concurrence, …
  • #6. Get out of comfort zone (= out of silo)
  • Sorry, but incremental/ECO for leakage, IR is still in comfort zone
  • Must understand layout (detailed placement, especially) better
  • P&R tool should really NOT say this has zero violations:

                 WNS (ns)   TNS (ns)    #viol.
  Signoff STA       0.660   1758.004     8096
  OpenSTA           0.603   1219.239     6926

SLIDE 45

Agenda

  • Motivations
  • Initial Target
  • Machine Learning
  • Infrastructure for ML: METRICS
  • Conclusions

Remember: (1) Timing is now central to everything; (2) where there’s smoke, there’s fire (ML)

SLIDE 46

  • Two sides of same coin
  • Slack, margin, schedule all tied together
  • What’s changed over the years?
  • Machine learning “inside and outside” (to reduce errors and margins, avoid runs, reduce iterations, …) → on the way
  • Open-source → on the way
  • Stronger interactions (spatial, topological, temporal contexts) demand “going outside comfort zone” in very broad sense
  • Challenges for the TAU Community
  • Improve open-source STA engine
  • “TAU in service to X” models: LLE derates, dynIR impact, …
  • TAU introspection (features for ML modeling) (+ what-ifs)
  • Contribute to METRICS 2.0 names in timing, opt spaces
  • Standardized data generation (artificial, obfuscated, …)
  • Get out of comfort zone!

(always happy to discuss, collaborate… )

“From Recovering Time to Timing Recovery”

SLIDE 47

THANK YOU !