[PPT] - In Search of Lost Time Andrew B. Kahng UCSD CSE and ECE Departments PowerPoint Presentation

SLIDE 1

A. B. Kahng, TAU 2016

In Search of Lost Time

Andrew B. Kahng
UCSD CSE and ECE Departments
abk@ucsd.edu
http://vlsicad.ucsd.edu
TAU-2016 Friday keynote, Santa Rosa

SLIDE 2

In Search of Lost Time

SLIDE 3

What is Time? How do we lose Time? How do we regain Time?

SLIDE 4

What is Time?

SLIDE 5

What is Time?

Time = Schedule
Moore’s Law: 1% = 1 week
Time = Things convertible to time
mV, σ, µW, nm, $, µm²

[Figure: Margin, Product Quality, and Model and Analysis Accuracy, each convertible to Time. Units shown: nm, mV, {skew, jitter, OCV…}; power, area, fmax, Iddq, …; rms, %, σ]

SLIDE 6

What is Time?

Time = Schedule
Moore’s Law: 1% = 1 week
Time = Things convertible to time
mV, σ, µW, nm, $, µm²
Time = time itself
Flavors: slack, trans, xd, d-trans, …

SLIDE 7

What is Time?

Time = Schedule
Moore’s Law: 1% = 1 week
Time = Things convertible to time
mV, σ, µW, nm, $, µm²
Time = time itself
Flavors: slack, trans, xd, d-trans, …

Time = Money

SLIDE 8

What is Time? How do we lose Time?

SLIDE 9

How Do We Lose Time?

It’s tough not to …

SLIDE 10

Context I: Race to End of Roadmap

Paper model to v1.0 SPICE model: ~12 months @N10
Many near-term “red bricks”: ArF, Cu, low-k, …
Foundry-fabless dynamics: who gives up margin ?
Time constants limit design-manufacturing co-evolution

(Years) Tech development, app market definition, architecture/front-end design
(Months) RTL-to-GDS implementation, reliability qualification
(Weeks) Fab latency, cycles of yield learning, design re-spins, mask flows
(Days) Process tweaks, design ECOs
Mismatches among these time constants
Model-hardware miscorrelation
Model guardbanding
Faster node enablement is challenging !!

SLIDE 11

Context II: Low-Power Grand Challenge

Low power = High complexity

multiple supply voltages, power and clock gating, DVFS, MTCMOS, multi-Lgate, …

Increased timing closure burden

Mobility, Big data, Green datacenters, Cloud, Internet of Things

SLIDE 12

How Do We Lose Time?

It’s tough not to …
The margining imperative …

SLIDE 13

Nobody Wants to Own the Scrap

Timing model not 100% accurate
Add margin to cover unknowns

SLIDE 14

Stacks of Margins

Design margin = stack of layers of conservatism

[Figure: performance PDF with signoff margins stacked for process, temperature, voltage (nominal Vdd, static IR drop, power grid IR gradient, dynamic IR), and reliability (HCI/NBTI); source: Wu 08]

SLIDE 15

Consequences

Diminishing ROI from next node
Typical: Moore’s Law-like scaling
Worst-case: scales, but worse ROI
Signoff with excessive margin: potential gain wiped out

SLIDE 16

Time: Lose Some, Win Some

[Figure: timeline of technology nodes (90nm, 65nm, 45/40nm, 28nm, 20nm, 16/14nm, 10nm, ≤7nm) annotated with timing concerns and remedies: BTI, temp inversion, noise, MCMM, maxtrans, EM, AOCV / POCV, PBA, fixed-margin spec, multi-patterning, cell-POCV, MOL and BEOL R ↑, dynamic IR, fill effects, layout rules, BEOL and MOL variations, signoff criteria with AVS, SOC complexity, LVF, MIS, phys-aware timing ECO, min implant]

SLIDE 17

How Do We Lose Time?

It’s tough not to …
The margining imperative …
We give it away
Intentionally

c2q-setup-hold surface

SLIDE 18

How Do We Lose Time?

It’s tough not to …
The margining imperative …
We give it away
Intentionally

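Homogeneous BEOL corners (e.g., Cworst)

[Figure: interconnect stack with M1 and M2. Capacitance (C) distributions of layer M1 and layer M2, each with its own 3σ corner, compared against the 3σ of the combined M1 C + M2 C distribution; the gap to the homogeneous Cw corner is pessimism]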

SLIDE 19

How Do We Lose Time?

It’s tough not to …
The margining imperative …
We give it away
Intentionally
By miscorrelating

[Figure: T2 path slack (ns) vs. T1 path slack (ns); 123 ps divergence]

SLIDE 20

How Do We Lose Time?

It’s tough not to …
The margining imperative …
We give it away
Intentionally
By miscorrelating
By wasting it

2013 Contest NDA: Without [EDA vendor’s] prior approval, I shall not write or publish any article or presentation that references [EDA vendor’s tool name].

SLIDE 21

How Do We Lose Time?

It’s tough not to …
The margining imperative …
We give it away
Intentionally
By miscorrelating
By wasting it

“We don’t have enough time to do it right, but we have enough time to do it wrong”

SLIDE 22

Not Enough Time To Do It Right…

Earth to Mars

Option #1: go with latest available technology = 0.01 AU/year speed
Option #2: spend the next ten years to come up with a spaceship = 0.1 AU/year speed

Option #1 = 0.5 / 0.01 = 50 years
Option #2 = 0.5 / 0.1 + 10 years = 15 years (Option #2 << Option #1)

Year: 2016, 2026, 2027, 2031

Need a faster ship
Issue: investment for the long haul

Option #1 vs. Option #2:
Corner-based STA vs. Statistical STA
Planar vs. 3D
Homogeneous CMOS vs. Heterogeneous CMOS
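
The Earth-to-Mars arithmetic can be checked with a short Python sketch. The 0.5 AU distance, both speeds, and the ten-year delay are the slide's numbers; the function names are illustrative, and reading 2027 on the timeline as the year Option #2 overtakes Option #1 is an interpretation rather than something the slide states.

```python
DISTANCE_AU = 0.5   # Earth-to-Mars distance used on the slide
START_YEAR = 2016

def arrival_year(speed_au_per_year, delay_years=0.0):
    """Arrival year for a ship that departs delay_years after START_YEAR."""
    return START_YEAR + delay_years + DISTANCE_AU / speed_au_per_year

def overtake_year(slow, fast, delay_years):
    """Year when the late, fast ship catches the early, slow ship.

    Solves slow * t = fast * (t - delay_years) for t (years after START_YEAR).
    """
    t = fast * delay_years / (fast - slow)
    return START_YEAR + t

option1 = arrival_year(0.01)                  # leave now, slow ship
option2 = arrival_year(0.1, delay_years=10)   # build for ten years, fast ship
catch_up = overtake_year(0.01, 0.1, delay_years=10)
print(option1, option2, catch_up)
```

Under these numbers Option #1 arrives in 2066 and Option #2 in 2031, and the fast ship passes the slow one a little over a year after it launches.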

SLIDE 23

What is Time? How do we lose Time? How do we regain Time?

SLIDE 24

How Do We Regain Time?

Learn !!! (machine learning, Big Data mindset)

SLIDE 25

Timer Miscorrelation

T1 and T2: commercial signoff STA tools with same inputs (.v, .spef, .lib)
123ps slack divergence ⇒ 20% performance difference ⇒ one node of Moore’s Law scaling

[Figure: T2 path slack (ns) vs. T1 path slack (ns); 123 ps divergence]

[DATE14]

SLIDE 26

Erase Miscorrelation with Machine Learning!

Can also erase P&R vs. signoff STA miscorrelation

[Flow diagram: ONE-TIME: artificial circuits are used to train, validate, and test MODELS (path slack, setup time, stage, cell, wire delays). INCREMENTAL: models are applied to new designs and real designs; if error > threshold, outliers (data points) are fed back]

[Figure: T2 path slack (ns) vs. T1 path slack (ns). BEFORE: 123 ps divergence. AFTER ML modeling: 31 ps (~4× reduction)]

[DATE14]
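
As a rough illustration of learning one timer's slack from another's, the Python sketch below fits a one-variable least-squares line. This is a stand-in chosen for brevity, not the modeling used in [DATE14]; the slack values are synthetic, and every name in the block is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def max_abs_error(predicted, actual):
    return max(abs(p - a) for p, a in zip(predicted, actual))

# Synthetic path slacks in ns (assumed data, not taken from the slides).
t1_slack = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
t2_slack = [0.064, 0.146, 0.244, 0.326, 0.424, 0.506]

# BEFORE: use T1 slack directly as the estimate of T2 slack.
before = max_abs_error(t1_slack, t2_slack)

# AFTER: map T1 slack through the fitted model first.
slope, intercept = fit_line(t1_slack, t2_slack)
corrected = [slope * x + intercept for x in t1_slack]
after = max_abs_error(corrected, t2_slack)
print(before, after)
```

A real flow would hold out validation and test paths instead of scoring on the training data, and would use richer features than slack alone.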

SLIDE 27

Harder: Non-SI to SI Calibration

Calibration: recipe to convert non-SI timing report to SI timing report
Inputs: .v, .db, .lib, .spef, .sdc, post-P&R database
Complex interplay of electrical, logic structure, and layout parameters
Black-box code in STA tools
Slack diverges by 81ps (clock period = 1.0ns)
~4 stages of logic at 28nm FDSOI

[Figure: SI path slack (ns) ($$$) vs. non-SI path slack (ns) ($); 81ps divergence] [SLIP15]

SLIDE 28

“SI for Free” with Machine Learning

Machine learning of incremental transition time, delay due to SI
Accurate SI-aware path delays, slacks

Flow steps:
Timing Reports in SI Mode, Timing Reports in Non-SI Mode
Create Training, Validation and Testing Sets
ANN (2 Hidden Layers, 5-Fold Cross-Validation)
SVM (RBF Kernel, 5-Fold Cross-Validation)
HSM (Weighted Predictions from ANN and SVM)
Save Model and Exit

[Figure: predicted path delay (ps) vs. actual path delay (ps). Worst absolute error = 8.2ps; average absolute error = 1.7ps]
[Figure: SI path slack (ns) ($$$) vs. non-SI path slack (ns) ($). BEFORE: 81ps divergence. AFTER: ML modeling]

[SLIP15]
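
The "weighted predictions" step can be pictured as blending two predictors with a single weight. The Python sketch below picks that weight in closed form by least squares; this is an assumed illustration, not the weighting scheme of [SLIP15], and the delay values and names are hypothetical.

```python
def blend_weight(pred_a, pred_b, actual):
    """Weight w minimizing the squared error of w * a + (1 - w) * b."""
    num = sum((a - b) * (y - b) for a, b, y in zip(pred_a, pred_b, actual))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    return num / den

def blend(pred_a, pred_b, w):
    return [w * a + (1.0 - w) * b for a, b in zip(pred_a, pred_b)]

def sse(predicted, actual):
    """Sum of squared errors."""
    return sum((p - y) ** 2 for p, y in zip(predicted, actual))

# Hypothetical path delays in ps: model A overestimates, model B underestimates.
actual = [100.0, 120.0, 140.0, 160.0]
pred_a = [104.0, 123.0, 145.0, 163.0]
pred_b = [97.0, 118.0, 136.0, 158.0]

w = blend_weight(pred_a, pred_b, actual)
hybrid = blend(pred_a, pred_b, w)
print(w, sse(pred_a, actual), sse(pred_b, actual), sse(hybrid, actual))
```

In practice the weight would be chosen on a validation set rather than on the data used to score it.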

SLIDE 29

Similar: Closing Multiphysics Analysis Loops

[Flow diagram. Inputs: benchmark RTL, sim vectors, tech files, signoff criteria, corners. Blocks: Functional Sim, P&R + Optimization, Power Analysis, Thermal Analysis, Timing/Noise, MTTF & Aging, Task Mapping/Migration/(DVFS). Data exchanged: sim results (dyn.), activity factor (static), power trace, temp map, IR drop map, slack, timing/glitches, AVS, reliability report]

[ASPDAC16]

SLIDE 30

Similar: Closing Multiphysics Analysis Loops

[Flow diagram. Inputs: benchmark RTL, sim vectors, tech files, signoff criteria, corners. Blocks: Functional Sim, P&R + Optimization, Power Analysis, Thermal Analysis, Timing/Noise, MTTF & Aging, Task Mapping/Migration/(DVFS). Data exchanged: sim results (dyn.), activity factor (static), power trace, temp map, IR drop map, slack, timing/glitches, AVS, reliability report. Loops highlighted: STA-IR loop, STA-Thermal loop, Workload-Thermal loop, STA-Reliability loop]

[ASPDAC16]

SLIDE 31

Multiphysics Analysis is Difficult to Predict

IR drop, thermal, reliability, crosstalk, etc.
Example: Can we predict “risk map” for embedded memories at floorplan stage?

[Figure: SRAM slack (ps) for SRAM #1 and SRAM #5; annotations of 25ps and 29ps]

[ASPDAC16]

SLIDE 32

Multiphysics Analysis is Difficult to Predict

IR drop, thermal, reliability, crosstalk, etc.
Example: Can we predict “risk map” for embedded memories at floorplan stage?

[Figure: SRAM slack (ps) vs. implementation index] [ASPDAC16]

SLIDE 33

Floorplan Pathfinding

Filter bad floorplans (e.g., embedded memory placements, power plans) comprehending downstream PD flow
Model f estimates combined effects of netlist, constraints, placement, CTS, routing, optimization, STA
y = Slack (w/, w/o IR)
x = netlist, constraints, floorplan parameters
y = f(x)
f = ???

[Flow diagram: gate netlist and constraints feed Floorplan, Powerplan; then Placement; Clock network synthesis; Routing; Extraction, Timing, Verification; Signoff. Output: Slack (w/, w/o IR). The modeling scope covers the costly iteration through extraction and timing] [ASPDAC16]

SLIDE 34

Floorplan Pathfinding Model

False negatives = 3%
Pessimistic predictions ⇒ floorplan change that is actually not required
False positives = 4%
Model incorrectly deems a floorplan to be good

Confusion matrix:
Predicted Pass, Actual Pass: 584 (true positives, tp)
Predicted Pass, Actual Fail: 42 (false positives, fp)
Predicted Fail, Actual Fail: 384 (true negatives, tn)
Predicted Fail, Actual Pass: 31 (false negatives, fn)

Positive slack data points: Precision: tp/(tp+fp) = 93.3%; Recall: tp/(tp+fn) = 95.0%
Negative slack data points: Precision: tn/(tn+fn) = 92.5%; Recall: tn/(tn+fp) = 90.1%
[ASPDAC16]
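
The reported percentages can be reproduced from the four confusion-matrix counts with a few lines of Python. The assignment of counts to tp, fp, tn, and fn is inferred from the percentages themselves; with that assignment, the negative-class figures come out as precision tn/(tn+fn) = 92.5% and recall tn/(tn+fp) = 90.1%.

```python
# Counts from the slide; the tp/fp/tn/fn assignment is an inference.
tp, fp, tn, fn = 584, 42, 384, 31
total = tp + fp + tn + fn

def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)

metrics = {
    "false_negative_rate": pct(fn, total),
    "false_positive_rate": pct(fp, total),
    "pass_precision": pct(tp, tp + fp),
    "pass_recall": pct(tp, tp + fn),
    "fail_precision": pct(tn, tn + fn),
    "fail_recall": pct(tn, tn + fp),
}
for name, value in metrics.items():
    print(f"{name}: {value}%")
```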

SLIDE 35

Related: Library Groups ⇒ New k-Factors ?

Library interpolation with each “physics” modeled as equivalent voltage delta (for example)
Voltage
Process variation
Temperature
Aging / reliability
Per-instance timing derating for signoff
In spirit of old “k-factors”, perhaps

[Figure: Derating(V1, P1, T1, A1), Derating(V2, P2, T2, A2), Derating(V3, P3, T3, A3) over voltage, process variation, temperature, aging / reliability]

SLIDE 36

How Do We Regain Time?

Learn !!! (machine learning, Big Data mindset)
Embrace the “era of optimization”: 1% = 1 week
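
One way to read "1% = 1 week" is as the time Moore's-Law-style scaling would take to deliver a 1% gain. The Python sketch below works that out; the doubling periods (24 and 18 months) are assumptions, not numbers from the slides.

```python
import math

def weeks_per_gain(doubling_weeks, gain=0.01):
    """Weeks of exponential scaling needed to achieve a fractional gain.

    Solves 2 ** (t / doubling_weeks) = 1 + gain for t.
    """
    return doubling_weeks * math.log2(1.0 + gain)

two_year = weeks_per_gain(104)        # assumed doubling every 24 months
eighteen_month = weeks_per_gain(78)   # assumed doubling every 18 months
print(round(two_year, 2), round(eighteen_month, 2))
```

Both assumptions land between one and one and a half weeks per 1%, which is consistent with the slide's rule of thumb.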

SLIDE 37

METRICS (1999): Measure to Improve

[ISQED01]

Goal #1: Predict outcome
Goal #2: Find sweet spot (field of use) of tool, flow
Goal #3: Dial in design-specific tool, flow knobs

SLIDE 38

Pure Optimization is a Big Lever

Project planning and management
Unforeseen events (late RTL bugs, timing ECO)
Resource co-constraints (e.g., 2 cores per EDA license, 3 concurrent tapeouts)

[Figure: schedule of activities A1 to A5 across three projects, with server usage vs. work weeks (20 to 42) compared against current servers and datacenter capacity]

“How to pack 14 tapeouts into my design center during 2H15?”
Schedule cost minimization (SCM)
Minimize overall project makespan subject to delay penalties, resource bounds, resource co-constraints, etc.

Resource cost minimization (RCM)
Minimize number of resources required across all projects

[DAC15 WIP]
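
To make the makespan objective concrete, the Python sketch below brute-forces a toy instance: two projects, each a chain of two activities, sharing a pool of two licenses. It enumerates priority orders and list-schedules each one, which is a simple stand-in for the MILP formulation, not the solver itself; the instance, durations, and names are all hypothetical.

```python
from itertools import permutations

# Hypothetical toy instance: project -> ordered activities (name, days, licenses).
PROJECTS = {
    "P1": [("place", 2, 1), ("route", 3, 2)],
    "P2": [("place", 2, 1), ("route", 3, 2)],
}
LICENSES = 2  # assumed size of the shared license pool

def earliest_start(usage, ready, days, need, cap):
    """First day >= ready on which `need` licenses are free for `days` days."""
    day = ready
    while any(usage.get(day + k, 0) + need > cap for k in range(days)):
        day += 1
    return day

def makespan(order, projects, cap):
    """List-schedule activities following `order` (a sequence of project names)."""
    usage = {}                              # day -> licenses in use
    next_act = {p: 0 for p in projects}     # next activity index per project
    ready = {p: 0 for p in projects}        # day the previous activity finishes
    for p in order:
        _, days, need = projects[p][next_act[p]]
        start = earliest_start(usage, ready[p], days, need, cap)
        for k in range(days):
            usage[start + k] = usage.get(start + k, 0) + need
        ready[p] = start + days
        next_act[p] += 1
    return max(ready.values())

def best_makespan(projects, cap):
    """Try every interleaving of the project chains and keep the shortest."""
    slots = [p for p, acts in projects.items() for _ in acts]
    return min(makespan(o, projects, cap) for o in set(permutations(slots)))

serial = sum(days for acts in PROJECTS.values() for _, days, _ in acts)
best = best_makespan(PROJECTS, LICENSES)
print(serial, best)
```

Running the projects back to back takes 10 days, while overlapping the two placement steps brings the makespan down to 8 days. Exhaustive enumeration only works at toy scale, which is why realistic instances are posed as MILPs.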

SLIDE 39

Example Solver Use Cases (from a design center of a world top-5 semi)

Schedule modification after late-breaking bug
Three projects, 11 activities/project (e.g., placement, routing, RCX, STA, etc.)
Five resource types (#cores, #memory, licenses for P&R, RCX, STA, tools)
Industry solution: Makespan of 41 days across all projects
SCM solution: Makespan of 34 days across all projects (1.4 weeks saved)

Datacenter resource allocation
24 projects, five activities (synthesis, P&R, RCX, STA, PV) per project
Forecast-based allocation for #servers in datacenter
Industry solution: Purchase 600 additional servers
SCM solution: Zero additional servers
Human resource allocation
Four large projects
Four types of human resources (synthesis, P&R, verification, STA)
RCM solution: ~$5.2M headcount cost savings for company
MILP solver at http://vlsicad.ucsd.edu/MILP/

SLIDE 40

DARPA’s CRAFT Program (2016-)

“Circuit Realization At Faster Timescales” (UCSD leads a team)
Goal: reduce SOC design time from 130 weeks to 30 weeks
“Iso-PAP” (Performance At Power) at 14/16nm and below

SLIDE 41

How Do We Regain Time?

Learn !!! (machine learning, Big Data mindset)
Embrace the “era of optimization”: 1% = 1 week
Take back what we’ve given away

SLIDE 42

Flexible FF Timing ⇒ Margin Recovery

Setup time, hold time and clock-to-q (c2q) delay of FF ⇒ NOT fixed values
Flexible FF timing model considering operating (function/test) modes, path partitioning ⇒ Reduce pessimism in timing analysis
Sequential LP: setup-c2q optimization + hold-c2q optimization
Objective: Find the best setup/hold time/c2q for each FF

[Figure: c2q-setup-hold surface; setup-hold-c2q fixed model (one setup, hold, c2q) vs. setup-hold-c2q flexible model (setup, hold, c2q1 ... c2qn)]

[ISQED14]

SLIDE 43

“Free” Improvement of Timing

Flow steps:
Netlist (and SPEF, if routed)
Extract path timing information
LP formulation with flexible flip-flop timing model
Solve Sequential LP (STA_FTmax, STA_FTmin)
Annotate new timing model for each flip-flop
Timing signoff with annotated timing
Solution

Fix timing violations “for free”
48ps average WNS improvement over 5 designs in foundry 65nm technology
Other opportunities (jitter, ERCs, non-full rail swing, …)

SLIDE 44

Homogeneous Corners

(1) Define RC corners of each layer separately
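(2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

Example: worst-case capacitance corner

[Figure: interconnect stack with M1 and M2. Capacitance (C) distributions of layer M1 and layer M2, each with its own 3σ corner, compared against the 3σ of the combined M1 C + M2 C distribution; the gap to the homogeneous Cw corner is pessimism]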

[ICCD14]

SLIDE 45

Homogeneous Corners

(1) Define RC corners of each layer separately
(2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

Example: worst-case capacitance corner

[Figure: interconnect stack with M1 and M2. Capacitance (C) distributions of layer M1 and layer M2, each with its own 3σ corner, compared against the 3σ of the combined M1 C + M2 C distribution; the gap to the homogeneous Cw corner is pessimism]

When variations in different layers are not fully correlated, pessimism of homogeneous corners increases with #layers
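
A simplified model shows why the pessimism grows with the number of layers. The Python sketch below assumes per-layer capacitance variations that are independent, have equal σ, and add linearly across the stack; these are illustrative assumptions, not the extraction model of [ICCD14].

```python
import math

def corner_pessimism(num_layers, sigma=1.0):
    """Ratio of homogeneous-corner margin to statistical 3-sigma margin.

    Homogeneous corner: every layer sits at its own 3-sigma point at once,
    so the margins add linearly. Statistical: independent variations add
    in quadrature, so the stack's sigma grows with sqrt(num_layers).
    """
    homogeneous = 3.0 * sigma * num_layers
    statistical = 3.0 * sigma * math.sqrt(num_layers)
    return homogeneous / statistical

for layers in (1, 2, 4, 9):
    print(layers, round(corner_pessimism(layers), 3))
```

Under these assumptions the ratio is the square root of the layer count, so a four-layer stack carries twice the margin that a statistical 3σ bound would require. Partial correlation between layers would reduce the ratio.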

SLIDE 46

Tightened BEOL Corners (“TBC”)

Conventional Signoff:
Routed design
Timing analysis using conventional BEOL corners (CBC)
violation = 0? If No: ECO using CBC, then repeat timing analysis
done

UCSD, 2014:
Routed design
Classify timing critical paths into Gtbc and Gcbc
Gtbc: timing analysis using TBC; violation = 0? If No: ECO using TBC
Gcbc: timing analysis using CBC; violation = 0? If No: ECO using CBC
done

[ICCD14]

SLIDE 47

Pessimism in Conventional BEOL Corners (CBC)

Assumption: a max (setup) path p_j is “safe” when the delay evaluated at a given CBC is larger than nominal delay + 3σ_j
d_j(Y_CBC) ≥ d_j(Y_typ) + 3σ_j

For a given path, we can compare the statistical delay variation and the delay obtained from a given CBC
α_j = 3σ_j / Δd_j(Y_CBC)
Δd_j(Y_CBC) = d_j(Y_CBC) − d_j(Y_typ)
Y_CBC ∈ {Y_cw, Y_cb, Y_rcw, Y_rcb}

A small α_j implies there is a large pessimism

[Figure: delay distribution with 3σ_j marked against d_j(Y_CBC) − d_j(Y_typ); the gap between them is large pessimism]
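
The scaling factor and the safety condition translate directly into code. In the Python sketch below, the function and argument names and the delay values are hypothetical; only the two formulas come from the slide.

```python
def alpha(d_typ, d_cbc, sigma):
    """alpha_j = 3 * sigma_j / (d_j(Y_CBC) - d_j(Y_typ))."""
    delta = d_cbc - d_typ
    if delta <= 0:
        raise ValueError("corner delay must exceed typical delay")
    return 3.0 * sigma / delta

def is_safe(d_typ, d_cbc, sigma):
    """A max (setup) path is safe when d_j(Y_CBC) >= d_j(Y_typ) + 3 * sigma_j."""
    return d_cbc >= d_typ + 3.0 * sigma

# Hypothetical path delays in ps.
low_variation = alpha(d_typ=500.0, d_cbc=560.0, sigma=10.0)    # 30 / 60
high_variation = alpha(d_typ=500.0, d_cbc=560.0, sigma=25.0)   # 75 / 60
print(low_variation, is_safe(500.0, 560.0, 10.0))
print(high_variation, is_safe(500.0, 560.0, 25.0))
```

The first path has α = 0.5: the corner adds twice the margin that 3σ requires, so half of it is pessimism. The second has α above 1 and is not covered by the corner.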

SLIDE 48

Scaling Factor α and Delay Variation

Paths with small ∆drcw and ∆dcw have large α
E.g., there are αj > 0.6 when ((∆drcw < 3%) AND (∆dcw < 3%))
Identify paths for tightened BEOL corners based on ∆drcw and ∆dcw

[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]

SLIDE 49

Find Paths for Which TBCs Can Be Used

Gtbc = paths which can be safely signed off using tightened corners: paths with (∆dcw larger than Acw) OR (∆drcw larger than Arcw)

[Figure: thresholds Acw and Arcw on axes Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]
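
The path classification rule can be sketched in Python as follows. The threshold values (3%), the path tuples, and all names are hypothetical; the slides do not give the actual Acw and Arcw values. Delay deltas are taken relative to typical delay, matching the figure axes.

```python
def classify_paths(paths, a_cw, a_rcw):
    """Split paths into Gtbc (tightened corners) and Gcbc (conventional).

    Each path is (name, d_typ, d_cw, d_rcw). A path joins Gtbc when its
    relative delay increase at Ycw exceeds a_cw OR at Yrcw exceeds a_rcw.
    """
    g_tbc, g_cbc = [], []
    for name, d_typ, d_cw, d_rcw in paths:
        rel_cw = (d_cw - d_typ) / d_typ
        rel_rcw = (d_rcw - d_typ) / d_typ
        if rel_cw > a_cw or rel_rcw > a_rcw:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc

# Hypothetical path delays in ps: (name, typical, Ycw, Yrcw).
paths = [
    ("p1", 500.0, 520.0, 505.0),   # +4% at Ycw
    ("p2", 500.0, 510.0, 512.0),   # +2% and +2.4%
    ("p3", 400.0, 404.0, 420.0),   # +5% at Yrcw
]
g_tbc, g_cbc = classify_paths(paths, a_cw=0.03, a_rcw=0.03)
print(g_tbc, g_cbc)
```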

SLIDE 50

Benefits of Tightened BEOL Corners

WNS and TNS are reduced by up to 100ps and 53ns
#paths with timing violations is reduced by 24% to 100%
TBC-0.5 configuration has little benefit because there are not many paths in Gtbc
Correlation factor, γ = 0.5

[Figure: WNS (ns), TNS (ns), and #timing violations for LEON, SUPERBLUE12, NETCARD under CBC, TBC-0.5, TBC-0.6, TBC-0.7]

SLIDE 51

How Do We Regain Time?

Learn correlations!
Enter the “era of optimization”: 1% = 1 week
Take back what we’ve given away
Stop wasting time

SLIDE 52

Poor Enablement, Poor Results

Academic libraries
OpenPDK
SAED 32/28
15nm FreePDK
ISPD Sizing Contest Library
Academic enablements (PDKs, libraries, etc.) quite weak

SLIDE 53

Example: 15nm OpenPDK

Cell delays not realistic
RC information missing
Cannot extract wire capacitances with commercial RCX tools
Complex LEF rules missing

STA report from [EDA tool]:

Des/Clust/Port  Wire Load Model  Library
wb_dma_top  wl_zero  NanGate_15nm_OCL

Point  Fanout  Cap  Trans  Incr  Path
clock clk_i (rise edge)  0.000  0.000
clock network delay (ideal)  0.000  0.000
u3_u1_slv_adr_reg_9_/CLK (DFFRNQ_X1)  0.000  0.000  0.000 r
u3_u1_slv_adr_reg_9_/Q (DFFRNQ_X1)  2.494  10.094  10.094 f
slv0_adr[9] (net)  1  0.807  0.000  10.094 f
U3390/ZN (NOR2_X1)  5.483  3.642  13.735 r
n2388 (net)  1  1.616  0.000  13.735 r
U2231/ZN (NAND2_X2)  4.255  3.216  16.952 f
n2593 (net)  3  2.164  0.000  16.952 f
U3389/ZN (INV_X1)  3.705  2.917  19.868 r
n3228 (net)  3  1.990  0.000  19.868 r
U3387/ZN (NAND2_X1)  6.314  4.207  24.075 f
n3230 (net)  3  2.198  0.000  24.075 f
U4136/Z (OR2_X1)  3.102  5.762  29.837 f
n2318 (net)  2  1.509  0.000  29.837 f
U3373/ZN (INV_X1)  2.093  1.799  31.636 r
n3435 (net)  1  0.840  0.000  31.636 r
U3372/Z (BUF_X2)  15.410  10.845  42.481 r
n2367 (net)  31  21.353  0.000  42.481 r
U3992/ZN (AOI22_X1)  9.081  5.892  48.373 f
n3388 (net)  1  0.631  0.000  48.373 f
U3185/ZN (NAND4_X1)  7.109  2.862  51.235 r
u0_N3065 (net)  1  0.485  0.000  51.235 r
u0_wb_rf_dout_reg_22_/D (DFFRNQ_X1)  7.109  0.000  51.235 r
data arrival time  51.235
clock clk_i (rise edge)  60.000  60.000
clock network delay (ideal)  0.000  60.000
u0_wb_rf_dout_reg_22_/CLK (DFFRNQ_X1)  0.000  60.000 r
library setup time  -8.764  51.236
data required time  51.236

data required time  51.236
data arrival time  -51.235
slack (MET)  0.002

Clock Period = 60ps? (1.5ns with 28nm foundry)
Stage delay: 2ps~30ps

SLIDE 54

Example: ISPD13 Gate Sizing Contest Library

“Gap” between academic benchmarks and industry designs

Unrealistic timing library
Missing MCMM
Missing multiple power domains
Missing multiple clock domains
Missing memories / macro cells
No support for standard formats (.spef, .v, .sdc, .lib)
…

See “A2A” from UCSD: “horizontal benchmark extension” http://vlsicad.ucsd.edu/Publications/Conferences/313/c313.pdf

SLIDE 55

Poor Research Enablement Has Costs

“Good” academic sizers cannot be used for industry designs
No MCMM vs. MCMM ⇒ Resource (memory/runtime) problem
Simple vs. Complicated timing models ⇒ Timing accuracy problem
Few benchmarks vs. industry designs ⇒ Academic sizers don’t port well
Overtrained on a particular suite of “benchmarks”
Timing/power characteristics, intuition mismatched to reality, actual outcomes

[Figure: benchmark netcard; cSizer1 (commercial sizer) vs. aSizer1 (academic sizer)] [GLSVLSI14]

Commercial sizer wins with foundry technologies (similar leakage, better timing slack, better runtime)

SLIDE 56

How Do We Regain Time?

Learn correlations!
Enter the “era of optimization”: 1% = 1 week
Take back what we’ve given away
Stop wasting time
Harvest low-hanging fruits

SLIDE 57

FEOL: Layout-Dependent Effects

Layout dependent variations
Variation in poly pitch
Well-proximity effects: closer to well edge ⇒ more Vth shift
Intentional and unintentional stress: LOD, STI, DSL and SiGe

Pattern dependent dishing and oxide erosion

[Mark Zwolinski, ISPD2013]

SLIDE 58

BEOL: Statistical RC Extraction

[source: R. Jiang, Synopsys, 2005]

Statistical RC extraction flow comprehends spatial correlation of interconnect variations

Proposed by industry, then dropped …. (???)

SLIDE 59

Multi-Die: Signoff Corners

Example: inter-die process variation limits performance improvement of 3DICs
What if SS Tier 0 and SS Tier 1 will never be stacked together?
Mix-and-match
Wafer-to-wafer (die-to-wafer) bonding: integrate SS wafer/die with FF wafer/die (SS Tier 0 wafer/die + FF Tier 1 wafer or FF Tier 0 wafer/die + SS Tier 1 wafer)

[Figure: 3D integration of SS Tier 1 wafer/die with FF Tier 0 wafer. WNS (ps) for SS-SS, SS-FF, FF-SS; 75ps annotated]

XX-YY = XX Tier 0 + YY Tier 1
Technology: 28FDSOI
3D netlist is bipartitioned with min-cut

[DATE16]

SLIDE 60

Multi-Die Design for “Mix-and-Match”

Partition netlist such that paths have balanced delay across two tiers ⇒ Maximizes timing benefit from mix-and-match
16% performance increase at signoff compared to existing flows

Design: Clk period
M0: 1.2ns
AES: 1.1ns
VGA: 1.0ns

[Figure: WNS (ps) for ARM M0, AES, VGA under Brute-force (orig), Brute-force (opt), Shrunk2D (orig), Shrunk2D (opt), GT2012 (orig), GT2012 (opt)]

Technology: 28FDSOI

SLIDE 61

SAMP + Cutmask: Dummy and EOL ΔTiming

Self-aligned multiple patterning (SAMP) + Cutmask
Cut shapes and locations determine dummy wires, end-of-line (EOL) extensions of wire segments ⇒ affect performance
BACUS15: Co-optimization of
Cut mask minimum spacing rules
EOL extension with usage of multiple cut masks
Metal density constraints (dummy fills)
Insight into achievable tradeoff of performance and cost

[Figure: original layout vs. final layout of 1D wires, showing cut masks, cuts, dummy fill, and EOL extension]

[BACUS15]

SLIDE 62

Timing Impacts

Best vs. Worst EOL extension
BEST ILP solution: little impact of EOL extensions on timing
WORST ILP solution in N5 degrades up to 196ps compared to N7
Post-ILP optimization is beneficial to timing
Different metal density with up to 14ps difference

[Figure: changes in WNS (ns), BEST vs. WORST, for ARM Cortex M0, AES, JPEG at N7 and N5]
[Figure: change in WNS (ns) for different target metal density (40%, 42.5%, 45%) for ARM Cortex M0, AES, JPEG]

SLIDE 63

How Do We Regain Time?

Learn correlations, predictions ⇒ kill loops, risk
Enter the “era of optimization”: 1% = 1 week
Take back what we’ve given away
Stop wasting time
Harvest low-hanging fruits
+ resilience, adaptivity (signoff at TT), reliability pessimism
Min cost of resilience (“MinRazor”; OD, AVS-BTI signoff)
PVS ROs
+ new paradigms (stochastic, approximate)
+ pure optimization QOR (P&R&Opt for N10, N7)
? faster RTL design/optimization; DFT …

SLIDE 64

Costs of Reliability

[Figure: dependency graph linking reliability mechanisms (EM, TDDB, NBTI, HCI, general) to design parameters: AF (α), Jrms, temp, wire width, wire spacing, MTTF, driver size, supply voltage, timing slack, |ΔVthp|, |ΔVthn|, freq., slew rate, load/fanout, gate length, junction resistance. Legend: inverse relation (if A increases then B decreases); direct relation (if A increases then B increases); tunable at design or runtime; tunable at design]

SLIDE 65

N10, N7 P&R: “Opt” Methods

[Figure: layout examples of a DB violation, MinIW violations, and a MinOW violation; cells are flipped or moved]

SLIDE 66

In Search of Lost Time

SLIDE 67

Summary

Time = schedule, universal currency, $$$
We lose time in many ways
Some unavoidable
Learning, big data, optimization
Take back what we’ve given away
Stop pointless waste
Many low-hanging fruits
Recovering lost Time = “equivalent scaling”
EDA continues the Moore’s-Law value trajectory

SLIDE 68
