USING MACHINE LEARNING FOR VLSI TESTABILITY AND RELIABILITY
Mark Ren, Miloni Mehta


SLIDE 1

Mark Ren, Miloni Mehta

USING MACHINE LEARNING FOR VLSI TESTABILITY AND RELIABILITY

SLIDE 2

TAKE-HOME MESSAGES

  • Machine learning can improve approximate solutions for hard problems.
  • Machine learning can accurately predict and replace brute-force methods for computationally expensive problems.

SLIDE 3

NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.

VLSI TESTABILITY AND RELIABILITY

[Diagram: design → manufacturing → wafer → chip testing (fail/pass) → years in the field; testability relates to chip testing, reliability to the years after]

SLIDE 4

PART 1

Testability Prediction and Test Point Insertion with Graph Convolutional Network (GCN)

Mark Ren, Brucek Khailany, Harbinder Sikka, Lijuan Luo, Karthikeyan Natarajan Yuzhe Ma, Bei Yu

“High Performance Graph Convolutional Networks with Applications in Testability Analysis”, to appear in Proceedings of Design Automation Conference, 2019

SLIDE 5

PART 2

Full Chip FinFET Self-heat Prediction using Machine Learning

Miloni Mehta, Chi Keung Lee, Chintan Shah, Kirk Twardowski

SLIDE 6

PART 1 OUTLINE

Introduction
Learning model for testability analysis and enhancement
Practical issues:
  • Scalability
  • Data imbalance

SLIDE 7

HOW DO WE TEST A CHIP

[Diagram: input patterns are applied to the chip and the output patterns are compared against golden patterns; a stuck-at-0 fault (a node tied to GND) makes the output patterns deviate from the golden ones]

SLIDE 8

TESTABILITY PROBLEM

[Circuit diagram: inputs i0–i10 feed gates A and B; output O. Node B is almost always 0, so B's faults (e.g. a stuck-at-0 fault to GND) are unobservable → difficult-to-test (DT). With an inserted register, a test point (TP), B's faults become observable.]

SLIDE 9

MOTIVATION

Test Point Insertion Problem:

Pick the smallest number of test points that achieves the largest testability enhancement.
  • Number of test points → chip area cost
  • Number of test patterns → test time

Hard problem; only approximate solutions exist.

Commercial solution: Synopsys TetraMax

Can we improve it with machine learning?
  • Predict testability
  • Select test points

SLIDE 10

ML BASED TESTABILITY PREDICTION

Given a circuit, predict which gate outputs are difficult-to-test (DT)

Gate features: [logic level, SCOAP_C0, SCOAP_C1, SCOAP_OB]
Gate label: DT (0 or 1), generated by TetraMax

Input features → ML model → output classification:
  N1: 0,0,1,1 → 0
  N2: 1,0,1,0 → 1
  N3: 2,0,1,1 → 0
  ...
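To make the data layout concrete, here is a minimal sketch of assembling the per-gate feature vectors and TetraMax-derived labels into a dataset any classifier could consume. The gate names and feature values are the illustrative ones from the slide, not real design data.

```python
# Hypothetical per-gate records; the feature layout follows the slide:
# [logic level, SCOAP_C0, SCOAP_C1, SCOAP_OB], label DT in {0, 1}.
gates = {
    "N1": {"features": [0, 0, 1, 1], "dt": 0},
    "N2": {"features": [1, 0, 1, 0], "dt": 1},
    "N3": {"features": [2, 0, 1, 1], "dt": 0},
}

def to_dataset(gates):
    """Split the gate records into a feature matrix X and label vector y."""
    names = sorted(gates)
    X = [gates[n]["features"] for n in names]
    y = [gates[n]["dt"] for n in names]
    return names, X, y

names, X, y = to_dataset(gates)
print(names)  # ['N1', 'N2', 'N3']
```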

SLIDE 11

BASIC MACHINE LEARNING MODELING

Did not fully leverage the inductive bias of the circuit structure.

[Diagram: node a with fanin and fanout neighbors 1–10]

F(a) = [Fa, F1, F2, ..., F10]: node a's features concatenated with those of its fanin and fanout neighbors, fed to standard ML models (LR, RF, SVM, MLP) to classify whether a is DT.

SLIDE 12

GRAPH CONVOLUTIONAL NETWORK (GCN)

[Diagram: a node and its numbered neighbors] Aggregation (mean, sum), then encoding (R^4 → R^32, ReLU)
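The aggregate-then-encode step can be sketched in a few lines of NumPy. The slide only specifies "aggregation (mean, sum)" and "encoding (R^4 → R^32, ReLU)"; the concat-own-features-with-aggregate form below is one common GCN variant and an assumption here, as are the toy graph and weight shapes.

```python
import numpy as np

rng = np.random.default_rng(0)

def gcn_layer(features, neighbors, weight):
    """One propagation step: mean-aggregate each node's neighbor features,
    concatenate with the node's own features, then encode with a linear
    map followed by ReLU."""
    out = []
    for v, nbrs in neighbors.items():
        agg = features[list(nbrs)].mean(axis=0) if nbrs else np.zeros(features.shape[1])
        h = np.concatenate([features[v], agg]) @ weight  # (8,) @ (8, 32)
        out.append(np.maximum(h, 0.0))                   # ReLU
    return np.stack(out)

feats = rng.normal(size=(3, 4))        # 3 nodes, 4 raw features each
nbrs = {0: [1, 2], 1: [0], 2: [0]}     # toy adjacency
W = rng.normal(size=(8, 32))           # encodes R^8 -> R^32
h1 = gcn_layer(feats, nbrs, W)
print(h1.shape)  # (3, 32)
```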

SLIDE 13

GCN BASED TESTABILITY PREDICTION

Layer 1: weighted sum & ReLU (R^4 → R^32)
Layer 2: weighted sum & ReLU (R^32 → R^64)
Layer 3: weighted sum & ReLU (R^64 → R^128)
Fully connected layers: (64, 64, 128, 2)

SLIDE 14

ACCURACY IMPACT OF GCN LAYERS (K)

[Plots: training accuracy (%) and testing accuracy (%) vs. epochs (1–271) for K = 1, 2, 3]

SLIDE 15

EMBEDDING VISUALIZATION

[Embedding visualizations for K=1, K=2, K=3]

  • Embeddings look more discriminative as the number of stages increases.
SLIDE 16

MODEL COMPARISON ON BALANCED DATASET

Compared with basic ML models: LR, RF, MLP, SVM.
N = 500 nodes in the fanin cone and 500 nodes in the fanout cone, a total of 1000 nodes, compared to the 3-layer GCN.

Fewer than 1000 nodes influence each node, so the setting is comparable with the baseline.

GCN has the best accuracy (93%).

[Bar chart: precision, recall, F1 score, and accuracy per model]

SLIDE 17

TEST POINT INSERTION WITH GCN MODEL

An iterative process to select TPs, enabled by the GCN model: select each TP candidate based on its predicted impact, i.e. the number of DT nodes removed in the fanin cone of the TP.

[Flow: circuit → graph → GCN model → impact estimation → point selection → graph modification (insert new TP) → repeat until done → final TPs]
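The iterative selection loop can be sketched as a greedy procedure. All arguments below (`impact_fn`, `modify_fn`, the candidate names and scores) are hypothetical hooks standing in for the GCN impact estimate and the graph-modification step; they are not the paper's actual API.

```python
def insert_test_points(candidates, impact_fn, modify_fn, max_tps):
    """Greedy loop from the slide: repeatedly score the remaining TP
    candidates by predicted impact (DT nodes removed in the candidate's
    fanin cone), insert the best one, modify the graph, and stop when
    the budget is reached or no candidate helps."""
    chosen = []
    remaining = set(candidates)
    while remaining and len(chosen) < max_tps:
        best = max(remaining, key=impact_fn)
        if impact_fn(best) <= 0:      # no candidate removes any DT node
            break
        chosen.append(best)
        remaining.discard(best)
        modify_fn(best)               # graph modification: insert the TP
    return chosen

# Toy usage: static impact scores stand in for GCN re-prediction.
impacts = {"tp_a": 5, "tp_b": 9, "tp_c": 0}
picked = insert_test_points(impacts, impacts.get, lambda tp: None, max_tps=2)
print(picked)  # ['tp_b', 'tp_a']
```

In the real flow the impact scores would be re-predicted after every insertion, since each new TP changes the graph.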

SLIDE 18

TEST POINT INSERTION RESULTS COMPARISON

11% fewer test points and 6% fewer test patterns at the same coverage vs. TetraMax.

Machine learning can improve approximate solutions for hard problems

[Bar chart over four designs: test point reduction and test pattern reduction (%)]

SLIDE 19

MODEL SCALABILITY

Choices of model implementation

Batch processing: recursion.
Full graph: sparse matrix multiplication, F_l = SpMV((B · F_{l−1}) · X_l).

Tradeoff: memory vs. speed.

1M nodes/second on Volta GPU
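The full-graph formulation amounts to one sparse-dense product per layer. A minimal sketch, with a hand-rolled COO sparse product and invented toy shapes (B as a sparse connectivity matrix, X as layer weights):

```python
import numpy as np

def spmm(rows, cols, vals, F):
    """Sparse-dense product B @ F, with B given in COO form."""
    out = np.zeros((max(rows) + 1, F.shape[1]))
    for r, c, v in zip(rows, cols, vals):
        out[r] += v * F[c]
    return out

def propagate(rows, cols, vals, F, X):
    """One layer of full-graph propagation: aggregate over the sparse
    connectivity, apply the layer weights, then ReLU."""
    return np.maximum(spmm(rows, cols, vals, F) @ X, 0.0)

# Toy 3-node graph, 4 features per node, weights X: R^4 -> R^8.
F0 = np.ones((3, 4))
rows, cols, vals = [0, 1, 2, 0], [1, 0, 0, 2], [1.0, 1.0, 1.0, 1.0]
X1 = np.ones((4, 8)) * 0.1
F1 = propagate(rows, cols, vals, F0, X1)
print(F1.shape)  # (3, 8)
```

In practice one would use a tuned sparse kernel (e.g. cuSPARSE on GPU) rather than a Python loop; the tradeoff the slide names is that the full-graph form keeps all node features in memory in exchange for throughput.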

SLIDE 20

MULTI GPU TRAINING

The training dataset has multiple million-gate designs that cannot fit on one GPU.
Data parallelism: each GPU computes one design/graph.
The model is replicated across multiple GPUs, leveraging the PyTorch DataParallel module.
Trained with 4 Tesla V100 GPUs on a DGX-1.

[Diagram: a shared model on each of GPU1–GPU4, each processing one of Graph1–Graph4; gradients (Δ) are combined]
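Conceptually, each replica computes gradients on its own graph and the gradients are then combined so every replica applies the same update, which is what PyTorch's data-parallel machinery does on the backward pass. A pure-Python stand-in (no GPUs, invented numbers):

```python
def data_parallel_step(replica_grads):
    """Average per-parameter gradients across model replicas: each 'GPU'
    holds a copy of the model and its own graph's gradients; the averaged
    gradient is applied identically on every replica."""
    n = len(replica_grads)
    return [sum(g[i] for g in replica_grads) / n
            for i in range(len(replica_grads[0]))]

# Four replicas (Graph1..Graph4), two parameters each.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
avg = data_parallel_step(grads)
print(avg)  # [4.0, 5.0]
```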

SLIDE 21

IMBALANCE ISSUE

It is very common to have many more non-DTs (negative class) than DTs (positive class), with an imbalance ratio of more than 100x.

Classifier 1 (OK precision, low recall): Recall 10.5%, Precision 59.8%

            Predict: 0   Predict: 1
  Fact: 0   133576       290
  Fact: 1   3681         432

Classifier 2 (high recall, low precision): Recall 97.3%, Precision 11.0%

            Predict: 0   Predict: 1
  Fact: 0   100919       32927
  Fact: 1   114          4069
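The quoted metrics follow directly from the confusion matrices; a quick check:

```python
def recall_precision(tn, fp, fn, tp):
    """Recall and precision from a 2x2 confusion matrix
    (rows = fact, columns = prediction, positive class = DT)."""
    return tp / (tp + fn), tp / (tp + fp)

# Numbers from the slide's two classifiers.
r1, p1 = recall_precision(tn=133576, fp=290, fn=3681, tp=432)
r2, p2 = recall_precision(tn=100919, fp=32927, fn=114, tp=4069)
print(f"clf1: recall {r1:.1%}, precision {p1:.1%}")  # 10.5%, 59.8%
print(f"clf2: recall {r2:.1%}, precision {p2:.1%}")  # 97.3%, 11.0%
```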

SLIDE 22

MULTI-STAGE CLASSIFICATION

The networks at the initial stages only filter out negative data points with high confidence (high recall, low precision).
Positive predictions are sent on to the network at the next stage.

[Pipeline: Network 1 → Network 2 → Network 3; at each stage, positive (+) predictions are forwarded and negatives (−) are dropped]
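The cascade itself is simple to express: a sample must be flagged positive by every stage to survive, so early high-recall stages shed easy negatives and later stages restore precision. The threshold predicates below are hypothetical stand-ins for the three trained networks.

```python
def cascade(stages, samples):
    """Multi-stage classification sketch: run the surviving samples
    through each stage's predicate in turn, keeping only positives."""
    survivors = list(samples)
    for predict in stages:
        survivors = [s for s in survivors if predict(s)]
    return survivors

# Toy scores in [0, 1]; successive stages apply stricter thresholds.
stages = [lambda s: s > 0.1, lambda s: s > 0.4, lambda s: s > 0.7]
print(cascade(stages, [0.05, 0.3, 0.6, 0.9]))  # [0.9]
```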
SLIDE 23

MULTI-STAGE CLASSIFICATION RESULT

Balanced Recall and Precision

Stage 1 (Recall 97.3%, Precision 11.0%):

            Pred: 0   Pred: 1
  Fact: 0   100919    32927
  Fact: 1   114       4069

Stage 2 (Recall 94.6%, Precision 39.1%):

            Pred: 0   Pred: 1
  Fact: 0   26935     5992
  Fact: 1   221       3848

Stage 3 (Recall 92.0%, Precision 81.8%):

            Pred: 0   Pred: 1
  Fact: 0   5207      785
  Fact: 1   309       3539

Overall (Recall 86.0%, Precision 81.8%):

            Pred: 0   Pred: 1
  Fact: 0   133061    785
  Fact: 1   574       3539

SLIDE 24

PART 1 - SUMMARY

Machine learning can improve VLSI design testability beyond the existing solution

Predictive power of ML model

Graph-based models are suitable for VLSI problems.
Practical issues such as scalability and data imbalance need to be dealt with.

SLIDE 25

PART 2

Full Chip FinFET Self-heat Prediction using Machine Learning

Miloni Mehta, Chi Keung Lee, Chintan Shah, Kirk Twardowski

SLIDE 26

VLSI TESTABILITY AND RELIABILITY

[Diagram: design → manufacturing → wafer → chip testing (fail/pass) → years in the field; testability relates to chip testing, reliability to the years after]

SLIDE 27

SEMICONDUCTOR RELIABILITY

Source: https://semiengineering.com/improving-automotive-reliability/

SLIDE 28

RELIABILITY

DEVICE SELF-HEAT (SH)

Active power in transistors is dissipated as heat to the surroundings.
FinFETs are more sensitive to SH than planar devices.

Why do we care?
  • Exacerbates electromigration (EM) on interconnects
  • Transistor threshold voltage (Vt) shifts
  • Time-dependent dielectric breakdown (TDDB)

[Plot: 16ff EM limit reduction vs. temperature, EM rating factor (Imax) over 90–170 °C]

SLIDE 29

SH METHODOLOGIES SO FAR

No sign-off tool can handle full-chip SH analysis.
Limitations of Spice simulations: impractical to run on billions of transistors, so teams review high-power-density cells.

2D look-up table (LUT) approach:
  • Based on frequency and capacitive loading for different clock drivers
  • Reduced run time by more than 90% over full Spice simulations
  • Pessimistic w.r.t. Spice

[Plot: 2D LUT vs. Spice temperature (°C) comparison]
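A 2D LUT of the kind described indexes an SH estimate by frequency and capacitive load. A minimal sketch, where snapping each query up to the next tabulated corner is what makes the estimate pessimistic w.r.t. Spice; the grid points and table values are invented for illustration.

```python
from bisect import bisect_right

def lut_lookup(freqs, loads, table, f, c):
    """2D look-up: snap frequency and capacitive load up to the nearest
    tabulated corner at or above the query (conservative rounding)."""
    i = min(bisect_right(freqs, f), len(freqs) - 1)
    j = min(bisect_right(loads, c), len(loads) - 1)
    return table[i][j]

freqs = [0.5, 1.0, 2.0]        # GHz grid points (illustrative)
loads = [5.0, 10.0, 20.0]      # fF grid points (illustrative)
table = [[1, 2, 3],            # SH estimate per (freq, load) corner
         [2, 4, 6],
         [3, 6, 9]]
print(lut_lookup(freqs, loads, table, f=0.8, c=7.0))  # 4
```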

SLIDE 30

SELF-HEAT TRENDS

Frequency ∝ SH
Capacitive loading ∝ SH
Cell size ∝ 1/SH
Resistance ∝ 1/SH (non-linear)

[Plots: predicted SH vs. frequency and vs. R/C (1e12), normalised SH]

SLIDE 31

MOTIVATION TO USE ML

Identify problematic cells in the design without exhaustive Spice simulations.
  • Complex relationship between design and SH
  • Design database available for several projects
  • Reusability across projects

Focus:
  • Clock inverters and buffers
  • Quick, easy, light-weight
  • Rank cells above a certain SH threshold for thorough analysis
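The ranking step is a simple filter-and-sort over the model's predictions; only the flagged cells go on to full Spice analysis. Cell names, scores, and the threshold below are made up for illustration.

```python
def flag_for_spice(predicted_sh, threshold):
    """Keep cells whose predicted self-heat exceeds the threshold,
    ranked worst-first for thorough (Spice) analysis."""
    hot = [(cell, sh) for cell, sh in predicted_sh.items() if sh > threshold]
    return sorted(hot, key=lambda kv: kv[1], reverse=True)

preds = {"ckbuf_1": 0.2, "ckinv_7": 0.9, "ckbuf_3": 0.6}
print(flag_for_spice(preds, threshold=0.5))  # [('ckinv_7', 0.9), ('ckbuf_3', 0.6)]
```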

SLIDE 32

MACHINE LEARNING MODEL

[Flow diagram]
Training: select training data → get attributes from PrimeTime (X_training) → simulate in HSPICE (Y_training) → generate ML model (equation: Ŷ = ?).
Validation: select test data → get attributes from PrimeTime (X_test) → simulate in HSPICE (Y_test) → predict on the test set (Ŷ_test) → (predicted == Spice)? Yes: ready for deployment; No: iterate.

SLIDE 33

DATASET SELECTION

  • Cover a wide range of frequencies
  • Cover different types of standard cell sizes
  • Prevent duplication in training data due to replicated partitions/chiplets
  • Outliers in the design chosen
  • Labels obtained through Spice simulations (supported by foundry Spice models)
  • TSMC 16nm FinFET training model used 4300 training samples with 9 features

SLIDE 34

DNN REGRESSOR MODEL

[Diagram: input layer (9 features x1 … x9) → hidden layer 1 → hidden layer 2 → hidden layer 3 → output layer → predicted self-heat Ŷ]

Cost = (1/N) Σ (Y_pred − Y)²

Features:
  • Output capacitance
  • Frequency
  • Cell size
  • Net resistance
  • Input slew
  • Output slew
  • # of output loads
  • Input capacitance of loads
  • Avg transition on load
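A forward pass of such a regressor fits in a few lines of NumPy. The 9-feature input, three hidden layers, ELU activation (named on the next slide), and MSE cost follow the deck; the hidden-layer widths and random weights are assumptions, since the deck does not state them.

```python
import numpy as np

rng = np.random.default_rng(1)

def elu(x, alpha=1.0):
    """Exponential Linear Unit: x for x > 0, alpha*(exp(x)-1) otherwise."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def forward(x, weights):
    """9 input features -> three ELU hidden layers -> linear SH output."""
    h = x
    for W in weights[:-1]:
        h = elu(h @ W)
    return h @ weights[-1]          # linear output for regression

def mse(y_pred, y):
    """Cost = (1/N) * sum((Y_pred - Y)^2), as on the slide."""
    return float(np.mean((y_pred - y) ** 2))

widths = [9, 32, 32, 32, 1]         # assumed layer sizes
weights = [rng.normal(scale=0.1, size=(a, b)) for a, b in zip(widths, widths[1:])]
X = rng.normal(size=(4, 9))         # 4 cells, 9 features each
y_hat = forward(X, weights)
print(y_hat.shape)  # (4, 1)
```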

SLIDE 35

MINIMIZING COST FUNCTION

Gradient descent with the Adam optimizer, which has an adaptive learning rate.
Exponential Linear Unit (ELU) used as the activation function.
300,000 training steps.

SLIDE 36

RESULTS

Xavier CPU, 2000 validation samples.
Good correlation between DNN prediction and Spice SH: average error w.r.t. Spice = 6.5%, MSE = 0.05.

[Scatter plot: predicted SH vs. Spice SH]

SLIDE 37

QUANTITATIVE BENEFITS

Trained model is deployed for inference on millions of clock cells

Training time: 37 minutes (DGX-1 used)
Inference time: < 1 minute

  • > 99% of cells filtered from Spice simulations!
  • Top 1000 prediction results simulated and verified
  • Found that small clock tree cells had the highest SH
  • Outlier detection improved inference by 2.65% in Turing

SLIDE 38

COMPARISON TO PRIOR WORK

[Chart: instance counts, comparison to prior work]

SLIDE 39

PART 2 - SUMMARY

FinFET self-heat is a growing reliability concern.
Proposed a supervised ML model using a DNN:
  • Accurately predicts self-heat
  • 100x runtime improvement

Demonstrated techniques to select a representative dataset for training.
Model deployed for the Xavier and Turing projects.
Use ML techniques to improve productivity and solve challenging problems in VLSI.

SLIDE 40