Imbalance Aware Lithography Hotspot Detection: A Deep Learning Approach - PowerPoint PPT Presentation


slide-1
SLIDE 1

Imbalance Aware Lithography Hotspot Detection: A Deep Learning Approach

Haoyu Yang1, Luyang Luo1, Jing Su2, Chenxi Lin2, Bei Yu1

1The Chinese University of Hong Kong 2ASML Brion Inc.

Mar. 1, 2017

1 / 34

slide-2
SLIDE 2

Outline

Introduction
Network Architecture
Imbalance Aware Learning
Experimental Results

2 / 34

slide-3
SLIDE 3

Outline

Introduction
Network Architecture
Imbalance Aware Learning
Experimental Results

3 / 34

slide-4
SLIDE 4

Moore’s Law to Extreme Scaling

3 / 34

slide-5
SLIDE 5

4 / 34

slide-6
SLIDE 6

Lithography Hotspot Detection

◮ What you see = what you get
◮ Even with RETs: OPC, SRAF, MPL
◮ Still hotspots: low-fidelity patterns
◮ Simulations: extremely CPU intensive

[Chart: ratio of lithography simulation time (normalized to the 40 nm node) vs. technology node; the required computational time reduction grows with each node]

5 / 34

slide-7
SLIDE 7

Layout Verification Hierarchy

Increasing verification accuracy vs. (relative) CPU runtime at each level:

◮ Sampling: scan and rule-check each region
◮ Hotspot Detection: verify the sampled regions and report potential hotspots
◮ Lithography Simulation: final verification on the reported hotspots

6 / 34

slide-8
SLIDE 8

Pattern Matching based Hotspot Detection

[Figure: pattern matching flow: layout clips are matched against a library of known hotspot patterns]

7 / 34

slide-9
SLIDE 9

Pattern Matching based Hotspot Detection

[Figure: pattern matching against the hotspot library; some hotspots are detected, but hotspots not in the library go undetected]

◮ Fast and accurate [Yu+,ICCAD'14] [Nosato+,JM3'14] [Su+,TCAD'15]
◮ Fuzzy pattern matching [Wen+,TCAD'14]
◮ Hard to detect unseen patterns

7 / 34

slide-10
SLIDE 10

Machine Learning based Hotspot Detection

[Figure: machine learning flow: extract layout features, then classify with a hotspot detection model]

8 / 34

slide-11
SLIDE 11

Machine Learning based Hotspot Detection

[Figure: machine learning flow: extract layout features, classify into hotspot / non-hotspot; it is hard to trade off accuracy and false alarms]

◮ Can predict new patterns
◮ Decision trees, ANNs, SVMs, boosting ... [Drmanac+,DAC'09] [Ding+,TCAD'12] [Yu+,JM3'15] [Matsunawa+,SPIE'15] [Yu+,TCAD'15] [Zhang+,ICCAD'16]
◮ Hand-crafted features are not satisfactory
◮ Hard to handle ultra-large datasets

8 / 34

slide-12
SLIDE 12

Why Deep Learning?

◮ Feature Crafting vs. Feature Learning

Although prior knowledge is considered during manual feature design, information loss is inevitable. Features learned from massive datasets are more reliable.

◮ Scalability

As circuit feature sizes shrink, mask layouts become more complicated. Deep learning has the potential to handle ultra-large-scale instances, while traditional machine learning may suffer performance degradation.

◮ Mature Libraries

Caffe [Jia+,ACMMM'14] and TensorFlow [Martin+,TR'15]

9 / 34

slide-13
SLIDE 13

Hotspot-Oriented Deep Learning

Deep learning has been widely applied in object recognition tasks, but the nature of mask layouts impedes direct use of existing frameworks.

◮ Imbalanced Dataset

Lithographic hotspots are always the minority.

◮ Larger Image Size

The effective clip region (> 1000 × 1000 pixels) is much larger than image sizes in traditional computer vision problems.

◮ Sensitive to Scaling

Scaling mask layout patterns modifies their attributes.

10 / 34

slide-14
SLIDE 14

Deep Learning based Hotspot Detection Flow

[Flow: training data set → upsampling → random mirroring → model training with validation → trained model → testing on the test data set, reporting accuracy and false alarms]

11 / 34

slide-15
SLIDE 15

Outline

Introduction
Network Architecture
Imbalance Aware Learning
Experimental Results

12 / 34

slide-16
SLIDE 16

CNN Architecture Overview

◮ Convolution Layer
◮ Rectified Linear Unit (ReLU)
◮ Pooling Layer
◮ Fully Connected Layer

[Diagram: CONV → ReLU (max(0, x)) → POOL stages, repeated, followed by FC layers producing Hotspot / Non-hotspot]

12 / 34

slide-17
SLIDE 17

Convolution Layer

Convolution operation:

(I ⊗ K)(x, y) = Σ_{i=1}^{c} Σ_{j=1}^{m} Σ_{k=1}^{m} I(i, x − j, y − k) K(j, k)

[Diagram: CONV → ReLU → POOL stages followed by FC layers producing Hotspot / Non-hotspot]

13 / 34
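The triple sum above can be sketched directly as Python (a naive reference implementation for illustration, not the paper's code; `I` holds c channels of an H × W clip, `K` is an m × m kernel shared across channels, and the slide's 1-based indices are shifted to 0-based):

```python
import numpy as np

def conv2d(I, K):
    """Direct implementation of the convolution sum
    (I ⊗ K)(x, y) = sum_i sum_j sum_k I[i, x-j, y-k] * K[j, k],
    summing over all c channels and an m x m kernel.
    Only positions where every index is valid are kept ("valid" mode)."""
    c, H, W = I.shape
    m = K.shape[0]
    # x - j must stay in [0, H); with j in [0, m), the smallest safe x is m - 1
    out = np.zeros((H - m + 1, W - m + 1))
    for x in range(m - 1, H):
        for y in range(m - 1, W):
            s = 0.0
            for i in range(c):
                for j in range(m):
                    for k in range(m):
                        s += I[i, x - j, y - k] * K[j, k]
            out[x - (m - 1), y - (m - 1)] = s
    return out
```

In practice a framework's vectorized convolution would be used; the loops here exist only to mirror the formula term by term.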

slide-18
SLIDE 18

Convolution Layer (cont.)

Effect of different convolution kernel sizes ((a) 7 × 7, (b) 5 × 5, (c) 3 × 3):

Kernel Size   Padding   Test Accuracy*
7 × 7         3         87.50%
5 × 5         2         93.75%
3 × 3         1         96.25%

*Stopped after 5000 iterations.
14 / 34

slide-19
SLIDE 19

Rectified Linear Unit

[Diagram: CONV → ReLU → POOL stages followed by FC layers producing Hotspot / Non-hotspot]

◮ Alleviates overfitting with sparse feature maps
◮ Avoids the vanishing gradient problem

Activation Function   Expression                      Validation Loss
ReLU                  max{x, 0}                       0.16
Sigmoid               1 / (1 + exp(−x))               87.0
TanH                  (exp(2x) − 1) / (exp(2x) + 1)   0.32
BNLL                  log(1 + exp(x))                 87.0
WOAF                  NULL                            87.0
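The four candidate expressions in the table can be written down directly (a quick numpy sketch for comparison; "BNLL" is Caffe's name for the softplus function, an assumption worth noting since the slide only gives the formula):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)          # max{x, 0}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))    # 1 / (1 + exp(-x))

def tanh(x):
    e = np.exp(2.0 * x)
    return (e - 1.0) / (e + 1.0)       # (exp(2x) - 1) / (exp(2x) + 1)

def bnll(x):
    return np.log1p(np.exp(x))         # log(1 + exp(x)), a.k.a. softplus
```

ReLU's gradient is exactly 1 for positive inputs, which is why it sidesteps the vanishing-gradient problem; sigmoid's gradient never exceeds 0.25, so it shrinks through deep stacks.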

15 / 34

slide-20
SLIDE 20

Pooling Layer

[Diagram: CONV → ReLU → POOL stages followed by FC layers producing Hotspot / Non-hotspot]

◮ Extracts local statistical attributes of regions in the feature map

Example on a 4 × 4 feature map with a 2 × 2 window:

     1   2   3   4
     5   6   7   8
     9  10  11  12
    13  14  15  16

(a) max pooling:    6   8      (b) average pooling:   3.5   5.5
                   14  16                            11.5  13.5
16 / 34

slide-21
SLIDE 21

Pooling Layer (cont.)

◮ Translation invariant (✘)
◮ Dimension reduction

Effect of pooling methods:

Pooling Method   Kernel   Test Accuracy
Max              2 × 2    96.25%
Ave              2 × 2    96.25%
Stochastic       2 × 2    90.00%
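Both pooling variants reduce each non-overlapping 2 × 2 window to a single value; a compact numpy sketch (assuming, for simplicity, that the map dimensions are divisible by the window size):

```python
import numpy as np

def pool2d(x, mode="max", k=2):
    """Non-overlapping k x k pooling over a 2-D feature map.
    Assumes the input dimensions are divisible by k."""
    H, W = x.shape
    # Reshape so each k x k window gets its own pair of axes,
    # then reduce over those two axes at once.
    windows = x.reshape(H // k, k, W // k, k)
    if mode == "max":
        return windows.max(axis=(1, 3))
    return windows.mean(axis=(1, 3))
```

On the 4 × 4 example above, `pool2d(x, "max")` yields [[6, 8], [14, 16]] and `pool2d(x, "avg")` yields [[3.5, 5.5], [11.5, 13.5]].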

17 / 34

slide-22
SLIDE 22

Fully Connected Layer

◮ The fully connected layer transforms high-dimensional feature maps into a flattened vector.

[Diagram: CONV → ReLU → POOL stages followed by FC layers producing Hotspot / Non-hotspot]

18 / 34

slide-23
SLIDE 23

Fully Connected Layer (cont.)

◮ A percentage of nodes are dropped out (i.e., set to zero)
◮ Avoids overfitting

Effect of dropout ratio:

[Plot: accuracy (%) between 90.00 and 100.00 vs. dropout ratio between 0.5 and 1]

[Diagram: convolutional hidden layers ending in a 16×16×32 map (C5-3, P5), flattened into FC layers of 2048 and 512 nodes]
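Dropout can be sketched as follows (the common "inverted dropout" formulation, which rescales surviving activations so nothing changes at test time; the slides do not specify which variant the paper's framework uses):

```python
import numpy as np

def dropout(x, ratio=0.5, rng=None, train=True):
    """Zero each activation with probability `ratio` during training and
    rescale survivors by 1/(1 - ratio) so the expected activation is
    unchanged; at test time the input passes through untouched."""
    if not train or ratio == 0.0:
        return x
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x.shape) >= ratio   # True = keep this node
    return x * mask / (1.0 - ratio)
```

With ratio = 0.5, each surviving activation is doubled, so the layer's expected output matches the no-dropout case.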

19 / 34


slide-25
SLIDE 25

Architecture Summary

◮ 21 layers in total: 13 convolution layers and 5 pooling layers.
◮ A ReLU is applied after each convolution layer.

[Diagram: feature maps 512×512×4 (C1) → 256×256×4 (P1) → 256×256×8 (C2-1..3) → 128×128×8 (P2) → 128×128×16 (C3-1..3) → 64×64×16 (P3) → 64×64×32 (C4-1..3) → 32×32×32 (P4, C5-1..3) → 16×16×32 (P5), then FC layers 2048 → 512 → Hotspot / Non-Hotspot]

20 / 34

slide-26
SLIDE 26

Architecture Summary

Layer     Kernel Size   Stride   Padding   Output Vertexes
Conv1-1   2 × 2 × 4     2        –         512 × 512 × 4
Pool1     2 × 2         2        –         256 × 256 × 4
Conv2-1   3 × 3 × 8     1        1         256 × 256 × 8
Conv2-2   3 × 3 × 8     1        1         256 × 256 × 8
Conv2-3   3 × 3 × 8     1        1         256 × 256 × 8
Pool2     2 × 2         2        –         128 × 128 × 8
Conv3-1   3 × 3 × 16    1        1         128 × 128 × 16
Conv3-2   3 × 3 × 16    1        1         128 × 128 × 16
Conv3-3   3 × 3 × 16    1        1         128 × 128 × 16
Pool3     2 × 2         2        –         64 × 64 × 16
Conv4-1   3 × 3 × 32    1        1         64 × 64 × 32
Conv4-2   3 × 3 × 32    1        1         64 × 64 × 32
Conv4-3   3 × 3 × 32    1        1         64 × 64 × 32
Pool4     2 × 2         2        –         32 × 32 × 32
Conv5-1   3 × 3 × 32    1        1         32 × 32 × 32
Conv5-2   3 × 3 × 32    1        1         32 × 32 × 32
Conv5-3   3 × 3 × 32    1        1         32 × 32 × 32
Pool5     2 × 2         2        –         16 × 16 × 32
FC1       –             –        –         2048
FC2       –             –        –         512
FC3       –             –        –         2

21 / 34
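The spatial sizes in the table can be checked with the standard output-size formula; a small sketch, assuming a 1024 × 1024 input clip (an inference from Conv1-1's stride-2 output of 512 × 512, since the input size is not listed on this slide):

```python
def out_size(n, k, s, p=0):
    """Spatial output size of a conv/pool layer: floor((n + 2p - k)/s) + 1."""
    return (n + 2 * p - k) // s + 1

n = 1024                          # assumed input clip width/height
n = out_size(n, k=2, s=2)         # Conv1-1: 2x2 kernel, stride 2 -> 512
assert n == 512
n = out_size(n, k=2, s=2)         # Pool1 -> 256
for stage in range(4):            # stages 2..5: three 3x3 convs + one pool
    for _ in range(3):
        n = out_size(n, k=3, s=1, p=1)   # stride 1, padding 1: size preserved
    n = out_size(n, k=2, s=2)            # 2x2 pool, stride 2: size halved
print(n)  # 16, matching Pool5's 16 x 16 output
```

This also makes the design pattern visible: all shrinking is done by the stride-2 layers, while the padded 3 × 3 convolutions only deepen the representation.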

slide-27
SLIDE 27

Outline

Introduction
Network Architecture
Imbalance Aware Learning
Experimental Results

22 / 34

slide-28
SLIDE 28

Minority Upsampling

Layout datasets are highly imbalanced: after resolution enhancement techniques (RETs), lithographic hotspots are always the minority.

[Bar chart: hotspot vs. non-hotspot percentage for benchmarks ICCAD-1 through ICCAD-5; hotspots are a small fraction of each]

22 / 34

slide-29
SLIDE 29

Minority Upsampling

Layout datasets are highly imbalanced: after resolution enhancement techniques (RETs), lithographic hotspots are always the minority.

[Bar chart: hotspot vs. non-hotspot percentage for benchmarks ICCAD-1 through ICCAD-5]

◮ Multi-label learning [Zhang+,IJCAI'15]
◮ Majority downsampling [Ng+,TCYB'15]
◮ Pseudo instance generation [He+,IJCNN'08]: artificially generated instances might not be applicable because of the nature of mask layouts.

22 / 34

slide-30
SLIDE 30

Minority Upsampling

Layout datasets are highly imbalanced: after resolution enhancement techniques (RETs), lithographic hotspots are always the minority.

[Bar chart: hotspot vs. non-hotspot percentage for benchmarks ICCAD-1 through ICCAD-5]

◮ Multi-label learning [Zhang+,IJCAI'15]
◮ Majority downsampling [Ng+,TCYB'15]
◮ Pseudo instance generation [He+,IJCNN'08]: artificially generated instances might not be applicable because of the nature of mask layouts.
◮ Naïve upsampling
  1. Gradient descent
  2. Insufficient training samples

22 / 34

slide-31
SLIDE 31

Random Mirror Flipping

◮ Applied before instances are fed into the neural network
◮ Each instance takes one of 4 orientations
◮ Resolves insufficient data

[Figure: a layout clip and its mirrored counterpart]

23 / 34
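A sketch of this augmentation step (one of four orientations obtained by independent horizontal/vertical mirroring, chosen uniformly; the exact orientation set is an assumption, since the slide only says "mirror"):

```python
import numpy as np

def random_orientation(clip, rng=None):
    """Return the clip in one of 4 orientations chosen uniformly at random:
    identity, horizontal mirror, vertical mirror, or both."""
    rng = np.random.default_rng() if rng is None else rng
    flip_h, flip_v = rng.integers(0, 2, size=2)
    if flip_h:
        clip = clip[:, ::-1]   # mirror left-right
    if flip_v:
        clip = clip[::-1, :]   # mirror top-bottom
    return clip
```

Because the transform is applied per instance at feed time, an upsampled minority clip is rarely seen twice in the same orientation, which effectively multiplies the hotspot training data.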

slide-32
SLIDE 32

Effectiveness of Upsampling

[Plot: accuracy (%) vs. upsampling factor from 5 to 25]

Validation performance does not improve further once the upsampling factor increases beyond a certain value.

24 / 34

slide-33
SLIDE 33

Learning Rate

γ defines how fast the neuron weights are updated:

    w_i = w_i − γ ∂l/∂w_i

[Plot: validation loss vs. iteration count (500 to 2,000) for γ = 0.1, 0.01, 0.001, 0.0001]

25 / 34

slide-34
SLIDE 34

Momentum and Weight Decay

◮ Momentum

Incorporates a physical analogy (velocity) into gradient descent:

    v = µv − γ ∂l/∂w_i,   w_i = w_i + v

◮ Weight Decay

An alternative way to achieve L2 regularization on neuron weights:

    v = µv − γ ∂l/∂w_i − γλ w_i,   w_i = w_i + v
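Both update rules combine into a single step (a minimal sketch; the parameter names mirror the slide's symbols: `mu` is the momentum µ, `lr` is the learning rate γ, `weight_decay` is λ):

```python
def sgd_step(w, grad, v, lr=0.001, mu=0.99, weight_decay=0.0):
    """One gradient descent step following the update rules above:
    v <- mu*v - lr*grad - lr*weight_decay*w   (momentum + L2 weight decay)
    w <- w + v
    With weight_decay = 0 this reduces to the plain momentum update."""
    v = mu * v - lr * grad - lr * weight_decay * w
    return w + v, v
```

In a training loop, one `v` buffer is kept per weight tensor and threaded back into the next call.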

26 / 34

slide-35
SLIDE 35

Momentum and Weight Decay (cont.)

◮ Momentum effects:

µ      Learning Rate   Validation Loss
0.5    0.001           0.21
0.9    0.001           0.22
0.95   0.001           0.21
0.99   0.001           0.16

◮ Weight decay effects:

λ       Learning Rate   Momentum   Validation Loss
10⁻³    0.001           0.99       0.95
10⁻⁴    0.001           0.99       1.19
10⁻⁵    0.001           0.99       0.37
10⁻⁶    0.001           0.99       0.2
27 / 34

slide-36
SLIDE 36

Weight Initialization

The weight initialization procedure determines the initial values assigned to each neuron before gradient descent updates start.

◮ Random Gaussian (✘)

Cannot guarantee that input and output have similar variance.

28 / 34

slide-37
SLIDE 37

Weight Initialization

The weight initialization procedure determines the initial values assigned to each neuron before gradient descent updates start.

◮ Random Gaussian (✘)

Cannot guarantee that input and output have similar variance.

◮ Xavier [Xavier+,AISTATS'10]

Initialized weights are determined by the number of input nodes N:

    V̂(w_i) = 1/N

[Plot: validation loss vs. iteration count (500 to 2,000) for Gaussian initialization with std 0.1, 0.01, 0.001, and Xavier]

28 / 34
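Xavier initialization as stated (variance 1/N, with N the fan-in) can be sketched with a Gaussian draw; note this is one possible realization, since Caffe's "xavier" filler actually samples from a uniform range with the same variance:

```python
import numpy as np

def xavier_init(fan_in, shape, rng=None):
    """Draw weights with Var(w_i) = 1/N, where N is the number of input
    nodes (fan-in), so layer outputs keep roughly the input's variance."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.normal(0.0, np.sqrt(1.0 / fan_in), size=shape)
```

For a fully connected layer with 512 inputs, each weight is drawn with standard deviation 1/√512 ≈ 0.044, which is why a hand-picked Gaussian std of 0.1 or 0.001 over- or under-shoots.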

slide-38
SLIDE 38

Outline

Introduction
Network Architecture
Imbalance Aware Learning
Experimental Results

29 / 34

slide-39
SLIDE 39

Experimental Setup

◮ Based on Caffe [Jia+,ACMMM’14] ◮ Evaluated on ICCAD-2012 CAD contest benchmark

Evaluation metrics:

Accuracy

The ratio between the number of correctly detected hotspot clips and the total number of hotspot clips.

ODST

The sum of the lithography simulation time for all false alarms† and the deep learning model testing time:

ODST = Test Time + 10 s × (# of False Alarms)

†False alarm: a non-hotspot clip reported as a hotspot by the detector.
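The metric is a one-line computation (sketch; the 10 s per-false-alarm simulation cost is the constant stated on the slide, exposed here as a parameter):

```python
def odst(test_time_s, n_false_alarms, sim_cost_s=10.0):
    """Overall detection and simulation time (ODST): model testing time
    plus a fixed lithography-simulation cost per false alarm."""
    return test_time_s + sim_cost_s * n_false_alarms
```

For example, a model that tests in 120 s but raises 30 false alarms scores 420 s, so reducing false alarms matters more than shaving inference time.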

29 / 34

slide-40
SLIDE 40

Layer Visualization

[Figure: layer visualizations: the original clip and its feature maps after Pool1 through Pool5]

30 / 34

slide-41
SLIDE 41

Compare Accuracy with State-of-the-Art‡

[Bar chart: accuracy (%) from 85 to 100 on ICCAD-1 through ICCAD-5 and their average, comparing JM3'16, TCAD'15, ICCAD'16, and Ours]

‡JM3'16: CNN based; TCAD'15: SVM based; ICCAD'16: Boosting based.

31 / 34

slide-42
SLIDE 42

Compare ODST with State-of-the-Art

◮ Improves ODST by at least 24.80% on average.

[Bar chart: ODST (s), log scale from 10³ to 10⁵, on ICCAD-1 through ICCAD-5 and their average, comparing JM3'16, TCAD'15, ICCAD'16, and Ours]

JM3’16: CNN based; TCAD’15: SVM based; ICCAD’16: Boosting based.

32 / 34

slide-43
SLIDE 43

Conclusion

We explore the feasibility of deep learning as an alternative approach for hotspot detection.

◮ Hotspot-detection-oriented hyper-parameter tuning
◮ Imbalance issue: upsampling & random mirror flipping
◮ Outperforms state-of-the-art solutions

[Diagram: the full network architecture, from 512×512×4 feature maps down to 16×16×32, followed by FC layers 2048 → 512 → Hotspot / Non-Hotspot]

33 / 34

slide-44
SLIDE 44

Conclusion

We explore the feasibility of deep learning as an alternative approach for hotspot detection.

◮ Hotspot-detection-oriented hyper-parameter tuning
◮ Imbalance issue: upsampling & random mirror flipping
◮ Outperforms state-of-the-art solutions

[Diagram: the full network architecture, from 512×512×4 feature maps down to 16×16×32, followed by FC layers 2048 → 512 → Hotspot / Non-Hotspot]

Future Works

◮ Test on larger-scale test cases
◮ Further simplify the architecture for speedup
◮ Seek other VLSI layout applications (e.g., OPC, SRAF)

33 / 34

slide-45
SLIDE 45

Thank You

Haoyu Yang (hyyang@cse.cuhk.edu.hk) Luyang Luo (lyluo4@cse.cuhk.edu.hk) Jing Su (jing.su@asml.com) Chenxi Lin (chenxi.lin@asml.com) Bei Yu (byu@cse.cuhk.edu.hk)

34 / 34