Layout Hotspot Detection with Feature Tensor Generation and Deep Biased Learning
Haoyu Yang1, Jing Su2, Yi Zou2, Bei Yu1, Evangeline F. Y. Young1
1The Chinese University of Hong Kong 2ASML Brion Inc.
◮ Resolution enhancement techniques (RET): OPC, SRAF, MPL
◮ Hotspots still occur: low-fidelity patterns
◮ Lithography simulation: extremely CPU-intensive
[Figure: ratio of lithography simulation time (normalized by the 40 nm node) vs. technology node, illustrating the required computational time reduction]
◮ Pattern matching: fast and accurate [Yu+,ICCAD'14] [Nosato+,JM3'14] [Su+,TCAD'15]
◮ Fuzzy pattern matching [Wen+,TCAD'14]
◮ Limitation: hard to detect unseen patterns
◮ Machine learning: can predict unseen patterns
◮ Models: decision tree, ANN, SVM, boosting, Bayesian, ... [Ding+,TCAD'12] [Yu+,JM3'15] [Matsunawa+,SPIE'15] [Yu+,TCAD'15] [Zhang+,ICCAD'16] [Wen+,TCAD'14]
◮ Open issues: feature reliability and model scalability
◮ Manually designed features → inevitable information loss
◮ Learned features → more reliable
◮ Scaling challenges: more pattern types, more complicated patterns; millions of samples are hard to fit with a simple ML model
◮ Deep learning frameworks: Caffe [Jia+,ACMMM'14], TensorFlow [Martin+,TR'15]
◮ A layout clip at 1 nm precision has a resolution of 1200 × 1200, far larger than typical ImageNet inputs (≈ 200 × 200)
◮ A CNN operating on raw clips would be correspondingly large
◮ Neither storage- nor computation-efficient (a quick size comparison follows)
◮ Missed hotspot → circuit failure; false alarm → runtime overhead
◮ Need methods with a better trade-off between detection accuracy and false alarms
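To make the storage gap concrete, assuming the 12 × 12 × k feature tensor introduced next with k = 32 channels (the exact k is an assumption, not stated on this slide):

$$1200 \times 1200 = 1{,}440{,}000 \text{ values per raw clip} \quad \text{vs.} \quad 12 \times 12 \times 32 = 4{,}608 \text{ values per tensor} \;(\approx 312\times \text{ fewer})$$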
◮ Clip partition
◮ Discrete cosine transform (DCT) on each block
◮ Discard high-frequency components
◮ Stack the retained coefficients into a feature tensor

[Figure: feature tensor generation pipeline: Division → DCT → Encoding]

The encoding collects, for each of the n × n blocks, its k retained DCT coefficients, giving k channel matrices:

$$\begin{bmatrix}
C_{11,1} & C_{12,1} & C_{13,1} & \cdots & C_{1n,1} \\
C_{21,1} & C_{22,1} & C_{23,1} & \cdots & C_{2n,1} \\
\vdots   & \vdots   & \vdots   & \ddots & \vdots   \\
C_{n1,1} & C_{n2,1} & C_{n3,1} & \cdots & C_{nn,1}
\end{bmatrix}
\;\cdots\;
\begin{bmatrix}
C_{11,k} & C_{12,k} & C_{13,k} & \cdots & C_{1n,k} \\
C_{21,k} & C_{22,k} & C_{23,k} & \cdots & C_{2n,k} \\
\vdots   & \vdots   & \vdots   & \ddots & \vdots   \\
C_{n1,k} & C_{n2,k} & C_{n3,k} & \cdots & C_{nn,k}
\end{bmatrix}$$

where $C_{ij,m}$ is the $m$-th retained coefficient of block $(i, j)$; the $k$ matrices are stacked into an $n \times n \times k$ feature tensor. A small Python sketch of this encoding follows.
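A minimal Python sketch of the encoding, assuming a 1200 × 1200 clip split into 100 × 100 blocks and k = 32 coefficients retained per block in zig-zag (low-frequency-first) order; the block size, k, and function names are illustrative assumptions, not the authors' exact implementation.

import numpy as np
from scipy.fft import dctn

def zigzag_indices(n):
    # (row, col) pairs of an n x n block in JPEG-style zig-zag order,
    # i.e. sorted from low to high spatial frequency.
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else -p[0]))

def feature_tensor(clip, block=100, k=32):
    # Encode a binary layout clip (e.g. 1200 x 1200) into an
    # (H/block) x (W/block) x k tensor of low-frequency DCT coefficients.
    h, w = clip.shape
    nh, nw = h // block, w // block
    keep = zigzag_indices(block)[:k]            # k lowest-frequency positions
    tensor = np.empty((nh, nw, k), dtype=np.float32)
    for i in range(nh):
        for j in range(nw):
            blk = clip[i*block:(i+1)*block, j*block:(j+1)*block]
            coeff = dctn(blk.astype(np.float64), norm="ortho")  # 2-D DCT per block
            tensor[i, j] = [coeff[r, c] for r, c in keep]       # discard high freq.
    return tensor

# Example: a random 1200 x 1200 clip -> 12 x 12 x 32 feature tensor
clip = (np.random.rand(1200, 1200) > 0.5).astype(np.float32)
print(feature_tensor(clip).shape)  # (12, 12, 32)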
◮ The feature tensor is a k-channel hyper-image
◮ Compatible with CNNs (a PyTorch sketch of the table's network follows)
◮ Storage and computational efficiency

Layer        Kernel Size  Stride  Output Node #
conv1-1      3            1       12 × 12 × 16
conv1-2      3            1       12 × 12 × 16
maxpooling1  2            2       6 × 6 × 16
conv2-1      3            1       6 × 6 × 32
conv2-2      3            1       6 × 6 × 32
maxpooling2  2            2       3 × 3 × 32
fc1          N/A          N/A     250
fc2          N/A          N/A     2

[Figure: network diagram with convolution + ReLU layers, max-pooling layers, and fully connected nodes; two outputs: hotspot / non-hotspot]
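A PyTorch sketch matching the table above; the same-padding convolutions and the input channel count k = 32 are assumptions inferred from the listed output shapes, not taken from the authors' code.

import torch
import torch.nn as nn

class HotspotCNN(nn.Module):
    # Two conv stages + two fully connected layers, as in the table.
    # Input: feature tensor of shape (batch, k, 12, 12).
    def __init__(self, k=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(k, 16, 3, stride=1, padding=1), nn.ReLU(),   # conv1-1
            nn.Conv2d(16, 16, 3, stride=1, padding=1), nn.ReLU(),  # conv1-2
            nn.MaxPool2d(2, stride=2),                             # 12x12 -> 6x6
            nn.Conv2d(16, 32, 3, stride=1, padding=1), nn.ReLU(),  # conv2-1
            nn.Conv2d(32, 32, 3, stride=1, padding=1), nn.ReLU(),  # conv2-2
            nn.MaxPool2d(2, stride=2),                             # 6x6 -> 3x3
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(3 * 3 * 32, 250), nn.ReLU(),  # fc1
            nn.Linear(250, 2),                      # fc2: hotspot / non-hotspot
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# logits = HotspotCNN(k=32)(torch.randn(4, 32, 12, 12))  # shape (4, 2)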
◮ Training minimizes the difference between predictions and ground truths: $y^*_n = [1, 0]$ (non-hotspot), $y^*_h = [0, 1]$ (hotspot)
◮ Shifting the decision boundary after training (✗)
◮ Biased learning: replace the non-hotspot ground truth with $y^*_n = [1 - \epsilon, \epsilon]$ (sketched in code below)
[Flowchart: biased learning vs. shifting the boundary. Set $y_h = [0, 1]$ and $y_n = [1 - \epsilon, \epsilon]$; train end-to-end with mini-batch gradient descent (MGD); if the stopping criterion is not met, update $\epsilon$ and the training set and repeat; otherwise output the trained model]
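A minimal PyTorch sketch of the biased loss; the slide specifies only the biased targets, so the cross-entropy form, the $\epsilon$-update schedule, and the stopping criterion are assumptions (the loop is only indicated in comments).

import torch
import torch.nn.functional as F

def biased_targets(labels, eps):
    # labels: LongTensor of 0 (non-hotspot) / 1 (hotspot).
    # Ground truths from the slides: hotspot -> [0, 1];
    # non-hotspot -> [1 - eps, eps]. eps > 0 biases the decision
    # boundary toward missing fewer hotspots.
    t = F.one_hot(labels, num_classes=2).float()
    t[labels == 0] = torch.tensor([1.0 - eps, eps])
    return t

def biased_loss(logits, labels, eps):
    # Cross-entropy against the biased (soft) targets.
    log_p = F.log_softmax(logits, dim=1)
    return -(biased_targets(labels, eps) * log_p).sum(dim=1).mean()

# One MGD step (model/optimizer/batch are placeholders); the slide's loop
# repeats this, periodically checking the stopping criterion and updating eps:
#   loss = biased_loss(model(x), y, eps)
#   optimizer.zero_grad(); loss.backward(); optimizer.step()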
◮ Detection accuracy improved from 89.6% to 95.5%
◮ Comparable false alarm penalty
◮ Comparable testing runtime