 
Layout Hotspot Detection with Feature Tensor Generation and Deep Biased Learning
Haoyu Yang (1), Jing Su (2), Yi Zou (2), Bei Yu (1), Evangeline F. Y. Young (1)
(1) The Chinese University of Hong Kong
(2) ASML Brion Inc.
Outline
1. Introduction
2. Feature Tensor Generation
3. Biased Learning
4. Experimental Results
Lithography Hotspot Detection
[Figure: ratio of lithography simulation time (normalized by 40nm node) versus technology node; annotation: required computational time reduction]
◮ RET: OPC, SRAF, MPL
◮ Hotspots still remain: low fidelity patterns
◮ Simulations: extremely CPU intensive
Pattern Matching based Hotspot Detection
[Figure: layout patterns are matched against a hotspot pattern library; hotspots present in the library are detected, hotspots not in the library go undetected]
◮ Fast and accurate
◮ [Yu+,ICCAD’14] [Nosato+,JM3’14] [Su+,TCAD’15]
◮ Fuzzy pattern matching [Wen+,TCAD’14]
◮ Hard to detect unseen patterns
Machine Learning based Hotspot Detection
[Figure: extract layout features –> classification model –> hotspot detection (hotspot / non-hotspot); annotation: hard to trade off accuracy and false alarms]
◮ Predict new patterns
◮ Decision-tree, ANN, SVM, Boosting, Bayesian, ...
◮ [Ding+,TCAD’12][Yu+,JM3’15][Matsunawa+,SPIE’15][Yu+,TCAD’15][Zhang+,ICCAD’16][Wen+,TCAD’14]
◮ Feature reliability and model scalability
Why Deep Learning?
1. Feature Crafting vs. Feature Learning
   ◮ Manually designed features –> inevitable information loss
   ◮ Learned features –> reliable
2. Scalability
   ◮ More pattern types
   ◮ More complicated patterns
   ◮ Hard to fit millions of data samples with a simple ML model
3. Mature Libraries
   ◮ Caffe [Jia+,ACMMM’14]
   ◮ TensorFlow [Martin+,TR’15]
Special Issues for Layout Hotspot Detection
Layout image size is large (≈ 1000 × 1000)
◮ Compared to ImageNet (≈ 200 × 200)
◮ Associated CNN model is large
◮ Inefficient in both storage and computation
Hotspot detection accuracy is more important
◮ Hotspot –> Circuit Failure
◮ False Alarm –> Runtime Overhead
◮ Consider methods for better trade-off between accuracy and false alarms
[Figure: layout clip with 1 nm precision has resolution 1200 × 1200]
Feature Tensor Generation
◮ Clip Partition
◮ Discrete Cosine Transform
◮ Discarding High Frequency Components
◮ Feature Tensor
[Figure: layout clip –> Division –> DCT –> Encoding –> feature tensor]
\[
\underbrace{
\begin{bmatrix}
C_{11,1} & C_{12,1} & C_{13,1} & \cdots & C_{1n,1} \\
C_{21,1} & C_{22,1} & C_{23,1} & \cdots & C_{2n,1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
C_{n1,1} & C_{n2,1} & C_{n3,1} & \cdots & C_{nn,1}
\end{bmatrix},
\ \ldots,\
\begin{bmatrix}
C_{11,k} & C_{12,k} & C_{13,k} & \cdots & C_{1n,k} \\
C_{21,k} & C_{22,k} & C_{23,k} & \cdots & C_{2n,k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
C_{n1,k} & C_{n2,k} & C_{n3,k} & \cdots & C_{nn,k}
\end{bmatrix}
}_{k}
\]
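The four steps above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names (`dct2`, `low_freq_order`, `feature_tensor`), the orthonormal DCT-II scaling, and the low-frequency-first ordering by index sum are all assumptions; the slides only state that high-frequency components are discarded.

```python
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    b = len(block)

    def dct1(v):
        out = []
        for u in range(b):
            s = sum(v[x] * math.cos(math.pi * (2 * x + 1) * u / (2 * b))
                    for x in range(b))
            scale = math.sqrt(1.0 / b) if u == 0 else math.sqrt(2.0 / b)
            out.append(scale * s)
        return out

    rows = [dct1(r) for r in block]                  # transform each row
    cols = [dct1([rows[y][x] for y in range(b)])     # then each column
            for x in range(b)]
    return [[cols[x][y] for x in range(b)] for y in range(b)]


def low_freq_order(b):
    """Index pairs of a b x b block, sorted from low to high frequency.

    Assumption: frequency is ranked by row index + column index.
    """
    return sorted(((i, j) for i in range(b) for j in range(b)),
                  key=lambda p: (p[0] + p[1], p[0]))


def feature_tensor(clip, n, k):
    """Encode a square clip as an n x n x k tensor of DCT coefficients."""
    b = len(clip) // n                               # block edge length
    keep = low_freq_order(b)[:k]                     # discard the rest
    tensor = []
    for bi in range(n):
        row = []
        for bj in range(n):
            block = [clip[bi * b + y][bj * b:(bj + 1) * b] for y in range(b)]
            coeffs = dct2(block)
            row.append([coeffs[i][j] for (i, j) in keep])
        tensor.append(row)
    return tensor


clip = [[1.0] * 8 for _ in range(8)]   # toy 8 x 8 clip, fully covered
t = feature_tensor(clip, n=2, k=3)     # 2 x 2 blocks, 3 channels per block
```

For a 1200 × 1200 clip, n = 12 would give 100 × 100 blocks and a 12 × 12 × k tensor, which appears consistent with the 12 × 12 output size listed for conv1-1 if that convolution preserves spatial size.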
CNN Architecture

Layer         Kernel Size   Stride   Output Node #
conv1-1       3             1        12 × 12 × 16
conv1-2       3             1        12 × 12 × 16
maxpooling1   2             2        6 × 6 × 16
conv2-1       3             1        6 × 6 × 32
conv2-2       3             1        6 × 6 × 32
maxpooling2   2             2        3 × 3 × 32
fc1           N/A           N/A      250
fc2           N/A           N/A      2

Feature Tensor
◮ k-channel hyper-image
◮ Compatible with CNN
◮ Storage and computational efficiency
[Figure: k-channel feature tensor –> Convolution + ReLU layers and Max Pooling layers –> Fully Connected nodes –> Hotspot / Non-Hotspot]
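The output sizes in the architecture table follow from the standard output-size formula for convolution and pooling layers. The sketch below reproduces them; the padding values (1 for the 3 × 3 convolutions, 0 for pooling) and the 12 × 12 input size are assumptions inferred from the listed outputs, not stated on the slide.

```python
def out_size(size, kernel, stride, pad):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1


# (name, kernel, stride, pad, output channels); None keeps the channel count.
LAYERS = [
    ("conv1-1", 3, 1, 1, 16),
    ("conv1-2", 3, 1, 1, 16),
    ("maxpooling1", 2, 2, 0, None),
    ("conv2-1", 3, 1, 1, 32),
    ("conv2-2", 3, 1, 1, 32),
    ("maxpooling2", 2, 2, 0, None),
]


def trace_shapes(size=12):
    """Return [(layer name, (height, width, channels)), ...]."""
    shapes, channels = [], None
    for name, kernel, stride, pad, c_out in LAYERS:
        size = out_size(size, kernel, stride, pad)
        channels = c_out if c_out is not None else channels
        shapes.append((name, (size, size, channels)))
    return shapes


shapes = trace_shapes()
h, w, c = shapes[-1][1]
flat = h * w * c                       # number of values fed into fc1
for name, shape in shapes:
    print(name, shape)
print("flattened:", flat)
```

Under these assumptions, the flattened 3 × 3 × 32 = 288 values feed fc1 (250 nodes), followed by fc2 (2 nodes, one per class).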
Recall The Training Procedure
◮ Minimize difference with ground truths
\[
y^*_n = [1, 0], \quad y^*_h = [0, 1]. \tag{1}
\]
\[
F \in
\begin{cases}
N, & \text{if } y(0) > 0.5 \\
H, & \text{if } y(1) > 0.5
\end{cases}
\tag{2}
\]
◮ Shifting decision boundary (✗)
\[
F \in
\begin{cases}
N, & \text{if } y(0) > 0.5 + \lambda \\
H, & \text{if } y(1) > 0.5 - \lambda
\end{cases}
\tag{3}
\]
◮ Biased ground truth
\[
y^*_n = [1 - \epsilon, \epsilon] \tag{4}
\]
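To make the difference between Equations (3) and (4) concrete, the sketch below computes a loss against a biased non-hotspot target and applies the decision rules. Cross-entropy as the loss, the helper names, and the example values of ε and λ are assumptions for illustration; the slides only say that training minimizes the difference with the ground truths.

```python
import math


def cross_entropy(y, target):
    """Cross-entropy between a predicted distribution y and a target."""
    return -sum(t * math.log(p) for t, p in zip(target, y) if t > 0)


def non_hotspot_target(eps=0.0):
    """Eq. (4): biased non-hotspot ground truth; eps = 0 gives Eq. (1)."""
    return [1.0 - eps, eps]


def classify(y, lam=0.0):
    """Eq. (3) with shift lam; lam = 0 gives Eq. (2).

    y = [y(0), y(1)] is assumed to sum to 1 (e.g. a softmax output).
    """
    return "H" if y[1] > 0.5 - lam else "N"


y = [0.95, 0.05]                                   # confident non-hotspot
loss_plain = cross_entropy(y, non_hotspot_target(0.0))
loss_biased = cross_entropy(y, non_hotspot_target(0.2))
loss_at_target = cross_entropy([0.8, 0.2], non_hotspot_target(0.2))

print(round(loss_plain, 3), round(loss_biased, 3), round(loss_at_target, 3))
print(classify([0.6, 0.4]), classify([0.6, 0.4], lam=0.2))
```

With ε = 0.2 the loss for a non-hotspot sample is smallest at y = [0.8, 0.2] rather than at [1, 0], so overconfident non-hotspot predictions are penalized during training. The shifted boundary, by contrast, leaves the trained model unchanged and only alters the decision afterwards.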
The Biased Learning Algorithm
[Flowchart: Training Set –> MGD end-to-end training with y_h = [0, 1], y_n = [1 − ε, ε] –> Stop Criteria? No: Bias, update ε and train again; Yes: Trained Model]
[Figure: Biased Learning vs. Shift Boundary; x-axis: Accuracy (%), y-axis: False Alarm; series: Shift-Boundary, Bias]
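The flowchart can be read as an outer loop around ordinary training. A minimal sketch follows, assuming a caller-supplied `train` function (hypothetical), a starting bias of ε = 0, a fixed ε increment, and a fixed number of rounds as the stop criterion; the slide does not specify how ε is updated or when training stops.

```python
def biased_learning(train, model, data, eps_step=0.1, rounds=3):
    """Repeatedly train `model`, raising the non-hotspot bias each round.

    `train(model, data, y_h, y_n)` is a hypothetical callable that runs
    end-to-end training with the given ground-truth vectors and returns
    the updated model.
    """
    eps = 0.0                       # assumed starting point: no bias
    for _ in range(rounds):         # assumed stop criterion: fixed rounds
        y_h = [0.0, 1.0]            # hotspot target stays unbiased
        y_n = [1.0 - eps, eps]      # non-hotspot target is biased by eps
        model = train(model, data, y_h, y_n)
        eps += eps_step             # "Bias: update eps" (assumed rule)
    return model


# Stub trainer that only records the non-hotspot targets it was given.
history = []


def record(model, data, y_h, y_n):
    history.append(y_n)
    return model


biased_learning(record, model=None, data=None)
print(history)
```

Passing the trainer in as an argument keeps the sketch independent of any particular deep learning library.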
Comparison with Two Hotspot Detectors
◮ Detection accuracy improved from 89.6% to 95.5%
[Figure: Accuracy (%) on ICCAD, Industry1, Industry2, Industry3, and Average for SPIE’15, ICCAD’16, and Ours]
◮ Comparable false alarm penalty
[Figure: False Alarm on ICCAD, Industry1, Industry2, Industry3, and Average for SPIE’15, ICCAD’16, and Ours]
◮ Comparable testing runtime
[Figure: Runtime (s) on ICCAD, Industry1, Industry2, Industry3, and Average for SPIE’15, ICCAD’16, and Ours]
Thank You