
SLIDE 1

Ge and Smyth, AEC/APC XII: 1

Segmental Semi-Markov Models for Endpoint Detection in Plasma Etching

Xianping Ge and Padhraic Smyth
Information and Computer Science
University of California, Irvine
www.ics.uci.edu/~datalab

Acknowledgements

Thanks to Wenli Collison, Tom Ni, and David Hemker of LAM Research for providing the data.
SLIDE 2

Outline

  • Problem Statement
    • Two techniques for endpoint detection in plasma etching:
      • Change-point detection
      • Pattern matching
  • Segmental Semi-Markov Model
    • Standard hidden Markov model (HMM)
    • Semi-Markov model
    • Segmental Markov model
  • Algorithms and Experimental Results
SLIDE 3

Change-Point Detection Problem

  • Single-wavelength interferometry data from a LAM 9400 plasma etch
  • Problem: can one automate online detection of the change-point?

[Figure: interferometry signal vs. time (seconds), annotated with the best "visual" estimate of the change point]

SLIDE 4

Fitting Two Quadratic Segments

[Figure: two quadratic segments fitted to the signal Y vs. time]

SLIDE 5

Segmental Semi-Markov Model for Change-point Detection

  • Each segment corresponds to one state in the model.

[Figure: segments of the data mapped to states S=1 and S=2]

  • The change-point is the boundary between the two states.
  • If only we could infer the (hidden) states from the data!
SLIDE 6

Pattern-Based End-Point Detection

[Figure: example pattern in the sensor output, with the end-point of the main etch marked]
SLIDE 7

Pattern-Based End-Point Detection

[Figure: sensor output vs. time (seconds) for the example pattern and for a new pattern]

SLIDE 8

Example Pattern vs. New Pattern

[Figure: the example pattern and the new pattern side by side]

  • The two patterns differ in:
    • Dynamic range
    • Mean amplitude
    • Duration
SLIDE 9

Sketch of the proposed method

1. Represent the example pattern as piecewise linear (or quadratic, polynomial, …) segments.

2. Build a probabilistic template model from the piecewise linear representation.
  • Each segment of the piecewise linear representation corresponds to a state in the model.

[Figure: segments of the example pattern mapped to states S=1, S=2, …, S=M]
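The slides do not say which algorithm produces the piecewise linear representation in step 1. As one hedged possibility, the sketch below uses simple top-down splitting: join the ends of a span with a straight line, split where the data deviate most from that line, and recurse until every deviation is within a tolerance. The names `piecewise_linear` and `tol` are hypothetical.

```python
def piecewise_linear(y, tol):
    """Return breakpoint indices of a piecewise linear approximation of y.

    Top-down splitting (an assumed approach, not necessarily the authors'):
    draw the chord between the span's endpoints, split at the sample with the
    largest vertical deviation from it, and recurse until all deviations <= tol.
    """
    def split(lo, hi):
        worst, worst_dev = None, tol
        for k in range(lo + 1, hi):
            # Height of the chord from (lo, y[lo]) to (hi, y[hi]) at index k.
            chord = y[lo] + (y[hi] - y[lo]) * (k - lo) / (hi - lo)
            dev = abs(y[k] - chord)
            if dev > worst_dev:
                worst, worst_dev = k, dev
        if worst is None:
            return [lo, hi]  # the chord is close enough: one linear segment
        left = split(lo, worst)
        right = split(worst, hi)
        return left[:-1] + right  # do not repeat the shared breakpoint

    return split(0, len(y) - 1)
```

For a rise-then-fall pattern such as `[0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0]` with `tol=0.5`, this returns the breakpoints `[0, 5, 10]`, i.e. two linear segments and hence two states in the template model.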

SLIDE 10

3. Given new data (a candidate pattern):
  • Are its (hidden) states the same as those in the model?
  • If yes, the new data are similar to the example pattern.

[Figure: segments of the candidate pattern aligned with the model states S=1, S=2, …, S=M]

SLIDE 11

Problem Statement: A Summary

  • Data are represented as segments:
    • Change-point detection: two quadratic segments
    • Pattern matching: piecewise linear representation
  • The states in the model correspond to the segments in the data.
  • Both problems can be solved if we can infer the hidden states in the data!
SLIDE 12

Next ...

  • Problem Statement
  • Segmental Semi-Markov Model
  • Algorithms and Experimental Results
    • Change-point detection
    • Pattern matching
SLIDE 13

Segmental Semi-Markov Model

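[Figure: model diagram showing state transitions among states S=1, S=2, …, S=M, a semi-Markov duration distribution for each state, and a regression function generating the data within each segment]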

SLIDE 14

Markov Model

  • $M$ states
  • The states correspond to segments of the data
  • At time $t = 0$, the system is in state $i$ with probability $P(S_0 = i)$
  • Transition probability matrix $A$:
    • $A(i, j) = P(S_{t+1} = j \mid S_t = i)$
    • I.e., $A(i, j)$ is the probability of switching from state $i$ to state $j$
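To make the notation concrete: the probability of a particular state path is the initial probability $P(S_0 = i)$ times the transition probabilities $A(i, j)$ along the path. The sketch below assumes a hypothetical two-state, left-to-right chain; states are numbered from 0 in code rather than from 1 as on the slides.

```python
import numpy as np

def path_probability(pi, A, states):
    """P(S_0 = states[0], S_1 = states[1], ...) for a first-order Markov chain.

    pi[i]   = P(S_0 = i)
    A[i, j] = P(S_{t+1} = j | S_t = i); each row of A sums to 1.
    """
    p = pi[states[0]]
    for i, j in zip(states[:-1], states[1:]):
        p *= A[i, j]
    return p

# Hypothetical left-to-right example: state 0 can only stay or move to state 1.
pi = np.array([1.0, 0.0])
A = np.array([[0.9, 0.1],
              [0.0, 1.0]])
p = path_probability(pi, A, [0, 0, 0, 1, 1])  # 1.0 * 0.9 * 0.9 * 0.1 * 1.0
```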

SLIDE 15

Hidden Markov Model

  • The states $S_t$ are not directly observable (hidden).
  • The observed data $Y_t$ depend on the state $S_t$: $P(Y_t = y \mid S_t = i)$
  • From $Y_1 Y_2 \ldots Y_t \ldots Y_T$, the most likely state sequence $S_1 S_2 \ldots S_t \ldots S_T$ can be computed by the Viterbi algorithm in time linear in $T$.
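A minimal log-domain Viterbi sketch for a standard HMM is shown below. It assumes the emission log-likelihoods $\log P(Y_t = y_t \mid S_t = i)$ have already been computed into a `T x M` array; the function and argument names are illustrative. Each time step costs $O(M^2)$, so the total is $O(T M^2)$, linear in $T$.

```python
import numpy as np

def viterbi(log_pi, log_A, log_B):
    """Most likely state sequence of a standard HMM (states numbered from 0).

    log_pi[i]   = log P(S_1 = i)
    log_A[i, j] = log P(S_{t+1} = j | S_t = i)
    log_B[t, i] = log P(Y_t = y_t | S_t = i)
    """
    T, M = log_B.shape
    delta = np.empty((T, M))            # best log-prob of a path ending in state i at t
    back = np.zeros((T, M), dtype=int)  # best predecessor state, for backtracking
    delta[0] = log_pi + log_B[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # scores[i, j]: from state i to state j
        back[t] = np.argmax(scores, axis=0)
        delta[t] = scores[back[t], np.arange(M)] + log_B[t]
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

Working in log probabilities avoids the numerical underflow that multiplying many small probabilities would cause for long series.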

SLIDE 16

Limitation of standard Markov model

  • A Markov model imposes a geometric distribution over the state duration:
    • The probability of staying in state $i$ for exactly $n$ units of time is $A(i,i)^{n-1}\,[1 - A(i,i)]$
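A quick numerical check shows why the geometric duration is restrictive. With a hypothetical self-transition probability $A(i,i) = 0.9$, the expected duration is $1 / (1 - A(i,i)) = 10$ time units, yet the single most likely duration is always $n = 1$, and the probability only decreases from there.

```python
def geometric_duration_pmf(a_ii, n):
    """P(staying in state i for exactly n time units) = A(i,i)^(n-1) * [1 - A(i,i)]."""
    return a_ii ** (n - 1) * (1.0 - a_ii)

a_ii = 0.9  # hypothetical self-transition probability A(i, i)
durations = range(1, 301)
pmf = [geometric_duration_pmf(a_ii, n) for n in durations]

# Expected duration: 1 / (1 - A(i,i)) = 10 time units (tail beyond n = 300 is negligible).
mean = sum(n * p for n, p in zip(durations, pmf))
# Most likely duration: always n = 1, since the pmf decreases monotonically in n.
most_likely = max(durations, key=lambda n: geometric_duration_pmf(a_ii, n))
```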

SLIDE 17

Semi-Markov Model: Explicit state duration modeling

  • Can specify a non-geometric distribution for the state duration (Gamma, normal, etc.)

  • E.g., “The system will stay in state i for about 10 seconds”
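As an illustration of an explicit duration model for "about 10 seconds", the sketch below evaluates a Gamma density at the integer durations $d = 1, \ldots, d_{max}$ and renormalizes. This discretization and the parameter values (shape 50, scale 0.2, so the mean is 10) are assumptions chosen for the example; unlike the geometric case, the most likely duration now sits near 10.

```python
import math

def discrete_gamma_pmf(shape, scale, d_max):
    """Duration pmf over d = 1..d_max, proportional to a Gamma(shape, scale) density.

    Evaluating the density at integer durations and renormalizing is a simple
    discretization assumed here for illustration.
    """
    w = [d ** (shape - 1) * math.exp(-d / scale) for d in range(1, d_max + 1)]
    total = sum(w)
    return [x / total for x in w]

# Hypothetical "about 10 time units": mean = shape * scale = 10, small spread.
pmf = discrete_gamma_pmf(shape=50.0, scale=0.2, d_max=40)
mode = 1 + max(range(len(pmf)), key=pmf.__getitem__)  # most likely duration
```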
SLIDE 18

Limitation of standard Markov model

  • Given the current state $S_t = i$, the observed data $Y_t$ are independent of time $t$: $P(Y_t = y \mid S_t = i)$
  • While the system stays in state $i$, the observed data $Y_t$ have a constant distribution:
    • The model cannot capture the shape of linear or quadratic segments!

[Figure: state sequence $S_1 S_2 \ldots S_T$ and the corresponding observations]
SLIDE 19

Segmental Markov Model: Modeling the shape of the segments

  • Each segment corresponds to a regression function, e.g., linear, quadratic, or polynomial.
  • For example, the two quadratic segments in the change-point detection problem:

[Figure: segments of the data mapped to states S=1 and S=2]
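One way to score a segment under its state's regression function is a Gaussian log-likelihood of the residuals. The sketch assumes $y_u = f(u) + e_u$ with i.i.d. $e_u \sim N(0, \sigma^2)$, where $u = 0, 1, \ldots$ is time measured from the start of the segment; the slides do not specify the noise model or the time origin, so both are assumptions.

```python
import numpy as np

def segment_loglik(y_seg, coeffs, sigma):
    """Log-likelihood of one segment under a polynomial regression model.

    coeffs: polynomial coefficients, highest degree first (as in np.polyval).
    Time u is measured from the start of the segment (an assumption).
    """
    y_seg = np.asarray(y_seg, dtype=float)
    u = np.arange(len(y_seg))
    resid = y_seg - np.polyval(coeffs, u)   # deviation from the segment's shape
    n = len(y_seg)
    return -0.5 * n * np.log(2 * np.pi * sigma**2) - np.sum(resid**2) / (2 * sigma**2)
```

Because the mean of $Y_t$ now changes with time inside a segment, a quadratic state can follow a curved segment, which the constant-distribution emissions of a standard HMM cannot.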

SLIDE 20

From Standard Markov Models to Segmental Semi-Markov Models

  • The lengths of the segments can be modeled directly.
  • The shape of each segment can be linear, quadratic, polynomial, …
  • From $Y_1 Y_2 \ldots Y_t \ldots Y_T$, find the most likely state sequence $S_1 S_2 \ldots S_t \ldots S_T$:
    • Generalization of the Viterbi algorithm
    • Online, efficient
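The slides do not give the generalized recursion. A minimal batch sketch is shown below under stated assumptions: a left-to-right model in which states 0..M-1 each occur exactly once (so all transition probabilities are 1 and drop out), a maximum duration `d_max`, and caller-supplied functions `seg_loglik` and `dur_logpmf` (hypothetical names). Instead of one step per sample, the dynamic program maximizes over the duration of the segment ending at each time. It is not the online variant the slide mentions.

```python
import numpy as np

def segmental_viterbi(y, seg_loglik, dur_logpmf, M, d_max):
    """Most likely segmentation of y into M consecutive states 0..M-1.

    seg_loglik(j, s, e): log P(y[s:e] | state j), e.g. a regression model
    dur_logpmf(j, d):    log P(duration of state j = d), d >= 1
    Returns segment boundaries [0, b_1, ..., b_{M-1}, T].
    Cost: O(M * T * d_max) segment-likelihood evaluations.
    """
    T = len(y)
    NEG = -np.inf
    delta = np.full((M, T + 1), NEG)         # delta[j, e]: best score, state j ends at e
    start = np.zeros((M, T + 1), dtype=int)  # where that best segment for state j began
    for e in range(1, min(d_max, T) + 1):
        delta[0, e] = dur_logpmf(0, e) + seg_loglik(0, 0, e)
    for j in range(1, M):
        for e in range(1, T + 1):
            for d in range(1, min(d_max, e) + 1):
                s = e - d
                if delta[j - 1, s] == NEG:
                    continue
                score = delta[j - 1, s] + dur_logpmf(j, d) + seg_loglik(j, s, e)
                if score > delta[j, e]:
                    delta[j, e], start[j, e] = score, s
    if delta[M - 1, T] == NEG:
        raise ValueError("no segmentation fits within d_max")
    bounds, e = [T], T
    for j in range(M - 1, 0, -1):  # walk back through the segment start times
        e = int(start[j, e])
        bounds.append(e)
    bounds.append(0)
    return bounds[::-1]
```

The state sequence follows directly from the boundaries: samples `bounds[j]` up to (but not including) `bounds[j + 1]` are in state `j`.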
SLIDE 21

Next ...

  • Problem Statement
  • Segmental Semi-Markov Model
  • Algorithms and Experimental Results
    • Change-point detection
    • Pattern matching
SLIDE 22

Change-point Detection

  • Given a segmental semi-Markov model, compute the most likely state sequence $S_1 S_2 \ldots S_t \ldots S_T$ from the observed data $Y_1 Y_2 \ldots Y_t \ldots Y_T$. The change-point is the smallest $t$ such that $S_t = 2$ (i.e., the time of the switch from state 1 to state 2).

  • The parameters of the model can be estimated from training data or, if no training data are available, from the real-time data using the Expectation-Maximization (EM) algorithm.
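For the two-state case, the most likely segmentation can be found by trying every candidate switch time. The sketch assumes a model whose parameters are already known: polynomial coefficients for each state (time measured from each segment's start), a shared Gaussian noise level, and an optional log duration prior for state 1. The names and defaults are hypothetical, and the quadratic-time search is chosen for clarity rather than speed.

```python
import numpy as np

def change_point(y, coeffs1, coeffs2, sigma, log_dur1=lambda d: 0.0, min_len=1):
    """Most likely change point of a known two-state segmental model.

    Returns tau, the 0-based index of the first sample in state 2
    (the slide's smallest t with S_t = 2, which is tau + 1 in 1-based time).
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    best_tau, best_score = None, -np.inf
    for tau in range(min_len, T - min_len + 1):
        r1 = y[:tau] - np.polyval(coeffs1, np.arange(tau))       # state 1 residuals
        r2 = y[tau:] - np.polyval(coeffs2, np.arange(T - tau))   # state 2 residuals
        # Gaussian log-likelihood, dropping the constant shared by every tau,
        # plus the log prior on the duration of state 1.
        score = log_dur1(tau) - (np.sum(r1**2) + np.sum(r2**2)) / (2 * sigma**2)
        if score > best_score:
            best_tau, best_score = tau, score
    return best_tau
```

Because the model's regression coefficients are fixed rather than refitted for each candidate split, this differs from the classic SSE baseline compared against later in the slides.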

SLIDE 23

On-line estimation of model parameters using EM algorithm

  • Guess at some initial parameters θ
  • Calculate the state probabilities given θ
  • Re-estimate the parameters θ given the state probabilities
    • Use weighted least-squares regression
  • Repeat the cycle until convergence

[Figure: cycle alternating between the parameters θ and the state probabilities]

SLIDE 24

Change-point Detection Experimental Results: Plasma Etching

[Figure: IB4 interferometry sensor signal vs. time (seconds), marking the estimated change point and the time at which the online Markov algorithm detected the change]

SLIDE 25

Comparison with Classic SSE Method: Simulated Data

[Figure: simulated signal vs. time, marking the true time of change, the change point detected by the Markov method, and the change point detected by the classical method]

  • SSE (Sum of Squared Errors) method: minimizes the sum of squared errors when fitting the two segments.
  • Simulated data:
    • 2 linear segments
    • Gaussian noise
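A straightforward reading of the SSE baseline: for every candidate split, fit a least-squares line to each side independently and keep the split with the smallest total squared error. Fitting the sides without forcing them to meet, and the `min_len` guard, are assumptions of this sketch.

```python
import numpy as np

def sse_change_point(y, degree=1, min_len=3):
    """Classic SSE change-point estimate (0-based index of the first sample
    of the second segment)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    t = np.arange(T, dtype=float)
    best_tau, best_sse = None, np.inf
    for tau in range(min_len, T - min_len + 1):
        sse = 0.0
        for seg_t, seg_y in ((t[:tau], y[:tau]), (t[tau:], y[tau:])):
            coeffs = np.polyfit(seg_t, seg_y, degree)   # least-squares line per side
            sse += np.sum((seg_y - np.polyval(coeffs, seg_t)) ** 2)
        if sse < best_sse:
            best_tau, best_sse = tau, sse
    return best_tau
```

Unlike the segmental semi-Markov approach, this baseline has no noise model or duration prior; it only compares squared errors.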
SLIDE 26

Histograms of Detection Errors: Simulated Data, σ = 5

[Figure: histograms of the detection error (detected time - true time) for the SSE method and for the new method]

  • Detection error = detected time - true time
  • The new method has smaller detection errors than the SSE method.

SLIDE 27

Detection Errors as a Function of Noise

[Figure: RMSE of the detection time vs. sigma of the additive noise model, for the SSE method and the new method]

SLIDE 28

Next ...

  • Problem Statement
  • Segmental Semi-Markov Model
  • Algorithms and Experimental Results
    • Change-point detection
    • Pattern matching
SLIDE 29

Build a model from the example pattern

  • Each segment of the piecewise linear representation corresponds to a state in the model.
  • The state duration distribution for state $i$ is a truncated Gaussian with $\mu$ = length of segment $i$ and $3\sigma$ = 20% × length of segment $i$.

[Figure: segments of the example pattern mapped to states S=1, S=2, …, S=M]
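A discrete version of this duration model can be sketched as follows. The mean and spread follow the slide ($\mu$ = segment length, $3\sigma$ = 20% of it); truncating to $[\mu - 3\sigma, \mu + 3\sigma]$ (and to $d \ge 1$) and integrating the density over $[d - 0.5, d + 0.5]$ are assumptions, since the slide does not state where the Gaussian is truncated.

```python
import math

def truncated_gaussian_duration_pmf(seg_len):
    """Discrete duration pmf {d: P(d)} for the state built from a segment of length seg_len.

    mu = seg_len and 3 * sigma = 0.2 * seg_len, as on the slide; the truncation
    range and the discretization are assumptions of this sketch.
    """
    mu = float(seg_len)
    sigma = 0.2 * seg_len / 3.0

    def cdf(x):  # Gaussian cumulative distribution function
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

    lo = max(1, math.ceil(mu - 3 * sigma))
    hi = math.floor(mu + 3 * sigma)
    mass = {d: cdf(d + 0.5) - cdf(d - 0.5) for d in range(lo, hi + 1)}
    total = sum(mass.values())
    return {d: m / total for d, m in mass.items()}  # renormalize after truncation
```

For a 30-sample segment this gives $\sigma = 2$, so the state is allowed to last between 24 and 36 samples, with 30 most likely: the template tolerates moderate stretching or compression of each segment.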

SLIDE 30

Pattern Matching Algorithm

  • Given a candidate pattern $y_i\, y_{i+1} \ldots y_j$:
    • Compute the most likely state sequence $S_i\, S_{i+1} \ldots S_j$
    • The pattern matching is successful if $S_i\, S_{i+1} \ldots S_j = 1 \ldots M$

[Figure: segments of the candidate pattern aligned with the model states S=1, S=2, …, S=M]
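The success condition can be read as: after collapsing runs of repeated states, the decoded sequence is exactly $1, 2, \ldots, M$ in order. A minimal check under that reading (`is_match` is a hypothetical name):

```python
from itertools import groupby

def is_match(states, M):
    """True if the decoded states pass through 1, 2, ..., M in order,
    each state occupying one contiguous run."""
    collapsed = [s for s, _ in groupby(states)]  # e.g. [1, 1, 2, 3, 3] -> [1, 2, 3]
    return collapsed == list(range(1, M + 1))
```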

SLIDE 31

How can we detect a pattern?

E.g., Sliding Window Matching

[Figure: amplitude vs. time]

SLIDE 32

Pre-Pattern and Post-Pattern States

[Figure: amplitude vs. time, with the pre-pattern state and the post-pattern state labeled]

SLIDE 33

The Augmented Model with Pre-Pattern and Post-Pattern States

[Figure: augmented model with the pre-pattern state S=0, the pattern states S=1, S=2, …, S=M, and the post-pattern state S=M+1]

  • Match the whole time series with the augmented model.
  • No need for repeated sliding-window matching.
  • Similar to “keyword spotting” in speech recognition
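Once the whole series has been decoded with the augmented model, the pattern's location can be read off directly: it is the span of samples labelled with states 1 to M, bracketed by the pre-pattern state 0 and the post-pattern state M+1. A hedged sketch (`pattern_span` is a hypothetical helper):

```python
def pattern_span(states, M):
    """Locate the pattern in a state sequence decoded with the augmented model.

    States 0 (pre-pattern) and M + 1 (post-pattern) absorb the data before and
    after the pattern. Returns (start, end) with end exclusive, or None if no
    pattern state occurs.
    """
    idx = [t for t, s in enumerate(states) if 1 <= s <= M]
    if not idx:
        return None
    return idx[0], idx[-1] + 1
```

A single decoding pass over the series thus replaces scoring every window position separately.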
SLIDE 34

Experimental Results for Plasma Etching

SLIDE 35

Experimental Results for Plasma Etching

[Figure: two panels of amplitude vs. time from plasma etching runs]

SLIDE 36

Comparison with Other Methods

  • Other competing methods:
    • Squared-error based
      • Compute the distance (dissimilarity) between two patterns as the root mean squared error (RMSE)
      • How should the threshold for online pattern matching be set?
      • May need preprocessing to allow shifting and scaling in amplitude.
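A minimal version of the squared-error distance is sketched below. Z-normalizing each pattern (zero mean, unit standard deviation) is one common preprocessing choice for tolerating amplitude shifts and scaling; it is an assumption here, not necessarily what the compared methods used. The sketch requires equal-length, non-constant patterns and leaves the threshold question open.

```python
import numpy as np

def rmse_distance(a, b, normalize=True):
    """RMSE between two equal-length patterns, optionally after z-normalization."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if normalize:
        # Remove differences in mean amplitude (shift) and dynamic range (scale).
        a = (a - a.mean()) / a.std()
        b = (b - b.mean()) / b.std()
    return float(np.sqrt(np.mean((a - b) ** 2)))
```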

SLIDE 37

  • DTW (dynamic time warping)
    • Same as the squared-error based method, but allows time compression and stretching.
    • How should the edit distances for time compression and stretching be defined?
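For reference, a textbook DTW distance with squared-error local cost is sketched below. Here every step type (match, stretch, compress) carries the same weight; choosing those weights is exactly the design decision the slide points to.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance with squared-error local cost and
    equally weighted step types."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)  # D[i, j]: best cost aligning a[:i] with b[:j]
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j],      # advance in a only: b[j-1] reused
                                 D[i, j - 1],      # advance in b only: a[i-1] reused
                                 D[i - 1, j - 1])  # advance in both
    return float(D[n, m])
```

For example, `[0, 1, 2, 3]` and its time-stretched version `[0, 1, 1, 2, 2, 3]` have DTW distance 0, whereas the plain RMSE cannot even compare them without resampling to equal length.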

SLIDE 38

Squared Error-based Method Result

SLIDE 39

Dynamic Time Warping Result

SLIDE 40

Comparison

  • Our method
    • Similarity results from a probabilistic model
    • Can incorporate prior knowledge and learning
  • DTW, squared errors
    • Need to define a distance (can be difficult!)
    • No principled way to handle prior knowledge and learning

SLIDE 41

Summary

  • Two techniques for plasma etching endpoint detection:
    • Change-point detection
    • Pattern matching
  • Segmental semi-Markov models
    • Extension of the standard hidden Markov model
    • Provide a unified theoretical framework for both the change-point detection and pattern-matching problems
    • Can incorporate domain-specific knowledge
  • Good experimental results on both real and simulated data