Robust Hidden Markov Models Inference in the Presence of Label Noise


  1. Robust Hidden Markov Models Inference in the Presence of Label Noise. Benoît Frénay, February 7, 2014.

  2. What is Machine Learning?

  3. What is Machine Learning? Machine learning is about learning from data. A model is inferred from a training set to make predictions.

  6. Examples of Tasks: Regression. Example: predict children's weight from anthropometric measures.

  9. Examples of Tasks: Classification. Examples: disease diagnosis, spam filtering, image classification.

  12. What does it Mean for a Machine to Learn? Machine learning studies how machines can learn automatically. Learning means finding a model of the data. Three steps: (i) specify a type of model (e.g. a linear model), (ii) specify a criterion (e.g. mean square error), (iii) find the best model w.r.t. the criterion.

  13. Example of Learning Process: Linear Regression. Model: linear model $f(x) = w_1 x_1 + \cdots + w_d x_d + w_0$. Criterion: mean square error $\sum_{i=1}^{n} (y_i - f(x_i))^2$. Algorithm: linear regression, $\hat{w} = \arg\min_w \sum_{i=1}^{n} (y_i - f(x_i))^2 = (X'X)^{-1} X'y$.
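
As a concrete illustration of the three steps, here is a minimal NumPy sketch of the closed-form solution above; the toy data and coefficient values are made up for the example, not taken from the talk.

```python
import numpy as np

# Toy stand-in for the regression example: n samples, d features (values made up).
rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))                       # one sample per row
true_w, true_w0 = np.array([2.0, -1.0, 0.5]), 4.0
y = X @ true_w + true_w0 + 0.1 * rng.normal(size=n)

# Append a column of ones so that the intercept w_0 is estimated together with w.
X1 = np.hstack([X, np.ones((n, 1))])

# Normal equations: solve (X'X) w = X'y instead of forming the inverse explicitly.
w_hat = np.linalg.solve(X1.T @ X1, X1.T @ y)
print(w_hat)  # approximately [2.0, -1.0, 0.5, 4.0]
```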

  14. Overview of the Presentation. Segmentation of electrocardiogram signals:
      - goal: allow automated diagnosis of heart disease
      - tools: hidden Markov models and wavelet transform
      - issue: robustness to label noise (i.e. expert errors)
      - solution: modelling of expert behaviour

  24. Electrocardiogram Signal Segmentation

  25. What is an Electrocardiogram Signal? An ECG is a measure of the electrical activity of the human heart. Patterns of interest: P wave, QRS complex, T wave, baseline.

  26. Where Does it Come from? The ECG results from the superposition of several signals.

  27. What it Looks Like in Real-World Cases. Real ECGs are polluted by various sources of noise.

  28. What is our Goal in ECG Segmentation? Task: split/segment an entire ECG into patterns. Available data: a few manual segmentations from experts. Issue: some of the annotations of the experts are incorrect. This calls for a probabilistic model of sequences with labels: hidden Markov models (with wavelet transform).

  29. Hidden Markov Models

  30. Hidden Markov Models in a Nutshell. Hidden Markov models (HMMs) are probabilistic models of sequences. $S_1, \ldots, S_T$ is the sequence of annotations (e.g. the state of the heart), governed by $P(S_t = s_t \mid S_{t-1} = s_{t-1})$. $O_1, \ldots, O_T$ is the sequence of observations (e.g. the measured voltage), governed by $P(O_t = o_t \mid S_t = s_t)$.
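
To make the two conditional distributions concrete, here is a small sketch (with made-up parameters, not those of the ECG model) that samples an annotation sequence from the transition probabilities and then one Gaussian observation per state.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 3-state HMM (all parameters made up for illustration):
# q = prior probabilities, A = transition matrix, (mu, sigma) = Gaussian emission per state.
q = np.array([0.6, 0.3, 0.1])                 # q[i] = P(S_1 = i)
A = np.array([[0.80, 0.15, 0.05],             # A[i, j] = P(S_t = j | S_{t-1} = i)
              [0.10, 0.80, 0.10],
              [0.05, 0.15, 0.80]])
mu = np.array([0.0, 1.0, -1.0])
sigma = np.array([0.3, 0.5, 0.4])

T = 20
states = np.empty(T, dtype=int)
states[0] = rng.choice(3, p=q)                       # S_1 drawn from the prior
for t in range(1, T):
    states[t] = rng.choice(3, p=A[states[t - 1]])    # S_t | S_{t-1}
obs = rng.normal(mu[states], sigma[states])          # O_t | S_t
print(states)
print(obs)
```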

  32. Hypotheses Behind Hidden Markov Models (1). Markov hypothesis: the next state only depends on the current state.

  33. Hypotheses Behind Hidden Markov Models (2). Observations are conditionally independent w.r.t. the hidden states: $P(O_1, \ldots, O_T \mid S_1, \ldots, S_T) = \prod_{t=1}^{T} P(O_t \mid S_t)$.

  34. Learning Hidden Markov Models. Learning an HMM means estimating probabilities: $P(S_t)$ are prior probabilities, $P(S_t \mid S_{t-1})$ are transition probabilities, $P(O_t \mid S_t)$ are emission probabilities. Parameters $\Theta = (q, a, b)$: $q_i$ is the prior probability of state $i$, $a_{ij}$ is the transition probability from state $i$ to state $j$, $b_i$ is the observation distribution for state $i$.

  35. Standard Inference Algorithms for HMMs. Supervised learning: assumes the observed labels are correct; maximises the likelihood $P(S, O \mid \Theta)$; learns the correct concepts; sensitive to label noise. Baum-Welch algorithm: unsupervised, i.e. observed labels are discarded; iteratively (i) labels samples and (ii) learns a model; may learn concepts which differ significantly; theoretically insensitive to label noise.

  36. Supervised Learning for Hidden Markov Models. Supervised: uses annotations, which are assumed to be reliable. Maximises the likelihood $P(S, O \mid \Theta) = q_{s_1} \prod_{t=2}^{T} a_{s_{t-1} s_t} \prod_{t=1}^{T} b_{s_t}(o_t)$. Transition probabilities $P(S_t \mid S_{t-1})$ are estimated by counting: $a_{ij} = \#(\text{transitions from } i \text{ to } j) / \#(\text{transitions from } i)$. Emission probabilities $P(O_t \mid S_t)$ are obtained by PDF estimation; standard models in ECG analysis: Gaussian mixture models (GMMs).
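
A minimal sketch of this supervised estimation, assuming a single annotated sequence and, for brevity, one Gaussian per state instead of the GMM emissions used for ECG signals; the prior convention and the toy data are assumptions made for illustration.

```python
import numpy as np

def supervised_hmm_fit(states, obs, n_states):
    """Estimate (q, a, b) by counting, taking the annotations `states` at face value."""
    # Priors: here simply the empirical state frequencies (one common convention).
    q = np.bincount(states, minlength=n_states) / len(states)

    # Transition probabilities: a[i, j] = #(transitions from i to j) / #(transitions from i).
    a = np.zeros((n_states, n_states))
    for s_prev, s_next in zip(states[:-1], states[1:]):
        a[s_prev, s_next] += 1
    a /= np.maximum(a.sum(axis=1, keepdims=True), 1)

    # Emission densities: a single Gaussian per state, as a stand-in for the GMMs used for ECGs.
    mu = np.array([obs[states == i].mean() for i in range(n_states)])
    sigma = np.array([obs[states == i].std() + 1e-6 for i in range(n_states)])
    return q, a, mu, sigma

# Tiny annotated sequence with two states (made-up numbers).
states = np.array([0, 0, 0, 1, 1, 0, 0, 1, 1, 1])
obs = np.array([0.1, -0.2, 0.0, 2.1, 1.9, 0.2, -0.1, 2.0, 2.2, 1.8])
print(supervised_hmm_fit(states, obs, n_states=2))
```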

  38. Unsupervised Learning for Hidden Markov Models (1). Unsupervised: uses only observations, guesses hidden states. Maximises the likelihood $P(O \mid \Theta) = \sum_S P(S, O \mid \Theta)$, a non-convex function to optimise: $\log P(O \mid \Theta) = \log \sum_S \left( q_{s_1} \prod_{t=2}^{T} a_{s_{t-1} s_t} \prod_{t=1}^{T} b_{s_t}(o_t) \right)$. Solution: expectation-maximisation algorithm (a.k.a. Baum-Welch).
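
To make the marginalisation explicit, here is a brute-force sketch (with made-up Gaussian emission parameters) that literally sums $P(S, O \mid \Theta)$ over every state sequence; its $K^T$ cost is why, in practice, this quantity is handled with forward-backward recursions inside EM rather than by enumeration.

```python
import itertools
import numpy as np

def gauss(o, m, s):
    # Gaussian emission density b_i(o), standing in for the GMM emissions.
    return np.exp(-0.5 * ((o - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def marginal_likelihood(obs, q, a, mu, sigma):
    """P(O | Theta) = sum over all state sequences S of P(S, O | Theta), by brute force."""
    K, T = len(q), len(obs)
    total = 0.0
    for s in itertools.product(range(K), repeat=T):       # all K**T state sequences
        p = q[s[0]] * gauss(obs[0], mu[s[0]], sigma[s[0]])
        for t in range(1, T):
            p *= a[s[t - 1], s[t]] * gauss(obs[t], mu[s[t]], sigma[s[t]])
        total += p
    return total

q = np.array([0.5, 0.5])
a = np.array([[0.9, 0.1], [0.2, 0.8]])
mu, sigma = np.array([0.0, 2.0]), np.array([0.5, 0.5])
print(marginal_likelihood(np.array([0.1, 1.9, 2.1, 0.0]), q, a, mu, sigma))
```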

  40. Unsupervised Learning for Hidden Markov Models (2). The log-likelihood is intractable to optimise directly, but what about a tractable lower bound? (Figure: illustration of the EM lower bound; source: Pattern Recognition and Machine Learning, C. Bishop, 2006.) Two steps: find a tractable lower bound, then maximise this lower bound w.r.t. $\Theta$.

  41. Unsupervised Learning for Hidden Markov Models (3). Idea: use Jensen's inequality to find a lower bound on the log-likelihood.
      $\log P(O \mid \Theta) = \log \sum_S P(S, O \mid \Theta)$
      $= \log \sum_S q(S) \frac{P(S, O \mid \Theta)}{q(S)}$
      $\geq \sum_S q(S) \log \frac{P(S, O \mid \Theta)}{q(S)}$
      $= \sum_S q(S) \log \frac{P(S \mid O, \Theta)}{q(S)} + \text{const}$
      Best lower bound with $q(S) = P(S \mid O, \Theta)$.

  46. The Expectation-Maximisation / Baum-Welch Algorithm. Expectation step: estimate the posteriors $\gamma_t(i) = P(S_t = i \mid O, \Theta^{\text{old}})$ and $\epsilon_t(i, j) = P(S_{t-1} = i, S_t = j \mid O, \Theta^{\text{old}})$.
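
A compact sketch of this E-step, computing $\gamma$ and $\epsilon$ with an unscaled forward-backward pass; Gaussian emissions are assumed for brevity, and a practical implementation would work with scaling or in log-space.

```python
import numpy as np

def e_step(obs, q, a, mu, sigma):
    """Posteriors gamma and epsilon for the current parameters (unscaled: short sequences only)."""
    T, K = len(obs), len(q)
    # b[t, i] = b_i(o_t): Gaussian emission densities, standing in for the GMM emissions.
    b = np.exp(-0.5 * ((obs[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

    alpha = np.zeros((T, K))
    beta = np.zeros((T, K))
    alpha[0] = q * b[0]
    for t in range(1, T):                               # forward pass
        alpha[t] = (alpha[t - 1] @ a) * b[t]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):                      # backward pass
        beta[t] = a @ (b[t + 1] * beta[t + 1])

    likelihood = alpha[-1].sum()                        # P(O | Theta_old)
    gamma = alpha * beta / likelihood                   # gamma[t, i] = P(S_t = i | O, Theta_old)
    # epsilon[t] is the posterior over the transition between positions t and t + 1.
    epsilon = alpha[:-1, :, None] * a[None, :, :] * (b[1:] * beta[1:])[:, None, :] / likelihood
    return gamma, epsilon

q = np.array([0.5, 0.5])
a = np.array([[0.9, 0.1], [0.2, 0.8]])
mu, sigma = np.array([0.0, 2.0]), np.array([0.5, 0.5])
gamma, epsilon = e_step(np.array([0.1, 1.9, 2.1, 0.0]), q, a, mu, sigma)
print(gamma.sum(axis=1))   # each row sums to 1
```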
