Robust Hidden Markov Models Inference in the Presence of Label Noise
Benoît Frénay February 7, 2014
Machine learning is about learning from data. A model is inferred from a training set to make predictions.
Example: predict children weight from anthropometric measures.
Examples: disease diagnosis, spam filtering, image classification.
Machine learning studies how machines can learn automatically. Learning means finding a model of the data. Three steps: specify a type of model (e.g. a linear model); specify a criterion (e.g. mean square error); find the best model w.r.t. the criterion.
Model: linear model f(x1, . . . , xd) = w1x1 + · · · + wdxd + w0
Criterion: mean square error (1/n) ∑_{i=1}^n (yi − f(xi))²
Algorithm: linear regression ŵ = arg min_w ∑_{i=1}^n (yi − f(xi))² = (X′X)^{−1} X′y
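As a minimal sketch, the normal-equations solution above can be computed with numpy on toy data (the feature values and coefficients below are hypothetical, chosen only for illustration):

```python
import numpy as np

# Toy data: y depends linearly on two features, plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 + rng.normal(scale=0.01, size=100)

# Append a column of ones so the intercept w0 is learned as well.
Xb = np.hstack([X, np.ones((100, 1))])

# Normal equations: w = (X'X)^{-1} X'y (solve the linear system
# instead of explicitly inverting X'X, which is numerically safer).
w = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
```

With this low noise level, `w` recovers the generating coefficients (3, −2) and intercept 0.5 almost exactly.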
Segmentation of electrocardiogram signals:
goal: allow automated diagnosis of heart disease
tools: hidden Markov models and wavelet transform
issue: robustness to label noise (i.e. expert errors)
solution: modelling of expert behaviour
An ECG is a measure of the electrical activity of the human heart. Patterns of interest: P wave, QRS complex, T wave, baseline.
The ECG results from the superposition of several signals.
Real ECGs are polluted by various sources of noise.
Task: split/segment an entire ECG into patterns. Available data: a few manual segmentations from experts. Issue: some of the experts' annotations are incorrect. Probabilistic model of sequences with labels: hidden Markov models (with wavelet transform).
Hidden Markov models (HMMs) are probabilistic models of sequences.
S1, . . . , ST is the sequence of annotations (ex.: state of the heart): P(St = st|St−1 = st−1).
O1, . . . , OT is the sequence of observations (ex.: measured voltage): P(Ot = ot|St = st).
Markov hypothesis: the next state only depends on the current state.
Observations are conditionally independent w.r.t. the hidden states: P(O1, . . . , OT|S1, . . . , ST) = ∏_{t=1}^T P(Ot|St).
Learning an HMM means estimating probabilities: P(St) are prior probabilities; P(St|St−1) are transition probabilities; P(Ot|St) are emission probabilities. Parameters Θ = (q, a, b): qi is the prior of state i; aij is the transition probability from state i to state j; bi is the observation distribution for state i.
Supervised learning: assumes the observed labels are correct; maximises the likelihood P(S, O|Θ); learns the correct concepts; sensitive to label noise. Baum-Welch algorithm: unsupervised, i.e. observed labels are discarded; iteratively (i) labels samples and (ii) learns a model; may learn concepts which differ significantly; theoretically insensitive to label noise.
Supervised: uses annotations, which are assumed to be reliable.
Maximises the likelihood P(S, O|Θ) = qs1 ∏_{t=2}^T ast−1st ∏_{t=1}^T bst(ot).
Transition probabilities P(St|St−1) are estimated by counting: aij = #(transitions from i to j) / #(transitions from i). Emission probabilities P(Ot|St) are obtained by PDF estimation; standard models in ECG analysis are Gaussian mixture models (GMMs).
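The counting estimator for the transition probabilities can be sketched as follows (the state sequence is a hypothetical toy example; emission estimation via GMMs is omitted here):

```python
import numpy as np

# Hypothetical labelled sequence over 3 states (0, 1, 2).
states = [0, 0, 1, 1, 1, 2, 0, 0, 1, 2]
n = 3

# Count transitions, then normalise each row:
# aij = #(transitions from i to j) / #(transitions from i).
counts = np.zeros((n, n))
for prev, cur in zip(states[:-1], states[1:]):
    counts[prev, cur] += 1
A = counts / counts.sum(axis=1, keepdims=True)
```

Each row of `A` sums to one, as required of a stochastic transition matrix.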
Unsupervised: uses only observations, guesses hidden states.
Maximises the likelihood P(O|Θ) = ∑_S P(S, O|Θ).
Non-convex function to optimise: log P(O|Θ) = log ∑_S qs1 ∏_{t=2}^T ast−1st ∏_{t=1}^T bst(ot).
The log-likelihood is intractable, but what about a tractable lower bound?
Source: Pattern Recognition and Machine Learning, C. Bishop, 2006.
Two steps: find a tractable lower bound maximise this lower bound w.r.t. Θ
Idea: use Jensen's inequality to find a lower bound on the log-likelihood.
log P(O|Θ) = log ∑_S P(S, O|Θ)
= log ∑_S q(S) P(S, O|Θ)/q(S)
≥ ∑_S q(S) log [P(S, O|Θ)/q(S)]
= ∑_S q(S) log [P(S|O, Θ)/q(S)] + const
Best lower bound with q(S) = P(S|O, Θ).
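The Jensen bound can be checked numerically on a tiny discrete example (the joint probabilities below are hypothetical values for a single fixed observation): any distribution q(S) gives a lower bound on log P(O|Θ), and the bound is tight when q(S) is the posterior.

```python
import numpy as np

# Tiny joint P(S, O = o|Theta) over 3 hidden states, for one fixed
# observation o (hypothetical numbers, non-negative and summing to < 1).
p_so = np.array([0.10, 0.25, 0.05])
log_p_o = np.log(p_so.sum())  # exact log-likelihood log P(O|Theta)

def bound(q):
    # Jensen lower bound: sum_S q(S) log [P(S, O|Theta) / q(S)]
    return np.sum(q * (np.log(p_so) - np.log(q)))

q_uniform = np.ones(3) / 3
q_posterior = p_so / p_so.sum()  # q(S) = P(S|O, Theta)
```

Here `bound(q_uniform) <= log_p_o`, while `bound(q_posterior)` equals `log_p_o` up to floating-point error.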
Expectation step: estimate the posteriors γt(i) = P(St = i|O, Θold) and ǫt(i, j) = P(St−1 = i, St = j|O, Θold).
Maximisation step for qi and aij:
qi = γ1(i) / ∑_{i=1}^{|S|} γ1(i)
aij = ∑_{t=2}^T ǫt(i, j) / ∑_{t=2}^T ∑_{j=1}^{|S|} ǫt(i, j)
The hidden states are estimated and used to compute the parameters.
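A minimal Baum-Welch sketch for a discrete-emission HMM, assuming unscaled forward-backward recursions (fine for short sequences; real ECG-length sequences would need log-space or scaled recursions). All model values below are hypothetical:

```python
import numpy as np

def e_step(q, A, B, obs):
    """Forward-backward pass: returns gamma_t(i), eps_t(i, j) and P(O|Theta)."""
    T, n = len(obs), len(q)
    alpha, beta = np.zeros((T, n)), np.zeros((T, n))
    alpha[0] = q * B[:, obs[0]]
    for t in range(1, T):                      # forward recursion
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):             # backward recursion
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    like = alpha[-1].sum()                     # likelihood P(O|Theta)
    gamma = alpha * beta / like
    eps = np.array([alpha[t][:, None] * A
                    * (B[:, obs[t + 1]] * beta[t + 1])[None, :] / like
                    for t in range(T - 1)])
    return gamma, eps, like

def m_step(gamma, eps, obs, n_symbols):
    """Re-estimate (q, A, B) from the posteriors, as in the M-step above."""
    q = gamma[0] / gamma[0].sum()
    A = eps.sum(axis=0) / eps.sum(axis=(0, 2))[:, None]
    obs = np.asarray(obs)
    B = np.stack([gamma[obs == k].sum(axis=0) for k in range(n_symbols)], axis=1)
    return q, A, B / B.sum(axis=1, keepdims=True)

# Hypothetical 2-state, 2-symbol HMM and a short observation sequence.
q = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
obs = [0, 0, 1, 1, 0, 1]

likes = []
for _ in range(20):                            # EM: the likelihood never decreases
    gamma, eps, like = e_step(q, A, B, obs)
    q, A, B = m_step(gamma, eps, obs, n_symbols=2)
    likes.append(like)
```

Tracking `likes` across iterations illustrates the EM guarantee: each update can only increase (or leave unchanged) P(O|Θ).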
Using HMMs with raw ECG signals gives 70% accuracy. The Markov and conditional independence hypotheses are strong: transitions do not depend only on the current state; emissions are not independent, even when states are given. Solution: use a multi-dimensional representation of the ECG signal. Example: O(t) → (O(t), O′(t), O′′(t)): the observation vector then contains contextual information, but numerical estimations of derivatives are unstable.
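The derivative-based representation can be sketched with numpy on a hypothetical noisy signal; note how even a little noise is amplified in the second derivative, which is why this representation is fragile:

```python
import numpy as np

# Hypothetical signal: a 5 Hz sine sampled at 1 kHz with slight noise.
t = np.linspace(0.0, 1.0, 1000)
o = np.sin(2 * np.pi * 5 * t) + 0.01 * np.random.default_rng(1).normal(size=t.size)

# Numerical first and second derivatives via finite differences.
d1 = np.gradient(o, t)
d2 = np.gradient(d1, t)

# Each time step becomes a 3-dimensional observation (O, O', O'').
features = np.stack([o, d1, d2], axis=1)
```

The resulting `features` array has one 3-dimensional observation vector per sample.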
Signals can be studied at different time scales (or frequencies).
The Fourier transform only considers the whole signal (no localisation):
f(ω) = ∫_{−∞}^{+∞} f(t) e^{−2πiωt} dt
The wavelet transform uses a localised function ψ (a.k.a. wavelet):
fψ(a, b) = (1/√a) ∫_{−∞}^{+∞} f(t) ψ((t − b)/a) dt
where b is the translation factor and a is the scale factor.
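A minimal continuous wavelet transform can be implemented by direct convolution; this sketch uses the Mexican-hat (Ricker) wavelet as a stand-in for ψ (the source does not specify which mother wavelet was used) and the dyadic scales 2^1 to 2^7:

```python
import numpy as np

def mexican_hat(t):
    """Mexican-hat (Ricker) wavelet, one common choice for psi."""
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)

def cwt(f, scales, dt=1.0):
    """Continuous wavelet transform by direct convolution:
    one row per scale a, columns indexed by the translation b."""
    out = np.zeros((len(scales), len(f)))
    for k, a in enumerate(scales):
        # Discretise psi((t - b)/a) / sqrt(a) on a grid wide enough
        # to contain the dilated wavelet.
        t = np.arange(-5.0 * a, 5.0 * a + dt, dt)
        psi = mexican_hat(t / a) / np.sqrt(a)
        out[k] = np.convolve(f, psi[::-1], mode="same") * dt
    return out

# Hypothetical signal; dyadic scales 2^1 .. 2^7 as in the ECG representation.
signal = np.sin(np.linspace(0.0, 32.0 * np.pi, 2048))
coeffs = cwt(signal, scales=[2.0**j for j in range(1, 8)])
```

Each of the 7 rows of `coeffs` is the signal filtered at one dyadic scale; stacking them yields the multi-dimensional observation vectors fed to the HMM.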
Source: A Wavelet Tour of Signal Processing, Stéphane Mallat, 1999.
filtered using a 3-30 Hz band-pass filter; transformed using a continuous wavelet transform; dyadic scales from 2^1 to 2^7 are kept and normalised
For real datasets, perfect labelling is difficult: subjectivity of the labelling task; lack of information; communication noise. In particular, label noise arises in biomedical applications. Previous works by e.g. Lawrence et al. incorporated a noise model into a generative model for i.i.d. observations (classification).
S1, . . . , ST is the sequence of true states (ex.: state of the heart): P(St|St−1).
O1, . . . , OT is the sequence of observations (ex.: measured voltage): P(Ot = ot|St = st).
Y1, . . . , YT is the sequence of observed annotations (ex.: P, QRS or T): P(Yt|St).
Two distinct sequences of states: the observed, noisy annotations Y and the hidden, true labels S. The annotation probability is
dij = P(Yt = j|St = i) = 1 − pi if i = j, pi/(|S| − 1) if i ≠ j,
where pi is the expert error probability in state i.
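The annotation matrix dij can be built in a few lines (the per-state error rates below are hypothetical values, not figures from the experiments):

```python
import numpy as np

def annotation_matrix(p):
    """Build d_ij = P(Y = j|S = i) from per-state error probabilities p_i.

    The annotation is correct with probability 1 - p_i; otherwise the
    error mass p_i is spread uniformly over the |S| - 1 other labels."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    D = np.tile((p / (n - 1))[:, None], (1, n))  # off-diagonal entries
    np.fill_diagonal(D, 1.0 - p)                 # diagonal: correct labels
    return D

# Hypothetical error rates for 4 states (e.g. P, QRS, T, baseline).
D = annotation_matrix([0.05, 0.01, 0.03, 0.10])
```

Each row of `D` sums to one: given the true state, the expert emits exactly one annotation.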
Compromise between supervised learning and Baum-Welch: assumes the observed labels are potentially noisy; maximises the likelihood P(Y, O|Θ); learns the correct concepts; less sensitive to label noise.
Non-convex function to optimise: log P(O, Y|Θ) = log ∑_S P(O, Y, S|Θ).
Expectation step: estimate the posteriors γt(i) = P(St = i|O, Y, Θold) and ǫt(i, j) = P(St−1 = i, St = j|O, Y, Θold).
Maximisation step for pi: pi = ∑_{t=1}^T γt(i) [yt ≠ i] / ∑_{t=1}^T γt(i).
The true labels are estimated and used to compute the parameters.
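The M-step update for the expert error probabilities pi can be sketched as follows, assuming the posteriors γt(i) have already been computed in the E-step (the numbers below are hypothetical):

```python
import numpy as np

def update_error_rates(gamma, y, n_states):
    """M-step for p_i: posterior-weighted fraction of time steps where
    the annotation y_t disagrees with true state i."""
    y = np.asarray(y)
    p = np.zeros(n_states)
    for i in range(n_states):
        weight = gamma[:, i]                 # gamma_t(i) for all t
        p[i] = weight[y != i].sum() / weight.sum()
    return p

# Hypothetical posteriors over 2 states for 4 time steps, with annotations y.
gamma = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
y = [0, 1, 1, 1]
p = update_error_rates(gamma, y, 2)
```

A state whose annotations mostly agree with its posterior mass gets a small estimated error rate, and vice versa.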
EM algorithms: GMM with 5 components; EM algorithms are repeated 10 times; Electrocardiograms: a set of 10 artificial ECGs; 10 ECGs selected in the sinus MIT-QT database; 10 ECGs selected in the arrhythmia MIT-QT database. Comparison: learning with addition of artificial label noise; comparison on original signals; label noise moves the boundaries of P and T waves.
Supervised learning, Baum-Welch and label noise-tolerant.
Supervised learning: affected by increasing label noise. Baum-Welch: worst results for small levels of noise; less affected by the label noise; better than supervised learning for large levels of noise. Label noise-tolerant algorithm: affected by increasing label noise; most often better than Baum-Welch; better than supervised learning for large levels of noise.
Proper probabilistic modelling of label noise improves results. Label noise-tolerant HMMs give good results for ECG segmentation. Results published in: Frénay, B., de Lannoy, G., Verleysen, M. Label Noise-Tolerant Hidden Markov Models for Segmentation: Application to
More on label noise: Frénay, B., Verleysen, M. Classification in the Presence of Label Noise: a Survey. IEEE TNN-LS, in press, 25 pages.