

SLIDE 1

Learning HMMs

10-701 Machine Learning

SLIDE 2

A Hidden Markov model

  • A set of states {s1 … sn}
  • At each time point we are in exactly one of these states, denoted by qt
  • πi, the probability that we start at state si
  • A transition probability model, P(qt = si | qt-1 = sj)
  • A set of possible outputs Σ
  • At time t we emit a symbol σ ∈ Σ
  • An emission probability model, p(ot = σ | si)

[State diagram: two states A and B; self-transition probabilities 0.8 and 0.8, cross-transition probabilities 0.2 and 0.2, remaining probabilities 0.5 and 0.5]

SLIDE 3

Inference in HMMs

  • Computing P(Q) and P(qt = si)
  • Computing P(Q | O) and P(qt = si |O)
  • Computing argmaxQ P(Q)

  

SLIDE 4

[Figure: HMM 1 – two states A and B; initial probabilities 0.5 and 0.5, all transition probabilities 0.5. HMM 2 – two states A and B; initial probabilities 0.5 and 0.5, self-transition probabilities 0.8, cross-transition probabilities 0.2]

P1 = P(O100=A, O101=B, O102=A, O103=B) for HMM 1
P2 = P(O100=A, O101=B, O102=A, O103=B) for HMM 2
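The slide leaves the emission model to the diagram, so as a hedged illustration only: assuming each state emits its own label with certainty and the chain is uniform over A and B by time 100 (both assumptions, not given on the slide), a minimal forward-algorithm sketch compares the two models:

```python
import numpy as np

def forward_prob(init, trans, emit, obs):
    """P(O_1..O_T) via the forward pass: alpha_t(i) = P(O_1..O_t, q_t = s_i)."""
    alpha = init * emit[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]
    return alpha.sum()

obs = [0, 1, 0, 1]                           # A, B, A, B
init = np.array([0.5, 0.5])                  # assumed uniform at time 100
emit = np.eye(2)                             # assumed: state A emits 'A', state B emits 'B'
hmm1 = np.array([[0.5, 0.5], [0.5, 0.5]])    # all transitions 0.5
hmm2 = np.array([[0.8, 0.2], [0.2, 0.8]])    # sticky transitions
print(forward_prob(init, hmm1, emit, obs))   # P1 = 0.0625
print(forward_prob(init, hmm2, emit, obs))   # P2 = 0.004
```

Under these assumptions the alternating output ABAB is far more likely under HMM 1, since HMM 2 strongly prefers staying in the same state.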

SLIDE 5

Learning HMMs

  • Until now we assumed that the emission and transition probabilities are known

  • This is usually not the case
  • How is “AI” pronounced by different individuals?
  • What is the probability of hearing “class” after “AI”?

While we will discuss learning the transition and emission models, we will not discuss selecting the states. This is usually a function of domain knowledge.

SLIDE 6

Example

  • Assume the model below
  • We also observe the following sequence:

1,2,2,5,6,5,1,2,3,3,5,3,3,2 …..

  • How can we determine the initial, transition and emission probabilities?

[Figure: two-state HMM with states A and B]

SLIDE 7

Initial probabilities

Q: assume we can observe the following sets of states:

AAABBAA
AABBBBB
BAABBAB

How can we learn the initial probabilities?

A: Maximum likelihood estimation. Find the initial probabilities π such that

πA = #A / (#A + #B)

(where #A counts the sequences whose first state is A)

$$\pi^* = \arg\max_\pi \prod_k \pi(q_1^k) \prod_{t=2}^{T} p(q_t \mid q_{t-1}) = \arg\max_\pi \prod_k \pi(q_1^k)$$

(the transition term does not depend on π, so it drops out of the maximization)

k ranges over the sequences available for training.
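A minimal sketch of this estimate in Python, counting first states over the three training sequences from the slide:

```python
from collections import Counter

sequences = ["AAABBAA", "AABBBBB", "BAABBAB"]
first = Counter(seq[0] for seq in sequences)       # states at t = 1
pi = {s: first[s] / len(sequences) for s in "AB"}
print(pi)                                          # pi_A = 2/3, pi_B = 1/3
```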

SLIDE 8

Transition probabilities

Q: assume we can observe the set of states:

AAABBAAAABBBBBAAAABBBB

How can we learn the transition probabilities?

A: Maximum likelihood estimation. Find a transition matrix a such that

$$a^* = \arg\max_a \prod_k \pi(q_1) \prod_{t=2}^{T} p(q_t \mid q_{t-1}) = \arg\max_a \prod_{t=2}^{T} p(q_t \mid q_{t-1})$$

which gives

aA,B = #AB / (#AB + #AA)

Remember that we defined ai,j = p(qt = sj | qt-1 = si).
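A minimal sketch of the same count-and-normalize estimate in Python, using the state sequence from the slide:

```python
from collections import Counter

seq = "AAABBAAAABBBBBAAAABBBB"
pairs = Counter(zip(seq, seq[1:]))                 # counts of consecutive state pairs
a = {(i, j): pairs[(i, j)] / sum(pairs[(i, k)] for k in "AB")
     for i in "AB" for j in "AB"}
print(a[("A", "B")])                               # #AB / (#AA + #AB)
```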

SLIDE 9

Emission probabilities

Q: assume we can observe the set of states:

A A A B B A A A A B B B B B A A

and the set of dice values

1 2 3 5 6 3 2 1 1 3 4 5 6 5 2 3

How can we learn the emission probabilities?

A: Maximum likelihood estimation:

bA(5) = #A5 / (#A1 + #A2 + … + #A6)
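A minimal sketch in Python, pairing each observed state with the die value emitted at that time point:

```python
from collections import Counter

states = "AAABBAAAABBBBBAA"
values = [1, 2, 3, 5, 6, 3, 2, 1, 1, 3, 4, 5, 6, 5, 2, 3]
counts = Counter(zip(states, values))              # (#A1, #A2, ..., #B6)
n_A = sum(n for (s, _), n in counts.items() if s == "A")
b_A = {v: counts[("A", v)] / n_A for v in range(1, 7)}
print(b_A[5])                                      # #A5 / (#A1 + ... + #A6)
```

(On this tiny sample state A never emits a 5, so the MLE sets bA(5) = 0; this is one reason smoothing is often added in practice.)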

SLIDE 10

Learning HMMs

  • In most cases we do not know what states generated each of the outputs (fully unsupervised)
  • … but had we known, it would be very easy to determine an emission and transition model!
  • On the other hand, if we had such a model we could determine the set of states using the inference methods we discussed

SLIDE 11

Expectation Maximization (EM)

  • Appropriate for problems with ‘missing values’ for the variables.
  • For example, in HMMs we usually do not observe the states

SLIDE 12

Expectation Maximization (EM): Quick reminder

  • Two steps
  • E step: Fill in the expected values for the missing variables
  • M step: Regular maximum likelihood estimation (MLE) using the values computed in the E step and the values of the other variables
  • Guaranteed to converge (though only to a local maximum of the likelihood)

[Diagram: EM loop – the E step yields expected values for the (missing) variables, the M step yields updated parameters]

SLIDE 13

Forward-Backward

  • We already defined a forward looking variable
  • We also need to define a backward looking variable

$$\alpha_t(i) = P(O_1 \ldots O_t,\; q_t = s_i)$$

$$\beta_t(i) = P(O_{t+1} \ldots O_T \mid q_t = s_i)$$

SLIDE 14

Forward-Backward

  • We already defined a forward looking variable
  • We also need to define a backward looking variable

$$\alpha_t(i) = P(O_1 \ldots O_t,\; q_t = s_i)$$

$$\beta_t(i) = P(O_{t+1} \ldots O_T \mid q_t = s_i) = \sum_j a_{i,j}\, b_j(O_{t+1})\, \beta_{t+1}(j)$$

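A minimal numpy sketch of the two passes (the array shapes are my convention, not the slide's: trans[i, j] = ai,j, emit[j, o] = bj(o), obs a list of symbol indices):

```python
import numpy as np

def forward(init, trans, emit, obs):
    """alpha[t, i] = P(O_1..O_t, q_t = s_i)."""
    alpha = np.zeros((len(obs), len(init)))
    alpha[0] = init * emit[:, obs[0]]
    for t in range(1, len(obs)):
        alpha[t] = (alpha[t - 1] @ trans) * emit[:, obs[t]]
    return alpha

def backward(trans, emit, obs):
    """beta[t, i] = P(O_{t+1}..O_T | q_t = s_i)."""
    beta = np.ones((len(obs), trans.shape[0]))     # beta_T(i) = 1
    for t in range(len(obs) - 2, -1, -1):
        # beta_t(i) = sum_j a_{i,j} b_j(O_{t+1}) beta_{t+1}(j)
        beta[t] = trans @ (emit[:, obs[t + 1]] * beta[t + 1])
    return beta
```

(In practice both passes are scaled or run in log space to avoid underflow on long sequences.)
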
SLIDE 15

Forward-Backward

  • We already defined a forward looking variable
  • We also need to define a backward looking variable
  • Using these two definitions we can show

$$\alpha_t(i) = P(O_1 \ldots O_t,\; q_t = s_i)$$

$$\beta_t(i) = P(O_{t+1} \ldots O_T \mid q_t = s_i)$$

$$P(q_t = s_i \mid O_1 \ldots O_T) = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_j \alpha_t(j)\,\beta_t(j)} \;\overset{\text{def}}{=}\; S_t(i)$$

(using P(A|B) = P(A,B) / P(B))

SLIDE 16

State and transition probabilities

  • Probability of a state
  • We can also derive a transition probability

$$S_t(i) \overset{\text{def}}{=} P(q_t = s_i \mid O_1 \ldots O_T) = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_j \alpha_t(j)\,\beta_t(j)}$$

$$S_t(i,j) \overset{\text{def}}{=} P(q_t = s_i,\; q_{t+1} = s_j \mid O_1 \ldots O_T) = \frac{\alpha_t(i)\; p(q_{t+1} = s_j \mid q_t = s_i)\; p(O_{t+1} \mid q_{t+1} = s_j)\; \beta_{t+1}(j)}{\sum_k \alpha_t(k)\,\beta_t(k)} = \frac{\alpha_t(i)\, a_{i,j}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{\sum_k \alpha_t(k)\,\beta_t(k)}$$

SLIDE 17

E step

  • Compute St(i) and St(i,j) for all t, i, and j (1 ≤ t ≤ T, 1 ≤ i, j ≤ k, where k is the number of states)

$$S_t(i,j) = P(q_t = s_i,\; q_{t+1} = s_j \mid O_1 \ldots O_T)$$

$$S_t(i) = P(q_t = s_i \mid O_1 \ldots O_T)$$

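A minimal sketch of the E step in Python, built on the forward()/backward() pass sketched earlier:

```python
import numpy as np

def e_step(init, trans, emit, obs):
    alpha, beta = forward(init, trans, emit, obs), backward(trans, emit, obs)
    # S_t(i) = alpha_t(i) beta_t(i) / sum_j alpha_t(j) beta_t(j)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    # S_t(i, j) proportional to alpha_t(i) a_{i,j} b_j(O_{t+1}) beta_{t+1}(j)
    xi = np.zeros((len(obs) - 1,) + trans.shape)
    for t in range(len(obs) - 1):
        x = alpha[t][:, None] * trans * (emit[:, obs[t + 1]] * beta[t + 1])
        xi[t] = x / x.sum()
    return gamma, xi                               # S_t(i), S_t(i, j)
```
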
SLIDE 18

M step (1)

Compute transition probabilities:

$$\hat{a}_{i,j} = \frac{\hat{n}(i,j)}{\sum_k \hat{n}(i,k)}$$

where

$$\hat{n}(i,j) = \sum_t S_t(i,j)$$
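A one-line realization of this update, given the xi array (the St(i,j) values) from the E step sketch above:

```python
def m_step_transitions(xi):
    n_hat = xi.sum(axis=0)                            # n_hat(i, j) = sum_t S_t(i, j)
    return n_hat / n_hat.sum(axis=1, keepdims=True)   # a_hat_{i,j}
```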

SLIDE 19

M step (2)

Compute emission probabilities (here we assume a multinomial distribution):

define:

$$B_j(k) = \sum_{t \,\mid\, O_t = k} S_t(j)$$

then:

$$\hat{b}_j(k) = \frac{B_j(k)}{\sum_i B_j(i)}$$
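The corresponding multinomial emission update, given gamma (the St(j) values) from the E step sketch:

```python
import numpy as np

def m_step_emissions(gamma, obs, n_symbols):
    B = np.zeros((gamma.shape[1], n_symbols))
    for t, o in enumerate(obs):
        B[:, o] += gamma[t]                        # B_j(k) sums S_t(j) over t with O_t = k
    return B / B.sum(axis=1, keepdims=True)        # b_hat_j(k) = B_j(k) / sum_i B_j(i)
```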

SLIDE 20

Complete EM algorithm for learning the parameters of HMMs (Baum-Welch)

  • Inputs: 1. Observations O1 … OT; 2. Number of states, model
  • 1. Guess initial transition and emission parameters
  • 2. E step: compute St(i) and St(i,j)
  • 3. M step: re-estimate the transition and emission parameters
  • 4. Convergence? If no, return to step 2
  • 5. Output complete model

We did not discuss initial probability estimation. These can be deduced from multiple sets of observations (for example, several recorded customers for speech processing).
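Tying the sketches above together gives the full Baum-Welch loop; the random initialization, tolerance, and convergence test on the log-likelihood are my assumptions, not prescribed by the slide:

```python
import numpy as np

def baum_welch(obs, n_states, n_symbols, n_iter=100, tol=1e-6):
    rng = np.random.default_rng(0)
    init = np.full(n_states, 1.0 / n_states)          # step 1: guess parameters
    trans = rng.dirichlet(np.ones(n_states), n_states)
    emit = rng.dirichlet(np.ones(n_symbols), n_states)
    prev_ll = -np.inf
    for _ in range(n_iter):
        gamma, xi = e_step(init, trans, emit, obs)    # step 2: E step
        trans = m_step_transitions(xi)                # step 3: M step
        emit = m_step_emissions(gamma, obs, n_symbols)
        init = gamma[0]
        ll = np.log(forward(init, trans, emit, obs)[-1].sum())
        if ll - prev_ll < tol:                        # step 4: converged?
            break
        prev_ll = ll
    return init, trans, emit                          # step 5: complete model
```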

SLIDE 21

Building HMMs – Topology

[Figure: profile HMM topology with matching states, insertion states, and deletion states; number of matching states = average sequence length in the family]

PFAM Database of Protein families (http://pfam.wustl.edu)

SLIDE 22

Building HMMs – from an existing alignment

An HMM model for a DNA motif alignment. The transitions are shown with arrows whose thickness indicates their probability. In each state, the histogram shows the probabilities of the four bases.

ACA---ATG
TCAACTATC
ACAC--AGC
AGA---ATC
ACCG--ATC

[Figure labels: transition probabilities, output probabilities, insertion]