MLE for Gaussian and Mixture Gaussian Models (04-09-2019)



SLIDE 1

E9 205 Machine Learning for Signal Processing

04-09-2019

MLE

For Gaussian and Mixture Gaussian Models

Instructor - Sriram Ganapathy (sriramg@iisc.ac.in) Teaching Assistant - Prachi Singh (prachisingh@iisc.ac.in).

SLIDE 2

Finding the parameters of the Model

  • The Gaussian model has the following parameters - the mean vector μ and the covariance matrix Σ.
  • Total number of parameters to be learned for D-dimensional data is D + D(D+1)/2 (a worked count follows this list).
  • Given N data points, how do we estimate the parameters of the model?
  • Several criteria can be used.
  • The most popular method is maximum likelihood estimation (MLE).
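As a worked count for the full-covariance case (the exact expression on the original slide is not visible in this extraction):

```latex
\underbrace{D}_{\text{mean } \mu} + \underbrace{\tfrac{D(D+1)}{2}}_{\text{covariance } \Sigma}
= \tfrac{D(D+3)}{2},
\qquad \text{e.g. } D = 3 \;\Rightarrow\; 3 + 6 = 9 \text{ parameters}.
```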

SLIDE 3

MLE

Define the likelihood function as the probability of the observed data viewed as a function of the model parameters; the maximum likelihood estimator (MLE) is the parameter value that maximizes this function (both are stated compactly after the list below). The MLE satisfies nice properties like

  • Consistency (convergence to the true value)
  • Efficiency (has the least mean squared error).
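A compact statement of the two definitions above for i.i.d. samples x_1, ..., x_N (standard notation; the slide's own equations are not recoverable from this extraction):

```latex
\mathcal{L}(\theta) = \prod_{n=1}^{N} p(x_n \mid \theta),
\qquad
\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta}\, \log \mathcal{L}(\theta)
= \arg\max_{\theta} \sum_{n=1}^{N} \log p(x_n \mid \theta).
```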
SLIDE 4

MLE

For the Gaussian distribution, the likelihood of the data is a product of N Gaussian densities. To estimate the parameters, maximize the log-likelihood with respect to the mean μ and the covariance Σ.
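A sketch of the objective being maximized, for N i.i.d. D-dimensional samples under the multivariate Gaussian model (standard form, not copied from the slide):

```latex
\log \mathcal{L}(\mu, \Sigma)
= \sum_{n=1}^{N} \log \mathcal{N}(x_n \mid \mu, \Sigma)
= -\frac{ND}{2}\log(2\pi) - \frac{N}{2}\log|\Sigma|
  - \frac{1}{2}\sum_{n=1}^{N} (x_n - \mu)^{T}\Sigma^{-1}(x_n - \mu).
```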

SLIDE 5

MLE

Setting the derivatives of the log-likelihood to zero uses matrix differentiation rules for the trace and the log-determinant. For a symmetric matrix A,

\frac{\partial\, \mathrm{tr}(AB)}{\partial A} = B + B^{T} - \mathrm{diag}(B),
\qquad
\frac{\partial \log|A|}{\partial A} = 2A^{-1} - \mathrm{diag}(A^{-1}).

Solving the resulting equations gives the sample mean and the sample covariance as the maximum likelihood estimates:

\hat{\mu} = \frac{1}{N}\sum_{n=1}^{N} x_n,
\qquad
\hat{\Sigma} = \frac{1}{N}\sum_{n=1}^{N} (x_n - \hat{\mu})(x_n - \hat{\mu})^{T}.
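A minimal numerical check of these closed-form estimates (a NumPy sketch, not code from the course; note the 1/N normalization, which is the MLE rather than the unbiased 1/(N-1) estimator):

```python
import numpy as np

def gaussian_mle(X):
    """MLE of mean and covariance for an N x D data matrix X."""
    N = X.shape[0]
    mu = X.mean(axis=0)            # sample mean
    Xc = X - mu                    # centered data
    sigma = (Xc.T @ Xc) / N        # sample covariance (divide by N, not N-1)
    return mu, sigma

# Example: fit a 2-D Gaussian to synthetic data
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[1.0, -2.0],
                            cov=[[2.0, 0.5], [0.5, 1.0]],
                            size=1000)
mu_hat, sigma_hat = gaussian_mle(X)
print(mu_hat)     # close to [1.0, -2.0]
print(sigma_hat)  # close to [[2.0, 0.5], [0.5, 1.0]]
```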

SLIDE 6

Gaussian Distribution

Often the data lies in clusters (2-D example). Fitting a single Gaussian model may be too broad.

SLIDE 7

Gaussian Distribution

This motivates mixture models, which can fit essentially arbitrary distributions when enough components are used.

SLIDE 8

Gaussian Distribution

1-D example
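To illustrate the kind of 1-D example referred to here, a two-component Gaussian mixture density can be evaluated directly (a sketch with made-up component parameters):

```python
import numpy as np

def gauss_pdf(x, mu, var):
    """Univariate Gaussian density."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def gmm_pdf(x, weights, means, variances):
    """Density of a 1-D Gaussian mixture: sum_k w_k N(x; mu_k, var_k)."""
    return sum(w * gauss_pdf(x, m, v) for w, m, v in zip(weights, means, variances))

x = np.linspace(-6, 6, 200)
p = gmm_pdf(x, weights=[0.4, 0.6], means=[-2.0, 2.5], variances=[1.0, 0.5])
# p has two peaks, one near -2 and one near 2.5 - a single Gaussian cannot capture this.
```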

SLIDE 9

Gaussian Distribution Summary

  • The Gaussian model - a parametric distribution
  • Simple and useful properties.
  • Can model unimodal (single-peak) distributions.
  • MLE gives intuitive results.
  • Issues with the Gaussian model:
  • Multi-modal data
  • Not useful for complex data distributions
  • Need for mixture models
SLIDE 10

Basics of Information Theory

  • Entropy of distribution
  • KL divergence
  • Jensen’s inequality
  • Expectation Maximization Algorithm for MLE
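For reference, the quantities listed above in their standard forms (the slide itself only names the topics); they reappear in the EM derivation:

```latex
H(p) = -\sum_{x} p(x)\log p(x),
\qquad
D_{\mathrm{KL}}(p \,\|\, q) = \sum_{x} p(x)\log\frac{p(x)}{q(x)} \;\ge\; 0,
\qquad
f\big(\mathbb{E}[X]\big) \le \mathbb{E}\big[f(X)\big] \ \text{ for convex } f.
```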
SLIDE 11

Gaussian Mixture Models

A Gaussian Mixture Model (GMM) is defined as a weighted sum of K Gaussian components. The weighting coefficients are non-negative and sum to one.
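In standard notation (reconstructed here, since the slide's equations are missing from this extraction):

```latex
p(x) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad
\pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1.
```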

SLIDE 12

Gaussian Mixture Models

  • Properties of GMM
  • Can model multi-modal data.
  • Identify data clusters.
  • Can model arbitrarily complex data distributions.

The set of parameters for the model are the mixing weights, means and covariances, {π_k, μ_k, Σ_k} for k = 1, ..., K. The number of parameters is KD for the means, KD(D+1)/2 for the covariances, and K − 1 free mixing weights (they must sum to one).
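A quick arithmetic check of this count (a sketch for the full-covariance case; the function name is illustrative):

```python
def gmm_num_params(K, D):
    """Free parameters of a K-component, D-dimensional full-covariance GMM."""
    means = K * D
    covs = K * D * (D + 1) // 2
    weights = K - 1            # mixing weights sum to one
    return means + covs + weights

print(gmm_num_params(K=4, D=3))  # 4*3 + 4*6 + 3 = 39
```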

SLIDE 13

MLE for GMM

  • The log-likelihood function over the entire data in this case will have a logarithm of a summation.
  • Solving for the optimal parameters using MLE for a GMM is therefore not straightforward.
  • We resort to the Expectation Maximization (EM) algorithm (a sketch of the standard updates follows this list).
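As a preview of how EM is typically applied here, a minimal sketch of the standard EM updates for a full-covariance GMM (illustrative code, not from the course; it uses scipy.stats.multivariate_normal for the component densities):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=50, seed=0):
    """Minimal EM for a K-component, full-covariance GMM on N x D data X."""
    N, D = X.shape
    rng = np.random.default_rng(seed)
    # Initialization: random data points as means, shared covariance, uniform weights
    mu = X[rng.choice(N, K, replace=False)]
    sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(D)] * K)
    pi = np.full(K, 1.0 / K)

    for _ in range(n_iters):
        # E-step: responsibilities gamma[n, k] = p(component k | x_n)
        dens = np.stack([pi[k] * multivariate_normal.pdf(X, mu[k], sigma[k])
                         for k in range(K)], axis=1)        # shape (N, K)
        gamma = dens / dens.sum(axis=1, keepdims=True)

        # M-step: weighted MLE updates
        Nk = gamma.sum(axis=0)                               # effective counts per component
        pi = Nk / N
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            Xc = X - mu[k]
            sigma[k] = (gamma[:, k, None] * Xc).T @ Xc / Nk[k] + 1e-6 * np.eye(D)
    return pi, mu, sigma

# Usage (hypothetical data matrix X of shape N x D):
# pi, mu, sigma = em_gmm(X, K=3)
```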
SLIDE 14

Basics of Information Theory

  • Entropy of distribution
  • KL divergence
  • Jensen’s inequality
  • Expectation Maximization Algorithm for MLE