SLIDE 1

Gaussian Discriminant Analysis

material thanks to Andrew Ng @Stanford

SLIDE 2

Course Map / module3

  • Gaussian Discriminant Analysis

[course-map figure: pipeline stages DATA / PROBLEM / REPRESENTATION / LEARNING / PERFORMANCE; module 3 (generative methods) covers likelihoods, GDA, naive Bayes, graphical models, and the EM algorithm]

SLIDE 3

Density Estimation Problem

  • P(y|x) = P(y|x1,x2,…,xd): estimating it requires the joint (d+1)-dim distribution over (x1,…,xd, y)
  • … in practice we cannot estimate this joint directly
  • if each feature has 10 buckets, and we have 100 features (very reasonable assumptions)
  • then the joint distribution has 10^100 cells: impossible to estimate from data (see the sketch below)
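A quick sanity check of that count, using the slide's bucket and feature numbers (a throwaway sketch, nothing deck-specific):

```python
# Cells needed to tabulate the joint over 100 features with 10 buckets each.
buckets, features = 10, 100
cells = buckets ** features       # 10**100, a googol
print(f"{cells:.1e} cells")       # ~1e+100; no dataset comes remotely close
```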

SLIDE 4

how to get around estimating the joint P(x1,x2,…,xd|y)?

  • SOLUTION: model/restrict the joint, instead of estimating any possible such joint distribution
  • for example, with a well-known parametrized form, such as the multi-dim gaussian distribution
  • estimate the parameters of the imposed model
  • called Gaussian Discriminant Analysis (when the imposed model is gaussian)
  • easy to implement, thanks to math tools that make estimating the gaussian parameters (mean, covariance) straightforward; see the sketch below
  • multi-dim implies a "covariance" matrix instead of a simple variance
  • doesn't fit the data in many cases
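A minimal sketch of the idea in NumPy (the toy data and shapes here are illustrative assumptions, not from the deck):

```python
import numpy as np

# Toy data: 500 samples with 3 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))

# Instead of tabulating the joint, impose a gaussian and estimate
# only its parameters: a mean vector and a covariance *matrix*.
mu = X.mean(axis=0)              # shape (3,)
Sigma = np.cov(X, rowvar=False)  # shape (3, 3); replaces a simple variance
```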
SLIDE 5

Gaussian Fit

  • Idea: fit a parametrized distribution to the histogram (density or counts)
  • The gaussian (normal) density is controlled by its mean and variance
  • the best fit is the one that maximizes the likelihood of the data

$$
P(x \mid \mu, \sigma^2) = \mathrm{normal}(x, \mu, \sigma^2)
= \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}
$$

$$
\log L = \log \prod_{i=1}^{m} P(x_i \mid \mu, \sigma^2)
= \sum_{i=1}^{m} \log P(x_i \mid \mu, \sigma^2)
$$
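A minimal NumPy sketch of this fit on toy data: the maximum-likelihood estimates for a 1-D gaussian are the sample mean and the biased (1/m) variance, and the log likelihood is the sum above:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)   # toy 1-D sample

# Maximum-likelihood fit of a gaussian.
mu_hat = x.mean()
var_hat = x.var()        # note 1/m, not 1/(m-1): the MLE is biased

# log L = sum_i log P(x_i | mu, sigma^2)
log_lik = np.sum(-0.5 * np.log(2 * np.pi * var_hat)
                 - (x - mu_hat) ** 2 / (2 * var_hat))
print(mu_hat, var_hat, log_lik)
```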

SLIDE 6

Let's impose a nice probabilistic model

  • Multi-variate normal distribution
  • plotted with Σ = identity (or independent variables)
  • θ = (µ, Σ)
SLIDE 7

Let's impose a nice probabilistic model

  • Multi-variate normal distribution
  • plotted with Σ = variance only (diagonal), i.e. independent variables
  • Σ = diag(1, 1)
  • Σ = diag(0.6, 0.6)
  • Σ = diag(2, 2)

SLIDE 8

Let's impose a nice probabilistic model

  • Multi-variate normal distribution
  • plotted with Σ ≠ identity
  • dependent variables
SLIDE 9

Let's impose a nice probabilistic model

  • Multi-variate normal distribution
  • Σ ≠ identity => dependent variables (see the sketch below)
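A sketch evaluating these densities with scipy.stats.multivariate_normal; the diagonal Σ values are the ones listed on slide 7, while the off-diagonal 0.8 in the correlated case is an assumed example (the plots do not state it):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.zeros(2)
settings = {
    "identity":        np.eye(2),               # slide 6: independent, unit variance
    "diag(0.6, 0.6)":  np.diag([0.6, 0.6]),     # slide 7: tighter peak
    "diag(2, 2)":      np.diag([2.0, 2.0]),     # slide 7: wider spread
    "correlated":      np.array([[1.0, 0.8],    # slides 8-9: off-diagonal terms,
                                 [0.8, 1.0]]),  # 0.8 is an assumed value
}
point = np.array([0.5, 0.5])
for name, Sigma in settings.items():
    print(name, multivariate_normal(mean=mu, cov=Sigma).pdf(point))
```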

SLIDE 10

GDA Setup

  • multivariate normal density estimation for each class y (common Σ)
  • log likelihood (the standard form is reproduced below)
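The log-likelihood formula itself did not survive extraction; in the standard formulation from the CS229 notes this deck credits, the imposed model and its log likelihood are:

$$
y \sim \mathrm{Bernoulli}(\phi), \qquad
x \mid y = 0 \sim \mathcal{N}(\mu_0, \Sigma), \qquad
x \mid y = 1 \sim \mathcal{N}(\mu_1, \Sigma)
$$

$$
\ell(\phi, \mu_0, \mu_1, \Sigma)
= \log \prod_{i=1}^{m} P(x_i, y_i \mid \phi, \mu_0, \mu_1, \Sigma)
= \sum_{i=1}^{m} \Big( \log P(x_i \mid y_i) + \log P(y_i) \Big)
$$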
SLIDE 11

GDA parameter solution

  • max likelihood for GDA has a closed-form solution!
  • it can be derived using differentials
  • estimate the mean for each class
  • estimate the covariance for the entire training set, or separately for each class
  • no need for Gradient Descent or other optimizers (see the sketch below)
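A minimal sketch of that closed-form fit in NumPy (the function name and the 0/1 label convention are assumptions, not from the deck); this is the common-Σ variant from the setup slide:

```python
import numpy as np

def fit_gda(X, y):
    """Closed-form maximum-likelihood estimates for GDA with a shared Sigma."""
    m = len(y)
    phi = y.mean()                        # P(y = 1)
    mu0 = X[y == 0].mean(axis=0)          # per-class means
    mu1 = X[y == 1].mean(axis=0)
    # Shared covariance: center each point by its own class mean,
    # then average the outer products over the whole training set.
    centered = X - np.where(y[:, None] == 1, mu1, mu0)
    Sigma = centered.T @ centered / m
    return phi, mu0, mu1, Sigma
```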
SLIDE 12

GDA visual classification

  • if Σ is common, the two gaussians are identical except for their means
  • the separation is a line of points equidistant (in the Mahalanobis metric of Σ) from the two means (see the sketch below)
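With a shared Σ the log-posterior ratio is linear in x, which is exactly why the boundary is a line. A sketch computing it from fitted parameters (gda_boundary is a hypothetical helper, meant to pair with the fit_gda sketch above):

```python
import numpy as np

def gda_boundary(phi, mu0, mu1, Sigma):
    """Linear boundary w.x + b = 0 implied by shared-Sigma GDA."""
    Sinv = np.linalg.inv(Sigma)
    w = Sinv @ (mu1 - mu0)
    b = 0.5 * (mu0 @ Sinv @ mu0 - mu1 @ Sinv @ mu1) + np.log(phi / (1 - phi))
    return w, b   # predict class 1 when w @ x + b > 0
```

When phi = 0.5 the log-prior term vanishes and the line passes through the midpoint of the two means, matching the "equidistant points" picture on this slide.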