Gaussian Discriminant Analysis


1. Gaussian Discriminant Analysis (material thanks to Andrew Ng @Stanford)

2. Course Map / module 3: generative methods
[course-map diagram; module 3 topics include: coin flips, likelihoods, EM algorithm, GDA, naive bayes, graphical models, applied to artificial data and spam data]
• Gaussian Discriminant Analysis

3. Density Estimation Problem
• P(y|x) = P(y | x_1, x_2, …, x_d) involves the joint (d+1)-dimensional distribution over (x_1, …, x_d, y)
• in practice we cannot estimate this joint directly
• if each feature has 10 buckets and we have 100 features (very reasonable assumptions), then the joint distribution has 10^100 cells, which is impossible to estimate
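The cell count is simply the bucket count raised to the number of features, which makes the blow-up explicit:

$$\underbrace{10 \times 10 \times \cdots \times 10}_{100 \text{ features}} = 10^{100} \text{ cells}$$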

4. How to get around estimating the joint P(x_1, x_2, …, x_d | y)?
• SOLUTION: model/restrict the joint, instead of estimating an arbitrary joint distribution
- for example, with a well-known parametrized form, such as the multi-dimensional Gaussian distribution
- then estimate the parameters of the imposed model
• called Gaussian Discriminant Analysis (when the imposed model is Gaussian)
- easy to implement, due to math tools facilitating Gaussian parameter estimation (mean, covariance)
- multi-dimensional implies a "covariance" matrix instead of a simple variance
- doesn't fit the data in many cases

5. Gaussian Fit
- Idea: fit a parametrized distribution to the histogram (density or counts)
- The Gaussian (normal) density is controlled by the mean and variance:

$$P(x \mid \mu, \sigma^2) = \mathrm{normal}(x, \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

- the best fit is the one that maximizes the likelihood of the data:

$$\log L = \log \prod_{i=1}^{m} P(x_i \mid \mu, \sigma^2) = \sum_{i=1}^{m} \log P(x_i \mid \mu, \sigma^2)$$
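As an illustration, a minimal NumPy sketch of this maximum-likelihood fit (the closed-form MLE for a 1-D Gaussian is the sample mean and the biased sample variance; the synthetic data below is just for demonstration):

```python
import numpy as np

def fit_gaussian_mle(x):
    # MLE for a 1-D Gaussian: the sample mean and the biased
    # sample variance (divide by m, not m - 1).
    mu = x.mean()
    sigma2 = ((x - mu) ** 2).mean()
    return mu, sigma2

def log_likelihood(x, mu, sigma2):
    # Sum of log normal(x_i; mu, sigma2) over the data set.
    return np.sum(-0.5 * np.log(2 * np.pi * sigma2)
                  - (x - mu) ** 2 / (2 * sigma2))

# Usage: fit synthetic data, then report the maximized log likelihood.
x = np.random.default_rng(0).normal(loc=2.0, scale=1.5, size=1000)
mu, sigma2 = fit_gaussian_mle(x)
print(mu, sigma2, log_likelihood(x, mu, sigma2))
```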

6. Let's impose a nice probabilistic model
• Multi-variate normal distribution, θ = (μ, Σ)
- plotted with Σ = identity (i.e., independent variables)
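For reference, the multi-variate normal density with parameters θ = (μ, Σ) in d dimensions is

$$P(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\!\Big( -\tfrac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \Big)$$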

7. Let's impose a nice probabilistic model
• Multi-variate normal distribution
- plotted with Σ = scaled identity (variance only, independent variables):

$$\Sigma = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} \qquad \Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \qquad \Sigma = \begin{bmatrix} 0.6 & 0 \\ 0 & 0.6 \end{bmatrix}$$

8. Let's impose a nice probabilistic model
• Multi-variate normal distribution
- plotted with Σ ≠ identity: dependent variables

9. Let's impose a nice probabilistic model
• Multi-variate normal distribution
- Σ ≠ identity => dependent variables
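A quick numerical companion to slides 7-9, as a sketch assuming NumPy (the diagonal Σ matches slide 7; the off-diagonal Σ is an illustrative choice, since the plotted values on slides 8-9 were not preserved):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.zeros(2)

# Diagonal covariance: independent components (as on slide 7).
sigma_diag = np.array([[2.0, 0.0],
                       [0.0, 2.0]])
# Non-identity, off-diagonal covariance: dependent components (slides 8-9).
sigma_dep = np.array([[1.0, 0.8],
                      [0.8, 1.0]])

x_ind = rng.multivariate_normal(mu, sigma_diag, size=5000)
x_dep = rng.multivariate_normal(mu, sigma_dep, size=5000)

# Sample covariances recover the generating Sigma (up to noise);
# nonzero off-diagonal entries show the dependence between variables.
print(np.cov(x_ind.T))
print(np.cov(x_dep.T))
```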

10. GDA Setup
• multi-variate normal density estimation for each class y (with a common Σ)
• log likelihood of the joint data (see the reconstruction below)
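Written out, the standard two-class GDA setup (following Andrew Ng's CS229 notes, which this deck credits): the label is Bernoulli, each class-conditional is a Gaussian with its own mean and a shared covariance, and the log likelihood is the joint over the training set:

$$y \sim \mathrm{Bernoulli}(\phi), \qquad x \mid y = 0 \sim \mathcal{N}(\mu_0, \Sigma), \qquad x \mid y = 1 \sim \mathcal{N}(\mu_1, \Sigma)$$

$$\ell(\phi, \mu_0, \mu_1, \Sigma) = \log \prod_{i=1}^{m} P(x_i, y_i;\, \phi, \mu_0, \mu_1, \Sigma) = \sum_{i=1}^{m} \big[ \log P(x_i \mid y_i;\, \mu_0, \mu_1, \Sigma) + \log P(y_i;\, \phi) \big]$$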

11. GDA parameter solution
• maximum likelihood for GDA has a closed-form solution!
• it can be derived using differentials
- estimate the mean for each class
- estimate the covariance over the entire training set
- or separately for each class
- no need for gradient descent or other optimizers
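A minimal sketch of that closed-form fit in NumPy (the estimators are the standard MLEs: class prior by counting, per-class means by averaging, and a covariance pooled over the whole training set, matching the common-Σ setup above):

```python
import numpy as np

def fit_gda(X, y):
    # Closed-form maximum-likelihood estimates for two-class GDA
    # with a shared covariance matrix; no iterative optimizer needed.
    m = len(y)
    phi = (y == 1).mean()               # class prior P(y = 1)
    mu0 = X[y == 0].mean(axis=0)        # class-0 mean
    mu1 = X[y == 1].mean(axis=0)        # class-1 mean
    # Pooled covariance: each point's residual from its own class mean.
    mus = np.where((y == 1)[:, None], mu1, mu0)
    R = X - mus
    sigma = R.T @ R / m
    return phi, mu0, mu1, sigma
```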

12. GDA visual classification
• if Σ is common, the two Gaussians are identical except for their means
• the separation is a line of points equidistant to the two means
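To see why that boundary is linear (a short check, assuming the shared Σ and the equal class weights implied by the equidistance claim): setting the two class densities equal, the quadratic term x^T Σ^{-1} x cancels, leaving an equation linear in x:

$$(x - \mu_0)^\top \Sigma^{-1} (x - \mu_0) = (x - \mu_1)^\top \Sigma^{-1} (x - \mu_1) \;\Longleftrightarrow\; 2\,(\mu_1 - \mu_0)^\top \Sigma^{-1} x = \mu_1^\top \Sigma^{-1} \mu_1 - \mu_0^\top \Sigma^{-1} \mu_0$$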
