Introduction to Machine Learning Classification: Naive Bayes - PowerPoint PPT Presentation

  1. Introduction to Machine Learning Classification: Naive Bayes

     Learning goals:
     - Understand the idea of Naive Bayes
     - Understand in which sense Naive Bayes is a special QDA model

     [Figure: scatter plot of the two response classes (a, b) in the (x1, x2) plane]

  2. NAIVE BAYES CLASSIFIER

     NB is a generative multiclass technique. Remember: we use Bayes' theorem and only need p(x | y = k) to compute the posterior as

     $$\pi_k(x) = P(y = k \mid x) = \frac{P(x \mid y = k)\, P(y = k)}{P(x)} = \frac{p(x \mid y = k)\, \pi_k}{\sum_{j=1}^{g} p(x \mid y = j)\, \pi_j}$$

     NB is based on a simple conditional independence assumption: the features are conditionally independent given class y,

     $$p(x \mid y = k) = p((x_1, x_2, \ldots, x_p) \mid y = k) = \prod_{j=1}^{p} p(x_j \mid y = k).$$

     So we only need to specify and estimate the distribution p(x_j | y = k), which is considerably simpler as it is univariate.
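To make the factorization concrete, here is a minimal Python sketch (not part of the slides) of how the posterior could be computed from class priors and per-feature conditional densities; the function nb_posterior and the toy Gaussian setup are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def nb_posterior(x, priors, cond_densities):
    """Posterior pi_k(x) for one observation x under Naive Bayes.

    priors         : array of shape (g,) with the class priors pi_k
    cond_densities : list of g functions; cond_densities[k](x) returns the
                     vector (p(x_1 | y=k), ..., p(x_p | y=k))
    """
    # Conditional independence: p(x | y=k) is a product over features.
    joint = np.array([priors[k] * np.prod(cond_densities[k](x))
                      for k in range(len(priors))])
    # Normalize by p(x) = sum_j p(x | y=j) pi_j.
    return joint / joint.sum()

# Toy example with made-up Gaussian feature densities for two classes:
priors = np.array([0.5, 0.5])
dens = [lambda x: norm.pdf(x, loc=[2.0, 2.0]),   # class a
        lambda x: norm.pdf(x, loc=[8.0, 8.0])]   # class b
print(nb_posterior(np.array([2.5, 3.0]), priors, dens))
```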

  3. NB: NUMERICAL FEATURES

     We use a univariate Gaussian for p(x_j | y = k) and estimate $(\mu_j, \sigma_j^2)$ in the standard manner. Because of

     $$p(x \mid y = k) = \prod_{j=1}^{p} p(x_j \mid y = k),$$

     the joint conditional density is Gaussian with a diagonal but non-isotropic covariance structure, potentially different across classes. Hence, NB is a (specific) QDA model with a quadratic decision boundary.

     [Figure: scatter plot of the two response classes (a, b) in the (x1, x2) plane]
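As a sketch of how the Gaussian case could be estimated and used for prediction (illustrative names and toy data; scikit-learn's GaussianNB implements the same idea):

```python
import numpy as np
from scipy.stats import norm

def fit_gaussian_nb(X, y):
    """Estimate class priors and per-feature Gaussian parameters per class."""
    params = {}
    for k in np.unique(y):
        Xk = X[y == k]
        # Diagonal covariance: one (mu, sigma^2) per feature and class.
        params[k] = (len(Xk) / len(X), Xk.mean(axis=0), Xk.var(axis=0))
    return params

def predict_gaussian_nb(params, x):
    """Pick the class maximizing log pi_k + sum_j log p(x_j | y=k)."""
    scores = {k: np.log(prior) + norm.logpdf(x, mu, np.sqrt(var)).sum()
              for k, (prior, mu, var) in params.items()}
    return max(scores, key=scores.get)

X = np.array([[1.0, 2.0], [1.2, 1.8], [7.9, 8.1], [8.2, 7.7]])
y = np.array(["a", "a", "b", "b"])
print(predict_gaussian_nb(fit_gaussian_nb(X, y), np.array([7.5, 8.0])))  # "b"
```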

  4. NB: CATEGORICAL FEATURES

     We use a categorical distribution for p(x_j | y = k) and estimate the probabilities p_kjm that, in class k, the j-th feature has value m, i.e. x_j = m, simply by counting the frequencies:

     $$p(x_j \mid y = k) = \prod_m p_{kjm}^{[x_j = m]}$$

     Because of the simple conditional independence structure, it is also very easy to deal with mixed numerical/categorical feature spaces.
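A minimal sketch of this counting step for a single categorical feature (the function name and toy data are made up):

```python
import numpy as np

def estimate_p_kjm(xj, y):
    """Relative frequency of each feature value m within each class k."""
    probs = {}
    for k in np.unique(y):
        xj_k = xj[y == k]
        values, counts = np.unique(xj_k, return_counts=True)
        probs[k] = dict(zip(values, counts / len(xj_k)))
    return probs

# Toy feature "color" observed for two classes:
xj = np.array(["red", "blue", "red", "red", "blue"])
y  = np.array(["a", "a", "a", "b", "b"])
print(estimate_p_kjm(xj, y))  # {'a': {'blue': 0.33.., 'red': 0.66..}, 'b': ...}
```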

  5. LAPLACE SMOOTHING

     If a given class and feature value never occur together in the training data, the frequency-based probability estimate will be zero. This is problematic because a single zero wipes out all information in the other probabilities when they are multiplied. A simple numerical correction is to set these zero probabilities to a small positive value, which regularizes against this case.
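The standard concrete form of this correction is additive (Laplace) smoothing: add a pseudo-count alpha > 0 to every value count before normalizing. A minimal sketch, with alpha = 1 as the usual default:

```python
import numpy as np

def smoothed_p_kjm(counts, alpha=1.0):
    """counts: frequencies of the M possible feature values within one class.
    Returns probability estimates with no exact zeros."""
    # alpha > 0 guarantees that an unseen class/value combination cannot
    # zero out the whole product of per-feature probabilities.
    return (counts + alpha) / (counts.sum() + alpha * len(counts))

print(smoothed_p_kjm(np.array([3, 0, 1])))  # [0.57.. 0.14.. 0.28..], no zeros
```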

  6. NAIVE BAYES: APPLICATION AS SPAM FILTER

     In the late 1990s, Naive Bayes became popular in e-mail spam filter programs. Word counts were used as features to detect spam mails (e.g., "Viagra" occurs often in spam). The independence assumption implies that the occurrences of two words in a mail are uncorrelated. This seems naive ("Viagra" is more likely to occur in context with "Buy now" than with "flower"), but it requires fewer parameters, therefore generalizes better, and often works well in practice.
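A tiny illustration of such a filter with scikit-learn (the example mails are invented; multinomial NB over word counts is one common realization of this setup):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

mails  = ["buy viagra now", "cheap viagra buy now",
          "meeting at noon", "flowers for the meeting"]
labels = ["spam", "spam", "ham", "ham"]

# Word counts as features; each word is treated as conditionally
# independent of the others given the class.
vec = CountVectorizer()
X = vec.fit_transform(mails)

clf = MultinomialNB()  # applies Laplace smoothing (alpha=1.0) by default
clf.fit(X, labels)
print(clf.predict(vec.transform(["buy cheap viagra"])))  # ['spam']
```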
