Generative Models and Naïve Bayes




  1. Generative Models and Naïve Bayes Ke Chen Reading: [14.3, EA], [3.5, KPM], [1.5.4, CMB] COMP24111 Machine Learning

  2. Outline
  • Background and Probability Basics
  • Probabilistic Classification Principle
    – Probabilistic discriminative models
    – Generative models and their application to classification
    – MAP and converting generative into discriminative
  • Naïve Bayes – a generative model
    – Principle and algorithms (discrete vs. continuous)
    – Example: Play Tennis
  • Zero Conditional Probability and Treatment
  • Summary

  3. Background
  • There are three methodologies:
    a) Model a classification rule directly
       Examples: k-NN, linear classifiers, SVM, neural nets, ...
    b) Model the probability of class membership given input data
       Examples: logistic regression, probabilistic neural nets (softmax), ...
    c) Build a probabilistic model of the data within each class
       Examples: naïve Bayes, model-based classifiers, ...
  • An important ML taxonomy for learning models:
    probabilistic vs. non-probabilistic models, and discriminative vs. generative models

  4. Background
  • Based on this taxonomy, the essence of different learning models (classifiers) can be seen more clearly:

                     Probabilistic                        Non-Probabilistic
    Discriminative   Logistic regression,                 k-NN, linear classifiers,
                     probabilistic neural nets, ...       SVM, neural networks, ...
    Generative       Naïve Bayes,                         N.A. (?)
                     model-based (e.g., GMM), ...

  5. Probability Basics
  • Prior, conditional and joint probability for random variables
    – Prior probability: P(x)
    – Conditional probability: P(x1 | x2), P(x2 | x1)
    – Joint probability: x = (x1, x2), P(x) = P(x1, x2)
    – Relationship: P(x1, x2) = P(x2 | x1) P(x1) = P(x1 | x2) P(x2)
    – Independence: P(x2 | x1) = P(x2), P(x1 | x2) = P(x1), P(x1, x2) = P(x1) P(x2)
  • Bayesian rule
      Posterior = (Likelihood × Prior) / Evidence:   P(c | x) = P(x | c) P(c) / P(x)
      (the left-hand side is the discriminative quantity; the right-hand side is built from generative quantities)
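To make the Bayesian rule above concrete, here is a minimal sketch with made-up likelihood and prior values (the numbers are assumptions, not from the slides); it shows that the evidence P(x) simply normalises the products P(x | c) P(c) into posteriors that sum to one.

```python
# Bayes rule P(c|x) = P(x|c) P(c) / P(x) for a two-class problem.
# The numbers below are assumed for illustration only.
likelihood = {"c1": 0.60, "c2": 0.10}   # P(x | c)
prior      = {"c1": 0.30, "c2": 0.70}   # P(c)

# Evidence P(x) = sum over classes of P(x|c) P(c): a normalising constant.
evidence = sum(likelihood[c] * prior[c] for c in prior)

posterior = {c: likelihood[c] * prior[c] / evidence for c in prior}
print(posterior)   # {'c1': 0.72, 'c2': 0.28} -> posteriors sum to 1
```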

  6. Probabilistic Classification Principle
  • Establishing a probabilistic model for classification
    – Discriminative model: P(c | x), where c = c1, ..., cL and x = (x1, ..., xn)
  • To train a discriminative classifier, regardless of whether it is probabilistic or non-probabilistic, all training examples of the different classes must be used jointly to build up a single discriminative classifier.
  • A probabilistic classifier outputs L probabilities, P(c1 | x), ..., P(cL | x), for the L class labels, whereas a non-probabilistic classifier outputs a single label for x = (x1, x2, ..., xn).

  7. Probabilistic Classification Principle
  • Establishing a probabilistic model for classification (cont.)
    – Generative model (must be probabilistic): P(x | c), where c = c1, ..., cL and x = (x1, ..., xn)
  • L probabilistic models, P(x | c1), ..., P(x | cL), have to be trained independently.
  • Each model is trained only on the examples with the corresponding label (a generative probabilistic model for class 1, ..., a generative probabilistic model for class L).
  • For a given input x = (x1, x2, ..., xn), the L models output L probabilities.
  • "Generative" means that such a model can produce data subject to the modelled distribution via sampling.

  8. Probabilistic Classification Principle
  • Maximum A Posteriori (MAP) classification rule
    – For an input x, find the largest of the L probabilities P(c1 | x), ..., P(cL | x) output by a discriminative probabilistic classifier.
    – Assign x to the label c* if P(c* | x) is the largest.
  • Generative classification with the MAP rule
    – Apply the Bayesian rule to convert the class-conditional probabilities into posterior probabilities:
        P(ci | x) = P(x | ci) P(ci) / P(x)  ∝  P(x | ci) P(ci),   for i = 1, 2, ..., L
      (P(x) is a common factor for all L probabilities and can be dropped.)
    – Then apply the MAP rule to assign a label.
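As an illustration of the MAP rule with a generative model, the following sketch (all numbers assumed, not from the slides) picks the label with the largest score P(x | ci) P(ci); this gives the same answer as comparing the posteriors, because P(x) is a common factor.

```python
# MAP decision from generative quantities: argmax over classes of P(x|c) P(c).
# The values below are assumed for illustration only.
class_conditionals = {"c1": 0.020, "c2": 0.005, "c3": 0.010}  # P(x | ci)
priors             = {"c1": 0.2,   "c2": 0.5,   "c3": 0.3}    # P(ci)

scores = {c: class_conditionals[c] * priors[c] for c in priors}
c_star = max(scores, key=scores.get)   # label with the largest P(x|c) P(c)
print(scores, "->", c_star)            # -> c1
```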

  9. Naïve Bayes
  • Bayes classification
      P(c | x) ∝ P(x | c) P(c) = P(x1, ..., xn | c) P(c),   for c = c1, ..., cL
    Difficulty: learning the joint probability P(x1, ..., xn | c) is infeasible!
  • Naïve Bayes classification
    – Assume all input features are class-conditionally independent:
        P(x1, x2, ..., xn | c) = P(x1 | x2, ..., xn, c) P(x2, ..., xn | c)
                               = P(x1 | c) P(x2, ..., xn | c)            (applying the independence assumption)
                               = P(x1 | c) P(x2 | c) ... P(xn | c)
    – Apply the MAP classification rule: assign x' = (a1, a2, ..., an) to c* if
        [P(a1 | c*) ... P(an | c*)] P(c*) > [P(a1 | c) ... P(an | c)] P(c),   for all c ≠ c*,  c = c1, ..., cL
      (the left-hand product is the estimate of P(a1, ..., an | c*); the right-hand product is the estimate of P(a1, ..., an | c))

  10. Naïve Bayes
  • Algorithm: Discrete-valued Features
  • Learning Phase: given a training set S,
      For each target value ci (ci = c1, ..., cL)
        P^(ci) ← estimate P(ci) with examples in S;
        For every feature value x_jk of each feature x_j (j = 1, ..., F; k = 1, ..., N_j)
          P^(x_j = x_jk | ci) ← estimate P(x_jk | ci) with examples in S;
  • Test Phase: given an unknown instance x' = (a'1, ..., a'n),
      assign the label c* to x' if
        [P^(a'1 | c*) ... P^(a'n | c*)] P^(c*) > [P^(a'1 | ci) ... P^(a'n | ci)] P^(ci),   ci ≠ c*,  ci = c1, ..., cL
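A possible rendering of the discrete algorithm above in Python, kept deliberately minimal: the function and variable names are illustrative (not from the slides), the training set S is assumed to be a list of (feature-tuple, label) pairs, and no smoothing is applied, so an unseen feature value yields a zero conditional probability (the issue flagged in the outline as "Zero Conditional Probability").

```python
from collections import Counter, defaultdict

def train_naive_bayes(S):
    """Learning phase. S: list of (feature_tuple, label), e.g. (("Sunny", "Hot"), "No")."""
    labels = [c for _, c in S]
    n = len(S)
    prior = {c: cnt / n for c, cnt in Counter(labels).items()}        # P^(ci)
    counts = defaultdict(lambda: defaultdict(Counter))                # counts[c][j][value]
    for x, c in S:
        for j, v in enumerate(x):
            counts[c][j][v] += 1
    n_per_class = Counter(labels)
    # P^(x_j = v | ci) estimated as the relative frequency within class ci
    cond = {c: {j: {v: cnt / n_per_class[c] for v, cnt in vals.items()}
                for j, vals in feats.items()}
            for c, feats in counts.items()}
    return prior, cond

def predict(prior, cond, x_new):
    """Test phase: MAP label via the product of per-feature conditionals and the prior."""
    scores = {}
    for c in prior:
        score = prior[c]
        for j, v in enumerate(x_new):
            score *= cond[c][j].get(v, 0.0)   # zero if the value was never seen with class c
        scores[c] = score
    return max(scores, key=scores.get)        # label c* with the largest score
```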

  11. Example • Play Tennis

  12. Example
  • Learning Phase (conditional probability tables estimated from the training data)

    Outlook     Play=Yes  Play=No      Temperature  Play=Yes  Play=No
    Sunny       2/9       3/5          Hot          2/9       2/5
    Overcast    4/9       0/5          Mild         4/9       2/5
    Rain        3/9       2/5          Cool         3/9       1/5

    Humidity    Play=Yes  Play=No      Wind         Play=Yes  Play=No
    High        3/9       4/5          Strong       3/9       3/5
    Normal      6/9       1/5          Weak         6/9       2/5

    P(Play=Yes) = 9/14    P(Play=No) = 5/14

  13. Example
  • Test Phase
    – Given a new instance, predict its label:
        x' = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
    – Look up the tables obtained in the learning phase:
        P(Outlook=Sunny | Play=Yes) = 2/9        P(Outlook=Sunny | Play=No) = 3/5
        P(Temperature=Cool | Play=Yes) = 3/9     P(Temperature=Cool | Play=No) = 1/5
        P(Humidity=High | Play=Yes) = 3/9        P(Humidity=High | Play=No) = 4/5
        P(Wind=Strong | Play=Yes) = 3/9          P(Wind=Strong | Play=No) = 3/5
        P(Play=Yes) = 9/14                       P(Play=No) = 5/14
    – Decision making with the MAP rule:
        P(Yes | x') ∝ [P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)] P(Play=Yes) = 0.0053
        P(No  | x') ∝ [P(Sunny|No)  P(Cool|No)  P(High|No)  P(Strong|No)]  P(Play=No)  = 0.0206
      Since P(Yes | x') < P(No | x'), we label x' as "No".
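For a quick check of the arithmetic above, the following few lines recompute both scores directly from the looked-up probabilities.

```python
# Naive Bayes / MAP scores [P(a1|c)...P(an|c)] P(c) for the Play Tennis test instance.
p_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)   # Sunny, Cool, High, Strong | Yes
p_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)   # Sunny, Cool, High, Strong | No
print(round(p_yes, 4), round(p_no, 4))           # 0.0053 vs 0.0206 -> predict "No"
```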

  14. Naïve Bayes
  • Algorithm: Continuous-valued Features
    – A continuous-valued feature can take infinitely many values, so its conditional probability cannot be tabulated.
    – The conditional probability is often modelled with the normal distribution:
        P^(x_j | c_i) = (1 / (√(2π) σ_ji)) exp( −(x_j − μ_ji)² / (2 σ_ji²) )
        μ_ji: mean (average) of the values of feature x_j over the examples for which c = c_i
        σ_ji: standard deviation of the values of feature x_j over the examples for which c = c_i
  • Learning Phase: for X = (X_1, ..., X_F) and C = c_1, ..., c_L,
      output F × L normal distributions and the priors P(C = c_i), i = 1, ..., L.
  • Test Phase: given an unknown instance X' = (a'_1, ..., a'_n),
    – instead of looking up tables, calculate the conditional probabilities with the normal distributions obtained in the learning phase;
    – apply the MAP rule to assign a label (the same as in the discrete case).
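Below is a minimal sketch of the continuous-valued case, assuming real-valued feature vectors. The function names are illustrative rather than from the slides, and a practical implementation would at least guard against a zero standard deviation and work with log-probabilities for numerical stability.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x (sigma assumed > 0)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def fit_gaussian_nb(S):
    """Learning phase. S: list of (real_valued_feature_vector, label) pairs."""
    by_class = {}
    for x, c in S:
        by_class.setdefault(c, []).append(x)
    n = len(S)
    prior, params = {}, {}
    for c, rows in by_class.items():
        prior[c] = len(rows) / n
        params[c] = []
        for col in zip(*rows):                         # values of one feature within class c
            mu = sum(col) / len(col)
            var = sum((v - mu) ** 2 for v in col) / len(col)
            params[c].append((mu, math.sqrt(var)))     # (mu_ji, sigma_ji)
    return prior, params

def predict_gaussian_nb(prior, params, x_new):
    """Test phase: multiply the prior by the per-feature Gaussian densities, take the MAP label."""
    scores = {c: prior[c] for c in prior}
    for c in prior:
        for (mu, sigma), v in zip(params[c], x_new):
            scores[c] *= gaussian_pdf(v, mu, sigma)
    return max(scores, key=scores.get)
```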
