Naive Bayes and Gaussian Bayes Classifier



  1. Naive Bayes and Gaussian Bayes Classifier. Ladislav Rampasek; slides by Mengye Ren and others. February 22, 2016.

  2. Naive Bayes. Bayes' Rule:

  $$p(t \mid x) = \frac{p(x \mid t)\, p(t)}{p(x)}$$

  Naive Bayes assumption:

  $$p(x \mid t) = \prod_{j=1}^{D} p(x_j \mid t)$$

  Likelihood function:

  $$L(\theta) = p(x, t \mid \theta) = p(x \mid t, \theta)\, p(t \mid \theta)$$
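  As a quick worked instance of Bayes' rule (the numbers are hypothetical, chosen only for illustration): suppose $p(\text{spam}) = 0.3$ and a feature $x$ with $p(x \mid \text{spam}) = 0.3$ and $p(x \mid \text{not spam}) = 0.01$. Then

  $$p(\text{spam} \mid x) = \frac{0.3 \cdot 0.3}{0.3 \cdot 0.3 + 0.01 \cdot 0.7} = \frac{0.09}{0.097} \approx 0.93.$$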

  3. Example: Spam Classification. Each vocabulary word is one feature dimension. We encode each email as a binary feature vector $x \in \{0, 1\}^{|V|}$, where $x_j = 1$ iff vocabulary word $j$ appears in the email. We want to model the probability of each word $x_j$ appearing in an email, given that the email is spam or not. Example words: "$10,000", "Toronto", "Piazza", etc. Idea: use a Bernoulli distribution to model $p(x_j \mid t)$, e.g. $p(\text{"\$10,000"} \mid \text{spam}) = 0.3$.
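  A minimal sketch of this encoding in Python; the vocabulary and the email text are hypothetical examples, not from the slides:

```python
# Hypothetical vocabulary V; in practice it is built from the training corpus.
vocab = ["$10,000", "toronto", "piazza", "winner", "lecture"]

def encode(email_text, vocab):
    """Binary bag-of-words: x_j = 1 iff vocabulary word j appears in the email."""
    words = set(email_text.lower().split())
    return [1 if w in words else 0 for w in vocab]

print(encode("you are a winner of $10,000", vocab))  # -> [1, 0, 0, 1, 0]
```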

  4. Bernoulli Naive Bayes. Assuming all data points $x^{(i)}$ are i.i.d. samples and $p(x_j \mid t)$ follows a Bernoulli distribution with parameter $\mu_{jt}$:

  $$p(x^{(i)} \mid t^{(i)}) = \prod_{j=1}^{D} \mu_{j t^{(i)}}^{x_j^{(i)}} \left(1 - \mu_{j t^{(i)}}\right)^{1 - x_j^{(i)}}$$

  $$p(t \mid x) \propto \prod_{i=1}^{N} p(t^{(i)})\, p(x^{(i)} \mid t^{(i)}) = \prod_{i=1}^{N} p(t^{(i)}) \prod_{j=1}^{D} \mu_{j t^{(i)}}^{x_j^{(i)}} \left(1 - \mu_{j t^{(i)}}\right)^{1 - x_j^{(i)}}$$

  where $p(t) = \pi_t$. The parameters $\pi_t$ and $\mu_{jt}$ can be learnt using maximum likelihood.
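  In code, classifying with this model multiplies many small probabilities, so one works in log space. A minimal prediction sketch, assuming priors ($\pi_t$) and mu ($\mu_{jt}$, indexed here as mu[t, j]) have already been estimated, e.g. by the MLE derived on the next slides; the function name is mine:

```python
import numpy as np

def bernoulli_nb_predict(x, priors, mu):
    """Pick argmax_t of log pi_t + sum_j [x_j log mu_jt + (1 - x_j) log(1 - mu_jt)].

    x:      shape (D,) binary feature vector
    priors: shape (K,) class priors pi_t
    mu:     shape (K, D) Bernoulli parameters, mu[t, j] = p(x_j = 1 | t)
    """
    log_joint = (np.log(priors)
                 + x @ np.log(mu.T)               # sum_j x_j log mu_jt
                 + (1 - x) @ np.log(1 - mu.T))    # sum_j (1 - x_j) log(1 - mu_jt)
    return int(np.argmax(log_joint))
```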

  5. Derivation of maximum likelihood estimator (MLE). Let $\theta = [\mu, \pi]$. Then

  $$\log L(\theta) = \log p(x, t \mid \theta) = \sum_{i=1}^{N} \left[ \log \pi_{t^{(i)}} + \sum_{j=1}^{D} \left( x_j^{(i)} \log \mu_{j t^{(i)}} + (1 - x_j^{(i)}) \log(1 - \mu_{j t^{(i)}}) \right) \right]$$

  Want: $\arg\max_\theta \log L(\theta)$ subject to $\sum_k \pi_k = 1$.

  6. Derivation of maximum likelihood estimator (MLE). Take the derivative w.r.t. $\mu$:

  $$\frac{\partial \log L(\theta)}{\partial \mu_{jk}} = \sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k) \left[ \frac{x_j^{(i)}}{\mu_{jk}} - \frac{1 - x_j^{(i)}}{1 - \mu_{jk}} \right] = 0$$

  $$\sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k) \left[ x_j^{(i)} (1 - \mu_{jk}) - \mu_{jk} (1 - x_j^{(i)}) \right] = 0$$

  $$\mu_{jk} \sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k) = \sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k)\, x_j^{(i)}$$

  $$\mu_{jk} = \frac{\sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k)\, x_j^{(i)}}{\sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k)}$$

  7. Derivation of maximum likelihood estimator (MLE). Use a Lagrange multiplier to derive $\pi$:

  $$\frac{\partial}{\partial \pi_k} \left( \log L(\theta) + \lambda \sum_{\kappa} \pi_{\kappa} \right) = 0 \;\Rightarrow\; \frac{1}{\pi_k} \sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k) + \lambda = 0$$

  $$\pi_k = \frac{\sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k)}{-\lambda}$$

  Apply the constraint $\sum_k \pi_k = 1 \Rightarrow \lambda = -N$, so

  $$\pi_k = \frac{\sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k)}{N}$$
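  The two closed-form estimates above amount to per-class counting. A minimal fitting sketch, assuming X is an N x D binary matrix and t a length-N vector of labels in {0, ..., K-1} (the helper name is mine):

```python
import numpy as np

def bernoulli_nb_fit(X, t, K):
    """Closed-form MLE: pi_k is the class frequency, mu_jk the per-class mean of x_j."""
    N, D = X.shape
    priors = np.zeros(K)
    mu = np.zeros((K, D))
    for k in range(K):
        mask = (t == k)
        priors[k] = mask.sum() / N        # pi_k = (1/N) * sum_i 1(t_i = k)
        mu[k] = X[mask].mean(axis=0)      # mu_jk = average of x_j within class k
    return priors, mu
```

  In practice one usually adds Laplace smoothing so that no $\mu_{jk}$ is exactly 0 or 1, which would otherwise make the log-likelihood $-\infty$ for unseen word/class combinations.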

  8. Spam Classification Demo.

  9. Gaussian Bayes Classifier. Instead of assuming conditional independence of the $x_j$, we model $p(x \mid t)$ as a Gaussian distribution; the dependence relations among the $x_j$ are encoded in the covariance matrix. The multivariate Gaussian distribution is

  $$f(x) = \frac{1}{\sqrt{(2\pi)^D \det(\Sigma)}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)$$

  where $\mu$ is the mean, $\Sigma$ the covariance matrix, and $D = \dim(x)$.
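  A direct transcription of this density in Python (scipy.stats.multivariate_normal computes the same quantity; this sketch just mirrors the formula above, and the function name is mine):

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Multivariate normal density, following the formula above."""
    D = x.shape[0]
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** D * np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm
```

  Using np.linalg.solve instead of explicitly forming $\Sigma^{-1}$ is the standard, numerically safer choice.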

  10. Derivation of maximum likelihood estimator (MLE). Let $\theta = [\mu, \Sigma, \pi]$ and $Z = \sqrt{(2\pi)^D \det(\Sigma)}$, so that

  $$p(x \mid t) = \frac{1}{Z} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)$$

  $$\log L(\theta) = \log p(x, t \mid \theta) = \log p(t \mid \theta) + \log p(x \mid t, \theta) = \sum_{i=1}^{N} \left[ \log \pi_{t^{(i)}} - \log Z_{t^{(i)}} - \frac{1}{2} (x^{(i)} - \mu_{t^{(i)}})^T \Sigma_{t^{(i)}}^{-1} (x^{(i)} - \mu_{t^{(i)}}) \right]$$

  Want: $\arg\max_\theta \log L(\theta)$ subject to $\sum_k \pi_k = 1$.

  11. Derivation of maximum likelihood estimator (MLE). Take the derivative w.r.t. $\mu$:

  $$\frac{\partial \log L}{\partial \mu_k} = \sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k)\, \Sigma_k^{-1} (x^{(i)} - \mu_k) = 0$$

  $$\mu_k = \frac{\sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k)\, x^{(i)}}{\sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k)}$$

  12. Derivation of maximum likelihood estimator (MLE). Take the derivative w.r.t. $\Sigma^{-1}$ (not $\Sigma$). Useful identities:

  $$\frac{\partial \det(A)}{\partial A} = \det(A)\, A^{-T}, \qquad \det(A^{-1}) = \det(A)^{-1}, \qquad \frac{\partial\, x^T A x}{\partial A} = x x^T, \qquad \Sigma^T = \Sigma$$

  $$\frac{\partial \log L}{\partial \Sigma_k^{-1}} = \sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k) \left[ -\frac{\partial \log Z_k}{\partial \Sigma_k^{-1}} - \frac{1}{2} (x^{(i)} - \mu_k)(x^{(i)} - \mu_k)^T \right] = 0$$

  13. Derivation of maximum likelihood estimator (MLE). With $Z_k = \sqrt{(2\pi)^D \det(\Sigma_k)} = (2\pi)^{D/2} \det(\Sigma_k^{-1})^{-1/2}$:

  $$\frac{\partial \log Z_k}{\partial \Sigma_k^{-1}} = \frac{1}{Z_k} \frac{\partial Z_k}{\partial \Sigma_k^{-1}} = \frac{(2\pi)^{D/2}}{Z_k} \frac{\partial \det(\Sigma_k^{-1})^{-1/2}}{\partial \Sigma_k^{-1}} = \frac{(2\pi)^{D/2}}{Z_k} \left( -\frac{1}{2} \right) \det(\Sigma_k^{-1})^{-3/2} \det(\Sigma_k^{-1})\, \Sigma_k^T = -\frac{1}{2} \Sigma_k$$

  $$\frac{\partial \log L}{\partial \Sigma_k^{-1}} = \sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k) \left[ \frac{1}{2} \Sigma_k - \frac{1}{2} (x^{(i)} - \mu_k)(x^{(i)} - \mu_k)^T \right] = 0$$

  $$\Sigma_k = \frac{\sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k)\, (x^{(i)} - \mu_k)(x^{(i)} - \mu_k)^T}{\sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k)}$$

  14. Derivation of maximum likelihood estimator (MLE). As before,

  $$\pi_k = \frac{\sum_{i=1}^{N} \mathbb{1}(t^{(i)} = k)}{N}$$

  (same as in the Bernoulli case).
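  As in the Bernoulli case, all three estimates reduce to per-class statistics. A minimal fitting sketch, assuming X is an N x D real matrix and t a length-N vector of integer labels in {0, ..., K-1} (the helper name is mine):

```python
import numpy as np

def gaussian_bayes_fit(X, t, K):
    """Closed-form MLE: pi_k, per-class mean mu_k, per-class covariance Sigma_k."""
    N, D = X.shape
    priors = np.zeros(K)
    means = np.zeros((K, D))
    covs = np.zeros((K, D, D))
    for k in range(K):
        Xk = X[t == k]
        priors[k] = len(Xk) / N
        means[k] = Xk.mean(axis=0)
        diff = Xk - means[k]
        covs[k] = diff.T @ diff / len(Xk)   # MLE divides by N_k, not N_k - 1
    return priors, means, covs
```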

  15. Gaussian Bayes Classifier Demo.

  16. Gaussian Bayes Classifier. If we constrain $\Sigma$ to be diagonal, then $p(x \mid t)$ factorizes into a product of the $p(x_j \mid t)$:

  $$p(x \mid t) = \frac{1}{\sqrt{(2\pi)^D \det(\Sigma_t)}} \exp\left( -\frac{1}{2} \sum_{j=1}^{D} \frac{(x_j - \mu_{jt})^2}{\Sigma_{t,jj}} \right) = \prod_{j=1}^{D} \frac{1}{\sqrt{2\pi \Sigma_{t,jj}}} \exp\left( -\frac{(x_j - \mu_{jt})^2}{2 \Sigma_{t,jj}} \right) = \prod_{j=1}^{D} p(x_j \mid t)$$

  A diagonal covariance matrix therefore satisfies the naive Bayes assumption.
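  A quick numerical sanity check of this factorization; the mean, diagonal covariance, and test point are random, hypothetical data:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

rng = np.random.default_rng(0)
mu = rng.normal(size=3)
var = rng.uniform(0.5, 2.0, size=3)          # diagonal entries Sigma_{t,jj}
x = rng.normal(size=3)

joint = multivariate_normal.pdf(x, mean=mu, cov=np.diag(var))
product = np.prod(norm.pdf(x, loc=mu, scale=np.sqrt(var)))
assert np.isclose(joint, product)            # the joint equals the product of 1-d densities
```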

  17. Gaussian Bayes Classifier. Case 1: the covariance matrix is shared among classes, $p(x \mid t) = \mathcal{N}(x \mid \mu_t, \Sigma)$. Case 2: each class has its own covariance, $p(x \mid t) = \mathcal{N}(x \mid \mu_t, \Sigma_t)$.

  18. Gaussian Bayes Binary Classifier Decision Boundary. If the covariance is shared between the classes, setting $p(x, t = 1) = p(x, t = 0)$ gives

  $$\log \pi_1 - \frac{1}{2}(x - \mu_1)^T \Sigma^{-1}(x - \mu_1) = \log \pi_0 - \frac{1}{2}(x - \mu_0)^T \Sigma^{-1}(x - \mu_0)$$

  $$C + x^T \Sigma^{-1} x - 2 \mu_1^T \Sigma^{-1} x + \mu_1^T \Sigma^{-1} \mu_1 = x^T \Sigma^{-1} x - 2 \mu_0^T \Sigma^{-1} x + \mu_0^T \Sigma^{-1} \mu_0$$

  $$2 (\mu_0 - \mu_1)^T \Sigma^{-1} x - \left( \mu_0^T \Sigma^{-1} \mu_0 - \mu_1^T \Sigma^{-1} \mu_1 \right) = C \;\Rightarrow\; a^T x - b = 0$$

  The quadratic terms cancel, so the decision boundary is a linear function of $x$ (a hyperplane in general).
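  The boundary coefficients can be read straight off this derivation. A sketch, where mu0, mu1, Sigma, pi0, pi1 stand for fitted parameters and the function name is mine:

```python
import numpy as np

def shared_cov_boundary(mu0, mu1, Sigma, pi0, pi1):
    """Coefficients (a, b) of the linear boundary a^T x - b = 0."""
    Sigma_inv = np.linalg.inv(Sigma)
    a = Sigma_inv @ (mu0 - mu1)
    b = 0.5 * (mu0 @ Sigma_inv @ mu0 - mu1 @ Sigma_inv @ mu1) + np.log(pi1 / pi0)
    # Points x with a @ x > b have the higher joint probability for class 0.
    return a, b
```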

  19. Relation to Logistic Regression. We can write the posterior distribution $p(t = 0 \mid x)$ as

  $$p(t = 0 \mid x) = \frac{p(x, t = 0)}{p(x, t = 0) + p(x, t = 1)} = \frac{\pi_0\, \mathcal{N}(x \mid \mu_0, \Sigma)}{\pi_0\, \mathcal{N}(x \mid \mu_0, \Sigma) + \pi_1\, \mathcal{N}(x \mid \mu_1, \Sigma)}$$

  $$= \left\{ 1 + \frac{\pi_1}{\pi_0} \exp\left[ -\frac{1}{2}(x - \mu_1)^T \Sigma^{-1}(x - \mu_1) + \frac{1}{2}(x - \mu_0)^T \Sigma^{-1}(x - \mu_0) \right] \right\}^{-1}$$

  $$= \left\{ 1 + \exp\left[ \log \frac{\pi_1}{\pi_0} + (\mu_1 - \mu_0)^T \Sigma^{-1} x + \frac{1}{2} \left( \mu_0^T \Sigma^{-1} \mu_0 - \mu_1^T \Sigma^{-1} \mu_1 \right) \right] \right\}^{-1} = \frac{1}{1 + \exp(-w^T x - b)}$$

  so the posterior of a shared-covariance Gaussian Bayes classifier has exactly the logistic regression form.
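  Reading $w$ and $b$ off the last line gives a direct mapping from the Gaussian parameters to the sigmoid form. A sketch, with mu0, mu1, Sigma, pi0, pi1 assumed fitted and the function name mine:

```python
import numpy as np

def gaussian_to_logistic(mu0, mu1, Sigma, pi0, pi1):
    """Map shared-covariance Gaussian parameters to the sigmoid form of p(t=0 | x)."""
    Sigma_inv = np.linalg.inv(Sigma)
    w = Sigma_inv @ (mu0 - mu1)
    b = np.log(pi0 / pi1) + 0.5 * (mu1 @ Sigma_inv @ mu1 - mu0 @ Sigma_inv @ mu0)
    return w, b   # p(t=0 | x) = 1 / (1 + exp(-(w @ x + b)))
```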

  20. Gaussian Bayes Binary Classifier Decision Boundary. If the covariance is not shared between the classes, $p(x, t = 1) = p(x, t = 0)$ gives

  $$\log \pi_1 - \frac{1}{2}(x - \mu_1)^T \Sigma_1^{-1}(x - \mu_1) = \log \pi_0 - \frac{1}{2}(x - \mu_0)^T \Sigma_0^{-1}(x - \mu_0)$$

  $$x^T \left( \Sigma_1^{-1} - \Sigma_0^{-1} \right) x - 2 \left( \mu_1^T \Sigma_1^{-1} - \mu_0^T \Sigma_0^{-1} \right) x + \left( \mu_1^T \Sigma_1^{-1} \mu_1 - \mu_0^T \Sigma_0^{-1} \mu_0 \right) = C \;\Rightarrow\; x^T Q x - 2 b^T x + c = 0$$

  The decision boundary is a quadratic function; in the 2-d case it is a conic section (an ellipse, a parabola, or a hyperbola).
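  The quadratic coefficients follow the same pattern. A sketch with fitted parameters assumed; it also folds in the per-class normalizers ($\log \det \Sigma_k$) and priors that the constant $C$ absorbs above, and the function name is mine:

```python
import numpy as np

def quadratic_boundary(mu0, mu1, Sigma0, Sigma1, pi0, pi1):
    """Coefficients of the boundary x^T Q x - 2 b^T x + c = 0 for unshared covariances."""
    S0, S1 = np.linalg.inv(Sigma0), np.linalg.inv(Sigma1)
    Q = S1 - S0
    b = S1 @ mu1 - S0 @ mu0
    c = (mu1 @ S1 @ mu1 - mu0 @ S0 @ mu0
         - 2 * np.log(pi1 / pi0)
         - np.log(np.linalg.det(Sigma0) / np.linalg.det(Sigma1)))
    # x^T Q x - 2 b^T x + c > 0 means class 0 has the higher joint probability.
    return Q, b, c
```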

  21. Thanks!
