  1. Machine Learning: Naïve Bayes Model. Rui Xia, Text Mining Group, Nanjing University of Science & Technology. rxia@njust.edu.cn

  2. Naïve Bayes Models
  • A Probabilistic Model
  • A Generative Model
  • Known as the “Naïve” Assumption
  • Suitable for Discrete Distributions
  • Widely used in Text Classification, Natural Language Processing and Pattern Recognition
  Machine Learning Course, NJUST

  3. Generative vs. Discriminative
  • Discriminative Model: models the posterior probability of the class label given the observation, p(y|x).
  • Generative Model: models the joint probability of the class label and the observation, p(x, y), and then uses the Bayes rule, p(y|x) = p(x, y)/p(x), for prediction.

  4. Naïve Bayes Assumption
  • A Mixture Model
    p(x, y = c_k) = p(y = c_k) \, p(x \mid c_k)
    where p(y = c_k) is the class prior probability and p(x \mid c_k) is the class-conditional probability.
  • Bag-of-words (BOW) representation
    x = (w_1, w_2, \ldots, w_{|x|})
    p(x \mid c_k) = p(w_1, w_2, \ldots, w_{|x|} \mid c_k) = \prod_{h=1}^{|x|} p(w_h \mid c_k)
  • There are two event models.
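As a concrete illustration of the BOW representation, a tokenized document can be reduced to a vector of word counts over a fixed vocabulary; word order is discarded, which is exactly what the naïve assumption allows. A minimal sketch (the helper name and toy vocabulary are ours, not from the slides):

```python
from collections import Counter

def bow_counts(doc_tokens, vocab):
    """Map a tokenized document x to its count vector (N(w_1, x), ..., N(w_V, x))."""
    counts = Counter(doc_tokens)
    return [counts[w] for w in vocab]

# The two occurrences of "good" are pooled; their positions are ignored.
vec = bow_counts(["good", "movie", "good"], ["good", "bad", "movie"])
```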

  5. Multinomial Event Model

  6. Model Description
  • Hypothesis
    p(y = c_k) = \pi_k
    p(x \mid c_k) = p(w_1, w_2, \ldots, w_{|x|} \mid c_k) = \prod_{h=1}^{|x|} p(w_h \mid c_k) = \prod_{j=1}^{V} p(w_j \mid c_k)^{N(w_j, x)} = \prod_{j=1}^{V} \theta_{j|k}^{N(w_j, x)}
    where V is the vocabulary size and N(w_j, x) is the number of occurrences of word w_j in document x.
  • Joint Probability
    p(x, y = c_k) = p(c_k) \, p(x \mid c_k) = \pi_k \prod_{j=1}^{V} \theta_{j|k}^{N(w_j, x)}
  • Model Parameters: \pi and \theta
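The joint probability above is normally evaluated in log space, since products of many small probabilities underflow. A hedged sketch under the count-vector representation (function name is ours):

```python
import math

def multinomial_log_joint(count_vec, pi_k, theta_k):
    """log p(x, y=c_k) = log pi_k + sum_j N(w_j, x) * log theta_{j|k}.

    `count_vec` holds N(w_j, x) per vocabulary word; `theta_k` holds theta_{j|k}.
    Words with zero count contribute nothing, so they are skipped.
    """
    return math.log(pi_k) + sum(
        n * math.log(t) for n, t in zip(count_vec, theta_k) if n > 0
    )
```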

  7. Likelihood Function
  • (Joint) Likelihood
    L(\pi, \theta) = \log \prod_{i=1}^{N} p(x_i, y_i)
    = \log \prod_{i=1}^{N} \sum_{k=1}^{C} I(y_i = c_k) \, p(y_i = c_k) \, p(x_i \mid y_i = c_k)
    = \sum_{i=1}^{N} \sum_{k=1}^{C} I(y_i = c_k) \log \big[ p(y_i = c_k) \, p(x_i \mid y_i = c_k) \big]
    = \sum_{i=1}^{N} \sum_{k=1}^{C} I(y_i = c_k) \log \big[ \pi_k \prod_{j=1}^{V} \theta_{j|k}^{N(w_j, x_i)} \big]
    = \sum_{i=1}^{N} \sum_{k=1}^{C} I(y_i = c_k) \big[ \log \pi_k + \sum_{j=1}^{V} N(w_j, x_i) \log \theta_{j|k} \big]
    where N is the number of training documents, C the number of classes, and I(\cdot) the indicator function.

  8. Maximum Likelihood Estimation
  • MLE Formulation
    \max_{\pi, \theta} L(\pi, \theta)
    s.t. \sum_{k=1}^{C} \pi_k = 1; \quad \sum_{j=1}^{V} \theta_{j|k} = 1, \; k = 1, \ldots, C
  • Applying Lagrange multipliers
    \Lambda = L(\pi, \theta) + \alpha \big( 1 - \sum_{k=1}^{C} \pi_k \big) + \sum_{k=1}^{C} \lambda_k \big( 1 - \sum_{j=1}^{V} \theta_{j|k} \big)
    = \sum_{i=1}^{N} \sum_{k=1}^{C} I(y_i = c_k) \big[ \log \pi_k + \sum_{j=1}^{V} N(w_j, x_i) \log \theta_{j|k} \big] + \alpha \big( 1 - \sum_{k=1}^{C} \pi_k \big) + \sum_{k=1}^{C} \lambda_k \big( 1 - \sum_{j=1}^{V} \theta_{j|k} \big)

  9. Closed-form MLE Solution
  • Gradient
    \partial \Lambda / \partial \pi_k = \sum_{i=1}^{N} I(y_i = c_k) / \pi_k - \alpha = 0
    \partial \Lambda / \partial \theta_{j|k} = \sum_{i=1}^{N} I(y_i = c_k) \, N(w_j, x_i) / \theta_{j|k} - \lambda_k = 0
  • MLE Solution
    \pi_k = \frac{\sum_{i=1}^{N} I(y_i = c_k)}{\sum_{k'=1}^{C} \sum_{i=1}^{N} I(y_i = c_{k'})} = \frac{N_k}{N}
    \theta_{j|k} = \frac{\sum_{i=1}^{N} I(y_i = c_k) \, N(w_j, x_i)}{\sum_{j'=1}^{V} \sum_{i=1}^{N} I(y_i = c_k) \, N(w_{j'}, x_i)}
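In code, the closed-form solution reduces to simple counting. A sketch assuming documents are given as BOW count vectors and labels as class indices 0..C-1 (function name and data layout are our own conventions):

```python
def mle_multinomial(X, y, C):
    """Unsmoothed MLE: pi_k = N_k / N; theta_{j|k} is the total count of
    word w_j inside class k, normalized by all word counts in class k."""
    V, N = len(X[0]), len(y)
    pi = [sum(1 for yi in y if yi == k) / N for k in range(C)]
    theta = []
    for k in range(C):
        # Sum N(w_j, x_i) over documents of class k, one entry per word.
        col = [sum(xi[j] for xi, yi in zip(X, y) if yi == k) for j in range(V)]
        total = sum(col)
        theta.append([c / total for c in col])
    return pi, theta
```

Note that any word never seen in class k gets theta_{j|k} = 0 here, which motivates the Laplace smoothing on the next slide.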

  10. Laplace Smoothing
  • To prevent zero probabilities in
    p(x, y = c_k) = \pi_k \prod_{j=1}^{V} \theta_{j|k}^{N(w_j, x)}
  • Laplace Smoothing replaces the MLE estimates
    \pi_k = \frac{\sum_{i=1}^{N} I(y_i = c_k)}{\sum_{k'=1}^{C} \sum_{i=1}^{N} I(y_i = c_{k'})}, \quad \theta_{j|k} = \frac{\sum_{i=1}^{N} I(y_i = c_k) \, N(w_j, x_i)}{\sum_{j'=1}^{V} \sum_{i=1}^{N} I(y_i = c_k) \, N(w_{j'}, x_i)}
    with the smoothed estimates
    \pi_k = \frac{\sum_{i=1}^{N} I(y_i = c_k) + 1}{\sum_{k'=1}^{C} \sum_{i=1}^{N} I(y_i = c_{k'}) + C}, \quad \theta_{j|k} = \frac{\sum_{i=1}^{N} I(y_i = c_k) \, N(w_j, x_i) + 1}{\sum_{j'=1}^{V} \sum_{i=1}^{N} I(y_i = c_k) \, N(w_{j'}, x_i) + V}
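The smoothed estimator changes only the counts: add 1 to every numerator count, and C (for the prior) or V (for the word probabilities) to the denominator. A sketch under the same data layout as before (our own names, not the course code):

```python
def mle_multinomial_laplace(X, y, C):
    """Laplace-smoothed MLE for the multinomial event model."""
    V, N = len(X[0]), len(y)
    pi = [(sum(1 for yi in y if yi == k) + 1) / (N + C) for k in range(C)]
    theta = []
    for k in range(C):
        # +1 per word; summing the incremented column adds V to the denominator.
        col = [sum(xi[j] for xi, yi in zip(X, y) if yi == k) + 1 for j in range(V)]
        total = sum(col)
        theta.append([c / total for c in col])
    return pi, theta
```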

  11. Multi-variate Bernoulli Event Model

  12. Model Description
  • Hypothesis
    p(y = c_k) = \pi_k
    p(x \mid y = c_k) = p(w_1, w_2, \ldots, w_V \mid c_k)
    = \prod_{j=1}^{V} \big[ I(w_j \in x) \, p(w_j \mid c_k) + I(w_j \notin x) \, (1 - p(w_j \mid c_k)) \big]
    = \prod_{j=1}^{V} \big[ I(w_j \in x) \, \mu_{j|k} + I(w_j \notin x) \, (1 - \mu_{j|k}) \big]
  • Joint Probability
    p(x, c_k) = \pi_k \prod_{j=1}^{V} \big[ I(w_j \in x) \, \mu_{j|k} + I(w_j \notin x) \, (1 - \mu_{j|k}) \big]
  • Model Parameters: \pi and \mu
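Evaluating this joint probability for one class, again in log space; here `present` is the binary occurrence vector (I(w_1 ∈ x), ..., I(w_V ∈ x)). A sketch with our own naming:

```python
import math

def bernoulli_log_joint(present, pi_k, mu_k):
    """log p(x, c_k): each vocabulary word contributes log mu_{j|k} if it
    occurs in x and log(1 - mu_{j|k}) if it does not -- unlike the
    multinomial model, absent words also carry evidence."""
    return math.log(pi_k) + sum(
        math.log(m) if b else math.log(1.0 - m) for b, m in zip(present, mu_k)
    )
```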

  13. Likelihood Function
  • (Joint) Likelihood
    L(\pi, \mu) = \log \prod_{i=1}^{N} p(x_i, y_i)
    = \sum_{i=1}^{N} \log \sum_{k=1}^{C} I(y_i = c_k) \, p(x_i, y_i)
    = \sum_{i=1}^{N} \sum_{k=1}^{C} I(y_i = c_k) \log \big[ p(c_k) \prod_{j=1}^{V} \big( I(w_j \in x_i) \, p(w_j \mid c_k) + I(w_j \notin x_i) \, (1 - p(w_j \mid c_k)) \big) \big]
    = \sum_{i=1}^{N} \sum_{k=1}^{C} I(y_i = c_k) \big[ \log \pi_k + \sum_{j=1}^{V} \big( I(w_j \in x_i) \log \mu_{j|k} + I(w_j \notin x_i) \log (1 - \mu_{j|k}) \big) \big]

  14. Maximum Likelihood Estimation
  • MLE Formulation
    \max_{\pi, \mu} L(\pi, \mu)
    s.t. \sum_{k=1}^{C} \pi_k = 1
  • Applying Lagrange multipliers
    \Lambda = L(\pi, \mu) + \alpha \big( 1 - \sum_{k=1}^{C} \pi_k \big)
    = \sum_{i=1}^{N} \sum_{k=1}^{C} I(y_i = c_k) \big[ \log \pi_k + \sum_{j=1}^{V} \big( I(w_j \in x_i) \log \mu_{j|k} + I(w_j \notin x_i) \log (1 - \mu_{j|k}) \big) \big] + \alpha \big( 1 - \sum_{k=1}^{C} \pi_k \big)

  15. Closed-form MLE Solution
  • Gradient
    \partial \Lambda / \partial \pi_k = \sum_{i=1}^{N} I(y_i = c_k) / \pi_k - \alpha = 0
    \partial \Lambda / \partial \mu_{j|k} = \sum_{i=1}^{N} I(y_i = c_k) \big[ I(w_j \in x_i) / \mu_{j|k} - I(w_j \notin x_i) / (1 - \mu_{j|k}) \big] = 0, \; \forall k = 1, \ldots, C
  • MLE Solution
    \pi_k = \frac{\sum_{i=1}^{N} I(y_i = c_k)}{\sum_{k'=1}^{C} \sum_{i=1}^{N} I(y_i = c_{k'})} = \frac{N_k}{N}
    \mu_{j|k} = \frac{\sum_{i=1}^{N} I(y_i = c_k) \, I(w_j \in x_i)}{\sum_{i=1}^{N} I(y_i = c_k)}
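As with the multinomial model, the closed-form solution is just counting: mu_{j|k} is the fraction of class-k documents containing word w_j. A sketch assuming binary occurrence vectors and integer class labels 0..C-1 (names ours):

```python
def mle_bernoulli(X, y, C):
    """Unsmoothed MLE for the multi-variate Bernoulli event model."""
    V, N = len(X[0]), len(y)
    pi, mu = [], []
    for k in range(C):
        Nk = sum(1 for yi in y if yi == k)
        pi.append(Nk / N)
        # Fraction of class-k documents in which word w_j appears.
        mu.append([sum(xi[j] for xi, yi in zip(X, y) if yi == k) / Nk
                   for j in range(V)])
    return pi, mu
```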

  16. Laplace Smoothing
  • To prevent zero probabilities in
    p(x, c_k) = \pi_k \prod_{j=1}^{V} \big[ I(w_j \in x) \, \mu_{j|k} + I(w_j \notin x) \, (1 - \mu_{j|k}) \big]
  • Laplace Smoothing replaces the MLE estimates
    \pi_k = \frac{\sum_{i=1}^{N} I(y_i = c_k)}{\sum_{k'=1}^{C} \sum_{i=1}^{N} I(y_i = c_{k'})}, \quad \mu_{j|k} = \frac{\sum_{i=1}^{N} I(y_i = c_k) \, I(w_j \in x_i)}{\sum_{i=1}^{N} I(y_i = c_k)}
    with the smoothed estimates
    \pi_k = \frac{\sum_{i=1}^{N} I(y_i = c_k) + 1}{\sum_{k'=1}^{C} \sum_{i=1}^{N} I(y_i = c_{k'}) + C}, \quad \mu_{j|k} = \frac{\sum_{i=1}^{N} I(y_i = c_k) \, I(w_j \in x_i) + 1}{\sum_{i=1}^{N} I(y_i = c_k) + 2}
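Here each word has only two outcomes (present or absent), so the smoothed denominator adds 2 rather than V. A sketch under the same assumptions as before:

```python
def mle_bernoulli_laplace(X, y, C):
    """Laplace-smoothed MLE for the multi-variate Bernoulli model:
    +1 in the numerator, +2 in the denominator (two outcomes per word)."""
    V, N = len(X[0]), len(y)
    pi, mu = [], []
    for k in range(C):
        Nk = sum(1 for yi in y if yi == k)
        pi.append((Nk + 1) / (N + C))
        mu.append([(sum(xi[j] for xi, yi in zip(X, y) if yi == k) + 1) / (Nk + 2)
                   for j in range(V)])
    return pi, mu
```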

  17. Text Classification as an Example

  18. Data Sets
  • Training data
  • Class labels
  • Feature vector
  • Test data

  19. Multinomial Naïve Bayes
  • Training
  • Prediction
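Putting training and prediction together for the multinomial event model: a self-contained sketch with Laplace smoothing. The toy documents, vocabulary, and function names below are our own illustrations, not the course data from slide 18:

```python
import math
from collections import Counter

def train_multinomial_nb(docs, labels, vocab, classes):
    """Training: Laplace-smoothed estimates of pi_k and theta_{j|k}."""
    N = len(docs)
    pi, theta = {}, {}
    for k in classes:
        idx = [i for i, y in enumerate(labels) if y == k]
        pi[k] = (len(idx) + 1) / (N + len(classes))
        counts = Counter(tok for i in idx for tok in docs[i])
        total = sum(counts[w] for w in vocab)
        theta[k] = {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}
    return pi, theta

def predict_multinomial_nb(doc, pi, theta, classes):
    """Prediction: argmax_k of log pi_k + sum over tokens of log theta_{j|k};
    out-of-vocabulary tokens are simply skipped."""
    def log_joint(k):
        return math.log(pi[k]) + sum(
            math.log(theta[k][w]) for w in doc if w in theta[k])
    return max(classes, key=log_joint)
```

Because p(x) is the same for every class, the argmax over the log-joint equals the argmax over the posterior p(y|x).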

  20. Multi-variate Bernoulli Naïve Bayes
  • Training
  • Prediction
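The same end-to-end sketch for the multi-variate Bernoulli model, with Laplace smoothing; documents are binary occurrence vectors over the vocabulary (names and toy data are ours):

```python
import math

def train_bernoulli_nb(X, y, classes):
    """Training: Laplace-smoothed pi_k and mu_{j|k} from binary vectors."""
    N, V = len(X), len(X[0])
    pi, mu = {}, {}
    for k in classes:
        idx = [i for i, yi in enumerate(y) if yi == k]
        pi[k] = (len(idx) + 1) / (N + len(classes))
        mu[k] = [(sum(X[i][j] for i in idx) + 1) / (len(idx) + 2)
                 for j in range(V)]
    return pi, mu

def predict_bernoulli_nb(x, pi, mu, classes):
    """Prediction: every vocabulary word contributes, present or absent."""
    def log_joint(k):
        return math.log(pi[k]) + sum(
            math.log(m) if b else math.log(1.0 - m)
            for b, m in zip(x, mu[k]))
    return max(classes, key=log_joint)
```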

  21. Xia-NB Software
  • Functions
    – Written in C++
    – Supports the multinomial and multi-variate Bernoulli event models
    – Laplace smoothing
    – Uniform data format like SVM-light/LibSVM
    – Fast execution with sparse representation
  • Download: https://github.com/NUSTM/XIA-NB

  22. Project
  • Implement the naïve Bayes algorithm with
    – the multinomial event model
    – the multi-variate Bernoulli event model
  • Run the algorithm on the training & testing data given on slide 18.
  • Compare the naïve Bayes algorithm with logistic regression (using bag-of-words to represent the data).

  23. Questions?
