1. A Probabilistic View of Machine Learning (2/2)
   CMSC 422, Marine Carpuat (marine@cs.umd.edu)
   Some slides based on material by Tom Mitchell

2. What we know so far…
   • Bayes rule
   • A probabilistic view of machine learning
     – If we know the data generating distribution, we can define the Bayes optimal classifier
     – Under the iid assumption
   • How to estimate a probability distribution from data?
     – Maximum likelihood estimation

3. Today
   • How to compute Maximum Likelihood Estimates
     – For Bernoulli and Categorical distributions
   • Naïve Bayes classifier

4. Maximum Likelihood Estimates
   Given a data set D of iid flips, which contains \alpha_1 ones and \alpha_0 zeros:

   P_\theta(D) = \theta^{\alpha_1} (1 - \theta)^{\alpha_0}

   \theta_{MLE} = \arg\max_\theta P_\theta(D) = \frac{\alpha_1}{\alpha_1 + \alpha_0}
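As a quick sketch of this formula (the flip sequence below is a made-up example, not from the slides), the Bernoulli MLE is simply the fraction of ones in the data:

```python
# Bernoulli MLE: theta_MLE = alpha_1 / (alpha_1 + alpha_0)
flips = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # hypothetical iid coin flips

alpha_1 = sum(flips)            # number of ones
alpha_0 = len(flips) - alpha_1  # number of zeros
theta_mle = alpha_1 / (alpha_1 + alpha_0)
print(theta_mle)  # 0.7
```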

5. Maximum Likelihood Estimates
   K-sided die (Categorical distribution): \forall k, P(X = k) = \theta_k
   Given a data set D of iid rolls, which contains x_k outcomes of k for each k:

   P_\theta(D) = \prod_{k=1}^{K} \theta_k^{x_k}

   \theta_{MLE} = \arg\max_\theta P_\theta(D)
                = \arg\max_\theta \log P_\theta(D)
                = \arg\max_\theta \sum_{k=1}^{K} x_k \log \theta_k

   Problem: this objective lacks constraints!

6. Maximum Likelihood Estimates
   A constrained optimization problem:

   \theta_{MLE} = \arg\max_\theta \sum_{k=1}^{K} x_k \log \theta_k
   \quad \text{with} \quad \sum_{k=1}^{K} \theta_k = 1

   K-sided die: \forall k, P(X = k) = \theta_k
   How to solve it? Use Lagrange multipliers to turn it into an unconstrained objective (on board).
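The board derivation with Lagrange multipliers goes roughly as follows (this is the standard argument, filled in here for completeness):

```latex
% Lagrangian for the constrained categorical MLE
\mathcal{L}(\theta, \lambda)
  = \sum_{k=1}^{K} x_k \log \theta_k
  + \lambda \Big( 1 - \sum_{k=1}^{K} \theta_k \Big)

% Set the partial derivative with respect to each \theta_k to zero:
\frac{\partial \mathcal{L}}{\partial \theta_k}
  = \frac{x_k}{\theta_k} - \lambda = 0
  \quad\Rightarrow\quad \theta_k = \frac{x_k}{\lambda}

% The constraint \sum_k \theta_k = 1 then fixes \lambda = \sum_k x_k, so
\theta_k = \frac{x_k}{\sum_{k'} x_{k'}}
```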

7. Maximum Likelihood Estimates
   The parameters that maximize the likelihood of the data are given by:

   \theta_k = \frac{x_k}{\sum_{k'} x_{k'}}

   K-sided die: \forall k, P(X = k) = \theta_k
   This is the relative frequency of rolls where side k comes up!
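The closed-form result is just counting. A minimal sketch with a hypothetical sequence of rolls of a 6-sided die:

```python
from collections import Counter

rolls = [1, 3, 3, 6, 2, 3, 1, 6]  # hypothetical iid rolls, K = 6
counts = Counter(rolls)
n = len(rolls)

# theta_k = x_k / sum_{k'} x_{k'}: the relative frequency of each side
theta = {k: counts[k] / n for k in range(1, 7)}
print(theta[3])  # 0.375 (side 3 came up 3 times out of 8)
```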

8. Today
   • How to compute Maximum Likelihood Estimates
     – For Bernoulli and Categorical distributions
   • Naïve Bayes classifier

9. Let’s learn a classifier by learning P(Y|X)
   • Goal: learn a classifier P(Y|X)
   • Prediction: given an example x, predict

     \hat{y} = \arg\max_y P(Y = y \mid X = x)

10. Parameters for P(X,Y) vs. P(Y|X)
    Example: Y = Wealth, X = <Gender, Hours_worked>
    • Joint probability distribution P(X,Y)
    • Conditional probability distribution P(Y|X)

11. Parameters for P(X,Y) and P(Y|X)
    • P(Y|X) requires estimating fewer parameters than P(X,Y)
    • But that is still too many parameters in practice!
    • So we need simplifying assumptions to make estimation more practical

12. Naïve Bayes Assumption
    Naïve Bayes assumes:

    P(X_1, X_2, \ldots, X_d \mid Y) = \prod_{i=1}^{d} P(X_i \mid Y)

    i.e., that X_i and X_j are conditionally independent given Y, for all i \neq j.

13. Conditional Independence
    • Definition: X is conditionally independent of Y given Z if P(X|Y,Z) = P(X|Z)
    • Recall that X is independent of Y if P(X|Y) = P(X)

14. Naïve Bayes classifier

    \hat{y} = \arg\max_y P(Y = y \mid X = x)
            = \arg\max_y P(Y = y) \, P(X = x \mid Y = y)    (Bayes rule; the denominator P(X = x) does not depend on y and can be dropped)
            = \arg\max_y P(Y = y) \prod_{i=1}^{d} P(X_i = x_i \mid Y = y)    (conditional independence assumption)
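In practice this argmax is computed in log space to avoid numerical underflow when multiplying many small probabilities. A minimal sketch, with made-up toy parameters for two Boolean features (all names and numbers here are illustrative, not from the slides):

```python
import math

# Hypothetical learned parameters for a 2-class, 2-feature problem.
# priors[y] = P(Y = y); likelihoods[y][i][v] = P(X_i = v | Y = y)
priors = {0: 0.6, 1: 0.4}
likelihoods = {
    0: [{0: 0.8, 1: 0.2}, {0: 0.5, 1: 0.5}],
    1: [{0: 0.3, 1: 0.7}, {0: 0.1, 1: 0.9}],
}

def predict(x):
    """Return argmax_y log P(Y=y) + sum_i log P(X_i = x_i | Y = y)."""
    best_y, best_score = None, -math.inf
    for y, prior in priors.items():
        score = math.log(prior)
        for i, xi in enumerate(x):
            score += math.log(likelihoods[y][i][xi])
        if score > best_score:
            best_y, best_score = y, score
    return best_y

print(predict([1, 1]))  # 1
```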

15. How many parameters do we need to learn?
    • To describe P(Y)?
    • To describe P(X = <X_1, X_2, \ldots, X_d> \mid Y)?
      – Without the conditional independence assumption?
      – With the conditional independence assumption?
    (Suppose all random variables are Boolean.)
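A worked count under the slide's Boolean assumption (the choice d = 30 is arbitrary): P(Y) takes 1 parameter; without conditional independence, P(X|Y) needs a full joint table over the 2^d feature vectors for each class; with it, just one Bernoulli parameter per feature per class:

```python
d = 30  # number of Boolean features (arbitrary example)

params_p_y = 1               # P(Y = 1) determines P(Y = 0)
without_ci = 2 * (2**d - 1)  # full joint over X, for each of the 2 classes
with_ci = 2 * d              # one parameter per feature per class
print(without_ci, with_ci)   # 2147483646 60
```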

16. Training a Naïve Bayes classifier
    Let's assume discrete X_i and Y.

    TrainNaïveBayes(Data):
      for each value y_k of Y:
        estimate \pi_k = P(Y = y_k)
          = \frac{\#\text{examples for which } Y = y_k}{\#\text{examples}}
      for each value x_{ij} of X_i:
        estimate \theta_{ijk} = P(X_i = x_{ij} \mid Y = y_k)
          = \frac{\#\text{examples for which } X_i = x_{ij} \text{ and } Y = y_k}{\#\text{examples for which } Y = y_k}
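The training procedure above is pure counting. A minimal self-contained sketch (the function name, data, and labels below are my own illustrative choices):

```python
from collections import Counter

def train_naive_bayes(examples):
    """examples: list of (x, y) pairs, where x is a tuple of discrete feature values."""
    n = len(examples)
    y_counts = Counter(y for _, y in examples)
    # pi_k = P(Y = y_k): fraction of examples with label y_k
    priors = {y: c / n for y, c in y_counts.items()}
    # theta_ijk = P(X_i = x_ij | Y = y_k): count (feature index, value, label) triples
    joint = Counter()
    for x, y in examples:
        for i, xi in enumerate(x):
            joint[(i, xi, y)] += 1
    likelihoods = {key: c / y_counts[key[2]] for key, c in joint.items()}
    return priors, likelihoods

data = [((1, 0), "rich"), ((1, 1), "rich"), ((0, 0), "poor"), ((0, 1), "poor")]
priors, likelihoods = train_naive_bayes(data)
print(priors["rich"])               # 0.5
print(likelihoods[(0, 1, "rich")])  # 1.0 (X_0 = 1 in both "rich" examples)
```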

17. Naïve Bayes Wrap-up
    • A simple classifier that performs well in practice
    • Subtleties:
      – Often the X_i are not really conditionally independent
      – What if the Maximum Likelihood estimate for P(X_i|Y) is zero?

18. What you should know
    • The Naïve Bayes classifier
      – Conditional independence assumption
      – How to train it
      – How to make predictions
      – How does it relate to other classifiers we know? [HW]
    • Fundamental Machine Learning concepts
      – iid assumption
      – Bayes optimal classifier
