MLE/MAP + Naïve Bayes

10-601 Introduction to Machine Learning
Machine Learning Department, School of Computer Science, Carnegie Mellon University
MLE/MAP + Naïve Bayes
Matt Gormley
Lecture 5
February 1, 2016
MLE / MAP Readings:
• "Estimating Probabilities" (Mitchell, 2016)
Naïve Bayes Readings:
• "Generative and Discriminative Classifiers: Naive Bayes and Logistic Regression" (Mitchell, 2016)
• Murphy 3
• Bishop --
• HTF --
• Mitchell 6.1-6.10

Reminders
• Background Exercises (Homework 1)
  – Release: Wed, Jan. 25
  – Due: Wed, Feb. 1 at 5:30pm
  – ONLY HW1: Collaboration questions not required
• Homework 2: Naive Bayes
  – Release: Wed, Feb. 1
  – Due: Mon, Feb. 13 at 5:30pm

MLE / MAP Outline
• Generating Data
  – Natural (stochastic) data
  – Synthetic data
  – Why synthetic data?
  – Examples: Multinomial, Bernoulli, Gaussian
• Data Likelihood
  – Independent and Identically Distributed (i.i.d.)
  – Example: Dice Rolls
• Learning from Data (Frequentist)
  – Principle of Maximum Likelihood Estimation (MLE)
  – Optimization for MLE
  – Examples: 1D and 2D optimization
  – Example: MLE of Multinomial
  – Aside: Method of Lagrange Multipliers
• Learning from Data (Bayesian)
  – Maximum a posteriori (MAP) estimation
  – Optimization for MAP
  – Example: MAP of Bernoulli-Beta
[Margin labels on the slide: "Last Lecture" beside the earlier items, "This Lecture" beside the later items]

Learning from Data (Frequentist)
Whiteboard
  – Aside: Method of Lagrange Multipliers
  – Example: MLE of Multinomial
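The whiteboard derivation itself is not captured in this transcript. A standard version might read as follows, assuming N i.i.d. draws from a distribution over J outcomes with parameters θ_j and observed counts N_j (these symbols are assumptions introduced here, not taken from the slides):

$$
\begin{aligned}
\ell(\theta) &= \sum_{j=1}^{J} N_j \log \theta_j
  \quad \text{subject to} \quad \sum_{j=1}^{J} \theta_j = 1 \\
\mathcal{L}(\theta, \lambda) &= \sum_{j=1}^{J} N_j \log \theta_j
  - \lambda \Big( \sum_{j=1}^{J} \theta_j - 1 \Big) \\
\frac{\partial \mathcal{L}}{\partial \theta_j} &= \frac{N_j}{\theta_j} - \lambda = 0
  \;\Rightarrow\; \theta_j = \frac{N_j}{\lambda} \\
\sum_{j=1}^{J} \theta_j = 1 &\;\Rightarrow\; \lambda = \sum_{j=1}^{J} N_j = N
  \;\Rightarrow\; \hat{\theta}_j^{\text{MLE}} = \frac{N_j}{N}
\end{aligned}
$$

The multiplier λ enforces the sum-to-one constraint, and the resulting estimate is simply the observed fraction of each outcome.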

Learning from Data (Bayesian)
Whiteboard
  – Maximum a posteriori (MAP) estimation
  – Optimization for MAP
  – Example: MAP of Bernoulli-Beta
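The Bernoulli-Beta example is worked on the whiteboard and is not in the transcript. One common way to write it, assuming a Beta(α, β) prior on the Bernoulli parameter φ and data with N_1 ones and N_0 zeros (α, β, N_1, N_0 are notation assumed here):

$$
\begin{aligned}
p(\phi \mid \mathcal{D}) &\propto p(\mathcal{D} \mid \phi)\, p(\phi)
  \propto \phi^{N_1} (1-\phi)^{N_0} \cdot \phi^{\alpha-1} (1-\phi)^{\beta-1} \\
\log p(\phi \mid \mathcal{D}) &= (N_1 + \alpha - 1) \log \phi
  + (N_0 + \beta - 1) \log (1-\phi) + \text{const} \\
\frac{d}{d\phi} \log p(\phi \mid \mathcal{D}) &= \frac{N_1 + \alpha - 1}{\phi}
  - \frac{N_0 + \beta - 1}{1-\phi} = 0 \\
\hat{\phi}^{\text{MAP}} &= \frac{N_1 + \alpha - 1}{N_1 + N_0 + \alpha + \beta - 2}
\end{aligned}
$$

This closed form is the interior mode, so it assumes N_1 + α > 1 and N_0 + β > 1. With α = β = 1 (a uniform prior) it reduces to the MLE N_1 / (N_1 + N_0).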

Takeaways
• One view of what ML is trying to accomplish is function approximation
• The principle of maximum likelihood estimation provides an alternate view of learning
• Synthetic data can help debug ML algorithms
• Probability distributions can be used to model real data that occurs in the world (don't worry, we'll make our distributions more interesting soon!)

Naïve Bayes Outline
• Probabilistic (Generative) View of Classification
  – Decision rule for probability model
• Real-world Dataset
  – Economist vs. Onion articles
  – Document → bag-of-words → binary feature vector
• Naive Bayes: Model
  – Generating synthetic "labeled documents"
  – Definition of model
  – Naive Bayes assumption
  – Counting # of parameters with / without NB assumption
• Naïve Bayes: Learning from Data
  – Data likelihood
  – MLE for Naive Bayes
  – MAP for Naive Bayes
• Visualizing Gaussian Naive Bayes

Today's Goal
To define a generative model of emails of two different classes (e.g. spam vs. not spam)

Spam News
The Economist vs. The Onion

Real-world Dataset
Whiteboard
  – Economist vs. Onion articles
  – Document → bag-of-words → binary feature vector
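The document → bag-of-words → binary feature vector pipeline can be sketched roughly as below. The tokenizer, the function name `binary_bag_of_words`, the vocabulary, and the sample sentence are all hypothetical choices for illustration, not the course's actual preprocessing.

```python
import re

def binary_bag_of_words(document, vocabulary):
    """Return a 0/1 vector whose m-th entry is 1 if vocabulary[m] occurs in the document."""
    # Lowercase and split into word tokens; word order and counts are discarded.
    tokens = set(re.findall(r"[a-z']+", document.lower()))
    return [1 if word in tokens else 0 for word in vocabulary]

# Hypothetical vocabulary and document, for illustration only.
vocabulary = ["economy", "market", "area", "man"]
doc = "Area man says the market is fine. The market disagrees."
x = binary_bag_of_words(doc, vocabulary)
print(x)  # [0, 1, 1, 1]
```

Note that "market" appears twice in the document but still maps to a single 1, which is what makes the features binary rather than counts.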

Naive Bayes: Model
Whiteboard
  – Generating synthetic "labeled documents"
  – Definition of model
  – Naive Bayes assumption
  – Counting # of parameters with / without NB assumption

Model 1: Bernoulli Naïve Bayes
Flip weighted coin. If HEADS, flip each red coin; if TAILS, flip each blue coin. Each red coin corresponds to an x_m.
y    x_1  x_2  x_3  …  x_M
0    1    1    0    …  1
1    0    0    1    …  1
1    1    1    1    …  1
0    0    0    1    …  1
0    1    0    1    …  0
1    1    1    0    …  0
We can generate data in this fashion. Though in practice we never would since our data is given. Instead, this provides an explanation of how the data was generated (albeit a terrible one).
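The coin-flipping story can be simulated directly. The sketch below assumes that HEADS corresponds to y = 1, and the parameter values `phi` and `theta` are made up for illustration; neither assumption comes from the slides.

```python
import random

def sample_bernoulli_nb(phi, theta, n, seed=0):
    """Draw n (y, x) pairs: y ~ Bernoulli(phi), then x_m ~ Bernoulli(theta[y][m])."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # Flip the weighted coin to pick the class (assumed: heads means y = 1).
        y = 1 if rng.random() < phi else 0
        # Flip every coin belonging to that class, one coin per feature.
        x = [1 if rng.random() < p else 0 for p in theta[y]]
        data.append((y, x))
    return data

# Hypothetical parameters: theta[y][m] is P(x_m = 1 | y).
phi = 0.5
theta = {0: [0.2, 0.7, 0.9], 1: [0.8, 0.5, 0.1]}
samples = sample_bernoulli_nb(phi, theta, n=6)
for y, x in samples:
    print(y, x)
```

Fixing the seed makes the synthetic dataset reproducible, which is convenient when using generated data to debug a learning algorithm.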

Naive Bayes: Model
Whiteboard
  – Generating synthetic "labeled documents"
  – Definition of model
  – Naive Bayes assumption
  – Counting # of parameters with / without NB assumption
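The parameter count is done on the whiteboard. A small sketch of the usual argument, assuming a binary label and M binary features (the function names are hypothetical):

```python
def num_params_naive_bayes(m):
    """1 parameter for P(Y), plus one Bernoulli parameter per feature per class."""
    return 1 + 2 * m

def num_params_full_joint(m):
    """1 parameter for P(Y), plus a full table over the 2**m feature vectors
    for each class (each table has 2**m - 1 free entries, since it sums to 1)."""
    return 1 + 2 * (2 ** m - 1)

for m in (3, 10, 20):
    print(m, num_params_naive_bayes(m), num_params_full_joint(m))
```

The count grows linearly in M under the Naive Bayes assumption and exponentially without it, which is the practical motivation for making the assumption.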

What's wrong with the Naïve Bayes Assumption?
The features might not be independent!!
• Example 1:
  – If a document contains the word "Donald", it's extremely likely to contain the word "Trump"
  – These are not independent!
• Example 2:
  – If the petal width is very high, the petal length is also likely to be very high

Naïve Bayes: Learning from Data
Whiteboard
  – Data likelihood
  – MLE for Naive Bayes
  – MAP for Naive Bayes
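As a rough illustration of MAP for Naive Bayes, the sketch below places an independent Beta(alpha, beta) prior on every feature parameter and keeps the class prior at its MLE. The function name, the choice of prior, and the toy data are assumptions made here; the whiteboard treatment may differ.

```python
def map_bernoulli_nb(X, y, alpha=2.0, beta=2.0):
    """MAP estimates of theta[c][k] = P(x_k = 1 | y = c) under a Beta(alpha, beta) prior.

    The class prior phi = P(y = 1) is estimated by MLE for simplicity.
    """
    num_features = len(X[0])
    phi = sum(y) / len(y)
    theta = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        theta[c] = [
            # Mode of the Beta posterior: (count + alpha - 1) / (n_c + alpha + beta - 2).
            (sum(r[k] for r in rows) + alpha - 1) / (len(rows) + alpha + beta - 2)
            for k in range(num_features)
        ]
    return phi, theta

# Toy data, invented for illustration.
X = [[1, 0], [1, 1], [0, 0], [0, 1]]
y = [1, 1, 0, 0]
phi, theta = map_bernoulli_nb(X, y)
print(phi, theta)
```

With alpha = beta = 2 this is the same as add-one smoothing: a feature never seen with a class gets a small nonzero probability instead of exactly 0, which the MLE would assign.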

VISUALIZING NAÏVE BAYES
Slides in this section from William Cohen (10-601B, Spring 2016)

Fisher Iris Dataset
Fisher (1936) used 150 measurements of flowers from 3 different species: Iris setosa (0), Iris virginica (1), Iris versicolor (2) collected by Anderson (1936)
Species  Sepal Length  Sepal Width  Petal Length  Petal Width
0        4.3           3.0          1.1           0.1
0        4.9           3.6          1.4           0.1
0        5.3           3.7          1.5           0.2
1        4.9           2.4          3.3           1.0
1        5.7           2.8          4.1           1.3
1        6.3           3.3          4.7           1.6
1        6.7           3.0          5.0           1.7
Full dataset: https://en.wikipedia.org/wiki/Iris_flower_data_set
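To make the Gaussian case concrete, here is a minimal sketch of Gaussian Naive Bayes fitted by MLE to the petal length and petal width columns of the seven rows shown above. The function names and the two query points are hypothetical, and seven rows are far too few for a serious model.

```python
import math

# (species code, [petal length, petal width]) for the seven rows on the slide.
rows = [
    (0, [1.1, 0.1]), (0, [1.4, 0.1]), (0, [1.5, 0.2]),
    (1, [3.3, 1.0]), (1, [4.1, 1.3]), (1, [4.7, 1.6]), (1, [5.0, 1.7]),
]

def fit_gaussian_nb(rows):
    """MLE of the class prior and of a per-class mean and variance for each feature."""
    params = {}
    for c in sorted({label for label, _ in rows}):
        xs = [x for label, x in rows if label == c]
        n = len(xs)
        means = [sum(col) / n for col in zip(*xs)]
        variances = [sum((v - m) ** 2 for v in col) / n
                     for col, m in zip(zip(*xs), means)]
        params[c] = (n / len(rows), means, variances)
    return params

def log_score(x, prior, means, variances):
    """log p(y) + sum over features of log N(x_m | mean_m, var_m)."""
    total = math.log(prior)
    for v, m, s2 in zip(x, means, variances):
        total += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
    return total

def predict(x, params):
    """Pick the class with the largest unnormalized log posterior."""
    return max(params, key=lambda c: log_score(x, *params[c]))

params = fit_gaussian_nb(rows)
print(predict([1.3, 0.2], params))  # 0
print(predict([4.5, 1.5], params))  # 1
```

Each feature gets its own one-dimensional Gaussian per class, so the class-conditional density is a product of univariate densities, which is exactly the Naive Bayes assumption applied to continuous features.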

Slide from William Cohen

Plot the difference of the probabilities
Slide from William Cohen

Naïve Bayes has a linear decision boundary
Slide from William Cohen (10-601B, Spring 2016)
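The slide shows this visually. One way to see it algebraically is through the Bernoulli model, using the φ and θ_{k,y} notation from the extra slides; the weights w_k and bias b are names introduced here for illustration:

$$
\begin{aligned}
\log \frac{p(y=1 \mid \mathbf{x})}{p(y=0 \mid \mathbf{x})}
&= \log \frac{\phi}{1-\phi}
 + \sum_{k=1}^{K} \left[ x_k \log \frac{\theta_{k,1}}{\theta_{k,0}}
 + (1 - x_k) \log \frac{1-\theta_{k,1}}{1-\theta_{k,0}} \right] \\
&= b + \sum_{k=1}^{K} w_k x_k \\
w_k &= \log \frac{\theta_{k,1}\,(1-\theta_{k,0})}{\theta_{k,0}\,(1-\theta_{k,1})},
\qquad
b = \log \frac{\phi}{1-\phi} + \sum_{k=1}^{K} \log \frac{1-\theta_{k,1}}{1-\theta_{k,0}}
\end{aligned}
$$

The classifier predicts y = 1 exactly when b + Σ_k w_k x_k > 0, so the boundary is a hyperplane. For Gaussian Naive Bayes the same argument gives a linear boundary when each feature's variance is shared across the classes; with class-specific variances, quadratic terms in x_k remain.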

Figure from William Cohen (10-601B, Spring 2016)

Why don't we drop the generative model and try to learn this hyperplane directly?
Figure from William Cohen (10-601B, Spring 2016)

Beyond the Scope of this Lecture
• Multinomial Naïve Bayes can be used for integer features
• Multi-class Naïve Bayes can be used if your classification problem has > 2 classes

Summary
1. Naïve Bayes provides a framework for generative modeling
2. Choose p(x_m | y) appropriate to the data (e.g. Bernoulli for binary features, Gaussian for continuous features)
3. Train by MLE or MAP
4. Classify by maximizing the posterior

EXTRA SLIDES

Generic Naïve Bayes Model
Support: Depends on the choice of event model, P(X_k | Y)
Model: Product of prior and the event model
$$P(\mathbf{X}, Y) = P(Y) \prod_{k=1}^{K} P(X_k \mid Y)$$
Training: Find the class-conditional MLE parameters. For P(Y), we find the MLE using all the data. For each P(X_k | Y) we condition on the data with the corresponding class.
Classification: Find the class that maximizes the posterior
$$\hat{y} = \operatorname*{argmax}_{y} \; p(y \mid \mathbf{x})$$

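Generic Naïve Bayes Model
Classification:
$$
\begin{aligned}
\hat{y} &= \operatorname*{argmax}_{y} \; p(y \mid \mathbf{x}) && \text{(posterior)} \\
&= \operatorname*{argmax}_{y} \; \frac{p(\mathbf{x} \mid y)\, p(y)}{p(\mathbf{x})} && \text{(by Bayes' rule)} \\
&= \operatorname*{argmax}_{y} \; p(\mathbf{x} \mid y)\, p(y)
\end{aligned}
$$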

Model 1: Bernoulli Naïve Bayes
Support: Binary vectors of length K, $\mathbf{x} \in \{0, 1\}^K$
Generative Story:
$$Y \sim \text{Bernoulli}(\phi)$$
$$X_k \sim \text{Bernoulli}(\theta_{k,Y}) \quad \forall k \in \{1, \ldots, K\}$$
Model:
$$
\begin{aligned}
p_{\phi,\theta}(\mathbf{x}, y) &= p_{\phi,\theta}(x_1, \ldots, x_K, y) \\
&= p_{\phi}(y) \prod_{k=1}^{K} p_{\theta_k}(x_k \mid y) \\
&= (\phi)^{y} (1-\phi)^{(1-y)} \prod_{k=1}^{K} (\theta_{k,y})^{x_k} (1-\theta_{k,y})^{(1-x_k)}
\end{aligned}
$$

Model 1: Bernoulli Naïve Bayes
Support: Binary vectors of length K, $\mathbf{x} \in \{0, 1\}^K$
Generative Story:
$$Y \sim \text{Bernoulli}(\phi)$$
$$X_k \sim \text{Bernoulli}(\theta_{k,Y}) \quad \forall k \in \{1, \ldots, K\}$$
Model:
$$p_{\phi,\theta}(\mathbf{x}, y) = (\phi)^{y} (1-\phi)^{(1-y)} \prod_{k=1}^{K} (\theta_{k,y})^{x_k} (1-\theta_{k,y})^{(1-x_k)}$$
Classification (same as Generic Naïve Bayes): Find the class that maximizes the posterior
$$\hat{y} = \operatorname*{argmax}_{y} \; p(y \mid \mathbf{x})$$
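The classification rule can be sketched in a few lines. This version works in log space and compares log joints, since p(x) does not depend on y. The function names and parameter values are hypothetical, and every parameter is assumed to lie strictly between 0 and 1 so that the logarithms are defined.

```python
import math

def log_joint(x, y, phi, theta):
    """log p(x, y) under Bernoulli naive Bayes, with theta[y][k] = P(X_k = 1 | Y = y)."""
    total = math.log(phi if y == 1 else 1.0 - phi)
    for x_k, t in zip(x, theta[y]):
        total += math.log(t if x_k == 1 else 1.0 - t)
    return total

def classify(x, phi, theta):
    """Return the y in {0, 1} with the larger joint, which also maximizes p(y | x)."""
    return max((0, 1), key=lambda y: log_joint(x, y, phi, theta))

# Hypothetical parameters, for illustration only.
phi = 0.5
theta = {0: [0.2, 0.7, 0.9], 1: [0.8, 0.5, 0.1]}
print(classify([1, 0, 0], phi, theta))  # 1
print(classify([0, 1, 1], phi, theta))  # 0
```

Summing logs rather than multiplying probabilities avoids numerical underflow when K is large, as it would be for a vocabulary-sized feature vector.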

Model 1: Bernoulli Naïve Bayes
Training: Find the class-conditional MLE parameters. For P(Y), we find the MLE using all the data. For each P(X_k | Y) we condition on the data with the corresponding class.
$$\phi = \frac{\sum_{i=1}^{N} \mathbb{I}(y^{(i)} = 1)}{N}$$
$$\theta_{k,0} = \frac{\sum_{i=1}^{N} \mathbb{I}(y^{(i)} = 0 \wedge x_k^{(i)} = 1)}{\sum_{i=1}^{N} \mathbb{I}(y^{(i)} = 0)}$$
$$\theta_{k,1} = \frac{\sum_{i=1}^{N} \mathbb{I}(y^{(i)} = 1 \wedge x_k^{(i)} = 1)}{\sum_{i=1}^{N} \mathbb{I}(y^{(i)} = 1)}$$
$$\forall k \in \{1, \ldots, K\}$$
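These estimators are ratios of counts, so they translate almost directly into code. The sketch below uses a hypothetical function name and a small toy dataset, and it assumes both classes appear at least once in the training data (otherwise a denominator is zero).

```python
def fit_bernoulli_nb_mle(X, y):
    """MLE for Bernoulli naive Bayes: phi = P(Y = 1), theta[c][k] = P(X_k = 1 | Y = c)."""
    n = len(y)
    num_features = len(X[0])
    # phi: fraction of examples with label 1, using all the data.
    phi = sum(1 for label in y if label == 1) / n
    theta = {}
    for c in (0, 1):
        # Denominator: number of examples in class c.
        n_c = sum(1 for label in y if label == c)
        theta[c] = [
            # Numerator: examples in class c whose k-th feature equals 1.
            sum(1 for x, label in zip(X, y) if label == c and x[k] == 1) / n_c
            for k in range(num_features)
        ]
    return phi, theta

# Toy data, invented for illustration.
X = [[1, 1, 0], [0, 0, 1], [1, 1, 1], [0, 0, 1], [1, 0, 1], [1, 1, 0]]
y = [0, 1, 1, 0, 0, 1]
phi, theta = fit_bernoulli_nb_mle(X, y)
print(phi)
print(theta)
```

Each indicator sum in the formulas becomes a filtered count, and conditioning on the class is just restricting the count to rows with the matching label.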
