Naïve Bayes (ECE-5424G / CS-5824), Jia-Bin Huang, Virginia Tech, Spring 2019
SLIDE 1

Naïve Bayes

Jia-Bin Huang Virginia Tech

Spring 2019

ECE-5424G / CS-5824

SLIDE 2

Administrative

  • HW 1 out today. Please start early!
  • Office hours
  • Chen: Wed 4pm-5pm
  • Shih-Yang: Fri 3pm-4pm
  • Location: Whittemore 266
SLIDE 3

Linear Regression

  • Model representation

$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n = \theta^\top x$

  • Cost function

$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

  • Gradient descent for linear regression

Repeat until convergence { $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$ }

  • Features and polynomial regression

Can combine features; can use different functions to generate features (e.g., polynomial)

  • Normal equation: $\theta = (X^\top X)^{-1} X^\top y$

(A runnable sketch of both solvers follows below.)
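
Below is a minimal NumPy sketch (mine, not the slides') of the two solvers in this recap. The synthetic data, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 2
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])   # prepend x_0 = 1
theta_true = np.array([1.0, 2.0, -3.0])
y = X @ theta_true + 0.1 * rng.normal(size=m)

# Gradient descent: theta_j := theta_j - alpha * (1/m) * sum_i (h(x^(i)) - y^(i)) * x_j^(i)
alpha, theta = 0.1, np.zeros(n + 1)
for _ in range(2000):
    theta -= alpha * (X.T @ (X @ theta - y)) / m

# Normal equation: theta = (X^T X)^{-1} X^T y
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

print(theta)     # ~ [1, 2, -3]
print(theta_ne)  # agrees with gradient descent up to numerical error
```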
SLIDE 4

| ($x_0$) | Size in feet² ($x_1$) | Number of bedrooms ($x_2$) | Number of floors ($x_3$) | Age of home in years ($x_4$) | Price ($) in 1000's ($y$) |
|---|---|---|---|---|---|
| 1 | 2104 | 5 | 1 | 45 | 460 |
| 1 | 1416 | 3 | 2 | 40 | 232 |
| 1 | 1534 | 3 | 2 | 30 | 315 |
| 1 | 852  | 2 | 1 | 36 | 178 |
| … | … | … | … | … | … |

$y = \begin{bmatrix} 460 \\ 232 \\ 315 \\ 178 \end{bmatrix}$

$\theta = (X^\top X)^{-1} X^\top y$

Slide credit: Andrew Ng
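
As a hedged aside (not on the slide): applying the normal equation to just these four rows is instructive but degenerate, since this $X$ has 5 columns and only 4 rows, so $X^\top X$ is singular. The sketch below therefore uses the pseudoinverse.

```python
import numpy as np

X = np.array([[1, 2104, 5, 1, 45],
              [1, 1416, 3, 2, 40],
              [1, 1534, 3, 2, 30],
              [1,  852, 2, 1, 36]], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)

# With fewer examples than columns, X^T X is not invertible; the Moore-Penrose
# pseudoinverse gives the minimum-norm least-squares solution instead.
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
print(X @ theta)  # reproduces y up to numerical error: the fit is exact here
```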

SLIDE 5

Least squares solution

  • $J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 = \frac{1}{2m} \sum_{i=1}^{m} \left( \theta^\top x^{(i)} - y^{(i)} \right)^2 = \frac{1}{2m} \left\| X\theta - y \right\|_2^2$
  • Set the gradient to zero: $\frac{\partial}{\partial \theta} J(\theta) = 0$
  • $\theta = (X^\top X)^{-1} X^\top y$
SLIDE 6

Justification/interpretation 1

  • Geometric interpretation

$X = \begin{bmatrix} 1 & \leftarrow x^{(1)\top} \rightarrow \\ 1 & \leftarrow x^{(2)\top} \rightarrow \\ \vdots & \vdots \\ 1 & \leftarrow x^{(m)\top} \rightarrow \end{bmatrix} = \begin{bmatrix} \uparrow & \uparrow & \uparrow & & \uparrow \\ z_1 & z_2 & z_3 & \cdots & z_n \\ \downarrow & \downarrow & \downarrow & & \downarrow \end{bmatrix}$

  • $X\theta$ lies in the column space of $X$, i.e., $\mathrm{span}(\{z_1, z_2, \cdots, z_n\})$
  • The residual $X\theta - y$ is orthogonal to the column space of $X$
  • $X^\top (X\theta - y) = 0 \Rightarrow (X^\top X)\theta = X^\top y$

[Figure: $y$, its projection $X\theta$ onto the column space of $X$, and the residual $X\theta - y$]

SLIDE 7

Justification/interpretation 2

  • Probabilistic model
  • Assume a linear model with Gaussian errors:

$p_\theta(y^{(i)} \mid x^{(i)}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2\sigma^2} \left( y^{(i)} - \theta^\top x^{(i)} \right)^2 \right)$

  • Solving for the maximum likelihood estimate (a numeric check follows below):

$\operatorname{argmax}_\theta \prod_{i=1}^{m} p_\theta(y^{(i)} \mid x^{(i)}) = \operatorname{argmax}_\theta \log \prod_{i=1}^{m} p_\theta(y^{(i)} \mid x^{(i)}) = \operatorname{argmin}_\theta \frac{1}{2\sigma^2} \sum_{i=1}^{m} \left( \theta^\top x^{(i)} - y^{(i)} \right)^2$

Image credit: CS 446@UIUC
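
A small numeric check of this equivalence, under the assumption $\sigma = 1$ and a slope-only toy model of my own choosing: the negative log-likelihood and the squared error differ only by a constant, so they share the same minimizer.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=30)
y = 2.0 * x + 0.5 * rng.normal(size=30)

thetas = np.linspace(0.0, 4.0, 401)                       # candidate slopes
sq_err = np.array([np.sum((t * x - y) ** 2) for t in thetas])
# -log prod_i p(y_i | x_i) with sigma = 1: constant + half the squared error
nll = 0.5 * len(x) * np.log(2 * np.pi) + 0.5 * sq_err

print(thetas[sq_err.argmin()], thetas[nll.argmin()])      # same minimizer
```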

SLIDE 8

Justification/interpretation 3

  • Loss minimization

$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 = \frac{1}{m} \sum_{i=1}^{m} \ell\left( h_\theta(x^{(i)}), y^{(i)} \right)$

  • $\ell(y, \hat{y}) = \frac{1}{2} \| y - \hat{y} \|_2^2$: least squares loss
  • Empirical Risk Minimization (ERM):

$\frac{1}{m} \sum_{i=1}^{m} \ell\left( y^{(i)}, \hat{y}^{(i)} \right)$

SLIDE 9

$m$ training examples, $n$ features

Gradient Descent

  • Need to choose $\alpha$
  • Need many iterations
  • Works well even when $n$ is large

Normal Equation

  • No need to choose $\alpha$
  • Don't need to iterate
  • Need to compute $(X^\top X)^{-1}$
  • Slow if $n$ is very large

Slide credit: Andrew Ng

SLIDE 10

Things to remember

  • Model representation

$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n = \theta^\top x$

  • Cost function

$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

  • Gradient descent for linear regression

Repeat until convergence { $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$ }

  • Features and polynomial regression

Can combine features; can use different functions to generate features (e.g., polynomial)

  • Normal equation: $\theta = (X^\top X)^{-1} X^\top y$
SLIDE 11

Todayโ€™s plan

  • Probability basics
  • Estimating parameters from data
  • Maximum likelihood (ML)
  • Maximum a posteriori estimation (MAP)
  • Naïve Bayes
SLIDE 12

Todayโ€™s plan

  • Probability basics
  • Estimating parameters from data
  • Maximum likelihood (ML)
  • Maximum a posteriori estimation (MAP)
  • Naive Bayes
SLIDE 13

Random variables

  • Outcome space S: the space of possible outcomes
  • Random variables: functions that map outcomes to real numbers
  • Event E: a subset of S
SLIDE 14

Visualizing probability P(A)

[Figure: sample space of area 1, with a blue circle inside; A is true inside the circle and false outside]

P(A) = area of the blue circle

SLIDE 15

Visualizing probability P(A) + P(~A)

[Figure: A is true inside the circle, false outside]

P(A) + P(~A) = 1

SLIDE 16

Visualizing probability P(A)

[Figure: circle A overlapping circle B, splitting A into A∧B and A∧~B]

P(A) = P(A, B) + P(A, ~B)

SLIDE 17

Visualizing conditional probability

[Figure: overlapping circles A and B with intersection A∧B]

P(A|B) = P(A, B) / P(B)

Corollary: The chain rule

P(A, B) = P(A|B) P(B)

SLIDE 18

Bayes rule

[Figure: overlapping circles A and B with intersection A∧B]

P(A|B) = P(A, B) / P(B) = P(B|A) P(A) / P(B)

Corollary: The chain rule

P(A, B) = P(A|B) P(B) = P(B|A) P(A)

[Portrait: Thomas Bayes]

SLIDE 19

Other forms of Bayes rule

๐‘„ ๐ต|๐ถ = ๐‘„ ๐ถ ๐ต ๐‘„ ๐ต ๐‘„(๐ถ) ๐‘„ ๐ต|๐ถ, ๐‘Œ = ๐‘„ ๐ถ ๐ต, ๐‘Œ ๐‘„ ๐ต, ๐‘Œ ๐‘„(๐ถ, ๐‘Œ) ๐‘„ ๐ต|๐ถ = ๐‘„ ๐ถ ๐ต ๐‘„ ๐ต ๐‘„ ๐ถ ๐ต ๐‘„ ๐ต + ๐‘„ ๐ถ ~๐ต ๐‘„(~๐ต)

SLIDE 20

Applying Bayes rule

๐‘„ ๐ต|๐ถ = ๐‘„ ๐ถ ๐ต ๐‘„ ๐ต ๐‘„ ๐ถ ๐ต ๐‘„ ๐ต + ๐‘„ ๐ถ ~๐ต ๐‘„(~๐ต)

  • A = you have the flu

B = you just coughed

  • Assume:
  • ๐‘„ ๐ต = 0.05
  • ๐‘„ ๐ถ ๐ต = 0.8
  • ๐‘„ ๐ถ ~๐ต = 0.2
  • What is P(flu | cough) = P(A|B)?

๐‘„ ๐ต|๐ถ = 0.8 ร— 0.05 0.8 ร— 0.05 + 0.2 ร— 0.95 ~0.17

Slide credit: Tom Mitchell
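
The slide's arithmetic, restated as a few lines of Python (only the numbers given above are assumed):

```python
p_a = 0.05       # P(A): you have the flu
p_b_a = 0.8      # P(B|A): cough given flu
p_b_na = 0.2     # P(B|~A): cough given no flu

p_a_b = p_b_a * p_a / (p_b_a * p_a + p_b_na * (1 - p_a))
print(round(p_a_b, 3))  # 0.174, i.e. ~0.17
```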

SLIDE 21

Why are we learning this? Learn P(Y|X)

[Figure: a hypothesis h maps input x to output y]

SLIDE 22

Joint distribution

  • Making a joint distribution of M variables
  • 1. Make a truth table listing all combinations
  • 2. For each combination of values, say how probable it is
  • 3. Probability must sum to 1

| A | B | C | Prob |
|---|---|---|------|
| 0 | 0 | 0 | 0.30 |
| 0 | 0 | 1 | 0.05 |
| 0 | 1 | 0 | 0.10 |
| 0 | 1 | 1 | 0.05 |
| 1 | 0 | 0 | 0.05 |
| 1 | 0 | 1 | 0.10 |
| 1 | 1 | 0 | 0.25 |
| 1 | 1 | 1 | 0.10 |

Slide credit: Tom Mitchell

SLIDE 23

Using joint distribution

  • Can ask for any logical expression involving these variables
  • $P(E) = \sum_{\text{rows matching } E} P(\text{row})$
  • $P(E_1 \mid E_2) = \dfrac{\sum_{\text{rows matching } E_1 \text{ and } E_2} P(\text{row})}{\sum_{\text{rows matching } E_2} P(\text{row})}$

Slide credit: Tom Mitchell
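
A sketch of these two queries against the joint table from the previous slide (the helper names are mine):

```python
joint = {  # (A, B, C) -> probability, from the slide's table
    (0, 0, 0): 0.30, (0, 0, 1): 0.05, (0, 1, 0): 0.10, (0, 1, 1): 0.05,
    (1, 0, 0): 0.05, (1, 0, 1): 0.10, (1, 1, 0): 0.25, (1, 1, 1): 0.10,
}

def prob(event):
    """P(E) = sum of P(row) over rows where the predicate `event` holds."""
    return sum(p for row, p in joint.items() if event(row))

def cond_prob(e1, e2):
    """P(E1|E2) = P(E1 and E2) / P(E2)."""
    return prob(lambda r: e1(r) and e2(r)) / prob(e2)

print(prob(lambda r: r[0] == 1))                            # P(A) = 0.50
print(cond_prob(lambda r: r[2] == 1, lambda r: r[0] == 1))  # P(C|A) = 0.40
```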

SLIDE 24

The solution to learn P(Y|X)?

  • Main problem: learning P(Y|X) may require more data than we have
  • Say, learning a joint distribution with 100 attributes
  • # of rows in this table? $2^{100} \geq 10^{30}$
  • # of people on earth? $\approx 10^9$

Slide credit: Tom Mitchell

SLIDE 25

What should we do?

  • 1. Be smart about how we estimate probabilities from sparse data
  • Maximum likelihood estimates (ML)
  • Maximum a posteriori estimates (MAP)
  • 2. Be smart about how to represent joint distributions
  • Bayes network, graphical models (more on this later)

Slide credit: Tom Mitchell

SLIDE 26

Todayโ€™s plan

  • Probability basics
  • Estimating parameters from data
  • Maximum likelihood (ML)
  • Maximum a posteriori (MAP)
  • Naive Bayes
SLIDE 27

Estimating the probability

  • Flip the coin repeatedly, observing:
  • It turns up heads $\alpha_1$ times
  • It turns up tails $\alpha_0$ times
  • Your estimate for P(X = 1) is?
  • Case A: 100 flips: 51 heads (X = 1), 49 tails (X = 0)

P(X = 1) = ?

  • Case B: 3 flips: 2 heads (X = 1), 1 tail (X = 0)

P(X = 1) = ?

[Figure: coin showing heads (X = 1) and tails (X = 0)]

Slide credit: Tom Mitchell

SLIDE 28

Two principles for estimating parameters

  • Maximum Likelihood Estimate (MLE): choose $\theta$ that maximizes the probability of the observed data

$\hat{\theta}_{\text{MLE}} = \operatorname{argmax}_\theta P(\text{Data} \mid \theta)$

  • Maximum a posteriori (MAP) estimate: choose $\theta$ that is most probable given the prior probability and the data

$\hat{\theta}_{\text{MAP}} = \operatorname{argmax}_\theta P(\theta \mid \text{Data}) = \operatorname{argmax}_\theta \frac{P(\text{Data} \mid \theta)\, P(\theta)}{P(\text{Data})}$

Slide credit: Tom Mitchell

SLIDE 29

Two principles for estimating parameters

  • Maximum Likelihood Estimate (MLE): choose $\theta$ that maximizes $P(\text{Data} \mid \theta)$

$\hat{\theta}_{\text{MLE}} = \frac{\alpha_1}{\alpha_1 + \alpha_0}$

  • Maximum a posteriori (MAP) estimate: choose $\theta$ that maximizes $P(\theta \mid \text{Data})$

$\hat{\theta}_{\text{MAP}} = \frac{\alpha_1 + \#\text{hallucinated 1s}}{(\alpha_1 + \#\text{hallucinated 1s}) + (\alpha_0 + \#\text{hallucinated 0s})}$

Slide credit: Tom Mitchell

SLIDE 30

Maximum likelihood estimate

  • Each flip yields a Boolean value for $X$

$X \sim \text{Bernoulli}$: $P(X) = \theta^X (1 - \theta)^{1 - X}$, i.e., $P(X = 1) = \theta$ and $P(X = 0) = 1 - \theta$

  • A data set $D$ of independent, identically distributed (iid) flips produces $\alpha_1$ ones and $\alpha_0$ zeros

$P(D \mid \theta) = P(\alpha_1, \alpha_0 \mid \theta) = \theta^{\alpha_1} (1 - \theta)^{\alpha_0}$

$\hat{\theta} = \operatorname{argmax}_\theta P(D \mid \theta) = \frac{\alpha_1}{\alpha_1 + \alpha_0}$

Slide credit: Tom Mitchell
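
The MLE formula as code, applied to the two cases from the earlier coin-flip slide:

```python
def mle(a1, a0):
    """theta_hat = a1 / (a1 + a0), for a1 observed heads and a0 observed tails."""
    return a1 / (a1 + a0)

print(mle(51, 49))  # Case A: 100 flips -> 0.51
print(mle(2, 1))    # Case B: 3 flips -> 0.667, same rule but far less data behind it
```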

SLIDE 31

Beta prior distribution P(θ)

  • $P(\theta) = \text{Beta}(\beta_1, \beta_0) = \frac{1}{B(\beta_1, \beta_0)}\, \theta^{\beta_1 - 1} (1 - \theta)^{\beta_0 - 1}$

Slide credit: Tom Mitchell

SLIDE 32

Maximum a posteriori (MAP) estimate

  • A data set $D$ of iid flips produces $\alpha_1$ ones and $\alpha_0$ zeros

$P(D \mid \theta) = P(\alpha_1, \alpha_0 \mid \theta) = \theta^{\alpha_1} (1 - \theta)^{\alpha_0}$

  • Assume a Beta prior (conjugate prior: closed-form representation of the posterior)

$P(\theta) = \text{Beta}(\beta_1, \beta_0) = \frac{1}{B(\beta_1, \beta_0)}\, \theta^{\beta_1 - 1} (1 - \theta)^{\beta_0 - 1}$

$\hat{\theta} = \operatorname{argmax}_\theta P(D \mid \theta)\, P(\theta) = \frac{\alpha_1 + \beta_1 - 1}{(\alpha_1 + \beta_1 - 1) + (\alpha_0 + \beta_0 - 1)}$

Slide credit: Tom Mitchell
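
The MAP formula as code; a sketch in which the Beta($\beta_1$, $\beta_0$) prior acts like $\beta_1 - 1$ hallucinated heads and $\beta_0 - 1$ hallucinated tails (the prior values below are my own):

```python
def map_estimate(a1, a0, b1, b0):
    """(a1 + b1 - 1) / ((a1 + b1 - 1) + (a0 + b0 - 1)) with a Beta(b1, b0) prior."""
    return (a1 + b1 - 1) / ((a1 + b1 - 1) + (a0 + b0 - 1))

# 3 real flips (2 heads, 1 tail) plus a Beta(3, 3) prior: the estimate is
# pulled from the MLE of 0.667 toward 0.5.
print(map_estimate(2, 1, 3, 3))  # 0.571
```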

SLIDE 33

Some terminology

  • Likelihood function $P(\text{Data} \mid \theta)$
  • Prior $P(\theta)$
  • Posterior $P(\theta \mid \text{Data})$
  • Conjugate prior: a prior $P(\theta)$ is the conjugate prior for a likelihood function $P(\text{Data} \mid \theta)$ if the prior $P(\theta)$ and the posterior $P(\theta \mid \text{Data})$ have the same form
  • Example (coin flip problem):
  • Prior $P(\theta)$: $\text{Beta}(\beta_1, \beta_0)$
  • Likelihood $P(\text{Data} \mid \theta)$: Binomial, $\theta^{\alpha_1} (1 - \theta)^{\alpha_0}$
  • Posterior $P(\theta \mid \text{Data})$: $\text{Beta}(\alpha_1 + \beta_1, \alpha_0 + \beta_0)$

Slide credit: Tom Mitchell

SLIDE 34

How many parameters?

  • Suppose $X = [X_1, \cdots, X_n]$, where the $X_i$ and $Y$ are Boolean random variables
  • To estimate $P(Y \mid X_1, \cdots, X_n)$:
  • When $n = 2$ (Gender, Hours-worked)?
  • When $n = 30$?

Slide credit: Tom Mitchell

SLIDE 35

Can we reduce the number of parameters using Bayes rule?

$P(Y \mid X) = \frac{P(X \mid Y)\, P(Y)}{P(X)}$

  • How many parameters for $P(X_1, \cdots, X_n \mid Y)$? $(2^n - 1) \times 2$
  • How many parameters for $P(Y)$? 1

Slide credit: Tom Mitchell

SLIDE 36

Todayโ€™s plan

  • Probability basics
  • Estimating parameters from data
  • Maximum likelihood (ML)
  • Maximum a posteriori estimation (MAP)
  • Naive Bayes
SLIDE 37

Naïve Bayes

  • Assumption:

$P(X_1, \cdots, X_n \mid Y) = \prod_{i=1}^{n} P(X_i \mid Y)$

  • i.e., $X_i$ and $X_j$ are conditionally independent given $Y$ for $i \neq j$

Slide credit: Tom Mitchell

SLIDE 38

Conditional independence

  • Definition: $X$ is conditionally independent of $Y$ given $Z$ if the probability distribution governing $X$ is independent of the value of $Y$, given the value of $Z$:

$\forall i, j, k \quad P(X = x_i \mid Y = y_j, Z = z_k) = P(X = x_i \mid Z = z_k)$

i.e., $P(X \mid Y, Z) = P(X \mid Z)$

Example: P(Thunder | Rain, Lightning) = P(Thunder | Lightning)

Slide credit: Tom Mitchell

SLIDE 39

Applying conditional independence

  • Naïve Bayes assumes the $X_i$ are conditionally independent given $Y$

e.g., $P(X_1 \mid X_2, Y) = P(X_1 \mid Y)$

$P(X_1, X_2 \mid Y) = P(X_1 \mid X_2, Y)\, P(X_2 \mid Y) = P(X_1 \mid Y)\, P(X_2 \mid Y)$

General form: $P(X_1, \cdots, X_n \mid Y) = \prod_{i=1}^{n} P(X_i \mid Y)$

How many parameters to describe $P(X_1, \cdots, X_n \mid Y)$? $P(Y)$?

  • Without the conditional independence assumption?
  • With the conditional independence assumption?

Slide credit: Tom Mitchell

SLIDE 40

Naïve Bayes classifier

  • Bayes rule:

$P(Y = y_k \mid X_1, \cdots, X_n) = \frac{P(Y = y_k)\, P(X_1, \cdots, X_n \mid Y = y_k)}{\sum_j P(Y = y_j)\, P(X_1, \cdots, X_n \mid Y = y_j)}$

  • Assume conditional independence among the $X_i$'s:

$P(Y = y_k \mid X_1, \cdots, X_n) = \frac{P(Y = y_k)\, \prod_i P(X_i \mid Y = y_k)}{\sum_j P(Y = y_j)\, \prod_i P(X_i \mid Y = y_j)}$

  • Pick the most probable $Y$:

$\hat{Y} \leftarrow \operatorname{argmax}_{y_k} P(Y = y_k)\, \prod_i P(X_i \mid Y = y_k)$

Slide credit: Tom Mitchell

SLIDE 41

Naïve Bayes algorithm – discrete $X_i$

  • For each value $y_k$:

Estimate $\pi_k = P(Y = y_k)$

For each value $x_{ij}$ of each attribute $X_i$:

Estimate $\theta_{ijk} = P(X_i = x_{ij} \mid Y = y_k)$

  • Classify $X^{\text{test}}$ (a runnable sketch follows after the MLE estimates on the next slide):

$\hat{Y} \leftarrow \operatorname{argmax}_{y_k} P(Y = y_k)\, \prod_i P(X_i^{\text{test}} \mid Y = y_k)$

$\hat{Y} \leftarrow \operatorname{argmax}_{y_k} \pi_k \prod_i \theta_{ijk}$

SLIDE 42

Estimating parameters: discrete $Y$, $X_i$

  • Maximum likelihood estimates (MLE):

$\hat{\pi}_k = \hat{P}(Y = y_k) = \frac{\#D\{Y = y_k\}}{|D|}$

$\hat{\theta}_{ijk} = \hat{P}(X_i = x_{ij} \mid Y = y_k) = \frac{\#D\{X_i = x_{ij} \wedge Y = y_k\}}{\#D\{Y = y_k\}}$

Slide credit: Tom Mitchell
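
A compact sketch of the discrete Naive Bayes algorithm using the MLE counts above; the tiny weather-style dataset is invented for illustration.

```python
from collections import Counter, defaultdict

data = [  # (x_1, x_2) -> y
    (("sunny", "hot"), "no"), (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"), (("rainy", "cool"), "yes"),
    (("sunny", "cool"), "yes"), (("rainy", "hot"), "no"),
]

# pi_k = #D{Y = y_k} / |D|
class_counts = Counter(y for _, y in data)
pi = {y: c / len(data) for y, c in class_counts.items()}

# theta_ijk = #D{X_i = x_ij and Y = y_k} / #D{Y = y_k}
theta = defaultdict(float)
for x, y in data:
    for i, v in enumerate(x):
        theta[(i, v, y)] += 1 / class_counts[y]

def classify(x):
    # Y_hat = argmax_k pi_k * prod_i theta_ijk
    def score(y):
        s = pi[y]
        for i, v in enumerate(x):
            s *= theta[(i, v, y)]
        return s
    return max(pi, key=score)

print(classify(("rainy", "mild")))  # "yes"
```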

SLIDE 43
  • F = 1 iff you live in Fox Ridge
  • S = 1 iff you watched the Super Bowl last night
  • D = 1 iff you drive to VT
  • G = 1 iff you went to the gym in the last month

Parameters to estimate:

P(F = 1) =          P(F = 0) =
P(S = 1 | F = 1) =  P(S = 0 | F = 1) =
P(S = 1 | F = 0) =  P(S = 0 | F = 0) =
P(D = 1 | F = 1) =  P(D = 0 | F = 1) =
P(D = 1 | F = 0) =  P(D = 0 | F = 0) =
P(G = 1 | F = 1) =  P(G = 0 | F = 1) =
P(G = 1 | F = 0) =  P(G = 0 | F = 0) =

$P(F \mid S, D, G) \propto P(F)\, P(S \mid F)\, P(D \mid F)\, P(G \mid F)$

SLIDE 44

Naïve Bayes: Subtlety #1

  • Often the $X_i$ are not really conditionally independent
  • Naïve Bayes often works pretty well anyway
  • Often gives the right classification, even when not the right probability [Domingos & Pazzani, 1996]
  • What is the effect on the estimated P(Y|X)?
  • What if we have two copies, $X_i = X_j$? (see the sketch below)

$P(Y = y_k \mid X_1, \cdots, X_n) \propto P(Y = y_k) \prod_i P(X_i \mid Y = y_k)$

Slide credit: Tom Mitchell
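
A quick numeric illustration (numbers mine) of the duplicated-feature case $X_i = X_j$: Naive Bayes multiplies the same evidence in twice, so the posterior becomes overconfident even when the argmax does not change.

```python
p_y = 0.5                    # P(Y=1) = P(Y=0)
p_x_y1, p_x_y0 = 0.8, 0.4    # P(X=1|Y=1), P(X=1|Y=0)

def posterior(copies):
    # P(Y=1 | X=1 repeated `copies` times) under the (wrong) independence assumption
    s1 = p_y * p_x_y1 ** copies
    s0 = (1 - p_y) * p_x_y0 ** copies
    return s1 / (s1 + s0)

print(posterior(1))  # 0.667: the correct posterior for one observation of X=1
print(posterior(2))  # 0.800: the duplicate inflates confidence
```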

SLIDE 45

Naïve Bayes: Subtlety #2

  • The MLE estimate for $P(X_i \mid Y = y_k)$ might be zero (for example, $X_i$ = birthdate, $X_i$ = Feb_4_1995)
  • Why worry about just one parameter out of many? Because a single zero factor zeroes out the entire product:

$P(Y = y_k \mid X_1, \cdots, X_n) \propto P(Y = y_k) \prod_i P(X_i \mid Y = y_k)$

  • What can we do to address this?
  • MAP estimates (adding "imaginary" examples)

Slide credit: Tom Mitchell

SLIDE 46

Estimating parameters: discrete $Y$, $X_i$

  • Maximum likelihood estimates (MLE):

$\hat{\pi}_k = \hat{P}(Y = y_k) = \frac{\#D\{Y = y_k\}}{|D|}$

$\hat{\theta}_{ijk} = \hat{P}(X_i = x_{ij} \mid Y = y_k) = \frac{\#D\{X_i = x_{ij} \wedge Y = y_k\}}{\#D\{Y = y_k\}}$

  • MAP estimates (Dirichlet priors):

$\hat{\pi}_k = \hat{P}(Y = y_k) = \frac{\#D\{Y = y_k\} + (\beta_k - 1)}{|D| + \sum_m (\beta_m - 1)}$

$\hat{\theta}_{ijk} = \hat{P}(X_i = x_{ij} \mid Y = y_k) = \frac{\#D\{X_i = x_{ij} \wedge Y = y_k\} + (\beta_k - 1)}{\#D\{Y = y_k\} + \sum_m (\beta_m - 1)}$

Slide credit: Tom Mitchell
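
The MAP (smoothed) estimate above as code; a sketch in which setting $\beta = 2$ for every outcome reduces to add-one (Laplace) smoothing, so no $\theta_{ijk}$ is ever exactly zero.

```python
def smoothed_theta(count_xy, count_y, n_values, beta=2):
    # (#D{X_i=x_ij and Y=y_k} + (beta-1)) / (#D{Y=y_k} + n_values * (beta-1)),
    # assuming the same beta for every value of the attribute
    return (count_xy + (beta - 1)) / (count_y + n_values * (beta - 1))

# An attribute value never seen with this class (count_xy = 0), out of 10
# examples of the class, with 3 possible values for the attribute:
print(smoothed_theta(0, 10, 3))  # 0.077 instead of an MLE of exactly 0
```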

SLIDE 47

What if we have continuous $X_i$?

  • Gaussian Naïve Bayes (GNB): assume

$P(X_i = x \mid Y = y_k) = \frac{1}{\sqrt{2\pi}\,\sigma_{ik}} \exp\left( -\frac{(x - \mu_{ik})^2}{2\sigma_{ik}^2} \right)$

  • Additional assumptions on $\sigma_{ik}$:
  • independent of $Y$ ($\sigma_i$)
  • independent of $X_i$ ($\sigma_k$)
  • independent of both $X_i$ and $Y$ ($\sigma$)

Slide credit: Tom Mitchell

SLIDE 48

Naïve Bayes algorithm – continuous $X_i$

  • For each value $y_k$:

Estimate $\pi_k = P(Y = y_k)$

For each attribute $X_i$, estimate the class-conditional mean $\mu_{ik}$ and variance $\sigma_{ik}$

  • Classify $X^{\text{test}}$ (a sketch follows below):

$\hat{Y} \leftarrow \operatorname{argmax}_{y_k} P(Y = y_k)\, \prod_i P(X_i^{\text{test}} \mid Y = y_k)$

$\hat{Y} \leftarrow \operatorname{argmax}_{y_k} \pi_k \prod_i \text{Normal}(X_i^{\text{test}}; \mu_{ik}, \sigma_{ik})$

Slide credit: Tom Mitchell
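
A minimal Gaussian Naive Bayes sketch (data and names mine) following the recipe above: per-class priors, per-class/per-feature means and variances, then the argmax of $\pi_k \prod_i \text{Normal}(x_i; \mu_{ik}, \sigma_{ik})$, computed in log space to avoid underflow.

```python
import numpy as np

X = np.array([[170.0, 60.0], [180.0, 80.0], [160.0, 50.0], [175.0, 75.0]])
y = np.array([0, 1, 0, 1])

classes = np.unique(y)
pi = {k: np.mean(y == k) for k in classes}
mu = {k: X[y == k].mean(axis=0) for k in classes}
var = {k: X[y == k].var(axis=0) + 1e-6 for k in classes}  # small floor for stability

def classify(x):
    def log_score(k):  # log pi_k + sum_i log Normal(x_i; mu_ik, sigma_ik^2)
        log_pdf = -0.5 * np.log(2 * np.pi * var[k]) - (x - mu[k]) ** 2 / (2 * var[k])
        return np.log(pi[k]) + log_pdf.sum()
    return max(classes, key=log_score)

print(classify(np.array([172.0, 65.0])))  # 0
```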

SLIDE 49

Things to remember

  • Probability basics
  • Estimating parameters from data
  • Maximum likelihood (ML): maximize $P(\text{Data} \mid \theta)$
  • Maximum a posteriori (MAP) estimation: maximize $P(\theta \mid \text{Data})$
  • Naive Bayes

$P(Y = y_k \mid X_1, \cdots, X_n) \propto P(Y = y_k) \prod_i P(X_i \mid Y = y_k)$

SLIDE 50

Next class

  • Logistic regression