Nave Bayes in a Nutshell Bayes rule: Assuming conditional - PDF document

Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University January 25, 2011 Today: Readings: • Naïve Bayes Required: • discrete-valued X i ’s • Mitchell: “Naïve Bayes and • Document classification Logistic Regression” • Gaussian Naïve Bayes (available on class website) • real-valued X i ’s • Brain image classification Optional • Form of decision surfaces • Bishop 1.2.4 • Bishop 4.2 Naïve Bayes in a Nutshell Bayes rule: Assuming conditional independence among X i ’s: So, classification rule for X new = < X 1 , …, X n > is: 1

Another way to view Naïve Bayes (Boolean Y): Decision rule: is this quantity greater or less than 1? P(S | D,G,M) 2

Naïve Bayes: classifying text documents • Classify which emails are spam? • Classify which emails promise an attachment? How shall we represent text documents for Naïve Bayes? Learning to classify documents: P(Y|X) • Y discrete valued. – e.g., Spam or not • X = <X 1 , X 2 , … X n > = document • X i is a random variable describing… 3

Learning to classify documents: P(Y|X) • Y discrete valued. – e.g., Spam or not • X = <X 1 , X 2 , … X n > = document • X i is a random variable describing… Answer 1: X i is boolean, 1 if word i is in document, else 0 e.g., X pleased = 1 Issues? Learning to classify documents: P(Y|X) • Y discrete valued. – e.g., Spam or not • X = <X 1 , X 2 , … X n > = document • X i is a random variable describing… Answer 2: • X i represents the i th word position in document • X 1 = “I”, X 2 = “am”, X 3 = “pleased” • and, let’s assume the X i are iid (indep, identically distributed) 4

Learning to classify document: P(Y|X) the “Bag of Words” model • Y discrete valued. e.g., Spam or not • X = <X 1 , X 2 , … X n > = document • X i are iid random variables. Each represents the word at its position i in the document • Generating a document according to this distribution = rolling a 50,000 sided die, once for each word position in the document • The observed counts for each word follow a ??? distribution Multinomial Distribution 5

Multinomial Bag of Words aardvark 0 about 2 all 2 Africa 1 apple 0 anxious 0 ... gas 1 ... oil 1 … Zaire 0 MAP estimates for bag of words Map estimate for multinomial What β ’s should we choose? 6

Naïve Bayes Algorithm – discrete X i • Train Naïve Bayes (examples) for each value y k estimate for each value x ij of each attribute X i estimate prob that word x ij appears in position i, given Y=y k • Classify ( X new ) * Additional assumption: word probabilities are position independent 7

For code and data, see www.cs.cmu.edu/~tom/mlbook.html click on “Software and Data” What if we have continuous X i ? Eg., image classification: X i is real-valued i th pixel 8

What if we have continuous X i ? Eg., image classification: X i is real-valued i th pixel Naïve Bayes requires P ( X i | Y=y k ) , but X i is real (continuous) Common approach: assume P ( X i | Y=y k ) follows a Normal (Gaussian) distribution Gaussian Distribution (also called “Normal”) p(x) is a probability density function , whose integral (not sum) is 1 9

What if we have continuous X i ? Gaussian Naïve Bayes (GNB): assume Sometimes assume variance • is independent of Y (i.e., σ i ), • or independent of X i (i.e., σ k ) • or both (i.e., σ ) Gaussian Naïve Bayes Algorithm – continuous X i (but still discrete Y) • Train Naïve Bayes (examples) for each value y k estimate* for each attribute X i estimate • class conditional mean , variance • Classify ( X new ) * probabilities must sum to 1, so need estimate only n-1 parameters... 10

Estimating Parameters: Y discrete , X i continuous Maximum likelihood estimates: jth training example ith feature kth class δ ()=1 if (Y j =y k ) else 0 How many parameters must we estimate for Gaussian Naïve Bayes if Y has k possible values, X=<X1, … Xn>? 11

What is form of decision surface for Gaussian Naïve Bayes classifier? eg., if we assume attributes have same variance, indep of Y ( ) GNB Example: Classify a person’s cognitive state, based on brain image • reading a sentence or viewing a picture? • reading the word describing a “Tool” or “Building”? • answering the question, or getting confused? 12

Mean activations over all training examples for Y=“bottle” fMRI activation high average Y is the mental state (reading “house” or “bottle”) X i are the voxel activities, this is a plot of the µ’s defining P(X i | Y=“bottle”) below average Classification task: is person viewing a “tool” or “building”? statistically Classification accuracy significant p<0.05 13

Where is information encoded in the brain? Accuracies of cubical 27-voxel classifiers centered at each significant voxel [0.7-0.8] Naïve Bayes: What you should know • Designing classifiers based on Bayes rule • Conditional independence – What it is – Why it’s important • Naïve Bayes assumption and its consequences – Which (and how many) parameters must be estimated under different generative models (different forms for P(X|Y) ) • and why this matters • How to train Naïve Bayes classifiers – MLE and MAP estimates – with discrete and/or continuous inputs X i 14

Questions to think about: • Can you use Naïve Bayes for a combination of discrete and real-valued X i ? • How can we easily model just 2 of n attributes as dependent? • What does the decision surface of a Naïve Bayes classifier look like? • How would you select a subset of X i ’s? 15

Nave Bayes in a Nutshell Bayes rule: Assuming conditional - PDF document

Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University January 25, 2011 Today: Readings: Nave Bayes Required: discrete-valued X i s Mitchell: Nave Bayes and Document

Naive Bayes and Gaussian Bayes Classifier Ladislav Rampasek slides by Mengye Ren and others

The Nave Bayes Classifier Machine Learning 1 Todays lecture The nave Bayes Classifier

Bayes Theorem Thomas Bayes (1701-1761) Simple form of Bayes Theorem, for

DATA MINING: NAVE BAYES 1 Nave Bayes Classifier Thomas Bayes 1702 - 1761 We will start off

Cognitive Modeling Unseen Examples 2 Bayes Classifiers Lecture 14: Naive Bayes Classifiers

STAT 339 Naive Bayes Classification 8-10 March 2017 Colin Reimer Dawson Outline Naive Bayes

Bayes Classifiers Nave Bayes Classification Patrick Mair Bayes Classifiers Weather data

I ntroduction to Mobile Robotics Bayes Filter Kalm an Filter Wolfram Burgard 1 Bayes

Elementary Particle Physics in a Nutshell Elementary Particle Physics in a Nutshell

Design Thinking in a nutshell Alexander Gtze 25.10.2019 Design Thinking in a nutshell

Formal Modeling in Cognitive Science Independence Lecture 23: Conditional Probability; Bayes

Nave Bayes Classification Nickolai Riabov, Kenneth Tiong Brown University Fall 2013 Nickolai

BAYES FORMULA a two-stage experiment Xingru Chen xingru.chen.gr@dartmouth.edu XC 2020

Another Walkthrough of Variational Bayes Bevan Jones ML for NLP Reading Group The University of

Probabilistic Diagnosis Albert R Meyer, May 3, 2013 Albert R Meyer, May 3, 2013 bayes.1

Introduction to Machine Learning Classification: Naive Bayes Learning goals 15 Understand the

Bias and Parsimony in Regression Analysis ECS 256 W14 Final Project Presentaion Kevin Cosgrove,

Android goes Semantic: DL Reasoners on Smartphones Fernando Bobillo , fbobillo@unizar.es

Semantics Avalanche: Word Sense Disambiguation, Dependency Parsing, Semantic Role

Mining Useful Patterns Jilles Vreeken 22 May 2015 Questions of the day How can we find useful

Margin-based Semi-supervised Learning Using Apollonius circle MONA EMADI AND JAFAR TANHA T TC S

MDL L for or Pat atte tern Min inin ing Jill illes V s Vreeken 4 4 June une 2014 2014

ECE 5984: Introduction to Machine Learning Dhruv Batra Virginia Tech ECE 4424 / 5424G (CS

Cautious label-wise ranking with constraint satisfaction Sbastien Destercke, Yonatan Carlos