1

CS 331: Artificial Intelligence
Naïve Bayes

Thanks to Andrew Moore for some course material

2

Naïve Bayes

  • A special type of Bayesian network
  • Makes a conditional independence assumption
  • Typically used for classification

3

Classification

Suppose you are trying to classify situations that determine whether or not Canvas will be down. You've come up with the following list of variables (which are all Boolean):

  • Monday: Is a Monday
  • Assn: CS331 assignment due
  • Grades: CS331 instructor needs to enter grades
  • Win: The Beavers won the football game

We also have a Boolean variable called CD, which stands for "Canvas down".

4

Classification

Monday Assn Grades Win CD true true true false true false true true true false true false false false false false true true false true true true true false true false false true false true true true false true false

These are called features or attributes This is called the “class” variable (because we’re trying to classify it) These entries in the CD column are called “class labels”
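To make the later calculations concrete, here is one way the training data could be held in code; a minimal Python sketch of the table above (the TRAIN name and the dict-per-record layout are illustrative choices, not part of the course material):

```python
# Training data for the "Canvas down" example, one dict per record.
# Feature names and values follow the table above.
TRAIN = [
    {"Monday": True,  "Assn": True,  "Grades": True,  "Win": False, "CD": True},
    {"Monday": False, "Assn": True,  "Grades": True,  "Win": True,  "CD": False},
    {"Monday": True,  "Assn": False, "Grades": False, "Win": False, "CD": False},
    {"Monday": False, "Assn": True,  "Grades": False, "Win": False, "CD": True},
    {"Monday": True,  "Assn": True,  "Grades": True,  "Win": False, "CD": True},
    {"Monday": False, "Assn": False, "Grades": True,  "Win": False, "CD": True},
    {"Monday": True,  "Assn": True,  "Grades": False, "Win": True,  "CD": False},
]
```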

5

Classification

You create a dataset out of your past experience. This is called "training data":

Monday  Assn   Grades  Win    CD
true    true   true    false  true
false   true   true    true   false
true    false  false   false  false
false   true   false   false  true
true    true   true    false  true
false   false  true    false  true
true    true   false   true   false

You now have 2 new situations and you would like to predict if Canvas will go down. This is called "test data":

Monday  Assn   Grades  Win
true    true   true    true
false   true   true    false

6

Naïve Bayes Structure

[Diagram: the class node CD is the parent, with arrows to the feature nodes M, A, G, and W.]

Notice the conditional independence assumption: the features are conditionally independent given the class variable.


7

Naïve Bayes Parameters

P(CD) = ?
P(M | CD) = ?
P(A | CD) = ?
P(G | CD) = ?
P(W | CD) = ?

How do you get these parameters from the training data?

[Diagram: CD with arrows to M, A, G, W.]

8

Naïve Bayes Parameters

CD     P( CD )
false  (# of records in training data with CD = false) / (# of records in training data)
true   (# of records in training data with CD = true) / (# of records in training data)

[Diagram: CD with arrows to M, A, G, W.]

Naïve Bayes Parameters

[Diagram: CD with arrows to M, A, G, W.]

M      CD     P( M | CD )
false  false  (# of records with M = false and CD = false) / (# of records with CD = false)
false  true   (# of records with M = false and CD = true) / (# of records with CD = true)
true   false  (# of records with M = true and CD = false) / (# of records with CD = false)
true   true   (# of records with M = true and CD = true) / (# of records with CD = true)
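A minimal sketch of how these fractions could be computed, assuming records shaped like the TRAIN list above (function names are mine):

```python
def p_class(records, value):
    """P(CD = value): fraction of training records whose CD equals value."""
    return sum(r["CD"] == value for r in records) / len(records)

def p_feature_given_class(records, feature, f_value, c_value):
    """P(feature = f_value | CD = c_value): fraction of the CD = c_value
    records that also have feature = f_value."""
    class_rows = [r for r in records if r["CD"] == c_value]
    return sum(r[feature] == f_value for r in class_rows) / len(class_rows)

# e.g. p_class(TRAIN, True) and p_feature_given_class(TRAIN, "Monday", True, True)
```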

10

Inference in Naïve Bayes

P( CD | M, A, G, W ) = P( M, A, G, W | CD ) P( CD ) / P( M, A, G, W )       (by Bayes' rule)
                     ∝ P( M, A, G, W | CD ) P( CD )                          (treat the denominator as a constant)
                     = P( CD ) P( M | CD ) P( A | CD ) P( G | CD ) P( W | CD )   (from conditional independence)

11

Prediction

  • Suppose you are now in a day when M=true, A=true, G=true, W=true.
  • You need to predict if CD=true or CD=false.
  • We will use the notation that CD=true is written cd and CD=false is written ¬cd.

12

Prediction

  • You need to compare:
    – P( cd | m, a, g, w ) = α P( cd ) P( m | cd ) P( a | cd ) P( g | cd ) P( w | cd )
    – P( ¬cd | m, a, g, w ) = α P( ¬cd ) P( m | ¬cd ) P( a | ¬cd ) P( g | ¬cd ) P( w | ¬cd )
  • Whichever probability is the bigger of the two above, that is your prediction for CD
  • Because you take the max of the two probabilities above, you can ignore α (since it is the same in both)
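As a worked check, assuming the training table was reconstructed correctly above, counting gives P( cd ) = 4/7, P( m | cd ) = 2/4, P( a | cd ) = 3/4, P( g | cd ) = 3/4, P( w | cd ) = 0/4, and P( ¬cd ) = 3/7, P( m | ¬cd ) = 2/3, P( a | ¬cd ) = 2/3, P( g | ¬cd ) = 1/3, P( w | ¬cd ) = 2/3. Then:

P( cd | m, a, g, w ) = α · 4/7 · 2/4 · 3/4 · 3/4 · 0/4 = 0
P( ¬cd | m, a, g, w ) = α · 3/7 · 2/3 · 2/3 · 1/3 · 2/3 ≈ 0.042 α

so the prediction would be CD = false. (The zero comes from P( w | cd ): no CD = true record has W = true. Technical Point #2 later in these slides deals with exactly this problem.)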


13

The General Case

[Diagram: the class node Y is the parent, with arrows to the feature nodes X1, X2, ..., Xm.]

1. Estimate P( Y = v ) as the fraction of records with Y = v.
2. Estimate P( Xi = u | Y = v ) as the fraction of "Y = v" records that also have Xi = u.
3. To predict the Y value given observations of all the Xi values, compute

Y_predict = argmax_v P( Y = v | X1 = u1, ..., Xm = um )

14

Naïve Bayes Classifier

Y_predict = argmax_v P( Y = v | X1 = u1, ..., Xm = um )

15

Naïve Bayes Classifier

Y_predict = argmax_v P( Y = v | X1 = u1, ..., Xm = um )

          = argmax_v P( Y = v, X1 = u1, ..., Xm = um ) / P( X1 = u1, ..., Xm = um )

16

Naïve Bayes Classifier

Y_predict = argmax_v P( Y = v | X1 = u1, ..., Xm = um )

          = argmax_v P( Y = v, X1 = u1, ..., Xm = um ) / P( X1 = u1, ..., Xm = um )

          = argmax_v P( X1 = u1, ..., Xm = um | Y = v ) P( Y = v ) / P( X1 = u1, ..., Xm = um )

17

Naïve Bayes Classifier

Y_predict = argmax_v P( Y = v | X1 = u1, ..., Xm = um )

          = argmax_v P( Y = v, X1 = u1, ..., Xm = um ) / P( X1 = u1, ..., Xm = um )

          = argmax_v P( X1 = u1, ..., Xm = um | Y = v ) P( Y = v ) / P( X1 = u1, ..., Xm = um )

          = argmax_v P( X1 = u1, ..., Xm = um | Y = v ) P( Y = v )

18

Naïve Bayes Classifier

Y_predict = argmax_v P( Y = v | X1 = u1, ..., Xm = um )

          = argmax_v P( Y = v, X1 = u1, ..., Xm = um ) / P( X1 = u1, ..., Xm = um )

          = argmax_v P( X1 = u1, ..., Xm = um | Y = v ) P( Y = v ) / P( X1 = u1, ..., Xm = um )

          = argmax_v P( X1 = u1, ..., Xm = um | Y = v ) P( Y = v )

          = argmax_v P( Y = v ) Π_{j=1..m} P( Xj = uj | Y = v )

The last step holds because of the structure of the Bayes net.
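A minimal Python sketch of this final argmax, assuming records are held as dicts like the TRAIN list shown earlier (the names are mine, and it uses the raw frequency estimates, so it inherits the zero-count problem discussed under Technical Point #2 below):

```python
def predict(records, y_name, y_values, observation):
    """Return argmax over v of P(Y = v) * prod_j P(Xj = uj | Y = v),
    with both factors estimated as fractions of the training records."""
    best_value, best_score = None, -1.0
    for v in y_values:
        rows_v = [r for r in records if r[y_name] == v]   # records with Y = v
        score = len(rows_v) / len(records)                # P(Y = v)
        for xj, uj in observation.items():                # one factor per feature
            score *= sum(r[xj] == uj for r in rows_v) / len(rows_v)
        if score > best_score:
            best_value, best_score = v, score
    return best_value

# e.g. predict(TRAIN, "CD", [True, False],
#              {"Monday": True, "Assn": True, "Grades": True, "Win": True})
```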


19

Technical Point #1

  • The probabilities P( Xj = uj | Y = v ) can sometimes be really small
  • This can result in numerical instability, since floating point numbers are not represented exactly on any computer architecture
  • To get around this, use the log of the last line in the previous slide, i.e.

Y_predict = argmax_v [ log P( Y = v ) + Σ_{j=1..m} log P( Xj = uj | Y = v ) ]
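For example, the per-class score in a sketch like the earlier one could be computed in log space along these lines (a sketch, not the assignment's required interface; note that math.log(0) raises a ValueError, which is the log-space version of the problem in Technical Point #2):

```python
import math

def log_score(prior, cond_probs):
    """log P(Y = v) + sum_j log P(Xj = uj | Y = v).  Comparing these sums
    picks the same argmax as comparing the products, without underflow."""
    return math.log(prior) + sum(math.log(p) for p in cond_probs)
```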

20

Technical Point #2

  • When estimating parameters, what happens if you don't have any records that match a certain combination of features?
  • For example, in our training data, we didn't have M=false, A=false, G=false, W=false
  • This means that P( Xj = uj | Y = v ) in the formula below will be 0, and the entire expression will be 0:

P( Y = v ) Π_{j=1..m} P( Xj = uj | Y = v )

Even more horrible things happen if you had this expression in log space (the log of 0 is undefined).

21

Uniform Dirichlet Priors

Let Nj be the number of values that Xj can take on.

P( Xj = uj | Y = v ) = ( 1 + # of records with Xj = uj and Y = v ) / ( Nj + # of records with Y = v )

What happens when you have no records with Y = v?

P( Xj = uj | Y = v ) = 1 / Nj

This means that each value of Xj is equally likely in the absence of data. If you have a lot of data, it dominates the 1/Nj value.

We call this trick a "uniform Dirichlet prior".
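A minimal sketch of this smoothed estimate, again assuming dict-shaped records (names are mine):

```python
def smoothed_conditional(records, feature, f_value, class_name, c_value, n_values):
    """P(feature = f_value | class = c_value) with a uniform Dirichlet prior:
    (1 + # matching records) / (Nj + # records in the class), where
    Nj = n_values is the number of values the feature can take
    (2 for Boolean features).  Never returns 0, even with no data."""
    class_rows = [r for r in records if r[class_name] == c_value]
    matches = sum(r[feature] == f_value for r in class_rows)
    return (1 + matches) / (n_values + len(class_rows))
```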

22

Example

Monday  Assn   Grades  Win    CD
true    true   true    false  true
false   true   true    true   false
true    false  false   false  false
false   true   false   false  true
true    true   true    false  true
false   false  true    false  true
true    true   false   true   false

Compute P(M | CD) using uniform Dirichlet priors.
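Assuming the table was reconstructed correctly (4 records with CD = true, 3 with CD = false, and N_M = 2 since M is Boolean), the worked answer would be:

P( M = true | CD = true ) = ( 1 + 2 ) / ( 2 + 4 ) = 1/2
P( M = false | CD = true ) = ( 1 + 2 ) / ( 2 + 4 ) = 1/2
P( M = true | CD = false ) = ( 1 + 2 ) / ( 2 + 3 ) = 3/5
P( M = false | CD = false ) = ( 1 + 1 ) / ( 2 + 3 ) = 2/5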

23

CW: Practice

Monday  Assn   Grades  Win    CD
true    true   true    false  true
false   true   true    true   false
true    false  false   false  false
false   true   false   false  true
true    true   true    false  true
false   false  true    false  true
true    true   false   true   false

Compute P(W=true | CD=true) using uniform Dirichlet priors.

24

Programming Assignment #3

You will classify text into two classes. There are two files:

  1. Training data: trainingSet.txt
  2. Testing data: testSet.txt

25

Programming Assignment #3

Two parts to this assignment:

  1. Pre-processing step
  2. Classification step

26

1. Preprocessing Step

  • Recall that naïve Bayes has the structure shown to the right
  • The nodes correspond to random variables, which are the features or attributes in the data
  • What are the features in the documents?
  • Note: a "document" in our assignment is a Yelp review to be classified as positive or negative

27

The Vocabulary

  • The features of the documents will be the presence/absence of words in the vocabulary
  • The vocabulary is the list of words that are known to the classifier
  • Ideally, the vocabulary would be all the words in the English language
  • For this assignment, you will form the vocabulary using all the words in the training data (a sketch follows below)
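One way the vocabulary could be formed from the training documents; a minimal sketch that assumes plain-string documents and lowercases them (the slides only say to ignore punctuation, so the lowercasing is my assumption):

```python
import string

def build_vocabulary(documents):
    """Collect every distinct word in the training documents,
    lowercased and stripped of punctuation, in alphabetical order."""
    vocab = set()
    for doc in documents:
        cleaned = doc.lower().translate(str.maketrans("", "", string.punctuation))
        vocab.update(cleaned.split())
    return sorted(vocab)

# build_vocabulary(["This is an excellent laptop", "No, this is not sarcasm!"])
# -> ['an', 'excellent', 'is', 'laptop', 'no', 'not', 'sarcasm', 'this']
```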

28

Bag of Words

Suppose you have the following documents:

Training data:
  "This is an excellent laptop"   (Class Label: 1)
  "No, this is not sarcasm!"      (Class Label: 0)

Test data:
  "Excellent Laptop =P"           (Class Label: 1)

The vocabulary will be: this, is, an, excellent, laptop, no, not, sarcasm

You will ignore punctuation for this assignment.

29

Bag of Words

Vocab: this, is, an, excellent, laptop, no, not, sarcasm
Vocab (alphabetized): an, excellent, is, laptop, no, not, sarcasm, this

Keep the vocabulary in alphabetical order to help with debugging.

30

Training data

Next, convert your training and test data into features.

Training data:

an  excellent  is  laptop  no  not  sarcasm  this  Class Label
1   1          1   1       0   0    0        1     1
0   0          1   0       1   1    1        1     0

Test data:

an  excellent  is  laptop  no  not  sarcasm  this  Class Label
0   1          0   1       0   0    0        0     1

You will output the training data in feature form, with the features alphabetized (we will grade you on this output). A sketch of this conversion follows below.
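A minimal sketch of this conversion, assuming the alphabetized vocabulary from the earlier sketch and documents with punctuation already stripped:

```python
def featurize(document, vocab):
    """0/1 presence vector over the alphabetized vocabulary for one document
    (assumes punctuation was already removed, as in the earlier sketch)."""
    words = set(document.lower().split())
    return [1 if w in words else 0 for w in vocab]

# featurize("This is an excellent laptop",
#           ['an', 'excellent', 'is', 'laptop', 'no', 'not', 'sarcasm', 'this'])
# -> [1, 1, 1, 1, 0, 0, 0, 1]
```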


31

2. Classification Step (Training Phase)

[Diagram: the Class Label node is the parent, with arrows to the word nodes an, excellent, is, laptop, no, not, sarcasm, this.]

  • Your naïve Bayes classifier now looks something like the above
  • You still need to fill in the conditional probability tables in each node
  • This is done in the training phase (as described on slides 9 and 10)
  • Remember to use the uniform Dirichlet prior trick (see slide 21)
32

2. Classification Step (Testing Phase)

  • Load the featurized test data
  • For each document in the test data, predict its class label
  • This requires computing: P(Class label | Words in document)

33

2. Classification Step (Testing Phase)

Suppose you have the following test instance:

an  excellent  is  laptop  no  not  sarcasm  this  Class Label
0   1          0   1       0   0    0        0     (to be predicted)

P(Class = 1 | an = 0, excellent = 1, is = 0, laptop = 1, no = 0, not = 0, sarcasm = 0, this = 0)
  = α P(Class = 1) * P(an = 0 | Class = 1) * P(excellent = 1 | Class = 1) * P(is = 0 | Class = 1) * P(laptop = 1 | Class = 1) * P(no = 0 | Class = 1) * P(not = 0 | Class = 1) * P(sarcasm = 0 | Class = 1) * P(this = 0 | Class = 1)

Note: use P(Word = 1 | Class) if you have a 1 for the word. Otherwise use P(Word = 0 | Class).

34

2. Classification Step (Testing Phase)

Then compute the following:

an  excellent  is  laptop  no  not  sarcasm  this  Class Label
0   1          0   1       0   0    0        0     (to be predicted)

P(Class = 0 | an = 0, excellent = 1, is = 0, laptop = 1, no = 0, not = 0, sarcasm = 0, this = 0)
  = α P(Class = 0) * P(an = 0 | Class = 0) * P(excellent = 1 | Class = 0) * P(is = 0 | Class = 0) * P(laptop = 1 | Class = 0) * P(no = 0 | Class = 0) * P(not = 0 | Class = 0) * P(sarcasm = 0 | Class = 0) * P(this = 0 | Class = 0)

35

2. Classification Step (Testing Phase)

If

  αP(Class = 1 | an = 0, excellent = 1, is = 0, laptop = 1, no = 0, not = 0, sarcasm = 0, this = 0)
  > αP(Class = 0 | an = 0, excellent = 1, is = 0, laptop = 1, no = 0, not = 0, sarcasm = 0, this = 0)

predict Class = 1; otherwise predict Class = 0.

an  excellent  is  laptop  no  not  sarcasm  this  Class Label
0   1          0   1       0   0    0        0     (to be predicted)

36

2. Classification Step (Testing Phase)

  • For each document in the testing data set, predict its class label
  • Compare the predicted class label to the actual class label
  • Output the accuracy of your predictions:

accuracy = (# of class labels predicted correctly) / (total # of predictions)
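A minimal sketch of that computation (names are mine):

```python
def accuracy(predicted, actual):
    """Fraction of class labels predicted correctly."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```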


37

Results

There are two sets of results we require:

1. Results #1:
   – Use trainingSet.txt for the training phase
   – Use trainingSet.txt for the testing phase
   – Report accuracy

2. Results #2:
   – Use trainingSet.txt for the training phase
   – Use testSet.txt for the testing phase
   – Report accuracy

38

What You Should Know

  • How to learn the parameters for a naïve Bayes model
  • How to make predictions with a naïve Bayes model
  • How to implement a naïve Bayes model