
SLIDE 1

Machine Learning

The Bayes Optimal Classifier


SLIDE 2

Most probable classification

  • In Bayesian learning, the primary question is: What is the most probable hypothesis given the data?
  • We can also ask: For a new test point, what is the most probable label, given the training data?
  • Is this the same as the prediction of the maximum a posteriori (MAP) hypothesis?

SLIDE 3

Most probable classification

Suppose our hypothesis space H has three functions h1, h2 and h3

  • P(h1 | D) = 0.4, P(h2 | D) = 0.3, P(h3 | D) = 0.3
  • What is the MAP hypothesis?
  • For a new instance x, suppose h1(x) = +1, h2(x) = -1 and h3(x) = -1
  • What is the most probable classification of x?

The MAP hypothesis is h1, which predicts +1. But summing the posterior probabilities of the hypotheses that agree on each label gives P(+1 | x) = 0.4 and P(-1 | x) = 0.3 + 0.3 = 0.6, so the most probable classification of x is -1. The most probable classification is not the same as the prediction of the MAP hypothesis.
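The slide's worked example can be sketched in a few lines of Python. The hypothesis names, posteriors, and predictions are taken directly from the slide; everything else is illustrative.

```python
# Three hypotheses with posterior probabilities P(h | D) and their
# deterministic predictions h(x) for a new instance x (from the slide).
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}   # P(h | D)
predictions = {"h1": +1, "h2": -1, "h3": -1}      # h(x)

# MAP hypothesis: the single most probable hypothesis given the data.
map_h = max(posteriors, key=posteriors.get)       # h1, which predicts +1

# Most probable label: accumulate posterior mass per predicted label.
label_prob = {}
for h, p in posteriors.items():
    label_prob[predictions[h]] = label_prob.get(predictions[h], 0.0) + p
# label_prob is {+1: 0.4, -1: 0.6}

best_label = max(label_prob, key=label_prob.get)  # -1, disagreeing with h1
```

Note that the two answers disagree: h1 alone predicts +1, but the posterior-weighted vote favors -1.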
SLIDE 9

Bayes Optimal Classifier

  • How should we use the general formalism?
    – What should H be?

These two choices could be different: selecting a function vs. entertaining all options until the last minute.

H can be a collection of functions:
  • Given the training data, choose an optimal function
  • Then, given new data, evaluate the selected function on it

H can also be a collection of possible predictions:
  • Given the data, try to directly choose the optimal prediction
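The two pipelines on this slide can be written out explicitly. The notation below (h_MAP for the selected function, v* for the direct prediction) follows standard usage rather than the slides themselves:

```latex
% Select a function first, then evaluate it on the new instance x:
h_{\mathrm{MAP}} = \operatorname*{argmax}_{h \in H} P(h \mid D),
\qquad \text{predict } h_{\mathrm{MAP}}(x)

% Or choose the prediction directly, marginalizing over all hypotheses:
v^{*} = \operatorname*{argmax}_{v} \sum_{h \in H} P(v \mid h, x)\, P(h \mid D)
```

The first commits to a single hypothesis before seeing x; the second keeps every hypothesis in play and lets them vote, weighted by posterior probability.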

SLIDE 12

Bayes Optimal Classification

Defined as the most probable label for a new instance, obtained by weighting each hypothesis's prediction by its posterior probability. Computing this can be hopelessly inefficient, since it requires summing over the entire hypothesis space. And yet it is an interesting theoretical concept, because no other classification method can outperform it on average (using the same hypothesis space and prior knowledge).
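The definition above can be sketched as a small Python function. The function name and argument layout are illustrative, not from the slides; hypotheses are assumed deterministic, so P(v | h, x) is 1 when h(x) = v and 0 otherwise.

```python
def bayes_optimal_label(hypotheses, posteriors, x, labels):
    """Return argmax over v of sum_h P(v | h, x) * P(h | D).

    With deterministic hypotheses, each h contributes its full
    posterior mass to the single label it predicts for x.
    """
    def score(v):
        # Total posterior mass of hypotheses that predict label v for x.
        return sum(p for h, p in zip(hypotheses, posteriors) if h(x) == v)
    return max(labels, key=score)

# The running example: h1 predicts +1; h2 and h3 predict -1.
hs = [lambda x: +1, lambda x: -1, lambda x: -1]
label = bayes_optimal_label(hs, [0.4, 0.3, 0.3], x=None, labels=[+1, -1])
```

The inefficiency the slide mentions is visible here: scoring each label sums over every hypothesis in H, which is hopeless when H is large or infinite.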