
Classification based on Bayes decision theory: Machine Learning (PowerPoint presentation)



  1. Classification based on Bayes decision theory. Machine Learning. Hamid Beigy, Sharif University of Technology, Fall 1393.

  2. Outline
  1 Introduction
  2 Bayes decision theory
      Minimizing the classification error probability
      Minimizing the average risk
      Discriminant function and decision surface
      Bayesian classifiers for Normally distributed classes
      Minimum distance classifier
      Bayesian classifiers for independent binary features
      Supervised learning of the Bayesian classifiers
  3 Parametric methods for density estimation
      Maximum likelihood parameter estimation
      Bayesian estimation
      Maximum a posteriori estimation
      Mixture models for density estimation
  4 Nonparametric methods for density estimation
      Histogram estimator
      Naive estimator
      Kernel estimator
      k-Nearest neighbor estimator
      k-Nearest neighbor classifier
      Naive Bayes classifier


  4. Introduction
In classification, the goal is to find a mapping from inputs X to outputs t, given a labeled set of input-output pairs S = {(x_1, t_1), (x_2, t_2), ..., (x_N, t_N)}. S is called the training set.
In the simplest setting, each training input x is a D-dimensional vector of numbers. Each component of x is called a feature, attribute, or variable, and x itself is called a feature vector.
The outputs take values t ∈ {1, 2, ..., C}, where C is the number of classes. When C = 2, the problem is called binary classification; in this case, we often assume that t ∈ {−1, +1} or t ∈ {0, 1}. When C > 2, the problem is called multi-class classification.
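To make the notation concrete, here is a minimal sketch of such a training set in Python (not from the slides; the feature values and labels are invented):

```python
import numpy as np

# Toy training set S = {(x_1, t_1), ..., (x_N, t_N)} for binary
# classification: N = 6 inputs, each a D = 2 dimensional feature vector,
# with targets t_n in {0, 1}.
X = np.array([[1.0, 2.1],
              [0.9, 1.8],
              [1.2, 2.4],
              [3.0, 0.5],
              [3.2, 0.7],
              [2.8, 0.4]])        # shape (N, D): one feature vector per row
t = np.array([0, 0, 0, 1, 1, 1])  # shape (N,): one class label per input
print(X.shape, t.shape)           # (6, 2) (6,)
```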

  5. Introduction (cont.)
Bayes theorem:

$$p(C_k \mid X) = \frac{p(X \mid C_k)\,p(C_k)}{p(X)} = \frac{p(X \mid C_k)\,p(C_k)}{\sum_j p(X \mid C_j)\,p(C_j)}$$

p(C_k) is called the prior of C_k, p(X | C_k) is called the likelihood of the data, and p(C_k | X) is called the posterior probability. Since p(X) is the same for all classes, we can write

$$p(C_k \mid X) \propto p(X \mid C_k)\,p(C_k)$$

There are two approaches for building a classifier.
Generative approach: first build a joint model of the form p(x, C_k), then condition on x to derive p(C_k | x).
Discriminative approach: model p(C_k | x) directly.
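As a numeric illustration of Bayes' theorem (the priors and likelihood values below are hypothetical, chosen only to show the computation):

```python
# Two classes with assumed priors p(C_1) = 0.6, p(C_2) = 0.4, and assumed
# likelihood values p(x | C_k) at one observed input x.
prior = [0.6, 0.4]
likelihood = [0.05, 0.20]   # p(x | C_1), p(x | C_2)

# Evidence p(x) = sum_j p(x | C_j) p(C_j), the same for every class.
evidence = sum(l * p for l, p in zip(likelihood, prior))

# Posterior p(C_k | x) = p(x | C_k) p(C_k) / p(x).
posterior = [l * p / evidence for l, p in zip(likelihood, prior)]
print(posterior)            # [0.2727..., 0.7272...]; the posteriors sum to 1
```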


  7. Bayes decision theory
Given a classification task of M classes, C_1, C_2, ..., C_M, and an input vector x, we can form the M conditional probabilities p(C_k | x), k = 1, 2, ..., M. Without loss of generality, consider the two-class classification problem. From the Bayes theorem, we have

$$p(C_k \mid x) \propto p(x \mid C_k)\,p(C_k)$$

The Bayes classification rule is:
if p(C_1 | x) > p(C_2 | x), then x is classified to C_1;
if p(C_1 | x) < p(C_2 | x), then x is classified to C_2;
if p(C_1 | x) = p(C_2 | x), then x is classified to either C_1 or C_2.
Since p(x) is the same for all classes, it can be removed. Hence

$$p(x \mid C_1)\,p(C_1) \gtrless p(x \mid C_2)\,p(C_2)$$

If p(C_1) = p(C_2) = 1/2, then we have

$$p(x \mid C_1) \gtrless p(x \mid C_2)$$
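A minimal sketch of this decision rule, assuming (hypothetically) Gaussian class-conditional densities N(0, 1) and N(2, 1) with equal priors:

```python
from scipy.stats import norm

p1, p2 = 0.5, 0.5                    # priors p(C_1) = p(C_2) = 1/2
pdf1 = norm(loc=0.0, scale=1.0).pdf  # assumed p(x | C_1)
pdf2 = norm(loc=2.0, scale=1.0).pdf  # assumed p(x | C_2)

def bayes_classify(x):
    # Decide C_1 if p(x | C_1) p(C_1) >= p(x | C_2) p(C_2), else C_2;
    # on a tie, either decision is optimal (here we pick C_1).
    return 1 if pdf1(x) * p1 >= pdf2(x) * p2 else 2

# With equal priors and equal variances, the boundary sits at x = 1.
print([bayes_classify(x) for x in (-1.0, 0.9, 1.1, 3.0)])   # [1, 1, 2, 2]
```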

  8. Bayes decision theory
If p(C_1) = p(C_2) = 1/2, we get the situation shown in the figure.
(Figure: the two class-conditional densities and the decision regions R_1 and R_2; the coloured region around the decision boundary is where errors occur.)
The probability of error equals

$$P_e = p(\text{mistake}) = p(x \in R_1, C_2) + p(x \in R_2, C_1) = \frac{1}{2}\int_{R_1} p(x \mid C_2)\,dx + \frac{1}{2}\int_{R_2} p(x \mid C_1)\,dx$$
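Continuing the hypothetical Gaussian example above (p(x | C_1) = N(0, 1), p(x | C_2) = N(2, 1), equal priors, boundary at x = 1), the two integrals can be evaluated through the Gaussian CDF:

```python
from scipy.stats import norm

c1 = norm(loc=0.0, scale=1.0)   # assumed p(x | C_1)
c2 = norm(loc=2.0, scale=1.0)   # assumed p(x | C_2)

# P_e = 1/2 * int_{R_1} p(x | C_2) dx + 1/2 * int_{R_2} p(x | C_1) dx,
# with R_1 = (-inf, 1) and R_2 = [1, inf) for this example.
P_e = 0.5 * c2.cdf(1.0) + 0.5 * (1.0 - c1.cdf(1.0))
print(P_e)                      # ~0.1587, the minimum (Bayes) error here
```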


  10. Minimizing the classification error probability (cont.)
We now show that the Bayesian classifier is optimal with respect to minimizing the classification error probability. Let R_1 (R_2) be the region of the feature space in which we decide in favor of C_1 (C_2). An error is made if x ∈ R_1 although it belongs to C_2, or if x ∈ R_2 although it belongs to C_1. That is,

$$\begin{aligned} P_e &= p(x \in R_2, C_1) + p(x \in R_1, C_2)\\ &= p(x \in R_2 \mid C_1)\,p(C_1) + p(x \in R_1 \mid C_2)\,p(C_2)\\ &= p(C_1)\int_{R_2} p(x \mid C_1)\,dx + p(C_2)\int_{R_1} p(x \mid C_2)\,dx \end{aligned}$$

Since R_1 ∪ R_2 covers the whole feature space, from the definition of the probability density function we have

$$p(C_1) = \int_{R_1} p(C_1 \mid x)\,p(x)\,dx + \int_{R_2} p(C_1 \mid x)\,p(x)\,dx$$

By combining these two equations, we obtain

$$P_e = p(C_1) - \int_{R_1} \bigl[p(C_1 \mid x) - p(C_2 \mid x)\bigr]\,p(x)\,dx$$
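The combination step, which the slide compresses into one sentence, expands as follows (using $p(x \mid C_k)\,p(C_k) = p(C_k \mid x)\,p(x)$ and the decomposition of $p(C_1)$ above; this derivation is added for clarity):

```latex
\begin{align*}
P_e &= p(C_1)\int_{R_2} p(x \mid C_1)\,dx + p(C_2)\int_{R_1} p(x \mid C_2)\,dx\\
    &= \int_{R_2} p(C_1 \mid x)\,p(x)\,dx + \int_{R_1} p(C_2 \mid x)\,p(x)\,dx\\
    &= \Bigl(p(C_1) - \int_{R_1} p(C_1 \mid x)\,p(x)\,dx\Bigr)
       + \int_{R_1} p(C_2 \mid x)\,p(x)\,dx\\
    &= p(C_1) - \int_{R_1} \bigl[p(C_1 \mid x) - p(C_2 \mid x)\bigr]\,p(x)\,dx.
\end{align*}
```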

  11. Minimizing the classification error probability (cont.)
The probability of error equals

$$P_e = p(C_1) - \int_{R_1} \bigl[p(C_1 \mid x) - p(C_2 \mid x)\bigr]\,p(x)\,dx$$

The probability of error is minimized if R_1 is the region of the space in which p(C_1 | x) − p(C_2 | x) > 0. Then R_2 becomes the region where the reverse is true, i.e. the region in which p(C_1 | x) − p(C_2 | x) < 0. This completes the proof of the theorem.
For a classification task with M classes, x is assigned to class C_k with the rule:

assign x to C_k if p(C_k | x) > p(C_j | x) for all j ≠ k

Show that this rule also minimizes the classification error probability for a classification task with M classes.
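A one-line sketch of this M-class rule in code (the posterior values below are placeholders):

```python
import numpy as np

def bayes_classify_m(posteriors):
    """Assign x to the class C_k with the largest posterior p(C_k | x).

    `posteriors` holds p(C_k | x) for k = 1..M; ties are broken toward the
    smallest index, which is still error-optimal.
    """
    return int(np.argmax(posteriors)) + 1   # classes are numbered from 1

print(bayes_classify_m([0.2, 0.5, 0.3]))    # -> 2
```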

