SLIDE 1

Lecture 3: Bayesian Decision Theory

  • Dr. Chengjiang Long

Computer Vision Researcher at Kitware Inc. Adjunct Professor at RPI. Email: longc3@rpi.edu

SLIDE 2

Recap Previous Lecture

SLIDE 3

Outline

  • What's Bayesian Decision Theory?
  • A More General Theory
  • Discriminant Function and Decision Boundary
  • Multivariate Gaussian Density

SLIDE 4

Outline

  • What's Bayesian Decision Theory?
  • A More General Theory
  • Discriminant Function and Decision Boundary
  • Multivariate Gaussian Density

SLIDE 5

Bayesian Decision Theory

  • Design classifiers to make decisions subject to minimizing an expected "risk".
  • The simplest risk is the classification error (i.e., assuming that all misclassification costs are equal).
  • When misclassification costs are not equal, the risk can include the cost associated with the different misclassifications.

SLIDE 6

Terminology

  • State of nature ω (class label):
    • e.g., ω1 for sea bass, ω2 for salmon
  • Probabilities P(ω1) and P(ω2) (priors):
    • e.g., prior knowledge of how likely it is to get a sea bass or a salmon
  • Probability density function p(x) (evidence):
    • e.g., how frequently we will measure a pattern with feature value x (e.g., x corresponds to lightness)

SLIDE 7

Terminology

  • Conditional probability density p(x/ωj) (likelihood):
    • e.g., how frequently we will measure a pattern with feature value x given that the pattern belongs to class ωj

SLIDE 8

Terminology

  • Conditional probability P(ωj/x) (posterior):
    • e.g., the probability that the fish belongs to class ωj given feature x.
  • Ultimately, we are interested in computing P(ωj/x) for each class ωj.

SLIDE 9

Decision Rule

  • Decision rule using the priors only: decide ω1 if P(ω1) > P(ω2); otherwise decide ω2.
  • Favours the most likely class.
  • This rule makes the same decision every time.
  • i.e., it is optimum if no other information is available.

SLIDE 10

Decision Rule

  • Using Bayes' rule:

    P(ωj/x) = p(x/ωj) P(ωj) / p(x)  =  (likelihood × prior) / evidence

    where the evidence is

    p(x) = Σ j=1..2 p(x/ωj) P(ωj)

  • Decide ω1 if P(ω1/x) > P(ω2/x); otherwise decide ω2.
  • Equivalently, decide ω1 if p(x/ω1)P(ω1) > p(x/ω2)P(ω2); otherwise decide ω2.
  • Equivalently, decide ω1 if p(x/ω1)/p(x/ω2) > P(ω2)/P(ω1); otherwise decide ω2.
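To make the rule concrete, here is a minimal Python sketch of a two-class Bayes decision; the Gaussian class-conditional densities and the priors are made-up numbers for illustration, not values from the lecture:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical class-conditional densities (illustrative only):
# p(x/w1) ~ N(2, 1), p(x/w2) ~ N(4, 1), with priors P(w1) = 2/3, P(w2) = 1/3.
priors = np.array([2 / 3, 1 / 3])

def posteriors(x):
    likelihoods = np.array([norm.pdf(x, loc=2, scale=1),
                            norm.pdf(x, loc=4, scale=1)])
    joint = likelihoods * priors              # p(x/wj) * P(wj)
    return joint / joint.sum()                # divide by the evidence p(x)

def decide(x):
    # Decide w1 if P(w1/x) > P(w2/x); otherwise decide w2.
    return 1 if posteriors(x)[0] > posteriors(x)[1] else 2

print(posteriors(3.0), decide(3.0))
```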

SLIDE 11

Decision Rule

  • Example (figure): the class-conditional densities p(x/ωj) and the resulting posteriors P(ωj/x), for priors P(ω1) = 2/3 and P(ω2) = 1/3.

SLIDE 12

Probability of Error

  • The probability of error is defined as: P(error/x) = P(ω1/x) if we decide ω2, and P(ω2/x) if we decide ω1.
  • What is the average probability of error? P(error) = ∫ P(error/x) p(x) dx
  • The Bayes rule is optimum, that is, it minimizes the average probability of error: it guarantees P(error/x) = min[P(ω1/x), P(ω2/x)] at every x.

SLIDE 13

Where do Probabilities come from?

  • There are two competing answers:
  • Relative frequency (objective) approach:
    • Probabilities can only come from experiments.
  • Bayesian (subjective) approach:
    • Probabilities may reflect degrees of belief and can be based on opinion.

SLIDE 14

Example: Objective approach

  • Classify cars according to whether they cost more or less than $50K:
  • Classes: C1 if price > 50K, C2 if price <= 50K
  • Feature: x, the height of a car
  • Use Bayes' rule to compute the posterior probabilities:

    P(Ci/x) = p(x/Ci) P(Ci) / p(x)

  • We need to estimate p(x/C1), p(x/C2), P(C1), P(C2)

SLIDE 15

Example: Objective approach

  • Collect data:
    • Ask drivers how much their car was and measure its height.
  • Determine the prior probabilities P(C1), P(C2):
    • e.g., 1209 samples: #C1 = 221, #C2 = 988

    P(C1) = 221/1209 = 0.183
    P(C2) = 988/1209 = 0.817

SLIDE 16

Example: Objective approach

  • Determine the class-conditional probabilities (likelihoods) p(x/Ci):
    • Discretize the car height into bins and use the normalized histogram.
  • Calculate the posterior probability for each bin, e.g. for x = 1.0:

    P(C1/x=1.0) = p(x=1.0/C1) P(C1) / [ p(x=1.0/C1) P(C1) + p(x=1.0/C2) P(C2) ]
                = 0.2081 × 0.183 / (0.2081 × 0.183 + 0.0597 × 0.817)
                = 0.438
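A compact Python sketch of this whole pipeline, using synthetic (height, price) data; the variable names, the made-up price/height relationship, and the bin layout are illustrative assumptions, so the numbers will not match the slide:

```python
import numpy as np

rng = np.random.default_rng(0)
prices = rng.uniform(10_000, 120_000, size=1209)                     # hypothetical car prices
heights = 1.8 - prices / 200_000 + rng.normal(0.0, 0.1, size=1209)   # made-up trend: pricier cars sit lower

c1 = heights[prices > 50_000]             # class C1: price > 50K
c2 = heights[prices <= 50_000]            # class C2: price <= 50K
P_C1, P_C2 = len(c1) / len(heights), len(c2) / len(heights)          # priors from counts

bins = np.linspace(heights.min(), heights.max(), 21)                 # 20 height bins
p_x_C1, _ = np.histogram(c1, bins=bins, density=True)                # normalized histogram ~ p(x/C1)
p_x_C2, _ = np.histogram(c2, bins=bins, density=True)                # normalized histogram ~ p(x/C2)

def posterior_C1(x):
    b = int(np.clip(np.searchsorted(bins, x) - 1, 0, len(p_x_C1) - 1))  # bin index for x
    num = p_x_C1[b] * P_C1
    den = num + p_x_C2[b] * P_C2          # evidence restricted to the two classes
    return num / den if den > 0 else 0.0

print(P_C1, P_C2, posterior_C1(1.3))
```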

SLIDE 17

Outline

  • What's Bayesian Decision Theory?
  • A More General Theory
  • Discriminant Function and Decision Boundary
  • Multivariate Gaussian Density

SLIDE 18

A More General Theory

  • Use more than one feature.
  • Allow more than two categories.
  • Allow actions other than classifying the input into one of the possible categories (e.g., rejection).
  • Employ a more general error function (i.e., an expected "risk") by associating a "cost" (based on a "loss" function) with different errors.

SLIDE 19

Terminology

  • Features form a vector x ∈ R^d
  • A set of c categories ω1, ω2, …, ωc
  • A finite set of l actions α1, α2, …, αl
  • A loss function λ(αi/ωj): the cost associated with taking action αi when the correct classification category is ωj

SLIDE 20

Conditional Risk (or Expected Loss)

  • Suppose we observe x and take action αi.
  • The conditional risk (or expected loss) of taking action αi is defined as:

    R(αi/x) = Σ j=1..c λ(αi/ωj) P(ωj/x)

Example: from a medical image, we want to determine whether or not it contains cancer tissue.
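A minimal sketch of minimum-risk decision making in Python, assuming the posteriors P(ωj/x) have already been computed; the loss matrix entries are illustrative (e.g., missing a cancer is given a much larger loss than a false alarm):

```python
import numpy as np

# Hypothetical loss matrix: rows = actions, columns = true classes.
# lam[i, j] = loss lambda(a_i / w_j) for taking action a_i when the true class is w_j.
lam = np.array([[0.0, 10.0],    # a_1: report "healthy" (very costly if the true class is "cancer")
                [1.0,  0.0]])   # a_2: report "cancer"

posterior = np.array([0.8, 0.2])            # P(w1/x), P(w2/x) for the observed image x

cond_risk = lam @ posterior                 # R(a_i/x) = sum_j lambda(a_i/w_j) P(w_j/x)
best_action = int(np.argmin(cond_risk))     # Bayes rule: take the action with minimum conditional risk

print(cond_risk, best_action)
```

Here the risks are R(α1/x) = 2.0 and R(α2/x) = 0.8, so the minimum-risk action is to report "cancer" even though "healthy" is the more probable class.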

SLIDE 21

Overall Risk

  • Suppose α(x) is a general decision rule that determines which action α1, α2, …, αl to take for every x.
  • The overall risk is defined as:

    R = ∫ R(α(x)/x) p(x) dx

  • The optimum decision rule is the Bayes rule.

SLIDE 22

Overall Risk

  • The Bayes rule minimizes R by:
    (i) computing R(αi/x) for every αi given an x
    (ii) choosing the action αi with the minimum R(αi/x)
  • The resulting minimum R* is called the Bayes risk and is the best (i.e., optimum) performance that can be achieved:

    R* = min R

SLIDE 23

Example: Two-category classification

  • Define:
    • α1: decide ω1
    • α2: decide ω2
    • λij = λ(αi/ωj)
  • The conditional risks are (from R(αi/x) = Σj λ(αi/ωj) P(ωj/x)):

    R(α1/x) = λ11 P(ω1/x) + λ12 P(ω2/x)
    R(α2/x) = λ21 P(ω1/x) + λ22 P(ω2/x)

SLIDE 24

Example: Two-category classification

  • Minimum-risk decision rule:
    Decide ω1 if (λ21 − λ11) P(ω1/x) > (λ12 − λ22) P(ω2/x); otherwise decide ω2.
  • Equivalently, using the likelihood ratio:
    Decide ω1 if p(x/ω1)/p(x/ω2) > (λ12 − λ22) P(ω2) / [(λ21 − λ11) P(ω1)]; otherwise decide ω2.
    The left-hand side is the likelihood ratio and the right-hand side is a threshold that depends only on the losses and the priors.
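As a worked example with made-up numbers (not from the slide): take λ11 = λ22 = 0, λ12 = 2, λ21 = 1, P(ω1) = 2/3, and P(ω2) = 1/3. Then the threshold on the likelihood ratio is

$$ \frac{(\lambda_{12}-\lambda_{22})\,P(\omega_2)}{(\lambda_{21}-\lambda_{11})\,P(\omega_1)} \;=\; \frac{2 \cdot \tfrac{1}{3}}{1 \cdot \tfrac{2}{3}} \;=\; 1, $$

so we decide ω1 only when p(x/ω1) > p(x/ω2). Under zero-one loss the threshold would have been P(ω2)/P(ω1) = 1/2; the larger cost λ12 of wrongly deciding ω1 raises the threshold and shrinks the ω1 region.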

SLIDE 25

Special Case: Zero-One Loss Function

  • Assign the same loss to all errors:

    λ(αi/ωj) = 0 if i = j, and 1 if i ≠ j

  • The conditional risk corresponding to this loss function:

    R(αi/x) = Σ j≠i P(ωj/x) = 1 − P(ωi/x)

SLIDE 26

Special Case: Zero-One Loss Function

  • The decision rule becomes:
    Decide ωi if P(ωi/x) > P(ωj/x) for all j ≠ i
    (equivalently, decide ω1 if p(x/ω1)/p(x/ω2) > P(ω2)/P(ω1) in the two-category case).
  • The overall risk turns out to be the average probability of error!

SLIDE 27

Example

  • Assuming zero-one loss:
    Decide ω1 if p(x/ω1)/p(x/ω2) > θa; otherwise decide ω2, where θa = P(ω2)/P(ω1).
  • Assuming a general loss:
    Decide ω1 if p(x/ω1)/p(x/ω2) > θb; otherwise decide ω2, where θb = P(ω2)(λ12 − λ22) / [P(ω1)(λ21 − λ11)].
  • Assume λ12 > λ21 (e.g., with λ11 = λ22 = 0): then θb > θa, so the region where we decide ω1 shrinks.

SLIDE 28

Outline

  • What's Bayesian Decision Theory?
  • A More General Theory
  • Discriminant Function and Decision Boundary
  • Multivariate Gaussian Density
  • Error Bound, ROC, Missing Features and Compound Bayesian Decision Theory
  • Summary

SLIDE 29

Discriminant Functions

  • A useful way to represent a classifier is through discriminant functions gi(x), i = 1, …, c, where a feature vector x is assigned to class ωi if gi(x) > gj(x) for all j ≠ i.
  • (Figure: a network computes the c discriminants and a max unit selects the winning class.)

SLIDE 30

Discriminants for Bayes Classifier

  • Is the choice of gi unique?
  • Replacing gi(x) with f(gi(x)), where f(·) is monotonically increasing, does not change the classification results.
  • Equivalent discriminants for the Bayes classifier:

    gi(x) = P(ωi/x) = p(x/ωi) P(ωi) / p(x)
    gi(x) = p(x/ωi) P(ωi)
    gi(x) = ln p(x/ωi) + ln P(ωi)   ← we'll use this discriminant extensively!

SLIDE 31

Case of two categories

  • It is more common to use a single discriminant function (dichotomizer) instead of two:
    g(x) = g1(x) − g2(x); decide ω1 if g(x) > 0, otherwise decide ω2.
  • Examples:

    g(x) = P(ω1/x) − P(ω2/x)
    g(x) = ln [p(x/ω1)/p(x/ω2)] + ln [P(ω1)/P(ω2)]

SLIDE 32

Decision Regions and Boundaries

  • Discriminants divide the feature space into decision regions R1, R2, …, Rc, separated by decision boundaries.
  • The decision boundary between two regions is defined by: g1(x) = g2(x)
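For two 1-D Gaussian classes the boundary point can be found numerically. A small sketch (the means, variances, and priors are illustrative, not from the lecture):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

P1, P2 = 2 / 3, 1 / 3                                     # illustrative priors

def g1(x):
    return np.log(norm.pdf(x, 2, 1)) + np.log(P1)         # g1(x) = ln p(x/w1) + ln P(w1)

def g2(x):
    return np.log(norm.pdf(x, 4, 1)) + np.log(P2)         # g2(x) = ln p(x/w2) + ln P(w2)

# Decision boundary: the x where g1(x) = g2(x); bracket it and solve.
x0 = brentq(lambda x: g1(x) - g2(x), 0.0, 6.0)
print(x0)   # about 3.35: shifted away from the midpoint 3.0 because P(w1) > P(w2)
```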

SLIDE 33

Outline

  • What's Bayesian Decision Theory?
  • A More General Theory
  • Discriminant Function and Decision Boundary
  • Multivariate Gaussian Density

SLIDE 34

Why are Gaussians so Useful?

  • They represent many probability distributions in nature quite accurately.
  • In our case, they are appropriate when patterns can be represented as random variations of an ideal prototype (represented by the mean feature vector).
  • Everyday examples: the height or weight of a population.

SLIDE 35

Multivariate Gaussian Density

  • A normal distribution over two or more variables (d variables/dimensions), parameterized by a d-dimensional mean vector μ and a d×d covariance matrix Σ.
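For reference, the standard form of the d-dimensional Gaussian density with mean vector μ and covariance matrix Σ is:

$$ p(\mathbf{x}) \;=\; \frac{1}{(2\pi)^{d/2}\,|\boldsymbol{\Sigma}|^{1/2}} \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\mathsf T}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right), $$

often abbreviated as x ~ N(μ, Σ).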

SLIDE 36

The Covariance Matrix

  • For our purposes, assume the matrix is positive definite, so the determinant of the matrix is always positive.
  • Matrix elements:
    • Main diagonal: the variance of each individual variable
    • Off-diagonal: the covariance of each variable pairing i and j (note: values are repeated, as the matrix is symmetric)
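A small Python sketch of these properties, estimating Σ from synthetic 2-D data (the mean and covariance used to generate the samples are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
# 500 samples of a hypothetical 2-D feature vector (e.g., length and lightness of a fish)
X = rng.multivariate_normal(mean=[5.0, 2.0],
                            cov=[[1.0, 0.6],
                                 [0.6, 0.5]], size=500)

Sigma = np.cov(X, rowvar=False)          # sample covariance matrix (2 x 2)

print(np.diag(Sigma))                    # main diagonal: variance of each feature
print(Sigma[0, 1], Sigma[1, 0])          # off-diagonal: covariance, repeated by symmetry
print(np.linalg.det(Sigma) > 0)          # positive definite => positive determinant
```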

SLIDE 37

Discriminant Function for Multivariate Gaussian Density

  • We will consider three special cases for:
    • normally distributed features, and
    • minimum error-rate classification (0-1 loss)
  • Recall the discriminant we will use: gi(x) = ln p(x/ωi) + ln P(ωi)

SLIDE 38

Minimum Error-Rate Discriminant Function for Multivariate Gaussian Feature Distributions

  • Taking the natural log (ln) of p(x/ωi) P(ωi), with p(x/ωi) a multivariate Gaussian, gives a general form for our discriminant functions:

    gi(x) = −(1/2)(x − μi)ᵀ Σi⁻¹ (x − μi) − (d/2) ln 2π − (1/2) ln |Σi| + ln P(ωi)
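A direct Python sketch of this general discriminant; the class means, covariances, and priors below are placeholders for illustration:

```python
import numpy as np

def gaussian_discriminant(x, mu, Sigma, prior):
    """g_i(x) = ln p(x/w_i) + ln P(w_i) for a Gaussian class-conditional density."""
    d = len(mu)
    diff = x - mu
    mahal_sq = diff @ np.linalg.solve(Sigma, diff)      # (x - mu)^T Sigma^{-1} (x - mu)
    log_det = np.linalg.slogdet(Sigma)[1]               # ln |Sigma|
    return (-0.5 * mahal_sq - 0.5 * d * np.log(2 * np.pi)
            - 0.5 * log_det + np.log(prior))

# Two hypothetical classes in 2-D
mus    = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
Sigmas = [np.eye(2), np.array([[2.0, 0.3], [0.3, 1.0]])]
priors = [0.5, 0.5]

x = np.array([1.0, 2.0])
scores = [gaussian_discriminant(x, m, S, P) for m, S, P in zip(mus, Sigmas, priors)]
print(int(np.argmax(scores)))    # assign x to the class with the largest g_i(x)
```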

SLIDE 39

Special Cases for Binary Classification

  • Purpose: an overview of the commonly assumed cases for the feature likelihood densities.
  • Goal: eliminate common additive constants in the discriminant functions. These do not affect the classification decision, since they are the same for every class (only the differences between discriminants matter).
  • Also, look at the resulting decision surfaces.
  • Three special cases:
    • Case I: statistically independent features, identically distributed Gaussians for each class
    • Case II: identical covariances for each class
    • Case III: arbitrary covariances

SLIDE 40

Case I: Σi = σ²I

  • This case satisfies two conditions: (1) the features are statistically independent, and (2) each feature has the same variance.
  • Remove the items shown in red on the slide: they are the same across classes ("unimportant additive constants").
  • Inverse of the covariance matrix: Σi⁻¹ = (1/σ²) I
  • Its only effect is to scale the vector product by 1/σ².
  • Discriminant function: gi(x) = −‖x − μi‖² / (2σ²) + ln P(ωi)

SLIDE 41

Case I: Σi = σ²I (continued)

  • Linear discriminant function, produced by factoring the previous form:

    gi(x) = wiᵀ x + wi0, with weight vector wi = μi / σ²

  • Threshold or bias for class i: wi0 = −μiᵀμi / (2σ²) + ln P(ωi)
  • A change in the prior P(ωi) translates the decision boundary.

SLIDE 42

Case I: Σi = σ²I (continued)

  • Decision boundary: wᵀ(x − x0) = 0, with w = μi − μj and
    x0 = (μi + μj)/2 − [σ² / ‖μi − μj‖²] ln[P(ωi)/P(ωj)] (μi − μj)
  • The decision boundary goes through x0 along the line between the means, orthogonal to this line.
  • If the priors are equal, x0 lies midway between the means (minimum-distance classifier); otherwise x0 is shifted away from the more likely mean.
  • If the variance is small relative to the distance between the means, the priors have limited effect on the boundary location.

SLIDE 43

Case 1: Statistically Independent Features with Identical Variances

SLIDE 44

Example: Translation of Decision Boundaries Through Changing Priors

SLIDE 45

Case II: Identical Covariances

  • Remove the terms in red; as in Case I, these can be ignored (same across classes).
  • Squared Mahalanobis distance (highlighted in yellow on the slide):

    (x − μi)ᵀ Σ⁻¹ (x − μi)

  • This is the distance from x to the mean of class i, taking the covariance into account; it defines contours of fixed density.
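A quick Python sketch of the squared Mahalanobis distance for a shared covariance Σ (the matrix, mean, and query point are illustrative):

```python
import numpy as np

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])            # shared covariance for all classes (Case II)
mu_i = np.array([1.0, 1.0])               # mean of class i
x = np.array([2.5, 0.0])

diff = x - mu_i
mahal_sq = diff @ np.linalg.solve(Sigma, diff)   # (x - mu_i)^T Sigma^{-1} (x - mu_i)
eucl_sq = diff @ diff                            # ordinary squared Euclidean distance

print(mahal_sq, eucl_sq)   # the two differ unless Sigma is the identity
```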

SLIDE 46

Case II: Identical Covariances

  • Expansion of the squared Mahalanobis distance:

    (x − μi)ᵀ Σ⁻¹ (x − μi) = xᵀΣ⁻¹x − 2μiᵀΣ⁻¹x + μiᵀΣ⁻¹μi

    The last step uses the symmetry of the covariance matrix, and thus of its inverse: μiᵀΣ⁻¹x = xᵀΣ⁻¹μi.

  • Once again, the term xᵀΣ⁻¹x (shown in red above) is an additive constant independent of the class, and can be removed.

SLIDE 47

Multivariate Gaussian Density, Case II: Σi = Σ (continued)

  • Linear discriminant function: gi(x) = wiᵀ x + wi0, with wi = Σ⁻¹μi and wi0 = −(1/2)μiᵀΣ⁻¹μi + ln P(ωi)
  • Decision boundary: wᵀ(x − x0) = 0, with w = Σ⁻¹(μi − μj) and
    x0 = (μi + μj)/2 − [ln(P(ωi)/P(ωj)) / (μi − μj)ᵀΣ⁻¹(μi − μj)] (μi − μj)

SLIDE 48

Case II: Identical Covariances

  • Notes on the decision boundary:
    • As in Case I, it passes through a point x0 lying on the line between the two class means. Again, x0 is in the middle if the priors are identical.
    • The hyperplane defined by the boundary is generally not orthogonal to the line between the two means.

SLIDE 49

Case III: Σi arbitrary

  • Only the (d/2) ln 2π term (in red above) is the same across classes and can be removed.
  • The discriminant function is quadratic:

    gi(x) = xᵀ Wi x + wiᵀ x + wi0, with Wi = −(1/2)Σi⁻¹, wi = Σi⁻¹μi, and
    wi0 = −(1/2)μiᵀΣi⁻¹μi − (1/2) ln |Σi| + ln P(ωi)

SLIDE 50

Case III: Σi arbitrary (continued)

  • Decision boundaries are hyperquadrics: they can be hyperplanes, pairs of hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, or hyperhyperboloids.
  • Decision regions need not be simply connected, even in one dimension (next slide).
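A tiny Python sketch of this one-dimensional effect, with made-up parameters: when the two class variances differ, g1(x) − g2(x) is quadratic in x, so one class is decided only on a bounded interval and the other class's region splits into two disjoint pieces.

```python
import numpy as np

# Two 1-D Gaussian classes with equal priors but different variances (illustrative)
mu1, s1 = 0.0, 1.0
mu2, s2 = 0.0, 3.0
P1 = P2 = 0.5

# g_i(x) = -ln(s_i) - (x - mu_i)^2 / (2 s_i^2) + ln P_i, so g1(x) - g2(x) = a x^2 + b x + c
a = -1 / (2 * s1**2) + 1 / (2 * s2**2)
b = mu1 / s1**2 - mu2 / s2**2
c = (-mu1**2 / (2 * s1**2) + mu2**2 / (2 * s2**2)
     - np.log(s1) + np.log(s2) + np.log(P1) - np.log(P2))

boundaries = np.sort(np.roots([a, b, c]))
print(boundaries)   # two boundary points: w1 is chosen only between them, w2 on both sides
```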

SLIDE 51

Case III: Σi arbitrary

SLIDE 52

Case III: Σi arbitrary

Nonlinear decision boundaries

SLIDE 53

Example: Case III

With P(ω1) = P(ω2), the decision boundary shown in the figure does not pass through the midpoint of μ1 and μ2.
