CS 7616 Pattern Recognition: Bayesian Decision Theory
Aaron Bobick, School of Interactive Computing


SLIDE 1

CS 7616 Pattern Recognition: Bayesian Decision Theory
Aaron Bobick, School of Interactive Computing

SLIDE 2

Outline for “today”

  • A simple tuberculosis example as a reminder of Bayes' rule and how it relates to decision making
  • Some basic discussion of what it means to make a good decision and the relation to Bayes
  • Basic Bayesian decision making
  • Minimum loss
  • Application to normal distributions
  • Origins of linear classifiers?
  • Why normals?
  • Obvious and less obvious
SLIDE 3

Special thanks…

  • Professor Srihari at Buffalo, who posted lots of slides…
SLIDE 4

So you go to the doctor…

  • Assume you go to the doctor because it's that time of year…
  • He tells you that you're overdue for your tuberculosis test.
  • You take the TB test (call its result X) and it's positive!!! (X+)
  • But then he tells you not to worry because:
  • The detection rate is 100%: P(X+ | T+) = 1
  • But the false alarm rate is 5%: P(X+ | T−) = 0.05
  • The incidence rate of TB in Atlanta is 0.1%: P(T+) = 0.001
  • Therefore the probability that you have TB given the test is, by Bayes' rule:

P(T+ | X+) = P(X+ | T+) P(T+) / P(X+)
           = P(X+ | T+) P(T+) / [P(X+ | T+) P(T+) + P(X+ | T−) P(T−)]
           = (1.0 × 0.001) / (1.0 × 0.001 + 0.05 × 0.999) ≈ 0.0196

(i.e., about 20 times what it was before the test)

The expansion of the evidence P(X+) works because T+ and T− are mutually exclusive and collectively exhaustive.
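
A quick numeric check of the slide's arithmetic, as a minimal Python sketch (the probabilities come from the slide; the variable names are mine):

```python
# Bayes' rule for the TB example.
p_pos_given_tb = 1.0        # P(X+ | T+), detection rate
p_pos_given_healthy = 0.05  # P(X+ | T-), false alarm rate
p_tb = 0.001                # P(T+), incidence rate in Atlanta

evidence = p_pos_given_tb * p_tb + p_pos_given_healthy * (1 - p_tb)  # P(X+)
posterior = p_pos_given_tb * p_tb / evidence                         # P(T+ | X+)
print(round(posterior, 4))  # 0.0196 -- about 20x the 0.001 prior
```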

SLIDE 5

So…

  • Q1: if you had to decide right then whether you have TB or not, what would you decide?
  • Q2: would you go get a chest X-ray?
  • Why can't you really answer that question?
  • Cost of the X-ray?
  • Cost of having TB and not finding out?
  • (Prostate cancer treatments….)
  • So to make the "right" decision we need to know:
  • Prior probabilities P(T+)
  • Likelihoods P(X+ | T+) and P(X+ | T−)
  • Cost (loss) functions
SLIDE 6

Bayes decision theory

  • Bayesian theory is fundamental to decision theory and pattern recognition.
  • Basically, it provides the mechanisms by which one can evaluate the probability of being right (and thus wrong).
  • It allows one to compute an expectation of cost/reward (assuming some very non-ICBM – no infinities – types of loss).

But…

  • It presumes that a variety of probabilities are known – or at least that it is known how much they are unknown (Bayes meets Rumsfeld???)
  • We'll ignore this concern for now…
SLIDE 7

Bayes 1: Priors

  • We have states of nature ω_j that are mutually exclusive and collectively exhaustive:

∑_i P(ω_i) = 1

  • Decision rule if there are only two classes and it is based only on the prior: if P(ω_1) > P(ω_2), choose class ω_1; otherwise ω_2.
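
A minimal sketch of this priors-only rule (the prior values are invented for illustration):

```python
# With no measurement, the best you can do is pick the class with the
# largest prior. Toy priors, not from the slides.
priors = {"omega_1": 0.7, "omega_2": 0.3}
assert abs(sum(priors.values()) - 1.0) < 1e-12  # collectively exhaustive
decision = max(priors, key=priors.get)          # "omega_1"
```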

SLIDE 8

Bayes 2: Class conditional probabilities

  • Need to know the probability of our data (measurements) given the possible states of nature:

p(x | ω_j)

  • These are probability densities, as opposed to the (discrete) distribution on the priors. I will definitely confuse this in class.

SLIDE 9

Bayes rule to get the data-conditioned probability

P(ω_j | x) = p(x | ω_j) P(ω_j) / p(x)

where the "evidence" is

p(x) = ∑_j p(x | ω_j) P(ω_j)

  • Read: "posterior is the likelihood times the prior divided by the evidence".
  • And since the "evidence" p(x) is fixed, we can usually ignore it.
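
A short sketch of this computation for one measurement x over three classes (the likelihood and prior values are made up):

```python
import numpy as np

likelihoods = np.array([0.6, 0.1, 0.3])  # p(x|omega_j) evaluated at one x
priors = np.array([0.5, 0.3, 0.2])       # P(omega_j)

evidence = np.sum(likelihoods * priors)       # p(x), the normalizer
posteriors = likelihoods * priors / evidence  # P(omega_j | x)
print(posteriors, posteriors.sum())           # posteriors sum to 1
```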

SLIDE 10

The posteriors from the division…

SLIDE 11

Bayesian decision rule

  • If P(ω_1 | x) > P(ω_2 | x) then choose ω_1, since the true state of nature is more likely to be ω_1…
  • Assuming there is no significant difference between being wrong in one direction or the other.
  • What is the probability of making an error?

P(error | x) = P(ω_1 | x) when we decided ω_2, and P(error | x) = P(ω_2 | x) when we decided ω_1.

  • So P(error | x) = min[P(ω_1 | x), P(ω_2 | x)]  (the Bayes error)
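
A minimal sketch of the rule and its pointwise error (the posterior values are made up):

```python
import numpy as np

post = np.array([0.8, 0.2])  # [P(omega_1|x), P(omega_2|x)]
decision = np.argmax(post)   # 0, i.e. choose omega_1
p_error = np.min(post)       # 0.2: the Bayes error at this particular x
```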

SLIDE 12

Obvious generalizations:

  • Feature is a vector (no real difference).
  • More than two classes (as long as they are mutually exclusive and collectively exhaustive, no problem).
  • Introduce a loss function more general than just counting errors… we'll do this in a minute…
  • And you can refuse to give an answer ("I don't know"). We'll talk more about that another time.

SLIDE 13

Loss functions and minimum risk

  • Let ω_j be the possible states of nature.
  • Let {α_i} be the possible actions taken (usually announcing the class, so there are as many actions as classes).
  • Let λ(α_i | ω_j) be the "loss" incurred for taking action i when the actual state of nature is j.
  • Then the expected loss of taking action i given measurement x is:

R(α_i | x) = ∑_j λ(α_i | ω_j) P(ω_j | x)

  • So: select the α_i with minimum expected loss. That's what you're "risking". The Bayes risk is the best you can do.
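
A sketch of the minimum-risk decision for two classes and two actions (the loss matrix and posteriors are invented):

```python
import numpy as np

# lam[i, j] = lambda(alpha_i | omega_j): loss for action i when truth is j
lam = np.array([[0.0, 10.0],    # alpha_1: free if omega_1, costly if omega_2
                [1.0,  0.0]])   # alpha_2
post = np.array([0.3, 0.7])     # posteriors P(omega_j | x)

risks = lam @ post              # R(alpha_i|x) = sum_j lam[i, j] * post[j]
best_action = np.argmin(risks)  # 1 here: alpha_2 has the lower expected loss
```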

SLIDE 14

LRT – likelihood ratio test

  • Action α_i is to choose class i. Cost λ_ij is the cost of choosing i when reality is j.
  • Two risks:

R(α_1 | x) = λ_11 P(ω_1 | x) + λ_12 P(ω_2 | x)
R(α_2 | x) = λ_21 P(ω_1 | x) + λ_22 P(ω_2 | x)

  • Choose α_1 if its risk is lower, i.e. if R(α_1 | x) < R(α_2 | x):

(λ_21 − λ_11) p(x | ω_1) P(ω_1) > (λ_12 − λ_22) p(x | ω_2) P(ω_2)

  • Which gives a ratio test based on cost and priors: choose α_1 if

p(x | ω_1) / p(x | ω_2) > [(λ_12 − λ_22) P(ω_2)] / [(λ_21 − λ_11) P(ω_1)] = T
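
A sketch of the test, reusing TB-like priors and an invented loss matrix:

```python
import numpy as np

lam = np.array([[0.0, 5.0],    # lam[i, j]: loss for deciding i when truth is j
                [1.0, 0.0]])
P1, P2 = 0.001, 0.999          # priors P(omega_1), P(omega_2)

# Threshold T = (lam_12 - lam_22) P(omega_2) / ((lam_21 - lam_11) P(omega_1))
T = (lam[0, 1] - lam[1, 1]) * P2 / ((lam[1, 0] - lam[0, 0]) * P1)

def decide(p_x_w1, p_x_w2):
    """Choose omega_1 iff the likelihood ratio exceeds the threshold T."""
    return "omega_1" if p_x_w1 / p_x_w2 > T else "omega_2"
```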

SLIDE 15

A special loss function

  • Cost λ_ij is 0 if i = j, 1 otherwise. Called the zero-one loss function (duh).
  • Which gives a ratio test: choose α_1 if

p(x | ω_1) P(ω_1) > p(x | ω_2) P(ω_2)

  • i.e., choose whichever class is more likely given the data. Which really means you combine likelihoods and priors, and you never separate them. That is, you just have a decision boundary on x; you just discriminate based upon x…
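
Continuing the sketch above, zero-one loss collapses the LRT threshold to the ratio of priors:

```python
# With lam = [[0, 1], [1, 0]] the threshold reduces to P(omega_2)/P(omega_1),
# so the test is just: pick the class with the larger posterior.
lam01 = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
T01 = (lam01[0, 1] - lam01[1, 1]) * P2 / ((lam01[1, 0] - lam01[0, 0]) * P1)
assert np.isclose(T01, P2 / P1)
```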

SLIDE 16

Introduction to discriminant functions

  • Let g_i(x) = −R(α_i | x). (So the "max" discriminant function gives the minimum-risk decision.)
  • For minimum error rate (zero-one loss):

g_i(x) = P(ω_i | x)   (max discriminant is max posterior)

  • Using Bayes rule:

g_i(x) ∝ p(x | ω_i) P(ω_i)

  • Finally, by the monotonicity of ln, let:

g_i(x) = ln p(x | ω_i) + ln P(ω_i)
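
A sketch of these log-discriminants with 1-D Gaussian class-conditionals (all parameters are invented; the next lecture treats the normal case properly):

```python
import numpy as np
from scipy.stats import norm

# (p(x|omega_i), P(omega_i)) pairs; means, scales, and priors are made up
classes = [(norm(0.0, 1.0), 0.6),
           (norm(2.0, 1.5), 0.4)]

def g(x):
    """Log-discriminants g_i(x) = ln p(x|omega_i) + ln P(omega_i)."""
    return np.array([dist.logpdf(x) + np.log(prior) for dist, prior in classes])

x = 1.2
decision = np.argmax(g(x))  # index of the class with the largest discriminant
```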

SLIDE 17

Two class discrimination

  • Let g(x) = g_1(x) − g_2(x)
  • Decide class ω_1 if g(x) > 0; otherwise decide ω_2
SLIDE 18

Next time…

  • Linear discriminants applied to normal distributions.
SLIDE 19

Remember your first assignment!

  • Due next Tuesday, Jan 14.
  • Find an available data set with a "modest" number of features and a "small" number of classes:
  • Modest – plausible to try all or many possible subsets of features.
  • Small – maybe fewer than 5; 2 is ideal; 30 would be too many.
  • Submit a one-page description of the data and how we would get it within a week. (Are you making it? That's OK.)

SLIDE 20

Going forward

  • For coming lectures:
  • HTF: read ch 1&2
  • Get yourself Matlab (and/or Python)
  • Make sure you’re invited to Piazza