
Lecture 3: Bayesian Decision Theory, Dr. Chengjiang Long (PowerPoint PPT Presentation)



1. Lecture 3: Bayesian Decision Theory. Dr. Chengjiang Long, Computer Vision Researcher at Kitware Inc., Adjunct Professor at RPI. Email: longc3@rpi.edu

2. Recap of the Previous Lecture (Lecture 2)

3. Outline: • What's Bayesian Decision Theory? • A More General Theory • Discriminant Function and Decision Boundary • Multivariate Gaussian Density

4. Outline: • What's Bayesian Decision Theory? • A More General Theory • Discriminant Function and Decision Boundary • Multivariate Gaussian Density

5. Bayesian Decision Theory: • Design classifiers to make decisions that minimize an expected "risk". • The simplest risk is the classification error (i.e., assuming all misclassification costs are equal). • When misclassification costs are not equal, the risk can include the cost associated with different misclassifications.

6. Terminology: • State of nature ω (class label): e.g., ω1 for sea bass, ω2 for salmon. • Probabilities P(ω1) and P(ω2) (priors): e.g., prior knowledge of how likely we are to get a sea bass or a salmon. • Probability density function p(x) (evidence): e.g., how frequently we will measure a pattern with feature value x (e.g., x corresponds to lightness).

7. Terminology: • Conditional probability density p(x|ωj) (likelihood): e.g., how frequently we will measure a pattern with feature value x given that the pattern belongs to class ωj.

8. Terminology: • Conditional probability P(ωj|x) (posterior): e.g., the probability that the fish belongs to class ωj given feature x. • Ultimately, we are interested in computing P(ωj|x) for each class ωj.

9. Decision Rule (using priors only): Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2. • Favours the most likely class. • This rule will make the same decision every time. • It is optimum if no other information is available.

10. Decision Rule: Using Bayes' rule, P(ωj|x) = p(x|ωj) P(ωj) / p(x) (posterior = likelihood × prior / evidence), where the evidence is p(x) = Σ_{j=1..2} p(x|ωj) P(ωj). • Decide ω1 if P(ω1|x) > P(ω2|x); otherwise decide ω2. • Equivalently, decide ω1 if p(x|ω1) P(ω1) > p(x|ω2) P(ω2); otherwise decide ω2. • Or decide ω1 if p(x|ω1)/p(x|ω2) > P(ω2)/P(ω1); otherwise decide ω2.
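A minimal Python sketch of this two-class rule. The priors 2/3 and 1/3 match the figure on the next slide, but the Gaussian class-conditional densities and their parameters are illustrative assumptions, not values from the lecture:

```python
from scipy.stats import norm

# Priors P(w1), P(w2) as in the figure on slide 11; the class-conditional
# densities p(x|w1), p(x|w2) below are assumed Gaussians for illustration.
priors = {"w1": 2 / 3, "w2": 1 / 3}
likelihoods = {"w1": norm(loc=4.0, scale=1.0),
               "w2": norm(loc=7.0, scale=1.5)}

def posteriors(x):
    """Bayes' rule: P(w_j|x) = p(x|w_j) P(w_j) / p(x)."""
    joint = {w: likelihoods[w].pdf(x) * priors[w] for w in priors}
    evidence = sum(joint.values())          # p(x) = sum_j p(x|w_j) P(w_j)
    return {w: v / evidence for w, v in joint.items()}

def decide(x):
    """Decide w1 if P(w1|x) > P(w2|x); otherwise decide w2."""
    post = posteriors(x)
    return max(post, key=post.get)

print(decide(5.0), posteriors(5.0))
```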

11. Decision Rule: [Figure: class-conditional densities p(x|ωj) and the resulting posteriors P(ωj|x) for priors P(ω1) = 2/3 and P(ω2) = 1/3.]

12. Probability of Error: • The probability of error is defined by the expressions below. • What is the average probability of error? • The Bayes rule is optimum, that is, it minimizes the average probability of error!
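The definitions the slide refers to, written out for the two-class case (the original equations were images, so the exact notation here is reconstructed):

```latex
P(\mathrm{error} \mid x) =
\begin{cases}
P(\omega_1 \mid x) & \text{if we decide } \omega_2,\\
P(\omega_2 \mid x) & \text{if we decide } \omega_1,
\end{cases}
\qquad
P(\mathrm{error}) = \int_{-\infty}^{\infty} P(\mathrm{error} \mid x)\, p(x)\, dx .
```

The Bayes rule always picks the larger posterior, so P(error|x) = min[P(ω1|x), P(ω2|x)] for every x, which is why it minimizes the average probability of error.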

13. Where do Probabilities come from? There are two competing answers: • Relative frequency (objective) approach: probabilities can only come from experiments. • Bayesian (subjective) approach: probabilities may reflect degrees of belief and can be based on opinion.

14. Example (objective approach): Classify cars as costing more or less than $50K: • Classes: C1 if price > $50K, C2 if price <= $50K. • Feature: x, the height of a car. • Use Bayes' rule to compute the posterior probabilities: P(Ci|x) = p(x|Ci) P(Ci) / p(x). • We need to estimate p(x|C1), p(x|C2), P(C1), and P(C2).

15. Example (objective approach): • Collect data: ask drivers how much their car cost and measure its height. • Determine the prior probabilities P(C1), P(C2): e.g., from 1209 samples with #C1 = 221 and #C2 = 988, P(C1) = 221/1209 = 0.183 and P(C2) = 988/1209 = 0.817.

16. Example (objective approach): • Determine the class-conditional probabilities (likelihoods): discretize car height into bins and use a normalized histogram for p(x|Ci). • Calculate the posterior probability for each bin, e.g. for x = 1.0: P(C1|x = 1.0) = p(x = 1.0|C1) P(C1) / [p(x = 1.0|C1) P(C1) + p(x = 1.0|C2) P(C2)] = (0.2081 × 0.183) / (0.2081 × 0.183 + 0.0597 × 0.817) = 0.438.
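A short Python sketch that reproduces these numbers. Only the two histogram values quoted for the x = 1.0 bin appear on the slide; a full implementation would estimate one value per bin for each class:

```python
# Priors estimated from the 1209 survey samples (slide 15)
n_c1, n_c2 = 221, 988
n_total = n_c1 + n_c2
p_c1, p_c2 = n_c1 / n_total, n_c2 / n_total        # 0.183, 0.817

# Normalized-histogram likelihoods for the bin containing x = 1.0 (slide 16)
lik_c1, lik_c2 = 0.2081, 0.0597                    # p(x=1.0|C1), p(x=1.0|C2)

# Bayes' rule for that bin
evidence = lik_c1 * p_c1 + lik_c2 * p_c2           # p(x=1.0)
posterior_c1 = lik_c1 * p_c1 / evidence
print(round(p_c1, 3), round(p_c2, 3), round(posterior_c1, 3))  # 0.183 0.817 0.438
```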

17. Outline: • What's Bayesian Decision Theory? • A More General Theory • Discriminant Function and Decision Boundary • Multivariate Gaussian Density

18. A More General Theory: • Use more than one feature. • Allow more than two categories. • Allow actions other than classifying the input into one of the possible categories (e.g., rejection). • Employ a more general error function (i.e., expected "risk") by associating a "cost" (based on a "loss" function) with different errors.

19. Terminology: • Features form a vector x ∈ R^d. • A set of c categories ω1, ω2, ..., ωc. • A finite set of l actions α1, α2, ..., αl. • A loss function λ(αi|ωj): the cost associated with taking action αi when the correct classification category is ωj.

20. Conditional Risk (or Expected Loss): • Suppose we observe x and take action αi. • The conditional risk (or expected loss) of taking action αi is defined as R(αi|x) = Σ_{j=1..c} λ(αi|ωj) P(ωj|x). • Example: from a medical image, we want to classify (determine) whether it contains cancer tissue or not.
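A sketch of the conditional-risk computation for the cancer-screening example mentioned above. The loss values in the matrix are illustrative assumptions (the lecture does not give specific numbers); the point is that an expensive miss can outweigh a high posterior:

```python
import numpy as np

# lam[i, j] = loss of taking action a_i when the true class is w_j.
# Classes: w1 = cancer, w2 = healthy. Actions: a1 = "report cancer",
# a2 = "report healthy". The numbers below are assumed for illustration.
lam = np.array([[0.0, 1.0],     # a1: free if cancer, small cost on false alarm
                [20.0, 0.0]])   # a2: missing a cancer is very costly

def conditional_risks(posteriors):
    """R(a_i|x) = sum_j lam(a_i|w_j) P(w_j|x), for every action a_i."""
    return lam @ posteriors

post = np.array([0.1, 0.9])            # example posteriors P(w1|x), P(w2|x)
risks = conditional_risks(post)        # -> [0.9, 2.0]
best = int(np.argmin(risks))           # Bayes rule: pick the minimum-risk action
print(risks, "-> action a%d" % (best + 1))   # a1 ("report cancer") wins
```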

21. Overall Risk: • Suppose α(x) is a general decision rule that determines which action α1, α2, ..., αl to take for every x. • The overall risk is defined as R = ∫ R(α(x)|x) p(x) dx. • The optimum decision rule is the Bayes rule.

22. Overall Risk: • The Bayes rule minimizes R by: (i) computing R(αi|x) for every αi given an x, and (ii) choosing the action αi with the minimum R(αi|x). • The resulting minimum R* = min R is called the Bayes risk and is the best (i.e., optimum) performance that can be achieved.
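A one-line justification for the pointwise recipe above (reconstructed reasoning, not text from the slide):

```latex
R = \int R\bigl(\alpha(\mathbf{x}) \mid \mathbf{x}\bigr)\, p(\mathbf{x})\, d\mathbf{x}
\;\ge\; \int \Bigl[\min_{i} R(\alpha_i \mid \mathbf{x})\Bigr] p(\mathbf{x})\, d\mathbf{x} = R^{*},
```

with equality exactly when α(x) selects a minimum-risk action for every x, which is what the Bayes rule does.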

23. Example (two-category classification): • Define α1: decide ω1, α2: decide ω2, and λij = λ(αi|ωj). • The conditional risks are R(αi|x) = Σ_{j=1..c} λ(αi|ωj) P(ωj|x), which here expand to R(α1|x) = λ11 P(ω1|x) + λ12 P(ω2|x) and R(α2|x) = λ21 P(ω1|x) + λ22 P(ω2|x).

24. Example (two-category classification): • Minimum-risk decision rule, stated in three equivalent forms; the last compares the likelihood ratio against a threshold (see the expressions below).
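The three equivalent forms of the rule, derived from the two conditional risks on the previous slide (assuming, as usual, λ21 > λ11 and λ12 > λ22, i.e., an error costs more than the corresponding correct decision):

```latex
\text{Decide } \omega_1 \text{ if } R(\alpha_1 \mid x) < R(\alpha_2 \mid x);
\text{ otherwise decide } \omega_2
\;\Longleftrightarrow\;
(\lambda_{21}-\lambda_{11})\, p(x \mid \omega_1)\, P(\omega_1)
> (\lambda_{12}-\lambda_{22})\, p(x \mid \omega_2)\, P(\omega_2)
\;\Longleftrightarrow\;
\frac{p(x \mid \omega_1)}{p(x \mid \omega_2)}
> \frac{(\lambda_{12}-\lambda_{22})\, P(\omega_2)}{(\lambda_{21}-\lambda_{11})\, P(\omega_1)}
\quad \text{(likelihood ratio vs. a fixed threshold).}
```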

25. Special Case: Zero-One Loss Function: • Assign the same loss to all errors. • The conditional risk corresponding to this loss function simplifies as shown below.
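The zero-one loss and the conditional risk it induces (standard expressions filled in where the slide showed equation images):

```latex
\lambda(\alpha_i \mid \omega_j) =
\begin{cases}
0 & i = j,\\
1 & i \neq j,
\end{cases}
\qquad i, j = 1, \dots, c,
\qquad\Longrightarrow\qquad
R(\alpha_i \mid x) = \sum_{j \neq i} P(\omega_j \mid x) = 1 - P(\omega_i \mid x).
```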

26. Special Case: Zero-One Loss Function: • The decision rule becomes the maximum-posterior rule (equivalent forms below). • The overall risk turns out to be the average probability of error!
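For two categories, the resulting rule can be written in the following equivalent forms (the threshold on the likelihood ratio is simply the prior ratio):

```latex
\text{Decide } \omega_1 \text{ if } P(\omega_1 \mid x) > P(\omega_2 \mid x)
\;\Longleftrightarrow\;
p(x \mid \omega_1)\, P(\omega_1) > p(x \mid \omega_2)\, P(\omega_2)
\;\Longleftrightarrow\;
\frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} > \frac{P(\omega_2)}{P(\omega_1)};
\quad \text{otherwise decide } \omega_2 .
```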

27. Example: • Assuming a general loss, decide ω1 if p(x|ω1)/p(x|ω2) > θb; otherwise decide ω2. • Assuming zero-one loss, decide ω1 if p(x|ω1)/p(x|ω2) > P(ω2)/P(ω1) = θa; otherwise decide ω2. • Here θa = P(ω2)/P(ω1), θb = P(ω2)(λ12 − λ22) / [P(ω1)(λ21 − λ11)], and we assume λ12 > λ21.

28. Outline: • What's Bayesian Decision Theory? • A More General Theory • Discriminant Function and Decision Boundary • Multivariate Gaussian Density • Error Bound, ROC, Missing Features, and Compound Bayesian Decision Theory • Summary

29. Discriminant Functions: • A useful way to represent a classifier is through discriminant functions gi(x), i = 1, ..., c, where a feature vector x is assigned to class ωi if gi(x) > gj(x) for all j ≠ i. [Figure: a network that computes the c discriminants and selects the maximum.]

30. Discriminants for the Bayes Classifier: • Is the choice of gi unique? Replacing gi(x) with f(gi(x)), where f(·) is monotonically increasing, does not change the classification results. • Equivalent choices: gi(x) = P(ωi|x) = p(x|ωi) P(ωi) / p(x), gi(x) = p(x|ωi) P(ωi), and gi(x) = ln p(x|ωi) + ln P(ωi). • We'll use this last discriminant extensively!
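A minimal sketch of a discriminant-based classifier using gi(x) = ln p(x|ωi) + ln P(ωi). The Gaussian class-conditional densities and priors are assumptions chosen only to make the example runnable (the multivariate Gaussian case is the next topic in the outline):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Assumed priors P(w_i) and class-conditional densities p(x|w_i) for three classes.
priors = [0.5, 0.3, 0.2]
densities = [multivariate_normal(mean=[0, 0], cov=np.eye(2)),
             multivariate_normal(mean=[3, 3], cov=2 * np.eye(2)),
             multivariate_normal(mean=[0, 4], cov=np.eye(2))]

def discriminants(x):
    """g_i(x) = ln p(x|w_i) + ln P(w_i) for every class i."""
    return np.array([d.logpdf(x) + np.log(p) for d, p in zip(densities, priors)])

def classify(x):
    """Assign x to the class whose discriminant is largest."""
    return int(np.argmax(discriminants(x)))

print(classify([0.5, 0.2]), classify([2.5, 3.5]))   # -> 0 1
```

Because ln is monotonically increasing, this gives exactly the same decisions as comparing the posteriors P(ωi|x) directly.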

31. Case of two categories: • It is more common to use a single discriminant function (dichotomizer) instead of two. • Examples: g(x) = P(ω1|x) − P(ω2|x), and g(x) = ln[p(x|ω1)/p(x|ω2)] + ln[P(ω1)/P(ω2)].

32. Decision Regions and Boundaries: • Discriminants divide the feature space into decision regions R1, R2, ..., Rc, separated by decision boundaries. • A decision boundary is defined by g1(x) = g2(x).
