
Bayesian Decision Theory
Steven J Zeil, Old Dominion Univ., Fall 2010


  1. Bayesian Decision Theory. Steven J Zeil, Old Dominion Univ., Fall 2010.

  2. Outline: 1. Classification, 2. Losses & Risks, 3. Discriminant Functions, 4. Association Rules.

  3. Bernoulli Distribution. Random variable $X \in \{0, 1\}$. Bernoulli: $P\{X = x\} = p_0^x (1 - p_0)^{1-x}$. Given a sample $\mathcal{X} = \{x^t\}_{t=1}^{N}$, we can estimate $\hat{p}_0 = \frac{\sum_t x^t}{N}$.
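A minimal sketch of this estimator in Python; the sample values below are made up for illustration:

```python
# Maximum-likelihood estimate of the Bernoulli parameter p0:
# simply the fraction of 1s observed in the sample.
sample = [1, 0, 1, 1, 0, 1, 0, 1]  # hypothetical draws x^t

p_hat = sum(sample) / len(sample)
print(p_hat)  # 0.625
```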

  4. Classification. Input $\vec{x} = [x_1, x_2]$, output $C \in \{0, 1\}$. Prediction: choose $C = 1$ if $P(C = 1 \mid \vec{x}) > 0.5$, and $C = 0$ otherwise. Equivalently: choose $C = 1$ if $P(C = 1 \mid \vec{x}) > P(C = 0 \mid \vec{x})$, and $C = 0$ otherwise. E.g., credit scoring: inputs are income and savings; output is low-risk versus high-risk.
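A minimal sketch of the two-class decision rule, assuming the posterior $P(C = 1 \mid \vec{x})$ has already been computed by some other step (the function name is my own):

```python
def choose_class(p_c1_given_x: float) -> int:
    """Pick C=1 when its posterior exceeds 0.5,
    i.e. when it exceeds P(C=0 | x) = 1 - P(C=1 | x)."""
    return 1 if p_c1_given_x > 0.5 else 0

print(choose_class(0.8))  # 1 (e.g. classify the applicant as low-risk)
print(choose_class(0.3))  # 0
```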

  5. Bayes' Rule.
$$P(C \mid \vec{x}) = \frac{P(C)\, p(\vec{x} \mid C)}{p(\vec{x})}$$
$P(C \mid \vec{x})$: posterior probability. Given that we have learned something ($\vec{x}$), what is the probability that $\vec{x}$ is in class $C$?
$P(C)$: prior probability. What would we expect for the probability of getting something in $C$ if we had no information about the specific case?
$p(\vec{x} \mid C)$: likelihood. If we knew that an item really was in $C$, what is the probability that it would have values $\vec{x}$? In effect, the reverse of what we are trying to find out.
$p(\vec{x})$: evidence. If we ignore the classes, how likely are we to see a value $\vec{x}$?
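A minimal sketch of the rule in Python; the prior, likelihood, and evidence values below are made-up numbers, not estimates from data:

```python
def posterior(prior: float, likelihood: float, evidence: float) -> float:
    """Bayes' rule: P(C | x) = P(C) * p(x | C) / p(x)."""
    return prior * likelihood / evidence

# Illustrative numbers: P(C) = 0.3, p(x|C) = 0.5, p(x) = 0.25
print(posterior(0.3, 0.5, 0.25))  # 0.6
```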

  6. Bayes' Rule, Multiple Classes.
$$P(C_i \mid \vec{x}) = \frac{P(C_i)\, p(\vec{x} \mid C_i)}{p(\vec{x})} = \frac{P(C_i)\, p(\vec{x} \mid C_i)}{\sum_{k=1}^{K} p(\vec{x} \mid C_k)\, P(C_k)}$$
with $P(C_i) \ge 0$ and $\sum_{i=1}^{K} P(C_i) = 1$. Choose $C_i$ if $P(C_i \mid \vec{x}) = \max_k P(C_k \mid \vec{x})$.
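A sketch of the multi-class classifier, assuming the priors and the per-class likelihood values at a fixed $\vec{x}$ are given; the evidence is just the normalizing sum in the denominator:

```python
import numpy as np

def bayes_classify(priors, likelihoods):
    """Return argmax_i P(C_i | x), computing the evidence p(x)
    as the sum of P(C_k) p(x | C_k) over all classes."""
    joint = np.asarray(priors) * np.asarray(likelihoods)  # P(C_k) p(x|C_k)
    posteriors = joint / joint.sum()                      # divide by p(x)
    return int(np.argmax(posteriors)), posteriors

# Made-up priors and likelihood values for a 3-class problem:
idx, post = bayes_classify([0.5, 0.3, 0.2], [0.1, 0.4, 0.3])
print(idx, post)  # 1, posteriors ~ [0.217, 0.522, 0.261]
```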

  7. Unequal Risks. In many situations, different actions carry different potential gains and costs. Actions: $\alpha_i$. Let $\lambda_{ik}$ denote the loss incurred by taking action $\alpha_i$ when the current state is actually $C_k$. Expected risk of taking action $\alpha_i$:
$$R(\alpha_i \mid \vec{x}) = \sum_{k=1}^{K} \lambda_{ik}\, P(C_k \mid \vec{x})$$
This is simply the expected value of the loss function given that we have chosen $\alpha_i$. Choose $\alpha_i$ if $R(\alpha_i \mid \vec{x}) = \min_k R(\alpha_k \mid \vec{x})$.
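A sketch of the expected-risk computation; the loss matrix below is a hypothetical credit-scoring example (rejecting a good applicant costs 1, accepting a bad one costs 5), not a figure from the slides:

```python
import numpy as np

def expected_risks(loss, posteriors):
    """R(a_i | x) = sum_k loss[i, k] * P(C_k | x), for every action a_i."""
    return np.asarray(loss) @ np.asarray(posteriors)

loss = [[0.0, 5.0],   # action 0 = accept: costly if applicant is bad
        [1.0, 0.0]]   # action 1 = reject: costly if applicant is good
risks = expected_risks(loss, [0.7, 0.3])
print(risks, risks.argmin())  # [1.5 0.7] -> choose action 1 (reject)
```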

  8. Special Case: Equal Risks. Suppose
$$\lambda_{ik} = \begin{cases} 0 & \text{if } i = k \\ 1 & \text{if } i \ne k \end{cases}$$
Expected risk of taking action $\alpha_i$:
$$R(\alpha_i \mid \vec{x}) = \sum_{k=1}^{K} \lambda_{ik}\, P(C_k \mid \vec{x}) = \sum_{k \ne i} P(C_k \mid \vec{x}) = 1 - P(C_i \mid \vec{x})$$
Choose $\alpha_i$ if $R(\alpha_i \mid \vec{x}) = \min_k R(\alpha_k \mid \vec{x})$, which happens when $P(C_i \mid \vec{x})$ is largest. So if all actions have equal cost, choose the action for the most probable class.
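A quick numeric check of this reduction (the posteriors are made up): under 0/1 loss, the minimum-risk action is exactly the maximum-posterior class:

```python
import numpy as np

K = 3
posteriors = np.array([0.2, 0.5, 0.3])
zero_one_loss = 1.0 - np.eye(K)        # lambda_ik = 0 if i == k, else 1
risks = zero_one_loss @ posteriors     # equals 1 - P(C_i | x) per action
print(risks)                           # [0.8 0.5 0.7]
print(risks.argmin() == posteriors.argmax())  # True
```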

  9. Special Case: Indecision. Suppose that making the wrong decision is more expensive than making no decision at all (i.e., falling back on some other procedure). Introduce a special reject action $\alpha_{K+1}$ that denotes the decision not to select a "real" action. The cost of a reject is $\lambda$, with $0 < \lambda < 1$:
$$\lambda_{ik} = \begin{cases} 0 & \text{if } i = k \\ \lambda & \text{if } i = K + 1 \\ 1 & \text{otherwise} \end{cases}$$

  10. The Risk of Indecision. Risk of rejecting:
$$R(\alpha_{K+1} \mid \vec{x}) = \sum_{k=1}^{K} \lambda\, P(C_k \mid \vec{x}) = \lambda$$
Risk of an ordinary action:
$$R(\alpha_i \mid \vec{x}) = \sum_{k \ne i} P(C_k \mid \vec{x}) = 1 - P(C_i \mid \vec{x})$$
Choose $\alpha_i$ if $P(C_i \mid \vec{x}) > P(C_k \mid \vec{x})\ \forall k \ne i$ and $P(C_i \mid \vec{x}) > 1 - \lambda$; otherwise reject all actions.
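A sketch of the resulting decision rule with a reject option; the function name and the use of -1 as the reject signal are my own conventions, not from the slides:

```python
import numpy as np

def decide_with_reject(posteriors, lam):
    """Choose class i if its posterior is the maximum and exceeds
    1 - lam; otherwise return -1 to signal the reject action."""
    i = int(np.argmax(posteriors))
    return i if posteriors[i] > 1.0 - lam else -1

print(decide_with_reject(np.array([0.45, 0.40, 0.15]), lam=0.3))  # -1: reject
print(decide_with_reject(np.array([0.80, 0.15, 0.05]), lam=0.3))  # 0
```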

  11. Discriminant Functions. An alternate vision: instead of searching for the most probable class, we seek a set of functions that divide the space into $K$ decision regions $\mathcal{R}_1, \ldots, \mathcal{R}_K$:
$$\mathcal{R}_i = \left\{ \vec{x} \,\middle|\, g_i(\vec{x}) = \max_k g_k(\vec{x}) \right\}$$
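A sketch of classification by discriminants; the linear form $g_i(\vec{x}) = \vec{w}_i \cdot \vec{x} + b_i$ is a hypothetical choice for illustration, since the definition above only requires taking the argmax of the $g_i$:

```python
import numpy as np

def classify(x, weights, biases):
    """Assign x to the region R_i whose discriminant g_i(x) is largest."""
    scores = np.asarray(weights) @ np.asarray(x) + np.asarray(biases)
    return int(np.argmax(scores))

# Two made-up linear discriminants over a 2-d input:
print(classify([1.0, 2.0],
               weights=[[1.0, -1.0], [0.5, 0.5]],
               biases=[0.0, -0.2]))  # g = [-1.0, 1.3] -> class 1
```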

  12. Why Discriminants? Discriminants are more general than posteriors: they do not have to lie in the range $0 \ldots 1$, nor correspond to actual probabilities. This allows us to use them when we have no information about the underlying distribution. Later techniques will seek discriminant functions directly.

  13. Bayes Classifier as Discriminant Functions. We can form a discriminant function for the Bayes classifier very simply: $g_i(\vec{x}) = -R(\alpha_i \mid \vec{x})$. If we have a constant loss function, we can use
$$g_i(\vec{x}) = P(C_i \mid \vec{x}) = \frac{P(C_i)\, p(\vec{x} \mid C_i)}{p(\vec{x})}$$

  14. Bayes Classifier as Discriminant Functions (cont.)
$$g_i(\vec{x}) = \frac{P(C_i)\, p(\vec{x} \mid C_i)}{p(\vec{x})}$$
Because all the $g_i$ above would have the same denominator, we could alternatively use $g_i(\vec{x}) = P(C_i)\, p(\vec{x} \mid C_i)$.
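A sketch showing that dropping the shared denominator leaves the decision unchanged; the numbers repeat the made-up values from the multi-class example above:

```python
import numpy as np

priors = np.array([0.5, 0.3, 0.2])
likelihoods = np.array([0.1, 0.4, 0.3])  # p(x | C_i) at some fixed x

g = priors * likelihoods  # skip dividing by p(x): same argmax
print(int(np.argmax(g)))  # 1, identical to the full Bayes classifier
```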

  15. Association Rules. Suppose that we want to learn an association rule $X \rightarrow Y$.
