5. Bayesian decision theory


  1. Foundations of Machine Learning CentraleSupélec — Fall 2017 5. Bayesian decision theory Chloé-Agathe Azencott Centre for Computational Biology, Mines ParisTech chloe-agathe.azencott@mines-paristech.fr

  2. Practical matters... ● I do not grade homework that is sent as .docx ● (Partial) solutions to Lab 2 are at the end of the slides of Chap 4.

  3. Learning objectives After this lecture, you should be able to ● Apply Bayes' rule to simple inference and decision problems; ● Explain the connection between the Bayes decision rule, empirical risk minimization, maximum a posteriori and maximum likelihood; ● Apply the Naive Bayes algorithm.

  4. Let's start by tossing coins...

  5.–10. Probability and inference ● Result of tossing a coin: x in {heads, tails} – x = f(z), where z denotes unobserved variables (e.g., a complex physical function of the composition of the coin, the force applied to it, the initial conditions, etc.) – Replace f(z) (maybe deterministic, but unknown) with the random variable X in {0, 1} drawn from a probability distribution P(X = x): we need to model P, here a Bernoulli distribution. ● We do not know P, only a sample drawn from it. ● Goal: approximate P (from which X is drawn): p0 = #heads / #tosses ● Prediction of next toss: heads if p0 > 0.5, tails otherwise.
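A minimal sketch of this estimation, assuming tosses are encoded as 0/1 in a NumPy array (the simulated sample and variable names are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a sample of tosses from an unknown Bernoulli distribution
# (1 = heads, 0 = tails); in practice this sample is simply observed.
true_p_heads = 0.6  # illustrative value, not from the slides
tosses = rng.binomial(n=1, p=true_p_heads, size=100)

# Approximate P with the empirical frequency: p0 = #heads / #tosses
p0 = tosses.mean()

# Prediction of the next toss: heads if p0 > 0.5, tails otherwise
prediction = "heads" if p0 > 0.5 else "tails"
print(f"p0 = {p0:.2f} -> predict {prediction}")
```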

  11. Classification ● Cat vs. dog – Cat = 1 (positive) – Dog = 0 (negative) – x1 = human contact – x2 = good eater ● Prediction? [Figure: cats and dogs plotted in the (x1 = human contact, x2 = good eater) plane.]

  12. Bayes rule

  13. Reverend Thomas Bayes, 170?–1761 … possibly

  14. Bayes rule

  15.–20. Example: rare disease testing – The test is correct 99% of the time. – Disease prevalence = 1 out of 10,000. What is the probability that a patient who tested positive actually has the disease? 99%? 90%? 10%? 1%? Plugging the numbers into Bayes' rule, with P(+ | disease) = 0.99, P(disease) = 0.0001, P(+ | healthy) = 1 − 0.99 and P(healthy) = 1 − 0.0001: P(disease | +) = 0.99 × 0.0001 / (0.99 × 0.0001 + (1 − 0.99) × (1 − 0.0001)).
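A quick numeric check of this computation (a sketch; the slides only list the ingredients, not the final number):

```python
# Ingredients from the slide
p_pos_given_disease = 0.99      # test correct on sick patients
p_disease = 0.0001              # prevalence: 1 out of 10,000
p_pos_given_healthy = 1 - 0.99  # test wrong on healthy patients
p_healthy = 1 - 0.0001

# Bayes' rule: P(disease | positive test)
evidence = p_pos_given_disease * p_disease + p_pos_given_healthy * p_healthy
posterior = p_pos_given_disease * p_disease / evidence
print(f"P(disease | positive) = {posterior:.4f}")  # about 0.0098, i.e. roughly 1%
```

So only about 1 positive test in 100 corresponds to an actual case, because the disease is so rare.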

  21. Bayes rule, with its terms labelled: prior, likelihood, posterior, evidence. Bayes' decision rule: see the formulas below.
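The annotated formula on this slide is an image in the original deck; a standard reconstruction, for a label y and observation x, is:

```latex
\[
\underbrace{P(y \mid x)}_{\text{posterior}}
  = \frac{\overbrace{p(x \mid y)}^{\text{likelihood}}\;\overbrace{P(y)}^{\text{prior}}}
         {\underbrace{p(x)}_{\text{evidence}}},
\qquad
p(x) = \sum_{c} p(x \mid y = c)\, P(y = c)
\]
```

Bayes' decision rule then chooses the class with the largest posterior.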

  22.–23. Maximum a posteriori (MAP) criterion ● MAP decision rule: – pick the hypothesis that is most probable, – i.e., maximize the posterior (the ratio Λ_MAP is written out below). ● Decision rule: if Λ_MAP(x) > 1 then choose y = 1, else choose y = 0.
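The ratio Λ_MAP appears as an image on the slide; for two classes it is the posterior ratio:

```latex
\[
\Lambda_{\mathrm{MAP}}(x)
  = \frac{P(y = 1 \mid x)}{P(y = 0 \mid x)}
  = \frac{p(x \mid y = 1)\, P(y = 1)}{p(x \mid y = 0)\, P(y = 0)},
\qquad
\hat{y} = \begin{cases} 1 & \text{if } \Lambda_{\mathrm{MAP}}(x) > 1 \\ 0 & \text{otherwise.} \end{cases}
\]
```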

  24.–25. Likelihood ratio test (LRT) ● p(x) does not affect the decision rule. ● Likelihood ratio test: test whether the likelihood ratio Λ(x) is larger than a threshold given by the priors; the resulting decision rule is written out below.
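The threshold and decision rule are also images in the deck; dividing the MAP ratio by the prior ratio gives the standard LRT form, in which the evidence p(x) has cancelled out:

```latex
\[
\Lambda(x) = \frac{p(x \mid y = 1)}{p(x \mid y = 0)},
\qquad
\text{choose } y = 1 \text{ if } \Lambda(x) > \frac{P(y = 0)}{P(y = 1)}, \text{ else } y = 0.
\]
```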

  26. Example: LRT decision rule ● Assuming the likelihoods below (shown as a figure in the original slides) and equal priors, derive a decision rule based on the LRT.

  27.–29. ● Likelihood ratio: write the ratio of the two class-conditional densities. ● Simplify the equation and take the log. ● Equal priors mean we are testing whether log(LR) > 0. Hence: if x < 7 then assign y = 1, else assign y = 0. ● Now assume P(y = 1) = 2 P(y = 0). The decision becomes x < 7 − log(1/2) ≈ 7.69: y = 1 is a priori more likely. [Figure: class-conditional densities for C = 0 and C = 1, with the decision threshold at 7 (equal priors) and 7.69 (skewed priors).]
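The likelihoods themselves only appear as curves in the deck. As an illustration, unit-variance Gaussians with means 6.5 (class 1) and 7.5 (class 0) are one assumed choice that reproduces the thresholds 7 and 7 − log(1/2) ≈ 7.69 from these slides:

```python
import numpy as np
from scipy.stats import norm

# Assumed class-conditional likelihoods (not given explicitly in the slides):
# p(x | y=1) = N(6.5, 1), p(x | y=0) = N(7.5, 1).
mu1, mu0, sigma = 6.5, 7.5, 1.0

def log_likelihood_ratio(x):
    """log p(x | y=1) - log p(x | y=0); with these parameters it equals 7 - x."""
    return norm.logpdf(x, mu1, sigma) - norm.logpdf(x, mu0, sigma)

# Equal priors: decide y=1 when log LR > 0, i.e. x < 7.
# With P(y=1) = 2 P(y=0): decide y=1 when log LR > log(P(y=0)/P(y=1)) = log(1/2),
# i.e. 7 - x > log(1/2), i.e. x < 7 - log(1/2) ≈ 7.69.
for x in [6.0, 7.5, 8.0]:
    equal_priors = log_likelihood_ratio(x) > 0
    skewed_priors = log_likelihood_ratio(x) > np.log(0.5)
    print(x, "y=1" if equal_priors else "y=0", "|", "y=1" if skewed_priors else "y=0")
```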

  30. Maximum likelihood criterion ● Consider equal priors: P(y = 1) = P(y = 0). ● The Bayes decision rule then seeks to maximize the likelihood p(x | y = c) and is hence called the maximum likelihood criterion. – Decision rule: if Λ_ML(x) > 1 then choose y = 1, else choose y = 0.
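Written out (the ratio is an image in the deck; this is its standard form):

```latex
\[
\Lambda_{\mathrm{ML}}(x) = \frac{p(x \mid y = 1)}{p(x \mid y = 0)},
\qquad
\hat{y} = \begin{cases} 1 & \text{if } \Lambda_{\mathrm{ML}}(x) > 1 \\ 0 & \text{otherwise.} \end{cases}
\]
```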

  31.–33. Bayes rule for K > 2 ● Bayes rule gives the posterior of each class c_k, k = 1, …, K. ● What is the decision rule? ● Decision: assign x to the class with the highest posterior (formulas below).
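The formulas on these slides are images in the deck; the standard statement, using the class labels c_1, …, c_K, is:

```latex
\[
P(y = c_k \mid x)
  = \frac{p(x \mid y = c_k)\, P(y = c_k)}
         {\sum_{l=1}^{K} p(x \mid y = c_l)\, P(y = c_l)},
\qquad
\hat{y} = \arg\max_{k = 1, \dots, K} P(y = c_k \mid x).
\]
```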

  34. Risk minimization

  35. Losses and risks ● So far we have assumed all errors were equally costly. But misclassifying a cancer sufferer as a healthy patient is much more problematic than the other way around. ● Action α_k: assigning class c_k. ● Loss: quantify the cost λ_kl of taking action α_k when the true class is c_l. ● Expected risk of an action given x (formula below). ● Decision (Bayes classifier): take the action with minimal expected risk (formula below).
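The expected risk and the decision are given as images in the deck; the standard definitions in the slide's notation are:

```latex
\[
R(\alpha_k \mid x) = \sum_{l=1}^{K} \lambda_{kl}\, P(y = c_l \mid x),
\qquad
\text{choose } \alpha_{k^*} \text{ with } k^* = \arg\min_{k} R(\alpha_k \mid x).
\]
```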

  36.–37. Discriminant functions ● Classification = find K discriminant functions f_k such that x is assigned class c_k if k = argmax_l f_l(x). ● Bayes classifier: see the sketch below. ● This defines K decision regions. [Figure: decision regions for sports car, luxury sedan and family car in the (x1 = price, x2 = engine power) plane.]
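A minimal sketch of such a classifier, assuming the posteriors P(c_l | x) are already available and using f_k(x) = −R(α_k | x) as discriminant functions; the loss matrix and posterior values are made-up illustrations:

```python
import numpy as np

# Hypothetical loss matrix: loss[k, l] = cost of choosing class k when the truth is l.
# Here, missing class 1 (e.g. "sick") is ten times worse than the opposite error.
loss = np.array([[0.0, 10.0],
                 [1.0,  0.0]])

def bayes_decision(posteriors, loss):
    """Return the class whose action minimizes the expected risk.

    posteriors: array of shape (K,) with P(y = c_l | x) for l = 0..K-1.
    """
    risks = loss @ posteriors   # R(alpha_k | x) = sum_l lambda_kl P(c_l | x)
    discriminants = -risks      # f_k(x) = -R(alpha_k | x)
    return int(np.argmax(discriminants))

# Example: the posterior slightly favours class 0, but the asymmetric
# loss pushes the decision towards class 1.
print(bayes_decision(np.array([0.7, 0.3]), loss))  # -> 1
```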

  38.–39. Bayes risk minimization ● Bayes risk: the overall expected risk. ● Bayes decision rule: use the discriminant functions that minimize the Bayes risk. ● This is also an LRT: for 2 classes, let us show that the Bayes decision rule is equivalent to a likelihood ratio test (derivation below).
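The derivation requested on the slide is not in the extracted text; a standard version, using the slide's convention that λ_kl is the cost of action α_k when the truth is c_l (and assuming errors cost more than correct decisions, λ_01 > λ_11 and λ_10 > λ_00), is:

```latex
\[
\begin{aligned}
\text{choose } y = 1
  &\iff R(\alpha_1 \mid x) < R(\alpha_0 \mid x) \\
  &\iff \lambda_{10} P(c_0 \mid x) + \lambda_{11} P(c_1 \mid x)
        < \lambda_{00} P(c_0 \mid x) + \lambda_{01} P(c_1 \mid x) \\
  &\iff (\lambda_{01} - \lambda_{11})\, p(x \mid c_1)\, P(c_1)
        > (\lambda_{10} - \lambda_{00})\, p(x \mid c_0)\, P(c_0) \\
  &\iff \Lambda(x) = \frac{p(x \mid c_1)}{p(x \mid c_0)}
        > \frac{\lambda_{10} - \lambda_{00}}{\lambda_{01} - \lambda_{11}}
          \cdot \frac{P(c_0)}{P(c_1)}.
\end{aligned}
\]
```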

  40. 0/1 loss ● All misclassifications are equally costly: λ_kl = 0 if k = l and 1 otherwise. ● Minimizing the risk then means choosing the most probable class (MAP); this is equivalent to the Bayes decision rule.
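One line making the equivalence explicit (not shown in the extracted slide text):

```latex
\[
\lambda_{kl} = \mathbf{1}[k \neq l]
\;\Longrightarrow\;
R(\alpha_k \mid x) = \sum_{l \neq k} P(c_l \mid x) = 1 - P(c_k \mid x),
\quad\text{so}\quad
\arg\min_k R(\alpha_k \mid x) = \arg\max_k P(c_k \mid x).
\]
```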

  41.–42. Maximum likelihood criterion ● Consider equal priors, P(y = 1) = P(y = 0), and the 0/1 loss function. ● In the risk-based LRT, the loss factor equals 1 (0/1 loss) and the prior factor equals 1 (equal priors), so the test reduces to comparing the likelihoods directly.
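Putting the two previous results together (the annotated formula itself is an image; this is the standard reduction):

```latex
\[
\Lambda(x) >
\underbrace{\frac{\lambda_{10} - \lambda_{00}}{\lambda_{01} - \lambda_{11}}}_{=\,1 \text{ (0/1 loss)}}
\cdot
\underbrace{\frac{P(y = 0)}{P(y = 1)}}_{=\,1 \text{ (equal priors)}}
= 1
\quad\Longrightarrow\quad
\text{choose } y = 1 \text{ if } p(x \mid y = 1) > p(x \mid y = 0).
\]
```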
