5. Bayesian decision theory
Foundations of Machine Learning, CentraleSupélec, Fall 2017
Chloé-Agathe Azencott, Centre for Computational Biology, Mines ParisTech
chloe-agathe.azencott@mines-paristech.fr
Practical matters
- I do not grade homework that is sent as .docx.
- (Partial) solutions to Lab 2 are at the end of the slides of Chap 4.
Learning objectives
After this lecture, you should be able to
- Apply Bayes' rule to simple inference and decision problems;
- Explain the connection between Bayes' decision rule, empirical risk minimization, maximum a posteriori and maximum likelihood;
- Apply the Naive Bayes algorithm.
Let's start by tossing coins...
Probability and inference
- Result of tossing a coin: x in {heads, tails}
  – x = f(z), where z denotes unobserved variables: a complex physical function of the composition of the coin, the force that is applied to it, the initial conditions, etc.
  – Replace f(z) (maybe deterministic, but unknown) with the random variable X in {0, 1}, drawn from a probability distribution P(X=x).
- We need to model P: here, a Bernoulli distribution.
- We do not know P, but we have a sample of tosses.
- Goal: approximate P (from which X is drawn): p0 = # heads / # tosses.
- Prediction of the next toss: heads if p0 > 0.5, tails otherwise.
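As a quick illustration, here is a minimal Python sketch of this estimation. The sample is simulated, so the variable true_p stands in for the unknown distribution and would not be visible to the learner:

```python
import numpy as np

rng = np.random.RandomState(0)

# Simulate a sample of coin tosses (1 = heads, 0 = tails) from an "unknown" P.
true_p = 0.6                              # hidden from the learner
tosses = rng.binomial(1, true_p, size=1000)

# Approximate P with the empirical frequency p0 = # heads / # tosses.
p0 = tosses.sum() / len(tosses)

# Prediction of the next toss: heads if p0 > 0.5, tails otherwise.
prediction = "heads" if p0 > 0.5 else "tails"
print(p0, prediction)
```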
Classification
- Cat vs. dog
  – Cat = 1 (positive)
  – Dog = 0 (negative)
  – x1 = human contact
  – x2 = good eater
- Prediction: [Figure: cats and dogs plotted in the (good eater, human contact) plane, with a boundary separating the two classes.]
Bayes' rule

Reverend Thomas Bayes (c. 1701–1761)
[Portrait, possibly of Bayes; the attribution is uncertain.]

Bayes' rule:
P(y | x) = P(x | y) P(y) / P(x)
Example: rare disease testing
– The test is correct 99% of the time.
– Disease prevalence: 1 out of 10,000.
What is the probability that a patient who tested positive actually has the disease? 99%? 90%? 10%? 1%?

By Bayes' rule:
P(disease | positive) = P(positive | disease) P(disease) / P(positive)
= (0.99 × 0.0001) / (0.99 × 0.0001 + (1 - 0.99) × (1 - 0.0001))
≈ 0.0098

i.e. less than 1%: most positive tests are false positives, because the disease is so rare.
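The same computation in Python, under the assumption that "correct 99% of the time" means both sensitivity and specificity equal 0.99:

```python
def posterior_disease(prevalence=1e-4, sensitivity=0.99, specificity=0.99):
    """P(disease | positive test) via Bayes' rule."""
    p_pos_given_disease = sensitivity
    p_pos_given_healthy = 1.0 - specificity
    # Evidence: total probability of observing a positive test.
    p_pos = (p_pos_given_disease * prevalence
             + p_pos_given_healthy * (1.0 - prevalence))
    return p_pos_given_disease * prevalence / p_pos

print(posterior_disease())   # ~0.0098, i.e. below 1%
```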
Bayes' rule
P(y | x) = P(x | y) P(y) / P(x)
posterior = likelihood × prior / evidence
Bayes' decision rule: choose the hypothesis y with the highest posterior P(y | x).
Maximum A Posteriori criterion
- MAP decision rule:
  – pick the hypothesis that is most probable,
  – i.e. maximize the posterior: Λ_MAP(x) = P(y=1 | x) / P(y=0 | x).
- Decision rule: if Λ_MAP(x) > 1 then choose y=1, else choose y=0.
Likelihood ratio test (LRT)
- The evidence p(x) does not affect the decision rule.
- Likelihood ratio test: test whether the likelihood ratio Λ(x) = p(x | y=1) / p(x | y=0) is larger than the ratio of priors P(y=0) / P(y=1).
- Decision rule: if Λ(x) > P(y=0) / P(y=1) then choose y=1, else choose y=0.
Example: LRT decision rule
Assuming the likelihoods below and equal priors, derive a decision rule based on the LRT.
[Figure: two class-conditional likelihoods p(x | C=1) and p(x | C=0), crossing at x = 7.]
- Likelihood ratio: Λ(x) = p(x | y=1) / p(x | y=0).
- Simplifying the equation and taking the log gives a decision function that is linear in x.
- Equal priors mean we're testing whether log Λ(x) > 0.
  Hence: if x < 7 then assign y=1, else assign y=0.
- Now assume P(y=1) = 2 P(y=0). The threshold becomes log Λ(x) > log(1/2), i.e.
  x < 7 - log(1/2) ≈ 7.69:
  the region assigned to y=1 grows, because y=1 is a priori more likely.
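A minimal sketch of an LRT between two Gaussian class-conditional densities. The slide's actual likelihoods are not reproduced in the text; the means and variance below are hypothetical values, chosen only so that the equal-prior boundary falls at x = 7 as in the example:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical class-conditional densities (NOT the ones from the slide):
# p(x | y=1) = N(mu1, sigma^2), p(x | y=0) = N(mu0, sigma^2)
mu1, mu0, sigma = 6.5, 7.5, 1.0

def lrt_decision(x, prior1=0.5, prior0=0.5):
    """Assign y=1 iff the likelihood ratio exceeds the prior ratio P(y=0)/P(y=1)."""
    log_lr = norm.logpdf(x, mu1, sigma) - norm.logpdf(x, mu0, sigma)
    return int(log_lr > np.log(prior0 / prior1))

# Equal priors: boundary at x = 7; with P(y=1) = 2 P(y=0) it shifts to ~7.69.
print(lrt_decision(6.0), lrt_decision(8.0))
print(lrt_decision(7.3, prior1=2/3, prior0=1/3))
```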
Maximum likelihood criterion
- Consider equal priors: P(y=1) = P(y=0).
- Bayes' decision rule then amounts to maximizing p(x | y=c), and is hence called the Maximum Likelihood criterion.
  – Decision rule: if Λ_ML(x) = p(x | y=1) / p(x | y=0) > 1 then choose y=1, else choose y=0.
Bayes' rule for K > 2 classes
- Bayes' rule: P(ck | x) = p(x | ck) P(ck) / p(x), with P(ck) ≥ 0 and Σk P(ck) = 1.
- Decision rule: assign x to the class with the highest posterior, i.e. choose ck with k = argmax_l P(cl | x).
Risk minimization
Losses and risks
- So far we've assumed all errors were equally costly. But misclassifying a cancer sufferer as a healthy patient is much more problematic than the other way around.
- Action αk: assigning class ck.
- Loss: quantify the cost λkl of taking action αk when the true class is cl.
- Expected risk: R(αk | x) = Σl λkl P(cl | x).
- Decision (Bayes classifier): take the action with the smallest expected risk, i.e. choose k = argmin_l R(αl | x).
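A small numerical sketch of this decision rule, with a hypothetical 2-class loss matrix in which missing a cancer case is ten times as costly as a false alarm; note how the minimum-risk action can differ from the most probable class:

```python
import numpy as np

# Rows: action alpha_k (assign class k); columns: true class l.
# Hypothetical costs: assigning "healthy" (0) when the truth is "cancer" (1) costs 10.
loss = np.array([[0.0, 10.0],
                 [1.0,  0.0]])

posterior = np.array([0.7, 0.3])   # P(c_0 | x), P(c_1 | x) for some x

# Expected risk of each action: R(alpha_k | x) = sum_l lambda_kl P(c_l | x)
risks = loss @ posterior
print(risks)                        # [3.0, 0.7]
print("decision:", risks.argmin())  # class 1, even though P(c_0 | x) > P(c_1 | x)
```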
Discriminant functions
Classification = find K discriminant functions fk such that x is assigned class ck if k = argmax_l fl(x).
- Bayes classifier: fk(x) = -R(αk | x) (or, for the 0/1 loss, simply fk(x) = P(ck | x)).
- Defines K decision regions.
[Figure: decision regions in the (price, engine power) plane for three classes: family car, luxury sedan, sports car.]
Bayes risk minimization
- Bayes risk: the overall expected risk, averaged over x.
- Bayes decision rule: use the discriminant functions that minimize the Bayes risk.
- This is also an LRT. For 2 classes, the Bayes decision rule is equivalent to:
  choose y=1 if Λ(x) = p(x | y=1) / p(x | y=0) > [(λ10 - λ00) P(y=0)] / [(λ01 - λ11) P(y=1)], else choose y=0.
0/1 loss
- All misclassifications are equally costly: λkl = 0 if k = l, and 1 otherwise.
- Minimizing the risk: R(αk | x) = Σ_{l≠k} P(cl | x) = 1 - P(ck | x), so
  – choose the most probable class (MAP);
  – this is equivalent to the Bayes decision rule.
Maximum likelihood criterion
- Consider equal priors: P(y=1) = P(y=0).
- Consider the 0/1 loss function.
- In the LRT threshold, the loss term equals 1 (0/1 loss) and the prior ratio equals 1 (equal priors).
- Bayes' decision rule is then equivalent to the Maximum Likelihood criterion.
  Decision rule: if Λ_ML(x) > 1 then choose y=1, else choose y=0.
Reject
- Add an artificial "reject" class (K+1) for refusing to take a decision, e.g. in zip code detection.
- Loss: λkl = 0 if k = l; λ if k = K+1; 1 otherwise.
- Decision: choose ck if P(ck | x) > P(cl | x) for all l ≠ k and P(ck | x) > 1 - λ; else reject.
  Only meaningful if 0 < λ < 1.
Losses for regression
- Square loss: L(f(x), y) = (f(x) - y)²
  – dominated by outliers.
- ε-insensitive loss: L(f(x), y) = (|f(x) - y| - ε)+
  – non-smooth.
- Huber loss: a mix of linear and quadratic
  – quadratic for small residuals, linear for large ones.
[Figure: the three losses plotted as a function of the residual f(x) - y.]
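For reference, the three losses written out in Python; the ε and δ parameters below are arbitrary illustrative values:

```python
import numpy as np

def square_loss(residual):
    return residual ** 2

def eps_insensitive_loss(residual, eps=0.5):
    # (|f(x) - y| - eps)_+ : zero inside the eps-tube, linear outside
    return np.maximum(np.abs(residual) - eps, 0.0)

def huber_loss(residual, delta=1.0):
    # Quadratic for small residuals, linear for large ones (robust to outliers)
    abs_r = np.abs(residual)
    return np.where(abs_r <= delta,
                    0.5 * abs_r ** 2,
                    delta * (abs_r - 0.5 * delta))

residuals = np.linspace(-3, 3, 7)
print(square_loss(residuals))
print(eps_insensitive_loss(residuals))
print(huber_loss(residuals))
```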
Empirical risk minimization (ERM)
- Loss: L(f(x), y) is small when f(x) predicts y well.
- Expected risk: R(f) = E[L(f(X), Y)], the expectation taken over the (unknown) data distribution.
- Empirical risk: RN(f) = (1/N) Σ_{i=1..N} L(f(xi), yi), computed on the training sample.
- The ERM estimator over the function class F is the solution, when it exists, of:
  f* = argmin_{f in F} RN(f).
Solving ERM
- There can sometimes be an explicit analytical solution.
- Otherwise: convex optimization (if the loss function is convex in f).
- Limits of ERM:
  – ill-posed;
  – not statistically consistent.
  This is particularly true in high dimension.
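An example of the first case: with the square loss over linear functions f(x) = w·x, the ERM problem has an explicit least-squares solution. A minimal sketch on simulated data:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(100, 3)                      # 100 points, 3 features
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.randn(100)      # noisy linear targets

# ERM with the square loss over linear functions: ordinary least squares.
w_erm, *_ = np.linalg.lstsq(X, y, rcond=None)

empirical_risk = np.mean((X @ w_erm - y) ** 2)
print(w_erm, empirical_risk)
```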
ERM is ill-posed
- Well-posed problems (Hadamard): mathematical models of physical phenomena such that
  – a solution exists;
  – the solution is unique;
  – the solution's behavior changes continuously with the initial conditions.
- It can happen that an infinite number of solutions minimize the empirical risk to zero.
ERM is not statistically consistent
- Statistical consistency: an estimator θN of θ converges in probability towards θ as N increases.
- From the law of large numbers, RN(f) converges to R(f) for any fixed f, but this isn't enough to guarantee that minimizing RN(f) gives a good estimator of the minimizer of R(f).
- Vapnik showed that this is only true if the capacity of the hypothesis space F is "not too large".
Multivariate classification: Naive Bayes
Naive Bayes
- Multivariate classification: x is multidimensional.
- Assume the variables x1, x2, …, xp are conditionally independent given the class:
  p(x1, x2, …, xp | ck) = Πj p(xj | ck).
Graphical representation
- We can use a graph to represent conditional independence:
  – an arc from c to xj means that the distribution of Xj depends on c;
  – no arc between Xj1 and Xj2 means that Xj1 and Xj2 are independent given C.
- A plate represents repeated structure: all Xj inside the same plate follow the same probability distribution.
[Figure: left, a node c with arcs to x1, x2, x3; right, the equivalent plate notation with c pointing to xj, j = 1, 2, 3.]
Naive Bayes
- Under the conditional independence assumption:
  P(ck | x1, …, xp) = (1/Z) P(ck) Πj p(xj | ck),
  where Z = p(x1, …, xp) is a scaling factor, independent of ck.
Maximum a posteriori estimation
- MAP decision rule: pick the hypothesis that is most probable.
- For Naive Bayes: assign x to class ck with k = argmax_l P(cl) Πj p(xj | cl) (the scaling factor Z can be ignored).
Naive Bayes spam filtering
- Input: an email, represented as a bag of words (x1, x2, …, xp) = (0, 1, …, 0).
- Output: spam / ham.
- Naive Bayes assumption: conditional independence of the words given the class.
[Figure: example spam and non-spam emails, with trigger words such as "rich", "CLICK", "viagra".]
- P(spam | (x1, x2, …, xp)) = (1/Z) p(spam) p(x1 | spam) p(x2 | spam) … p(xp | spam)
- P(ham | (x1, x2, …, xp)) = (1/Z) p(ham) p(x1 | ham) p(x2 | ham) … p(xp | ham)
- Decision: if P(spam | (x1, x2, …, xp)) > P(ham | (x1, x2, …, xp)) then spam, else ham.
- Inference: we need to determine p(spam), p(ham), p(xj | spam), p(xj | ham).
  – p(spam): the frequency of spam in the training data (and similarly for p(ham)).
- Bernoulli Naive Bayes:
  – Each email is the outcome of p Bernoulli trials.
  – Naive assumption: the trials are independent.
    (Word co-occurrences within a category aren't really independent; still, independence assumptions can give good results.)
  – Notation: S = # spams in the training set; Sj = # spams containing word j in the training set.
  – Direct estimate of pj = p(xj = 1 | spam): pj = Sj / S.
  – What happens if a word is never seen in the training set? Its direct estimate is pj = 0, and a single occurrence of that word in a new email forces the spam posterior to zero.
  – Laplace-smoothed estimate: pj = (Sj + 1) / (S + 2).
    For a word that's not in the training set, pj is now 0.5 instead of 0.
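A from-scratch sketch of this Bernoulli Naive Bayes spam filter with Laplace smoothing; the tiny bag-of-words matrix and the helper names are purely illustrative, not taken from the lab:

```python
import numpy as np

# Toy bag-of-words data: rows = emails, columns = words; 1 = word present.
X_train = np.array([[1, 1, 0],
                    [1, 0, 1],
                    [0, 1, 0],
                    [0, 0, 1]])
y_train = np.array([1, 1, 0, 0])          # 1 = spam, 0 = ham

def fit_bernoulli_nb(X, y):
    priors, cond = {}, {}
    for c in np.unique(y):
        Xc = X[y == c]
        priors[c] = Xc.shape[0] / X.shape[0]                 # e.g. frequency of spam
        cond[c] = (Xc.sum(axis=0) + 1) / (Xc.shape[0] + 2)   # Laplace: (S_j+1)/(S+2)
    return priors, cond

def log_posterior(x, priors, cond, c):
    p = cond[c]
    # log P(c) + sum_j log p(x_j | c), with p(x_j = 0 | c) = 1 - p_j
    return np.log(priors[c]) + np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

priors, cond = fit_bernoulli_nb(X_train, y_train)
x_new = np.array([1, 0, 0])
label = max(priors, key=lambda c: log_posterior(x_new, priors, cond, c))
print("spam" if label == 1 else "ham")
```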
Gaussian Naive Bayes
- Assume each class-conditional distribution p(xj | y = ck) is a univariate Gaussian, with mean and variance estimated from the training examples of class ck.
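In practice, scikit-learn's GaussianNB implements this per-feature, per-class Gaussian model; a minimal sketch (the iris dataset is just a convenient placeholder):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Fits one univariate Gaussian per feature and per class: p(x_j | y = c_k)
clf = GaussianNB().fit(X_train, y_train)
print(clf.score(X_test, y_test))
```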
Bayesian model selection
- Priors on models: p(model); Bayes' rule gives p(model | data) ∝ p(data | model) p(model).
- Regularization ≡ a prior that favors simpler models.
- Taking the log, MAP estimation is similar to minimizing
  E' = empirical error + λ × model complexity,
  where the first term corresponds to the training error (negative log-likelihood) and the second to the model complexity (negative log-prior).
Summary
- Bayes' decision rule ≡ likelihood ratio test:
  choose the most probable class, given the evidence (data) and prior belief.
  P(y | x) = P(x | y) P(y) / P(x), i.e. posterior = likelihood × prior / evidence.
- Equivalent to minimizing the Bayes risk, usually achieved approximately through empirical risk minimization (which is not equivalent!).
- For the 0/1 loss, equivalent to maximizing the posterior.
- For the 0/1 loss and equal priors (uniform prior), equivalent to maximizing the likelihood.
References
- A Course in Machine Learning. http://ciml.info/dl/v0_99/ciml-v0_99-all.pdf
  – Bayes classifier: Chap 2.1
  – LRT: Chap 9.4
  – Naive Bayes: Chap 9.3
- The Elements of Statistical Learning. http://web.stanford.edu/~hastie/ElemStatLearn/
  – Bayes classifier: Chap 2.4
  – Maximum Likelihood: Chap 2.6.3, Chap 8.3
- Probabilistic machine learning: https://www.repository.cam.ac.uk/bitstream/handle/1810/248538/Ghahramani%202015%20Nature
- Spam detection: http://www.paulgraham.com/spam.html
- Naive Bayes: https://nlp.stanford.edu/IR-book/pdf/13bayes.pdf
Challenge project
How Many Shares? Challenge
https://www.kaggle.com/c/how-many-shares
- Predict the number of shares on social media for articles from the same media site
  – from article length, topics, subjectivity and much more.
  – What kind of machine learning task is this?
- Evaluation on
  – insights learned;
  – prediction performance.
- Form teams of 2-5 students
  – Engineer features (see Lab 4)
  – Model selection for several approaches
  – Predict with the selected models and submit to the leaderboard
  – Choose 2 final models
- Deadline: December 23, 2017, 23:59
  – Report (2 pages + figures/tables): 25 pts
  – Leaderboard position: 5 pts
- Get started early!
- Full instructions on the course website.
Kaggle leaderboard setup
- The data is divided into:
  – training data;
  – public validation data;
  – private validation data.
- You only have the labels of the training data.
- You make predictions for the whole validation set.
- The public part is used to rank you on the public leaderboard throughout the challenge.
- The private part is used to determine your final ranking at the end.
Grading rubric
- Discussion of feature engineering: 4 pts
- Discussion of cross-validated performance: 8 pts
- Discussion of leaderboard performance (of selected models; max 5 per day): 4 pts
- Discussion of final model: 4 pts
- Clarity of report (text, tables, figures): 5 pts
- Final performance: 5 pts
Lab 3: make_Kfolds
- Each index (or instance) should appear once and only once in the test data.
- Each test fold contains n/K points; the last one might contain a few more or fewer if n is not a multiple of K.
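A possible sketch of such a helper (the actual function signature expected in the lab may differ):

```python
import numpy as np

def make_kfolds(n, K, seed=0):
    """Split indices 0..n-1 into K (train, test) folds.

    Each index appears in exactly one test fold; the last fold absorbs the
    remainder when n is not a multiple of K.
    """
    rng = np.random.RandomState(seed)
    indices = rng.permutation(n)
    fold_size = n // K
    folds = []
    for k in range(K):
        start = k * fold_size
        stop = (k + 1) * fold_size if k < K - 1 else n   # last fold takes the rest
        test_idx = indices[start:stop]
        train_idx = np.setdiff1d(indices, test_idx)
        folds.append((train_idx, test_idx))
    return folds
```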
cross_validate
- predict_proba returns an array of predicted probabilities with one column per class:
  – one column contains the probability, for each point, of belonging to class A;
  – the other, the probability of belonging to class B.
- To determine which of class A and class B is the positive one, you can use classifier.classes_, which contains [class A, class B] in the same column order.
- Note that this extends to more than 2 classes.
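A short sketch of how predict_proba and classes_ fit together, using LogisticRegression as a stand-in for whichever classifier the lab uses and a single train/test split in place of the full cross-validation loop:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
train_idx = np.arange(0, 400)
test_idx = np.arange(400, len(y))

clf = LogisticRegression(max_iter=10000).fit(X[train_idx], y[train_idx])

proba = clf.predict_proba(X[test_idx])     # shape (n_test, n_classes)
print(clf.classes_)                        # column order of proba, e.g. [0 1]

# Probability of the positive class (label 1), whichever column it is in:
pos_col = list(clf.classes_).index(1)
pos_proba = proba[:, pos_col]
print(pos_proba[:5])
```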