Decision Making Probabilistic model Known Unknown Bayes Decision - PowerPoint PPT Presentation

Bayesian Decision Theory Chapter 2 (Jan 11, 18, 23, 25) • Bayes decision theory is a fundamental statistical approach to pattern classification • Assumption: decision problem posed in probabilistic terms and relevant probability values are known

Decision Making Probabilistic model Known Unknown Bayes Decision Supervised Unsupervised Theory Learning Learning (Chapter 2) Parametric Nonparametric Parametric Nonparametric Approach Approach Approach Approach (Chapter 3) (Chapter 4, 6) (Chapter 10) (Chapter 10) “Optimal” Plug-in Density K-NN, neural Mixture Cluster Analysis Rules Rules Estimation networks models

Sea bass v. Salmon Classification • Each fish appearing on the conveyor belt is either sea bass or salmon; two “states of nature” • Let  denote the state of nature:  1 = sea bass and  2 = salmon;  is a random variable that must be described probabilistically • a priori (prior) probability: P (  1 ) and P(  2 ); P (  1 ) is the probability next fish observed is a sea bass • If no other types of fish are present then • P(  1 ) + P(  2 ) = 1 (exclusivity and exhaustivity) • P(  1 ) = P(  2 ) (uniform priors) • Prior prob. reflects our prior knowledge about how likely we are to observe a sea bass or salmon; prior prob. may depend on time of the year or the fishing area!

• Case 1: Suppose we are asked to make a decision without observing the fish. We only have prior information • Bayes decision rule given only prior information • Decide  1 if P(  1 ) > P(  2 ), otherwise decide  2 • Error rate = Min {P(  1 ) , P(  2 )} • Suppose now we are allowed to measure a feature on the state of nature - say the fish lightness value • Define class-conditional probability density function (pdf) of feature x; x is a r.v. • P(x |  i ) is the prob. of x given class  i , i =1, 2. P(x |  i )>= 0 and area under the pdf is 1.

Less the densities overlap, better the feature

• Case 2: Suppose we only have class-conditional densities and no prior information • Maximum likelihood decision rule • Assign input pattern x to class  1 if P(x |  1 ) > P(x |  2 ), otherwise  2 • P(x |  1 ) is also the likelihood of class  1 given the feature value x • Case 3: We have both prior densities and class- conditional densities • How does the feature x influence our attitude (prior) concerning the true state of nature? • Bayes decision rule

• Posteriori prob. is a function of likelihood & prior • Joint density: P(  j , x) = P(  j | x)p (x) = p(x |  j ) P (  j ) • Bayes rule P(  j | x) = {p(x |  j ) . P (  j )} / p(x), j = 1,2  2 j  where    ( ) ( | ) ( ) P x P x j P j  1 j • Posterior = (Likelihood x Prior) / Evidence • Evidence P(x) can be viewed as a scale factor that guarantees that the posterior probabilities sum to 1 • P(x) is also called the unconditional density of feature x

• P(  1 | x) is the probability of the state of nature being  1 given that feature value x has been observed • Decision based on the posterior probabilities is called the “Optimal” Bayes Decision rule. What does optimal mean? For a given observation (feature value) X: if P(  1 | x) > P(  2 | x) decide  1 if P(  1 | x) < P(  2 | x) decide  2 To justify the above rule, calculate the probability of error: P(error | x) = P(  1 | x) if we decide  2 P(error | x) = P(  2 | x) if we decide  1

• So, for a given x, we can minimize the prob. of error by deciding  1 if P(  1 | x) > P(  2 | x); otherwise decide  2 Therefore: P(error | x) = min [P(  1 | x), P(  2 | x)] • For each observation x, Bayes decision rule minimizes the probability of error • Unconditional error: P(error) obtained by integration over all possible observed x w.r.t. p(x)

• Optimal Bayes decision rule Decide  1 if P(  1 | x) > P(  2 | x); otherwise decide  2 • Special cases: (i) P(  1 ) = P(  2 ); Decide  1 if p(x |  1 ) > p(x |  2 ), otherwise  2 (ii) p(x |  1 ) = p(x |  2 ); Decide  1 if P(  1 ) > P(  2 ), otherwise  2

Bayesian Decision Theory – Continuous Features • Generalization of the preceding formulation • Use of more than one feature (d features) • Use of more than two states of nature (c classes) • Allowing actions other than deciding on the state of nature • Introduce a “loss function”; minimizing the “risk” is more general than minimizing the probability of error

• Allowing actions other than classification primarily allows the possibility of “rejection” • Rejection: Input pattern is rejected when it is difficult to decide between two classes or the pattern is too noisy! • The loss function specifies the cost of each action

• Let {  1 ,  2 ,…,  c } be the set of c states of nature (or “categories” or “classes”) • Let {  1 ,  2 ,…,  a } be the set of a possible actions that can be taken for an input pattern x • Let  (  i |  j ) be the loss incurred for taking action  i when the true state of nature is  j • Decision rule:  (x) specifies which action to take for every possible observation x

 j c        R ( | x ) ( | ) P ( | x ) Conditional Risk i i j j  j 1 For a given x, suppose we take the action  i • If the true state is  j , we will incur the loss  (  i |  j ) • P(  j | x) is the prob. that the true state is  j • But, any one of the C states is possible for given x Overall risk R = Expected value of R(  i | x) w.r.t. p(x) Conditional risk Minimizing R Minimize R(  i | x) for i = 1,…, a

Select the action  i for which R(  i | x) is minimum • This action minimizes the overall risk • The resulting risk is called the Bayes risk • It is the best classification performance that can be achieved given the priors, class-conditional densities and the loss function!

• Two-category classification  1 : decide  1  2 : decide  2  ij =  (  i |  j ); loss incurred in deciding  i when the true state of nature is  j Conditional risk: R(  1 | x) =  11 P(  1 | x) +  12 P(  2 | x) R(  2 | x) =  21 P(  1 | x) +  22 P(  2 | x)

Bayes decision rule is stated as: if R(  1 | x) < R(  2 | x) Take action  1 : “decide  1 ” This rule is equivalent to: decide  1 if: {(  21 -  11 ) P(x |  1 ) P(  1 )} > {(  12 -  22 ) P(x |  2 ) P(  2 )}; decide  2 otherwise

In terms of the Likelihood Ratio (LR), the preceding rule is equivalent to the following rule:     P ( x | ) P ( )   12 22 2 1 if .      P ( x | ) P ( ) 2 21 11 1 then take action  1 (decide  1 ); otherwise take action  2 (decide  2 ) The “threshold” term on the right hand side now involves the prior and the loss function

Interpretation of the Bayes decision rule: If the likelihood ratio of class  1 and class  2 exceeds a threshold value (independent of the input pattern x), the optimal action is: decide  1 Maximum likelihood decision rule is a special case of minimum risk decision rule: • Threshold value = 1 • 0-1 loss function • Equal class prior probability

Bayesian Decision Theory (Sections 2.3-2.5) • Minimum Error Rate Classification • Classifiers, Discriminant Functions and Decision Surfaces • Multivariate Normal (Gaussian) Density

Minimum Error Rate Classification • Actions are decisions on classes If action  i is taken and the true state of nature is  j then: the decision is correct if i = j and in error if i  j • Seek a decision rule that minimizes the probability of error or the error rate

• Zero-one (0-1) loss function: no loss for correct decision and a unit loss for incorrect decision   0 i j       ( , ) i , j 1 ,..., c  i j  1 i j The conditional risk can now be simplified as:  j c        R ( | x ) ( | ) P ( | x ) i i j j  j 1       P ( | x ) 1 P ( | x ) j i  j 1 “The risk corresponding to the 0 -1 loss function is the average probability of error”

• Minimizing the risk under 0-1 loss function requires maximizing the posterior probability P(  i | x) since R(  i | x) = 1 – P(  i | x) ) • For Minimum error rate • Decide  i if P (  i | x) > P(  j | x)  j  i

• Decision boundaries and decision regions      P ( ) P ( x | )      12 22 2 1 Let . then decide if :        1 P ( ) P ( x | ) 21 11 1 2 • If  is the 0-1 loss function then the threshold involves only the priors:   0 1         1 0  P ( )     2 then   a P ( ) 1    0 2 2 P ( )         2 if then     b   1 0 P ( ) 1

Decision Making Probabilistic model Known Unknown Bayes Decision - PowerPoint PPT Presentation

Bayesian Decision Theory Chapter 2 (Jan 11, 18, 23, 25) Bayes decision theory is a fundamental statistical approach to pattern classification Assumption: decision problem posed in probabilistic terms and relevant probability values are

6 Decision- -Making Making MVC (revisited) 6 Decision MVC (revisited) decision

DECISION MAKING readysetpresent.com Decision Making Program Objectives ( 1 of 2 ) To examine

Decision Making 1 Decision Making Skills Establishing a positive decision-making environment.

Decision Making Under Decision Making . . . General Set Uncertainty: Proof of This Result

Learning Decision Trees Representation is a decision tree. Bias is towards simple decision

S C DECISION E N C E decision science SDS CMU What is Decision Science? Behavioral

Supported Decision-Making in Wisconsin Today we will talk about: The concept of Supported

MI MI and Shared MI MI and Shared and Shared Decision Making and Shared Decision Making

Overview of Robot Decision Making Prof. Yuke Zhu Fall 2020 CS391R: Robot Learning (Fall 2020) 1

APT TECHNICAL CPD - MAF SHORT TERM DECISION MAKING Short term decision making and pricing

$ Lesson One Making Decisions 04/09 the decision-making process The decision-making process

Making maps pretty Andrea Aime Jim Groffen Making Maps Pretty Making Maps Pretty 1 1 Making

The Positive & Negative Aspects of Group Decision Making Positive Aspects of Group Decision

Sequential Decision Making AIMA Chapters: 17.1, 17.2, 17.3. Sutton and Barto, Reinforcement

Decision Making Under Uncertainty Making Decisions Under Uncertainty AI C LASS 10 (C H .

Decision Tree Decision Trees A decision tree is a decision support tool that uses a tree-like

TOWARDS MULTI-INSTRUMENT DRUM TRANSCRIPTION Richard Vogl 1,2 , Gerhard Widmer 2 , Peter Knees 1

Structured training for large-vocabulary chord recognition Brian McFee* & Juan Pablo Bello

T O MB RAIDER T HE ART O F EPIC SC O RING I N T R O D U C T I O N 1 INT RO DUC T IO N

Energy Systema-cs Studies Elizabeth Worcester (BNL) March 15,

Pattern Recognition: An Overview Prof. Richard Zanibbi Pattern Recognition (One) Definition

WSD Word Sense Disambiguation: Determine from context (or otherwise) what Word Sense

Introduction to Machine Learning Machine Perception An Example Pattern Recognition Systems The

Supervised Classification with the Perceptron CMSC 470 Marine Carpuat Slides credit: Hal Daume

Decision Making Probabilistic model Known Unknown Bayes Decision - PowerPoint PPT Presentation

Bayesian Decision Theory Chapter 2 (Jan 11, 18, 23, 25) Bayes decision theory is a fundamental statistical approach to pattern classification Assumption: decision problem posed in probabilistic terms and relevant probability values are

6 Decision- -Making Making MVC (revisited) 6 Decision MVC (revisited) decision

DECISION MAKING readysetpresent.com Decision Making Program Objectives ( 1 of 2 ) To examine

Decision Making 1 Decision Making Skills Establishing a positive decision-making environment.

Decision Making Under Decision Making . . . General Set Uncertainty: Proof of This Result

Learning Decision Trees Representation is a decision tree. Bias is towards simple decision

S C DECISION E N C E decision science SDS CMU What is Decision Science? Behavioral

Supported Decision-Making in Wisconsin Today we will talk about: The concept of Supported

MI MI and Shared MI MI and Shared and Shared Decision Making and Shared Decision Making

Overview of Robot Decision Making Prof. Yuke Zhu Fall 2020 CS391R: Robot Learning (Fall 2020) 1

APT TECHNICAL CPD - MAF SHORT TERM DECISION MAKING Short term decision making and pricing

$ Lesson One Making Decisions 04/09 the decision-making process The decision-making process

Making maps pretty Andrea Aime Jim Groffen Making Maps Pretty Making Maps Pretty 1 1 Making

The Positive &amp; Negative Aspects of Group Decision Making Positive Aspects of Group Decision

Sequential Decision Making AIMA Chapters: 17.1, 17.2, 17.3. Sutton and Barto, Reinforcement

Decision Making Under Uncertainty Making Decisions Under Uncertainty AI C LASS 10 (C H .

Decision Tree Decision Trees A decision tree is a decision support tool that uses a tree-like

TOWARDS MULTI-INSTRUMENT DRUM TRANSCRIPTION Richard Vogl 1,2 , Gerhard Widmer 2 , Peter Knees 1

Structured training for large-vocabulary chord recognition Brian McFee* &amp; Juan Pablo Bello

T O MB RAIDER T HE ART O F EPIC SC O RING I N T R O D U C T I O N 1 INT RO DUC T IO N

Energy Systema-cs Studies Elizabeth Worcester (BNL) March 15,

Pattern Recognition: An Overview Prof. Richard Zanibbi Pattern Recognition (One) Definition

WSD Word Sense Disambiguation: Determine from context (or otherwise) what Word Sense

Introduction to Machine Learning Machine Perception An Example Pattern Recognition Systems The

Supervised Classification with the Perceptron CMSC 470 Marine Carpuat Slides credit: Hal Daume

The Positive & Negative Aspects of Group Decision Making Positive Aspects of Group Decision

Structured training for large-vocabulary chord recognition Brian McFee* & Juan Pablo Bello