CS 7616 Pattern Recognition
Bayesian Decision Theory (and more)
Aaron Bobick, School of Interactive Computing
Outline for “today”
- A simple tuberculosis example as a reminder of Bayes rule and how it
relates to decision making
- Some basic discussion of what it means to make a good decision and the
relation to Bayes
- Basic Bayesian decision making
- Minimum loss
- Application to normal distributions
- Origins of linear classifiers?
- Why normals?
- Obvious and less obvious reasons
Special thanks…
- Professor Srihari in Buffalo… posted lots of slides…
So you go to the doctor…
- Assume you go to the doctor because it’s that time of year…
- He tells you that you’re overdue for your tuberculosis test
- You take the TB test ($T$) and it’s positive!!! ($T^+$)
- But then he tells you not to worry because:
- The detection rate is 100%: $P(T^+ \mid TB^+) = 1$
- But the false alarm rate is 5%: $P(T^+ \mid TB^-) = 0.05$
- The incidence rate of TB in Atlanta is 0.1%: $P(TB^+) = 0.001$
- Therefore the probability that you have TB given the test is (Bayes rule; expanding the denominator uses that $TB^+$ and $TB^-$ are mutually exclusive and collectively exhaustive):

  $P(TB^+ \mid T^+) = \frac{P(T^+ \mid TB^+)\,P(TB^+)}{P(T^+)} = \frac{P(T^+ \mid TB^+)\,P(TB^+)}{P(T^+ \mid TB^+)\,P(TB^+) + P(T^+ \mid TB^-)\,P(TB^-)}$

  $= \frac{1.0 \times 0.001}{1.0 \times 0.001 + 0.05 \times 0.999} \approx 0.0196$ (i.e., about 20 times what it was before the test)
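A quick numeric check of this computation (a minimal sketch in Python; the variable names are mine, not from the slides):

```python
# Posterior probability of TB given a positive test, via Bayes rule.
p_pos_given_tb = 1.0    # detection rate P(T+ | TB+)
p_pos_given_no = 0.05   # false alarm rate P(T+ | TB-)
p_tb = 0.001            # prior (incidence rate) P(TB+)

# Evidence: total probability of a positive test.
p_pos = p_pos_given_tb * p_tb + p_pos_given_no * (1 - p_tb)

posterior = p_pos_given_tb * p_tb / p_pos
print(f"P(TB+ | T+) = {posterior:.4f}")  # ~0.0196, about 20x the prior
```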
So…
- Q1: if you had to decide right then whether you have TB or not,
what would you decide?
- Q2: would you go get a chest X-ray?
- Why can’t you really answer that question?
- Cost of the X-ray?
- Cost of having TB and not finding out?
- (Prostate cancer treatments….)
- So to make the “right” decision we needed to know:
- Prior probabilities: $P(TB^+)$
- Likelihoods: $P(T^+ \mid TB^+)$ and $P(T^+ \mid TB^-)$
- Cost (loss) functions
Bayes decision theory
- Bayesian theory is fundamental to decision theory and pattern
recognition.
- Basically it provides the mechanisms by which one can evaluate the
probability of being right (and thus of being wrong)
- It allows one to compute an expectation of cost/reward (assuming some
very non-ICBM – no infinities – types of loss)
But…
- It presumes that a variety of probabilities are known – or at least
that we know how much they are unknown (Bayes meets Rumsfeld???)
- We’ll ignore this concern for now…
Bayes 1: Priors
- We have states of nature $\omega_j$ that are mutually exclusive and
collectively exhaustive:

  $\sum_i P(\omega_i) = 1$

- Decision rule if there are only two classes and it is based only on the
prior: if $P(\omega_1) > P(\omega_2)$ choose class $\omega_1$, otherwise $\omega_2$.
Bayes 2: Class conditional probabilities
- Need to know the probability of our data (measurements) given the
possible states of nature:

  $p(x \mid \omega_j)$

- These are probability densities, as opposed to the (discrete)
distribution on the priors. I will definitely confuse these in class.
Bayes rule to get data conditioned probability
  $P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)}$, where the “evidence” is $p(x) = \sum_j p(x \mid \omega_j)\, P(\omega_j)$

- Read “posterior is the likelihood times the prior divided by the
evidence”.
- And since the “evidence” $p(x)$ is fixed we can usually ignore it.
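As a concrete illustration (a minimal sketch; the likelihood and prior values are made up for a three-class example):

```python
import numpy as np

# Class-conditional densities p(x|w_j) evaluated at one x, and priors P(w_j).
likelihoods = np.array([0.6, 0.3, 0.1])  # p(x|w_1), p(x|w_2), p(x|w_3)
priors      = np.array([0.2, 0.5, 0.3])  # P(w_1), P(w_2), P(w_3)

evidence   = np.sum(likelihoods * priors)     # p(x), the normalizer
posteriors = likelihoods * priors / evidence  # P(w_j|x), Bayes rule
print(posteriors, posteriors.sum())           # posteriors sum to 1
```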
The posteriors from the division…
[figure: posterior curves $P(\omega_j \mid x)$ obtained by dividing each $p(x \mid \omega_j)\,P(\omega_j)$ by the evidence $p(x)$]
Bayesian decision rule
- If $P(\omega_1 \mid x) > P(\omega_2 \mid x)$ then choose $\omega_1$, since the true state of
nature is more likely to be $\omega_1$….
- Assuming there is no significant difference between being
wrong in one direction or the other.
- What is probability of making an error?
$P(\text{error} \mid x) = P(\omega_1 \mid x)$ when we decided $\omega_2$, and $P(\text{error} \mid x) = P(\omega_2 \mid x)$ when we decided $\omega_1$.
- So $P(\text{error} \mid x) = \min[\,P(\omega_1 \mid x),\; P(\omega_2 \mid x)\,]$ (the Bayes error)
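In code the rule and its conditional error are one line each (a sketch; the posterior values are made up):

```python
import numpy as np

posteriors = np.array([0.35, 0.65])  # P(w_1|x), P(w_2|x) at some x

decision = np.argmax(posteriors) + 1  # choose the more probable class
p_error  = np.min(posteriors)         # probability of error at this x
print(f"decide w_{decision}, P(error|x) = {p_error:.2f}")
```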
Obvious generalizations:
- Feature is a vector (no real difference)
- More than two classes (as long as they are mutually exclusive and
collectively exhaustive, no problem)
- Introduce a loss function that is more general than just counting an
error … we’ll do this in a minute…
- And you can refuse to give an answer “I don’t know”. We’ll talk
more about that another time.
Loss functions and minimum risk
- Let $\{\omega_j\}$ be the possible states of nature.
- Let $\{\alpha_i\}$ be the possible actions taken (usually announcing the
class, so there are as many actions as classes).
- Let $\lambda(\alpha_i \mid \omega_j)$ be the “loss” incurred for taking action $\alpha_i$ when the
actual state of nature is $\omega_j$.
- Then the expected loss of taking action $\alpha_i$ given measurement $x$ is:

  $R(\alpha_i \mid x) = \sum_j \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid x)$

- So: select the $\alpha_i$ with minimum expected loss. That’s what you’re
“risking”. The Bayes risk is the best you can do.
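A minimal sketch of picking the minimum-risk action (the loss matrix and posteriors are invented; note how an asymmetric loss can override the more probable class):

```python
import numpy as np

# loss[i, j] = lambda(alpha_i | w_j): cost of action alpha_i when truth is w_j
loss = np.array([[0.0, 10.0],   # alpha_1: free if w_1, very costly if w_2
                 [1.0,  0.0]])  # alpha_2: small cost if w_1, free if w_2
posteriors = np.array([0.9, 0.1])  # P(w_1|x), P(w_2|x)

risks = loss @ posteriors  # R(alpha_i|x) = sum_j loss[i, j] * P(w_j|x)
best  = np.argmin(risks)
print(risks, f"-> take alpha_{best + 1}")  # [1.0, 0.9] -> alpha_2, despite P(w_1|x) = 0.9
```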
LRT – likelihood ratio test
- Action $\alpha_i$ is to choose class $\omega_i$. Cost $\lambda_{ij}$ is the cost of choosing class $i$ when
reality is class $j$.
- Two risks:

  $R(\alpha_1 \mid x) = \lambda_{11}\, P(\omega_1 \mid x) + \lambda_{12}\, P(\omega_2 \mid x)$
  $R(\alpha_2 \mid x) = \lambda_{21}\, P(\omega_1 \mid x) + \lambda_{22}\, P(\omega_2 \mid x)$

- Choose $\alpha_1$ if its risk is lower:

  $(\lambda_{21} - \lambda_{11})\, p(x \mid \omega_1)\, P(\omega_1) > (\lambda_{12} - \lambda_{22})\, p(x \mid \omega_2)\, P(\omega_2)$

- Which gives a ratio test based on cost and priors: choose $\alpha_1$ if

  $\frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} > \frac{(\lambda_{12} - \lambda_{22})\, P(\omega_2)}{(\lambda_{21} - \lambda_{11})\, P(\omega_1)} = T$
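A sketch of the test in code (all numbers invented; assumes $\lambda_{21} > \lambda_{11}$ so the threshold is well defined):

```python
# Likelihood ratio test: choose w_1 iff p(x|w_1)/p(x|w_2) > T.
lam = [[0.0, 10.0],        # lam[i][j] = cost of choosing class i+1 when truth is j+1
       [1.0,  0.0]]
prior1, prior2 = 0.9, 0.1  # P(w_1), P(w_2)
lik1, lik2 = 0.5, 2.0      # p(x|w_1), p(x|w_2) at the observed x

T = (lam[0][1] - lam[1][1]) * prior2 / ((lam[1][0] - lam[0][0]) * prior1)
choice = "w_1" if lik1 / lik2 > T else "w_2"
print(f"T = {T:.3f}, ratio = {lik1 / lik2:.3f} -> choose {choice}")
```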
A special loss function
- Cost $\lambda_{ij}$ is 0 if $i = j$, 1 otherwise. Called the zero-one loss function
(duh).
- Which gives a ratio test: choose $\alpha_1$ if

  $p(x \mid \omega_1)\, P(\omega_1) > p(x \mid \omega_2)\, P(\omega_2)$

- i.e. choose whichever class is more likely given the data. Which
really means you combine likelihoods and priors, and you never separate them. That is, you just have a decision boundary on $x$, and you just discriminate based upon $x$…
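A quick check that zero-one loss reduces the risk rule to picking the maximum posterior (a sketch reusing the risk computation above):

```python
import numpy as np

posteriors = np.array([0.35, 0.65])  # P(w_1|x), P(w_2|x) (made up)
zero_one   = 1.0 - np.eye(2)         # lam[i, j] = 0 if i == j else 1

risks = zero_one @ posteriors  # R(alpha_i|x) = 1 - P(w_i|x)
assert np.argmin(risks) == np.argmax(posteriors)  # min risk == max posterior
```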
Introduction to discriminant functions
- Let $g_j(x) = -R(\alpha_j \mid x)$. (So the “max” discriminant function is min risk.)
- For minimum error rate (zero-one loss):

  $g_j(x) = P(\omega_j \mid x)$ (max discriminant is max posterior)

- Using Bayes rule:

  $g_j(x) \propto p(x \mid \omega_j)\, P(\omega_j)$

- Finally, by the monotonicity of $\ln$, let:

  $g_j(x) = \ln p(x \mid \omega_j) + \ln P(\omega_j)$
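A sketch of the log-discriminant form, with 1-D Gaussian class-conditionals standing in for $p(x \mid \omega_j)$ (all parameters invented; this also previews next lecture):

```python
import math

means, stds = [0.0, 2.0], [1.0, 1.0]  # made-up Gaussian class-conditionals
priors = [0.7, 0.3]                   # P(w_1), P(w_2)

def log_gauss(x, mu, sigma):
    """ln of the Gaussian density N(x; mu, sigma^2)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def g(j, x):
    """Discriminant g_j(x) = ln p(x|w_j) + ln P(w_j)."""
    return log_gauss(x, means[j], stds[j]) + math.log(priors[j])

x = 1.2
decision = 1 if g(0, x) > g(1, x) else 2
print(f"g_1(x) = {g(0, x):.3f}, g_2(x) = {g(1, x):.3f} -> decide w_{decision}")
```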
Two class discrimination
- Let $g(x) = g_1(x) - g_2(x)$
- Decide class $\omega_1$ if $g(x) > 0$, otherwise decide $\omega_2$
Next time…
- Linear discriminants applied to normal distributions.
Remember your first assignment!
- Due next Tuesday, Jan 14.
- Find an available data set with a “modest” number of features and a
“small” number of classes
- Modest – plausible to try all or many possible subsets of features
- Small – maybe fewer than 5; 2 is ideal; 30 would be too many
- Submit a one-page description of the data and how we would get it
within a week. (Are you making it yourself? That’s OK.)
Going forward
- For coming lectures:
- HTF: read ch 1&2
- Get yourself Matlab (and/or Python)
- Make sure you’re invited to Piazza