CS 7616 Pattern Recognition: Bayesian Decision Theory
Aaron Bobick, School of Interactive Computing


SLIDE 1

CS 7616 Pattern Recognition: Bayesian Decision Theory
Aaron Bobick, School of Interactive Computing

SLIDE 2

Outline for “today”

  • A simple tuberculosis example as a reminder of Bayes' rule and how it relates to decision making
  • Some basic discussion of what it means to make a good decision and the relation to Bayes
  • Basic Bayesian decision making
  • Minimum loss
  • Application to normal distributions
  • Origins of linear classifiers?
  • Why normals?
  • Obvious and less obvious
SLIDE 3

Special thanks…

  • Professor Srihari at Buffalo, who posted lots of slides…
SLIDE 4

So you go to the doctor…

  • Assume you go to the doctor because it's that time of year…
  • He tells you that you're overdue for your tuberculosis test.
  • You take the TB test (call its result X) and it's positive!!! (X+)
  • But then he tells you not to worry because:
  • The detection rate is 100%: P(X+ | T+) = 1
  • But the false alarm rate is 5%: P(X+ | T−) = 0.05
  • The incidence rate of TB in Atlanta is 0.1%: P(T+) = 0.001
  • Therefore the probability that you have TB given the test is, by Bayes' rule:

P(T+ | X+) = P(X+ | T+) P(T+) / P(X+)
           = P(X+ | T+) P(T+) / [P(X+ | T+) P(T+) + P(X+ | T−) P(T−)]
           = (1.0 × 0.001) / (1.0 × 0.001 + 0.05 × 0.999) ≈ 0.0196

(i.e., about 20 times what it was before the test)

The expansion of the evidence P(X+) works because T+ and T− are mutually exclusive and collectively exhaustive.
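
A quick numeric check of the slide's arithmetic, as a minimal Python sketch (the probabilities come from the slide; the variable names are mine):

```python
# Bayes' rule for the TB example.
p_pos_given_tb = 1.0        # P(X+ | T+), detection rate
p_pos_given_healthy = 0.05  # P(X+ | T-), false alarm rate
p_tb = 0.001                # P(T+), incidence rate in Atlanta

evidence = p_pos_given_tb * p_tb + p_pos_given_healthy * (1 - p_tb)  # P(X+)
posterior = p_pos_given_tb * p_tb / evidence                         # P(T+ | X+)
print(round(posterior, 4))  # 0.0196 -- about 20x the 0.001 prior
```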

SLIDE 5

So…

  • Q1: if you had to decide right then whether you have TB or not, what would you decide?
  • Q2: would you go get a chest X-ray?
  • Why can't you really answer that question?
  • Cost of the X-ray?
  • Cost of having TB and not finding out?
  • (Prostate cancer treatments….)
  • So to make the "right" decision we need to know:
  • Prior probabilities P(T+)
  • Likelihoods P(X+ | T+) and P(X+ | T−)
  • Cost (loss) functions
SLIDE 6

Bayes decision theory

  • Bayesian theory is fundamental to decision theory and pattern recognition.
  • Basically, it provides the mechanisms by which one can evaluate the probability of being right (and thus wrong).
  • It allows one to compute an expectation of cost/reward (assuming some very non-ICBM – no infinities – types of loss).

But…

  • It presumes that a variety of probabilities are known – or at least that it is known how much they are unknown (Bayes meets Rumsfeld???)
  • We'll ignore this concern for now…
SLIDE 7

Bayes 1: Priors

  • We have states of nature ω_j that are mutually exclusive and collectively exhaustive:

∑_i P(ω_i) = 1

  • Decision rule if there are only two classes and it is based only on the prior: if P(ω_1) > P(ω_2), choose class ω_1; otherwise ω_2.
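
A minimal sketch of this priors-only rule (the prior values are invented for illustration):

```python
# With no measurement, the best you can do is pick the class with the
# largest prior. Toy priors, not from the slides.
priors = {"omega_1": 0.7, "omega_2": 0.3}
assert abs(sum(priors.values()) - 1.0) < 1e-12  # collectively exhaustive
decision = max(priors, key=priors.get)          # "omega_1"
```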

SLIDE 8

Bayes 2: Class conditional probabilities

  • Need to know the probability of our data (measurements) given the possible states of nature:

p(x | ω_j)

  • These are probability densities, as opposed to the (discrete) distribution on the priors. I will definitely confuse this in class.

SLIDE 9

Bayes rule to get the data-conditioned probability

P(ω_j | x) = p(x | ω_j) P(ω_j) / p(x)

where the "evidence" is

p(x) = ∑_j p(x | ω_j) P(ω_j)

  • Read: "posterior is the likelihood times the prior divided by the evidence".
  • And since the "evidence" p(x) is fixed, we can usually ignore it.
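
A short sketch of this computation for one measurement x over three classes (the likelihood and prior values are made up):

```python
import numpy as np

likelihoods = np.array([0.6, 0.1, 0.3])  # p(x|omega_j) evaluated at one x
priors = np.array([0.5, 0.3, 0.2])       # P(omega_j)

evidence = np.sum(likelihoods * priors)       # p(x), the normalizer
posteriors = likelihoods * priors / evidence  # P(omega_j | x)
print(posteriors, posteriors.sum())           # posteriors sum to 1
```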

SLIDE 10

The posteriors from the division…

SLIDE 11

Bayesian decision rule

  • If P(ω_1 | x) > P(ω_2 | x) then choose ω_1, since the true state of nature is more likely to be ω_1…
  • Assuming there is no significant difference between being wrong in one direction or the other.
  • What is the probability of making an error?

P(error | x) = P(ω_1 | x) when we decided ω_2, and P(error | x) = P(ω_2 | x) when we decided ω_1.

  • So P(error | x) = min[P(ω_1 | x), P(ω_2 | x)]  (the Bayes error)
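
A minimal sketch of the rule and its pointwise error (the posterior values are made up):

```python
import numpy as np

post = np.array([0.8, 0.2])  # [P(omega_1|x), P(omega_2|x)]
decision = np.argmax(post)   # 0, i.e. choose omega_1
p_error = np.min(post)       # 0.2: the Bayes error at this particular x
```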

SLIDE 12

Obvious generalizations:

  • Feature is a vector (no real difference).
  • More than two classes (as long as they are mutually exclusive and collectively exhaustive, no problem).
  • Introduce a loss function more general than just counting errors… we'll do this in a minute…
  • And you can refuse to give an answer ("I don't know"). We'll talk more about that another time.

SLIDE 13

Loss functions and minimum risk

  • Let ω_j be the possible states of nature.
  • Let {α_i} be the possible actions taken (usually announcing the class, so there are as many actions as classes).
  • Let λ(α_i | ω_j) be the "loss" incurred for taking action i when the actual state of nature is j.
  • Then the expected loss of taking action i given measurement x is:

R(α_i | x) = ∑_j λ(α_i | ω_j) P(ω_j | x)

  • So: select the α_i with minimum expected loss. That's what you're "risking". The Bayes risk is the best you can do.
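
A sketch of the minimum-risk decision for two classes and two actions (the loss matrix and posteriors are invented):

```python
import numpy as np

# lam[i, j] = lambda(alpha_i | omega_j): loss for action i when truth is j
lam = np.array([[0.0, 10.0],    # alpha_1: free if omega_1, costly if omega_2
                [1.0,  0.0]])   # alpha_2
post = np.array([0.3, 0.7])     # posteriors P(omega_j | x)

risks = lam @ post              # R(alpha_i|x) = sum_j lam[i, j] * post[j]
best_action = np.argmin(risks)  # 1 here: alpha_2 has the lower expected loss
```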

SLIDE 14

LRT – likelihood ratio test

  • Action α_i is to choose class i. Cost λ_ij is the cost of choosing i when reality is j.
  • Two risks:

R(α_1 | x) = λ_11 P(ω_1 | x) + λ_12 P(ω_2 | x)
R(α_2 | x) = λ_21 P(ω_1 | x) + λ_22 P(ω_2 | x)

  • Choose α_1 if its risk is lower, i.e. if R(α_1 | x) < R(α_2 | x):

(λ_21 − λ_11) p(x | ω_1) P(ω_1) > (λ_12 − λ_22) p(x | ω_2) P(ω_2)

  • Which gives a ratio test based on cost and priors: choose α_1 if

p(x | ω_1) / p(x | ω_2) > [(λ_12 − λ_22) P(ω_2)] / [(λ_21 − λ_11) P(ω_1)] = T
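
A sketch of the test, reusing TB-like priors and an invented loss matrix:

```python
import numpy as np

lam = np.array([[0.0, 5.0],    # lam[i, j]: loss for deciding i when truth is j
                [1.0, 0.0]])
P1, P2 = 0.001, 0.999          # priors P(omega_1), P(omega_2)

# Threshold T = (lam_12 - lam_22) P(omega_2) / ((lam_21 - lam_11) P(omega_1))
T = (lam[0, 1] - lam[1, 1]) * P2 / ((lam[1, 0] - lam[0, 0]) * P1)

def decide(p_x_w1, p_x_w2):
    """Choose omega_1 iff the likelihood ratio exceeds the threshold T."""
    return "omega_1" if p_x_w1 / p_x_w2 > T else "omega_2"
```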

SLIDE 15

A special loss function

  • Cost λ_ij is 0 if i = j, 1 otherwise. Called the zero-one loss function (duh).
  • Which gives a ratio test: choose α_1 if

p(x | ω_1) P(ω_1) > p(x | ω_2) P(ω_2)

  • i.e., choose whichever class is more likely given the data. Which really means you combine likelihoods and priors, and you never separate them. That is, you just have a decision boundary on x; you just discriminate based upon x…
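
Continuing the sketch above, zero-one loss collapses the LRT threshold to the ratio of priors:

```python
# With lam = [[0, 1], [1, 0]] the threshold reduces to P(omega_2)/P(omega_1),
# so the test is just: pick the class with the larger posterior.
lam01 = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
T01 = (lam01[0, 1] - lam01[1, 1]) * P2 / ((lam01[1, 0] - lam01[0, 0]) * P1)
assert np.isclose(T01, P2 / P1)
```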

SLIDE 16

Introduction to discriminant functions

  • Let g_i(x) = −R(α_i | x). (So the "max" discriminant function gives the minimum-risk decision.)
  • For minimum error rate (zero-one loss):

g_i(x) = P(ω_i | x)   (max discriminant is max posterior)

  • Using Bayes rule:

g_i(x) ∝ p(x | ω_i) P(ω_i)

  • Finally, by the monotonicity of ln, let:

g_i(x) = ln p(x | ω_i) + ln P(ω_i)
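
A sketch of these log-discriminants with 1-D Gaussian class-conditionals (all parameters are invented; the next lecture treats the normal case properly):

```python
import numpy as np
from scipy.stats import norm

# (p(x|omega_i), P(omega_i)) pairs; means, scales, and priors are made up
classes = [(norm(0.0, 1.0), 0.6),
           (norm(2.0, 1.5), 0.4)]

def g(x):
    """Log-discriminants g_i(x) = ln p(x|omega_i) + ln P(omega_i)."""
    return np.array([dist.logpdf(x) + np.log(prior) for dist, prior in classes])

x = 1.2
decision = np.argmax(g(x))  # index of the class with the largest discriminant
```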

SLIDE 17

Two class discrimination

  • Let g(x) = g_1(x) − g_2(x)
  • Decide class ω_1 if g(x) > 0; otherwise decide ω_2
SLIDE 18

Next time…

  • Linear discriminants applied to normal distributions.
SLIDE 19

Remember your first assignment!

  • Due next Tuesday, Jan 14.
  • Find an available data set with a "modest" number of features and a "small" number of classes:
  • Modest – plausible to try all or many possible subsets of features.
  • Small – maybe fewer than 5; 2 is ideal; 30 would be too many.
  • Submit a one-page description of the data and how we would get it within a week. (Are you making it? That's OK.)

SLIDE 20

Going forward

  • For coming lectures:
  • HTF: read ch 1&2
  • Get yourself Matlab (and/or Python)
  • Make sure you’re invited to Piazza