
SLIDE 1

Machine Learning

The Bayes Optimal Classifier


SLIDE 2

Most probable classification

  • In Bayesian learning, the primary question is: What is the most probable hypothesis given the data?
  • We can also ask: For a new test point, what is the most probable label, given the training data?
  • Is this the same as the prediction of the maximum a posteriori (MAP) hypothesis?

SLIDE 3

Most probable classification

Suppose our hypothesis space H has three functions h1, h2 and h3

  • P(h1 | D) = 0.4, P(h2 | D) = 0.3, P(h3 | D) = 0.3
  • What is the MAP hypothesis?
  • For a new instance x, suppose h1(x) = +1, h2(x) = -1 and h3(x) = -1
  • What is the most probable classification of x?

The MAP hypothesis is h1, which predicts +1. But summing the posterior probabilities of the hypotheses that agree on each label gives P(+1 | x) = 0.4 and P(-1 | x) = 0.3 + 0.3 = 0.6, so the most probable classification of x is -1. The most probable classification is not the same as the prediction of the MAP hypothesis.
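The slide's worked example can be sketched in a few lines of Python. The hypothesis names, posteriors, and predictions are taken directly from the slide; everything else is illustrative.

```python
# Three hypotheses with posterior probabilities P(h | D) and their
# deterministic predictions h(x) for a new instance x (from the slide).
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}   # P(h | D)
predictions = {"h1": +1, "h2": -1, "h3": -1}      # h(x)

# MAP hypothesis: the single most probable hypothesis given the data.
map_h = max(posteriors, key=posteriors.get)       # h1, which predicts +1

# Most probable label: accumulate posterior mass per predicted label.
label_prob = {}
for h, p in posteriors.items():
    label_prob[predictions[h]] = label_prob.get(predictions[h], 0.0) + p
# label_prob is {+1: 0.4, -1: 0.6}

best_label = max(label_prob, key=label_prob.get)  # -1, disagreeing with h1
```

Note that the two answers disagree: h1 alone predicts +1, but the posterior-weighted vote favors -1.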
SLIDE 9

Bayes Optimal Classifier

  • How should we use the general formalism?
    – What should H be?

These two choices could be different: selecting a function vs. entertaining all options until the last minute.

H can be a collection of functions:
  • Given the training data, choose an optimal function
  • Then, given new data, evaluate the selected function on it

H can also be a collection of possible predictions:
  • Given the data, try to directly choose the optimal prediction
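The two pipelines on this slide can be written out explicitly. The notation below (h_MAP for the selected function, v* for the direct prediction) follows standard usage rather than the slides themselves:

```latex
% Select a function first, then evaluate it on the new instance x:
h_{\mathrm{MAP}} = \operatorname*{argmax}_{h \in H} P(h \mid D),
\qquad \text{predict } h_{\mathrm{MAP}}(x)

% Or choose the prediction directly, marginalizing over all hypotheses:
v^{*} = \operatorname*{argmax}_{v} \sum_{h \in H} P(v \mid h, x)\, P(h \mid D)
```

The first commits to a single hypothesis before seeing x; the second keeps every hypothesis in play and lets them vote, weighted by posterior probability.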

SLIDE 12

Bayes Optimal Classification

Defined as the most probable label for a new instance, obtained by weighting each hypothesis's prediction by its posterior probability. Computing this can be hopelessly inefficient, since it requires summing over the entire hypothesis space. And yet it is an interesting theoretical concept, because no other classification method can outperform it on average (using the same hypothesis space and prior knowledge).
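The definition above can be sketched as a small Python function. The function name and argument layout are illustrative, not from the slides; hypotheses are assumed deterministic, so P(v | h, x) is 1 when h(x) = v and 0 otherwise.

```python
def bayes_optimal_label(hypotheses, posteriors, x, labels):
    """Return argmax over v of sum_h P(v | h, x) * P(h | D).

    With deterministic hypotheses, each h contributes its full
    posterior mass to the single label it predicts for x.
    """
    def score(v):
        # Total posterior mass of hypotheses that predict label v for x.
        return sum(p for h, p in zip(hypotheses, posteriors) if h(x) == v)
    return max(labels, key=score)

# The running example: h1 predicts +1; h2 and h3 predict -1.
hs = [lambda x: +1, lambda x: -1, lambda x: -1]
label = bayes_optimal_label(hs, [0.4, 0.3, 0.3], x=None, labels=[+1, -1])
```

The inefficiency the slide mentions is visible here: scoring each label sums over every hypothesis in H, which is hopeless when H is large or infinite.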