The Bayes Optimal Classifier


  1. The Bayes Optimal Classifier (Machine Learning)

  2. Most probable classification
     • In Bayesian learning, the primary question is: what is the most probable hypothesis given the data?
     • We can also ask: for a new test point, what is the most probable label, given the training data?
     • Is this the same as the prediction of the maximum a posteriori (MAP) hypothesis?

  3. Most probable classification
     Suppose our hypothesis space H has three functions h1, h2, and h3, with posteriors
     P(h1 | D) = 0.4, P(h2 | D) = 0.3, P(h3 | D) = 0.3
     • What is the MAP hypothesis? h1, since it has the highest posterior probability.
     • For a new instance x, suppose h1(x) = +1, h2(x) = -1, and h3(x) = -1.
     • What is the most probable classification of x?
       P(+1 | x) = 0.4, while P(-1 | x) = 0.3 + 0.3 = 0.6, so the most probable label is -1.
     The most probable classification is not the same as the prediction of the MAP hypothesis.

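The three-hypothesis example can be checked with a short computation. This is a sketch (not from the slides); the dictionary names are illustrative only:

```python
# Sketch (not from the slides): comparing the MAP prediction with the
# Bayes optimal classification for the three-hypothesis example above.
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}   # P(h | D)
predictions = {"h1": +1, "h2": -1, "h3": -1}     # h(x) for the new instance x

# MAP prediction: pick the single most probable hypothesis and use its label.
map_h = max(posteriors, key=posteriors.get)
map_prediction = predictions[map_h]

# Bayes optimal classification: each label's score is the total posterior
# mass of the hypotheses that predict it.
label_mass = {}
for h, p in posteriors.items():
    label_mass[predictions[h]] = label_mass.get(predictions[h], 0.0) + p
bayes_optimal = max(label_mass, key=label_mass.get)

print(map_prediction, bayes_optimal)  # prints: 1 -1
```

The two disagree: the MAP hypothesis h1 predicts +1, but the combined posterior mass on -1 is 0.6, which beats 0.4.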

  9. Bayes Optimal Classifier
     • How should we use the general formalism? What should H be?
     • H can be a collection of functions: given the training data, choose an optimal function; then, given new data, evaluate the selected function on it.
     • H can be a collection of possible predictions: given the data, try to directly choose the optimal prediction.
     These two could be different! Selecting a function vs. entertaining all options until the last minute.


  12. Bayes Optimal Classification
      Defined as the most probable classification: every hypothesis votes for a label, weighted by its posterior probability P(h | D), and the label with the largest total weight wins.
      Computing this can be hopelessly inefficient, since it requires summing over every hypothesis in H.
      And yet it is an interesting theoretical concept, because no other classification method can outperform it on average (using the same hypothesis space and prior knowledge).
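The posterior-weighted vote can be written directly. This is a hypothetical sketch (the function name and calling convention are assumptions, not from the slides), implementing argmax over labels v of the sum over h of P(v | h) P(h | D), where each hypothesis votes deterministically: P(v | h) = 1 if h(x) = v, else 0.

```python
# Sketch (hypothetical helper, not from the slides): a direct implementation
# of the Bayes optimal classifier. Each hypothesis is a function x -> label,
# paired with its posterior probability P(h | D).
def bayes_optimal_label(labels, hypotheses, posteriors, x):
    # Total posterior mass of the hypotheses that predict label v for x.
    def weight(v):
        return sum(p for h, p in zip(hypotheses, posteriors) if h(x) == v)
    return max(labels, key=weight)

# The earlier three-hypothesis example, with hypotheses as simple functions.
hs = [lambda x: +1, lambda x: -1, lambda x: -1]
ps = [0.4, 0.3, 0.3]
print(bayes_optimal_label([+1, -1], hs, ps, x=None))  # prints: -1
```

Note that the inner sum runs over every hypothesis in H, which illustrates why the computation is usually intractable for realistic hypothesis spaces.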
