Generative and Discriminative Learning


  1. Generative and Discriminative Learning
     Machine Learning

  2. What we saw most of the semester
     • A fixed, unknown distribution D over X × Y
       – X: Instance space, Y: label space (e.g., {+1, -1})
     • Given a dataset S = {(x_i, y_i)}
     • Learning
       – Identify a hypothesis space H, define a loss function L(h, x, y)
       – Minimize average loss over training data (plus regularization)
     • The guarantee
       – If we find an algorithm that minimizes loss on the observed data
       – Then, learning theory guarantees good future behavior (as a function of H)
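To make this recipe concrete, here is a minimal sketch (not from the slides) of empirical risk minimization over a toy hypothesis space of one-dimensional threshold classifiers with the 0-1 loss. The thresholds, dataset, and helper names are made up for illustration; a real learner would optimize a parametric H with a surrogate loss plus a regularizer rather than enumerate H.

```python
# Sketch of the recipe on this slide: pick a hypothesis space H,
# define a loss L(h, x, y), and minimize the average loss on the training set S.
# Here H is a toy family of threshold classifiers over one real feature.

def make_threshold_classifier(t):
    return lambda x: +1 if x >= t else -1

H = [make_threshold_classifier(t) for t in [-1.0, 0.0, 1.0, 2.0]]  # hypothesis space

def loss(h, x, y):
    return 0.0 if h(x) == y else 1.0                               # 0-1 loss

S = [(-0.5, -1), (0.3, -1), (1.2, +1), (2.5, +1)]                  # training data

def empirical_risk(h):
    return sum(loss(h, x, y) for x, y in S) / len(S)

best_h = min(H, key=empirical_risk)    # minimize average training loss over H
print([best_h(x) for x, _ in S])       # expected: [-1, -1, 1, 1]
```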

  3-6. What we saw most of the semester
     These slides repeat the setup above and overlay one question: Is this different from assuming a distribution over X and a fixed oracle function f?

  7. Discriminative models
     Goal: learn directly how to make predictions
     • Look at many (positive/negative) examples
     • Discover regularities in the data
     • Use these to construct a prediction policy
     • Assumptions come in the form of the hypothesis class
     Bottom line: approximating h: X → Y is estimating the conditional probability P(Y | X)
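To make the bottom line concrete: once we have any estimate of P(Y | X), the prediction policy is just an argmax over labels. The sketch below is illustrative only; `cond_prob` and the toy sigmoid model are hypothetical stand-ins for whatever discriminative model we actually train.

```python
import math

# A discriminative model gives us an estimate of P(y | x).
# The prediction policy is then h(x) = argmax_y P(y | x).

def predict(cond_prob, x, labels=(+1, -1)):
    """cond_prob(y, x) is any estimate of P(Y = y | X = x)."""
    return max(labels, key=lambda y: cond_prob(y, x))

def toy_cond_prob(y, x):
    """Made-up conditional model: P(+1 | x) rises with the first feature."""
    p_pos = 1.0 / (1.0 + math.exp(-x[0]))   # sigmoid of x[0]
    return p_pos if y == +1 else 1.0 - p_pos

print(predict(toy_cond_prob, x=[2.0]))    # expected: 1   (P(+1 | x) is about 0.88)
print(predict(toy_cond_prob, x=[-2.0]))   # expected: -1
```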

  8. Generative models
     • Explicitly model how instances in each category are generated by modeling the joint probability of X and Y, that is P(X, Y)
     • That is, learn P(X | Y) and P(Y)
     • We did this for naïve Bayes
       – Naïve Bayes is a generative model
     • Predict P(Y | X) using the Bayes rule
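A minimal sketch of this recipe for binary features: estimate P(Y) and P(X_j | Y) from add-alpha smoothed counts, then predict with the Bayes rule. This is an illustrative implementation, not the course's reference code; the smoothing constant and the tiny dataset are made up.

```python
import math
from collections import Counter

def train_naive_bayes(X, Y, alpha=1.0):
    """Estimate P(Y) and P(X_j = 1 | Y) from binary feature vectors, with add-alpha smoothing."""
    n, d = len(X), len(X[0])
    label_counts = Counter(Y)
    prior = {y: c / n for y, c in label_counts.items()}
    # feature_counts[y][j] = number of label-y examples where feature j is 1
    feature_counts = {y: [0] * d for y in label_counts}
    for x, y in zip(X, Y):
        for j, v in enumerate(x):
            feature_counts[y][j] += v
    likelihood = {y: [(feature_counts[y][j] + alpha) / (label_counts[y] + 2 * alpha)
                      for j in range(d)]
                  for y in label_counts}
    return prior, likelihood

def predict_naive_bayes(prior, likelihood, x):
    """Bayes rule: argmax_y P(y) * prod_j P(x_j | y), computed in log space."""
    def log_joint(y):
        s = math.log(prior[y])
        for j, v in enumerate(x):
            p = likelihood[y][j]
            s += math.log(p if v == 1 else 1.0 - p)
        return s
    return max(prior, key=log_joint)

# Tiny made-up dataset with two binary features
X = [[1, 0], [1, 1], [0, 1], [0, 0]]
Y = [+1, +1, -1, -1]
prior, likelihood = train_naive_bayes(X, Y)
print(predict_naive_bayes(prior, likelihood, [1, 0]))  # expected: 1
```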

  9. Example: Generative story of naïve Bayes

  10-15. Example: Generative story of naïve Bayes
     [Figure: the naïve Bayes graphical model, built one node at a time: a label node Y drawn from P(Y), and feature nodes X_1, X_2, X_3, ..., X_d each drawn from P(X_j | Y)]
     First sample a label Y from P(Y). Then, given the label, sample the features independently from the conditional distributions P(X_1 | Y), ..., P(X_d | Y).
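The generative story can be written down directly as a sampler: first draw a label from P(Y), then draw each feature independently from P(X_j | Y). In the sketch below, the distributions `p_y` and `p_x_given_y` are made-up numbers used only for illustration.

```python
import random

# Made-up parameters: P(Y) and P(X_j = 1 | Y) for d = 3 binary features.
p_y = {+1: 0.6, -1: 0.4}
p_x_given_y = {+1: [0.9, 0.2, 0.7], -1: [0.1, 0.8, 0.3]}

def sample_example():
    """Follow the naive Bayes generative story: sample Y, then each X_j independently given Y."""
    # First sample a label
    y = +1 if random.random() < p_y[+1] else -1
    # Given the label, sample the features independently from the conditional distributions
    x = [1 if random.random() < p else 0 for p in p_x_given_y[y]]
    return x, y

dataset = [sample_example() for _ in range(5)]
print(dataset)
```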

  16-20. Generative vs Discriminative models
     • Generative models – learn P(x, y)
       – Use the capacity of the model to characterize how the data is generated (both inputs and outputs)
       – E.g.: Naïve Bayes, Hidden Markov Model
     • Discriminative models – learn P(y | x)
       – Use model capacity to characterize the decision boundary only
       – E.g.: Logistic Regression, Conditional models (several names), most neural models
     A generative model tries to characterize the distribution of the inputs; a discriminative model doesn't care.
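As a discriminative counterpart to the naïve Bayes sketch above, here is a minimal logistic regression trained by stochastic gradient descent: it models P(y | x) directly and never models how x itself is distributed. The learning rate, epoch count, and tiny dataset are arbitrary choices for illustration.

```python
import math

def train_logistic_regression(X, Y, lr=0.1, epochs=200):
    """Fit P(y = +1 | x) = sigmoid(w . x + b) by stochastic gradient descent on the logistic loss."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        for x, y in zip(X, Y):
            z = sum(wj * xj for wj, xj in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))     # current estimate of P(+1 | x)
            t = 1.0 if y == +1 else 0.0
            grad = p - t                       # derivative of the logistic loss w.r.t. z
            w = [wj - lr * grad * xj for wj, xj in zip(w, x)]
            b -= lr * grad
    return w, b

def predict(w, b, x):
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return +1 if z >= 0 else -1

# Same tiny dataset as in the naive Bayes sketch
X = [[1, 0], [1, 1], [0, 1], [0, 0]]
Y = [+1, +1, -1, -1]
w, b = train_logistic_regression(X, Y)
print([predict(w, b, x) for x in X])  # expected: [1, 1, -1, -1]
```

The contrast with the naïve Bayes sketch is where the capacity goes: the generative model spends its parameters on P(x | y) and P(y), while the discriminative model spends them only on the decision boundary.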
