SLIDE 1

Machine Learning

Generative and Discriminative Learning

SLIDE 2

What we saw most of the semester

  • A fixed, unknown distribution D over X × Y
– X: Instance space, Y: label space (e.g. {+1, -1})

  • Given a dataset S = {(xi, yi)}
  • Learning
– Identify a hypothesis space H, define a loss function L(h, x, y)
– Minimize average loss over training data (plus regularization)

  • The guarantee
– If we find an algorithm that minimizes loss on the observed data
– Then, learning theory guarantees good future behavior (as a function of H)
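
The recipe above (pick a hypothesis space H, define a loss L(h, x, y), minimize average training loss plus regularization) can be sketched in a few lines. This is a minimal illustration only: the linear hypothesis class, the logistic loss, and the gradient-descent settings are assumptions, not something the slides prescribe.

    import numpy as np

    def learn(X, y, reg=0.1, lr=0.1, epochs=200):
        """Minimize average logistic loss over S = {(x_i, y_i)} plus an L2 regularizer."""
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(epochs):
            margins = y * (X @ w)                        # y_i * (w . x_i) for each example
            # Gradient of (1/n) * sum_i log(1 + exp(-margin_i)) + (reg/2) * ||w||^2
            grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / n + reg * w
            w -= lr * grad
        return w

    def predict(w, x):
        return 1 if x @ w >= 0 else -1                   # labels in {+1, -1}, as on the slide

    # Toy usage with four linearly separable examples.
    X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])
    w = learn(X, y)
    print([predict(w, xi) for xi in X])                  # [1, 1, -1, -1]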

SLIDE 3

What we saw most of the semester

  • A fixed, unknown distribution D over X × Y
– X: Instance space, Y: label space (e.g. {+1, -1})

  • Given a dataset S = {(xi, yi)}
  • Learning
– Identify a hypothesis space H, define a loss function L(h, x, y)
– Minimize average loss over training data (plus regularization)

  • The guarantee
– If we find an algorithm that minimizes loss on the observed data
– Then, learning theory guarantees good future behavior (as a function of H)

Is this different from assuming a distribution over X and a fixed oracle function f?

SLIDE 7

Discriminative models

Goal: learn directly how to make predictions

  • Look at many (positive/negative) examples
  • Discover regularities in the data
  • Use these to construct a prediction policy
  • Assumptions come in the form of the hypothesis class

Bottom line: approximating h: X → Y is estimating the conditional probability P(Y | X)
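
As a concrete instance (the slide does not name a particular model here, so this choice is an assumption): logistic regression is discriminative in exactly this sense. It turns a linear score into an estimate of P(Y = +1 | X = x) and never models the distribution of X itself.

    import numpy as np

    def p_y_given_x(w, x):
        """Discriminative estimate of P(Y = +1 | X = x) under a logistic model.
        Only the conditional is modeled; P(x) is never estimated."""
        return 1.0 / (1.0 + np.exp(-(x @ w)))

    w = np.array([1.5, -0.5])        # e.g. weights learned as in the earlier sketch
    x = np.array([2.0, 1.0])
    print(p_y_given_x(w, x))         # about 0.92: the model's confidence that the label is +1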

SLIDE 8

Generative models

  • Explicitly model how instances in each category are generated by modeling the joint probability of X and Y, that is P(X, Y)

  • That is, learn P(X | Y) and P(Y)
  • We did this for naïve Bayes

– Naïve Bayes is a generative model

  • Predict P(Y | X) using Bayes' rule
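
A minimal sketch of that procedure for binary features, assuming a Bernoulli naïve Bayes model with Laplace-smoothed counts; the toy data and the function names are illustrative, not from the slides.

    import numpy as np

    def fit_nb(X, y):
        """Estimate P(Y) and P(X_j = 1 | Y) from counts, with Laplace smoothing."""
        params = {}
        for label in (+1, -1):
            Xc = X[y == label]
            params[label] = {
                "prior": len(Xc) / len(X),                     # P(Y = label)
                "cond": (Xc.sum(axis=0) + 1) / (len(Xc) + 2),  # P(X_j = 1 | Y = label)
            }
        return params

    def predict_nb(params, x):
        """Bayes' rule: pick the label maximizing P(y) * prod_j P(x_j | y),
        which is proportional to P(y | x)."""
        scores = {}
        for label, p in params.items():
            likelihood = np.prod(np.where(x == 1, p["cond"], 1.0 - p["cond"]))
            scores[label] = p["prior"] * likelihood
        return max(scores, key=scores.get)

    X = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]])
    y = np.array([+1, +1, -1, -1])
    params = fit_nb(X, y)
    print(predict_nb(params, np.array([1, 1, 0])))             # +1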

SLIDE 9

Example: Generative story of naïve Bayes

SLIDE 10

Example: Generative story of naïve Bayes

Y ~ P(Y)

First sample a label

SLIDE 11

Example: Generative story of naïve Bayes

Y ~ P(Y),  X1 ~ P(X1 | Y)

Given the label, sample the features independently from the conditional distributions

SLIDE 12

Example: Generative story of naïve Bayes

Y ~ P(Y),  X1 ~ P(X1 | Y),  X2 ~ P(X2 | Y)

Given the label, sample the features independently from the conditional distributions

SLIDE 13

Example: Generative story of naïve Bayes

Y ~ P(Y),  X1 ~ P(X1 | Y),  X2 ~ P(X2 | Y),  X3 ~ P(X3 | Y)

Given the label, sample the features independently from the conditional distributions

SLIDE 14

Example: Generative story of naïve Bayes

Y ~ P(Y),  X1 ~ P(X1 | Y),  X2 ~ P(X2 | Y),  X3 ~ P(X3 | Y),  . . .

Given the label, sample the features independently from the conditional distributions

SLIDE 15

Example: Generative story of naïve Bayes

Y ~ P(Y),  X1 ~ P(X1 | Y),  X2 ~ P(X2 | Y),  X3 ~ P(X3 | Y),  . . . ,  Xd ~ P(Xd | Y)

Given the label, sample the features independently from the conditional distributions
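
Read as a sampling procedure, the build-up above is easy to write down. A minimal sketch, assuming binary labels, three binary features, and made-up parameter values (p_y and p_x_given_y are purely illustrative):

    import numpy as np

    rng = np.random.default_rng(0)

    p_y = {+1: 0.6, -1: 0.4}                                   # P(Y)
    p_x_given_y = {+1: [0.9, 0.7, 0.2], -1: [0.1, 0.4, 0.8]}   # P(X_j = 1 | Y) for each feature

    def sample_example():
        # First sample a label ...
        label = +1 if rng.random() < p_y[+1] else -1
        # ... then, given the label, sample each feature independently from P(X_j | Y).
        x = [int(rng.random() < pj) for pj in p_x_given_y[label]]
        return x, label

    print([sample_example() for _ in range(3)])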

SLIDE 16

Generative vs Discriminative models

  • Generative models
– learn P(x, y)
– Use the capacity of the model to characterize how the data is generated (both inputs and outputs)
– E.g.: Naïve Bayes, Hidden Markov Model

  • Discriminative models
– learn P(y | x)
– Use the capacity of the model to characterize the decision boundary only
– E.g.: Logistic Regression, Conditional models (several names)

SLIDE 20

Generative vs Discriminative models

  • Generative models
– learn P(x, y)
– Use the capacity of the model to characterize how the data is generated (both inputs and outputs)
– E.g.: Naïve Bayes, Hidden Markov Model

  • Discriminative models
– learn P(y | x)
– Use model capacity to characterize the decision boundary only
– E.g.: Logistic Regression, Conditional models (several names), most neural models

A generative model tries to characterize the distribution of the inputs; a discriminative model doesn't care.
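
One way to read this takeaway is through the chain rule on the joint distribution:

    P(x, y) = P(y | x) · P(x)

A generative model commits to both factors, so it must also characterize the input distribution P(x); a discriminative model estimates only the first factor, P(y | x), which is all that prediction needs.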