
Machine Learning - 10601

Model Selection and Naïve Bayes

Geoff Gordon, Miroslav Dudík

(partly based on slides of Tom Mitchell)

http://www.cs.cmu.edu/~ggordon/10601/ September 23, 2009

Announcements

September 21, 2009: Netflix awards a $1 million prize to a team of statisticians, machine-learning experts, and computer engineers. "You're getting Ph.D.'s for a dollar an hour," Reed Hastings, chief of Netflix, said of the people competing for the prize.


How to win $1 Million

Goal: (user, movie) -> rating
Data: 100M (user, movie, date, rating) tuples
Performance measure: root mean squared error on a withheld test set
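The performance measure above is simple to compute; a minimal sketch in Python (the function name and toy numbers are illustrative, not from the competition):

```python
import math

def rmse(predictions, ratings):
    """Root mean squared error between predicted and true ratings."""
    n = len(predictions)
    squared_error = sum((p - r) ** 2 for p, r in zip(predictions, ratings))
    return math.sqrt(squared_error / n)

# toy example: three predictions scored against withheld true ratings
print(rmse([3.5, 4.0, 2.0], [4.0, 4.0, 3.0]))  # ≈ 0.6455
```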

How to win $1 Million

Part of the winning model is the "baseline model", which captures the bulk of the information:

[Koren 2009]


How to win $1 Million

  • training set
  • quiz set
  • test set

FAQ: why quiz/test split?

We wanted a way of informing you … about your progress … while making it difficult for you to simply train and optimize against "the answer oracle"

FAQ: why quiz/test split?

Two goals for withholding data

  • model selection
  • model assessment

What if data is scarce?

  • training set
  • validation set
  • test set


Cross-validation

  • split data randomly into K equal parts
  • for each model setting:

evaluate avg performance across K train-test splits

  • train the best model on the full data set

[Figure: K = 3 folds; each part is held out once to evaluate error while the other parts train]
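The procedure above can be sketched in plain Python; `fit` and `error` are placeholders for whatever model and loss are being evaluated (this sketch assumes `data` was shuffled beforehand, since the folds are taken by striding):

```python
def k_fold_cv(data, k, fit, error):
    """Average test error of a model across K train-test splits.

    fit: train_set -> model; error: (model, test_set) -> float.
    """
    folds = [data[i::k] for i in range(k)]  # K roughly equal parts
    total = 0.0
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        total += error(fit(train), test)
    return total / k

# toy model: predict the mean of the training labels,
# score by mean absolute deviation on the held-out fold
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
fit = lambda train: sum(train) / len(train)
error = lambda model, test: sum(abs(model - x) for x in test) / len(test)
print(k_fold_cv(data, 3, fit, error))  # 1.5
```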

The best model depends on the size of the data set:

y ≈ w0 + w1x + w2x^2 + w3x^3 + w4x^4 + … + w10x^10


K-fold cross-validation trains on a (K−1)/K fraction of the training data

Controlling model complexity

  • limit the number of features
  • add a “complexity penalty”

Regularized estimation

min_w error_train(w) + regularization(w)
min_w −log p(data | w) − log p(w)

Examples of regularization

L2: min_w error_train(w) + λ Σ_j w_j²
L1: min_w error_train(w) + λ Σ_j |w_j|
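As a concrete instance of the penalized objective, one-dimensional least squares with an L2 penalty has a closed form, which makes the effect of λ easy to see (toy data; the function name is illustrative):

```python
def ridge_1d(xs, ys, lam):
    """Minimize sum_i (y_i - w * x_i)^2 + lam * w^2 over the scalar w.

    Setting the derivative to zero gives w = sum(x*y) / (sum(x^2) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # y = 2x exactly
print(ridge_1d(xs, ys, 0.0))   # 2.0: no penalty recovers the true slope
print(ridge_1d(xs, ys, 14.0))  # 1.0: a heavy penalty shrinks w toward 0
```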


[Figure: training error, regularization penalty, and their sum (training error + regularization), shown for both L2 and L1]

L1 vs L2

L1:
  • sparse solutions
  • more suitable when #features much larger than training set

L2:
  • computationally better-behaved

How do you choose λ?
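The standard answer to this question is to treat λ as just another model setting and pick it on held-out data, as in the validation-set and cross-validation slides above; a minimal sketch (all names are illustrative placeholders):

```python
def choose_lambda(lambdas, train, valid, fit, error):
    """Return the lambda whose fitted model has the lowest validation error.

    fit: (train_set, lam) -> model; error: (model, data_set) -> float.
    """
    return min(lambdas, key=lambda lam: error(fit(train, lam), valid))

# toy check: a fake `fit` that returns lam itself and an `error`
# minimized at lam = 3 picks 3.0 out of the grid
best = choose_lambda([0.1, 1.0, 3.0, 10.0], [], [],
                     lambda tr, lam: lam,
                     lambda model, data: abs(model - 3.0))
print(best)  # 3.0
```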


Announcements

HW #3 out; due October 7

Classification

Goal: learn a map h: x -> y
Data: (x1, y1), (x2, y2), …, (xN, yN)
Performance measure:


All you need to know is p(X,Y)… If you knew p(X,Y), how would you classify an example x? Why?

How many parameters need to be estimated?

Y binary; X described by M binary features X1, X2, …, XM
Data: p(X,Y) described by 2^(M+1) − 1 numbers
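The count on this slide follows from simple arithmetic: the joint over binary Y and M binary features has 2^(M+1) outcomes, hence 2^(M+1) − 1 free parameters, while the Naïve Bayes assumption (next slide) cuts this to 1 + 2M: one number for p(Y=1), plus p(X_j=1 | Y=y) for each feature j and each class y.

```python
def full_joint_params(m):
    # one probability per outcome of (X_1, ..., X_M, Y), minus one
    # because all outcomes must sum to 1
    return 2 ** (m + 1) - 1

def naive_bayes_params(m):
    # p(Y=1), plus p(X_j = 1 | Y = y) for each feature j and class y
    return 1 + 2 * m

print(full_joint_params(30))   # 2147483647: over two billion parameters
print(naive_bayes_params(30))  # 61
```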


Naïve Bayes Assumption

  • features of X conditionally independent given class Y

Example: Live in Sq Hill?

  • S=1 iff live in Sq Hill
  • G=1 iff shop in Sq Hill Giant Eagle
  • D=1 iff drive to CMU
  • A=1 iff owns a Mac
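For binary features like the Sq Hill example, the assumption lets us classify by multiplying one per-feature likelihood per feature; a minimal Bernoulli Naïve Bayes sketch (toy data; add-one smoothing is used to avoid zero probabilities):

```python
def train_bernoulli_nb(X, y):
    """Estimate p(Y=1) and p(X_j=1 | Y=c) with add-one (Laplace) smoothing.

    X: list of binary feature vectors; y: list of 0/1 labels.
    """
    n, m = len(X), len(X[0])
    prior1 = sum(y) / n
    theta = {}
    for c in (0, 1):
        rows = [x for x, yi in zip(X, y) if yi == c]
        theta[c] = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                    for j in range(m)]
    return prior1, theta

def predict_nb(x, prior1, theta):
    """Return the class with the larger (unnormalized) posterior."""
    best, best_score = None, -1.0
    for c, p_c in ((0, 1 - prior1), (1, prior1)):
        score = p_c
        for j, xj in enumerate(x):
            score *= theta[c][j] if xj else 1 - theta[c][j]
        if score > best_score:
            best, best_score = c, score
    return best

prior1, theta = train_bernoulli_nb(
    [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]], [1, 1, 0, 0])
print(predict_nb([1, 1, 0], prior1, theta))  # 1
print(predict_nb([0, 0, 1], prior1, theta))  # 0
```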

Naïve Bayes Assumption

  • usually incorrect…
  • Naïve Bayes often performs well, even when the assumption is violated [see Domingos-Pazzani 1996]

Learning to classify text documents

  • which emails are spam?
  • which emails promise an attachment?
  • which web pages are student home pages?

What are the features of X?


Feature Xj is the jth word

Assumption #1: Naïve Bayes


Assumption #2: the "bag of words" approach
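Under the bag-of-words assumption a document is reduced to its word counts, discarding order and position; a sketch using only the standard library:

```python
from collections import Counter

def bag_of_words(text):
    """Map a document to word counts; word order is discarded."""
    return Counter(text.lower().split())

print(bag_of_words("Free money free attachment")["free"])  # 2

# the two documents below differ only in word order,
# so they map to identical feature vectors
print(bag_of_words("free money") == bag_of_words("money free"))  # True
```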


What you should know about Naïve Bayes

Naïve Bayes

  • assumption
  • why we use it

Text classification

  • bag of words model

Gaussian Naïve Bayes

  • each feature is modeled as a Gaussian given the class
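The last bullet can be sketched directly: fit a mean and variance per feature per class, then classify by the product of Gaussian densities weighted by the class prior (toy data; a real implementation would work with log-densities for numerical stability):

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gnb(X, y):
    """Per-class prior, plus a (mean, variance) per feature for each class."""
    params = {}
    for c in set(y):
        rows = [x for x, yi in zip(X, y) if yi == c]
        stats = []
        for j in range(len(X[0])):
            vals = [r[j] for r in rows]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-9
            stats.append((mean, var))
        params[c] = (len(rows) / len(X), stats)
    return params

def predict_gnb(x, params):
    def score(c):
        prior, stats = params[c]
        s = prior
        for xj, (mean, var) in zip(x, stats):
            s *= gaussian_pdf(xj, mean, var)
        return s
    return max(params, key=score)

params = fit_gnb([[1.0], [1.2], [5.0], [5.2]], [0, 0, 1, 1])
print(predict_gnb([1.1], params))  # 0
print(predict_gnb([5.1], params))  # 1
```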