
SLIDE 1

CSE217 INTRODUCTION TO DATA SCIENCE

LECTURE 6: LEARNING PRINCIPLES

Spring 2019, Marion Neumann

SLIDE 2

RECAP: MACHINE LEARNING

  • Workflow


SLIDE 3

NOISE

  • noisy samples from true function

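The slide's figure is not reproduced in this transcript; as a stand-in, here is a minimal NumPy sketch of what "noisy samples from a true function" means (the true function sin(2πx) and the noise level 0.2 are illustrative assumptions, not taken from the slide):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """True (unknown) function; sin is an illustrative stand-in."""
    return np.sin(2 * np.pi * x)

x = rng.uniform(0, 1, size=20)           # where we observe the function
y = f(x) + rng.normal(0, 0.2, size=20)   # noisy samples: y = f(x) + noise
```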

SLIDE 4

WHY IS NOISE A PROBLEM?

  • small random sample from the noisy data


SLIDE 5

WHY IS NOISE A PROBLEM?

  • best model for this (training) data


SLIDE 6

WHY IS NOISE A PROBLEM?

→ fitting the noise instead of the true function


SLIDE 7

REGRESSION AND MODEL COMPLEXITY


PDSH p393 Linear Regression

Error on training set: linear model >> quadratic >> 6th-order polynomial

← error is zero! Is the model with zero (training) error the best?
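To make the comparison concrete, a hedged sketch (the data-generating function is an assumption; the slide's actual example is from PDSH): fitting polynomials of degree 1, 2, and 6 to seven noisy points drives the training error of the 6th-order fit to (numerically) zero, since seven points determine a degree-6 polynomial exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, size=7)                      # a small noisy sample
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 7)  # assumed true function + noise

for degree in (1, 2, 6):                 # linear, quadratic, 6th-order polynomial
    coeffs = np.polyfit(x, y, degree)    # least-squares polynomial fit
    y_hat = np.polyval(coeffs, x)
    print(f"degree {degree}: training RMSE = {np.sqrt(np.mean((y_hat - y)**2)):.4f}")
# degree 6 interpolates all 7 points, so its training error is ~0
```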

SLIDE 8

EVALUATION FOR REGRESSION

  • Training Error vs. Test Error
  • Error measures:
      • RMSE: root mean squared error
      • MAE: mean absolute error


$$\mathrm{RMSE}(\hat{y}, y) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2}$$

$$\mathrm{MAE}(\hat{y}, y) = \frac{1}{n}\sum_{i=1}^{n}|\hat{y}_i - y_i|$$

where $\hat{y} = f(x_{\mathrm{test}})$ are the predictions for the test data.
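A direct translation of the two error measures into NumPy (the sample values below are made up for illustration):

```python
import numpy as np

def rmse(y_hat, y):
    # root mean squared error: sqrt of the mean squared residual
    return np.sqrt(np.mean((y_hat - y) ** 2))

def mae(y_hat, y):
    # mean absolute error: mean absolute residual
    return np.mean(np.abs(y_hat - y))

y     = np.array([3.0, -0.5, 2.0, 7.0])   # true test labels (made-up values)
y_hat = np.array([2.5,  0.0, 2.0, 8.0])   # model predictions on the test set
print(rmse(y_hat, y), mae(y_hat, y))      # ~0.6124, 0.5
```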

SLIDE 9

OVERFITTING


[Hand-drawn sketch: a linear fit (underfitting) vs. a high-order polynomial fit (overfitting) on the noisy samples]
SLIDE 10

EVALUATION FOR CLASSIFICATION

  • Quality Measures:
      • error rate (or misclassification rate) = (# misclassified test points) / (# test points)
      • average accuracy (= 1 − error rate)
  • Noise in Classification
      • where do labels come from? → noisy labels


We again have training and test error (accuracy), as in regression.
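The two quality measures in code (labels are made up for illustration):

```python
import numpy as np

y_true = np.array([+1, +1, -1, -1, +1])   # true test labels (made up)
y_pred = np.array([+1, -1, -1, +1, +1])   # model predictions (made up)

error_rate = np.mean(y_pred != y_true)    # misclassified test points / all test points
accuracy   = 1 - error_rate               # average accuracy
print(error_rate, accuracy)               # 0.4 0.6
```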

SLIDE 11

EVALUATION FOR CLASSIFICATION


  • Confusion matrix (prediction vs. true label):

                       true label +1            true label −1
    prediction +1      true positive (TP)       false positive (FP)
    prediction −1      false negative (FN)      true negative (TN)

Can you define accuracy using these measures?

[Hand-written rate definitions: TPR = TP / (TP + FN), FNR = FN / (TP + FN), FPR = FP / (FP + TN), TNR = TN / (FP + TN)]
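Counting the four confusion-matrix entries directly answers the slide's question: accuracy = (TP + TN) / (TP + FP + FN + TN). A small sketch with made-up labels:

```python
import numpy as np

y_true = np.array([+1, +1, +1, -1, -1, -1, -1, +1])  # made-up true labels
y_pred = np.array([+1, -1, +1, -1, +1, -1, -1, +1])  # made-up predictions

tp = np.sum((y_pred == +1) & (y_true == +1))  # true positives
fp = np.sum((y_pred == +1) & (y_true == -1))  # false positives
fn = np.sum((y_pred == -1) & (y_true == +1))  # false negatives
tn = np.sum((y_pred == -1) & (y_true == -1))  # true negatives

accuracy = (tp + tn) / (tp + fp + fn + tn)
tpr = tp / (tp + fn)       # true positive rate
fpr = fp / (fp + tn)       # false positive rate
print(accuracy, tpr, fpr)  # 0.75 0.75 0.25
```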

SLIDE 12

CLASSIFICATION AND MODEL COMPLEXITY



SLIDE 13

CLASSIFICATION AND MODEL COMPLEXITY


[Hand-drawn figure: compare training and test errors for all three models]
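The slide's three models are not identified in this transcript; as an illustrative stand-in, k-nearest neighbors at three values of k (smaller k = more complex decision boundary) shows the same train-vs-test comparison:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for k in (1, 15, 100):   # most to least complex model
    model = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    print(f"k={k:3d}  train acc={model.score(X_tr, y_tr):.2f}  "
          f"test acc={model.score(X_te, y_te):.2f}")
# k=1 fits the training set perfectly but generalizes worse (overfitting)
```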

SLIDE 14

OVERFITTING


Draw this yourself


SLIDE 15

COMBATING OVERFITTING

Several Strategies:

1) prefer simpler models over more complicated ones
2) use a validation set for model selection (sketched in code below)
3) add a regularization term to your optimization problem during training


[Hand-drawn diagram: candidate models A, B, C each produce predictions that are compared against the ground truth on a validation set; the resulting performance evaluation drives model selection. Regularization = penalize large weights in the model.]
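A minimal sketch of strategies 2) and 3) together, assuming high-degree polynomial features and ridge regression (the data, degree, and alpha grid are illustrative, not from the slide): each regularization strength is scored on a held-out validation set, and the best one is selected.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, size=60)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=60)
X = PolynomialFeatures(degree=9).fit_transform(x.reshape(-1, 1))

# hold out a validation set for model selection
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

best_alpha, best_score = None, -np.inf
for alpha in (0.001, 0.01, 0.1, 1.0):            # regularization strengths
    model = Ridge(alpha=alpha).fit(X_tr, y_tr)   # penalizes large weights
    score = model.score(X_val, y_val)            # validation performance
    if score > best_score:
        best_alpha, best_score = alpha, score
print("selected alpha:", best_alpha)
```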

SLIDE 16

HOW MUCH DATA DO WE NEED?

  • Learning curve

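scikit-learn can compute a learning curve directly; a sketch with a synthetic dataset (the dataset and classifier are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_moons(n_samples=500, noise=0.3, random_state=0)

sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={n:3d}  train acc={tr:.2f}  validation acc={va:.2f}")
# as training size grows, the two curves typically converge
```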

SLIDE 17

DATA ≠ DATA

  • Two kinds of data: population vs. sample


A population is the entire set of objects or events under study. A population can be hypothetical ("all students") or all students in this class.

A sample is a (representative) subset of the objects or events under study. → needed because it's impossible or intractable to obtain or use population data.

What are problems with sample data?
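One such problem, illustrated with a hypothetical population of 100,000 values: different small samples yield different estimates of the population mean (sampling variability), and a non-representative sample would be systematically off.

```python
import numpy as np

rng = np.random.default_rng(3)
population = rng.normal(170, 10, size=100_000)  # hypothetical population (e.g., heights)

print(f"population mean = {population.mean():.2f}")
for _ in range(3):
    sample = rng.choice(population, size=30, replace=False)  # a small random sample
    print(f"sample mean     = {sample.mean():.2f}")
```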

SLIDE 18
SAMPLING BIAS

  • What if our sample is biased?
  • Think about real-world ML applications where this might have a (negative) impact!

SLIDE 19


SUMMARY & READING

  • Avoid overfitting!
  • Model selection using a validation set can prevent overfitting.
  • Learning curve → training data size matters and influences model selection.
  • Model evaluation for classification is more than just looking at the error.

Reading:
  • DSFS: Ch11 (p142-147)
  • PDSH: Ch5 (p357, 370-373)
  • PDSH: Ch5 (p393-398)