Department of Computer Science CSCI 5622: Machine Learning, Chenhao Tan, Lecture 17: Midterm review



SLIDE 1

Department of Computer Science CSCI 5622: Machine Learning Chenhao Tan Lecture 17: Midterm review

SLIDE 2


  • Theory
    • PAC learning
    • Bias-variance tradeoff
    • Model selection
  • Methods
    • K-nearest neighbor
    • Naïve Bayes
    • Linear regression
    • Regularization
    • Logistic regression
    • Neural networks
    • SVM
    • Multi-class classification
    • Feature engineering
    • Boosting
SLIDE 3

Supervised learning

SLIDE 4

PAC Learnability

SLIDE 5

PAC Learnability

SLIDE 6

PAC Learnability

SLIDE 7

SLIDE 8

Generalization error bounds

  • Finite consistent hypothesis class
  • Finite inconsistent hypothesis class

SLIDE 9

Finite Consistent Hypothesis Class
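A hedged reconstruction of the bound this slide covers (constants may differ from the lecture's exact statement): for a finite hypothesis class $H$ and a learner that returns a hypothesis consistent with $m$ i.i.d. training examples, with probability at least $1 - \delta$,

```latex
\mathrm{err}(h) \;\le\; \frac{1}{m}\left(\ln |H| + \ln \frac{1}{\delta}\right)
```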

SLIDE 10

Finite Inconsistent Hypothesis Class
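For the inconsistent (agnostic) case, the usual Hoeffding-based bound (notation assumed): with probability at least $1 - \delta$, simultaneously for all $h \in H$,

```latex
\mathrm{err}(h) \;\le\; \widehat{\mathrm{err}}(h) + \sqrt{\frac{\ln |H| + \ln(2/\delta)}{2m}}
```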

SLIDE 11

Bias-variance tradeoff
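The decomposition behind this topic, in its standard form for squared error with $y = f(x) + \varepsilon$, $\mathrm{Var}(\varepsilon) = \sigma^2$:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{noise}}
```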

SLIDE 12

SLIDE 13

SLIDE 14

SLIDE 15

SLIDE 16

SLIDE 17
  • Theory
    • PAC learning
    • Bias-variance tradeoff
    • Model selection
  • Methods
    • K-nearest neighbor
    • Naïve Bayes
    • Linear regression
    • Regularization
    • Logistic regression
    • Neural networks
    • SVM
    • Multi-class classification
    • Feature engineering
    • Boosting

SLIDE 18
  • Methods
    • Model
    • Algorithm

SLIDE 19

K-nearest neighbors
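A minimal sketch of the method (function and variable names are mine, not from the slides): classify a query point by majority vote among its k nearest training examples under Euclidean distance.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among the k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to each point
    nearest = np.argsort(dists)[:k]               # indices of the k closest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data: two points per class
X = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.05, 0.0]), k=3))  # near class 0's cluster
```

No training step is needed; all cost is paid at prediction time, which is the usual exam-question contrast with parametric models.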

SLIDE 20

K-nearest neighbors

SLIDE 21

SLIDE 22

SLIDE 23


For text classification with Laplace smoothing:
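The estimate referred to here is presumably the standard add-one estimate of the class-conditional word probability, over a vocabulary $V$:

```latex
\hat{P}(w \mid c) = \frac{\mathrm{count}(w, c) + 1}{\sum_{w' \in V} \mathrm{count}(w', c) + |V|}
```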

SLIDE 24

SLIDE 25

Linear regression

  • Data are continuous inputs and outputs

SLIDE 26

Objective function (model)

The objective function is called the residual sum of squares:
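In the usual notation (weights $w$, inputs $x_i$, responses $y_i$):

```latex
\mathrm{RSS}(w) = \sum_{i=1}^{n} \big(y_i - w^\top x_i\big)^2
```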

SLIDE 27

Probabilistic interpretation

A discriminative model that assumes the response is Gaussian with mean $w^\top x$; maximizing the likelihood of the data is equivalent to minimizing the residual sum of squares.

SLIDE 28

Regularization

SLIDE 29

Prior Distribution

SLIDE 30

Prior Distribution

  • Lasso's prior is peaked at 0: we expect many parameters to be exactly zero
  • Ridge's prior is flatter and fatter around 0: we expect many coefficients to be small but nonzero
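The correspondence is the standard MAP view (notation assumed): lasso places a Laplace prior and ridge a Gaussian prior on each weight, so the negative log-prior contributes an L1 or L2 penalty respectively:

```latex
p(w_j) \propto e^{-\lambda |w_j|} \;\Rightarrow\; \lambda \sum_j |w_j| \quad (\text{lasso}), \qquad
p(w_j) \propto e^{-\lambda w_j^2} \;\Rightarrow\; \lambda \sum_j w_j^2 \quad (\text{ridge})
```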

SLIDE 31

SLIDE 32

SLIDE 33

Logistic regression
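The standard model and negative log-likelihood for binary labels $y_i \in \{0, 1\}$ (notation assumed):

```latex
P(y = 1 \mid x) = \sigma(w^\top x) = \frac{1}{1 + e^{-w^\top x}}, \qquad
\mathcal{L}(w) = -\sum_{i} \Big[ y_i \log \sigma(w^\top x_i) + (1 - y_i) \log\big(1 - \sigma(w^\top x_i)\big) \Big]
```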

SLIDE 34

Neural networks

SLIDE 35

Gradient descent

SLIDE 36

Stochastic gradient descent
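A sketch contrasting the two update rules on least squares; step sizes, names, and data here are illustrative, not from the slides. Batch gradient descent uses the full-data gradient at every step, while SGD updates on one randomly chosen example at a time.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=200)  # nearly noiseless linear data

def batch_gd(X, y, lr=0.1, steps=500):
    """Full-batch gradient descent on mean squared error."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of the MSE
        w -= lr * grad
    return w

def sgd(X, y, lr=0.01, epochs=50):
    """Stochastic gradient descent: one random example per update."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            grad = 2 * (X[i] @ w - y[i]) * X[i]  # single-example gradient
            w -= lr * grad
    return w

print(batch_gd(X, y))  # both runs should approach true_w
print(sgd(X, y))
```

SGD's iterates are noisier but each step is far cheaper, which is the usual trade-off the exam asks about.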

SLIDE 37

Forward algorithm

SLIDE 38

Backpropagation
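A minimal two-layer worked example of the backward pass (architecture and names assumed for illustration): sigmoid hidden layer, linear output, squared-error loss, gradients obtained by applying the chain rule layer by layer.

```python
import numpy as np

def forward(x, W1, W2):
    """Forward pass: linear -> sigmoid -> linear output."""
    h = 1.0 / (1.0 + np.exp(-(W1 @ x)))   # hidden activations
    return h, W2 @ h

def backprop(x, y, W1, W2):
    """Gradients of L = 0.5 * (out - y)^2 via the chain rule."""
    h, out = forward(x, W1, W2)
    d_out = out - y                 # dL/d(out)
    dW2 = np.outer(d_out, h)        # output-layer weight gradient
    d_h = W2.T @ d_out              # propagate back through W2
    d_z = d_h * h * (1.0 - h)       # back through the sigmoid
    dW1 = np.outer(d_z, x)          # hidden-layer weight gradient
    return dW1, dW2

rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(1, 4))
x, y = rng.normal(size=3), np.array([0.7])
dW1, dW2 = backprop(x, y, W1, W2)
```

A finite-difference check against these gradients is the standard way to convince yourself the chain-rule bookkeeping is right.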

SLIDE 39

Neural network techniques

  • Momentum
  • Dropout
  • Batch normalization
  • Weight initialization

SLIDE 40

Hard-margin SVM

SLIDE 41

Soft-margin SVM
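The two primal objectives in their standard forms (notation assumed; the soft-margin version adds slack variables $\xi_i$ with penalty $C$):

```latex
\text{hard margin:}\quad \min_{w,\,b}\ \tfrac{1}{2}\|w\|^2
\quad \text{s.t. } y_i(w^\top x_i + b) \ge 1 \ \forall i
\\[6pt]
\text{soft margin:}\quad \min_{w,\,b,\,\xi}\ \tfrac{1}{2}\|w\|^2 + C \sum_i \xi_i
\quad \text{s.t. } y_i(w^\top x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0 \ \forall i
```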

SLIDE 42

KKT conditions
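For the hard-margin problem with multipliers $\alpha_i$, the conditions usually listed are (a standard statement; the lecture's form may differ):

```latex
\alpha_i \ge 0, \qquad
y_i(w^\top x_i + b) - 1 \ge 0, \qquad
\alpha_i \big[ y_i(w^\top x_i + b) - 1 \big] = 0, \qquad
w = \sum_i \alpha_i y_i x_i, \qquad
\sum_i \alpha_i y_i = 0
```

The complementary-slackness condition is what makes only the support vectors (points on the margin) have $\alpha_i > 0$.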

SLIDE 43

KKT conditions

SLIDE 44

SMO algorithm

SLIDE 45

Kernels
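One widely used kernel as a concrete example (the slides may cover others): the RBF kernel $K(x, z) = \exp(-\gamma \|x - z\|^2)$, computed here as a full Gram matrix.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """RBF (Gaussian) kernel matrix: K[i, j] = exp(-gamma * ||x_i - z_j||^2)."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Z**2, axis=1)[None, :]
          - 2 * X @ Z.T)                       # pairwise squared distances
    return np.exp(-gamma * np.maximum(sq, 0))  # clip tiny negatives from rounding

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = rbf_kernel(X, X)
print(K)
```

The resulting matrix is symmetric with ones on the diagonal, the basic sanity checks for any valid kernel evaluated on itself.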

SLIDE 46

SLIDE 47

Feature engineering

SLIDE 48

Multi-class Classification

  • Reduction
  • One-against-all
  • All-pairs

SLIDE 49

One-against-all

  • Break a k-class problem into k binary problems and solve them separately
  • Combine predictions: evaluate all hypotheses and take the one with the highest confidence

SLIDE 50

All-pairs

  • Break a k-class problem into k(k-1)/2 binary problems (one per pair of classes) and solve them separately
  • Combine predictions: evaluate all hypotheses and take the class with the highest summed confidence

SLIDE 51

SLIDE 52

Ensemble methods

  • Bagging
  • Train classifiers on subsets of data
  • Predict based on majority vote
  • Stacking
  • Take multiple classifiers' outputs as inputs and train another classifier to make the final prediction
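A toy sketch of bagging as described above (base learner and all names are mine, for illustration): train each model on a bootstrap resample, then combine by majority vote.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_stump(X, y):
    """Tiny base learner: threshold the single feature at the class midpoint."""
    t = (X[y == 0].mean() + X[y == 1].mean()) / 2
    return lambda x: (x > t).astype(int)

def bagging_fit(X, y, n_models=25):
    """Train each base learner on a bootstrap sample (drawn with replacement)."""
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(y), size=len(y))
        models.append(fit_stump(X[idx], y[idx]))
    return models

def bagging_predict(models, x):
    """Majority vote across all base learners."""
    votes = np.array([m(x) for m in models])
    return (votes.mean(axis=0) > 0.5).astype(int)

# 1-D toy data: class 0 clustered near 0, class 1 near 1
X = np.concatenate([rng.normal(0, 0.2, 50), rng.normal(1, 0.2, 50)])
y = np.concatenate([np.zeros(50, dtype=int), np.ones(50, dtype=int)])
models = bagging_fit(X, y)
print(bagging_predict(models, np.array([-0.1, 1.2])))  # vote per query point
```

Averaging over bootstrap-trained models mainly reduces variance, which is why bagging helps unstable base learners most.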

SLIDE 53

Adaboost
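The standard AdaBoost update rules, in the usual notation (example weights $D_t$, weak learner $h_t$, weighted error $\varepsilon_t$, normalizer $Z_t$); the slides' notation may differ:

```latex
\varepsilon_t = \sum_i D_t(i)\,\mathbb{1}[h_t(x_i) \ne y_i], \qquad
\alpha_t = \tfrac{1}{2} \ln \frac{1 - \varepsilon_t}{\varepsilon_t}, \qquad
D_{t+1}(i) = \frac{D_t(i)\, e^{-\alpha_t y_i h_t(x_i)}}{Z_t}, \qquad
H(x) = \mathrm{sign}\Big(\sum_t \alpha_t h_t(x)\Big)
```

Misclassified examples get up-weighted for the next round, so each weak learner focuses on the mistakes of its predecessors.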

SLIDE 54

Adaboost

SLIDE 55

Good luck!
