PATTERN RECOGNITION AND MACHINE LEARNING, CHAPTER 1: INTRODUCTION - PowerPoint PPT Presentation



SLIDE 1

PATTERN RECOGNITION

AND MACHINE LEARNING

CHAPTER 1: INTRODUCTION

SLIDE 2

Example

Handwritten Digit Recognition

SLIDE 3

Polynomial Curve Fitting

SLIDE 4

Sum-of-Squares Error Function
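The fit can be sketched numerically. Everything below (the noisy-sin data set, the order M, the variable names) is an illustrative choice, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data in the spirit of the chapter: noisy samples of sin(2*pi*x).
N = 10
x = np.linspace(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)

M = 3  # polynomial order
# Design matrix with columns x^0 .. x^M; least squares minimizes the
# sum-of-squares error E(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2.
Phi = np.vander(x, M + 1, increasing=True)
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)

E = 0.5 * np.sum((Phi @ w - t) ** 2)  # sum-of-squares error at the optimum
print(w, E)
```

At the least-squares optimum the residual is orthogonal to the columns of the design matrix, which is a quick sanity check on the solution.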

SLIDE 5

0th Order Polynomial

SLIDE 6

1st Order Polynomial

SLIDE 7

3rd Order Polynomial

SLIDE 8

9th Order Polynomial

SLIDE 9

Over-fitting

Root-Mean-Square (RMS) Error: E_RMS = √(2 E(w*) / N)
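A minimal sketch of the over-fitting comparison: train on a small set, evaluate RMS error on an independent test set. The data generation, the seed, and the set of orders are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    # Noisy samples of sin(2*pi*x), the chapter's running example.
    x = rng.uniform(0.0, 1.0, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)

x_train, t_train = make_data(10)
x_test, t_test = make_data(100)

def fit(x, t, M):
    # Least-squares polynomial fit of order M.
    Phi = np.vander(x, M + 1, increasing=True)
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w

def rms(w, x, t):
    # E_RMS = sqrt(2 E(w) / N), i.e. the root of the mean squared residual.
    y = np.vander(x, len(w), increasing=True) @ w
    return float(np.sqrt(np.mean((y - t) ** 2)))

train_rms, test_rms = {}, {}
for M in (0, 1, 3, 9):
    w = fit(x_train, t_train, M)
    train_rms[M] = rms(w, x_train, t_train)
    test_rms[M] = rms(w, x_test, t_test)
print(train_rms, test_rms)
```

Training error can only decrease as the model family grows, while test error eventually rises again: the signature of over-fitting.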

SLIDE 10

Polynomial Coefficients

SLIDE 11

Data Set Size: N = 15

9th Order Polynomial

SLIDE 12

Data Set Size: N = 100

9th Order Polynomial

SLIDE 13

Regularization

Penalize large coefficient values
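Penalizing large coefficients via an L2 term has a closed form; a sketch, with synthetic data and λ values chosen to echo the slides (all names here are illustrative):

```python
import numpy as np

def fit_ridge(x, t, M, lam):
    # Minimize E~(w) = 1/2 sum_n (y(x_n, w) - t_n)^2 + (lam/2) * ||w||^2.
    # Closed form: w = (Phi^T Phi + lam * I)^(-1) Phi^T t.
    Phi = np.vander(x, M + 1, increasing=True)
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(M + 1), Phi.T @ t)

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=10)

w_weak = fit_ridge(x, t, 9, np.exp(-18))  # ln(lambda) = -18
w_strong = fit_ridge(x, t, 9, np.exp(0))  # ln(lambda) = 0
print(np.linalg.norm(w_weak), np.linalg.norm(w_strong))
```

The coefficient norm shrinks monotonically as λ grows, which is exactly the effect the regularization slides display.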

SLIDE 14

Regularization: ln λ = -18

SLIDE 15

Regularization: ln λ = 0

SLIDE 16

Regularization: E_RMS vs. ln λ

SLIDE 17

Polynomial Coefficients

SLIDE 18

Probability Theory

Apples and Oranges

SLIDE 19

Probability Theory

Marginal Probability: p(X = x_i) = c_i / N
Conditional Probability: p(Y = y_j | X = x_i) = n_ij / c_i
Joint Probability: p(X = x_i, Y = y_j) = n_ij / N

SLIDE 20

Probability Theory

Sum Rule
Product Rule

SLIDE 21

The Rules of Probability

Sum Rule: p(X) = Σ_Y p(X, Y)
Product Rule: p(X, Y) = p(Y | X) p(X)
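Both rules can be verified on a small joint distribution table (the numbers below are illustrative, not from the slides):

```python
import numpy as np

# A small joint distribution p(X, Y) over 2 x 3 states.
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])

# Sum rule: p(X) = sum_Y p(X, Y)
p_x = p_xy.sum(axis=1)

# Product rule: p(X, Y) = p(Y | X) p(X)
p_y_given_x = p_xy / p_x[:, None]
reconstructed = p_y_given_x * p_x[:, None]
print(np.allclose(reconstructed, p_xy))  # True
```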

SLIDE 22

Bayes’ Theorem

posterior ∝ likelihood × prior

p(Y | X) = p(X | Y) p(Y) / p(X), where p(X) = Σ_Y p(X | Y) p(Y)
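The apples-and-oranges example from the earlier slide can be checked numerically, using the book's box contents (red box: 2 apples, 6 oranges; blue box: 3 apples, 1 orange; p(red) = 0.4, p(blue) = 0.6):

```python
# Prior over boxes.
p_box = {"red": 0.4, "blue": 0.6}
# Likelihood of drawing an orange from each box.
p_orange_given_box = {"red": 6 / 8, "blue": 1 / 4}

# Evidence via the sum rule: p(orange) = sum_box p(orange | box) p(box).
p_orange = sum(p_orange_given_box[b] * p_box[b] for b in p_box)

# Bayes' theorem: posterior = likelihood * prior / evidence.
p_red_given_orange = p_orange_given_box["red"] * p_box["red"] / p_orange
print(p_orange, p_red_given_orange)  # 0.45, 2/3
```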

SLIDE 23

Probability Densities

SLIDE 24

Transformed Densities

SLIDE 25

Expectations

Conditional Expectation (discrete): E_x[f | y] = Σ_x p(x | y) f(x)
Approximate Expectation (discrete and continuous): E[f] ≈ (1/N) Σ_n f(x_n)

SLIDE 26

Variances and Covariances

SLIDE 27

The Gaussian Distribution

SLIDE 28

Gaussian Mean and Variance

SLIDE 29

The Multivariate Gaussian

SLIDE 30

Gaussian Parameter Estimation

Likelihood function

SLIDE 31

Maximum (Log) Likelihood
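The maximum-likelihood estimators for a Gaussian are the sample mean and the (biased) sample variance; a quick numerical check (the true parameters and sample size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=2.0, scale=1.5, size=10_000)  # true mu = 2, sigma^2 = 2.25

# Maximum likelihood estimates for a Gaussian:
mu_ml = x.mean()                       # mu_ML = (1/N) sum_n x_n
sigma2_ml = ((x - mu_ml) ** 2).mean()  # sigma^2_ML = (1/N) sum_n (x_n - mu_ML)^2

# sigma^2_ML is biased: E[sigma^2_ML] = ((N-1)/N) * sigma^2, so rescale to correct.
N = x.size
sigma2_unbiased = sigma2_ml * N / (N - 1)
print(mu_ml, sigma2_ml, sigma2_unbiased)
```

The bias correction matters little at N = 10,000 but is the point of the "properties of μ_ML and σ²_ML" slide.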

SLIDE 32

Properties of μ_ML and σ²_ML

SLIDE 33

Curve Fitting Re-visited

SLIDE 34

Maximum Likelihood

Determine w_ML by minimizing the sum-of-squares error, E(w).

SLIDE 35

Predictive Distribution

SLIDE 36

MAP: A Step towards Bayes

Determine w_MAP by minimizing the regularized sum-of-squares error, Ẽ(w).

SLIDE 37

Bayesian Curve Fitting

SLIDE 38

Bayesian Predictive Distribution

SLIDE 39

Model Selection

Cross-Validation
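S-fold cross-validation for choosing the polynomial order can be sketched as follows; the data set, S = 4, and the candidate orders are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0.0, 1.0, 40)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=40)

def cv_rms(x, t, M, S=4):
    # S-fold cross-validation: hold out each fold once, train on the rest,
    # and average the held-out squared error across folds.
    idx = np.arange(len(x))
    errs = []
    for fold in np.array_split(idx, S):
        train = np.setdiff1d(idx, fold)
        Phi = np.vander(x[train], M + 1, increasing=True)
        w, *_ = np.linalg.lstsq(Phi, t[train], rcond=None)
        y = np.vander(x[fold], M + 1, increasing=True) @ w
        errs.append(np.mean((y - t[fold]) ** 2))
    return float(np.sqrt(np.mean(errs)))

scores = {M: cv_rms(x, t, M) for M in (0, 1, 3, 9)}
print(scores)
```

Selecting the M with the lowest cross-validated score trades model complexity against held-out error without touching a separate test set.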

SLIDE 40

Curse of Dimensionality

SLIDE 41

Curse of Dimensionality

Polynomial curve fitting, M = 3
Gaussian densities in higher dimensions

SLIDE 42

Decision Theory

Inference step: determine either p(x, t) or p(t | x).
Decision step: for given x, determine optimal t.

SLIDE 43

Minimum Misclassification Rate

SLIDE 44

Minimum Expected Loss

Example: classify medical images as ‘cancer’ or ‘normal’

Loss matrix L_kj (rows: true class, columns: decision):

                 decide cancer   decide normal
true cancer            0              1000
true normal            1                 0

SLIDE 45

Minimum Expected Loss

Regions R_j are chosen to minimize E[L] = Σ_k Σ_j ∫_{R_j} L_kj p(x, C_k) dx

SLIDE 46

Reject Option

SLIDE 47

Why Separate Inference and Decision?

  • Minimizing risk (loss matrix may change over time)
  • Reject option
  • Unbalanced class priors
  • Combining models
SLIDE 48

Decision Theory for Regression

Inference step: determine p(x, t).
Decision step: for given x, make optimal prediction, y(x), for t.
Loss function: E[L] = ∫∫ L(t, y(x)) p(x, t) dx dt

SLIDE 49

The Squared Loss Function
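Under squared loss the optimal prediction is the conditional mean; an empirical check at a single input point (the conditional distribution and the candidate offsets below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
# Sample t | x at a fixed x: here t = sin(2*pi*0.3) + Gaussian noise.
t = np.sin(2 * np.pi * 0.3) + rng.normal(scale=0.3, size=100_000)

cond_mean = t.mean()
# Expected squared loss for the conditional mean and two shifted candidates.
losses = {y: float(np.mean((y - t) ** 2)) for y in (cond_mean, cond_mean + 0.2, cond_mean - 0.2)}
print(losses)
```

The decomposition mean((y - t)^2) = var(t) + (y - mean(t))^2 makes the mean the exact minimizer among any candidates.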

SLIDE 50

Generative vs Discriminative

Generative approach: model the joint p(x, C_k), then use Bayes’ theorem to obtain p(C_k | x).
Discriminative approach: model p(C_k | x) directly.

SLIDE 51

Entropy

Important quantity in

  • coding theory
  • statistical physics
  • machine learning
SLIDE 52

Entropy

Coding theory: x discrete with 8 possible states; how many bits to transmit the state of x? If all states are equally likely: H(x) = -8 × (1/8) log₂(1/8) = 3 bits.
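Both the uniform case and the book's non-uniform 8-state example can be computed directly:

```python
import numpy as np

def entropy_bits(p):
    # H[x] = -sum_i p_i log2(p_i); terms with p_i = 0 contribute 0.
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-(p[nz] * np.log2(p[nz])).sum())

uniform = [1 / 8] * 8
print(entropy_bits(uniform))  # 3.0 bits

# Non-uniform example: probabilities (1/2, 1/4, 1/8, 1/16, 1/64 x 4)
# give a shorter average code: H = 2 bits.
skewed = [1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64]
print(entropy_bits(skewed))  # 2.0 bits
```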

SLIDE 53

Entropy

SLIDE 54

Entropy

In how many ways can N identical objects be allocated to M bins? W = N! / Π_i n_i!, and H = (1/N) ln W → -Σ_i p_i ln p_i by Stirling’s approximation. Entropy is maximized when p_i = 1/M for all i (the uniform distribution).

SLIDE 55

Entropy

SLIDE 56

Differential Entropy

Put bins of width Δ along the real line. Differential entropy is maximized (for fixed variance σ²) when p(x) is Gaussian, in which case H[x] = (1/2)(1 + ln(2πσ²)).

SLIDE 57

Conditional Entropy

SLIDE 58

The Kullback-Leibler Divergence
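A sketch of KL(p ‖ q) for discrete distributions, including its asymmetry (the example distributions are illustrative):

```python
import numpy as np

def kl_divergence(p, q):
    # KL(p || q) = sum_i p_i ln(p_i / q_i); nonnegative, and zero iff p == q.
    p, q = np.asarray(p, float), np.asarray(q, float)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / q[nz])).sum())

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q), kl_divergence(q, p))  # note the asymmetry
print(kl_divergence(p, p))  # 0.0
```

Because KL(p ‖ q) ≠ KL(q ‖ p) in general, it is a divergence rather than a distance.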

SLIDE 59

Mutual Information