PATTERN RECOGNITION AND MACHINE LEARNING, CHAPTER 1: INTRODUCTION (PowerPoint PPT Presentation)



SLIDE 1

PATTERN RECOGNITION

AND MACHINE LEARNING

CHAPTER 1: INTRODUCTION

SLIDE 2

Pattern Recognition

Pattern: any regularity in data X. Pattern recognition: the discovery of regularities in data through computer algorithms, and the use of those regularities to take actions (such as classification).

[Diagram: a pattern-recognition system maps an input X to an output y(X)]

SLIDE 3

Example

Handwritten Digit Recognition

SLIDE 4

Some Terminologies

Supervised learning: inputs with their corresponding outputs are known.

  • Classification: predicting the output as one of a finite number of discrete categories after supervised learning.
  • Regression: predicting the output as a continuous variable after supervised learning.

Unsupervised learning (e.g. density estimation): only inputs are given.

  • Clustering the data into groups
SLIDE 5

Some Terminologies

Training set: a given set of sample input data used to tune the model parameters.

Target vector: represents the desired output for a given input.

Training phase: determining the precise form of y(x) based on the training data.

Generalization: the ability to correctly predict new data.

Pre-processing: reducing the dimensionality of X.

SLIDE 6

Polynomial Curve Fitting
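The model equation on this slide was an image in the original deck; its standard form is the polynomial

```latex
y(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \cdots + w_M x^M = \sum_{j=0}^{M} w_j x^j
```

Although y is a nonlinear function of x, it is linear in the coefficients w, which is why the fit has a closed-form solution.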

SLIDE 7

Sum-of-Squares Error Function
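The error function on this slide (image-only in the original) is, in its standard form,

```latex
E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \bigl\{ y(x_n, \mathbf{w}) - t_n \bigr\}^2
```

It measures the misfit between the polynomial y(x, w) and the N training targets t_n; because E is quadratic in w, its minimizer w* is unique and computable in closed form.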

SLIDE 8

0th Order Polynomial

SLIDE 9

1st Order Polynomial

SLIDE 10

3rd Order Polynomial

SLIDE 11

9th Order Polynomial

SLIDE 12

Over-fitting

Root-Mean-Square (RMS) Error:
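The RMS error on this slide is E_RMS = sqrt(2 E(w*) / N), which puts the error on the scale of the target variable and lets data sets of different size be compared. A minimal sketch of the over-fitting experiment, with assumed synthetic data (sin(2πx) plus Gaussian noise, in the spirit of the chapter's running example; not the book's exact data):

```python
import numpy as np

rng = np.random.default_rng(0)

# N = 10 points from sin(2*pi*x) plus Gaussian noise (assumed data).
N = 10
x = np.linspace(0, 1, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)

def rms_error(w, x, t):
    """Root-mean-square error E_RMS = sqrt(2 E(w*) / N)."""
    y = np.polyval(w, x)
    e = 0.5 * np.sum((y - t) ** 2)      # sum-of-squares error E(w)
    return np.sqrt(2 * e / len(x))

train_rms = {}
for M in (0, 1, 3, 9):
    w = np.polyfit(x, t, deg=M)         # least-squares polynomial fit
    train_rms[M] = rms_error(w, x, t)

# Training error can only fall as M grows; with M = 9 the ten points are
# interpolated almost exactly -- the over-fitting regime.
```

The point of the slide is that while training RMS keeps falling with M, the RMS on held-out test data rises again for large M.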

SLIDE 13

Polynomial Coefficients

SLIDE 14

Data Set Size:

9th Order Polynomial

SLIDE 15

Data Set Size:

9th Order Polynomial

SLIDE 16

Regularization

Penalize large coefficient values
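The regularized error on this slide is Ẽ(w) = ½ Σ_n {y(x_n, w) − t_n}² + (λ/2) ‖w‖², whose minimizer also has a closed form. A sketch with assumed synthetic data, showing that the penalty shrinks the coefficient vector of a 9th-order fit:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 10, 9
x = np.linspace(0, 1, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)  # assumed data

Phi = np.vander(x, M + 1, increasing=True)  # design matrix, phi_j(x) = x**j

def ridge_fit(Phi, t, lam):
    """Minimize 0.5*||Phi w - t||^2 + 0.5*lam*||w||^2 in closed form:
    w = (lam I + Phi^T Phi)^(-1) Phi^T t."""
    A = lam * np.eye(Phi.shape[1]) + Phi.T @ Phi
    return np.linalg.solve(A, Phi.T @ t)

w_weak = ridge_fit(Phi, t, lam=1e-6)        # nearly no penalty: wild weights
w_reg = ridge_fit(Phi, t, lam=np.exp(-3))   # moderate penalty: tamed weights
```

Larger λ trades training fit for smaller coefficients, which is exactly the over-fitting control the next slides illustrate.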

SLIDE 17

Regularization:

SLIDE 18

Regularization:

SLIDE 19

Regularization: E_RMS vs. ln λ

SLIDE 20

Polynomial Coefficients

SLIDE 21

Probability Theory

Apples and Oranges

SLIDE 22

Probability Theory

Marginal probability

Conditional probability

Joint probability
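These three quantities can be illustrated with a small discrete joint distribution (the numbers below are made up for illustration):

```python
import numpy as np

# Joint distribution p(X, Y) over X in {0, 1, 2}, Y in {0, 1}:
# rows index X, columns index Y.
p_xy = np.array([[0.10, 0.20],
                 [0.25, 0.05],
                 [0.15, 0.25]])

p_x = p_xy.sum(axis=1)                 # sum rule: p(X) = sum_Y p(X, Y)
p_y = p_xy.sum(axis=0)                 # marginal of Y
p_y_given_x = p_xy / p_x[:, None]      # conditional: p(Y|X) = p(X, Y) / p(X)

# Product rule: p(X, Y) = p(Y|X) p(X) recovers the joint table.
reconstructed = p_y_given_x * p_x[:, None]
```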

SLIDE 23

Probability Theory

Sum rule

Product rule

SLIDE 24

The Rules of Probability

Sum rule: p(X) = Σ_Y p(X, Y)

Product rule: p(X, Y) = p(Y|X) p(X)

SLIDE 25

Bayes’ Theorem

posterior ∝ likelihood × prior
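In the apples-and-oranges setting of the earlier slide, Bayes' theorem inverts p(fruit | box) into p(box | fruit). A sketch using the box contents of the book's running example (red box: 2 apples, 6 oranges; blue box: 3 apples, 1 orange; priors p(red) = 0.4, p(blue) = 0.6):

```python
# Prior over boxes and likelihood of drawing an orange from each box.
p_box = {"red": 0.4, "blue": 0.6}
p_orange_given_box = {"red": 6 / 8, "blue": 1 / 4}

# Evidence via the sum rule: p(orange) = sum_box p(orange|box) p(box)
p_orange = sum(p_orange_given_box[b] * p_box[b] for b in p_box)

# Posterior via Bayes' theorem: p(box|orange) = p(orange|box) p(box) / p(orange)
posterior = {b: p_orange_given_box[b] * p_box[b] / p_orange for b in p_box}
```

Observing an orange raises the probability that the red box was picked from the prior 0.4 to the posterior 2/3.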

SLIDE 26

Probability Densities

SLIDE 27

Transformed Densities

SLIDE 28

Expectations

Conditional expectation (discrete)

Approximate expectation (discrete and continuous)
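The slide's (image-only) formulas are E[f] = Σ_x p(x) f(x) for a discrete distribution and the sample approximation E[f] ≈ (1/N) Σ_n f(x_n). A sketch with a made-up three-state distribution:

```python
import numpy as np

rng = np.random.default_rng(42)

# Exact expectation of f(x) = x^2 under a discrete p(x).
x_vals = np.array([0.0, 1.0, 2.0])
p = np.array([0.2, 0.5, 0.3])
exact = np.sum(p * x_vals ** 2)              # sum_x p(x) f(x) = 1.7

# Approximate expectation from samples: (1/N) sum_n f(x_n).
samples = rng.choice(x_vals, size=200_000, p=p)
approx = np.mean(samples ** 2)
```

As N grows the sample average converges to the exact expectation, which is what makes Monte Carlo estimates work for continuous densities too.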

SLIDE 29

Variances and Covariances

SLIDE 30

The Gaussian Distribution
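The density on this slide was an image; its standard univariate form is

```latex
\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(x-\mu)^2}{2\sigma^2} \right\}
```

It is normalized, and is fully specified by its mean μ and variance σ².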

SLIDE 31

Gaussian Mean and Variance

SLIDE 32

The Multivariate Gaussian

SLIDE 33

Gaussian Parameter Estimation

Likelihood function

SLIDE 34

Maximum (Log) Likelihood
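Maximizing the Gaussian log likelihood over i.i.d. data gives μ_ML = (1/N) Σ_n x_n and σ²_ML = (1/N) Σ_n (x_n − μ_ML)². A sketch on assumed synthetic data, also showing the bias of the ML variance:

```python
import numpy as np

rng = np.random.default_rng(7)
data = rng.normal(loc=2.0, scale=1.5, size=1_000)  # assumed sample

N = len(data)
mu_ml = data.sum() / N                          # mu_ML: sample mean
var_ml = ((data - mu_ml) ** 2).sum() / N        # sigma^2_ML: divides by N

# The ML variance is biased: E[var_ml] = ((N - 1) / N) * sigma^2, which is
# why the unbiased estimator divides by N - 1 instead.
var_unbiased = ((data - mu_ml) ** 2).sum() / (N - 1)
```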

SLIDE 35

Properties of μ_ML and σ²_ML

SLIDE 36

Curve Fitting Re-visited

SLIDE 37

Maximum Likelihood

Determine w_ML by minimizing the sum-of-squares error, E(w).

SLIDE 38

Predictive Distribution

SLIDE 39

MAP: A Step towards Bayes

Determine w_MAP by minimizing the regularized sum-of-squares error, Ẽ(w).

SLIDE 40

Bayesian Curve Fitting

SLIDE 41

Bayesian Predictive Distribution

SLIDE 42

Model Selection

Cross-Validation
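S-fold cross-validation holds out each fold in turn, trains on the rest, and averages the validation errors; the model (here, the polynomial order M) with the lowest average error is selected. A sketch on assumed synthetic data with 5 folds:

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed regression data: noisy samples of sin(2*pi*x).
N = 60
x = rng.uniform(0, 1, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.25, size=N)

def cv_error(x, t, M, k=5):
    """Mean held-out RMS of an M-th order polynomial over k folds."""
    idx = np.arange(len(x))
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)          # all points not in this fold
        w = np.polyfit(x[train], t[train], deg=M)
        resid = np.polyval(w, x[fold]) - t[fold]
        errs.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(errs))

scores = {M: cv_error(x, t, M) for M in range(10)}
best_M = min(scores, key=scores.get)  # order with lowest validation error
```

Unlike training error, the cross-validated error penalizes both under-fitting (small M) and over-fitting (large M).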

SLIDE 43

Curse of Dimensionality

SLIDE 44

Curse of Dimensionality

Polynomial curve fitting, M = 3

Gaussian densities in higher dimensions

SLIDE 45

Decision Theory

Inference step: determine either p(t|x) or the joint p(x, t).

Decision step: for a given x, determine the optimal t.

SLIDE 46

Minimum Misclassification Rate

SLIDE 47

Minimum Expected Loss

Example: classify medical images as ‘cancer’ or ‘normal’

Loss matrix (rows: truth, columns: decision)
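Minimizing expected loss means picking the decision j that minimizes Σ_k L_kj p(C_k|x). A sketch with an assumed loss matrix in the spirit of this example (the specific numbers below are illustrative, not from the slide): deciding "normal" when the truth is cancer is penalized 1000 times more than the reverse error.

```python
import numpy as np

# Loss matrix L[k, j]: true class k, decision j; classes = (cancer, normal).
# Illustrative, assumed numbers: missing a cancer costs 1000, a false
# alarm costs 1, correct decisions cost 0.
L = np.array([[0, 1000],
              [1, 0]])

def decide(posterior):
    """Pick the decision j minimizing expected loss sum_k L[k, j] p(C_k|x)."""
    expected_loss = posterior @ L
    return int(np.argmin(expected_loss))

# Even a 5% posterior probability of cancer makes "cancer" (index 0) the
# optimal decision under this asymmetric loss; only a tiny posterior
# probability flips the decision to "normal" (index 1).
d_low = decide(np.array([0.05, 0.95]))
d_tiny = decide(np.array([0.0005, 0.9995]))
```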

SLIDE 48

Minimum Expected Loss

Decision regions are chosen to minimize the expected loss.

SLIDE 49

Reject Option

SLIDE 50

Why Separate Inference and Decision?

  • Minimizing risk (loss matrix may change over time)
  • Reject option
  • Unbalanced class priors
  • Combining models
SLIDE 51

Decision Theory for Regression

Inference step: determine p(t|x).

Decision step: for a given x, make an optimal prediction, y(x), for t.

Loss function: L(t, y(x)).

SLIDE 52

The Squared Loss Function
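The (image-only) equations on this slide are the expected squared loss and its minimizer, the conditional mean; in standard form:

```latex
\mathbb{E}[L] = \iint \bigl\{ y(\mathbf{x}) - t \bigr\}^2 \, p(\mathbf{x}, t) \, d\mathbf{x} \, dt,
\qquad
y^{*}(\mathbf{x}) = \mathbb{E}_t[t \mid \mathbf{x}] = \int t \, p(t \mid \mathbf{x}) \, dt
```

Under squared loss, the optimal regression function is therefore the mean of the conditional distribution p(t|x).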

SLIDE 53

Generative vs Discriminative

Generative approach: model p(x|C_k) and p(C_k) (or the joint p(x, C_k)), then use Bayes' theorem to obtain p(C_k|x).

Discriminative approach: model p(C_k|x) directly.

SLIDE 54

Entropy

Important quantity in

  • coding theory
  • statistical physics
  • machine learning
SLIDE 55

Entropy

Coding theory: x is discrete with 8 possible states; how many bits are needed to transmit the state of x? If all states are equally likely, 3 bits.
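The entropy H[x] = −Σ_i p(x_i) log₂ p(x_i) gives the answer directly. A sketch of both cases on this slide, using the book's non-uniform 8-state example (probabilities 1/2, 1/4, 1/8, 1/16 and four states at 1/64):

```python
from math import log2

def entropy_bits(p):
    """H[x] = -sum_i p_i * log2(p_i) in bits; zero-probability terms drop out."""
    return -sum(pi * log2(pi) for pi in p if pi > 0)

# Eight equally likely states: H = 3 bits, matching a plain 3-bit code.
h_uniform = entropy_bits([1 / 8] * 8)

# Non-uniform probabilities (1/2, 1/4, 1/8, 1/16, 1/64 x 4): H = 2 bits,
# achievable with a variable-length code that gives short codewords to
# the probable states.
h_skewed = entropy_bits([1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64])
```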

SLIDE 56

Entropy

SLIDE 57

Entropy

In how many ways can N identical objects be allocated among M bins? Entropy is maximized when the objects are spread evenly, i.e. p_i = 1/M for all i.

SLIDE 58

Entropy

SLIDE 59

Differential Entropy

Put bins of width Δ along the real line. Differential entropy is maximized (for fixed variance σ²) when p(x) is Gaussian, in which case H[x] = ½{1 + ln(2πσ²)}.

SLIDE 60

Conditional Entropy

SLIDE 61

The Kullback-Leibler Divergence
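For discrete distributions, KL(p‖q) = Σ_i p_i ln(p_i/q_i). It is non-negative, zero only when p = q, and not symmetric in its arguments. A sketch with made-up distributions:

```python
from math import log

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * ln(p_i / q_i), in nats.
    Requires q_i > 0 wherever p_i > 0; terms with p_i = 0 contribute 0."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

d_pq = kl_divergence(p, q)   # positive: p and q differ
d_qp = kl_divergence(q, p)   # also positive, but a different number
d_pp = kl_divergence(p, p)   # exactly zero: a distribution vs. itself
```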

SLIDE 62

Mutual Information
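The (image-only) definition on this slide is the KL divergence between the joint distribution and the product of its marginals; in standard form:

```latex
I[\mathbf{x}, \mathbf{y}] \equiv \mathrm{KL}\bigl(p(\mathbf{x}, \mathbf{y}) \,\|\, p(\mathbf{x})\,p(\mathbf{y})\bigr)
= H[\mathbf{x}] - H[\mathbf{x} \mid \mathbf{y}]
= H[\mathbf{y}] - H[\mathbf{y} \mid \mathbf{x}]
```

It is zero exactly when x and y are independent, and otherwise measures the reduction in uncertainty about one variable from observing the other.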