PATTERN RECOGNITION AND MACHINE LEARNING: Polynomial Curve Fitting (PowerPoint presentation)



SLIDE 1

Christopher M. Bishop

PATTERN RECOGNITION

AND MACHINE LEARNING

SLIDE 2

Polynomial Curve Fitting
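The model equation itself is missing from this transcript (it was an image on the slide); the polynomial used throughout, from PRML §1.1, is:

```latex
y(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \cdots + w_M x^M = \sum_{j=0}^{M} w_j x^j
```

Note that this is nonlinear in x but linear in the coefficients w, which is what makes the least-squares solution available in closed form.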

SLIDE 3

Sum-of-Squares Error Function
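The error function this slide plots (PRML eq. 1.2) is:

```latex
E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \bigl\{\, y(x_n, \mathbf{w}) - t_n \,\bigr\}^2
```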

SLIDE 4

0th Order Polynomial

SLIDE 5

1st Order Polynomial

SLIDE 6

3rd Order Polynomial

SLIDE 7

9th Order Polynomial

SLIDE 8

Over-fitting

Root-Mean-Square (RMS) Error:
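The formula shown on the slide is E_RMS = sqrt(2 E(w*) / N). A minimal sketch of the over-fitting experiment, assuming synthetic sin(2πx) data like the book's (the data set and the name `rms_error` are mine, not from the slides):

```python
import numpy as np

# Synthetic data in the spirit of the slides: 10 noisy samples of sin(2*pi*x)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.shape)

def rms_error(order):
    """Least-squares fit of the given polynomial order; returns the
    root-mean-square training error E_RMS = sqrt(2 E(w*) / N)."""
    w = np.polyfit(x, t, order)
    y = np.polyval(w, x)
    e = 0.5 * np.sum((y - t) ** 2)        # sum-of-squares error E(w*)
    return float(np.sqrt(2 * e / len(x)))

errors = {m: rms_error(m) for m in (0, 1, 3, 9)}
```

On the training set the error can only go down as the order grows; the 9th-order fit interpolates the 10 points almost exactly, which is the over-fitting the slide illustrates (the test error, not shown here, rises again).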

SLIDE 9

Polynomial Coefficients

SLIDE 10

Data Set Size: N = 15

9th Order Polynomial

SLIDE 11

Data Set Size: N = 100

9th Order Polynomial

SLIDE 12

Regularization

Penalize large coefficient values
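The penalty referred to here is the regularized error Ẽ(w) = ½ Σₙ{y(xₙ, w) − tₙ}² + (λ/2)‖w‖² (PRML eq. 1.4). A minimal closed-form sketch (the data set and the name `fit_ridge` are my own, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.shape)

def fit_ridge(x, t, order, lam):
    """Closed-form minimizer of the regularized sum-of-squares error
    0.5 * ||Phi w - t||^2 + 0.5 * lam * ||w||^2."""
    Phi = np.vander(x, order + 1, increasing=True)  # columns 1, x, ..., x^order
    A = Phi.T @ Phi + lam * np.eye(order + 1)
    return np.linalg.solve(A, Phi.T @ t)

w_free = fit_ridge(x, t, 9, 0.0)  # unregularized 9th-order fit: large coefficients
w_reg = fit_ridge(x, t, 9, 1.0)   # penalized fit: coefficients shrunk toward zero
```

Increasing λ trades a slightly larger data-fit error for much smaller coefficients, which is exactly how regularization tames the wild 9th-order fit on the previous slides.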

SLIDE 13

Regularization: ln λ = −18

SLIDE 14

Regularization: ln λ = 0

SLIDE 15

Regularization: E_RMS vs. ln λ

SLIDE 16

Polynomial Coefficients

SLIDE 17

The Gaussian Distribution
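The density shown on this slide (PRML eq. 1.46) is:

```latex
\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(x-\mu)^2}{2\sigma^2} \right\}
```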

SLIDE 18

Gaussian Parameter Estimation

Likelihood function

SLIDE 19

Maximum (Log) Likelihood

SLIDE 20

Properties of μ_ML and σ²_ML

SLIDE 21

Curve Fitting Re-visited

SLIDE 22

Maximum Likelihood

Determine w_ML by minimizing the sum-of-squares error, E(w).

SLIDE 23

Predictive Distribution

SLIDE 24

MAP: A Step towards Bayes

Determine w_MAP by minimizing the regularized sum-of-squares error, Ẽ(w).

SLIDE 25

Bayesian Curve Fitting

SLIDE 26

Bayesian Predictive Distribution

SLIDE 27

Model Selection

Cross-Validation

SLIDE 28

Parametric Distributions

Basic building blocks: need to determine p(x) given the data set {x₁, …, x_N}. Representation: which parametric form p(x | θ)? Recall curve fitting.

SLIDE 29

Binary Variables (1)

Coin flipping: heads = 1, tails = 0. The Bernoulli distribution.
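The density and moments on this slide (PRML eqs. 2.2–2.4) are:

```latex
\mathrm{Bern}(x \mid \mu) = \mu^{x} (1-\mu)^{1-x},
\qquad
\mathbb{E}[x] = \mu,
\qquad
\mathrm{var}[x] = \mu(1-\mu)
```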

SLIDE 30

Binary Variables (2)

N coin flips: Binomial Distribution

SLIDE 31

Binomial Distribution

SLIDE 32

Parameter Estimation (1)

ML for Bernoulli

Given: a data set D = {x₁, …, x_N} of observed flips.

SLIDE 33

Parameter Estimation (2)

Example: N = 3 tosses, all heads, so μ_ML = 1.

Prediction: all future tosses will land heads up

Overfitting to D
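The ML estimate behind this slide is simply the fraction of heads, μ_ML = m/N; a one-function sketch (the name `bernoulli_ml` is mine):

```python
def bernoulli_ml(flips):
    """Maximum-likelihood estimate of mu for a Bernoulli variable:
    the observed fraction of flips that landed heads (1)."""
    return sum(flips) / len(flips)

mu_ml = bernoulli_ml([1, 1, 1])  # three heads in a row -> mu_ml = 1.0
# ML then assigns probability 1 to heads forever: overfitting to D
```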

SLIDE 34

Beta Distribution

Distribution over μ ∈ [0, 1].

SLIDE 35

Bayesian Bernoulli

The Beta distribution provides the conjugate prior for the Bernoulli distribution.
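The prior and the resulting posterior (PRML eqs. 2.13 and 2.17, with m heads and l = N − m tails) are:

```latex
\mathrm{Beta}(\mu \mid a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, \mu^{a-1} (1-\mu)^{b-1},
\qquad
p(\mu \mid m, l, a, b) \propto \mu^{m+a-1} (1-\mu)^{l+b-1}
```

The posterior is again a Beta distribution, which is what "conjugate" means here.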

SLIDE 36

Beta Distribution

SLIDE 37

Prior ∙ Likelihood = Posterior

SLIDE 38

Properties of the Posterior

As the size of the data set, N, increases:

SLIDE 39

Prediction under the Posterior

What is the probability that the next coin toss will land heads up?
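The answer on this slide is the posterior predictive p(x = 1 | D) = (m + a)/(m + a + l + b) (PRML eq. 2.20); a tiny sketch (the name `predict_heads` is mine):

```python
def predict_heads(m, l, a, b):
    """Posterior predictive probability of heads under a Beta(a, b) prior,
    after observing m heads and l tails."""
    return (m + a) / (m + a + l + b)

p = predict_heads(m=3, l=1, a=2, b=2)  # 5/8 = 0.625
```

Unlike the ML estimate, the prediction never collapses to 0 or 1 for finite data: the prior pseudo-counts a and b keep it away from the extremes.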

SLIDE 40

Multinomial Variables

1-of-K coding scheme:

SLIDE 41

ML Parameter estimation

Given the data set, maximize the likelihood subject to the constraint Σₖ μₖ = 1 by using a Lagrange multiplier, λ.

SLIDE 42

The Multinomial Distribution

SLIDE 43

The Dirichlet Distribution

Conjugate prior for the multinomial distribution.

SLIDE 44

Bayesian Multinomial (1)

SLIDE 45

Bayesian Multinomial (2)

SLIDE 46

The Gaussian Distribution

SLIDE 47

Maximum Likelihood for the Gaussian (1)

Given i.i.d. data X = {x₁, …, x_N}, the log likelihood function is the sum of the per-point log densities. Sufficient statistics: Σₙ xₙ and Σₙ xₙxₙᵀ.

SLIDE 48

Maximum Likelihood for the Gaussian (2)

Set the derivative of the log likelihood function to zero and solve to obtain μ_ML = (1/N) Σₙ xₙ; similarly for Σ_ML.
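A univariate sketch of these closed-form estimators (the slides treat the general multivariate case; the name `gaussian_ml` is mine):

```python
def gaussian_ml(data):
    """Closed-form ML estimates for a univariate Gaussian:
    the sample mean and the *biased* (1/N) variance."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return mu, var

mu_ml, var_ml = gaussian_ml([1.0, 2.0, 3.0, 4.0])  # mu = 2.5, var = 1.25
```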

SLIDE 49

Maximum Likelihood for the Gaussian (3)

Under the true distribution, E[μ_ML] = μ but E[Σ_ML] = ((N − 1)/N) Σ, so the ML variance estimate is biased. Hence define the unbiased estimate Σ̃ = (N/(N − 1)) Σ_ML.

SLIDE 50

Bayesian Inference for the Gaussian (1)

Assume σ² is known. Given i.i.d. data, the likelihood function for μ is the product of the individual Gaussian factors. This has a Gaussian shape as a function of μ (but it is not a distribution over μ).

SLIDE 51

Bayesian Inference for the Gaussian (2)

Combined with a Gaussian prior over μ, this gives the posterior; completing the square over μ shows that the posterior is again Gaussian.

SLIDE 52

Bayesian Inference for the Gaussian (3)

… where μ_N and σ²_N are the posterior mean and variance. Note how they combine the prior and the data.
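The update equations on this slide (PRML eqs. 2.141–2.142) are μ_N = (σ² μ₀ + N σ₀² μ_ML)/(N σ₀² + σ²) and 1/σ²_N = 1/σ₀² + N/σ². A sketch that checks them numerically (the name `posterior_mu` is mine):

```python
def posterior_mu(mu0, var0, var, data):
    """Gaussian posterior over mu with known noise variance `var`
    and prior N(mu | mu0, var0); returns (mu_N, var_N)."""
    n = len(data)
    mu_ml = sum(data) / n
    mu_n = (var * mu0 + n * var0 * mu_ml) / (n * var0 + var)
    var_n = 1.0 / (1.0 / var0 + n / var)  # precisions add
    return mu_n, var_n

mu_n, var_n = posterior_mu(mu0=0.0, var0=1.0, var=1.0, data=[1.0] * 4)
# mu_n = 0.8: pulled from the prior mean 0 toward mu_ML = 1; var_n = 0.2
```

As N grows the posterior mean approaches μ_ML and the posterior variance shrinks toward zero, matching the N = 0, 1, 2, 10 example on the next slide.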

SLIDE 53

Bayesian Inference for the Gaussian (4)

Example: for N = 0, 1, 2 and 10.

SLIDE 54

Bayesian Inference for the Gaussian (5)

Sequential Estimation: the posterior obtained after observing N − 1 data points becomes the prior when we observe the Nth data point.
SLIDE 55

Bayesian Inference for the Gaussian (6)

Now assume μ is known. The likelihood function for the precision λ = 1/σ² then has a Gamma shape as a function of λ.

SLIDE 56

Bayesian Inference for the Gaussian (7)

The Gamma distribution
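The density and moments shown here (PRML eqs. 2.146–2.148) are:

```latex
\mathrm{Gam}(\lambda \mid a, b) = \frac{1}{\Gamma(a)}\, b^{a} \lambda^{a-1} e^{-b\lambda},
\qquad
\mathbb{E}[\lambda] = \frac{a}{b},
\qquad
\mathrm{var}[\lambda] = \frac{a}{b^2}
```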

SLIDE 57

Bayesian Inference for the Gaussian (8)

Now we combine a Gamma prior, Gam(λ | a₀, b₀), with the likelihood function for λ; the result is again a Gamma distribution, Gam(λ | a_N, b_N), with a_N = a₀ + N/2 and b_N = b₀ + (1/2) Σₙ (xₙ − μ)².

SLIDE 58

Bayesian Inference for the Gaussian (9)

If both μ and λ are unknown, the joint likelihood function is the product of the individual Gaussian factors; we need a prior with the same functional dependence on μ and λ.

SLIDE 59

Bayesian Inference for the Gaussian (10)

The Gaussian-gamma distribution

  • Quadratic in μ.
  • Linear in λ.
  • Gamma distribution over λ.
  • Independent of μ.
SLIDE 60

Bayesian Inference for the Gaussian (11)

The Gaussian-gamma distribution

SLIDE 61

Bayesian Inference for the Gaussian (12)

Multivariate conjugate priors

  • μ unknown, Λ known: p(μ) Gaussian.
  • Λ unknown, μ known: p(Λ) Wishart.
  • Λ and μ unknown: p(μ, Λ) Gaussian-Wishart.

SLIDE 62

Student’s t-Distribution

Obtained by integrating out the precision of a Gaussian against a Gamma distribution; hence an infinite mixture of Gaussians.

SLIDE 63

Student’s t-Distribution

SLIDE 64

Student’s t-Distribution

Robustness to outliers: Gaussian vs t-distribution.

SLIDE 65

Student’s t-Distribution

The D-variate case: St(x | μ, Λ, ν). Properties: E[x] = μ (for ν > 1); cov[x] = (ν/(ν − 2)) Λ⁻¹ (for ν > 2); mode[x] = μ.

SLIDE 66

The Exponential Family (1)

where η is the natural parameter, and g(η) can be interpreted as a normalization coefficient.
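The general form being described (PRML eqs. 2.194–2.195) is:

```latex
p(\mathbf{x} \mid \boldsymbol{\eta}) = h(\mathbf{x})\, g(\boldsymbol{\eta}) \exp\bigl\{ \boldsymbol{\eta}^{\mathrm{T}} \mathbf{u}(\mathbf{x}) \bigr\},
\qquad
g(\boldsymbol{\eta}) \int h(\mathbf{x}) \exp\bigl\{ \boldsymbol{\eta}^{\mathrm{T}} \mathbf{u}(\mathbf{x}) \bigr\}\, \mathrm{d}\mathbf{x} = 1
```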

SLIDE 67

The Exponential Family (2.1)

The Bernoulli distribution: comparing with the general exponential-family form, we see that η = ln(μ / (1 − μ)), and so μ = σ(η).

Logistic sigmoid
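The sigmoid and its inverse, the logit, as used on this slide:

```python
import math

def sigmoid(eta):
    """Logistic sigmoid: maps the natural parameter eta to mu = p(x=1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def logit(mu):
    """Inverse of the sigmoid: eta = ln(mu / (1 - mu))."""
    return math.log(mu / (1.0 - mu))
```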

SLIDE 68

The Exponential Family (2.2)

The Bernoulli distribution can hence be written in exponential-family form, with u(x) = x, h(x) = 1, and g(η) = σ(−η).

SLIDE 69

The Exponential Family (3.1)

The multinomial distribution, where ηₖ = ln μₖ and u(x) = x.

NOTE: the ηₖ parameters are not independent, since the corresponding μₖ must satisfy Σₖ μₖ = 1.

SLIDE 70

The Exponential Family (3.2)

Removing the constraint by expressing μ_M in terms of the remaining parameters leads to ηₖ = ln( μₖ / (1 − Σⱼ μⱼ) ), whose inverse is the softmax function. Here the ηₖ parameters are independent.

Softmax
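The softmax map from natural parameters back to probabilities, in a minimal form:

```python
import math

def softmax(etas):
    """Softmax: maps natural parameters eta_k to probabilities mu_k
    that are positive and sum to one."""
    exps = [math.exp(e) for e in etas]
    z = sum(exps)
    return [e / z for e in exps]

p = softmax([1.0, 2.0, 3.0])  # ordered like the inputs, sums to 1
```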

SLIDE 71

The Exponential Family (3.3)

The multinomial distribution can then be written in the standard exponential-family form.

SLIDE 72

The Exponential Family (4)

The Gaussian distribution in exponential-family form.

SLIDE 73

ML for the Exponential Family (1)

From the definition of g(η), differentiating the normalization condition gives −∇ ln g(η) = E[u(x)].

SLIDE 74

ML for the Exponential Family (2)

Given a data set X = {x₁, …, x_N}, the likelihood function leads to the maximum-likelihood condition −∇ ln g(η_ML) = (1/N) Σₙ u(xₙ).

Sufficient statistic

SLIDE 75

Conjugate priors

For any member of the exponential family, there exists a conjugate prior of the form p(η | χ, ν) ∝ g(η)^ν exp{ν ηᵀχ}. Combining this with the likelihood function, we get a posterior of the same functional form.

The prior corresponds to ν pseudo-observations with value χ.