Regression, Curve Fitting and Optimisation. Sam Tickle, supervised by Elena Zanini.



SLIDE 1

Introduction Nelder-Mead Algorithm Stochastic Algorithms A Non-Parametric Approach ‘Hard’ Functions An Application: Extreme Value Theory Conclusion

Regression, Curve Fitting and Optimisation

Sam Tickle Supervised by Elena Zanini

STOR-i, University of Lancaster

4 September 2015

Sam Tickle Regression, Curve Fitting and Optimisation

SLIDE 2

1. Introduction (Root Finding)

2. Nelder-Mead Algorithm

3. Stochastic Algorithms (Simulated Annealing)

4. A Non-Parametric Approach

5. ‘Hard’ Functions (The Rosenbrock Banana Function)

6. An Application: Extreme Value Theory

7. Conclusion

SLIDE 3

Given a set of data, what is the optimum curve that may be fitted? This question is clearly important when investigating relationships between two or more variables, as well as when explaining data quantitatively.

SLIDE 4

If a straight line is needed, we can do the standard trick of using Ordinary Least Squares (OLS). However, there will be situations in which this may not be appropriate.

SLIDE 5

Some Less Trivial Examples

[Figure: three example scatter plots of y against x, showing non-linear relationships.]

SLIDE 6

We observe that the OLS inference arises from an optimisation problem, namely argmin_{b ∈ R^p} ||Y − Xb||^2. So it makes sense to think about the problem of optimal curve fitting from the perspective of optimisation.
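As an illustrative sketch (not part of the talk), the OLS optimisation problem can be solved numerically for a straight-line fit; the function name `ols_fit` and the toy data are assumptions for the example.

```python
import numpy as np

# Illustrative sketch: solve argmin_b ||Y - Xb||^2 for a straight-line
# fit y = b0 + b1 * x, via the least-squares solver on the design matrix.
def ols_fit(x, y):
    X = np.column_stack([np.ones_like(x), x])   # design matrix with intercept
    b, *_ = np.linalg.lstsq(X, y, rcond=None)   # minimises ||Y - Xb||^2
    return b

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 + 3.0 * x                               # exact line, so residuals vanish
b0, b1 = ols_fit(x, y)
print(round(b0, 6), round(b1, 6))               # intercept ~2, slope ~3
```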

SLIDE 7

Optimisation has an obvious analogue in root finding. There are several core methods we can use for this: Bisection; Newton-Raphson; Secant; Muller’s. All of these (except Newton-Raphson) are derivative-free.
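Two of these methods can be sketched in a few lines; the example function f(x) = x² − 2 and the helper names are assumptions for illustration, not from the talk.

```python
import math

# Sketch of two of the root-finding methods mentioned: bisection
# (derivative-free) and Newton-Raphson (which needs f').
def bisect(f, a, b, tol=1e-10):
    # Halve the bracketing interval [a, b] until it is narrower than tol.
    while b - a > tol:
        m = (a + b) / 2
        if f(a) * f(m) <= 0:
            b = m
        else:
            a = m
    return (a + b) / 2

def newton(f, df, x0, n_iter=50):
    # Iterate x_{n+1} = x_n - f(x_n)/f'(x_n).
    x = x0
    for _ in range(n_iter):
        x = x - f(x) / df(x)
    return x

f = lambda x: x * x - 2
r1 = bisect(f, 0.0, 2.0)
r2 = newton(f, lambda x: 2 * x, 1.0)
print(r1, r2)                     # both close to sqrt(2)
```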

SLIDE 8

In higher dimensions, one of the more effective derivative-based methods is the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method, which can be adapted for optimisation by changing the iterative equation to x_{n+1} = x_n − [H_f(x_n)]^{−1} ∇f(x_n).
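That Newton-type update can be sketched on a simple quadratic; the quadratic, its matrices and the variable names are assumptions for illustration (true BFGS would build an approximation to the inverse Hessian rather than forming it exactly, as the exact Hessian is used here).

```python
import numpy as np

# Sketch of the update x_{n+1} = x_n - [H_f(x_n)]^{-1} grad f(x_n),
# applied to the quadratic f(x) = x^T A x / 2 - b^T x.
A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, 1.0])

grad = lambda x: A @ x - b               # gradient of f
hess = lambda x: A                       # Hessian of f (constant here)

x = np.zeros(2)
for _ in range(5):
    # Solve H dx = grad rather than inverting H explicitly.
    x = x - np.linalg.solve(hess(x), grad(x))

print(x, np.linalg.solve(A, b))          # Newton lands on the minimiser A^{-1} b
```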

SLIDE 9

The Nelder-Mead Algorithm

Suppose our goal is to minimise the function f(x), where x ∈ R^n.

SLIDE 10

Start with n + 1 test points: x_1, ..., x_{n+1}.

SLIDE 11

Order these points by output value, so that f(x_1) ≤ f(x_2) ≤ ... ≤ f(x_{n+1}).

[Figure: a two-dimensional simplex with vertices x_1, x_2, x_3.]

SLIDE 12

We consider several different ‘candidate points’ (obtained by reflecting, expanding or contracting the simplex about the worst vertex), and if none of these is an improvement, we shrink the simplex.
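These moves can be sketched as a compact (and deliberately minimal) Nelder-Mead loop; the coefficients, test function and starting simplex are assumptions for illustration, and a production implementation would add proper termination tests.

```python
import numpy as np

# Minimal sketch of Nelder-Mead: reflect the worst vertex through the
# centroid of the rest, try expanding, try contracting, and shrink the
# whole simplex only if no candidate point is an improvement.
def nelder_mead(f, simplex, n_iter=200):
    for _ in range(n_iter):
        simplex.sort(key=f)                          # f(x_1) <= ... <= f(x_{n+1})
        best, worst = simplex[0], simplex[-1]
        centroid = sum(simplex[:-1]) / (len(simplex) - 1)
        xr = centroid + (centroid - worst)           # reflection candidate
        if f(xr) < f(best):
            xe = centroid + 2 * (centroid - worst)   # expansion candidate
            simplex[-1] = xe if f(xe) < f(xr) else xr
        elif f(xr) < f(simplex[-2]):
            simplex[-1] = xr
        else:
            xc = centroid + 0.5 * (worst - centroid) # contraction candidate
            if f(xc) < f(worst):
                simplex[-1] = xc
            else:                                    # no improvement: shrink
                simplex[:] = [best + 0.5 * (v - best) for v in simplex]
    return min(simplex, key=f)

f = lambda v: (v[0] - 1.0) ** 2 + (v[1] - 2.0) ** 2
start = [np.array([0.0, 0.0]), np.array([1.5, 0.0]), np.array([0.0, 1.5])]
x_min = nelder_mead(f, start)
print(x_min)                                         # close to (1, 2)
```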

SLIDE 13

How well does this work on the problem?

[Figure: Nelder-Mead curve fits (y against x) for the three example datasets.]

SLIDE 14

Disadvantages of Nelder-Mead

We usually require a reasonable idea of the form of the relationship between the two variables in question to produce a reasonable eventual fit. If the data do not conform well to the assumed underlying relationship, the procedure can be very costly, and it can arrive at an incorrect answer if the initial conditions are poorly specified.

SLIDE 15

Stochastic Algorithms

Several alternative methods of optimisation can be used which employ a probabilistic approach. These include: Simulated Annealing; Genetic Algorithms; Ant Colony Optimisation.

SLIDE 16

Simulated Annealing (SA) takes its name from annealing, a physical process in which a material cools in a system with a controlled negative temperature gradient. When a substance such as water cools in such a system, an ‘optimal’ solid arrangement is obtained.

SLIDE 17

How SA works

To use Simulated Annealing in an optimisation problem, the following need to be well defined: the neighbours of each state (e.g. for a discrete domain, a rearrangement of two adjacent states); the energies of each state; the probability of moving from state S to state S′. States with smaller energy are preferred, so P(E, E′, T) > P(E, E″, T) when E′ < E″.

SLIDE 18

How SA works

In the problem of curve fitting: we shall define a ‘neighbour’ of the current curve as the addition of a small, simple function; the probabilities shall be set as follows: if E < E′, then P(E, E′, T) ∝ exp((E − E′)/T); otherwise, P(E, E′, T) ∝ 1.
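This acceptance rule can be sketched on a simple one-dimensional problem; the energy function, neighbour move, cooling schedule and all parameter values here are illustrative assumptions rather than the talk's choices.

```python
import math, random

# Sketch of SA with the acceptance rule above: always accept a
# lower-energy neighbour; accept a higher-energy one with probability
# exp((E - E')/T). Here the 'energy' is f(x) = (x - 3)^2 and a
# neighbour is a small random step.
def simulated_annealing(f, x0, t0=10.0, cooling=0.995, n_iter=5000, seed=0):
    rng = random.Random(seed)
    x, t = x0, t0
    for _ in range(n_iter):
        x_new = x + rng.uniform(-0.5, 0.5)        # neighbour of current state
        e, e_new = f(x), f(x_new)
        if e_new < e or rng.random() < math.exp((e - e_new) / t):
            x = x_new                              # accept the move
        t *= cooling                               # controlled cooling schedule
    return x

f = lambda x: (x - 3.0) ** 2
x_opt = simulated_annealing(f, x0=-10.0)
print(x_opt)                                       # close to 3
```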

SLIDE 19

How well does this work on the problem?

[Figure: SA curve fits (y against x) for two example datasets.]

SLIDE 20

Disadvantages of SA

It often requires a high starting temperature to achieve a reasonable result; the model is very sensitive to the starting temperature, and the choice is not obvious; it is very difficult to achieve an accurate solution, as it is hard to construct well-defined neighbours which enable effective ‘zeroing in’ on a state in a continuous domain.

SLIDE 21

A Non-Parametric Approach

Suppose we had no intuition at all as to an underlying relationship, such as in the example shown below.

[Figure: scatter plot of y against x with no obvious underlying relationship.]

SLIDE 22

One way of tackling the problem of curve fitting in this instance is to give each point an associated ‘reward’ function, with shape similar to a hillock.

SLIDE 23

A reward function found to be useful is f(r) = k_d e^{−r^{0.55}}, where k_d is a constant depending on the datapoint d and r is the Euclidean distance from the datapoint. We can take k_d = e^{−D_d}.

SLIDE 24

A ‘total’ reward function is then constructed by summing all the individual reward functions; the curve maximising this total reward can then be found through a ‘brainless search’.
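The total reward can be sketched as follows; the toy data, the evaluation grid, and the choice of constant weights k_d = 1 are assumptions for illustration (the talk's weighting k_d = e^{−D_d} depends on a quantity D_d not reproduced here).

```python
import math

# Sketch of the total reward with weights k_d taken as 1: each datapoint
# contributes exp(-r**0.55), where r is its Euclidean distance to a point
# on the candidate curve; a curve's score sums these rewards along the
# curve, and a 'brainless search' compares candidate curves by score.
data = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.1), (4.0, 4.0)]

def reward_at(px, py):
    return sum(math.exp(-math.dist((px, py), d) ** 0.55) for d in data)

def curve_score(f, xs):
    return sum(reward_at(x, f(x)) for x in xs)

xs = [0.5 * i for i in range(2, 9)]            # evaluation grid on [1, 4]
s_good = curve_score(lambda x: x, xs)          # y = x passes near the data
s_bad = curve_score(lambda x: x + 5.0, xs)     # y = x + 5 is far away
print(s_good > s_bad)                          # the nearer curve scores higher
```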

[Figure: size of second largest city proper by population (millions) against size of largest city proper by population (millions).]

SLIDE 25

Disadvantages of this approach

The model is prone to overfitting; additional methodology may therefore be needed, such as Cross-Validation or Akaike’s Information Criterion; depending on the initial weighting, the resultant ‘optimal’ curve can favour the OLS line.

SLIDE 26

All these methods were tried on a series of standard test functions before moving on to a real-life application.

SLIDE 27

‘Hard’ Functions

There are several functions which are notoriously tricky to optimise numerically. These were used to test the robustness of the algorithms involved. Some examples include: the Rosenbrock Banana Function; Five-Uneven-Peak Trap; Equal Maxima; Uneven Decreasing Maxima.

SLIDE 28

The Rosenbrock Banana Function

This function takes the form f(x, y) = (a − x)^2 + b(y − x^2)^2, for some constants a and b.
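A minimal sketch of the function with the common parameter choice a = 1, b = 100 (a standard choice, though the talk does not fix values): its global minimum sits at (x, y) = (a, a²), at the bottom of a long curved valley that makes numerical optimisers crawl.

```python
# The Rosenbrock Banana Function with the common choice a = 1, b = 100.
def rosenbrock(x, y, a=1.0, b=100.0):
    return (a - x) ** 2 + b * (y - x * x) ** 2

print(rosenbrock(1.0, 1.0))   # 0.0 at the minimum (a, a^2) = (1, 1)
print(rosenbrock(0.0, 0.0))   # 1.0 away from the valley floor
```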

SLIDE 29

Extreme Value Theory: A Brief Background

One way of defining ‘extreme’ events is to define a threshold, and anything exceeding this threshold is classed as extreme. This gives rise to the Generalised Pareto Distribution (GPD), whose likelihood is

L(σ, ξ) = (1/σ^k) ∏_{i=1}^{k} (1 + ξ y_i / σ)^{−(1 + 1/ξ)}, for ξ ≠ 0;

L(σ, ξ) = (1/σ^k) ∏_{i=1}^{k} exp(−y_i / σ), for ξ = 0;

where k is the number of datapoints exceeding the threshold, the y_i are the threshold exceedances, ξ is the shape and σ is the scale.
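The corresponding negative log-likelihood, which an optimiser such as Nelder-Mead would minimise over (σ, ξ), can be sketched directly from the expressions above; the function name, the toy exceedance data and the support check are assumptions for illustration.

```python
import math

# Sketch of the GPD negative log-likelihood: y holds the threshold
# exceedances; outside the parameter support the value is +infinity so
# that a minimiser is pushed back into the valid region.
def gpd_nll(sigma, xi, y):
    if sigma <= 0:
        return math.inf
    if abs(xi) < 1e-12:                       # xi = 0 (exponential) case
        return len(y) * math.log(sigma) + sum(v / sigma for v in y)
    terms = [1 + xi * v / sigma for v in y]
    if min(terms) <= 0:                       # outside the support
        return math.inf
    return len(y) * math.log(sigma) + (1 + 1 / xi) * sum(math.log(t) for t in terms)

y = [0.5, 1.2, 2.0, 3.3]
print(gpd_nll(1.0, 0.1, y), gpd_nll(1.0, 0.0, y))
```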

SLIDE 30

[Left figure: difference in log of closures against days elapsed. Right figure: amount of rainfall (mm) against day number.]

The left figure shows log-differences of daily Dow Jones closing prices between 1996 and 2000. The right figure shows daily rainfall accumulations in South West England between 1914 and 1962.

SLIDE 31

We use Nelder-Mead to fit the GPD and obtain:

Dataset     Threshold   σ̂      ξ̂
Rain        30          7.44    0.18
Dow Jones   2           0.50    0.29

(Candidate thresholds were chosen by observation using the mean residual life plot.) Other procedures, such as Simulated Annealing, proved to be less successful than Nelder-Mead at finding the MLEs.

SLIDE 32

Other Extreme Value Theory Machinery

There are several other things we can consider: an alternative and theoretically equivalent approach would be to use a Poisson Point Process (PPP) model; sometimes the underlying process is more complicated, and covariates need to be added to the model. The first of these is still relatively straightforward using Nelder-Mead; introducing covariates, however, is more complex, and often results in convergence to a local optimum.

SLIDE 33

Conclusions

In general: Nelder-Mead remains a very effective algorithm for ‘blind optimisation’; SA should be preferred only if there is a strong intuition for a starting temperature, and pinning down a sensible starting value for the temperature may be a fruitful avenue for further work; computationally, gradient-free methods are preferred.

SLIDE 34

Conclusions

With respect to Extreme Value Theory: Nelder-Mead becomes highly sensitive to initial conditions in the covariate case; Investigating the application of SA and an effective choice of threshold may be of interest.

SLIDE 35

References

Atkinson, K.E. (1989). An Introduction to Numerical Analysis. (Inference and background on deterministic algorithms.)

Nocedal, J. & Wright, S.J. (2006). Numerical Optimization. (Higher-dimensional deterministic methods.)

Reeves, C.R. (1995). Modern Heuristic Techniques for Combinatorial Problems. (Simulated Annealing reference.)

Coles, S. (2004). An Introduction to Statistical Modeling of Extreme Values.
