Multiple Changepoint Detection in Climate Time Series Robert Lund - - PowerPoint PPT Presentation



SLIDE 1

Introduction Annual New Bedford Precipitations Genetic Algorithms Simulation Study

Multiple Changepoint Detection in Climate Time Series

Robert Lund Clemson Math Sciences Lund@Clemson.edu

Joint work with Shanghong Li, Yingbo Li, and Hewa Priyadarshani

June 13, 2017

Robert Lund Clemson Math Sciences Lund@Clemson.edu Multiple Changepoint Detection in Climate Time Series

SLIDE 2

The Need to Detect Changepoints.

  • Changepoints are discontinuity times (inhomogeneities) in a time series. In climate settings, they can be induced by changes in observation locations, equipment, or measurement techniques, by environmental changes, etc.
  • In this talk, a changepoint is a time where the mean of the series first undergoes a structural pattern change.
  • Changepoint issues are critical when estimating trends.
  • Many changepoints go undocumented.
  • Changepoint techniques can help calibrate new gauges.

SLIDE 3

Tuscaloosa, AL Annual Temperatures

[Figure: Tuscaloosa, AL annual temperatures. x-axis: Time of Observation (Year), 1910 to 2000; y-axis: Observed Temperature (Degrees Celsius), 15 to 20.]

SLIDE 4

New Bedford, MA Annual Temperatures

[Figure: Yearly Temperatures at New Bedford MA With Least Squares Trends. x-axis: Time of Observation (Year), 1820 to 2000; y-axis: Observed Temperature (Degrees Celsius), 7 to 13.]

SLIDE 5

Key Questions

  • How many changepoints are there?
  • At what times do the changepoints occur?

Some recent penalized likelihood references:

  1. Davis, Lee, and Rodriguez-Yam, Journal of the American Statistical Association (2006).
  2. Lu, Lund, and Lee, Annals of Applied Statistics (2010).
  3. Li and Lund, Journal of Climate (2012).

SLIDE 6

Our Model

For annual (T = 1), monthly (T = 12), or daily (T = 365) data, our model for the data $\{X_t\}_{t=1}^{N}$ is a time series regression:

$$X_{nT+\nu} = \mu_\nu + \alpha(nT+\nu) + \delta_{nT+\nu} + \epsilon_{nT+\nu}.$$

  • The seasonal index is $\nu \in \{1, \ldots, T\}$.
  • $\mu_\nu$ is the seasonal mean at season $\nu$.
  • $\alpha$ is a linear trend parameter, which may or may not be needed. Other trend functions are possible.

SLIDE 7

More on the Model

For annual (T = 1), monthly (T = 12), or daily (T = 365) data, our model for the data $\{X_t\}$ is a time series regression:

$$X_{nT+\nu} = \mu_\nu + \alpha(nT+\nu) + \delta_{nT+\nu} + \epsilon_{nT+\nu}.$$

The mean shifts are parametrized in $\{\delta_{nT+\nu}\}$:

$$\delta_t = \begin{cases} \Delta_1 = 0, & 1 \le t < \tau_1, \\ \Delta_2, & \tau_1 \le t < \tau_2, \\ \quad\vdots & \\ \Delta_{m+1}, & \tau_m \le t < \tau_{m+1}. \end{cases}$$

The errors $\{\epsilon_{nT+\nu}\}$ are a zero-mean autoregressive process (this is periodic if T > 1).

SLIDE 8

The model for annual data

The model for annual data is

$$X_t = \mu + \alpha t + \delta_t + \epsilon_t.$$

  • Location parameter: $\mu$
  • Linear trend: $\alpha t$
  • Piecewise constant mean shifts: $\delta_t$
  • Stationary but correlated errors: $\{\epsilon_t\}$
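
The annual model can be simulated in a few lines. The sketch below is illustrative only: it assumes AR(1) errors with Gaussian white noise, and the function name `simulate_annual_series` and its parameters are hypothetical, not taken from the talk.

```python
import numpy as np

def simulate_annual_series(n, mu, alpha, changepoints, shifts, phi, sigma2, seed=0):
    """Simulate X_t = mu + alpha*t + delta_t + eps_t for t = 1..n.

    changepoints: increasing 1-based times tau_1 < ... < tau_m.
    shifts: Delta_2, ..., Delta_{m+1} (Delta_1 is fixed at 0).
    eps_t: AR(1) with coefficient phi and noise variance sigma2 (an assumption).
    """
    rng = np.random.default_rng(seed)
    t = np.arange(1, n + 1)
    # delta_t is 0 before tau_1 and Delta_{j+1} on [tau_j, tau_{j+1}).
    levels = np.concatenate(([0.0], np.asarray(shifts, dtype=float)))
    segment = np.searchsorted(np.asarray(changepoints), t, side="right")
    delta = levels[segment]
    # AR(1) errors, started from the stationary distribution.
    z = rng.normal(0.0, np.sqrt(sigma2), size=n)
    eps = np.empty(n)
    eps[0] = z[0] / np.sqrt(1.0 - phi**2)
    for i in range(1, n):
        eps[i] = phi * eps[i - 1] + z[i]
    return mu + alpha * t + delta + eps, delta
```

For example, `simulate_annual_series(100, 10.0, 0.01, [40, 70], [0.5, -0.3], 0.2, 0.05)` returns a series of length 100 with mean shifts at times 40 and 70, together with the shift sequence itself.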

SLIDE 9

Periodic Autoregressions

A zero-mean series $\{\epsilon_{nT+\nu}\}$ is called a periodic autoregression of order p (PAR(p)) if it satisfies the periodic linear difference equation

$$\epsilon_{nT+\nu} = \sum_{k=1}^{p} \phi_k(\nu)\,\epsilon_{nT+\nu-k} + Z_{nT+\nu}.$$

  • Here, $\{Z_{nT+\nu}\}$ is zero-mean periodic white noise with $\mathrm{Var}(Z_{nT+\nu}) = \sigma^2(\nu) > 0$ for all seasons $\nu$.
  • $\phi_1(\nu), \ldots, \phi_p(\nu)$ are the PAR coefficients during season $\nu$.
  • Such series are indeed “periodically stationary”.
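
To make the periodic difference equation concrete, here is a minimal sketch of a PAR(1) simulator (the case p = 1). It assumes Gaussian white noise and starts the recursion at zero, so early values are not yet periodically stationary. The name `simulate_par1` and the 0-based season indexing are choices made for this example.

```python
import numpy as np

def simulate_par1(n_cycles, phi, sigma, seed=0):
    """Simulate eps_t = phi(v) * eps_{t-1} + Z_t with period T = len(phi).

    phi[v] and sigma[v] are the AR coefficient and white-noise standard
    deviation for season v (0-based here, unlike the 1-based slide notation).
    """
    rng = np.random.default_rng(seed)
    period = len(phi)
    n = n_cycles * period
    eps = np.zeros(n)  # the recursion starts from eps_0 = 0 (an assumption)
    for t in range(1, n):
        season = t % period
        eps[t] = phi[season] * eps[t - 1] + sigma[season] * rng.normal()
    return eps
```

In practice one would discard an initial burn-in stretch before using the simulated values.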

SLIDE 10

Penalized Likelihood Methods

A penalized likelihood for our model has the form

$$-\log(L^*(m, \tau_1, \ldots, \tau_m)) + \mathrm{Penalty}(m, \tau_1, \ldots, \tau_m).$$

  • $L^*(m, \tau_1, \ldots, \tau_m)$ is an optimized model likelihood given the changepoint count m and location times $\tau_1 < \cdots < \tau_m$.
  • $\mathrm{Penalty}(m, \tau_1, \ldots, \tau_m)$ is a penalty for the changepoint configuration.

Common $\mathrm{Penalty}(m; \tau_1, \ldots, \tau_m)$ terms used:

  • AIC = $2m$.
  • BIC = $m \ln(N)$.
  • MDL = $\sum_{i=1}^{m+1} \ln(\tau_i - \tau_{i-1})/2 + \ln(m) + \sum_{i=2}^{m} \ln(\tau_i)$.
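
The three penalties are simple to compute once boundary conventions are fixed. The sketch below assumes $\tau_0 = 1$ and $\tau_{m+1} = N + 1$, and it assumes that for m = 0 only the single segment term remains; neither convention is stated on the slide. The function names are hypothetical.

```python
import math

def aic_penalty(taus):
    """AIC penalty 2m, where m is the number of changepoints."""
    return 2 * len(taus)

def bic_penalty(taus, n):
    """BIC penalty m * ln(N)."""
    return len(taus) * math.log(n)

def mdl_penalty(taus, n):
    """MDL penalty for changepoint times taus = [tau_1, ..., tau_m].

    Assumed conventions: tau_0 = 1 and tau_{m+1} = n + 1.
    """
    m = len(taus)
    if m == 0:
        # ln(m) is undefined at m = 0; keeping only the segment term is an assumption.
        return math.log(n) / 2.0
    bounds = [1] + list(taus) + [n + 1]
    # Sum over the m + 1 segments of ln(segment length) / 2.
    segment_term = sum(math.log(bounds[i] - bounds[i - 1]) for i in range(1, m + 2)) / 2.0
    # Sum of ln(tau_i) for i = 2, ..., m, as written on the slide.
    location_term = sum(math.log(tau) for tau in taus[1:])
    return segment_term + math.log(m) + location_term
```

Unlike AIC and BIC, the MDL penalty depends on where the changepoints sit, not only on how many there are: short segments contribute smaller logarithms than long ones.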

SLIDE 11

New Bedford, MA Annual Precipitations

[Figure: New Bedford, MA Annual Precipitation. x-axis: Time of observation, 1815 to 1995; y-axis: Annual Precipitation (mm), 500 to 1900.]

SLIDE 12

Lognormal Annual Precipitation Setup

The logarithm of $\{X_t\}$ is modeled as a Gaussian time series with no trend, multiple mean shifts, and autoregressive errors (AR(p)). Here, T = 1: no periodicities.

For each changepoint configuration $(m; \tau_1, \ldots, \tau_m)$, we must:

  • Fit a time series model with optimal time series parameters and mean shift sizes.
  • Compute the penalty

$$\mathrm{MDL}(m; \tau_1, \ldots, \tau_m) = \sum_{i=1}^{m+1} \ln(\tau_i - \tau_{i-1})/2 + \ln(m) + \sum_{i=2}^{m} \ln(\tau_i).$$
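
The fitting step for one configuration might look like the following rough sketch. It is not the talk's estimator: it assumes p = 1, estimates segment means by segment averages, estimates the AR(1) coefficient by conditional least squares, and uses a Gaussian likelihood conditional on the first observation. The name `neg_log_likelihood` is hypothetical.

```python
import numpy as np

def neg_log_likelihood(log_x, taus):
    """Approximate -log L* for mean shifts at 1-based times taus, AR(1) errors.

    log_x: the log-transformed series; taus: increasing changepoint times.
    All estimation choices here are simplifying assumptions.
    """
    y = np.asarray(log_x, dtype=float)
    n = len(y)
    # Segment boundaries as 0-based slice indices: time tau is index tau - 1.
    bounds = [0] + [tau - 1 for tau in taus] + [n]
    resid = np.empty(n)
    for start, stop in zip(bounds[:-1], bounds[1:]):
        resid[start:stop] = y[start:stop] - y[start:stop].mean()
    # Conditional least squares estimate of the AR(1) coefficient.
    phi = resid[1:] @ resid[:-1] / (resid[:-1] @ resid[:-1])
    innovations = resid[1:] - phi * resid[:-1]
    sigma2 = np.mean(innovations**2)
    # Gaussian negative log-likelihood evaluated at the plug-in variance.
    return 0.5 * (n - 1) * (np.log(2.0 * np.pi * sigma2) + 1.0)
```

Adding the MDL penalty for the same configuration to this value gives the score that is compared across configurations.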

SLIDE 13

Two Segment Models

SLIDE 14

Three Segment Models

SLIDE 15

Four Segment Models

SLIDE 16

Five Segment Models

SLIDE 17

Six Segment Models

SLIDE 18

Summary

The table below shows optimum MDL scores for various numbers of model segments. These values were obtained by exhaustive search and are exact.

Table: Optimum MDL Scores

# Segments   Changepoint Times               MDL Score
1            none                            -296.7328
2            1967                            -303.8382
3            1917, 1967                      -306.6359
4            1867, 1910, 1967                -309.2878
5            1867, 1910, 1965, 1967          -309.8570
6            1829, 1832, 1867, 1910, 1967    -308.2182

SLIDE 19

The Combinatorial Wall

We need to minimize

$$-\log(L^*(m, \tau_1, \ldots, \tau_m)) + \mathrm{MDL}(m, \tau_1, \ldots, \tau_m)$$

over all m and $\tau_1, \ldots, \tau_m$.

An exhaustive search over all models with m changepoints requires evaluation of $\binom{N}{m}$ MDL scores.

Summing this over $m = 0, 1, \ldots, N-1$ shows that an exhaustive optimization requires $2^N - 1$ different MDL evaluations.

We now devise a genetic algorithm for this. A genetic algorithm is an intelligent random walk search.
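
The size of the search space is easy to check numerically. This short sketch sums the binomial counts $\binom{N}{m}$ over m = 0, ..., N − 1; the function name is chosen for this example.

```python
from math import comb

def exhaustive_evaluations(n):
    """Number of MDL evaluations in an exhaustive search: sum of C(n, m), m = 0..n-1."""
    return sum(comb(n, m) for m in range(n))

# The sum of all binomial coefficients C(n, 0..n) is 2**n; dropping m = n leaves 2**n - 1.
count_200 = exhaustive_evaluations(200)
```

For a series of length N = 200, `count_200` has 61 decimal digits, which is why an exhaustive search is out of reach.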

SLIDE 20

Genetic Algorithms (GAs)

  • Chromosome Representation. Each changepoint configuration has the form $(m; \tau_1, \ldots, \tau_m)$.
  • Selection. Give mating preference to the fittest individuals, allowing them to pass their genes on to the next generation. Fitness is determined by the objective function $-\log(L^*(m; \tau_1, \ldots, \tau_m)) + \mathrm{MDL}(m; \tau_1, \ldots, \tau_m)$.

SLIDE 21

GA Details

  • Chromosome Crossover. Two individuals are chosen as parents from the current generation to breed; call these $(m; \tau_1, \ldots, \tau_m)$ and $(k; \eta_1, \ldots, \eta_k)$. A new chromosome $(\ell; \xi_1, \ldots, \xi_\ell)$, having traits of both parents, is created from these individuals to produce children for the next generation.
  • Mutation. Increases the diversity of the population, preventing premature convergence to poor solutions. Our mutation mechanism allows a small portion of generated children to have extra changepoints. After each child is formed from its parents, every non-changepoint time is independently allowed to become a changepoint time with probability $p_m$. Typically, $p_m$ is small: $p_m = 0.003$.
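
A minimal sketch of the two operators follows. The mutation step follows the description above. The crossover rule (pool both parents' changepoint times and keep each with probability 1/2) is only one plausible way to give a child traits of both parents and is an assumption, as is treating times 2, ..., N as admissible changepoint times.

```python
import random

def crossover(parent_a, parent_b, rng):
    """Create a child from two parents' changepoint times.

    Assumed rule: pool the parents' times and keep each with probability 1/2.
    """
    pooled = sorted(set(parent_a) | set(parent_b))
    return [tau for tau in pooled if rng.random() < 0.5]

def mutate(child, n, p_m, rng):
    """Let every non-changepoint time become a changepoint with probability p_m."""
    current = set(child)
    for t in range(2, n + 1):  # admissible times 2..n are an assumption
        if t not in current and rng.random() < p_m:
            current.add(t)
    return sorted(current)
```

With a `random.Random` instance `rng`, a call such as `mutate(crossover([67, 110], [110, 167], rng), 200, 0.003, rng)` produces one child configuration.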

SLIDE 22

GA Details

Algorithm Termination. Successive generations are simulated until a termination condition has been reached. Common terminating conditions are:

  • A solution is found that satisfies a minimum criterion.
  • A fixed number of generations is reached.
  • The generation's fittest member has plateaued (successive iterations no longer produce better results).

The fittest member of the terminating generation is deemed the solution.
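
The termination rules can be seen in a compact GA loop. This is a sketch under stated assumptions, not the talk's implementation: it uses rank-based selection, keeps the fittest member each generation (elitism), and uses the pool-and-thin crossover and the initial changepoint density of 0.05, all chosen for this example. `objective` is any user-supplied function that scores a sorted list of changepoint times, with lower scores being better.

```python
import random

def run_ga(objective, n, pop_size=30, max_generations=200, patience=25, p_m=0.003, seed=0):
    """Minimize objective over changepoint configurations in a series of length n."""
    rng = random.Random(seed)
    times = range(2, n + 1)  # admissible changepoint times (an assumption)
    population = [sorted(t for t in times if rng.random() < 0.05) for _ in range(pop_size)]
    best = min(population, key=objective)
    best_score = objective(best)
    stall = 0
    for _ in range(max_generations):  # rule: fixed number of generations
        ranked = sorted(population, key=objective)
        weights = [pop_size - rank for rank in range(pop_size)]  # fitter means heavier
        children = [ranked[0]]  # elitism: carry the fittest member forward
        while len(children) < pop_size:
            a, b = rng.choices(ranked, weights=weights, k=2)
            child = {tau for tau in set(a) | set(b) if rng.random() < 0.5}
            child |= {t for t in times if t not in child and rng.random() < p_m}
            children.append(sorted(child))
        population = children
        gen_best = min(population, key=objective)
        gen_score = objective(gen_best)
        if gen_score < best_score:
            best, best_score, stall = gen_best, gen_score, 0
        else:
            stall += 1
        if stall >= patience:  # rule: the fittest member has plateaued
            break
    return best, best_score
```

Because the fittest member is carried forward, the best score never worsens from one generation to the next, so the plateau rule is well defined.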

SLIDE 23

Optimal Model

The GA converged to a model with four changepoints at times 1867, 1910, 1965, and 1967. The minimum MDL score achieved was -309.8570. This segmentation is graphed against the data and appears visually reasonable.

SLIDE 24

Optimal Model Has Four Changepoints!

[Figure: Fitted New Bedford, MA Model. x-axis: Time of observation, 1815 to 1995; y-axis: Annual Precipitation (mm), 500 to 1900.]

SLIDE 25

Simulations — Set I

Mimics the New Bedford data with lognormal distributions:

  • 1000 series of length N = 200 with no trend, seasonality, or changepoints: $\mu_t \equiv 6.8$.
  • AR(1) errors $\{\epsilon_t\}$ with $\phi = 0.2$ and $\sigma^2 = 0.025$.

Table: Empirical proportions of estimated changepoint numbers. The correct value of m is zero.

m     Percent
0     99.0 %
1      0.4 %
2      0.5 %
3+     0.1 %
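
One series from this design can be generated as follows. The sketch assumes Gaussian white noise and a stationary start for the AR(1) errors; the function name `simulate_log_series` is hypothetical. Passing a piecewise-constant `mean_levels` array instead of a constant one would produce series with mean shifts.

```python
import numpy as np

def simulate_log_series(mean_levels, phi=0.2, sigma2=0.025, seed=0):
    """Simulate X_t with log X_t = mu_t + eps_t, where eps_t is Gaussian AR(1).

    mean_levels: array of mu_t values on the log scale, one per time point.
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mean_levels, dtype=float)
    n = len(mu)
    z = rng.normal(0.0, np.sqrt(sigma2), size=n)
    eps = np.empty(n)
    eps[0] = z[0] / np.sqrt(1.0 - phi**2)  # stationary start (an assumption)
    for t in range(1, n):
        eps[t] = phi * eps[t - 1] + z[t]
    return np.exp(mu + eps)

# Set I: no changepoints, constant log-scale mean 6.8, length 200.
series = simulate_log_series(np.full(200, 6.8))
```

Repeating this with 1000 different seeds and running the detection method on each series would give empirical proportions of the kind tabulated above.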

SLIDE 26

Simulations — Set II

$$\mu_t = \begin{cases} 6.8, & 1 \le t \le 49, \\ 7.0, & 50 \le t \le 99, \\ 7.2, & 100 \le t \le 149, \\ 7.4, & 150 \le t \le 200. \end{cases}$$

Table: Empirical proportions of estimated changepoint numbers (m = 3)

m     Percent
0      0.0 %
1      3.6 %
2     28.8 %
3     63.1 %
4      4.3 %
5+     0.2 %

SLIDE 27

Count Detection Histogram

[Figure: Log Normal Simulations, Set II. x-axis: Time Index, 50 to 200; y-axis: Detection Count, 100 to 1000.]

Figure: The detected changepoint times cluster around their true times of 50, 100, and 150.

SLIDE 28

Simulations — Set III

$$\mu_t = \begin{cases} 6.8, & 1 \le t \le 24, \\ 7.0, & 25 \le t \le 74, \\ 6.6, & 75 \le t \le 99, \\ 6.8, & 100 \le t \le 200. \end{cases}$$

Table: Empirical proportions of estimated changepoints (m = 3)

m     Percent
0      0.0 %
1      6.0 %
2     19.5 %
3     69.2 %
4      5.1 %
5+     0.2 %

SLIDE 29

Count Detection Histogram

[Figure: Log Normal Simulations, Set III. x-axis: Time Index, 50 to 200; y-axis: Detection Count, 100 to 1000.]

Figure: The detected changepoint times cluster around their true times of 25, 75, and 100.

SLIDE 30

Thank you!
