SLIDE 1

Bayesian Minimal Description Lengths for Multiple Changepoint Detection

Yingbo Li Dept of Mathematical Sciences, Clemson University

Co-authors: Robert Lund (Clemson University), Anuradha Hewaarachchi (University of Kelaniya)

Yingbo Li (Clemson) Bayesian MDL Changepoint Detection QPRC 2017 1 / 26

SLIDE 2

A Motivating Example

Monthly maximum temperature series in Tuscaloosa, AL

Observed data

[Figure: observed monthly Tmax (°F) versus time, 1900 to 2010.]

SLIDE 6

A Motivating Example

Monthly maximum temperature series in Tuscaloosa, AL

Observed data − sample seasonal mean

[Figure: seasonally adjusted monthly Tmax (°F) versus time, 1900 to 2010; changepoints marked at 1957 Mar and 1990 Jan; five x marks on the time axis.]

Metadata (station history logs): more likely to induce mean shifts
Station relocations: 1921 Nov, 1939 Mar, 1956 Jun, 1987 May
Instrumentation changes: 1956 Nov, 1987 May

SLIDE 9

A Motivating Example

Monthly maximum and minimum temperature series

Observed data − sample seasonal mean

[Figure, two panels, time axis 1900 to 2010. Tmax (°F): changepoints marked at 1957 Mar and 1990 Jan. Tmin (°F): changepoints marked at 1918 Feb, 1957 Jul, and 1990 Jan.]

Tmax and Tmin are likely to shift at the same time.

SLIDE 10

A Brief Review of MDL

MDL for Changepoint Detection in Piecewise AR Series

Automatic MDL by Davis et al. [2006]: a penalized likelihood, with the penalty being the code length (CL) of the parameters:

$$\underbrace{\log(m)}_{m} + \underbrace{(m+1)\log(N)}_{\text{regime lengths}} + \underbrace{\sum_{r=1}^{m+1} \log p_r}_{\text{AR orders}} + \underbrace{\sum_{r=1}^{m+1} \frac{p_r+2}{2} \log N_r}_{\text{AR coefficients}}$$

Automatic MDL rules:

◮ CL of an unbounded positive integer I: log(I)
◮ CL of a positive integer bounded above by U: log(U)
◮ CL of the MLE of a real-valued parameter estimated from N observations: (1/2) log N
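The penalty above can be evaluated directly from the regime lengths and AR orders. The following is a minimal sketch, not the authors' code: the function name `auto_mdl_penalty` is hypothetical, natural logarithms are used, and the convention that log(0) contributes zero (for m = 0 or an AR order of 0) is an assumption made here so the function is defined everywhere.

```python
import math

def auto_mdl_penalty(n_obs, regime_lengths, ar_orders):
    """Code length of the parameters in the penalty above (natural logs).

    regime_lengths: N_r for r = 1..m+1 (must sum to n_obs).
    ar_orders: p_r for r = 1..m+1.
    Assumption: a term log(0) is treated as 0.
    """
    if len(regime_lengths) != len(ar_orders):
        raise ValueError("one AR order is needed per regime")
    if sum(regime_lengths) != n_obs:
        raise ValueError("regime lengths must sum to n_obs")
    m = len(regime_lengths) - 1
    cl_m = math.log(m) if m > 0 else 0.0              # number of changepoints
    cl_lengths = (m + 1) * math.log(n_obs)            # regime lengths
    cl_orders = sum(math.log(p) for p in ar_orders if p > 0)  # AR orders
    cl_params = sum((p + 2) / 2 * math.log(n)         # real-valued parameters
                    for p, n in zip(ar_orders, regime_lengths))
    return cl_m + cl_lengths + cl_orders + cl_params

# One regime of 100 observations with an AR(1) fit.
print(auto_mdl_penalty(100, [100], [1]))
```

Adding a regime always adds terms to the penalty, which is what discourages spurious changepoints.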

SLIDE 12

A Brief Review of MDL

Model Selection Using MDL Principle

Description length [Rissanen, 1989, Hansen and Yu, 2001]: the number of storage units needed to transmit a random dataset.
In model selection, the true model has the smallest MDL.

Two-part MDL:

$$L(X, \theta) = \underbrace{L(X \mid \theta)}_{\text{transmit } X} + \underbrace{L(\theta)}_{\text{transmit } \theta}$$

SLIDE 15

A Brief Review of MDL

Model Selection Using MDL Principle

Description length [Rissanen, 1989, Hansen and Yu, 2001]: the number of storage units needed to transmit a random dataset.
In model selection, the true model has the smallest MDL.

Two-part MDL:

$$L(X, \theta) = \underbrace{-\log f(X \mid \theta)}_{\text{transmit } X} \; \underbrace{- \log \pi(\theta)}_{\text{transmit } \theta}$$

Mixture MDL:

$$L(X) = -\log \underbrace{\int f(X \mid \theta)\, \pi(\theta)\, d\theta}_{\text{marginal likelihood}}$$

◮ If the prior π(θ | τ) depends on a hyperparameter τ, combine with two-part MDL:

$$L(X, \hat\tau) = -\log \int f(X \mid \theta)\, \pi(\theta \mid \hat\tau)\, d\theta - \log \pi(\hat\tau)$$

Closely related to the empirical Bayes marginal likelihood.
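To make the two definitions concrete, here is a small sketch for a toy model that is not in the slides: X_i ~ N(θ, 1) with prior θ ~ N(0, τ²). Both function names are hypothetical, the two-part length plugs in the sample mean for θ, and density values are used in place of discretized code lengths, which is an assumption of this illustration.

```python
import math

def two_part_length(x, tau2):
    """-log f(X | theta_hat) - log pi(theta_hat), with theta_hat the sample mean."""
    n = len(x)
    theta_hat = sum(x) / n
    neg_log_lik = (0.5 * n * math.log(2 * math.pi)
                   + 0.5 * sum((xi - theta_hat) ** 2 for xi in x))
    neg_log_prior = 0.5 * math.log(2 * math.pi * tau2) + theta_hat ** 2 / (2 * tau2)
    return neg_log_lik + neg_log_prior

def mixture_length(x, tau2):
    """-log of the marginal likelihood; theta is integrated out in closed form.

    Marginally X ~ N(0, I + tau2 * 11'), whose determinant is 1 + n * tau2.
    """
    n = len(x)
    s1 = sum(x)
    s2 = sum(xi * xi for xi in x)
    quad = s2 - tau2 * s1 ** 2 / (1 + n * tau2)
    return 0.5 * n * math.log(2 * math.pi) + 0.5 * math.log(1 + n * tau2) + 0.5 * quad

data = [0.8, 1.3, 0.4, 1.1]
print(two_part_length(data, 3.0), mixture_length(data, 3.0))
```

The mixture form needs no point estimate of θ, which is why it pairs naturally with conjugate priors.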

SLIDE 18

Bayesian MDL Method

Bayesian MDL

Observed time series: $X_{1:N} = (X_1, X_2, \ldots, X_N)'$
Suppose m changepoints $1 \le \tau_1 < \tau_2 < \cdots < \tau_m \le N$ partition the timeline into m + 1 distinct regimes (segments).
Time t is in regime r $\iff \tau_{r-1} \le t < \tau_r$
Any time t in {p + 1, p + 2, ..., N} can be a changepoint.

Definition

Denote a multiple changepoint configuration as an indicator vector $\eta = (\eta_{p+1}, \eta_{p+2}, \ldots, \eta_N)'$, such that

$$\eta_t = \begin{cases} 1, & \text{if time } t \text{ is a changepoint} \\ 0, & \text{if time } t \text{ is not a changepoint} \end{cases}$$

Number of changepoints in η: $m = \sum_{t=p+1}^{N} \eta_t$
Total number of models: $2^{N-p}$
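The indicator encoding can be sketched in a few lines. The helper names below are hypothetical, and the sketch takes τ_0 to be the first time point so that regime numbering starts at 1; that convention is an assumption consistent with the regime rule above.

```python
def changepoints_to_eta(taus, n_obs, p):
    """Indicator vector (eta_{p+1}, ..., eta_N) for changepoint times taus."""
    tau_set = set(taus)
    if any(t < p + 1 or t > n_obs for t in tau_set):
        raise ValueError("changepoints must lie in {p+1, ..., N}")
    return [1 if t in tau_set else 0 for t in range(p + 1, n_obs + 1)]

def regime_of(t, taus):
    """Regime index r with tau_{r-1} <= t < tau_r (regimes numbered from 1)."""
    return 1 + sum(1 for tau in taus if tau <= t)

eta = changepoints_to_eta([5, 8], n_obs=10, p=2)
print(eta)            # indicator entries for times 3, ..., 10
print(sum(eta))       # number of changepoints m
print(2 ** (10 - 2))  # size of the model space, 2^(N-p)
```

Even for this tiny series the model space has 256 configurations, which is why exhaustive search is not feasible for realistic N.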

SLIDE 20

Bayesian MDL Method

Prior distribution π(η): Beta-Binomial

If time t is not in metadata: $\eta_t \overset{iid}{\sim} \text{Bernoulli}(\rho^{(1)})$, $\rho^{(1)} \sim \text{Beta}(a, b^{(1)})$
If time t is in metadata: $\eta_t \overset{iid}{\sim} \text{Bernoulli}(\rho^{(2)})$, $\rho^{(2)} \sim \text{Beta}(a, b^{(2)})$

π(η) has a closed form:

$$\pi(\eta) = \prod_{k=1}^{2} \int_0^1 \Big[ \prod_{t^{(k)}} \pi\big(\eta_{t^{(k)}} \mid \rho^{(k)}\big) \Big]\, \pi\big(\rho^{(k)}\big)\, d\rho^{(k)}$$

SLIDE 21

Bayesian MDL Method

Prior distribution π(η): Beta-Binomial

If time t is not in metadata: $\eta_t \overset{iid}{\sim} \text{Bernoulli}(\rho^{(1)})$, $\rho^{(1)} \sim \text{Beta}(a, b^{(1)})$
If time t is in metadata: $\eta_t \overset{iid}{\sim} \text{Bernoulli}(\rho^{(2)})$, $\rho^{(2)} \sim \text{Beta}(a, b^{(2)})$

π(η) has a closed form:

$$\pi(\eta) \propto \prod_{k=1}^{2} \Gamma\big(a + m^{(k)}\big)\, \Gamma\big(b^{(k)} + N^{(k)} - m^{(k)}\big)$$

Number of:

◮ undocumented times $N^{(1)}$, undocumented changepoints $m^{(1)}$
◮ documented times $N^{(2)}$, documented changepoints $m^{(2)}$
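The closed form follows from integrating each Bernoulli product against its Beta density. A minimal sketch of the normalized log prior is given below; the function name is hypothetical, and the default values of `a` and `b` are the defaults quoted on the parameter elicitation slide. The terms beyond the two Gamma factors are the normalizing constants hidden by the proportionality sign.

```python
import math

def log_prior_eta(m_counts, n_counts, a=1.0, b=(239.0, 47.0)):
    """Normalized log pi(eta) under the two-group Beta-Binomial prior.

    m_counts = (m1, m2): changepoints at undocumented / documented times.
    n_counts = (N1, N2): number of undocumented / documented times.
    """
    total = 0.0
    for m_k, n_k, b_k in zip(m_counts, n_counts, b):
        # Kernel shown in the closed form above.
        total += math.lgamma(a + m_k) + math.lgamma(b_k + n_k - m_k)
        # Normalizing constant, which does not depend on eta.
        total += (math.lgamma(a + b_k) - math.lgamma(a) - math.lgamma(b_k)
                  - math.lgamma(a + b_k + n_k))
    return total

# A single undocumented time being a changepoint has prior probability a / (a + b1).
print(math.exp(log_prior_eta((1, 0), (1, 0))))
```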

SLIDE 23

Bayesian MDL Method

Parameter elicitation

Metadata times are more likely to be changepoints:

$$E\big[\rho^{(1)}\big] = \frac{a}{a + b^{(1)}} < \frac{a}{a + b^{(2)}} = E\big[\rho^{(2)}\big]$$

US temperature: 6 changepoints per century [Mitchell, 1953], that is, 0.005 changepoints per month.
Default parameters: a = 1,
$b^{(1)} = 239 \implies E[\rho^{(1)}] = 0.004$
$b^{(2)} = 47 \implies E[\rho^{(2)}] = 0.021$
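The default values can be checked with a few lines of arithmetic. This is only a sanity check of the numbers on the slide; the helper name `beta_mean` is hypothetical.

```python
def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

rate_per_month = 6 / (100 * 12)          # 6 changepoints per century
print(rate_per_month)                    # monthly changepoint rate
print(round(beta_mean(1, 239), 3))       # undocumented times
print(round(beta_mean(1, 47), 3))        # documented (metadata) times
```

Under these defaults a documented time is five times as likely a priori to be a changepoint as an undocumented one, since (1/48) / (1/240) = 5.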

SLIDE 27

Bayesian MDL Method

Likelihood: multivariate normal

Under a specific model η,

$$X_t = s_{v(t)} + \mu_{r(t)} + \epsilon_t, \quad t = 1, 2, \ldots, N.$$

v(t): time t is in season v(t)
r(t): time t is in regime r(t)
$s_v$: seasonal mean, v = 1, 2, ..., 12
$\mu_r$: regime mean, r = 1, 2, ..., m + 1
$\{\epsilon_t\}$: Gaussian AR(p) errors, with white noise variance $\sigma^2$ and autoregression coefficients $\phi_1, \ldots, \phi_p$

For identifiability, $\mu_1 = 0$.

Likelihood function: $f\big(X_{(p+1):N} \mid \mu, s, \sigma^2, \phi, \eta, X_{1:p}\big)$
Using mixture MDL on µ, and two-part MDL on the rest.
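The model is easy to simulate, which helps when checking a detection method on synthetic data. The sketch below is an illustration under stated assumptions rather than the authors' simulation code: the function name is hypothetical, the season of time t is taken to be (t - 1) mod 12, and the AR errors are started from zeros and run through a burn-in period.

```python
import random

def simulate_series(n_obs, taus, mu, seasonal, phi, sigma, seed=0, burn_in=200):
    """Draw X_t = s_{v(t)} + mu_{r(t)} + eps_t with Gaussian AR(p) errors.

    taus: changepoint times; mu: regime means (mu[0] = 0 for identifiability).
    """
    rng = random.Random(seed)
    p = len(phi)
    eps = [0.0] * p                       # zero start-up values (assumption)
    for _ in range(burn_in + n_obs):
        ar_part = sum(phi[j] * eps[-1 - j] for j in range(p))
        eps.append(ar_part + rng.gauss(0.0, sigma))
    eps = eps[-n_obs:]
    series = []
    for t in range(1, n_obs + 1):
        v = (t - 1) % len(seasonal)               # season index (assumption)
        r = sum(1 for tau in taus if tau <= t)    # zero-based regime index
        series.append(seasonal[v] + mu[r] + eps[t - 1])
    return series

# Settings in the spirit of the simulation slides: sigma = 1, shift size 1.5.
x = simulate_series(600, [150, 300, 450], [0.0, 1.5, 3.0, 4.5],
                    [0.0] * 12, [0.2, 0.1, 0.05], 1.0, seed=1)
print(len(x))
```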

SLIDE 31

Bayesian MDL Method

BMDL: has a closed form

$$\text{BMDL}(\eta) = \underbrace{L(X \mid \hat s, \hat\sigma^2, \hat\phi, \eta)}_{\text{mixture MDL}} + \underbrace{L(\hat s, \hat\sigma^2, \hat\phi \mid \eta)}_{\text{two-part MDL}} + \underbrace{L(\eta)}_{\text{two-part MDL}}$$

$$= -\log f(X \mid \hat s, \hat\sigma^2, \hat\phi, \eta) + \frac{p+13}{2} \log(N - p) - \log \pi(\eta)$$

Prior distribution on $\mu = (\mu_2, \mu_3, \ldots, \mu_{m+1})'$: $\mu \mid \sigma^2, \eta \sim N(0, \nu\sigma^2 I_m)$.

◮ ν is pre-specified; default ν = 5.
◮ Normal-normal conjugacy: marginal likelihood has a closed form

$$f(X \mid s, \sigma^2, \phi, \eta) = \int f(X \mid \mu, s, \sigma^2, \phi, \eta)\, \pi(\mu \mid \sigma^2, \eta)\, d\mu$$

$\hat s, \hat\sigma^2 = \arg\max f(X \mid s, \sigma^2, \phi, \eta)$ have closed forms.
$\hat\phi$: Yule-Walker estimator.

SLIDE 32

Bayesian MDL Method

BMDL: has a closed form

$$\text{BMDL}(\eta) = \underbrace{L(X \mid \hat s, \hat\sigma^2, \hat\phi, \eta)}_{\text{mixture MDL}} + \underbrace{L(\hat s, \hat\sigma^2, \hat\phi \mid \eta)}_{\text{two-part MDL}} + \underbrace{L(\eta)}_{\text{two-part MDL}}$$

$$= -\log f(X \mid \hat s, \hat\sigma^2, \hat\phi, \eta) + \text{constant} - \log \pi(\eta)$$

The middle term, $\frac{p+13}{2}\log(N-p)$, is the same for every η, so it acts as a constant when models are compared.

SLIDE 33

Bayesian MDL Method

Bayesian MDL formula

BMDL (for model η)

$$\text{BMDL}(\eta) = \frac{N-p}{2} \log\big(\hat\sigma^2\big) + \frac{m}{2} \log(\nu) + \frac{1}{2} \log\left| D'D + \frac{I_m}{\nu} \right| - \sum_{k=1}^{2} \log\Big\{ \Gamma\big(a + m^{(k)}\big)\, \Gamma\big(b^{(k)} + N^{(k)} - m^{(k)}\big) \Big\}.$$
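Once its ingredients are available, the formula is a few lines of arithmetic. The sketch below evaluates it with the standard library only. The slides do not define the matrix D or give the expression for the variance estimate, so both are treated here as precomputed inputs; the function names and argument layout are hypothetical, and the defaults for ν, a, and b are the ones stated in the slides.

```python
import math

def log_det_spd(g):
    """Log-determinant of a symmetric positive definite matrix via Cholesky."""
    k = len(g)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = g[i][j] - sum(low[i][q] * low[j][q] for q in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(k))

def bmdl(sigma2_hat, d_rows, n_eff, m_counts, n_counts,
         nu=5.0, a=1.0, b=(239.0, 47.0)):
    """Evaluate the BMDL formula above.

    sigma2_hat: variance estimate (assumed precomputed).
    d_rows: rows of the matrix D, each of length m (assumed precomputed).
    n_eff: N - p.  m_counts, n_counts: (m1, m2) and (N1, N2).
    """
    m = sum(m_counts)
    # Gram matrix D'D + I_m / nu.
    gram = [[sum(row[i] * row[j] for row in d_rows) + (1.0 / nu if i == j else 0.0)
             for j in range(m)] for i in range(m)]
    prior_kernel = sum(math.lgamma(a + mk) + math.lgamma(bk + nk - mk)
                       for mk, nk, bk in zip(m_counts, n_counts, b))
    return (n_eff / 2 * math.log(sigma2_hat) + m / 2 * math.log(nu)
            + 0.5 * log_det_spd(gram) - prior_kernel)

# Toy call: one changepoint column of ones over four effective observations.
print(bmdl(2.0, [[1.0]] * 4, 4, (1, 0), (4, 0)))
```

Because every term is closed form, a single model evaluation is cheap, which matters for the stochastic search described next in the deck.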

SLIDE 34

Bayesian MDL Method

Computation: stochastic model search using MCMC

Empirical Bayes posterior probability of model η:

$$p_{EB}(\eta \mid X) \propto \int f(X \mid \mu, \hat s, \hat\sigma^2, \hat\phi, \eta)\, \pi(\mu \mid \hat\sigma^2, \eta)\, d\mu \cdot \pi(\eta)$$

BMDL is closely related to EB:

$$\text{BMDL}(\eta) = -\log \int f(X \mid \mu, \hat s, \hat\sigma^2, \hat\phi, \eta)\, \pi(\mu \mid \hat\sigma^2, \eta)\, d\mu - \log \pi(\eta)$$

SLIDE 37

Bayesian MDL Method

Computation: stochastic model search using MCMC

Empirical Bayes posterior probability of model η:

$$p_{EB}(\eta \mid X) \propto \int f(X \mid \mu, \hat s, \hat\sigma^2, \hat\phi, \eta)\, \pi(\mu \mid \hat\sigma^2, \eta)\, d\mu \cdot \pi(\eta)$$

BMDL is closely related to EB. Up to an additive constant that does not depend on η,

$$\text{BMDL}(\eta) = -\log p_{EB}(\eta \mid X)$$

Borrow stochastic model search algorithms from the Bayesian model selection literature: Metropolis-Hastings (MCMC).

$$\eta^{[0]} \to \eta^{[1]} \to \eta^{[2]} \to \cdots \to \eta^{[t]} \to \cdots$$

Metropolis-Hastings algorithm [George and McCulloch, 1997]
R package BayesMDL.
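A stochastic search of this kind can be sketched with a single-flip Metropolis-Hastings sampler over binary vectors. This is a generic illustration and not the algorithm implemented in the BayesMDL package: the function name `mh_search`, the single-site flip proposal, and the toy score used in the example are all assumptions. The sampler targets a distribution proportional to exp(-score), so a BMDL-type score can be plugged in directly.

```python
import math
import random

def mh_search(score, n_sites, n_iter=2000, seed=0):
    """Single-flip Metropolis-Hastings targeting exp(-score(eta)).

    Returns the lowest-score configuration visited and its score.
    """
    rng = random.Random(seed)
    eta = [0] * n_sites                   # start from the no-changepoint model
    current = score(eta)
    best, best_score = eta[:], current
    for _ in range(n_iter):
        j = rng.randrange(n_sites)        # symmetric proposal: flip one site
        proposal = eta[:]
        proposal[j] = 1 - proposal[j]
        new = score(proposal)
        # Accept with probability min(1, exp(current - new)).
        if new <= current or rng.random() < math.exp(current - new):
            eta, current = proposal, new
            if current < best_score:
                best, best_score = eta[:], current
    return best, best_score

# Toy score (hypothetical): 3 units of cost per site that disagrees with a target.
target = [0, 0, 1, 0, 0, 1, 0, 0]
toy_score = lambda eta: 3 * sum(e != t for e, t in zip(eta, target))
print(mh_search(toy_score, len(target), seed=1))
```

Tracking the best model visited, rather than only the final state, matches the use of the chain as a search device for the minimizer of the score.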

SLIDE 38

Bayesian MDL Simulation examples

Scenario 1: monotonic shifts µ = (0, ∆, 2∆, 3∆)′

p = 3, φ = (0.2, 0.1, 0.05)′, ∆/σ = 1.5.

[Figure: a sample simulated series, time axis 100 to 600, with four x marks on the time axis.]

SLIDE 40

Bayesian MDL Simulation examples

Scenario 1: monotonic shifts µ = (0, ∆, 2∆, 3∆)′

p = 3, φ = (0.2, 0.1, 0.05)′, ∆/σ = 1.5.

[Figure: a sample simulated series (minus seasonal mean), time axis 100 to 600, with four x marks on the time axis.]

[Figure: detection percentage by time, without metadata. Labeled peaks: 36.1, 41.4, 37.7.]

SLIDE 41

Bayesian MDL Simulation examples

Scenario 1: monotonic shifts µ = (0, ∆, 2∆, 3∆)′

p = 3, φ = (0.2, 0.1, 0.05)′, ∆/σ = 1.5.

[Figure: a sample simulated series (minus seasonal mean), time axis 100 to 600, with four x marks on the time axis.]

[Figure: detection percentage by time, with metadata. Labeled peaks: 76.2, 41.1, 37.4.]

SLIDE 42

Bayesian MDL Simulation examples

Scenario 2: non-monotonic shifts µ = (0, −∆, ∆, 0)′

p = 3, φ = (0.2, 0.1, 0.05)′, ∆/σ = 1.5.

[Figure: a sample simulated series, time axis 100 to 600, with four x marks on the time axis.]

SLIDE 44

Bayesian MDL Simulation examples

Scenario 2: non-monotonic shifts µ = (0, −∆, ∆, 0)′

p = 3, φ = (0.2, 0.1, 0.05)′, ∆/σ = 1.5.

[Figure: a sample simulated series (minus seasonal mean), time axis 100 to 600, with four x marks on the time axis.]

[Figure: detection percentage by time, without metadata. Labeled peaks: 36.7, 84.3, 39.2.]

SLIDE 45

Bayesian MDL Simulation examples

Scenario 2: non-monotonic shifts µ = (0, −∆, ∆, 0)′

p = 3, φ = (0.2, 0.1, 0.05)′, ∆/σ = 1.5.

[Figure: a sample simulated series (minus seasonal mean), time axis 100 to 600, with four x marks on the time axis.]

[Figure: detection percentage by time, with metadata. Labeled peaks: 77.3, 84.6, 38.2.]

SLIDE 48

Bayesian MDL Asymptotic consistency

Infill asymptotics

Literature: Davis and Yau [2013], Du et al. [2016]
Relative changepoint configuration: $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_m)'$
Scale time to [0, 1] by mapping time t to t/N.

$$\lambda: 0 < \lambda_1 < \cdots < \lambda_r < \cdots < \lambda_m < 1 \quad \longleftrightarrow \quad \eta: 0 < \eta_1 < \cdots < \eta_r < \cdots < \eta_m < N, \qquad \eta_r = \lfloor \lambda_r N \rfloor$$

True model $\lambda^0$ has $m^0$ changepoints. The true number of changepoints $m^0$ is unknown.
Consider all relative changepoint configurations in

$$\Lambda = \Big\{ \lambda : 0 \le m \le M,\ \min_{r=1,2,\ldots,m+1} (\lambda_r - \lambda_{r-1}) \ge d \Big\}$$

◮ M: a large integer, fixed, $M > m^0$
◮ d: a very small positive constant
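The mapping between relative and absolute changepoint locations, and membership in the candidate set Λ, can be sketched as follows. The helper names are hypothetical, and the boundary convention λ_0 = 0 and λ_{m+1} = 1 is an assumption; the slide does not state it, but the spacing constraint over r = 1, ..., m + 1 needs some such convention.

```python
import math

def to_absolute(lams, n_obs):
    """Map relative locations lambda_r to times floor(lambda_r * N)."""
    return [math.floor(lam * n_obs) for lam in lams]

def in_candidate_set(lams, max_m, d):
    """Check m <= M and that consecutive spacings are at least d.

    Assumes lambda_0 = 0 and lambda_{m+1} = 1 as boundary values.
    """
    if len(lams) > max_m:
        return False
    grid = [0.0] + list(lams) + [1.0]
    return all(right - left >= d for left, right in zip(grid, grid[1:]))

print(to_absolute([0.25, 0.5, 0.75], 600))
print(in_candidate_set([0.25, 0.5, 0.75], max_m=10, d=0.05))
print(in_candidate_set([0.25, 0.26], max_m=10, d=0.05))
```

Under infill asymptotics the relative locations stay fixed while N grows, so every regime length grows in proportion to N.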

SLIDE 49

Bayesian MDL Asymptotic consistency

Asymptotic selection consistency

The estimated relative changepoint model:

$$\hat\lambda_N = \arg\min_{\lambda \in \Lambda} \text{BMDL}(\lambda)$$

The estimated number of changepoints is $\hat m_N = |\hat\lambda_N|$.

Theorem (Consistency for changepoint model estimation)

As $N \to \infty$, we have $\hat m_N \xrightarrow{P} m^0$ and $\hat\lambda_N \xrightarrow{P} \lambda^0$. Furthermore, for each $r = 1, \ldots, m^0$,

$$\big| \hat\lambda_r - \lambda^0_r \big| = O_P\Big(\frac{1}{N}\Big).$$

SLIDE 50

Bayesian MDL Asymptotic consistency

Parameter estimation

Under the estimated changepoint model $\hat\lambda$:
Yule-Walker estimator for φ.
Optimizers of BMDL for $\sigma^2$ and s.
Conditional posterior mean for µ.

Theorem (Consistency for parameter estimation)

As $N \to \infty$, all parameter estimators converge to the true values:

$$\hat\mu_N \xrightarrow{P} \mu^0, \quad \hat s_N \xrightarrow{P} s^0, \quad \hat\sigma^2_N \xrightarrow{P} (\sigma^2)^0, \quad \hat\phi_N \xrightarrow{P} \phi^0.$$

SLIDE 51

Generalization of BMDL: bivariate case

Outline

1 A Motivating Example
2 A Brief Review of MDL
3 Bayesian MDL
   Method
   Simulation examples
   Asymptotic consistency
4 Generalization of BMDL: bivariate case
   Method
   Simulation examples
5 Tuscaloosa data

SLIDE 54

Generalization of BMDL: bivariate case Method

Joint detection of Tmax and Tmin

Any time in t = {p + 1, p + 2, . . . , N} can be a changepoint, for either Tmax or Tmin, or both. If both, can shift in the same or opposite directions.

Definition (bivariate changepoint model η)

$$\eta = (\eta_{p+1}, \eta_{p+2}, \ldots, \eta_N)' \in \mathbb{R}^{(N-p) \times 2}, \qquad \eta_t = \begin{cases} (1, 1)', & \text{if time } t \text{ is a concurrent changepoint} \\ (1, 0)', & \text{if time } t \text{ is only a changepoint for Tmax} \\ (0, 1)', & \text{if time } t \text{ is only a changepoint for Tmin} \\ (0, 0)', & \text{if time } t \text{ is not a changepoint} \end{cases}$$

Number of changepoints in η: $m_1$ (Tmax), $m_2$ (Tmin)
Dirichlet-Multinomial prior: closed form π(η).
Hyperparameter choices: encourage concurrent shifts.
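The bivariate encoding extends the univariate indicator vector to one pair per time point. The sketch below uses hypothetical helper names, and the example changepoint times are the ones quoted in the bivariate simulation settings of this deck.

```python
def bivariate_eta(taus_max, taus_min, n_obs, p):
    """Rows (eta_t for Tmax, eta_t for Tmin) for t = p+1, ..., N."""
    set_max, set_min = set(taus_max), set(taus_min)
    return [(int(t in set_max), int(t in set_min))
            for t in range(p + 1, n_obs + 1)]

def summarize(eta):
    """Return (m1, m2, number of concurrent changepoints)."""
    m1 = sum(a for a, _ in eta)
    m2 = sum(b for _, b in eta)
    concurrent = sum(1 for a, b in eta if a == 1 and b == 1)
    return m1, m2, concurrent

eta2 = bivariate_eta([150, 300, 450], [150, 300, 375], n_obs=600, p=3)
print(len(eta2), summarize(eta2))
```

Each time point falls in one of four categories, which is the structure that a Dirichlet-Multinomial prior places probabilities on.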

SLIDE 57

Generalization of BMDL: bivariate case Method

Bivariate model

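Under a specific model η,

$$\begin{pmatrix} X_{t,1} \\ X_{t,2} \end{pmatrix} = \begin{pmatrix} s_{v,1} \\ s_{v,2} \end{pmatrix} + \begin{pmatrix} \mu_{r_1,1} \\ \mu_{r_2,2} \end{pmatrix} + \begin{pmatrix} \epsilon_{t,1} \\ \epsilon_{t,2} \end{pmatrix}, \quad t = 1, 2, \ldots, N.$$

v: time t is in season v
$r_i$: time t is in regime $r_i$, where i = 1 (Tmax), 2 (Tmin)
$s_{v,i}$: seasonal mean; $s_1 \in \mathbb{R}^{12}$, $s_2 \in \mathbb{R}^{12}$
$\mu_{r_i,i}$: regime mean; $\mu_1 \in \mathbb{R}^{m_1}$, $\mu_2 \in \mathbb{R}^{m_2}$
$\{\epsilon_t\}$: Gaussian VAR(p) errors, with white noise covariance $\Sigma \in \mathbb{R}^{2 \times 2}$ and autoregression coefficients $\Phi_1, \ldots, \Phi_p \in \mathbb{R}^{2 \times 2}$.

Likelihood: normal
Prior $\pi(\mu_1, \mu_2)$: normal
BMDL has a closed form.
$\hat s_1, \hat s_2$: closed forms
$\hat\Sigma, \hat\Phi_1, \ldots, \hat\Phi_p$: Yule-Walker estimators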

SLIDE 59

Generalization of BMDL: bivariate case Simulation examples

Combining simulated data Scenario 1 and 2

Two concurrent shifts: 150 (↑↓), 300 (↑↑).
Tmax (Series 1): shifts at 150, 300, 450. $\mu_1 = (0, \Delta, 2\Delta, 3\Delta)'$
Tmin (Series 2): shifts at 150, 300, 375. $\mu_2 = (0, -\Delta, \Delta, 0)'$

VAR parameters:

$$\Sigma = \begin{pmatrix} 9 & 2 \\ 2 & 9 \end{pmatrix}, \qquad \Delta / 3 = 1.5$$

$$p = 3, \quad \Phi_1 = \begin{pmatrix} 0.2 & 0.02 \\ 0.02 & 0.2 \end{pmatrix}, \quad \Phi_2 = \begin{pmatrix} 0.1 & 0.01 \\ 0.01 & 0.1 \end{pmatrix}, \quad \Phi_3 = \begin{pmatrix} 0.05 & 0.005 \\ 0.005 & 0.05 \end{pmatrix}$$
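Bivariate VAR(3) errors with these parameters can be generated with the standard library alone. The following is a sketch under stated assumptions, not the authors' simulation code: the function names are hypothetical, correlated noise is produced through a hand-written 2 x 2 Cholesky factor, and the recursion starts from zeros with a burn-in period.

```python
import math
import random

def cholesky_2x2(sigma):
    """Lower-triangular factor L with L L' = sigma for a 2 x 2 covariance."""
    l11 = math.sqrt(sigma[0][0])
    l21 = sigma[1][0] / l11
    l22 = math.sqrt(sigma[1][1] - l21 ** 2)
    return [[l11, 0.0], [l21, l22]]

def simulate_var(phis, sigma, n_obs, seed=0, burn_in=200):
    """Draw n_obs bivariate Gaussian VAR(p) errors as (e1, e2) pairs."""
    rng = random.Random(seed)
    low = cholesky_2x2(sigma)
    hist = [(0.0, 0.0)] * len(phis)       # zero start-up values (assumption)
    for _ in range(burn_in + n_obs):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        e1 = low[0][0] * z1               # correlated white noise
        e2 = low[1][0] * z1 + low[1][1] * z2
        for j, phi in enumerate(phis):    # add the autoregressive part
            prev = hist[-1 - j]
            e1 += phi[0][0] * prev[0] + phi[0][1] * prev[1]
            e2 += phi[1][0] * prev[0] + phi[1][1] * prev[1]
        hist.append((e1, e2))
    return hist[-n_obs:]

SIGMA = [[9.0, 2.0], [2.0, 9.0]]
PHIS = [[[0.2, 0.02], [0.02, 0.2]],
        [[0.1, 0.01], [0.01, 0.1]],
        [[0.05, 0.005], [0.005, 0.05]]]
errors = simulate_var(PHIS, SIGMA, 600, seed=1)
print(len(errors))
```

Adding the seasonal and regime means for each series to these errors gives a simulated bivariate dataset of the kind described above.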

SLIDE 60

Generalization of BMDL: bivariate case Simulation examples

Combining simulated data Scenario 1 and 2

[Figure, two panels, time axis 100 to 600: a sample simulated series, Tmax; a sample simulated series, Tmin.]

SLIDE 61

Generalization of BMDL: bivariate case Simulation examples

Combining simulated data Scenario 1 and 2

[Figure, two panels, time axis 100 to 600: a sample simulated series (minus seasonal mean), Tmax; a sample simulated series (minus seasonal mean), Tmin.]

SLIDE 62

Generalization of BMDL: bivariate case Simulation examples

Combining simulated data Scenario 1 and 2

[Figure, two panels: detection percentage using univariate BMDL. Tmax, labeled peaks: 36.1, 41.4, 37.7. Tmin, labeled peaks: 36.7, 84.3, 39.2.]

SLIDE 63

Generalization of BMDL: bivariate case Simulation examples

Combining simulated data Scenario 1 and 2

[Figure, two panels: detection percentage using bivariate BMDL. Tmax, labeled peaks: 66.7, 82.9, 33.9, 10.8. Tmin, labeled peaks: 66.4, 83.4, 34.2, 21.3.]

SLIDE 65

Tuscaloosa data

Univariate BMDL: no metadata

[Figure, two panels, time axis 1900 to 2010, five x marks on each time axis. Tmax (°F): changepoints at 1957 Mar and 1990 Jan. Tmin (°F): changepoints at 1918 Feb, 1957 Jul, and 1990 Jan.]

SLIDE 66

Tuscaloosa data

Univariate BMDL: with metadata

[Figure, two panels, time axis 1900 to 2010, five x marks on each time axis. Tmax (°F): changepoints at 1956 Nov and 1987 May. Tmin (°F): changepoints at 1921 Nov, 1956 Jun, and 1987 May.]

SLIDE 67

Tuscaloosa data

Bivariate BMDL: with metadata

[Figure, two panels, time axis 1900 to 2010, five x marks on each time axis. Tmax (°F): changepoints at 1921 Nov, 1956 Jun, and 1987 May. Tmin (°F): changepoints at 1921 Nov, 1956 Jun, and 1987 May.]

SLIDE 68

Tuscaloosa data

Reference I

Richard A. Davis and Chun Yip Yau. Consistency of minimum description length model selection for piecewise stationary time series models. Electronic Journal of Statistics, 7:381–411, 2013.

Richard A. Davis, Thomas C. M. Lee, and Gabriel A. Rodriguez-Yam. Structural break estimation for nonstationary time series models. Journal of the American Statistical Association, 101(473):223–239, 2006.

Chao Du, Chu-Lan Michael Kao, and S. C. Kou. Stepwise signal extraction via marginal likelihood. Journal of the American Statistical Association, 111(513):314–330, 2016.

Edward I. George and Robert E. McCulloch. Approaches for Bayesian variable selection. Statistica Sinica, 7:339–373, 1997.

Mark H. Hansen and Bin Yu. Model selection and the principle of minimum description length. Journal of the American Statistical Association, 96(454):746–774, 2001.

J. Murray Mitchell. On the causes of instrumentally observed secular temperature trends. Journal of Meteorology, 10:244–261, 1953.

Jorma Rissanen. Stochastic Complexity in Statistical Inquiry, volume 511. World Scientific, Singapore, 1989.

Claude Elwood Shannon. A mathematical theory of communication. Bell System Technical Journal, 27(143-157):623, 1948.