
SLIDE 1

An Exploratory Segmentation Method for Time Series

Christian Derquenne EDF R&D

SLIDE 2

COMPSTAT 2010

24th September 2010

Outline

→ Issues and motivations
→ The proposed method
→ Application: a simulated case
→ Contributions, applications and further research

SLIDE 3


Issues and motivations

Decomposition of time series → trend, seasonality, volatility and noise, more or less regular depending on the application case

Evolution of electricity consumption over 50 years ⇒ regular phenomena ⇒ short-term forecasting model (MAPE < 1.5%)

Evolution of financial series (CAC40, S&P 500, …)

⇒ Trend and seasonality occur less regularly and less frequently
⇒ Volatility and irregularity
⇒ Behavior breaks could characterize the series (peaks, level breaks, trend changes, volatility)
⇒ Modeling such data is very delicate; forecasting these series can be close to a utopian goal


SLIDE 4

Issues and motivations

Interest in detecting behavior breakpoints

→ Building contiguous segments (segmentation)

→ Detecting behavior breakpoints

→ Achieving stationarity of time series with a segmentation model

→ Building symbolic curves to cluster series

→ Modeling multivariate time series

Potential applications

→ Economics, finance, human sequences, meteorology, energy management, etc.


SLIDE 5

Issues and motivations

Some examples of methods

→ Exploring the segmentation space for the assessment of multiple change-point models [Guédon, Y. (2008)]

→ Inference on models with multiple breakpoints in multivariate time series, notably to select the optimal number of breakpoints [Lavielle, M. et al. (2006)]

→ Sequential change-point detection when the pre- and post-change parameters are unknown [Lai, TL. et al. (2009)]

Common point of these methods

→ Use of dynamic programming to decrease the computational complexity of segmentation (the total number of possible segmentations is $2^{T-1}$)

→ Complexity is generally in $O(ST^2)$ in time and in $O(ST)$ for the linear clustered space, but also $O(T^2)$ and $O(MT^2)$

where T = length of the series; S = number of segments; M = number of series
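To make the complexity figures above concrete, here is a minimal Python sketch (NumPy assumed available). The hypothetical `n_segmentations` brute-forces the $2^{T-1}$ count for small T (each of the T − 1 gaps between consecutive points either is a breakpoint or is not), and the hypothetical `optimal_segmentation` is a generic least-squares dynamic program for a change in mean with constant variance, running in $O(ST^2)$ time. It is not the specific algorithm of any of the cited papers.

```python
import itertools
import numpy as np

def n_segmentations(T):
    """Brute-force count: each of the T-1 gaps between consecutive points is a breakpoint or not."""
    return sum(1 for _ in itertools.product((0, 1), repeat=T - 1))

def optimal_segmentation(y, S):
    """Exact least-squares split of y into S contiguous constant-mean segments, O(S T^2) time."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    c1 = np.concatenate(([0.0], np.cumsum(y)))       # prefix sums
    c2 = np.concatenate(([0.0], np.cumsum(y * y)))   # prefix sums of squares

    def cost(a, b):
        # Residual sum of squares of y[a:b] around its own mean, in O(1)
        s = c1[b] - c1[a]
        return (c2[b] - c2[a]) - s * s / (b - a)

    best = np.full((S + 1, T + 1), np.inf)  # best[s, b]: min cost of y[:b] cut into s segments
    back = np.zeros((S + 1, T + 1), dtype=int)
    best[0, 0] = 0.0
    for s in range(1, S + 1):
        for b in range(s, T + 1):
            for a in range(s - 1, b):       # the last segment is y[a:b]
                c = best[s - 1, a] + cost(a, b)
                if c < best[s, b]:
                    best[s, b], back[s, b] = c, a
    bounds, b = [], T
    for s in range(S, 0, -1):               # backtrack the breakpoints
        a = int(back[s, b])
        bounds.append((a, b))
        b = a
    return bounds[::-1]
```

For example, `optimal_segmentation([0]*5 + [10]*5, 2)` returns `[(0, 5), (5, 10)]`, the half-open index ranges of the two constant segments. The triple loop over s, b and a is exactly where the $O(ST^2)$ cost comes from.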


SLIDE 6

Issues and motivations

Three problems studied by these methods

(i) Change in mean with constant variance
(ii) Change in variance with constant mean
(iii) Change in the overall distribution of the time series, without change in level, in dispersion or in the distribution of errors

The proposed method

→ Detection of increasing or decreasing trends [Perron & al., 2008]

→ Reduction of the computational complexity to O(KT), where K is the number of smoothing degrees tried, which is generally small compared with T

→ Proposal of several segmentation solutions containing segments with increasing or decreasing trends, constant levels and different standard deviations


SLIDE 7


Outline

→ Issues and motivations
→ The proposed method
→ Application: a simulated case
→ Contributions, applications and further research


SLIDE 8

Proposed method

Let $(Y_t)_{t=1,\dots,T}$ be a time series. We assume that it follows a heteroskedastic linear model (or variance-components model) [Rao & al., 1988, Searle & al., 1992]:

$$Y_t = \sum_{s=1}^{S} \left[\beta^{(s)} + \beta_1^{(s)} t + \sigma_s \varepsilon_t\right] \mathbb{1}_{[t \in \tau_s]} \qquad (1)$$

where $\beta^{(s)}$, $\beta_1^{(s)}$ and $\sigma_s > 0$ are respectively the level, trend and standard-deviation parameters of segment $\tau_s$, and $\varepsilon_t$ follows a $N(0,1)$ distribution.

$T_s = \mathrm{card}(\tau_s)$ with $\sum_{s=1}^{S} T_s = T$, so there are $3S$ parameters to estimate, plus the number $S$ of segments.

Inference: OLS; ML; REML

→ The three estimators give the same solutions for $\beta^{(s)}$ and $\beta_1^{(s)}$
→ ML and REML estimate $\sigma_s^2$ directly
→ Only REML provides an unbiased estimator of $\sigma_s^2$
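Once a segmentation is fixed, model (1) can be estimated segment by segment, because the segments share no parameters. The minimal sketch below (NumPy assumed; `fit_model_1` is a hypothetical helper that receives the segments as half-open index ranges) illustrates the three statements above: the OLS coefficients are also the ML and REML estimates of $\beta^{(s)}$ and $\beta_1^{(s)}$, the ML variance divides the residual sum of squares by $T_s$, and the REML variance divides it by $T_s - 2$ (two fixed effects per segment), which is what makes it unbiased.

```python
import numpy as np

def fit_model_1(y, bounds):
    """Estimate level, trend and variance of each segment of model (1) for a given segmentation.

    bounds: half-open (a, b) index pairs; segment tau_s covers times t = a+1, ..., b.
    Each segment needs T_s > 2 points.
    """
    y = np.asarray(y, dtype=float)
    params = []
    for a, b in bounds:
        t = np.arange(a + 1, b + 1, dtype=float)
        X = np.column_stack([np.ones_like(t), t])
        beta, *_ = np.linalg.lstsq(X, y[a:b], rcond=None)   # OLS = ML = REML point estimates
        rss = float(np.sum((y[a:b] - X @ beta) ** 2))
        n = b - a                                             # T_s = card(tau_s)
        params.append({"level": float(beta[0]), "trend": float(beta[1]),
                       "sigma2_ml": rss / n,                  # ML: biased downward
                       "sigma2_reml": rss / (n - 2)})         # REML: unbiased, T_s - 2 df
    return params
```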


SLIDE 9

Proposed method

Detailed process: preparing data

Smoothing step: to keep only the "strong" trends

→ Using a moving median:

$$m_j(t) = \operatorname{med}_{t' \in [a_j(t),\, b_j(t)]} \, y_{t'} \qquad (2)$$

where, for a fixed $j$ (smoothing degree), $a_j(t) = t$ and $b_j(t) = t + j - 1$, for $t = 1, \dots, T - j + 1$

Remark: the larger $j$ is, the less the irregularity of the data is taken into account

A small example: $Y_t \sim N(5;\ 0.01)$ for $t = 1, \dots, 40$ and $Y_t \sim N(6;\ 0.01)$ for $t = 41, \dots, 100$
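The smoothing step (2) and the small example above can be sketched as follows, assuming NumPy. `moving_median` is a hypothetical name, and reading $N(\mu;\ 0.01)$ as a variance of 0.01 (standard deviation 0.1) is an assumption about the slide's notation.

```python
import numpy as np

def moving_median(y, j):
    """Moving median of eq. (2): m_j(t) = median of y over [t, t + j - 1], t = 1..T-j+1."""
    y = np.asarray(y, dtype=float)
    # Python index t0 = t - 1, so the window [t, t + j - 1] is y[t0 : t0 + j]
    return np.array([np.median(y[t0:t0 + j]) for t0 in range(len(y) - j + 1)])

# The slide's small example: level 5 for t = 1..40, level 6 for t = 41..100.
# Assumption: N(mu; 0.01) is read as variance 0.01, i.e. standard deviation 0.1.
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(5.0, 0.1, 40), rng.normal(6.0, 0.1, 60)])
m5 = moving_median(y, 5)   # T - j + 1 = 96 smoothed values
```

Increasing `j` flattens the noise further, in line with the remark above, at the cost of a shorter smoothed series (T − j + 1 values).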


SLIDE 10

Proposed method


SLIDE 11

Proposed method

Detailed process: preparing data

Differencing step: to detect the trends of the smoothed data

→ Using a relative deviation:

$$d_j(t) = \frac{m_j(t) - m_j(t-k)}{m_j(t-k)} \qquad (3)$$

where the lag is $k = j/2$ if $j$ is even and $k = (j+1)/2$ if $j$ is odd

This differencing lag must be large enough to reveal trend deviations, but not too large, otherwise some of them could be skipped. Remark: this is only a visual choice, not a theoretical one
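A minimal NumPy sketch of the relative deviation (3). It reads $k$ as a lag, $k = j/2$ for even $j$ and $(j+1)/2$ for odd $j$ (that is, $\lceil j/2 \rceil$), which is what makes $m_j(t-k)$ a shifted value of the smoothed series; this reading and the name `relative_deviation` are assumptions.

```python
import numpy as np

def relative_deviation(m, j):
    """Relative deviation of eq. (3): d_j(t) = (m_j(t) - m_j(t-k)) / m_j(t-k).

    Assumption: k is a lag, k = j/2 for even j and (j+1)/2 for odd j, i.e. ceil(j/2).
    The smoothed values must be nonzero (e.g. a positive series such as a price or a load).
    """
    m = np.asarray(m, dtype=float)
    k = (j + 1) // 2                  # ceil(j / 2) for any positive integer j
    return (m[k:] - m[:-k]) / m[:-k]  # len(m) - k values
```

For instance, `relative_deviation([10, 10, 11, 12], 3)` uses k = 2 and gives `[0.1, 0.2]`.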


SLIDE 12

Proposed method

Detailed process: preparing data

Counting step: number and size of the initial segments

$$(\tau_{j,1}, \dots, \tau_{j,s}, \dots, \tau_{j,S}) \text{ with sizes } (T_{j,1}, \dots, T_{j,s}, \dots, T_{j,S}), \quad \sum_{s=1}^{S} T_{j,s} = T \qquad (4)$$

where each segment is a run of consecutive differences with the same sign; for the first one:

$$T_{j,1} = \mathrm{card}(\tau_{j,1}) = 1 + \sum_{t \ge 2} \mathbb{1}_{[\operatorname{sign}(d_j(t)) = \operatorname{sign}(d_j(t-1))]}$$

with the sum running up to the first sign change

Justification: (i) the number of consecutive values with the same sign is reasonably linked to the smoothing degree
(ii) the smaller the smoothing degree, the shorter the runs of differences with the same sign
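The counting step (4) amounts to a run-length split on the sign of $d_j(t)$. A minimal sketch follows (NumPy assumed; `initial_segments` is a hypothetical name). It stays on the index of the differenced series: how those points are mapped back onto the T original dates is not specified on the slide, and treating a zero difference as its own sign class is an assumption.

```python
import numpy as np

def initial_segments(d):
    """Initial segments of eq. (4): maximal runs of consecutive d_j(t) sharing the same sign.

    Returns half-open (start, end) index pairs into d; their sizes sum to len(d).
    A zero difference is treated as its own sign class (an assumption).
    """
    signs = np.sign(np.asarray(d, dtype=float))
    if len(signs) == 0:
        return []
    bounds, start = [], 0
    for t in range(1, len(signs)):
        if signs[t] != signs[t - 1]:   # a sign change closes the current run
            bounds.append((start, t))
            start = t
    bounds.append((start, len(signs)))
    return bounds
```

With `d = [0.1, 0.2, -0.1, -0.3, -0.2, 0.5]` this yields three initial segments, `[(0, 2), (2, 5), (5, 6)]`, of sizes 2, 3 and 1.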


SLIDE 13

Proposed method


SLIDE 14

Proposed method

Detailed process: modeling data

Initial step: reducing the number of initial segments

$$Y_t = \sum_{s=1}^{S} \left[\beta^{(j,s)} + \beta_1^{(j,s)} t + \sigma_{j,s} \varepsilon_t\right] \mathbb{1}_{[t \in \tau_{j,s}]} \qquad (5)$$

Inference:

(i) Estimation of the parameters with REML
(ii) Test of homogeneity of variances (homoskedasticity)

⇒ If H0 is kept:

$$Y_t = \sum_{s=1}^{S} \left[\beta^{(j,s)} + \beta_1^{(j,s)} t\right] \mathbb{1}_{[t \in \tau_{j,s}]} + \sigma_j \varepsilon_t \qquad (6)$$

(iii) Test of the coefficients $H_0: \beta_1^{(j,s)} = 0$ for each segment, with respect to the outcome of (ii)

⇒ New model:

$$Y_t = \sum_{s=1}^{S} \left[\beta^{(j,s)} + \mathbb{1}_{[\beta_1^{(j,s)} \neq 0]}\, \beta_1^{(j,s)} t + \sigma_{j,s} \varepsilon_t\right] \mathbb{1}_{[t \in \tau_{j,s}]} \qquad (7)$$
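Steps (ii) and (iii) can be sketched as below, assuming NumPy and SciPy. The slide does not name the tests: this sketch assumes a Bartlett test (Bartlett, 1937, is in the bibliography) adapted to regression residuals with $T_s - 2$ degrees of freedom per segment, followed by t-tests of $H_0: \beta_1^{(j,s)} = 0$ that use the pooled variance when homoskedasticity is kept, as in (6), and each segment's own variance otherwise. `initial_step` and `_ols` are hypothetical names.

```python
import numpy as np
from scipy import stats

def _ols(t, ys):
    """Level + trend OLS on one segment (same point estimates as ML and REML)."""
    tc = t - t.mean()
    sxx = np.sum(tc ** 2)
    slope = np.sum(tc * (ys - ys.mean())) / sxx
    intercept = ys.mean() - slope * t.mean()
    return slope, sxx, ys - (intercept + slope * t)

def initial_step(y, bounds, alpha=0.05):
    """Steps (ii)-(iii): Bartlett test of equal variances, then t-tests of H0: beta_1^{(j,s)} = 0.

    Needs at least two segments, each with at least three points.
    """
    y = np.asarray(y, dtype=float)
    fits = [_ols(np.arange(a + 1, b + 1, dtype=float), y[a:b]) for a, b in bounds]
    nu = np.array([b - a - 2 for a, b in bounds], dtype=float)    # residual df per segment
    s2 = np.array([np.sum(r ** 2) for _, _, r in fits]) / nu      # REML variance per segment
    k, N = len(bounds), nu.sum()
    sp2 = np.sum(nu * s2) / N                                      # pooled variance
    c = 1 + (np.sum(1 / nu) - 1 / N) / (3 * (k - 1))
    stat = (N * np.log(sp2) - np.sum(nu * np.log(s2))) / c         # Bartlett statistic
    homoskedastic = bool(stats.chi2.sf(stat, k - 1) > alpha)       # H0 kept -> model (6)
    keep_trend = []
    for (slope, sxx, _), s2_s, nu_s in zip(fits, s2, nu):
        var, df = (sp2, N) if homoskedastic else (s2_s, nu_s)      # "with respect to (ii)"
        t_stat = slope / np.sqrt(var / sxx)
        # False -> the trend of this segment is set to 0 in model (7)
        keep_trend.append(bool(2 * stats.t.sf(abs(t_stat), df) <= alpha))
    return homoskedastic, keep_trend
```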


SLIDE 15

Proposed method

Detailed process: modeling data

(iv) Aggregation of consecutive segments (2 by 2)

(a) Test of equal variances $H_0: \sigma_{j,s}^2 = \sigma_{j,s+1}^2$ on:

$$Y_t = \left[\beta^{(j,s)} + \beta_1^{(j,s)} t + \sigma_{j,s} \varepsilon_t\right] \mathbb{1}_{[t \in \tau_{j,s}]} + \left[\beta^{(j,s+1)} + \beta_1^{(j,s+1)} t + \sigma_{j,s+1} \varepsilon_t\right] \mathbb{1}_{[t \in \tau_{j,s+1}]} \qquad (8)$$

⇒ If H0 is rejected, the two segments are not regrouped
(b) Otherwise, a test of equal trend coefficients $H_0: \beta_1^{(j,s)} = \beta_1^{(j,s+1)}$ is applied
⇒ If H0 is rejected, the two segments are not regrouped
(c) Otherwise, a test of equal intercepts $H_0: \beta^{(j,s)} = \beta^{(j,s+1)}$ is applied
⇒ If H0 is rejected, the two segments are not regrouped
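The sequential tests (a) to (c) and a left-to-right aggregation pass can be sketched as follows, assuming NumPy and SciPy. The choice of tests is an assumption, since the slide does not name them: a two-sided F-test on the two residual variances for (a), then nested OLS F-tests for (b) (separate lines vs. common slope with separate intercepts) and (c) (common slope vs. a single line), which rely on the equal variances accepted in (a). `can_merge` and `aggregation_pass` are hypothetical names.

```python
import numpy as np
from scipy import stats

def _rss(X, y):
    """Residual sum of squares of an OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

def can_merge(y, seg_a, seg_b, alpha=0.05):
    """Tests (a)-(c) of step (iv) on two consecutive segments; True only if every H0 is kept."""
    (a0, a1), (b0, b1) = seg_a, seg_b
    assert a1 == b0, "segments must be consecutive"
    ys = np.asarray(y, dtype=float)[a0:b1]
    t = np.arange(a0 + 1, b1 + 1, dtype=float)
    na, nb, n = a1 - a0, b1 - b0, b1 - a0
    one = np.ones(n)
    in_a = (np.arange(n) < na).astype(float)     # indicator of segment s inside the pair
    in_b = one - in_a                             # indicator of segment s+1
    rss_a = _rss(np.column_stack([one[:na], t[:na]]), ys[:na])
    rss_b = _rss(np.column_stack([one[na:], t[na:]]), ys[na:])
    # (a) equal variances: two-sided F-test on the REML residual variances
    f_var = (rss_a / (na - 2)) / (rss_b / (nb - 2))
    p_var = 2 * min(stats.f.sf(f_var, na - 2, nb - 2), stats.f.cdf(f_var, na - 2, nb - 2))
    if p_var <= alpha:
        return False
    # (b) equal trend coefficients: separate lines vs. common slope with separate intercepts
    rss_full = rss_a + rss_b
    rss_slope = _rss(np.column_stack([in_a, in_b, t]), ys)
    if stats.f.sf((rss_slope - rss_full) / (rss_full / (n - 4)), 1, n - 4) <= alpha:
        return False
    # (c) equal intercepts given the common slope: a single line over both segments
    rss_line = _rss(np.column_stack([one, t]), ys)
    return bool(stats.f.sf((rss_line - rss_slope) / (rss_slope / (n - 3)), 1, n - 3) > alpha)

def aggregation_pass(y, bounds, alpha=0.05):
    """One left-to-right pass of pairwise aggregation; returns S1 <= S segments."""
    merged = [bounds[0]]
    for seg in bounds[1:]:
        if can_merge(y, merged[-1], seg, alpha):
            merged[-1] = (merged[-1][0], seg[1])  # regroup the two segments
        else:
            merged.append(seg)
    return merged
```

Repeating `aggregation_pass` until the number of segments stops changing is one possible stopping rule; it is only an assumption here, since the convergence criterion is left open in this work.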


SLIDE 16

Proposed method

Detailed process tailed process: modeling data modeling data

First model:

$$Y_t = \sum_{s=1}^{S_1} \left[\beta^{(j,s)} + \beta_1^{(j,s)} t + \sigma_{j,s} \varepsilon_t\right] \mathbb{1}_{[t \in \tau^{(1)}_{j,s}]} \qquad (9)$$

with $S_1 \le S$, where $S_1$ is the new number of segments


SLIDE 17

Proposed method

Detailed process tailed process: modeling data modeling data

Further modeling steps: until the number of segments is satisfactory

Inference:

(i) Model (9) is submitted to the same sequence of tests as presented previously, until the number of segments is satisfactory

(ii) In the current state of the work, the precise convergence criterion is still to be defined (cf. further research)


SLIDE 18

Proposed method

Models assessment

(i) A number K of smoothing degrees is fixed, and K segmentations are obtained

(ii) For some of these segmentations, the final model will reconstruct the data well and will have a higher probability of providing a good segmentation

(iii) Remark: even if all T smoothing degrees are tried, the optimal segmentation is not guaranteed with probability one, but finding it is not the goal of this method

(iv) Goal: to propose some interesting segmentations, in terms of decision aid

(v) To evaluate each final model and offer some possible segmentations, REML and MAPE are used; smaller values of these criteria are preferred when deciding the quality level of a segmentation

(vi) Remark: these measures are heuristic choices, and they can have an impact on the segmentation process, notably by selecting one or several uninteresting segmentations
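A minimal sketch of the MAPE side of this assessment, assuming NumPy: each candidate segmentation (for instance one per smoothing degree) is fitted with a level and a trend per segment, then scored by its MAPE and by the percentage of absolute relative errors below 10%, the other criterion cited in the conclusions. The REML criterion is not reproduced here, and `piecewise_fit`, `assess` and `rank_candidates` are hypothetical names.

```python
import numpy as np

def piecewise_fit(y, bounds):
    """Fitted values of a level + trend line on each segment (OLS, as in model (1))."""
    y = np.asarray(y, dtype=float)
    fitted = np.full_like(y, np.nan)            # stays NaN where no segment covers y
    for a, b in bounds:                         # each segment needs at least 2 points
        t = np.arange(a + 1, b + 1, dtype=float)
        slope, intercept = np.polyfit(t, y[a:b], 1)
        fitted[a:b] = intercept + slope * t
    return fitted

def assess(y, bounds, tol=0.10):
    """MAPE (%) and percentage of absolute relative errors below tol (10% by default)."""
    y = np.asarray(y, dtype=float)
    rel = np.abs(y - piecewise_fit(y, bounds)) / np.abs(y)   # assumes y has no zeros
    return 100 * rel.mean(), 100 * np.mean(rel < tol)

def rank_candidates(y, candidates):
    """candidates: {smoothing degree j: bounds}; returns the j values by increasing MAPE."""
    scores = {j: assess(y, bounds) for j, bounds in candidates.items()}
    return sorted(scores, key=lambda j: scores[j][0])
```

Ranking rather than picking a single winner matches the stated goal of offering several segmentations as a decision aid.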


SLIDE 19

Proposed method

Models assessment


SLIDE 20

Proposed method


SLIDE 21

Proposed method


SLIDE 22


Outline

→ Issues and motivations
→ The proposed method
→ Application: a simulated case
→ Contributions, applications and further research


SLIDE 23

Application: a simulated case

MAPE = 7.1%; %BCL = 85.3%


SLIDE 24

Application: a simulated case


SLIDE 25

Application: a simulated case

An interest: achieving stationarity of a time series


SLIDE 26

Outline

→ Issues and motivations
→ The proposed method
→ Application: a simulated case
→ Contributions, applications and further research


SLIDE 27

Contributions, applications and further researches

The proposed method allows a time series to be segmented

→ It offers an original process containing a data-preparation stage, which is essential to build the most adequate structure for initializing the modeling stage

→ The modeling step relies on a heteroskedastic linear model including the different trends, levels and variances

→ The goal of this method is not to provide the optimal segmentation, as most of the methods discussed in the introduction do, but to provide a decision aid. Indeed, even if the minimum complexity of the other methods is in O(T²), it remains high

→ The method introduced in this paper uses only assessment criteria, such as the values of REML, MAPE and the percentage of relative errors less than 10%


SLIDE 28

Contributions, applications and further researches

→ Complexity is in O(T) for each smoothing degree, and the number of smoothing degrees needed is rarely large. Indeed, for high smoothing degrees, the quality of segmentations decreases rapidly, because they move away from optimality, even if this optimality is empirical

→ This method can be used in many application domains and for many objectives: searching for segments, achieving stationarity, building different models on the same time series when it has different behaviors, simplifying several time series (symbolic approach) in order to cluster curves, etc.

→ This method is rather preliminary, and we are working to improve some of its steps, particularly the detection of volatility in the data and the evaluation and validation tools for segmentations, to obtain a better means of building a hierarchy of these segmentations


SLIDE 29

Bibliography

Bartlett, M.S. (1937): Properties of sufficiency and statistical tests. Proceedings of the Royal Society of London, Series A 160, 268-282.

Guédon, Y. (2008): Exploring the segmentation space for the assessment of multiple change-point models. Institut National de Recherche en Informatique et en Automatique, Research report 6619.

Harville, D.A. (1977): Maximum likelihood approaches to variance component estimation and to related problems. J Amer Stat Assoc 72, 320-340.

Lai, T.L. and Xing, H. (2009): Sequential change-point detection when the pre- and post-change parameters are unknown. Technical report 2009-5, Stanford University, Department of Statistics.

Lavielle, M. and Teyssière, G. (2006): Détection de ruptures multiples dans des séries temporelles multivariées. Lietuvos Matematikos Rinkinys, Vol 46.

Perron, P. and Kejriwal, M. (2006): Testing for multiple structural changes in cointegrated regression models. Boston University, C22.
