An Exploratory Segmentation Method for Time Series
Christian Derquenne EDF R&D
Outline
Issues and motivations
The proposed method
Application: a simulated case
Contributions, applications and further research
COMPSTAT 2010
24th September 2010
→ Exploring the segmentation space for the assessment of multiple change-point models [Guédon, Y. (2008)]
→ Inference on models with multiple breakpoints in multivariate time series, notably to select the optimal number of breakpoints [Lavielle, M. et al. (2006)]
→ Sequential change-point detection when the pre- and post-change parameters are unknown [Lai, TL. et al. (2009)]
→ Using dynamic programming to decrease the computational complexity of segmentation (total number of segmentations = $2^{T-1}$)
→ Complexity is generally in $O(ST^2)$ for time and in $O(ST)$ for the linear clustered space, but also $O(T^2)$ and $O(MT^2)$,
where T = length of the series; S = number of segments; M = number of series
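The dynamic-programming idea listed above can be sketched as follows. This is a minimal illustration of the generic technique from the cited change-point literature, not the method proposed in this presentation: for brevity it assumes a piecewise-constant mean and uses the within-segment residual sum of squares as the cost, and the function names are hypothetical. The triple loop gives the $O(ST^2)$ running time, the two $S \times T$ tables take $O(ST)$ memory, and all segmentations into S segments are covered without enumerating them.

```python
from itertools import accumulate


def segment_cost_fn(y):
    """Return cost(a, b): residual sum of squares of y[a:b] around its mean,
    computed in O(1) from cumulative sums (0-based, half-open [a, b))."""
    s1 = [0.0] + list(accumulate(y))
    s2 = [0.0] + list(accumulate(v * v for v in y))

    def cost(a, b):
        n = b - a
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / n

    return cost


def optimal_segmentation(y, n_segments):
    """Minimal total cost of splitting y into n_segments contiguous segments."""
    T = len(y)
    cost = segment_cost_fn(y)
    INF = float("inf")
    # best[s][t]: minimal cost of splitting y[:t] into s segments
    best = [[INF] * (T + 1) for _ in range(n_segments + 1)]
    arg = [[0] * (T + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for s in range(1, n_segments + 1):
        for t in range(s, T + 1):
            for u in range(s - 1, t):  # the last segment is y[u:t]
                c = best[s - 1][u] + cost(u, t)
                if c < best[s][t]:
                    best[s][t], arg[s][t] = c, u
    # Backtrack the segment boundaries from the end of the series
    bounds, t = [], T
    for s in range(n_segments, 0, -1):
        u = arg[s][t]
        bounds.append((u, t))
        t = u
    return best[n_segments][T], bounds[::-1]
```

For example, `optimal_segmentation([0, 0, 0, 5, 5, 5, 1, 1], 3)` returns a zero cost with the segments `[(0, 3), (3, 6), (6, 8)]`.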
Model [Rao et al., 1988; Searle et al., 1992]:
$$Y_t = \sum_{s=1}^{S} \left[ \beta^{(s)} + \beta_1^{(s)}\, t + \sigma_s \varepsilon_t \right] \mathbb{1}_{t \in \tau_s} \qquad (1)$$
where $T_s = \operatorname{card}(\tau_s)$ and $T = \sum_{s=1}^{S} T_s$; there are then 3S parameters to estimate, in addition to the number S of segments.
→ The three estimators give the same solutions for $\beta^{(s)}$ and $\beta_1^{(s)}$
→ ML and REML estimate $\sigma_s^2$ directly
→ Only REML provides an unbiased estimator of $\sigma_s^2$
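For a single segment, these estimates can be sketched in plain Python. The sketch assumes that the ordinary least-squares line gives $\beta^{(s)}$ and $\beta_1^{(s)}$ (the slide states that the estimators agree on these), and it contrasts the ML variance $RSS/T_s$ with the REML variance $RSS/(T_s - 2)$, which is the unbiased one for a regression with two fixed coefficients. The helper `fit_segment` is hypothetical.

```python
def fit_segment(ts, ys):
    """OLS fit of y = beta + beta1 * t on one segment, with ML and REML variances."""
    n = len(ts)
    if n < 3:
        raise ValueError("need at least 3 points to estimate the 3 parameters")
    t_bar = sum(ts) / n
    y_bar = sum(ys) / n
    sxx = sum((t - t_bar) ** 2 for t in ts)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    beta1 = sxy / sxx
    beta = y_bar - beta1 * t_bar
    rss = sum((y - beta - beta1 * t) ** 2 for t, y in zip(ts, ys))
    return {
        "beta": beta,
        "beta1": beta1,
        "sigma2_ml": rss / n,          # ML: divides by T_s, biased downward
        "sigma2_reml": rss / (n - 2),  # REML: divides by T_s - 2, unbiased
    }
```

With `fit_segment([1, 2, 3, 4], [1, 3, 2, 4])`, the slope is 0.8, the ML variance 0.45 and the REML variance 0.9 (up to floating-point rounding): on short segments the ML estimate is noticeably smaller.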
→ Using a moving median:
$$m_j(t) = \operatorname{med}\left\{ Y_{t'} : t' \in [a_j(t), b_j(t)] \right\} \qquad (2)$$
where, for a fixed smoothing degree j, $a_j(t) = t$ and $b_j(t) = t + j - 1$, for $t = 1, \dots, T - j + 1$.
Remark: the larger j is, the less the irregularity of the data is taken into account.
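The moving median of smoothing degree j can be sketched with the standard library. Positions are 1-based as on the slide, so window t covers $Y_t, \dots, Y_{t+j-1}$ and there are $T - j + 1$ smoothed values; the function name is hypothetical.

```python
from statistics import median


def moving_median(y, j):
    """Moving median of smoothing degree j: for t = 1..T-j+1 (1-based),
    the median of y over the window [a_j(t), b_j(t)] = [t, t+j-1]."""
    T = len(y)
    if not 1 <= j <= T:
        raise ValueError("smoothing degree j must satisfy 1 <= j <= T")
    # y[t-1 : t-1+j] is the 0-based slice of the 1-based window [t, t+j-1]
    return [median(y[t - 1:t - 1 + j]) for t in range(1, T - j + 2)]
```

For instance, `moving_median([1, 9, 2, 8, 3], 3)` gives `[2, 8, 3]`, while j = 1 returns the series unchanged, which illustrates the remark: a larger j smooths away more of the irregularity.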
→ Using a relative deviation $d_j(t)$,
where $k = t - j/2$ if j is even and $k = t - (j+1)/2$ if j is odd.
This differencing must be large enough to reveal trend deviations, but not too large, otherwise they could be missed.
Remark: this is only a visual choice, not a theoretical one.
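The two cases for k collapse into a single formula, $k = t - \lceil j/2 \rceil$, which integer division expresses directly. The sketch below covers only this index (the full deviation formula is not reproduced here), and `lag_index` is a hypothetical name.

```python
def lag_index(t, j):
    """Index k paired with t for smoothing degree j:
    k = t - j/2 if j is even, k = t - (j+1)/2 if j is odd, i.e. t - ceil(j/2)."""
    # (j + 1) // 2 equals j/2 for even j and (j+1)/2 for odd j
    return t - (j + 1) // 2
```

For example, `lag_index(10, 4)` and `lag_index(10, 3)` both give 8, while `lag_index(10, 2)` gives 9.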
Justification: (i) the number …
(ii) The smaller the smoothing degree is, the shorter the runs of consecutive differences with the same sign are.
A new segment starts at each $t \ge 2$ such that $\operatorname{sign} d_j(t) \neq \operatorname{sign} d_j(t-1)$, which gives the segments
$$\tau_{j,(1)}, \dots, \tau_{j,(s)}, \dots, \tau_{j,(S)}$$
of sizes
$$T_{j,(1)}, \dots, T_{j,(s)}, \dots, T_{j,(S)}, \qquad T = \sum_{s=1}^{S} T_{j,(s)}.$$
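Assuming the deviations $d_j(t)$ have already been computed, cutting the series into maximal runs of constant sign can be sketched as below. Treating $d_j(t) = 0$ as a third sign is an assumption, as are the helper names `sign` and `sign_runs`; segments are returned as 1-based inclusive (start, end) pairs whose sizes add up to T.

```python
def sign(x):
    """-1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def sign_runs(d):
    """Split positions 1..T into maximal runs on which sign(d(t)) is constant.
    Returns (segments, sizes) with sum(sizes) == T."""
    if not d:
        return [], []
    segments = []
    start = 1
    for t in range(2, len(d) + 1):
        # d[t - 1] is d(t) and d[t - 2] is d(t-1) in 1-based notation
        if sign(d[t - 1]) != sign(d[t - 2]):
            segments.append((start, t - 1))
            start = t
    segments.append((start, len(d)))
    sizes = [b - a + 1 for a, b in segments]
    return segments, sizes
```

For instance, `sign_runs([0.5, 0.2, -0.1, -0.3, -0.2, 0.4])` yields the segments `[(1, 2), (3, 5), (6, 6)]` with sizes `[2, 3, 1]`.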
Initial step: to reduce the number of initial segments, starting from the model
$$Y_t = \sum_{s=1}^{S} \left[ \beta_{(j,s)} + \beta_{1(j,s)}\, t + \sigma_{(j,s)} \varepsilon_t \right] \mathbb{1}_{t \in \tau_{j,(s)}}$$
(i) Estimation of the parameters with REML
(ii) Homogeneity test of the variances (homoskedasticity): if $H_0: \sigma^2_{(j,1)} = \dots = \sigma^2_{(j,S)}$ is not rejected, the model with a common variance is kept:
$$Y_t = \sum_{s=1}^{S} \left[ \beta_{(j,s)} + \beta_{1(j,s)}\, t \right] \mathbb{1}_{t \in \tau_{j,(s)}} + \sigma_j \varepsilon_t \qquad (6)$$
(iii) Test of the coefficients: for each segment, with respect to (ii), $H_0: \beta_{1(j,s)} = 0$ is tested, and the slope is dropped from the segments where it is not rejected:
$$Y_t = \sum_{s=1}^{S} \left[ \beta_{(j,s)} + \beta_{1(j,s)}\, t\, \mathbb{1}_{\beta_{1(j,s)} \neq 0} + \sigma_{(j,s)} \varepsilon_t \right] \mathbb{1}_{t \in \tau_{j,(s)}} \qquad (7)$$
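The reference list cites Bartlett (1937). Assuming his test is the homogeneity-of-variance test of step (ii), its statistic and the slope test of step (iii) can be sketched as below. Using $T_s - 2$ residual degrees of freedom per segment and a t statistic on the OLS slope are assumptions (under this model the estimators give the same $\beta$ estimates, as stated earlier). The helper names are hypothetical, and p-values are left to a chi-square or Student table (for instance `scipy.stats.chi2.sf`, not used here to keep the sketch dependency-free).

```python
import math


def bartlett_statistic(variances, dofs):
    """Bartlett (1937) statistic for H0: all segment variances are equal.
    variances: unbiased residual variances s_s^2 of the S segments;
    dofs: their degrees of freedom (assumed T_s - 2 per segment).
    Under H0 the statistic is approximately chi-square with S - 1 dof."""
    k = len(variances)
    nu = sum(dofs)
    pooled = sum(v * s2 for v, s2 in zip(dofs, variances)) / nu
    m = nu * math.log(pooled) - sum(v * math.log(s2) for v, s2 in zip(dofs, variances))
    c = 1 + (sum(1 / v for v in dofs) - 1 / nu) / (3 * (k - 1))
    return m / c, k - 1


def slope_t_statistic(ts, ys):
    """t statistic for H0: beta_1 = 0 on one segment (T_s - 2 dof).
    Assumes a non-perfect fit, otherwise the residual variance is zero."""
    n = len(ts)
    t_bar, y_bar = sum(ts) / n, sum(ys) / n
    sxx = sum((t - t_bar) ** 2 for t in ts)
    b1 = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys)) / sxx
    b0 = y_bar - b1 * t_bar
    s2 = sum((y - b0 - b1 * t) ** 2 for t, y in zip(ts, ys)) / (n - 2)
    return b1 / math.sqrt(s2 / sxx), n - 2
```

Equal segment variances give a Bartlett statistic of zero; the further the variances are from each other, the larger the statistic, and homoskedasticity is rejected when it exceeds the chosen chi-square quantile.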
(iv) Aggregation of consecutive segments (2 by 2):
(a) Test of equal variances on
$$Y_t = \left[ \beta_{(j,s)} + \beta_{1(j,s)}\, t + \sigma_{(j,s)} \varepsilon_t \right] \mathbb{1}_{t \in \tau_{j,(s)}} + \left[ \beta_{(j,s+1)} + \beta_{1(j,s+1)}\, t + \sigma_{(j,s+1)} \varepsilon_t \right] \mathbb{1}_{t \in \tau_{j,(s+1)}} \qquad (8)$$
if $H_0: \sigma^2_{(j,s)} = \sigma^2_{(j,s+1)}$ is rejected, the two segments are not regrouped.
(b) Otherwise, a test of equal coefficients is applied: if $H_0: \beta_{1(j,s)} = \beta_{1(j,s+1)}$ is rejected, the two segments are not regrouped.
(c) Otherwise, a test of equal intercepts is applied: if $H_0: \beta_{(j,s)} = \beta_{(j,s+1)}$ is rejected, the two segments are not regrouped; otherwise they are merged into one segment.
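The order of the three tests in (a) to (c) can be sketched as a merge loop. Scanning the segments from left to right and comparing a merged segment with its next neighbour is an assumption about how "2 by 2" is applied. The callable `accept` is a hypothetical stand-in for the actual tests: it should return True when the named null hypothesis is not rejected.

```python
def aggregate(segments, accept):
    """Merge consecutive segments 2 by 2, scanning left to right.

    segments: time-ordered list of segments (here, lists of (t, y) pairs).
    accept(left, right, h0): returns True when h0 ("variance", "slope" or
    "intercept") is not rejected for the pair. The tests run in the order
    (a), (b), (c); all() stops at the first rejection, so a later test is
    only applied when the previous one accepted."""
    if not segments:
        return []
    merged = [segments[0]]
    for right in segments[1:]:
        left = merged[-1]
        if all(accept(left, right, h0) for h0 in ("variance", "slope", "intercept")):
            merged[-1] = left + right  # regroup the pair into one segment
        else:
            merged.append(right)
    return merged
```

For example, with a toy criterion that accepts whenever the last value of the left segment is within 1 of the first value of the right one, `[[(1, 0.0)], [(2, 0.0)], [(3, 5.0)]]` becomes `[[(1, 0.0), (2, 0.0)], [(3, 5.0)]]`, so the number of segments goes from S = 3 to 2.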
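This gives the model
$$Y_t = \sum_{s=1}^{S_1} \left[ \beta_{(j,s)} + \beta_{1(j,s)}\, t + \sigma_{(j,s)} \varepsilon_t \right] \mathbb{1}_{t \in \tau^{(1)}_{j,(s)}} \qquad (9)$$
with $S_1 \le S$, where $S_1$ is the new number of segments.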
Inference:
Model (9) is submitted to the same process, repeatedly,
until the number of segments is satisfactory according to some
criteria (cf. further research).
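The repetition of the process on model (9) can be sketched as a fixed-point loop. Here `step` is a hypothetical callable standing for one full pass (REML estimation, tests and aggregation), and stopping as soon as the number of segments no longer decreases is an assumed stand-in for the "satisfactory" criteria, which the slides leave to further research.

```python
def iterate_until_stable(segments, step, max_iter=100):
    """Re-apply one estimation/test/aggregation pass to the current
    segmentation until the number of segments stops decreasing
    (assumed stopping rule), or until max_iter passes have been made."""
    for _ in range(max_iter):
        new_segments = step(segments)
        if len(new_segments) >= len(segments):
            return segments  # no further reduction: keep the current model
        segments = new_segments
    return segments
```

With a toy pass that merges the first two segments while more than two remain, `[[1], [2], [3], [4]]` shrinks to `[[1, 2, 3], [4]]` and then stays there. The `max_iter` guard is a design choice: it bounds the loop even if a pass misbehaves.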
(i) A number K of smoothing degrees is fixed, and K segmentations are obtained.
(ii) The final models of some of these segmentations will reconstitute the data well and will have a higher probability of providing a good segmentation.
(iii) Remark: even if all T smoothing degrees are tried, the optimal segmentation is not guaranteed with probability one, but that is not the goal of this method.
(iv) Goal: to propose some interesting segmentations, in terms of decision aid.
(v) To evaluate each final model and to offer some possible segmentations, REML and MAPE are used; smaller values of these criteria are preferred when deciding the quality level of a segmentation.
(vi) Remark: these measures are heuristic choices, because they can have an impact on the segmentation process, notably by selecting one or several uninteresting segmentations.
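Ranking the K final models by MAPE, smaller first, can be sketched as below. The sketch uses the usual definition $\mathrm{MAPE} = \frac{100}{T} \sum_t |Y_t - \hat{Y}_t| / |Y_t|$ and does not cover the REML criterion. The dictionary that maps each smoothing degree j to the fitted values of its final model is a hypothetical input format.

```python
def mape(y, y_hat):
    """Mean absolute percentage error (in %) between data and fitted values."""
    if any(v == 0 for v in y):
        raise ValueError("MAPE is undefined when an observation is 0")
    return 100.0 * sum(abs(a - b) / abs(a) for a, b in zip(y, y_hat)) / len(y)


def rank_segmentations(y, fitted_by_degree):
    """Rank the final models (one per smoothing degree j) by MAPE, smallest first.
    fitted_by_degree maps j -> fitted values of the final model for that j."""
    scores = {j: mape(y, fit) for j, fit in fitted_by_degree.items()}
    return sorted(scores.items(), key=lambda item: item[1])
```

For example, with `y = [10.0, 20.0, 40.0]`, fitted values `[11.0, 18.0, 40.0]` for j = 3 and `[10.0, 20.0, 44.0]` for j = 5, the model for j = 5 comes first (MAPE of about 3.33% against 6.67%). As remark (vi) notes, such a ranking remains a heuristic decision aid.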
Bartlett, M.S. (1937): Properties of sufficiency and statistical tests. Proceedings of the Royal Society of London, Series A 160, 268-282.
Guédon, Y. (2008): Exploring the segmentation space for the assessment of multiple change-point models. Institut National de Recherche en Informatique et en Automatique, Cahier de recherche 6619.
Harville, D.A. (1977): Maximum likelihood approaches to variance component estimation and to related problems. J Amer Stat Assoc 72, 320-340.
Lai, T.L. and Xing, H. (2009): Sequential change-point detection when the pre- and post-change parameters are unknown. Technical report 2009-5, Stanford University, Department of Statistics.
Lavielle, M. and Teyssière, G. (2006): Détection de ruptures multiples dans des séries temporelles multivariées. Lietuvos Matematikos Rinkinys, Vol 46.
Perron, P. and Kejriwal, M. (2006): Testing for Multiple Structural Changes in Cointegrated Regression … University, C22.