Kernels to detect abrupt changes in time series, Alain Celisse


SLIDE 1

Intro. Framework Algorithm Change-pts location? (D fixed) How many chg-pts?

Kernels to detect abrupt changes in time series

Alain Celisse

¹ UMR 8524 CNRS, Université Lille 1
² Modal INRIA team-project
³ SSB group, Paris

Joint work with S. Arlot, Z. Harchaoui, G. Rigaill, and G. Marot

“Computational and statistical trade-offs in learning”, IHES Paris, March 22nd, 2016

Kernels to detect abrupt changes in time series Alain Celisse

SLIDE 2

Outline

1. Motivating examples and framework (kernels)
2. KCP Algorithm and computational complexity
3. Where are the change-points (D fixed)?
4. How many change-points?

SLIDE 3

Change-point detection: 1-D signal (example)

[Figure: signal versus position t (1 to 100), showing the noisy signal and the regression function, with two locations marked "?".]

SLIDE 4

Detect abrupt changes...

General purposes:

1. Detect changes in (features of) the distribution (not only in the mean)

SLIDE 5

Abrupt changes in high-order moments

→ Detecting changes in the mean is useless

SLIDE 6

Detect abrupt changes...

General purposes:

1. Detect changes in (features of) the distribution (not only in the mean)

2. Complex data:
   • High-dimension: measures in $\mathbb{R}^d$, curves, ...
   • Structured: audio/video streams, graphs, DNA sequence, ...

SLIDE 7

Motivating example 1: Structured objects

Description:
• Video sequences from “Le grand échiquier”, a 70s-80s French talk show.
• At each time, one observes an image (high-dimensional).
• Each image is summarized by a histogram.

SLIDE 8

Motivating example 2: Structured objects

Observe networks over time.

Goal: Detect abrupt changes in some features of the network.

SLIDE 9

Detect abrupt changes...

General purposes:

1. Detect changes in (features of) the distribution (not only in the mean)

2. Complex data:
   • High-dimension: measures in $\mathbb{R}^d$, curves, ...
   • Structured: audio/video streams, graphs, DNA sequence, ...

3. Fusion of heterogeneous data: deal simultaneously with different types of complex data

4. Efficient algorithm able to deal with large data sets (“Big data” challenge)

SLIDE 10

I Kernel framework

SLIDE 11

Kernel and Reproducing Kernel Hilbert Space (RKHS)

• $X_1, \dots, X_n \in \mathcal{X}$: initial observations.
• $k(\cdot, \cdot) : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$: reproducing kernel (Aronszajn (1950)).
• $\mathcal{H}$: RKHS associated with $k(\cdot, \cdot)$.
• $\phi : \mathcal{X} \to \mathcal{H}$ s.t. $\phi(x) = k(x, \cdot)$: canonical feature map.

Assets:
• Versatile tool to work with different types of data
• Complex data (high dimensional/structured)

SLIDE 12

Instances of kernels

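Gaussian kernel (with $\mathbb{R}^d$-valued data):

$$k_\delta(x, y) = \exp\left( -\frac{\|x - y\|^2}{\delta} \right), \qquad \delta > 0.$$

$\chi^2$-kernel (with histogram-valued data):

$$k_I(p, q) = \exp\left( -\sum_{i=1}^{I} \frac{(p_i - q_i)^2}{p_i + q_i} \right).$$

...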
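The two kernels on this slide can be sketched in a few lines of NumPy. This is only an illustration: the function names `gaussian_kernel` and `chi2_kernel` are hypothetical, and skipping bins that are empty in both histograms (to avoid 0/0) is an assumption that the slide does not address.

```python
import numpy as np

def gaussian_kernel(x, y, delta=1.0):
    """k_delta(x, y) = exp(-||x - y||^2 / delta) for R^d-valued points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.exp(-np.sum((x - y) ** 2) / delta))

def chi2_kernel(p, q):
    """exp(-sum_i (p_i - q_i)^2 / (p_i + q_i)) for histogram-valued points."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    num = (p - q) ** 2
    den = p + q
    # Assumption: a bin that is empty in both histograms contributes 0.
    mask = den > 0
    return float(np.exp(-np.sum(num[mask] / den[mask])))
```

Both functions return 1 when the two arguments coincide and decrease toward 0 as the arguments move apart, which is the behavior the rest of the talk relies on.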

SLIDE 13

Model

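$$\forall\, 1 \le i \le n, \qquad Y_i = \phi(X_i) = \mu^\star_i + \varepsilon_i \in \mathcal{H},$$

where
• $\mu^\star_i \in \mathcal{H}$: mean element of $P_{X_i}$ (distribution of $X_i$),
• $\varepsilon_i := Y_i - \mu^\star_i$, with $\mathbb{E}\,\varepsilon_i = 0$ and $v_i := \mathbb{E}\big[ \|\varepsilon_i\|_{\mathcal{H}}^2 \big]$.

Mean element of $P_{X_i}$ ($\mathcal{H}$ separable and $\mathbb{E}[ k(X, X) ] < +\infty$):

$$\langle \mu^\star_i, f \rangle_{\mathcal{H}} = \mathbb{E}_{X_i}\big[ \langle \phi(X_i), f \rangle_{\mathcal{H}} \big], \qquad \forall f \in \mathcal{H}.$$

With characteristic kernels, $P_{X_i} \neq P_{X_j} \Rightarrow \mu^\star_i \neq \mu^\star_j$.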
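To make the role of the mean element concrete, here is a minimal sketch that compares the plug-in estimate of $\|\mu_x - \mu_y\|_{\mathcal{H}}^2$ for two real-valued samples with the same mean but different spread. The helper names (`gaussian_gram`, `linear_gram`, `embedding_gap`) and the toy samples are assumptions made for the illustration, not part of the talk.

```python
import numpy as np

def gaussian_gram(x, y, delta=1.0):
    """Gram matrix of the Gaussian kernel exp(-(x - y)^2 / delta), 1-D inputs."""
    return np.exp(-((x[:, None] - y[None, :]) ** 2) / delta)

def linear_gram(x, y):
    """Gram matrix of the linear kernel x * y, 1-D inputs."""
    return x[:, None] * y[None, :]

def embedding_gap(gram, x, y):
    """Plug-in estimate of the squared RKHS distance between the empirical
    mean elements of the two samples."""
    return gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean()

# Toy samples (assumption): same mean, different spread.
x = np.linspace(-1.0, 1.0, 201)
y = 3.0 * x
print(embedding_gap(linear_gram, x, y))    # numerically zero
print(embedding_gap(gaussian_gram, x, y))  # clearly positive
```

With the linear kernel the gap equals the squared difference of the sample means, so a change in spread alone is invisible; the Gaussian kernel separates the two samples.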

SLIDE 14

Estimation rather than identification

Assumption: $\mu^\star = (\mu^\star_1, \dots, \mu^\star_n)' \in \mathcal{H}^n$ is piecewise constant.

Fact: With a finite sample, it is impossible to recover change-points in noisy regions.

[Figure: signal $Y$ and regression function $s$ versus position.]

Purpose: Estimate $\mu^\star$ to recover change-points.

Performance measure:

$$\|\mu^\star - \hat\mu\|^2 := \sum_{i=1}^{n} \|\mu^\star_i - \hat\mu_i\|_{\mathcal{H}}^2$$

SLIDE 15

II Algorithm

SLIDE 16

Notation

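Segmentation with $D$ segments: $\tau = (\tau_0, \dots, \tau_D)$, with $0 = \tau_0 < \tau_1 < \tau_2 < \cdots < \tau_D = n$.

Quality of a segmentation $\tau$: following Harchaoui and Cappé (2007),

$$\hat{R}_n(\tau) = \frac{1}{n} \sum_{i=1}^{n} k(X_i, X_i) - \frac{1}{n} \sum_{\ell=1}^{D} \left[ \frac{1}{\tau_\ell - \tau_{\ell-1}} \sum_{i=\tau_{\ell-1}+1}^{\tau_\ell} \sum_{j=\tau_{\ell-1}+1}^{\tau_\ell} k(X_i, X_j) \right].$$

Rk: With the linear kernel $k(x, x') = \langle x, x' \rangle$ on $\mathcal{X} = \mathbb{R}^d$, $\hat{R}_n(\tau)$ reduces to the usual least-squares empirical risk.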
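The remark about the linear kernel can be checked numerically. The sketch below computes the kernel empirical risk from a Gram matrix and compares it with the least-squares risk of the piecewise-constant fit; the function names and the random test data are assumptions made for the example.

```python
import numpy as np

def empirical_risk(K, tau):
    """Kernel empirical risk of the segmentation tau = [0, tau_1, ..., n],
    computed from the n x n Gram matrix K."""
    n = K.shape[0]
    within = sum(K[a:b, a:b].sum() / (b - a) for a, b in zip(tau[:-1], tau[1:]))
    return (np.trace(K) - within) / n

def least_squares_risk(X, tau):
    """(1/n) * residual sum of squares of the segment-wise mean fit."""
    n = X.shape[0]
    rss = sum(((X[a:b] - X[a:b].mean(axis=0)) ** 2).sum()
              for a, b in zip(tau[:-1], tau[1:]))
    return rss / n

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))   # 12 observations in R^3 (toy data)
K = X @ X.T                    # Gram matrix of the linear kernel
tau = [0, 4, 9, 12]            # three segments
print(empirical_risk(K, tau), least_squares_risk(X, tau))  # equal up to rounding
```

Segment $\ell$ covers observations $\tau_{\ell-1}+1, \dots, \tau_\ell$, which is the half-open slice `[tau[l-1]:tau[l]]` in 0-based indexing.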

SLIDE 19

KCP Algorithm

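Input:
• observations: $X_1, \dots, X_n \in \mathcal{X}$,
• kernel: $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$.

Step 1: $\forall\, 1 \le D \le D_{\max}$, compute (→ dynamic programming):

$$\hat\tau(D) \in \operatorname{Argmin}_{\tau \in \mathcal{T}_n^D} \big\{ \hat{R}_n(\tau) \big\}$$

Step 2: Find (→ model selection):

$$\hat{D} \in \operatorname{Argmin}_{1 \le D \le D_{\max}} \big\{ \hat{R}_n(\hat\tau(D)) + \mathrm{pen}(\hat\tau(D)) \big\}$$

Output: sequence of change-points $\hat\tau = \hat\tau(\hat{D})$.

$$\mathcal{T}_n^D = \big\{ (\tau_0, \dots, \tau_D) \in \mathbb{N}^{D+1} \ /\ 0 = \tau_0 < \tau_1 < \tau_2 < \cdots < \tau_D = n \big\}$$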

SLIDE 20

Computational complexity (Naive approach)

Dynamic programming (DP) update rule: $\forall\, 2 \le D \le D_{\max}$,

$$L_{D,n} = \min_{t \le n-1} \{ L_{D-1,t} + C_{t,n} \},$$

where
• $L_{D-1,t}$: cost of the best segmentation in $D - 1$ segments up to time $t$,
• $C_{t,n}$: cost of the segment $\{t+1, \dots, n\}$.

$$C_{s,t} = \sum_{i=s+1}^{t} k(X_i, X_i) - \frac{1}{t - s} \sum_{i=s+1}^{t} \sum_{j=s+1}^{t} k(X_i, X_j)$$

Complexity (naive approach):
• time: $O(D_{\max} n^4)$ (computation of $\{C_{s,t}\}_{1 \le s,t \le n}$)
• space: $O(n^2)$ (storage of the cost matrix)

SLIDE 21

Computational complexity (Improvement)

Ideas (with G. Rigaill and G. Marot):
• Never store the cost matrix
• Update each column $C_{\cdot,t+1}$ from $C_{\cdot,t}$

Pseudo-code:

    for t = 1 to n − 1 do
        Compute the (t + 1)-th column C_{·,t+1} from C_{·,t}
        for D = 2 to min(t, D_max) do
            L_{D,t+1} = min_{s ≤ t} { L_{D−1,s} + C_{s,t+1} }
        end for
    end for

Computational complexity:
• Space: $O(D_{\max} n)$ (only store $C_{\cdot,t} \in \mathbb{R}^n$)
• Time: $O(D_{\max} n^2)$ (update rule + DP complexity)
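One possible way to realize the column update is sketched below. It is an illustration under assumptions, not the authors' implementation: the names `kcp_dynamic_programming` and `backtrack` are hypothetical, the update keeps the running block sums $\sum_{s < i,j \le t} k(X_i, X_j)$ rather than the column itself, and the loop also fills $D = 1$ as initialization. The costs minimized are sums of $C_{s,t}$, i.e. $n$ times the empirical risk.

```python
import numpy as np

def kcp_dynamic_programming(X, kernel, d_max):
    """L[D, t]: minimal total cost of cutting X_1..X_t into D segments.
    arg[D, t]: index s at which the last of these D segments starts."""
    n = len(X)
    L = np.full((d_max + 1, n + 1), np.inf)
    arg = np.zeros((d_max + 1, n + 1), dtype=int)
    L[0, 0] = 0.0
    block = np.zeros(n)      # block[s] = sum of k(X_i, X_j) over s < i, j <= t
    diag = np.zeros(n + 1)   # diag[t]  = sum of k(X_i, X_i) over i <= t
    for t in range(1, n + 1):
        # Kernel evaluations between the new point X_t and X_1..X_t only.
        row = np.array([kernel(X[i], X[t - 1]) for i in range(t)])
        diag[t] = diag[t - 1] + row[-1]
        # tail[s] = sum_{i=s+1}^{t-1} k(X_i, X_t) for s = 0..t-1.
        tail = np.append(np.cumsum(row[:-1][::-1])[::-1], 0.0)
        block[:t] += 2.0 * tail + row[-1]
        s = np.arange(t)
        column = (diag[t] - diag[s]) - block[:t] / (t - s)  # C_{s,t}, s < t
        for D in range(1, min(t, d_max) + 1):
            candidates = L[D - 1, :t] + column
            arg[D, t] = int(np.argmin(candidates))
            L[D, t] = candidates[arg[D, t]]
    return L, arg

def backtrack(arg, D, n):
    """Recover tau = [0, tau_1, ..., n] of the best D-segment solution."""
    tau, t = [n], n
    for d in range(D, 0, -1):
        t = int(arg[d, t])
        tau.append(t)
    return tau[::-1]
```

Each step touches one row of kernel evaluations and vectors of length at most $n$, so memory stays at $O(D_{\max} n)$ and the total work at $O(D_{\max} n^2)$, matching the figures on the slide. For example, `backtrack(arg, 2, 6)` on the series `[0, 0, 0, 5, 5, 5]` with the linear kernel `lambda a, b: a * b` places the single change-point after the third observation.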

SLIDE 22

Runtime

Open questions:
• Reduce computation time by low-rank matrix approximation
• Quantify what has been lost by the approximation

SLIDE 23

III Where are the change-points for a fixed D?

SLIDE 24

KCP Algorithm (reminder)

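Input:
• observations: $X_1, \dots, X_n \in \mathcal{X}$,
• kernel: $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$.

Step 1: $\forall\, 1 \le D \le D_{\max}$, compute (→ dynamic programming):

$$\hat\tau(D) \in \operatorname{Argmin}_{\tau \in \mathcal{T}_n^D} \big\{ \hat{R}_n(\tau) \big\}$$

Step 2: Find (→ model selection):

$$\hat{D} \in \operatorname{Argmin}_{1 \le D \le D_{\max}} \big\{ \hat{R}_n(\hat\tau(D)) + \mathrm{pen}(\hat\tau(D)) \big\}$$

Output: sequence of change-points $\hat\tau = \hat\tau(\hat{D})$.

$$\mathcal{T}_n^D = \big\{ (\tau_0, \dots, \tau_D) \in \mathbb{N}^{D+1} \ /\ 0 = \tau_0 < \tau_1 < \tau_2 < \cdots < \tau_D = n \big\}$$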

SLIDE 25

Distance between segmentations

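Hausdorff distance:

$$d_H(\tau, \tau') = \max\left\{ \max_{1 \le i \le D_\tau - 1}\ \min_{1 \le j \le D_{\tau'} - 1} |\tau_i - \tau'_j| ,\ \max_{1 \le j \le D_{\tau'} - 1}\ \min_{1 \le i \le D_\tau - 1} |\tau_i - \tau'_j| \right\}$$

Frobenius distance:

$$d_F(\tau, \tau') = \| M^\tau - M^{\tau'} \|_F = \sqrt{ \sum_{1 \le i,j \le n} \big( M^\tau_{i,j} - M^{\tau'}_{i,j} \big)^2 },$$

where

$$M^\tau_{i,j} = \frac{\mathbb{1}\{ i \text{ and } j \text{ belong to the same segment of } \tau \}}{\mathrm{Card}(\text{segment of } \tau \text{ containing } i \text{ and } j)}.$$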
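Both distances are straightforward to compute. The following sketch assumes segmentations are given as lists `[0, tau_1, ..., n]` and that each has at least one interior change-point for the Hausdorff distance; the function names are hypothetical.

```python
import numpy as np

def hausdorff_distance(tau, tau_prime):
    """Hausdorff distance between the interior change-points of two
    segmentations (each must contain at least one interior change-point)."""
    a = np.asarray(tau[1:-1])
    b = np.asarray(tau_prime[1:-1])
    gaps = np.abs(a[:, None] - b[None, :])
    return int(max(gaps.min(axis=1).max(), gaps.min(axis=0).max()))

def segmentation_matrix(tau):
    """M[i, j] = 1 / (segment size) if i and j share a segment, else 0."""
    n = tau[-1]
    M = np.zeros((n, n))
    for start, end in zip(tau[:-1], tau[1:]):
        M[start:end, start:end] = 1.0 / (end - start)
    return M

def frobenius_distance(tau, tau_prime):
    """Frobenius norm of the difference of the two segmentation matrices."""
    diff = segmentation_matrix(tau) - segmentation_matrix(tau_prime)
    return float(np.linalg.norm(diff, ord="fro"))
```

Unlike the Hausdorff distance, the Frobenius distance remains defined when one of the segmentations has a single segment, since it only compares the two matrices.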

SLIDE 26

Empirical assessment

Scenario 1: Changes in (mean, variance)
• $\mathbb{R}$-valued $X_1, \dots, X_n$ with $n = 1000$
• True partition of $\{1, \dots, n\}$ in $D^* = 11$ segments
• In each segment, randomly choose a distribution among 7 of them

[Figure: plot against position, 1 to 1000.]

SLIDE 27

Scenario 1: Changes in (mean,variance) with D∗ = 11

Hausdorff and Frobenius distances

[Figure: Frobenius and Hausdorff distances versus dimension. Panels: (a) Gaussian ($k_G$), (b) Linear ($k_{\mathrm{Lin}}$).]

→ $k_{\mathrm{Lin}}$ puts changes in noise

SLIDE 28

Scenario 1: Changes in (mean, variance) (cont.)

Change-point frequencies for $D = D^*$ (500 repetitions)

[Figure: frequency of selected change-points versus position. Panels: (a) Gaussian ($k_G$), (b) Linear ($k_{\mathrm{Lin}}$).]

→ $k_{\mathrm{Lin}}$ puts changes in noise

SLIDE 29

Empirical assessment

Scenario 2: No change in (mean, variance)
• $\mathbb{R}$-valued $X_1, \dots, X_n$ with $n = 1000$
• True partition of $\{1, \dots, n\}$ in $D^* = 11$ segments
• In each segment, randomly choose a distribution among 3 of them

[Figure: plot against position, 1 to 1000.]

SLIDE 30

Scenario 2: No change in (mean,variance)

Hausdorff and Frobenius distances

[Figure: Frobenius and Hausdorff distances versus dimension. Panels: (a) Gaussian ($k_G$), (b) Linear ($k_{\mathrm{Lin}}$).]

→ $k_{\mathrm{Lin}}$ puts changes in noise

SLIDE 31

Scenario 2: No change in (mean, variance) (cont.)

Change-point frequencies for $D = D^*$

[Figure: frequency of selected change-points versus position. Panels: (a) Gaussian ($k_G$), (b) Linear ($k_{\mathrm{Lin}}$).]

→ $k_{\mathrm{Lin}}$ puts changes in noise

SLIDE 32

Scenario 2: No change in (mean, variance) (cont.)

Change-point frequencies for $D = D^*$

[Figure: frequency of selected change-points versus position. Panels: (a) Gaussian ($k_G$), (b) Hermite ($k_{H5}$).]

→ $k_{H5}$ is less sensitive to changes than $k_G$ (characteristic kernels)

SLIDE 33

Empirical assessment

Scenario 3: Histogram-valued data
• Histogram-valued $X_1, \dots, X_n$ with 20 bins and $n = 1000$
• True partition of $\{1, \dots, n\}$ in $D^* = 11$ segments
• In each segment, randomly choose $DP(p_1, \dots, p_{20})$ (Dirichlet)

[Figure: three plots against position, 1 to 1000.]

SLIDE 34

Scenario 3: Histogram-valued data

Hausdorff and Frobenius distances

[Figure: Frobenius and Hausdorff distances versus dimension. Panels: (a) $\chi^2$ ($k_{\chi^2}$), (b) Gaussian ($k_G$).]

→ $k_G$ misses change-points by ignoring the structure of the data

SLIDE 35

Scenario 3: Histogram-valued data (cont.)

Change-point frequencies for $D = D^*$

[Figure: frequency of selected change-points versus position. Panels: (a) $\chi^2$ ($k_{\chi^2}$), (b) Gaussian ($k_G$).]

→ potential gain in exploiting the structure of the data

SLIDE 36

IV How many change-points?

SLIDE 37

KCP Algorithm (reminder)

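Input:
• observations: $X_1, \dots, X_n \in \mathcal{X}$,
• kernel: $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$.

Step 1: $\forall\, 1 \le D \le D_{\max}$, compute (→ dynamic programming):

$$\hat\tau(D) \in \operatorname{Argmin}_{\tau \in \mathcal{T}_n^D} \big\{ \hat{R}_n(\tau) \big\}$$

Step 2: Find (→ model selection):

$$\hat{D} \in \operatorname{Argmin}_{1 \le D \le D_{\max}} \big\{ \hat{R}_n(\hat\tau(D)) + \mathrm{pen}(\hat\tau(D)) \big\}$$

Output: sequence of change-points $\hat\tau = \hat\tau(\hat{D})$.

$$\mathcal{T}_n^D = \big\{ (\tau_0, \dots, \tau_D) \in \mathbb{N}^{D+1} \ /\ 0 = \tau_0 < \tau_1 < \tau_2 < \cdots < \tau_D = n \big\}$$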

SLIDE 38

Empirical risk minimizer

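Assumption: $\forall\, 1 \le i \le n$, $Y_i = \mu^\star_i + \varepsilon_i$, where $\mu^\star = (\mu^\star_1, \dots, \mu^\star_n)$ is piecewise constant.

Model: $\tau = (\tau_0, \tau_1, \dots, \tau_D)$ (with $\tau_0 = 0$ and $\tau_D = n$).

Vector space (model):

$$F_\tau = \big\{ (f_1, \dots, f_n) \in \mathcal{H}^n \ \big|\ f_{\tau_{\ell-1}+1} = \cdots = f_{\tau_\ell},\ \forall\, 1 \le \ell \le D_\tau \big\}$$

($D_\tau$: number of segments of $\tau$)

Estimator of $\mu^\star$:

$$\hat\mu_\tau = \operatorname{Argmin}_{f \in F_\tau} \|Y - f\|^2, \qquad \text{with } \|f\|^2 = \sum_{i=1}^{n} \|f_i\|_{\mathcal{H}}^2$$

$\hat\mu_\tau = \Pi_{F_\tau} Y$: orthogonal projection onto $F_\tau$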

SLIDE 39

Choose the number of change-points

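Ideal penalty:

$$\tau^* \in \operatorname{Argmin}_{\tau \in \mathcal{T}_n} \|\mu^\star - \hat\mu_\tau\|^2 \ \text{(oracle segmentation)} \ = \operatorname{Argmin}_{\tau \in \mathcal{T}_n} \big\{ \|Y - \hat\mu_\tau\|^2 + \mathrm{pen}_{\mathrm{id}}(\tau) \big\},$$

with $\mathrm{pen}_{\mathrm{id}}(\tau) := 2 \|\Pi_\tau \varepsilon\|^2 - 2 \langle (I - \Pi_\tau)\mu^\star, \varepsilon \rangle$.

Strategy:

1. Concentration inequalities for linear and quadratic terms.
2. Derive a tight upper bound $\mathrm{pen} \ge \mathrm{pen}_{\mathrm{id}}$ with high probability.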
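The form of the ideal penalty can be recovered by a short computation, sketched here with the slide's notation and using only $\hat\mu_\tau = \Pi_\tau Y$, $Y = \mu^\star + \varepsilon$, and the fact that $\Pi_\tau$ is an orthogonal projection:

```latex
\begin{align*}
\|\mu^\star - \hat\mu_\tau\|^2
  &= \|(I - \Pi_\tau)\mu^\star\|^2 + \|\Pi_\tau \varepsilon\|^2, \\
\|Y - \hat\mu_\tau\|^2
  &= \|(I - \Pi_\tau)\mu^\star\|^2
     + 2 \langle (I - \Pi_\tau)\mu^\star, \varepsilon \rangle
     + \|\varepsilon\|^2 - \|\Pi_\tau \varepsilon\|^2, \\
\|\mu^\star - \hat\mu_\tau\|^2
  &= \|Y - \hat\mu_\tau\|^2
     + \underbrace{2 \|\Pi_\tau \varepsilon\|^2
     - 2 \langle (I - \Pi_\tau)\mu^\star, \varepsilon \rangle}_{\mathrm{pen}_{\mathrm{id}}(\tau)}
     - \|\varepsilon\|^2 .
\end{align*}
```

Since $\|\varepsilon\|^2$ does not depend on $\tau$, minimizing the risk over $\tau$ is the same as minimizing the empirical criterion plus $\mathrm{pen}_{\mathrm{id}}(\tau)$.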

SLIDE 40

Concentration of the quadratic term

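Assumptions:
• $\max_i \|Y_i\|_{\mathcal{H}} \le M$ a.s. (Db)
• $\max_i \mathbb{E}\big[ \|\varepsilon_i\|_{\mathcal{H}}^2 \big] \le v_{\max}$ (Vmax)

Theorem (Quadratic term): Assuming (Db)-(Vmax), for every $\tau \in \mathcal{T}_n$, $x > 0$, $\theta \in (0, 1]$,

$$\Big| \|\Pi_\tau \varepsilon\|^2 - \mathbb{E}\big[ \|\Pi_\tau \varepsilon\|^2 \big] \Big| \le \theta\, \mathbb{E}\big[ \|\Pi_\tau \mu^\star - \hat\mu_\tau\|^2 \big] + \theta^{-1} L\, v_{\max}\, x,$$

with probability at least $1 - 2e^{-x}$, where $L$ is a constant.

Rks:
• No Gaussian or constant-variance assumption
• Deals with Hilbert-valued vectors (not only in $\mathbb{R}^d$)
• The $x$ deviation term allows large collections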

SLIDE 41

Oracle inequality

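Theorem: Assume (Db)-(Vmax). For every $x > 0$, let

$$\hat\tau \in \operatorname{Argmin}_\tau \big\{ \|Y - \hat\mu_\tau\|^2 + \mathrm{pen}(\tau) \big\},$$

where $\mathrm{pen}(\tau) = D_\tau \big( C_1 \ln\big( \tfrac{n}{D_\tau} \big) + C_2 \big)$ ($C_1, C_2 > 0$).

Then with probability $\ge 1 - 2e^{-x}$,

$$\|\mu^\star - \hat\mu_{\hat\tau}\|^2 \le \Delta_1 \inf_\tau \big\{ \|\mu^\star - \hat\mu_\tau\|^2 + \mathrm{pen}(\tau) \big\} + \Delta_2,$$

where $\Delta_1 \ge 1$ and $\Delta_2 > 0$ is a remainder term.

Rk: In Birgé, Massart (2001), $\mathrm{pen}(\tau) = D_\tau \big( c_1 \ln\big( \tfrac{n}{D_\tau} \big) + c_2 \big)$.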

SLIDE 42

Model selection procedure

Algorithm

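1. For every $1 \le D \le D_{\max}$,

$$\hat\tau(D) \in \operatorname{Argmin}_{\tau,\ D_\tau = D} \|Y - \hat\mu_\tau\|^2.$$

2. Define

$$\hat{D} = \operatorname{Argmin}_D \Big\{ \|Y - \hat\mu_{\hat\tau(D)}\|^2 + D \big( C_1 \ln\big( \tfrac{n}{D} \big) + C_2 \big) \Big\},$$

where $C_1, C_2$ are computed by simulations (slope heuristics).

3. Final estimator: $\hat\mu_{\hat\tau} := \hat\mu_{\hat\tau(\hat{D})}$.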
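Step 2 of the procedure amounts to a one-dimensional search once the empirical risks are known. The sketch below assumes they are given as a list indexed by $D$ and that $C_1$ and $C_2$ have already been calibrated; the function name `select_dimension` and the numerical values are hypothetical, and the slope heuristics itself is not implemented here.

```python
import math

def select_dimension(empirical_risks, n, c1, c2):
    """Return the D in 1..D_max minimizing
    empirical_risks[D - 1] + D * (c1 * ln(n / D) + c2)."""
    best_d, best_crit = None, math.inf
    for d, risk in enumerate(empirical_risks, start=1):
        crit = risk + d * (c1 * math.log(n / d) + c2)
        if crit < best_crit:
            best_d, best_crit = d, crit
    return best_d

# Toy values (assumption): the risk drops sharply up to D = 3, then flattens.
risks = [100.0, 40.0, 10.0, 9.0, 8.5]
print(select_dimension(risks, n=100, c1=1.0, c2=1.0))  # 3
```

The penalty grows with $D$, so the selected dimension sits where further decreases of the empirical risk no longer offset the added complexity.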

SLIDE 43

Scenario 1: Changes in (mean,variance)

Behavior of the penalized criterion

[Figure: penalized criterion, risk, and empirical risk versus dimension. Panels: (a) Gaussian ($k_G$), (b) Hermite ($k_{H5}$).]

→ $\mathrm{crit}(\hat\tau(D))$ looks like the risk for both $k_G$ and $k_{H5}$

SLIDE 44

Scenario 1: Changes in (mean, variance) (cont.)

Change-point frequencies and $\hat{D}$

[Figure: (a) frequencies of selected change-points versus position (exact recovery); (b) selected dimension ($D^* = 11$).]

SLIDE 45

Scenario 2: No change in (mean,variance)

Behavior of the penalized criterion

[Figure: penalized criterion, risk, and empirical risk versus dimension. Panels: (a) Gaussian ($k_G$), (b) Hermite ($k_{H5}$).]

→ $\mathrm{crit}(\hat\tau(D))$ looks like the risk for both $k_G$ and $k_{H5}$

SLIDE 46

Scenario 2: No change in (mean, variance) (cont.)

Change-point frequencies and $\hat{D}$

[Figure: (a) frequencies of selected change-points versus position (exact recovery); (b) selected dimension ($D^* = 11$).]

SLIDE 47

Scenario 3: Histogram-valued data (cont.)

Behavior of the penalized criterion

[Figure: penalized criterion, risk, and empirical risk versus dimension. Panels: (a) $\chi^2$ ($k_{\chi^2}$), (b) Gaussian ($k_G$).]

→ The criterion looks like the risk for both $k_G$ and $k_{\chi^2}$

SLIDE 49

Concluding remarks

Summary:
• detect changes in the distribution (not only in the mean)
• efficient and theoretically grounded procedure
• deal with both vectorial ($\mathbb{R}^d$) and structured (graphs, ...) objects

Statistical precision/computation trade-offs: open challenges
• Reduce the $O(n^2)$ time complexity → approximation to the Gram matrix
• Investigate the link between kernel and abrupt changes
• Revisit the slope heuristic to: (i) preserve accuracy, and (ii) save computation resources

Thank you!

SLIDE 52

Scenario 3: Histogram-valued data (cont.)

Change-point frequencies and $\hat{D}$

[Figure: frequency of selected change-points versus position. Panels: (a) $\chi^2$ ($k_{\chi^2}$), (b) Gaussian ($k_G$).]

SLIDE 53

Sketch of proof

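1. $\|\Pi_\tau \varepsilon\|^2 = \sum_{\lambda \in m} \frac{1}{n_\lambda} \big\| \sum_{i \in \lambda} \varepsilon_i \big\|_{\mathcal{H}}^2 = \sum_{\lambda \in m} T_\lambda$.

2. $\big\{ \big\| \sum_{i \in \lambda} \varepsilon_i \big\|_{\mathcal{H}}^2 \big\}_{\lambda \in m}$ are independent r.v.

3. Bernstein's inequality applied to $\|\Pi_\tau \varepsilon\|^2$ ($\star$).

4. For every $q \ge 2$, upper bound of $\mathbb{E}\big[ T_\lambda^q \big]$.

5. Pinelis-Sakhanenko's inequality on $\big\| \sum_{i \in \lambda} \varepsilon_i \big\|_{\mathcal{H}}$:

$$\forall x > 0, \qquad \mathbb{P}\left( \Big\| \sum_{i \in \lambda} \varepsilon_i \Big\|_{\mathcal{H}} > x \right) \le 2 \exp\left( -\frac{x^2}{2 \big( \sigma_\lambda^2 + b_\lambda x \big)} \right),$$

with $b_\lambda = 2M/3$ and $\sigma_\lambda^2 = \sum_{i \in \lambda} v_i$.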

SLIDE 54

Bernstein rather than Talagrand

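Talagrand's inequality:

$$\|\Pi_\tau \varepsilon\| = \sup_{f \in B_n} \langle f, \Pi_\tau \varepsilon \rangle = \sup_{f \in B_n} \sum_{i=1}^{n} \langle f_i, (\Pi_\tau \varepsilon)_i \rangle_{\mathcal{H}}$$

$$\mathbb{P}\left[ \|\Pi_\tau \varepsilon\| \le \mathbb{E}\big[ \|\Pi_\tau \varepsilon\| \big] + \sqrt{2 v x} + \frac{b}{3} x \right],$$

with $v = \sum_{i=1}^{n} \sup_f \mathbb{E}\big[ \langle f_i, (\Pi_\tau \varepsilon)_i \rangle_{\mathcal{H}}^2 \big] + 16\, b\, \mathbb{E}\big[ \|\Pi_\tau \varepsilon\| \big]$.

Bernstein's inequality:

$$\sigma^2 = \sup_f \sum_{i=1}^{n} \mathbb{E}\big[ \langle f_i, (\Pi_\tau \varepsilon)_i \rangle_{\mathcal{H}}^2 \big] = \mathbb{E}\big[ \|\Pi_\tau \varepsilon\|^2 \big].$$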
