Consistent Change-point Detection with Kernels
Damien Garreau 1 Sylvain Arlot 2
1Inria, DI ENS 2Université Paris-Sud, Laboratoire de Mathématiques d’Orsay
April 6, 2016
An example: shot detection in a movie

[Figure: a scalar signal computed from ~1400 movie frames; abrupt changes in its distribution mark shot boundaries.]
◮ detect abrupt changes in the distribution of the data;
◮ deal with interesting (structured) data: each point can be a complex object (e.g., a histogram).
◮ X an arbitrary (measurable) set, n < +∞;
◮ ∀i ∈ {1, . . . , n}, P_{X_i} the distribution of X_i.
◮ Given (X_i)_{1≤i≤n}, we want to find the locations of the abrupt changes in the sequence of distributions (P_{X_i})_i.
◮ Take any D ∈ {1, . . . , n + 1}. Define T_n^D, the set of sequences of D − 1 change-points:
T_n^D = { (τ_0, . . . , τ_D) ∈ N^{D+1} : 0 = τ_0 < τ_1 < · · · < τ_D = n }.
◮ τ_1, . . . , τ_{D−1} are the change-points; τ is a segmentation of {1, . . . , n} into D_τ segments.
◮ τ⋆ denotes the true segmentation, D⋆ = D_{τ⋆} the true number of segments.
[Figure: a one-dimensional signal over time, with the true change-points t_0, t_1, t_2, t_3 marked.]

[Figure: the same signal, with the change-points unmarked.]
◮ With a finite sample size, it is not easy to recover the true segmentation.
◮ When X = R^d and the changes occur in the first moments of the distribution, classical methods apply.
◮ Kernel change-point detection can tackle more subtle changes in the distribution.
◮ Let k : X × X → R be a positive semidefinite kernel:
◮ k is a measurable function s.t. ∀x_1, . . . , x_m ∈ X, the matrix (k(x_i, x_j))_{1≤i,j≤m} is positive semidefinite.
◮ Examples include
◮ the linear kernel k(x, y) = ⟨x, y⟩,
◮ the Gaussian kernel k(x, y) = exp(−‖x − y‖² / (2h²)),
◮ the histogram kernel k(x, y) = Σ_{k=1}^p min(x_k, y_k),
◮ . . .
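These example kernels can be sketched in a few lines; any Gram matrix built from them is positive semidefinite by definition. A minimal illustration (function names are mine, not from the talk):

```python
import numpy as np

def linear_kernel(x, y):
    # k(x, y) = <x, y>
    return np.dot(x, y)

def gaussian_kernel(x, y, h=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 h^2)); h is the bandwidth
    return np.exp(-np.sum((x - y) ** 2) / (2 * h ** 2))

def histogram_kernel(x, y):
    # k(x, y) = sum_k min(x_k, y_k), for histograms x and y
    return np.sum(np.minimum(x, y))

def gram_matrix(X, kernel):
    # Gram matrix K[i, j] = k(X_i, X_j)
    n = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])

rng = np.random.default_rng(0)
X = rng.random((5, 3))
K = gram_matrix(X, gaussian_kernel)
# eigenvalues of a positive semidefinite matrix are >= 0 (up to numerical error)
assert np.all(np.linalg.eigvalsh(K) >= -1e-10)
```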
◮ Intuition: the least-squares criterion
(1/n) Σ_{ℓ=1}^{D_τ} Σ_{i=τ_{ℓ−1}+1}^{τ_ℓ} ‖X_i − X̄_ℓ‖², with X̄_ℓ the empirical mean of segment ℓ.
◮ Define
R̂_n(τ) = (1/n) Σ_{ℓ=1}^{D_τ} Σ_{i=τ_{ℓ−1}+1}^{τ_ℓ} [ k(X_i, X_i) − (1/(τ_ℓ − τ_{ℓ−1})) Σ_{j=τ_{ℓ−1}+1}^{τ_ℓ} k(X_i, X_j) ].
◮ This is just a kernelized version: the two definitions coincide for the linear kernel on R^d.
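The criterion only needs the Gram matrix. A minimal sketch (the helper name is mine), checking that with the linear kernel it recovers the least-squares criterion:

```python
import numpy as np

def kernel_criterion(K, tau):
    """Empirical risk for segmentation tau = [tau_0 = 0, ..., tau_D = n],
    given the Gram matrix K[i, j] = k(X_i, X_j)."""
    n = K.shape[0]
    value = np.trace(K)  # sum_i k(X_i, X_i)
    for a, b in zip(tau[:-1], tau[1:]):
        value -= K[a:b, a:b].sum() / (b - a)  # within-segment similarity
    return value / n

# sanity check: linear kernel <=> least-squares criterion
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
K = X @ X.T
tau = [0, 7, 13, 20]
ls = sum(np.sum((X[a:b] - X[a:b].mean(axis=0)) ** 2)
         for a, b in zip(tau[:-1], tau[1:])) / len(X)
assert np.isclose(kernel_criterion(K, tau), ls)
```

Refining a segmentation can only lower this criterion, which is why a penalty on the number of segments is needed.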
◮ The segmentation is chosen by minimizing a penalized criterion:
τ̂ ∈ argmin_{τ ∈ T_n} { R̂_n(τ) + pen(τ) },
with pen(·) a penalty function.
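For a fixed number of segments D, the criterion can be minimized exactly by dynamic programming; the penalty is then added across values of D. A minimal sketch with the linear kernel (helper names are hypothetical, not from the talk):

```python
import numpy as np

def segment_costs(K):
    # cost[a, b] = within-segment cost of points a..b-1 (0-indexed, half-open)
    n = K.shape[0]
    cost = np.full((n + 1, n + 1), np.inf)
    diag = np.diag(K)
    for a in range(n):
        for b in range(a + 1, n + 1):
            cost[a, b] = diag[a:b].sum() - K[a:b, a:b].sum() / (b - a)
    return cost

def best_segmentation(K, D):
    """Exact minimizer of the kernel criterion over segmentations
    with D segments, by dynamic programming."""
    n = K.shape[0]
    cost = segment_costs(K)
    C = np.full((D + 1, n + 1), np.inf)  # C[d, b]: best cost, d segments, first b points
    C[0, 0] = 0.0
    back = np.zeros((D + 1, n + 1), dtype=int)
    for d in range(1, D + 1):
        for b in range(d, n + 1):
            for a in range(d - 1, b):
                v = C[d - 1, a] + cost[a, b]
                if v < C[d, b]:
                    C[d, b], back[d, b] = v, a
    # backtrack the change-points tau_0 = 0 < ... < tau_D = n
    tau = [n]
    for d in range(D, 0, -1):
        tau.append(int(back[d, tau[-1]]))
    return tau[::-1]

# toy example: a clear mean change after 10 points, linear kernel
X = np.concatenate([np.zeros(10), np.ones(10)]).reshape(-1, 1)
K = X @ X.T
assert best_segmentation(K, 2) == [0, 10, 20]
```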
[Figure: a piecewise-constant signal with detected change-points.] (courtesy of [Arlot et al., 2012])
[Figure: three panels showing, on the same signal, the change-points detected with each kernel.]
Linear, Hermite, and Gaussian kernels (courtesy of [Arlot et al., 2012]).
◮ Along with the kernel k comes a reproducing kernel Hilbert space H.
◮ There exists a mapping Φ : X → H such that, for any x, y ∈ X, k(x, y) = ⟨Φ(x), Φ(y)⟩_H; set Y_i = Φ(X_i).
◮ The algorithm is looking for breaks in the “mean” of the Y_i.
◮ Whenever possible, define µ⋆_i, the mean of Y_i; it satisfies ∀g ∈ H, ⟨µ⋆_i, g⟩_H = E[g(X_i)] = E[⟨Y_i, g⟩_H].
◮ We write Y_i = µ⋆_i + ε_i.
◮ H is separable.
◮ Bounded data/kernel (Db): k(X_i, X_i) ≤ M² almost surely, for some constant M.
◮ Finite variance (V): the E[‖ε_i‖²_H] are finite, uniformly in i.
◮ Assume that (Db) holds true;
◮ suppose that pen(·) is “large enough”;
◮ suppose that ∆² × Γ is “large enough”, where ∆ = min_{i : µ⋆_i ≠ µ⋆_{i+1}} ‖µ⋆_i − µ⋆_{i+1}‖_H is the smallest jump size (and Γ measures the smallest segment length).
◮ Then, with high probability, D_{τ̂} = D⋆.
◮ More precisely, for any y > 0, with a penalty that is large enough (as a function of y),
P( D_{τ̂} = D⋆ ) ≥ 1 − e^{−y}.
◮ We consider only segmentations with the same number of segments as τ⋆.
◮ Several possibilities for the loss, equivalent under assumptions on the segment lengths |λ|, λ ∈ τ.
◮ We focus on max_{1≤i≤D⋆−1} |τ̂_i − τ⋆_i|.
◮ Assume that D⋆ is known and that (V) holds true.
◮ Take δ_n > 0, and choose τ̂ by minimizing the criterion over T_n^{D⋆}(δ_n) := { τ ∈ T_n : Λ_τ ≥ δ_n, D_τ = D⋆ }.
◮ Then, for any 0 < x < Λ_{τ⋆}, the probability that the loss exceeds x is bounded explicitly.
◮ This bound goes to 0 whenever δ_n → 0 and nδ_n → +∞.
τ̂ ∈ argmin_{τ ∈ T_n^{D⋆}(δ_n)} R̂_n(τ).
◮ Kernelized version of the change-point detection procedure.
◮ Detection of changes in the distribution, not only in the first moments.
◮ Possible to deal with structured data more efficiently.
◮ Under reasonable assumptions and for a class of penalty functions:
◮ we have an oracle inequality at our disposal,
◮ the procedure is consistent,
◮ it recovers the true locations of the change-points.
◮ Exchange the hypotheses and still prove our results;
◮ tackle dependency structures within the X_i's;
◮ learn how to choose the kernel;
◮ find interesting data!
◮ Write Y = µ⋆ + ε ∈ H^n, and let Π_τ denote the orthogonal projection onto the subset of H^n of vectors “constant on each segment of τ”:
{ f ∈ H^n : f_{τ_{ℓ−1}+1} = · · · = f_{τ_ℓ} ∀1 ≤ ℓ ≤ D_τ }.
◮ With µ⋆_τ = Π_τ µ⋆ the projection of µ⋆ on this subset,
R̂_n(τ) = (1/n)‖µ⋆ − µ⋆_τ‖² + (2/n)⟨µ⋆ − µ⋆_τ, ε⟩ − (1/n)‖Π_τ ε‖² + (1/n)‖ε‖².
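This decomposition can be checked numerically in the scalar case (H = R, linear kernel), where Π_τ simply averages within each segment. The data below are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
n, tau = 12, [0, 5, 9, 12]  # an arbitrary segmentation of {1, ..., n}

def proj(v, tau):
    # Pi_tau: replace v by its mean on each segment of tau
    out = np.empty_like(v)
    for a, b in zip(tau[:-1], tau[1:]):
        out[a:b] = v[a:b].mean()
    return out

mu_star = np.repeat([0.0, 2.0, -1.0], [4, 4, 4])  # true piecewise-constant mean
eps = rng.normal(size=n)                          # noise
Y = mu_star + eps

mu_tau = proj(mu_star, tau)  # projection of the true mean
mu_hat = proj(Y, tau)        # empirical segment means: mu_tau + proj(eps)

lhs = np.sum((Y - mu_hat) ** 2)
rhs = (np.sum((mu_star - mu_tau) ** 2)
       + 2 * np.dot(mu_star - mu_tau, eps)
       + np.sum(eps ** 2) - np.sum(proj(eps, tau) ** 2))
# the decomposition holds exactly, since mu_star - mu_tau is orthogonal
# to the range of Pi_tau
assert np.isclose(lhs, rhs)
```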
Each term is then controlled with high probability:
◮ the linear term: |⟨µ⋆ − µ⋆_τ, ε⟩| ≤ θ ‖µ⋆ − µ⋆_τ‖² + (1/θ) M² x, for any θ > 0;
◮ the quadratic term ‖Π_τ ε‖²;
◮ pen(τ) − pen(τ⋆) via technical lemmas.
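The θ trade-off in the linear-term bound comes from the elementary inequality 2ab ≤ θa² + θ⁻¹b², valid for any θ > 0, which can be derived in one line:

```latex
\[
0 \le \left(\sqrt{\theta}\,a - \tfrac{b}{\sqrt{\theta}}\right)^{2}
  = \theta a^{2} - 2ab + \theta^{-1} b^{2}
\quad\Longrightarrow\quad
2ab \le \theta a^{2} + \theta^{-1} b^{2}.
\]
```

Applied with a = ‖µ⋆ − µ⋆_τ‖ and b a high-probability bound of order M√x on the noise component, it yields the stated control.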
◮ A union bound extends the control to all segmentations τ ∈ T_n simultaneously, with probability at least 1 − 4(· · ·);
◮ the cardinality factor is controlled by |T_n^D| = C(n−1, D−1), with C(n−1, d−1) ≤ (n e / d)^d.
◮ Write the quadratic term as Σ_{1≤ℓ≤D_τ} T_ℓ, a sum of independent random variables, with
T_ℓ = (1/(τ_ℓ − τ_{ℓ−1})) ‖ Σ_{j=τ_{ℓ−1}+1}^{τ_ℓ} ε_j ‖²;
◮ write the moments E[T_ℓ^q] as an integral depending upon P( ‖ Σ_{j=τ_{ℓ−1}+1}^{τ_ℓ} ε_j ‖ ≥ t ).