SLIDE 1

Statistical Analysis of Persistent Homology

Genki Kusano (Tohoku University, D1)

Topology and Computer 2016, Oct 28 @ Akita. Collaborators: Kenji Fukumizu (The Institute of Statistical Mathematics), Yasuaki Hiraoka (Tohoku University, AIMR)

Persistence weighted Gaussian kernel for topological data analysis.

Proceedings of the 33rd ICML, pp. 2004–2013, 2016

SLIDE 2
Self introduction

  • Interests: applied topology, topological data analysis
  • B3: homology groups, homological algebra
  • B4: persistent homology, computational homology
  • M1: applied topology for sensor networks
      “Relative interleavings and applications to sensor networks”, JJIAM, 33(1), 99–120, 2016.
  • M2: statistics, machine learning, kernel methods
      “Persistence weighted Gaussian kernel for topological data analysis”, ICML, pp. 2004–2013, 2016.
  • D1 (now): time series analysis, dynamics, information geometry, …

  • Announcement: Joint Mathematics Meetings, January 4, 2017, Atlanta
      ★ Statistical Methods in Computational Topology and Applications
      ★ Sheaves in Topological Data Analysis

SLIDE 3

Topological Data Analysis (TDA): mathematical methods for characterizing “shapes of data”

Motivation of this work

[Figure: atomic configurations (point clouds) of the liquid, glass, and solid states.]

Data are atomic configurations of the liquid, glass, and solid states of silica (SiO2, a compound of silicon and oxygen).

At the configuration level, it is difficult to distinguish the liquid and glass states.

SLIDE 4

[Figure: persistence diagrams of the liquid, glass, and solid configurations; birth and death axes in Å², colored by multiplicity.]

Persistent homology / persistence diagram: a topological descriptor of data.

  • Y. Hiraoka et al., Hierarchical structures of amorphous solids characterized by persistent homology, PNAS, 113(26):7035–7040, 2016.

Motivation of this work: Liquid, Glass, Solid

SLIDE 5

Motivation of this work

[Figure: persistence diagrams of the liquid and glass states; birth and death axes in Å², colored by multiplicity.]

Classification problem

  • Q. Can we distinguish them mathematically?
  • A. Make a statistical framework for persistence diagrams.

SLIDE 6

Section 1: What is a persistence diagram?

SLIDE 7

Persistence diagram

Consider the filtration by unions of balls $X_r = \bigcup_{x_i \in X} B(x_i; r)$, where $B(x; r) = \{z \mid d(x, z) \le r\}$. A cycle $\beta$ that appears at radius $b_\beta$ and disappears at radius $d_\beta$ is recorded as the point $x_\beta = (b_\beta, d_\beta) \in \mathbb{R}^2$.

SLIDE 8

Persistence diagram

The collection of all such birth–death pairs is the persistence diagram $D_1(X) = \{x_\alpha, x_\beta, x_\gamma, \dots\} \subset \mathbb{R}^2$, where $x_\alpha = (b_\alpha, d_\alpha)$.

SLIDE 9

Definition of persistence diagram

In this talk, we set a filtration by the union of balls $X_r = \bigcup_{x_i \in X} B(x_i; r)$.

Definition. For a filtration $\mathbb{X} : X_1 \subset X_2 \subset \cdots \subset X_n$, we compute homology groups with a field coefficient $K$ and obtain a sequence $H_q(\mathbb{X}) : H_q(X_1) \to H_q(X_2) \to \cdots \to H_q(X_n)$. This sequence is called a persistent homology. A persistent homology can be seen as a representation of an $A_n$ quiver.

SLIDE 10

Definition of persistence diagram

By the Gabriel and Krull–Remak–Schmidt theorems, there is the following decomposition:
$$H_q(\mathbb{X}) \cong \bigoplus_{i \in I} I[b_i, d_i] \quad (I \text{ is a finite set}),$$
where $I[b, d]$ denotes the interval representation
$$I[b, d] : 0 \leftrightarrow \cdots \leftrightarrow 0 \leftrightarrow \underset{b}{F} \leftrightarrow F \leftrightarrow \cdots \leftrightarrow \underset{d}{F} \leftrightarrow 0 \leftrightarrow \cdots \leftrightarrow 0.$$
From this decomposition, the persistence diagram is defined by $D_q(X) = \{(b_i, d_i) \mid i \in I\}$.

Remark: $\bigoplus_r H_q(X_r)$ can be seen as a module over $K[t]$, and it can be decomposed by the structure theorem for PIDs.

SLIDE 11

Persistence

Definition. The lifetime of a cycle is called its persistence: $\mathrm{pers}(x_\alpha) = d_\alpha - b_\alpha$, or equivalently $\mathrm{pers}(x) = \|x - \Delta\|_\infty$, where $\Delta = \{(a, a) \mid a \in \mathbb{R}\}$ is the diagonal.

A cycle with small persistence can be seen as a small, and sometimes noisy, cycle.

SLIDE 12

Metric structure of persistence diagrams

The set of persistence diagrams is defined by $\mathcal{D} = \{D \mid D \text{ is a multiset in } \mathbb{R}^2_{ul} \text{ and } |D| < \infty\}$, where $\mathbb{R}^2_{ul} = \{(b, d) \mid b \le d\}$.

The bottleneck ($\infty$-Wasserstein) metric
$$d_B(D, E) = \inf_{\gamma} \sup_{x \in D \cup \Delta} \|x - \gamma(x)\|_\infty \quad (\gamma : D \cup \Delta \to E \cup \Delta \text{ bijective}),$$
where $\Delta = \{(a, a) \mid a \in \mathbb{R}\}$ is the diagonal set, becomes a distance on the set of persistence diagrams.

Remark: $(\mathcal{D}, d_B)$ is a metric space.

SLIDE 13

Stability theorem

Significant property: the map $X \mapsto D_q(X)$ is Lipschitz continuous. (In contrast, the Betti number $X \mapsto \beta_q(X) = \dim H_q(X)$ is not continuous.)

  • Theorem [Cohen-Steiner et al., 2007]. For finite subsets $X, Y \subset \mathbb{R}^d$,
$$d_B(D_q(X), D_q(Y)) \le d_H(X, Y),$$
where $d_H$ is the Hausdorff distance:
$$d_H(X, Y) = \max\left\{ \max_{p \in X} \min_{q \in Y} d(p, q),\ \max_{q \in Y} \min_{p \in X} d(p, q) \right\}.$$
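The Hausdorff distance on the right-hand side is directly computable for finite point sets. A minimal NumPy sketch (the two point sets below are hypothetical examples, not from the talk):

```python
# Hausdorff distance between finite point sets, as used in the stability
# theorem: max over each set of the distance to the nearest point of the other.
import numpy as np

def hausdorff(X, Y):
    # Pairwise Euclidean distances d(p, q) for p in X, q in Y.
    D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    # max_p min_q d(p, q) and, symmetrically, max_q min_p d(p, q).
    return max(D.min(axis=1).max(), D.min(axis=0).max())

X = np.array([[0.0, 0.0], [1.0, 0.0]])
Y = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]])
print(hausdorff(X, Y))  # 3.0: the unmatched point (0, 3) dominates
```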


SLIDE 15

Statistical Topological Data Analysis

Data $X$ → persistence diagram $D_q(X)$ → prediction, classification, testing, estimation.

Facts: (1) a persistence diagram is not a vector; (2) standard statistical methods (multivariate analysis) are for vectors.

SLIDE 16

Statistical Topological Data Analysis

This work: make a vector representation of persistence diagrams by the kernel method, so the pipeline becomes data $X$ → persistence diagram $D_q(X)$ → vector → prediction, classification, testing, estimation.

SLIDE 17

Section 2 Kernel method

~Statistical method for non-vector data~

SLIDE 18

Statistics for non-vector data

Let $\Omega$ be a data set and $x_1, \dots, x_n \in \Omega$ be obtained data. To consider statistical properties of the data, it is sometimes necessary to calculate summaries, like the mean/average:
$$x_1, \dots, x_n \mapsto \frac{1}{n} \sum_{i=1}^n x_i.$$

To calculate statistical summaries, the data space $\Omega$ needs structures of addition, multiplication by scalars, and an inner product; that is, $\Omega$ should be an inner product space. The space of persistence diagrams is not an inner product space.

SLIDE 19

Statistics for non-vector data

While $\Omega$ does not always have an inner product, by defining a map $\phi : \Omega \to \mathcal{H}$, where $\mathcal{H}$ is an inner product space, we can consider statistical summaries in $\mathcal{H}$:
$$x_1, \dots, x_n \mapsto \phi(x_1), \dots, \phi(x_n) \mapsto \frac{1}{n} \sum_{i=1}^n \phi(x_i) \in \mathcal{H} \quad \text{(well-defined)}.$$

Fact: many statistical summaries and machine-learning techniques are calculated from the values of the inner product $\langle \phi(x_i), \phi(x_j) \rangle_{\mathcal{H}}$.
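This fact can be checked concretely: the squared RKHS distance between two empirical means $\frac{1}{n}\sum_i \phi(x_i)$ and $\frac{1}{m}\sum_j \phi(y_j)$ expands into kernel evaluations alone. A sketch with a Gaussian kernel on synthetic data (the function names and parameter values are illustrative assumptions):

```python
# Squared RKHS distance between two empirical mean embeddings, computed
# from kernel values only, without ever forming phi(x) explicitly.
import numpy as np

def gauss(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def mean_embedding_dist2(X, Y, k=gauss):
    n, m = len(X), len(Y)
    kxx = sum(k(x, xp) for x in X for xp in X) / n ** 2
    kyy = sum(k(y, yp) for y in Y for yp in Y) / m ** 2
    kxy = sum(k(x, y) for x in X for y in Y) / (n * m)
    return kxx + kyy - 2 * kxy

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(20, 2))
Y = rng.normal(3.0, 1.0, size=(20, 2))
print(mean_embedding_dist2(X, Y))   # clearly positive: the samples differ
print(mean_embedding_dist2(X, X))   # 0.0: identical samples
```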

SLIDE 20

Kernel method

In the kernel method, a positive definite kernel $k : \Omega \times \Omega \to \mathbb{R}$ is used as a “non-linear” inner product on the data set: $k(x, y) = \langle \phi(x), \phi(y) \rangle_{\mathcal{H}}$, with the feature map $\phi(x_i) = k(\cdot, x_i)$.

For an element $x \in \Omega$, $k(\cdot, x) : \Omega \to \mathbb{R}$ is a function, and hence a vector in the functional space $C(\Omega)$.

In many cases, what we need is just the Gram matrix $(k(x_i, x_j))_{i,j=1,\dots,n}$, which machine learning and statistics consume directly.

SLIDE 21

Kernel trick: instead of computing $\phi(x_i) = k(\cdot, x_i) \in \mathcal{H}$ explicitly, machine learning and statistical methods work only with the Gram matrix $(k(x_i, x_j))$.

SLIDE 22

Kernel method

Definition. Let $\Omega$ be a set. A function $k : \Omega \times \Omega \to \mathbb{R}$ is called a positive definite kernel when $k$ satisfies:
  • symmetry: $k(x, y) = k(y, x)$;
  • for any $x_1, \dots, x_n \in \Omega$, the matrix $(k(x_i, x_j))$ is positive semi-definite (this matrix is called the Gram matrix).

Examples in the case $\Omega = \mathbb{R}^d$:
  • linear kernel: $k_L(x, y) = \langle x, y \rangle$
  • polynomial kernel: $k_P(x, y) = (\langle x, y \rangle + c)^d$
  • Gaussian kernel: $k_G(x, y) = e^{-\frac{\|x - y\|^2}{2\sigma^2}}$
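The definition can be verified numerically for the Gaussian kernel: its Gram matrix on any finite sample is symmetric with nonnegative eigenvalues. A small sketch (sigma and the random sample are arbitrary choices):

```python
# The Gaussian Gram matrix on a random sample is symmetric and positive
# semi-definite, as the definition of a p.d. kernel requires.
import numpy as np

def gaussian_gram(X, sigma=1.0):
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * sigma ** 2))

X = np.random.default_rng(1).normal(size=(10, 3))
K = gaussian_gram(X)
assert np.allclose(K, K.T)                   # symmetry
assert np.linalg.eigvalsh(K).min() > -1e-10  # eigenvalues >= 0 (PSD)
```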

SLIDE 23

Kernel method

Moore–Aronszajn theorem. A p.d. kernel $k$ uniquely defines a Hilbert space $\mathcal{H}_k$, called the reproducing kernel Hilbert space (RKHS), satisfying:
  • for any $x \in \Omega$, the function $k(\cdot, x) : \Omega \to \mathbb{R}$ is in $\mathcal{H}_k$;
  • $\mathrm{Span}\{k(\cdot, x) \mid x \in \Omega\}$ is dense in $\mathcal{H}_k$;
  • for any $x \in \Omega$ and $f \in \mathcal{H}_k$, $\langle f, k(\cdot, x) \rangle_{\mathcal{H}_k} = f(x)$.

That is, $k(\cdot, x)$ is an element of the Hilbert space $\mathcal{H}_k$ (an RKHS vector), and the inner product is computed by $\langle k(\cdot, y), k(\cdot, x) \rangle_{\mathcal{H}_k} = k(x, y)$.

SLIDE 24

Kernel embedding

In order to consider statistical properties of (probability) distributions on $\Omega$, we vectorize them by a positive definite kernel. Let $\Omega$ be a locally compact Hausdorff space and $M_b(\Omega)$ be the set of finite signed Radon measures. Then measures can be represented as elements of the RKHS:
$$M_b(\Omega) \ni \mu \mapsto E_k(\mu) := \int k(\cdot, x)\, d\mu(x) \in \mathcal{H}_k.$$
This map is called the kernel embedding, and the integral is interpreted as a Bochner integral.

SLIDE 25

Kernel embedding

Definition. A p.d. kernel $k$ is said to be $C_0$-universal if $k(\cdot, x)$ is in $C_0(\Omega)$ and $\mathcal{H}_k$ is dense in $C_0(\Omega)$.

Proposition [B. K. Sriperumbudur et al., 2011]. If $k$ is $C_0$-universal, then $E_k$ is injective, and hence $d_k(\mu, \nu) := \|E_k(\mu) - E_k(\nu)\|_{\mathcal{H}_k}$ becomes a metric on $M_b(\Omega)$.

In short, if a kernel is “nice”, the kernel embedding becomes injective. The Gaussian kernel $k_G(x, y) = e^{-\frac{\|x - y\|^2}{2\sigma^2}}$ is $C_0$-universal.

SLIDE 26

Section 3: Kernel on persistence diagrams

SLIDE 27

Kernel on persistence diagrams

A persistence diagram $D$ can be seen as a counting measure $\mu_D = \sum_{x \in D} \delta_x$, where $\delta_x$ is the delta measure at $x$.

Observation: a point close to the diagonal has a small persistence, so it can sometimes be seen as a noisy cycle.

SLIDE 28

Kernel on persistence diagrams

By defining an appropriate weight function $w : \mathbb{R}^2 \to \mathbb{R}$, a persistence diagram is represented as a weighted measure:
$$\mu_D^w = \sum_{x \in D} w(x)\, \delta_x.$$

SLIDE 29

Kernel on persistence diagrams

For the weighted measure, we consider the kernel embedding:
$$D \mapsto \mu_D^w \mapsto E_k(\mu_D^w) = \sum_{x \in D} w(x)\, k(\cdot, x) \in \mathcal{H}_k.$$

We can define a linear kernel on the RKHS,
$$K_L(D, E) = \langle E_k(\mu_D^w), E_k(\mu_E^w) \rangle_{\mathcal{H}_k},$$
and, as the Gaussian kernel on the RKHS, a kernel on persistence diagrams:
$$K_G(D, E) = \exp\left( -\frac{1}{2\tau^2} \left\| E_k(\mu_D^w) - E_k(\mu_E^w) \right\|^2_{\mathcal{H}_k} \right).$$

SLIDE 30

Metric on RKHS vectors

While the vector $E_k(\mu_D^w)$ is an element of $\mathcal{H}_k \subset C(\mathbb{R}^2_{ul})$, the inner product is easy to compute:
$$\langle E_k(\mu_D^w), E_k(\mu_E^w) \rangle_{\mathcal{H}_k} = \sum_{x \in D} \sum_{y \in E} w(x)\, w(y)\, k(x, y).$$

The distance also has the following expansion:
$$\left\| E_k(\mu_D^w) - E_k(\mu_E^w) \right\|^2_{\mathcal{H}_k} = \sum_{x \in D} \sum_{x' \in D} w(x) w(x') k(x, x') + \sum_{y \in E} \sum_{y' \in E} w(y) w(y') k(y, y') - 2 \sum_{x \in D} \sum_{y \in E} w(x) w(y) k(x, y).$$
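Because the expansion is a finite double sum, it can be evaluated directly. A minimal sketch with the Gaussian kernel; the plain-persistence weight and the toy diagrams below are hypothetical choices for illustration, not the talk's $w_{\mathrm{arc}}$:

```python
# Inner product and squared RKHS distance between embedded weighted diagram
# measures, computed from kernel values and weights only.
import numpy as np

def k_gauss(x, y, sigma=1.0):
    return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2) / (2 * sigma ** 2))

def inner(D, E, w, k=k_gauss):
    # <E_k(mu_D^w), E_k(mu_E^w)> = sum_x sum_y w(x) w(y) k(x, y)
    return sum(w(x) * w(y) * k(x, y) for x in D for y in E)

def dist2(D, E, w, k=k_gauss):
    # ||E_k(mu_D^w) - E_k(mu_E^w)||^2 via the three double sums above
    return inner(D, D, w, k) + inner(E, E, w, k) - 2 * inner(D, E, w, k)

w = lambda x: x[1] - x[0]            # persistence as a simple weight
D = [(0.1, 0.9), (0.2, 0.3)]         # toy diagrams: (birth, death) pairs
E = [(0.1, 0.8)]
print(dist2(D, E, w))                # nonnegative, and 0 when D == E
```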

SLIDE 31

Kernel on persistence diagrams

Pipeline: each persistence diagram $D_\ell$ is mapped to the weighted measure $\mu_{D_\ell}^w$, then by the kernel embedding to $E_k(\mu_{D_\ell}^w) \in \mathcal{H}_k$; the Gram matrix $(K(D_i, D_j))$ is then passed to machine learning and statistics.

SLIDE 32

Kernel on persistence diagrams

Since the expansion of $\left\| E_k(\mu_D^w) - E_k(\mu_E^w) \right\|^2_{\mathcal{H}_k}$ involves only finite sums of kernel values, $K_G(D, E)$, and hence the Gram matrix $(K(D_i, D_j))$, is computable directly from the diagrams.

SLIDE 33

Choice of functions in PWGK

To obtain an RKHS representation $D \mapsto E_k(\mu_D^w) = \sum_{x \in D} w(x)\, k(\cdot, x) \in \mathcal{H}_k$, we have used a weight function $w$ and a kernel $k$. In this talk, we set the following functions:
  • weight: $w_{\mathrm{arc}}(x) = \arctan(C\, \mathrm{pers}(x)^p)$ $(C, p > 0)$, where $\mathrm{pers}(x) = d - b$ for $x = (b, d) \in \mathbb{R}^2_{ul}$;
  • kernel: the Gaussian kernel $k_G(x, y) = e^{-\frac{\|x - y\|^2}{2\sigma^2}}$.

The reason for the choice of $w_{\mathrm{arc}}$ is the stability result.

SLIDE 34

Weight function

$w_{\mathrm{arc}}(x) = \arctan(C\, \mathrm{pers}(x)^p)$. Strong (light) red means that the value of the weight function is high (low). The parameters control the behavior near the diagonal: $p$, where the noisy region is; $C$, how sharp the noisy decision boundary is.
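The arctan weight can be sketched in a few lines: it vanishes as a point approaches the diagonal and saturates at $\pi/2$ for very persistent points, so noisy cycles are down-weighted. The parameter values below are arbitrary illustrations:

```python
# The PWGK weight w_arc(x) = arctan(C * pers(x)^p) for x = (b, d).
import math

def w_arc(x, C=1.0, p=5.0):
    b, d = x
    return math.atan(C * (d - b) ** p)

print(w_arc((0.5, 0.55)))   # near-diagonal point: weight close to 0
print(w_arc((0.0, 3.0)))    # persistent point: weight close to pi/2
```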

SLIDE 35

Main theorem [K, Fukumizu, Hiraoka, 2016]

Stability theorem. For a compact subset $M \subset \mathbb{R}^d$ and two finite point sets $X, Y$ in $M$, if $p > d + 1$, then
$$\left\| E_{k_G}(\mu^{w_{\mathrm{arc}}}_{D_q(X)}) - E_{k_G}(\mu^{w_{\mathrm{arc}}}_{D_q(Y)}) \right\|_{\mathcal{H}_{k_G}} \le L(M, d; C, p, \sigma)\, d_B(D_q(X), D_q(Y)),$$
where
$$L(M, d; C, p, \sigma) = \left( \frac{\sqrt{2}}{\sigma} \frac{p}{p - d} C_M \mathrm{diam}(M)^{p - d} + \frac{4p(p - 1)}{p - 1 - d} C_M \mathrm{diam}(M)^{p - 1 - d} \right) C.$$

Moreover, the kernel embedding $D \mapsto \sum_{x \in D} w(x)\, k(\cdot, x)$ is injective and continuous ($k_G$ is a $C_0$-universal kernel).

SLIDE 36

Statistical Topological Data Analysis

Pipeline: data $X$ → (1) persistence diagram $D_q(X)$ → (2) RKHS vector $K(\cdot, D_q(X))$ → (3) prediction, classification, testing, estimation.

Software:
(1) CGAL+PHAT, GUDHI, Dionysus, R-TDA, Ripser, HomCloud (Hiraoka lab), …
(2) kernel on persistence diagrams (this work)
(3) libsvm, e1071, scikit-learn, …

SLIDE 37

Section 4: Demonstrations

SLIDE 38

Synthesized two-circles data

$$X_k := \{(x_1(i), y_1(i))\}_{i=1,\dots,100} \cup \{(x_2(i), y_2(i))\}_{i=1,\dots,100}, \qquad d = 0.05k \ (k = 0, \dots, 40),$$
where
$$x_1(i) = \cos(t^i_1) - d, \quad y_1(i) = \sin(t^i_1), \qquad x_2(i) = \cos(t^i_2) + d, \quad y_2(i) = \sin(t^i_2),$$
with the angles sampled from
$$t^i_1 \sim \begin{cases} [\arccos(d), 2\pi - \arccos(d)] & (d < 1) \\ [0, 2\pi] & (d > 1), \end{cases} \qquad t^i_2 \sim \begin{cases} [-\pi + \arccos(d), \pi - \arccos(d)] & (d < 1) \\ [-\pi, \pi] & (d > 1). \end{cases}$$

[Figure: $X_0$, $X_{10}$, $X_{20}$, $X_{30}$.]

Goal: detect the change point in this system.
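Under the stated intervals, one instance of $X_k$ can be generated as follows; uniform sampling of the angles, the function name, and the fixed seed are illustrative assumptions:

```python
# Generate the two-circles point cloud X_k: two unit circles shifted by
# +/- d = 0.05k, with 100 angles per circle drawn from the given intervals.
import numpy as np

def two_circles(k, n=100, seed=0):
    d = 0.05 * k
    rng = np.random.default_rng(seed)
    if d < 1:
        t1 = rng.uniform(np.arccos(d), 2 * np.pi - np.arccos(d), n)
        t2 = rng.uniform(-np.pi + np.arccos(d), np.pi - np.arccos(d), n)
    else:
        t1 = rng.uniform(0, 2 * np.pi, n)
        t2 = rng.uniform(-np.pi, np.pi, n)
    left = np.stack([np.cos(t1) - d, np.sin(t1)], axis=1)
    right = np.stack([np.cos(t2) + d, np.sin(t2)], axis=1)
    return np.vstack([left, right])

X20 = two_circles(20)        # d = 1.0: the two circles just touch
print(X20.shape)             # (200, 2)
```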

SLIDE 39

Procedures

  • 1. Prepare data (two-circles data): $X_1, \dots, X_n$
  • 2. Compute persistence diagrams: $D(X_1), \dots, D(X_n)$
  • 3. Compute the Gram matrix: $K_G(D(X_i), D(X_j)) = \exp\left( -\frac{1}{2\tau^2} \left\| E_k(\mu^w_{D(X_i)}) - E_k(\mu^w_{D(X_j)}) \right\|^2_{\mathcal{H}_k} \right)$
  • 4. Apply statistical methods: kernel change-point analysis, kernel principal component analysis (kPCA)
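Step 4 can be sketched as kernel PCA on a precomputed Gram matrix: center the matrix in feature space, take the top eigenvectors, and scale them by the square roots of the eigenvalues. The toy Gram matrix below stands in for the diagram Gram matrix; everything here is an illustrative assumption, not the talk's actual computation:

```python
# Kernel PCA from a Gram matrix K: double-center K, then project each
# sample onto the leading eigenvectors scaled by sqrt(eigenvalue).
import numpy as np

def kernel_pca(K, n_components=2):
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    Kc = J @ K @ J                            # center in feature space
    vals, vecs = np.linalg.eigh(Kc)
    order = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[order], vecs[:, order]
    # Coordinates of each sample on the principal components.
    return vecs * np.sqrt(np.maximum(vals, 0.0))

# Toy Gram matrix from 1-d points, standing in for (K_G(D(X_i), D(X_j))).
x = np.linspace(0, 1, 8)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.1)
Z = kernel_pca(K)
print(Z.shape)   # (8, 2): two kPCA coordinates per sample
```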

SLIDE 40

Section 5: Applications

SLIDE 41

Application to the glass transition problem

Data are atomic configurations of silica obtained at several temperatures (from liquid to glass).

SLIDE 42

Application to the glass transition problem

Liquid $\cdots D_\ell \mid D_{\ell+1} \cdots$ Glass: where (at which index $\ell$) is the transition point among $D_1, \dots, D_{80}$?

Data are atomic configurations of silica obtained at several temperatures. What we have to do is just to compute the Gram matrix $(K(D_i, D_j))_{i,j=1,\dots,n}$ of the persistence diagrams (kernel method).

SLIDE 43

Application to the glass transition problem

The change point is estimated from the peak of the KFDR graph $\mathrm{KFDR}(\ell)$ over $D_1, \dots, D_{80}$. This KFDR result tells us that $D_{39}$ ($\simeq 3100\,\mathrm{K}$) is an estimated change point, and it lies in the liquid–glass transition interval $[2000\,\mathrm{K}, 3500\,\mathrm{K}]$ determined by physics.

Remark: our method uses only the topological structure of silica.

SLIDE 44
Application to the glass transition problem

Glass: kPCA, PWGK

[Figure: 2-dimensional kernel principal component analysis (kPCA) plot of the persistence diagrams; colors are assigned before and after 3100 K. Contribution rate: 81.9%.]

SLIDE 45

Conclusion

Our contribution: a kernel-based statistical framework on persistence diagrams.

Acknowledgment: Takenobu Nakamura (Tohoku University), Ippei Obayashi (Tohoku University), Emerson Escolar (Tohoku University).

Reference:
  • G. Kusano, Kenji Fukumizu, and Yasuaki Hiraoka. Persistence weighted Gaussian kernel for topological data analysis, ICML, pp. 2004–2013, 2016.

Thank you for your attention