Kernel Methods for Topological Data Analysis
Kenji Fukumizu
The Institute of Statistical Mathematics (Tokyo, Japan) Joint work with Genki Kusano and Yasuaki Hiraoka (Tohoku Univ.), supported by JST CREST STM2016 at ISM. July 22, 2016
Background
Data with complex structure must be analyzed.
Topological data analysis (TDA) extracts topological information of data. Key technology = Persistence homology (Edelsbrunner et al 2002; Carlsson 2005)
Computing topological invariants becomes easy.
Brain Science: brain artery trees, e.g. age effect (Bendich et al 2014)
Biochemistry: structure change (Kovacev-Nikolic et al 2015)
Material Science: non-crystal materials, liquid / glass (Nakamura, Hiraoka, Hirata, Escolar, Nishiura. Nanotechnology 26 (2015))
Computer Vision: shape signature, natural image statistics (Freedman & Chen 2009)
etc…
Data of highly complex geometric structure: often difficult to define good feature vectors / descriptors.
Persistence homology provides a compact representation for such data.
Topology: two sets are equivalent if one can be deformed into the other without tearing or attaching.
Topological invariants: quantities that take the same value on any equivalent sets.
[Figure: examples of equivalent shapes, with their numbers of connected components, rings, and cavities.]
Algebraic topology
Classify topological spaces with topological invariants.
Simplicial complex (union of simplices)
Compute various topological invariants, e.g. Euler number.
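As a small illustration of computing a topological invariant from a simplicial complex, the sketch below evaluates the Euler number as the alternating sum of the numbers of simplices in each dimension. The function name `euler_number` and the two example complexes are hypothetical choices made for this sketch, not part of the talk.

```python
from itertools import combinations

def euler_number(maximal_simplices):
    """Euler number: alternating sum of the number of simplices per dimension."""
    faces = set()
    for simplex in maximal_simplices:
        vertices = sorted(simplex)
        # Every non-empty subset of a simplex is a face of the complex.
        for size in range(1, len(vertices) + 1):
            faces.update(combinations(vertices, size))
    # A face with k vertices has dimension k - 1 and contributes (-1)^(k - 1).
    return sum((-1) ** (len(face) - 1) for face in faces)

# Boundary of a triangle (a ring): 3 vertices - 3 edges.
ring = [(0, 1), (1, 2), (0, 2)]
# Boundary of a tetrahedron (a hollow sphere): 4 vertices - 6 edges + 4 triangles.
sphere = list(combinations(range(4), 3))

print(euler_number(ring), euler_number(sphere))
```

The ring and the hollow sphere get different Euler numbers, so this single invariant already distinguishes them.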
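$H_k(X)$: $k$-th homology group of topological space $X$ ($k = 0, 1, 2, \dots$)
$H_k(X)$: $k$-dimensional holes
$H_0(X)$: connected components, $H_1(X)$: rings, $H_2(X)$: cavities, …
[Figure: example shapes with their homology groups $H_0(X)$, $H_1(X)$, $H_2(X)$ (entries such as $\mathbb{Z}$ and $\mathbb{Z} \oplus \mathbb{Z}$), and the generators of the 1st homology group.]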
Noisy finite sample vs. true structure: $X = \{x_i\}_{i=1}^m \subset \mathbf{R}^d$
$\varepsilon$-balls (e.g. manifold learning): $X_\varepsilon := \bigcup_{i=1}^m B_\varepsilon(x_i)$
Small $\varepsilon$: disconnected object. Large $\varepsilon$: small ring is not visible.
Stable extraction of topology is NOT easy!
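The dependence on the ball radius can be seen even for connected components alone. The following is a minimal sketch, assuming closed balls and a hypothetical sample of 12 points on a unit circle; it counts the components of the union of balls with a union-find structure and does not track rings or cavities.

```python
import math

def count_components(points, eps):
    """Number of connected components of the union of closed eps-balls."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            # Two eps-balls intersect iff their centres are within 2 * eps.
            if math.dist(points[i], points[j]) <= 2 * eps:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})

# Hypothetical sample: 12 equally spaced points on the unit circle.
circle = [(math.cos(2 * math.pi * t / 12), math.sin(2 * math.pi * t / 12))
          for t in range(12)]

for eps in (0.1, 0.3):
    print(eps, count_components(circle, eps))
```

With the small radius every point is its own component; with the larger radius neighbouring balls overlap and the sample becomes one connected object.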
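$\varepsilon$ small → $\varepsilon$ large
Two rings (generators of 1-dim homology) persist in a long interval.
Filtration of topological spaces: $\mathbb{X}: X_1 \subset X_2 \subset \cdots \subset X_L$
\[ PH_k(\mathbb{X}): H_k(X_1) \to H_k(X_2) \to \cdots \to H_k(X_L) \;\cong\; \bigoplus_{i=1}^{m_k} I[b_i, d_i] \]
Irreducible decomposition, with
\[ I[b, d] \cong 0 \to \cdots \to 0 \to K \to \cdots \to K \to 0 \to \cdots \to 0 \]
(the first $K$ at $X_b$, the last $K$ at $X_d$; $K$: field)
The lifetime (birth, death) of each generator is rigorously defined, and can be computed numerically.
[Figure: birth and death of a generator of $PH_1(X)$.]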
Barcode: bar from the birth to the death of each generator.
Persistence diagram (PD): plots of the birth ($b$) and death ($d$) in a 2D graph ($d \ge b$).
Barcodes and PD are considered for each dimension.
Handy descriptors or features
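For dimension 0 the (birth, death) pairs can be computed by hand, which may help make the diagram concrete. The sketch below assumes the filtration by unions of $\varepsilon$-balls around a finite point set: every component is born at $\varepsilon = 0$, and a component dies when its balls first touch another component, that is at half the length of a minimum-spanning-tree edge. The function name and the three example points are hypothetical; higher-dimensional diagrams need dedicated software such as the packages named in the talk.

```python
import math

def zero_dim_persistence(points):
    """(birth, death) pairs of connected components in the eps-ball filtration."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Balls of radius eps around two points first meet at eps = distance / 2.
    edges = sorted(
        (math.dist(points[i], points[j]) / 2, i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    pairs = []
    for eps, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            pairs.append((0.0, eps))  # one component dies at this merge
    pairs.append((0.0, math.inf))     # the last component never dies
    return pairs

points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
print(zero_dim_persistence(points))
```

Each returned pair is one bar of the 0-dimensional barcode, or equivalently one point $(b, d)$ of the 0-dimensional persistence diagram.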
[Figure: barcodes of 1-dim PH, plotted against $\varepsilon$.]
Data (e.g. Molecular dynamics simulation) → Computation of PH → Visualization (PD) → Analysis by experts
Software: CGAL / PHAT
CGAL: The Computational Geometry Algorithms Library http://www.cgal.org/
PHAT: Persistent Homology Algorithm Toolbox https://bitbucket.org/phat-code/phat
(Kusano, Fukumizu, Hiraoka ICML 2016; Reininghaus et al CVPR 2015; Kwitt et al NIPS2015; Fasy et al 2014)
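Many data sets → Computation → Many PD's ($PD_1, PD_2, PD_3, \dots, PD_n$) → Statistical analysis of PD's
Features / Descriptors: but how?
Vectorization by kernel embedding, regarding a PD $D = \{x_i\}$ as the measure $\mu_D = \sum_i \delta_{x_i}$:
\[ E_k: \mu_D \mapsto \int k(\cdot, x)\, d\mu_D(x) = \sum_i k(\cdot, x_i) \in H_k \]
$k$: positive definite kernel, $H_k$: corresponding RKHS
→ SVM, regression, PCA, CCA, etc.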
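Generators close to the diagonal may be noise, and should be discounted.
\[ k_{PWG}(x, y) = w(x)\, w(y) \exp\left( -\frac{\|y - x\|^2}{2\sigma^2} \right) \]
\[ w(x) = w_{C,p}(x) := \arctan\left( C\, \mathrm{Pers}(x)^p \right) \quad (C, p > 0) \]
\[ \mathrm{Pers}(x) := d - b \quad \text{for } x \in \{ (b, d) \in \mathbf{R}^2 \mid d \ge b \} \]
[Figure: $\mathrm{Pers}(x_1)$ shown for a point $x_1$ of a PD.]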
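The persistence weighted Gaussian kernel is simple to evaluate for a pair of generators. The sketch below is a direct transcription of the three definitions; the default values of `C`, `p` and `sigma` and the two example generators are hypothetical and chosen only for illustration.

```python
import math

def pers(x):
    """Persistence d - b of a generator x = (b, d)."""
    b, d = x
    return d - b

def weight(x, C=1.0, p=5.0):
    """w_{C,p}(x) = arctan(C * Pers(x)^p)."""
    return math.atan(C * pers(x) ** p)

def k_pwg(x, y, C=1.0, p=5.0, sigma=1.0):
    """Persistence weighted Gaussian kernel between two generators."""
    sq_dist = (x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2
    return weight(x, C, p) * weight(y, C, p) * math.exp(-sq_dist / (2 * sigma ** 2))

near_diagonal = (1.0, 1.05)  # short lifetime: likely noise
persistent = (1.0, 3.0)      # long lifetime

print(k_pwg(near_diagonal, near_diagonal), k_pwg(persistent, persistent))
```

A generator on the diagonal has weight exactly zero, and one close to it contributes almost nothing, which is the discounting described above.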
Distance between persistence diagrams $D_1, D_2$:
\[ d_k(D_1, D_2) := \left\| E_k(D_1) - E_k(D_2) \right\|_{H_k} \]
Stability Theorem (Kusano, Hiraoka, Fukumizu 2015)
$M$: compact subset in $\mathbf{R}^d$. $S \subset M$, $T \subset \mathbf{R}^d$: finite sets. If $p > d + 1$, then with PWG kernel $(p, C, \sigma)$,
\[ d_k\left( D_q(S), D_q(T) \right) \le L\, d_H(S, T). \]
$L$: constant depending only on $M, p, d, C, \sigma$
$D_q(S)$: $q$-th persistence diagram of $S$
$d_H$: Hausdorff distance
Lipschitz continuity: a small change of a set causes only a small change in PD.
This stability is NOT known for Gaussian kernel.
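The right-hand side of the stability bound uses the Hausdorff distance between the two finite sets. As a reminder of what that quantity measures, here is a minimal sketch for finite point sets; the function name and the example sets are hypothetical.

```python
import math

def hausdorff(S, T):
    """Hausdorff distance between two finite point sets."""
    def directed(A, B):
        # Largest distance from a point of A to its nearest point of B.
        return max(min(math.dist(a, b) for b in B) for a in A)
    return max(directed(S, T), directed(T, S))

S = [(0.0, 0.0), (1.0, 0.0)]
T = [(0.0, 0.1), (1.0, 0.0), (1.0, 0.5)]

print(hausdorff(S, T))
```

Here the extra point of `T` lies at distance 0.5 from its nearest point of `S`, and that single point determines the distance.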
2nd-level kernel (SVM for measures, Muandet, Fukumizu, Dinuzzo, Schölkopf 2012)
Application of pos. def. kernel on RKHS:
\[ K(\varphi_1, \varphi_2) = \exp\left( -\frac{\|\varphi_1 - \varphi_2\|_{H_k}^2}{2\tau^2} \right) \]
derives
\[ K(D_i, D_j) = \exp\left( -\frac{\|E_k(D_i) - E_k(D_j)\|_{H_k}^2}{2\tau^2} \right) \]
$D_i, D_j$: persistence diagrams
Data sets → PD's ($PD_1, PD_2, PD_3, \dots, PD_m$) → Embedding → Vectors in RKHS ($E_k(PD_1), E_k(PD_2), \dots, E_k(PD_m)$) → Data analysis method
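The number of generators in a PD may be large ($\ge 10^3, 10^4$).
For $PD_i = \sum_{a=1}^{N_i} \delta_{x_a^{(i)}} \cup \Delta$,
\[ K(PD_i, PD_j) = \exp\left( -\frac{\|E_k(PD_i) - E_k(PD_j)\|_{H_k}^2}{2\tau^2} \right) \]
requires computation
\[ \|E_k(PD_i) - E_k(PD_j)\|_{H_k}^2 = \sum_{a=1}^{N_i} \sum_{b=1}^{N_i} k\left(x_a^{(i)}, x_b^{(i)}\right) + \sum_{a=1}^{N_j} \sum_{b=1}^{N_j} k\left(x_a^{(j)}, x_b^{(j)}\right) - 2 \sum_{a=1}^{N_i} \sum_{b=1}^{N_j} k\left(x_a^{(i)}, x_b^{(j)}\right). \]
The number of evaluations of $\exp\left( -\frac{\|x_a - x_b\|^2}{2\sigma^2} \right)$ is $O(m^2 N^2)$, with $N = \max\{N_i \mid i = 1, \dots, m\}$: computationally expensive for $N \approx 10^4$.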
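The exact computation can be sketched directly from the double sums. The code below is a minimal sketch, assuming the PWG kernel as $k$ and ignoring the diagonal $\Delta$; the function names, the parameter defaults and the three tiny diagrams are hypothetical. Every pair of diagrams costs a full double loop over generators, which is where the quadratic dependence on $N$ comes from.

```python
import math

def k_pwg(x, y, C=1.0, p=1.0, sigma=1.0):
    """PWG kernel between generators x = (b, d) and y = (b, d)."""
    w_x = math.atan(C * (x[1] - x[0]) ** p)
    w_y = math.atan(C * (y[1] - y[0]) ** p)
    sq_dist = (x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2
    return w_x * w_y * math.exp(-sq_dist / (2 * sigma ** 2))

def inner(D1, D2):
    """<E_k(D1), E_k(D2)> = sum over all pairs of generators of k(x, y)."""
    return sum(k_pwg(x, y) for x in D1 for y in D2)

def sq_rkhs_distance(D1, D2):
    """||E_k(D1) - E_k(D2)||^2, clamped at 0 against rounding error."""
    return max(0.0, inner(D1, D1) + inner(D2, D2) - 2 * inner(D1, D2))

def second_level_gram(diagrams, tau=1.0):
    """Gram matrix K(D_i, D_j) = exp(-||E_k(D_i) - E_k(D_j)||^2 / (2 tau^2))."""
    m = len(diagrams)
    return [[math.exp(-sq_rkhs_distance(diagrams[i], diagrams[j]) / (2 * tau ** 2))
             for j in range(m)] for i in range(m)]

# Hypothetical diagrams, each a list of (birth, death) generators.
diagrams = [
    [(0.0, 1.0), (0.5, 2.0)],
    [(0.2, 1.1)],
    [(0.0, 3.0)],
]
G = second_level_gram(diagrams)
for row in G:
    print([round(v, 3) for v in row])
```

The resulting matrix can be passed to any kernel method that accepts a precomputed Gram matrix.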
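By Bochner's theorem,
\[ \exp\left( -\frac{\|x_a - x_b\|^2}{2\sigma^2} \right) = C \int e^{\sqrt{-1}\, \omega^T (x_a - x_b)}\, \frac{\sigma^2}{2\pi} e^{-\frac{\sigma^2 \|\omega\|^2}{2}}\, d\omega, \]
where $\frac{\sigma^2}{2\pi} e^{-\sigma^2 \|\omega\|^2 / 2}$ is a Gaussian distribution $=: Q_\sigma$ (Fourier transform).
Approximation by sampling: $\omega_1, \dots, \omega_L$ i.i.d. $\sim Q_\sigma$,
\[ \exp\left( -\frac{\|x_a - x_b\|^2}{2\sigma^2} \right) \approx C\, \frac{1}{L} \sum_{\ell=1}^{L} e^{\sqrt{-1}\, \omega_\ell^T x_a}\, \overline{e^{\sqrt{-1}\, \omega_\ell^T x_b}}. \]
Then
\[ \sum_{a=1}^{N_i} \sum_{b=1}^{N_j} k\left(x_a^{(i)}, x_b^{(j)}\right) \approx \frac{C}{L} \sum_{a=1}^{N_i} \sum_{b=1}^{N_j} \sum_{\ell=1}^{L} w\left(x_a^{(i)}\right) w\left(x_b^{(j)}\right) e^{\sqrt{-1}\, \omega_\ell^T x_a^{(i)}}\, \overline{e^{\sqrt{-1}\, \omega_\ell^T x_b^{(j)}}} \]
\[ = \frac{C}{L} \sum_{\ell=1}^{L} \left[ \sum_{a=1}^{N_i} w\left(x_a^{(i)}\right) e^{\sqrt{-1}\, \omega_\ell^T x_a^{(i)}} \right] \left[ \sum_{b=1}^{N_j} w\left(x_b^{(j)}\right) \overline{e^{\sqrt{-1}\, \omega_\ell^T x_b^{(j)}}} \right]. \]
Each bracket is one entry of an $L$-dim. vector, computed once per PD at computational cost $O(LN)$.
2nd level Gram matrix: $O(mLN + m^2 L)$, c.f. $O(m^2 N^2)$. Big reduction if $L, m \ll N$.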
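The random Fourier approximation can be checked numerically on a small example. The sketch below assumes the PWG weight with hypothetical parameters $C = p = 1$, takes the constant in front of the integral as 1 (the value obtained when the sampling density is the normalised Gaussian with standard deviation $1/\sigma$ per coordinate), and compares the approximate inner product with the exact double sum. The diagrams, the seed, and the number of frequencies are arbitrary choices for illustration.

```python
import cmath
import math
import random

def weight(x, C=1.0, p=1.0):
    """PWG weight arctan(C * Pers(x)^p) of a generator x = (b, d)."""
    return math.atan(C * (x[1] - x[0]) ** p)

def exact_inner(D1, D2, sigma):
    """Double sum of the PWG kernel over all pairs of generators."""
    return sum(
        weight(x) * weight(y)
        * math.exp(-((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2) / (2 * sigma ** 2))
        for x in D1 for y in D2
    )

def feature_vector(D, omegas):
    """L-dim vector with entries sum_a w(x_a) exp(sqrt(-1) * omega_l^T x_a)."""
    return [
        sum(weight(x) * cmath.exp(1j * (om[0] * x[0] + om[1] * x[1])) for x in D)
        for om in omegas
    ]

def approx_inner(z1, z2):
    """Real part of (1/L) * sum_l z1_l * conj(z2_l)."""
    return sum((a * b.conjugate()).real for a, b in zip(z1, z2)) / len(z1)

rng = random.Random(0)
sigma, L = 1.0, 5000
# Frequencies drawn i.i.d. from a 2D Gaussian with standard deviation 1 / sigma.
omegas = [(rng.gauss(0.0, 1.0 / sigma), rng.gauss(0.0, 1.0 / sigma)) for _ in range(L)]

D1 = [(0.0, 1.0), (0.5, 2.0)]
D2 = [(0.2, 1.1), (1.0, 1.5)]

exact = exact_inner(D1, D2, sigma)
approx = approx_inner(feature_vector(D1, omegas), feature_vector(D2, omegas))
print(exact, approx)
```

The feature vector is computed once per diagram and reused for every pair, so the generators of two diagrams are never paired with each other directly.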
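Persistence scale-space kernel (Reininghaus et al 2015):
\[ k_R(x, y) = \frac{1}{8\pi t} \left[ \exp\left( -\frac{\|x - y\|^2}{8t} \right) - \exp\left( -\frac{\|x - \bar{y}\|^2}{8t} \right) \right], \quad \bar{y} = (d, b) \text{ for } y = (b, d). \]
$E_k(D)$ is considered.
$t$: bandwidth parameter.
$k_R = 0$ on $\Delta$.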
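[Figure: two synthetic point sets, "Data points 1" and "Data points 2", with labels $S_0$, $S_1$, noise, and "No $S_0$".]
with or without small circle $S_0$.
[Figure: $PD_1$ with $S_1$ and $S_0$ marked; label $Y = 1$.]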
When cooled rapidly from the liquid state, SiO2 changes into the glass state rather than a crystal.
Goal: identify the temperature of the phase transition.
Data: Molecular Dynamics simulation for SiO2. 3D arrangements of the atoms are used for computing PDs at 80 temperatures.
(Nakamura et al 2015; Hiraoka et al 2015)
Examples of PD’s
Liquid Glass (Amorphous)
Amorphous: “soft” structure
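$X_t$, $t = 1, \dots, T$.
Kernel Change Point Analysis with Fisher Discriminant score (Harchaoui et al 2009): for each $t$, two classes are defined by the data before and after $t$. Fisher score on RKHS is used:
\[ \left\| \left( \hat{V}_{1:t} + \hat{V}_{t+1:T} + \gamma I \right)^{-1/2} \left( \hat{m}_{1:t} - \hat{m}_{t+1:T} \right) \right\|_{H_k}^2, \]
where
\[ \hat{m}_{1:t} = \frac{1}{t} \sum_{i=1}^{t} \Phi(X_i) \quad \text{and} \quad \hat{m}_{t+1:T} = \frac{1}{T - t} \sum_{i=t+1}^{T} \Phi(X_i). \]
$t$
$\Delta t$.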
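The change point search can be sketched with explicit feature vectors. The code below is a simplified illustration, not the RKHS implementation used in the talk: it assumes each $\Phi(X_i)$ is a short, explicit vector, takes the two covariance terms as the empirical covariances of the two segments, adds them without weights as in the displayed score, and picks the $t$ with the largest score. All function names, the regularisation value, and the synthetic sequence are hypothetical.

```python
import random

def mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def covariance(rows):
    """Empirical covariance matrix (normalised by the number of rows)."""
    mu, n = mean(rows), len(rows)
    d = len(mu)
    return [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / n
             for b in range(d)] for a in range(d)]

def solve(A, v):
    """Solve A x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fisher_score(features, t, gamma=1e-3):
    """Score for splitting the sequence into features[:t] and features[t:]."""
    first, second = features[:t], features[t:]
    diff = [a - b for a, b in zip(mean(first), mean(second))]
    V1, V2 = covariance(first), covariance(second)
    d = len(diff)
    A = [[V1[a][b] + V2[a][b] + (gamma if a == b else 0.0) for b in range(d)]
         for a in range(d)]
    # ||A^{-1/2} diff||^2 equals diff^T A^{-1} diff for symmetric positive definite A.
    return sum(x * y for x, y in zip(diff, solve(A, diff)))

def detect_change_point(features, gamma=1e-3):
    """Number of samples in the first segment that maximises the Fisher score."""
    return max(range(1, len(features)), key=lambda t: fisher_score(features, t, gamma))

# Hypothetical sequence: 30 feature vectors near (0, 0), then 30 near (2, 2).
rng = random.Random(0)
features = [[rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)] for _ in range(30)]
features += [[rng.gauss(2.0, 0.1), rng.gauss(2.0, 0.1)] for _ in range(30)]

print(detect_change_point(features))
```

In the kernel setting the same score is evaluated through Gram matrices rather than explicit vectors; the finite-dimensional version only shows the structure of the search over $t$.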
Change point $t$
Estimation using derivatives of enthalpy curve, but not so accurate.
Persistence diagrams, and then change point detection by Kernel FDR.
Detected change point = 3100K
Enthalpy by physicist: [2000K, 3500K]
Sharp change between the two phases.
(Colored by the result of change point detection. Colors are not used for KPCA).
The result indicates that the phase can be identified by the snap-shot, while this is still controversial among physicists.
Liquid state glass state
represented by persistence homology
SVM is used.
a target of medicine. Binding an inhibitor changes the structure
M2 channel.
100 random choices for CV.
Cang, Mu, Wu, Opron, Xia, Wei, Molecular Based Mathematical Biology (2015) Fig. 3
for testing, and the rest used for training.
Relaxed (R) Taut (T) Cang, Mu, Wu, Opron, Xia, Wei, Molecular Based Mathematical Biology (2015) Fig. 4
made Molecular Topological Fingerprint (MTF) .
CV classification rates
PWGK    100                         88.90
MTF*    (nbd) 93.91 / (bd) 98.31    84.50
* Results of MTF are taken from Cang et al. Molecular Based Mathematical Biology (2015).

MTF
#    Dim   Description
1          2nd longest lifetime
2          3rd longest lifetime
3          Total sum of lifetime
4          Average lifetime
5    1     Birth point of the longest generator
6    1     Longest lifetime
7    1     Birth points of the shortest generator among lifetime ≥1.5Å
8    1
9    1     Number of generators in [4.5, 5.5]Å, divided by total #atoms.
10   1     Number of generators in [3.5, 4.5)Å and (5.5, 6.5]Å, divided by total #atoms.
11   1     Total sum of lifetimes
12   1     Average lifetime
13   2     The birth point of the first generator.
structures.
References
Kusano, G., Fukumizu, K., Hiraoka, Y. (2015) Persistence weighted Gaussian kernel for topological data analysis.
Carlsson, G. (2009) Topology and data. Bull. Amer. Math. Soc., 46(2):255–308. http://dx.doi.org/10.1090/S0273-0979-09-01249-X
Hiraoka, Y., Nakamura, T., Hirata, A., Escolar, E. G., Matsue, K., and Nishiura, Y. (2016) Description of medium-range order in amorphous structures by persistent homology. PNAS, 113(26), 7035–7040.
Nakamura, T., Hiraoka, Y., Hirata, A., Escolar, E. G., and Nishiura, Y. (2015) Persistent homology and many-body atomic structure for medium-range order in the glass. Nanotechnology, 26 (304001).
Reininghaus, J., Huber, S., Bauer, U., and Kwitt, R. (2015) A stable multi-scale kernel for topological machine
Kwitt, R., Huber, S., Niethammer, M., Lin, W., and Bauer, U. (2015) Statistical topological data analysis - a kernel
Fasy, B. T., Lecci, F., Rinaldo, A., Wasserman, L., Balakrishnan, S., and Singh, A. (2014) Confidence sets for persistence diagrams. The Annals of Statistics, 42(6):2301–2339.
Cang, Z., Mu, L., Wu, K., Opron, K., Xia, K., and Wei, G. W. (2015) A topological approach for protein