Kernel Methods for Topological Data Analysis Kenji Fukumizu The - PowerPoint PPT Presentation

Kernel Methods for Topological Data Analysis
Kenji Fukumizu, The Institute of Statistical Mathematics (Tokyo, Japan)
Joint work with Genki Kusano and Yasuaki Hiraoka (Tohoku Univ.), supported by JST CREST
STM2016 at ISM, July 22, 2016

Topological Data Analysis
• TDA: a new method for extracting topological or geometrical information from data. Key technology = persistence homology (Edelsbrunner et al 2002; Carlsson 2005)
Background
• Complex data: data with complex structure must be analyzed.
• Progress of computational topology: computing topological invariants has become easy.

TDA: Various applications
Data of highly complex geometric structure: often difficult to define good feature vectors / descriptors.
• Computer Vision: shape signature, natural image statistics (Freedman & Chen 2009)
• Biochemistry: structure change of proteins, e.g. open / closed (Kovacev-Nikolic et al 2015)
• Material Science: non-crystal materials, glass and liquid (Nakamura, Hiraoka, Hirata, Escolar, Nishiura, etc. Nanotechnology 26 (2015))
• Brain Science: brain artery trees, e.g. age effect (Bendich et al 2014)
Persistence homology provides a compact representation for such data.

Outline
• A brief introduction to persistence homology
• Statistical approach with kernels to topological data analysis
• Applications
  • Material science
  • Protein classification
• Summary

Topology
[Figure: two shapes that are topologically equivalent (≅).]

Topology: two sets are equivalent if one can be deformed into the other without tearing or attaching.
Topological invariants: quantities that take the same value on any equivalent sets.
[Figure: example shapes with their numbers of connected components, rings, and cavities; equivalent shapes (≅) have the same counts.]

Algebraic Topology
• Algebraic treatment of topological spaces
Simplicial complex (union of simplexes) → algebraic operations → compute various topological invariants, e.g. Euler number
Classify topological spaces with topological invariants.
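As a small illustration of "simplicial complex → algebraic operations → invariant", the sketch below computes the Euler number as the alternating sum of simplex counts. The helper names (`closure`, `euler_number`) and the three example complexes are chosen here for illustration and do not come from the slides.

```python
from itertools import combinations


def closure(maximal_simplices):
    """Return every face (as a frozenset of vertices) of the given simplices."""
    faces = set()
    for simplex in maximal_simplices:
        verts = tuple(simplex)
        for r in range(1, len(verts) + 1):
            faces.update(frozenset(c) for c in combinations(verts, r))
    return faces


def euler_number(maximal_simplices):
    """Alternating sum: #vertices - #edges + #triangles - ..."""
    return sum((-1) ** (len(face) - 1) for face in closure(maximal_simplices))


# Hollow triangle (a ring): three edges, no filled face.
circle = [(0, 1), (1, 2), (0, 2)]
# Filled triangle (a disk).
disk = [(0, 1, 2)]
# Boundary of a tetrahedron (a sphere enclosing one cavity).
sphere = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]

for name, cx in [("circle", circle), ("disk", disk), ("sphere", sphere)]:
    print(name, euler_number(cx))
```

The three complexes give different Euler numbers, so the invariant already separates a ring, a disk, and a sphere.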

• Homology group: independent “holes”
$H_k(X)$: $k$-th homology group of topological space $X$ ($k = 0, 1, 2, \ldots$), describing $k$-dimensional holes
• $H_0(X)$: connected components
• $H_1(X)$: rings
• $H_2(X)$: cavities
[Figure: table of example shapes with their $H_0(X)$, $H_1(X)$, $H_2(X)$ (entries such as $0$, $\mathbb{Z}$, $\mathbb{Z} \oplus \mathbb{Z}$), and the generators of the 1st homology group.]

Topology of statistical data?
True structure → noisy finite sample → $\varepsilon$-balls (e.g. manifold learning)
• Small $\varepsilon$ → disconnected object
• Large $\varepsilon$ → small ring is not visible
Stable extraction of topology is NOT easy!

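Persistence Homology
• All $\varepsilon$ considered
$X = \{x_i\}_{i=1}^{m} \subset \mathbb{R}^d$, $X_\varepsilon := \bigcup_{i=1}^{m} B_\varepsilon(x_i)$
[Figure: the union of balls $X_\varepsilon$ from small $\varepsilon$ to large $\varepsilon$.]
Two rings (generators of 1-dim homology) persist over a long interval.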
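To make "all $\varepsilon$ considered" concrete, here is a minimal sketch for the simplest case, 0-dimensional persistence (connected components) of the union of balls $X_\varepsilon$. It uses a union-find over sorted pairwise distances; this is an illustrative shortcut for $H_0$ only, not the general algorithm used by PH software, and the function name `h0_barcode` is an assumption made here.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """0-dim persistence of the union-of-balls filtration X_eps.

    Two closed balls of radius eps meet when their centres are at distance
    <= 2 * eps, so a component dies at eps = distance / 2 when it merges.
    """
    if not points:
        return []
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]) / 2.0, i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    bars = []
    for eps, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # two components merge: one of them dies at eps
            parent[ri] = rj
            bars.append((0.0, eps))
    bars.append((0.0, math.inf))  # the last component never dies
    return bars


print(h0_barcode([(0, 0), (1, 0), (10, 0)]))
```

Every point is born at $\varepsilon = 0$; the two nearby points merge early, while the distant point survives as a separate component over a long interval.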

• Persistence homology (formal definition)
Filtration of topological spaces: $\mathbb{X}: X_1 \subset X_2 \subset \cdots \subset X_L$
Irreducible decomposition:
$$PH_k(\mathbb{X}):\ H_k(X_1) \to H_k(X_2) \to \cdots \to H_k(X_L) \;\cong\; \bigoplus_{i=1}^{m_k} I[b_i, d_i]$$
$$I[b, d] \;\cong\; 0 \to \cdots \to 0 \to K \to \cdots \to K \to 0 \to \cdots \to 0$$
where the first $K$ sits at $X_b$, the last $K$ at $X_d$, and $K$ is a field.
The lifetime (birth, death) of each generator is rigorously defined, and can be computed numerically.
[Figure: birth and death of a generator of $PH_1(X)$.]

• Two popular (equivalent) expressions of PH
• Barcode: a bar from the birth to the death of each generator.
• Persistence diagram (PD): plots of the birth ($b$) and death ($d$) of each generator of PH in a 2D graph ($d \ge b$).
Handy descriptors or features of complex geometric objects.
Barcodes and PD are considered for each dimension.

Beyond topology
• PH contains geometrical information beyond topology.
[Figure: barcodes of 1-dim PH, plotted against $\varepsilon$.]

Statistical approach with kernels to topological data analysis

Statistical approach to TDA
• Conventional TDA
Data (e.g. molecular dynamics simulation) → Computation of PH (software: CGAL / PHAT) → Visualization (PD) → Analysis by experts
CGAL: The Computational Geometry Algorithms Library http://www.cgal.org/
PHAT: Persistent Homology Algorithm Toolbox https://bitbucket.org/phat-code/phat

• Statistical approach to TDA (Kusano, Fukumizu, Hiraoka ICML 2016; Reininghaus et al CVPR 2015; Kwitt et al NIPS 2015; Fasy et al 2014)
Many data sets → Computation of PH → Many PD's (PD 1, PD 2, PD 3, ..., PD n) as features / descriptors → Statistical analysis of PD's
But how?

Kernel representation of PD
• Vectorization of PD by positive definite kernel
• PD = discrete measure $\mu_D := \sum_{z \in PD} \delta_z$
• Kernel embedding of PD's into RKHS (vectorization):
$$\mathcal{E}_k : \mu_D \mapsto \int k(\cdot, x)\, d\mu_D(x) = \sum_i k(\cdot, x_i) \in H_k$$
$k$: positive definite kernel, $H_k$: corresponding RKHS
• For some kernels (e.g., Gaussian, Laplace), $\mathcal{E}_k$ is injective.
• By vectorization,
  • a number of methods for data analysis can be applied: SVM, regression, PCA, CCA, etc.
  • tractable computation is possible with the kernel trick.
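A minimal sketch of the embedding, assuming a plain Gaussian kernel on the (birth, death) plane: $\mathcal{E}_k(\mu_D)$ is the function $z \mapsto \sum_i k(z, x_i)$. The names `gaussian_kernel` and `embed` and the example diagram are illustrative choices, not taken from the slides.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel on R^2 with bandwidth sigma."""
    sq = (x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2
    return math.exp(-sq / (2.0 * sigma ** 2))


def embed(diagram, kernel=gaussian_kernel):
    """Return E_k(mu_D) as a function z -> sum_i k(z, x_i)."""
    points = list(diagram)
    return lambda z: sum(kernel(z, x) for x in points)


# A toy persistence diagram: a list of (birth, death) pairs.
D = [(0.0, 1.0), (0.5, 2.0)]
f = embed(D)
print(f((0.0, 1.0)))  # value of the embedded function at one generator
```

Because the embedding is a sum over generators, repeating a generator doubles its contribution, which reflects the multiset (measure) view of a PD.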

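Persistence Weighted Gaussian (PWG) Kernel
Generators close to the diagonal may be noise, and should be discounted.
$$k_{PWG}(x, y) = w(x)\, w(y) \exp\left(-\frac{\|y - x\|^2}{2\sigma^2}\right)$$
$$w(x) = w_{C,p}(x) := \arctan\left(C\, \mathrm{Pers}(x)^p\right) \quad (C, p > 0)$$
$$\mathrm{Pers}(x) := d - b \quad \text{for } x \in \{(b, d) \in \mathbb{R}^2 \mid d \ge b\}$$
[Figure: the persistence $\mathrm{Pers}(x_1)$ of a generator $x_1$, measured from the diagonal.]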
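The PWG kernel translates directly into code. The sketch below follows the formulas on this slide; the default values $C = p = \sigma = 1$ are placeholders chosen here, not recommended settings.

```python
import math


def pers(x):
    """Persistence (lifetime) d - b of a generator x = (b, d)."""
    b, d = x
    return d - b


def weight(x, C=1.0, p=1.0):
    """w_{C,p}(x) = arctan(C * Pers(x)^p); vanishes on the diagonal."""
    return math.atan(C * pers(x) ** p)


def k_pwg(x, y, sigma=1.0, C=1.0, p=1.0):
    """Persistence weighted Gaussian kernel between two generators."""
    sq = (x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2
    return weight(x, C, p) * weight(y, C, p) * math.exp(-sq / (2.0 * sigma ** 2))


print(k_pwg((0.0, 1.0), (0.2, 1.1)))  # two persistent generators
print(k_pwg((1.0, 1.0), (0.0, 2.0)))  # first generator lies on the diagonal
```

A generator on the diagonal has weight $\arctan(0) = 0$, so it contributes nothing regardless of the bandwidth $\sigma$.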

• Stability with PWG kernel embedding
• PWGK defines a distance on the persistence diagrams:
$$d_k(D_1, D_2) := \|\mathcal{E}_k(D_1) - \mathcal{E}_k(D_2)\|_{H_k}, \quad D_1, D_2: \text{persistence diagrams}$$
Stability Theorem (Kusano, Hiraoka, Fukumizu 2015)
$M$: compact subset in $\mathbb{R}^d$. $S \subset M$, $T \subset \mathbb{R}^d$: finite sets.
If $p > d + 1$, then with PWG kernel $(p, C, \sigma)$,
$$d_k(D_q(S), D_q(T)) \le L\, d_H(S, T).$$
$L$: constant depending only on $M, p, d, C, \sigma$
$D_q(S)$: $q$-th persistence diagram of $S$
$d_H$: Hausdorff distance
Lipschitz continuity: a small change of a set causes only a small change in PD.
This stability is NOT known for the Gaussian kernel.

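2nd-level kernel
Data sets → PD's ($PD_1, PD_2, PD_3, \ldots, PD_m$) → Embedding $\mathcal{E}_k$ → Vectors in RKHS ($\mathcal{E}_k(PD_1), \mathcal{E}_k(PD_2), \ldots, \mathcal{E}_k(PD_m)$) → Application of pos. def. kernel on RKHS → Data analysis method
2nd-level kernel (SVM for measures, Muandet, Fukumizu, Dinuzzo, Schölkopf 2012)
• RKHS-Gaussian kernel
$$K(\varphi_1, \varphi_2) = \exp\left(-\frac{\|\varphi_1 - \varphi_2\|_{H_k}^2}{2\tau^2}\right)$$
derives
$$K(D_i, D_j) = \exp\left(-\frac{\|\mathcal{E}_k(D_i) - \mathcal{E}_k(D_j)\|_{H_k}^2}{2\tau^2}\right), \quad D_i, D_j: \text{persistence diagrams}$$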

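Computational issue
The number of generators in a PD may be large ($\ge 10^3, 10^4$).
For $PD_i = \sum_{a=1}^{N_i} \delta_{x_a^{(i)}} \cup \Delta$,
$$K(PD_i, PD_j) = \exp\left(-\frac{\|\mathcal{E}_k(PD_i) - \mathcal{E}_k(PD_j)\|_{H_k}^2}{2\tau^2}\right)$$
requires the computation of
$$\|\mathcal{E}_k(PD_i) - \mathcal{E}_k(PD_j)\|_{H_k}^2 = \sum_{a=1}^{N_i} \sum_{b=1}^{N_i} k\big(x_a^{(i)}, x_b^{(i)}\big) + \sum_{a=1}^{N_j} \sum_{b=1}^{N_j} k\big(x_a^{(j)}, x_b^{(j)}\big) - 2 \sum_{a=1}^{N_i} \sum_{b=1}^{N_j} k\big(x_a^{(i)}, x_b^{(j)}\big).$$
The number of evaluations of $\exp\left(-\frac{\|x_a - x_b\|^2}{2\sigma^2}\right)$ is $O(m^2 N^2)$ for $m$ diagrams, where $N = \max\{N_i \mid i = 1, \ldots, m\}$ → computationally expensive for $N \approx 10^4$.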
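The exact computation above can be sketched as follows, using the PWG kernel as $k$. The function names (`inner`, `sq_dist`, `gram`) are illustrative; the double loop in `inner` is exactly the $O(N^2)$ cost per pair of diagrams that makes the full Gram matrix expensive.

```python
import math


def k_pwg(x, y, sigma=1.0, C=1.0, p=1.0):
    """PWG kernel between generators x = (b, d) and y = (b', d')."""
    wx = math.atan(C * (x[1] - x[0]) ** p)
    wy = math.atan(C * (y[1] - y[0]) ** p)
    sq = (x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2
    return wx * wy * math.exp(-sq / (2.0 * sigma ** 2))


def inner(D1, D2, **kw):
    """<E_k(D1), E_k(D2)> = sum_a sum_b k(x_a, y_b): N_1 * N_2 evaluations."""
    return sum(k_pwg(x, y, **kw) for x in D1 for y in D2)


def sq_dist(D1, D2, **kw):
    """Squared RKHS distance ||E_k(D1) - E_k(D2)||^2."""
    return inner(D1, D1, **kw) + inner(D2, D2, **kw) - 2.0 * inner(D1, D2, **kw)


def gram(diagrams, tau=1.0, **kw):
    """2nd-level Gram matrix K(D_i, D_j) = exp(-||.||^2 / (2 tau^2))."""
    m = len(diagrams)
    self_inner = [inner(D, D, **kw) for D in diagrams]
    G = [[1.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            cross = inner(diagrams[i], diagrams[j], **kw)
            sq = max(self_inner[i] + self_inner[j] - 2.0 * cross, 0.0)
            G[i][j] = G[j][i] = math.exp(-sq / (2.0 * tau ** 2))
    return G


D1 = [(0.0, 1.0), (0.5, 2.0)]
D2 = [(0.2, 1.1)]
print(gram([D1, D2], tau=1.0))
```

The self inner products are cached so each pair of diagrams needs only one cross term, but the overall cost still grows as $m^2 N^2$.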

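• Approximation by random features (Rahimi & Recht 2008)
By Bochner's theorem (Fourier transform), with a constant $C$,
$$\exp\left(-\frac{\|x_a - x_b\|^2}{2\sigma^2}\right) = C \int e^{\sqrt{-1}\, \omega^T (x_a - x_b)}\, \frac{1}{2\pi} e^{-\frac{\sigma^2 \|\omega\|^2}{2}}\, d\omega,$$
where the Gaussian factor defines, up to normalization, a Gaussian distribution $=: Q_\sigma$.
Approximation by sampling: $\omega_1, \ldots, \omega_L$ i.i.d. $\sim Q_\sigma$,
$$\exp\left(-\frac{\|x_a - x_b\|^2}{2\sigma^2}\right) \approx C\, \frac{1}{L} \sum_{\ell=1}^{L} e^{\sqrt{-1}\, \omega_\ell^T x_a}\, \overline{e^{\sqrt{-1}\, \omega_\ell^T x_b}}.$$
Hence
$$\sum_{a=1}^{N_i} \sum_{b=1}^{N_j} k\big(x_a^{(i)}, x_b^{(j)}\big) \approx \frac{C}{L} \sum_{a=1}^{N_i} \sum_{b=1}^{N_j} \sum_{\ell=1}^{L} w\big(x_a^{(i)}\big)\, w\big(x_b^{(j)}\big)\, e^{\sqrt{-1}\, \omega_\ell^T x_a^{(i)}}\, \overline{e^{\sqrt{-1}\, \omega_\ell^T x_b^{(j)}}}$$
$$= \frac{C}{L} \sum_{\ell=1}^{L} \left[ \sum_{a=1}^{N_i} w\big(x_a^{(i)}\big)\, e^{\sqrt{-1}\, \omega_\ell^T x_a^{(i)}} \right] \overline{\left[ \sum_{b=1}^{N_j} w\big(x_b^{(j)}\big)\, e^{\sqrt{-1}\, \omega_\ell^T x_b^{(j)}} \right]}.$$
Each bracket is one coordinate of an $L$-dim. vector, computed once per diagram at computational cost $O(LN)$.
→ 2nd-level Gram matrix: $O(mLN + m^2 L)$, c.f. $O(m^2 N^2)$.
Big reduction if $L, m \ll N$.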
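A minimal sketch of the random feature approximation, under one explicit assumption: $Q_\sigma$ is taken to be the normalized Gaussian $N(0, \sigma^{-2} I_2)$, in which case the constant $C$ equals 1. The function names, the fixed seed, and the choice $L = 4000$ are illustrative.

```python
import cmath
import math
import random


def weight(x, C=1.0, p=1.0):
    """PWG weight arctan(C * Pers(x)^p) for x = (b, d)."""
    return math.atan(C * (x[1] - x[0]) ** p)


def sample_frequencies(L, sigma=1.0, seed=0):
    """Draw omega_1..omega_L i.i.d. from N(0, sigma^-2 I_2) (assumed Q_sigma)."""
    rng = random.Random(seed)
    s = 1.0 / sigma
    return [(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(L)]


def feature_vector(D, omegas, C=1.0, p=1.0):
    """L-dim complex vector: sum_a w(x_a) exp(i omega_l^T x_a). Cost O(L N)."""
    return [
        sum(weight(x, C, p) * cmath.exp(1j * (w0 * x[0] + w1 * x[1])) for x in D)
        for (w0, w1) in omegas
    ]


def approx_inner(phi_i, phi_j):
    """Approximate <E_k(D_i), E_k(D_j)> from two feature vectors. Cost O(L)."""
    return sum((u * v.conjugate()).real for u, v in zip(phi_i, phi_j)) / len(phi_i)


def exact_inner(D1, D2, sigma=1.0, C=1.0, p=1.0):
    """Exact double sum of PWG kernel values, for comparison. Cost O(N^2)."""
    return sum(
        weight(x, C, p) * weight(y, C, p)
        * math.exp(-((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2) / (2.0 * sigma ** 2))
        for x in D1 for y in D2
    )


D1 = [(0.0, 1.0), (0.5, 2.0)]
D2 = [(0.2, 1.1), (1.0, 3.0)]
omegas = sample_frequencies(4000, sigma=1.0)
phi1, phi2 = feature_vector(D1, omegas), feature_vector(D2, omegas)
print(approx_inner(phi1, phi2), exact_inner(D1, D2))
```

The same frequencies must be shared by all diagrams. Each feature vector is computed once per diagram, after which every Gram entry costs only $O(L)$, which is where the $O(mLN + m^2 L)$ total comes from.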

Comparison: Persistence Scale Space Kernel (Reininghaus et al 2015)
• PSS Kernel
$$k_R(x, y) = \frac{1}{8\pi t} \left[ \exp\left(-\frac{\|x - y\|^2}{8t}\right) - \exp\left(-\frac{\|x - \bar{y}\|^2}{8t}\right) \right]$$
$\bar{y} = (d, b)$ for $y = (b, d)$.
Pos. def. on $\{(b, d) \mid d \ge b\}$; $0$ on $\Delta$. $\mathcal{E}_k(D)$ is considered.
• Comparison between PWGK and PSSK
• PWGK can control the discount around the diagonal independently of the bandwidth parameter.
• PSSK is not shift-invariant → random feature approximation is not applicable.
• In Reininghaus et al 2015, the 2nd-level kernel is not considered.
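For comparison, the PSS kernel can be sketched in the same style. The per-generator formula follows the slide; summing it over all pairs of generators to compare two diagrams corresponds to the embedding $\mathcal{E}_k(D)$, and the names `k_pss_point` and `k_pss` are illustrative.

```python
import math


def k_pss_point(x, y, t=1.0):
    """PSS kernel between generators x = (b, d) and y = (b', d')."""
    y_bar = (y[1], y[0])  # mirror image of y across the diagonal
    d2 = (x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2
    d2_bar = (x[0] - y_bar[0]) ** 2 + (x[1] - y_bar[1]) ** 2
    return (math.exp(-d2 / (8.0 * t)) - math.exp(-d2_bar / (8.0 * t))) / (8.0 * math.pi * t)


def k_pss(D1, D2, t=1.0):
    """Sum of the PSS kernel over all pairs of generators of two diagrams."""
    return sum(k_pss_point(x, y, t) for x in D1 for y in D2)


print(k_pss_point((0.0, 1.0), (0.2, 1.1)))  # two generators off the diagonal
print(k_pss_point((1.0, 1.0), (0.0, 2.0)))  # first generator on the diagonal
```

Here the single parameter $t$ sets both the smoothing scale and how strongly generators near the diagonal are discounted, whereas the PWG kernel separates these roles between $\sigma$ and $(C, p)$.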
