Persistent Homology in Data Science (Salzburg University of Applied Sciences, presentation slides)



slide-1
SLIDE 1

Persistent Homology in Data Science

Stefan Huber <stefan.huber@fh-salzburg.ac.at>

Salzburg University of Applied Sciences, Austria

iDSC 2020 [1], 127.0.0.1, May 13, 2020

[1] Not at Dornbirn, Austria, due to COVID-19.

Partially supported by Digitales Transferzentrum, Salzburg.

Stefan Huber: Persistent Homology in Data Science 1 of 15

slide-2
SLIDE 2

Data has shape

Topological Data Analysis: Data often displays some shape that carries valuable information.

◮ Persistent homology gives us the notion of components, holes, tunnels, cavities, and so on, and quantifies their “significance”.

Fourier analysis : signal = persistent homology : shape

slide-3
SLIDE 3

An intuitive approach: Mountains and volcanoes

Let f : [0, 1]^2 → [0, 1] be in C^0, say, a height profile of a geographic map. What mathematical notion is natural to capture “mountains” or “volcanoes”?

◮ Mountains are local maxima of f. Data has noise: how do we filter to get the “real mountains”?
◮ What about significance, which is not the same as height? What about volcanoes?

slide-4
SLIDE 4

Topological evolution

In our simple setting, the method of persistent homology is known as the watershed transformation:

◮ The super-level set U_c is the landmass above sea level c:

    U_c = f^{-1}([c, 1]) = {x ∈ [0, 1]^2 : f(x) ≥ c}

◮ U_c grows as c declines, starting at c = 1.

Persistent homology keeps track of the topological evolution of U_c.
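The sinking sea level can be sketched on a discretized height map. The following is a minimal illustration, not the method from the talk: it assumes f is sampled on a small grid, uses 4-connectivity, and counts the connected components (islands) of the super-level set for a few levels c. The function name `superlevel_components` and the toy profile `HEIGHT` are hypothetical.

```python
from collections import deque

def superlevel_components(height, c):
    """Count 4-connected components of the grid cells with height >= c."""
    rows, cols = len(height), len(height[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for s in range(cols):
            if height[r][s] < c or seen[r][s]:
                continue
            # Found an unvisited land cell: flood-fill its island.
            count += 1
            seen[r][s] = True
            queue = deque([(r, s)])
            while queue:
                i, j = queue.popleft()
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    a, b = i + di, j + dj
                    if (0 <= a < rows and 0 <= b < cols
                            and not seen[a][b] and height[a][b] >= c):
                        seen[a][b] = True
                        queue.append((a, b))
    return count

# Hypothetical height profile: two "mountains" (0.9 and 0.7) joined by a saddle at 0.4.
HEIGHT = [
    [0.1, 0.1, 0.1, 0.1, 0.1],
    [0.1, 0.9, 0.4, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.1],
]

for c in (1.0, 0.8, 0.6, 0.3, 0.0):
    print(c, superlevel_components(HEIGHT, c))
```

For this toy profile the number of islands goes 0, 1, 2, 1, 1 as c declines: the lower mountain appears as its own island at level 0.7 and merges with the higher one once the saddle at 0.4 emerges.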


slide-8
SLIDE 8

General setting

An n-simplex is the convex hull of n + 1 affinely independent points. We have a simplicial complex S as the underlying space.

◮ A filtration (S_i) is a sequence of simplicial complexes

    ∅ = S_0 ⊂ · · · ⊂ S_m = S

  Think of (S_i) as iteratively adding simplices.
◮ At each step a feature is born or dies.
◮ The lifespan of a feature (component, hole, ...) is its significance.

Here, features are independent classes in the persistent homology group.
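To make "a feature is born or dies at each step" concrete, here is a sketch of the standard boundary-matrix column reduction over Z/2, a textbook way to compute the birth/death pairing. It is offered as an illustration under assumptions, not as the speaker's implementation: simplices are given as vertex tuples, one simplex is added per step, and every face precedes its simplex. The names `persistence_pairs` and `TRIANGLE` are hypothetical.

```python
from itertools import combinations

def persistence_pairs(filtration):
    """Pair births and deaths by column reduction over Z/2.

    `filtration` lists simplices (tuples of vertex ids) in insertion order;
    every face must appear before the simplex it belongs to.
    Returns (pairs, essential): pairs are (birth_step, death_step, dimension),
    essential are (birth_step, dimension) for features that never die.
    """
    index = {tuple(sorted(s)): i for i, s in enumerate(filtration)}
    columns = []       # reduced boundary columns, each a set of row indices
    pivot_owner = {}   # lowest row index -> column that owns it
    pairs, births = [], set()
    for j, simplex in enumerate(filtration):
        simplex = tuple(sorted(simplex))
        dim = len(simplex) - 1
        # Boundary of a p-simplex: its (p-1)-dimensional faces (none for a vertex).
        col = {index[f] for f in combinations(simplex, dim)} if dim > 0 else set()
        # Add earlier columns (mod 2) until the lowest entry is unique or the column is zero.
        while col and max(col) in pivot_owner:
            col = col ^ columns[pivot_owner[max(col)]]
        columns.append(col)
        if col:
            i = max(col)
            pivot_owner[i] = j
            births.discard(i)
            pairs.append((i, j, dim - 1))   # simplex j kills the feature born at step i
        else:
            births.add(j)                   # simplex j gives birth to a new feature
    essential = sorted((i, len(filtration[i]) - 1) for i in births)
    return pairs, essential

# Hypothetical filtration: three vertices, three edges, then the filled triangle.
TRIANGLE = [(0,), (1,), (2,), (0, 1), (1, 2), (0, 2), (0, 1, 2)]
print(persistence_pairs(TRIANGLE))
```

In this example the first two edges each kill a component, the third edge gives birth to a hole, and the triangle kills that hole; one component survives to the end.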



slide-19
SLIDE 19

Persistence diagram

[Figure: persistence diagram with axes “birth” and “death”; a feature is drawn as the point (t_i, t_j), and its persistence is marked.]

We associate a timestamp t_i ∈ R with the i-th step of the filtration (S_i), where t_0 ≤ t_1 ≤ · · · ≤ t_m.

◮ The number µ_p^{i,j}, derived from the persistent Betti numbers, counts how many p-dimensional features were born at time t_i and died at time t_j.

The p-th persistence diagram is a summary description:

◮ We place a point (t_i, t_j) with multiplicity µ_p^{i,j}.
◮ Persistence is t_j − t_i.
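A persistence diagram is a multiset of points, which a counter represents directly. The sketch below assumes the birth/death pairing is already available as index triples (i, j, dimension) together with the timestamps t_0 ≤ ... ≤ t_m; the function name and the sample data are hypothetical.

```python
from collections import Counter

def persistence_diagram(pairs, timestamps, p):
    """Multiset of points (t_i, t_j) for the p-dimensional features in `pairs`."""
    diagram = Counter()
    for i, j, dim in pairs:
        if dim == p:
            diagram[(timestamps[i], timestamps[j])] += 1
    return diagram

# Hypothetical input: index pairs (i, j, dimension) and non-decreasing timestamps.
PAIRS = [(1, 3, 0), (2, 4, 0), (5, 6, 1)]
TIMESTAMPS = [0.0, 0.0, 0.0, 1.0, 1.0, 1.5, 2.0]

diagram0 = persistence_diagram(PAIRS, TIMESTAMPS, p=0)
for (birth, death), multiplicity in diagram0.items():
    print(birth, death, multiplicity, death - birth)  # last column: persistence
```

Two different index pairs can share the same timestamps, which is why the point (0.0, 1.0) carries multiplicity 2 here.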


slide-21
SLIDE 21

Application: Peak detection for signal analysis

The function P stems from a system identification for a closed-loop controller in motion control.

◮ Task: Detect the peak at a non-zero frequency, which is the natural frequency of the system.
◮ Use the 0-th persistence diagram of the super-level set filtration of P.
◮ It can be computed in a few dozen lines of C code, as fast as sorting numbers.

[Figure: amplitude of P over frequency, and the 0-th persistence diagram of P with axes “birth level” and “death level”.]
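The "as fast as sorting" remark can be illustrated with a short sketch. This is not the C code mentioned on the slide; it is a Python illustration of the usual approach, assuming the signal is a plain list of samples: sort the samples by decreasing value, sweep the level downwards, and maintain components with a union-find structure. When two components meet, the one with the lower peak dies (the elder rule). The names `peak_persistence` and `SIGNAL` are hypothetical.

```python
def peak_persistence(signal):
    """0-th persistence of the super-level set filtration of a 1D signal.

    Returns (peak_index, birth_level, death_level) triples sorted by
    decreasing persistence. The global maximum never dies (death = -inf).
    """
    n = len(signal)
    parent = [None] * n   # None: sample is still below the sinking level

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    # Sorting dominates the running time; the sweep itself is nearly linear.
    for i in sorted(range(n), key=lambda k: -signal[k]):
        roots = {find(nb) for nb in (i - 1, i + 1)
                 if 0 <= nb < n and parent[nb] is not None}
        if not roots:
            parent[i] = i          # a new peak is born at level signal[i]
            continue
        # Each root is the highest sample of its component; keep the highest one.
        keep = max(roots, key=lambda r: signal[r])
        parent[i] = keep
        for r in roots - {keep}:
            pairs.append((r, signal[r], signal[i]))   # lower peak dies here
            parent[r] = keep
    if n:
        top = find(0)
        pairs.append((top, signal[top], float("-inf")))
    return sorted(pairs, key=lambda rec: rec[2] - rec[1])

# Hypothetical amplitude samples with peaks at indices 1, 3 and 5.
SIGNAL = [0.0, 1.0, 0.2, 1.5, 0.5, 0.8, 0.1]
for peak, birth, death in peak_persistence(SIGNAL):
    print(peak, birth, death)
```

Thresholding the persistence (birth level minus death level) then separates significant peaks from noise without smoothing the signal first.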

slide-22
SLIDE 22

Application: Image analysis

The 20 most persistent 0-dimensional features are used to detect animal paws.

slide-23
SLIDE 23

Application: Image analysis

Segmentation of cell boundaries.

◮ The 1-dimensional features (cycles) are chosen by thresholding in the 1st persistence diagram.
◮ This is like finding volcanoes in geographic height maps.

slide-24
SLIDE 24

Application: Shape analysis of points

◮ Place a ball B_t of radius t around each point and consider the union P_t.
◮ The connected components of P_t form clusters.
◮ The sequence (P_t) forms a filtration.
◮ The 0-th persistence diagram encodes the evolution and significance of clusters.
◮ Higher-dimensional persistence diagrams give us additional information about holes.
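For the 0-th diagram of this filtration, a small sketch suffices. It relies on one observation: two balls of radius t intersect exactly when their centers are at distance at most 2t, so two clusters merge at half the distance of the closest pair between them. Every cluster is born at t = 0, and processing all pairs by increasing distance with union-find yields the death radii. The function name and the sample points are hypothetical, and the quadratic number of pairs makes this suitable for small inputs only.

```python
from itertools import combinations
from math import dist

def cluster_deaths(points):
    """Radii t at which two connected components of the union of balls merge."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances, processed from the closest pair upwards.
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    deaths = []
    for d, i, j in edges:
        a, b = find(i), find(j)
        if a != b:
            parent[a] = b
            deaths.append(d / 2)   # balls of radius d/2 around i and j touch
    return deaths

# Hypothetical point set: two tight pairs that are far apart from each other.
POINTS = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
print(cluster_deaths(POINTS))
```

The 0-th persistence diagram then consists of the points (0, t) for each returned radius t, plus one component that never dies. The large gap between the small radii and the last one indicates two significant clusters.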


slide-28
SLIDE 28

Application: Shape analysis of polygons

Geometric shapes are often modeled as polygons, possibly with holes.

◮ A filtration is obtained by a (reversed) offset process, e.g., Minkowski offsets or mitered offsets.
◮ [Hub18] gave efficient algorithms to compute persistent homology based on Voronoi diagrams and straight skeletons by proving homotopy equivalence.
◮ Applications: Polygon decomposition, e.g., for high-speed NC-machining.


slide-30
SLIDE 30

Application: Topological machine learning

Persistence diagrams are a summary description of topological features.

◮ How can we use this topological information for machine learning?

[Figure: tasks (texture recognition, object recognition, shape retrieval) are processed by topological data analysis; a kernel k : D × D → R connects the resulting persistence diagrams to machine learning methods (SVM, PCA, k-Means).]

slide-31
SLIDE 31

Application: Topological machine learning

Idea of [Rei+15]: Given a persistence diagram F, solve a heat-diffusion PDE on Ω = {(x, y) ∈ R^2 : y ≥ x}.

◮ The solution at time t is denoted by u_t : Ω → R.
◮ Initial condition: u_0 = Σ_{p ∈ F} δ_p with the Dirac delta δ_p.
◮ Boundary condition: u_t = 0 on ∂Ω, as points on the diagonal shall have no influence.

[Figure: Φ maps a persistence diagram F to Φ(F) = u_t ∈ L^2(Ω).]

We directly constructed a feature map Φ : D → L^2(Ω) on the set D of persistence diagrams.

◮ The kernel is given by k(F, G) = ⟨Φ(F), Φ(G)⟩.
◮ Important: The resulting kernel is stable, i.e., Lipschitz-continuous.
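The inner product does not require solving the PDE numerically. The sketch below uses the closed form reported in [Rei+15], which is not stated on the slide and is reproduced here as an assumption: with scale σ in the role of the time t, and q̄ denoting q mirrored at the diagonal, k_σ(F, G) = 1/(8πσ) · Σ_{p ∈ F, q ∈ G} [exp(−‖p − q‖^2 / (8σ)) − exp(−‖p − q̄‖^2 / (8σ))]. The function name and the sample diagrams are hypothetical.

```python
from math import exp, pi

def scale_space_kernel(F, G, sigma):
    """Closed-form kernel between two persistence diagrams.

    F and G are lists of (birth, death) points; sigma > 0 is the scale,
    i.e. the time at which the heat-diffusion solution is evaluated.
    """
    total = 0.0
    for px, py in F:
        for qx, qy in G:
            d_direct = (px - qx) ** 2 + (py - qy) ** 2
            # Distance to q mirrored at the diagonal, i.e. to (qy, qx).
            d_mirror = (px - qy) ** 2 + (py - qx) ** 2
            total += exp(-d_direct / (8 * sigma)) - exp(-d_mirror / (8 * sigma))
    return total / (8 * pi * sigma)

# Hypothetical diagrams given as (birth, death) points.
F = [(0.0, 1.0), (0.2, 0.3)]
G = [(0.1, 0.9)]
print(scale_space_kernel(F, G, sigma=0.5))
```

The mirrored term is what enforces the boundary condition: a point on the diagonal coincides with its mirror image, so both exponentials cancel and the point has no influence on the kernel value.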

slide-32
SLIDE 32

Conclusion

Persistent homology turns out to be useful:

◮ Clustering, image analysis, shape recognition, image segmentation, time series analysis, analysis of biological structures (drug molecules, roots, ...), material analysis, ...

It contributes to data science in two ways:

1. Persistence diagrams make various methods of data science applicable.
2. It is a tool within data science that helps to understand methods.

◮ E.g., explainable AI based on persistence of the inter-layer mapping in feed-forward nets [CG18].

slide-33
SLIDE 33

Interreg Österreich-Bayern project

KI-Net – Bausteine für KI-basierte Optimierungen in der industriellen Fertigung (building blocks for AI-based optimization in industrial manufacturing):

◮ Lead: SCCH Hagenberg (OÖ)
◮ FH Salzburg
◮ TH Rosenheim
◮ Universität Innsbruck
◮ Hochschule Kempten

slide-34
SLIDE 34

Bibliography I

[CG18] Gunnar E. Carlsson and Rickard Brüel Gabrielsson. “Topological Approaches to Deep Learning.” In: CoRR abs/1811.01122 (2018). arXiv: 1811.01122. url: http://arxiv.org/abs/1811.01122.

[Hub18] Stefan Huber. “The Topology of Skeletons and Offsets.” In: Proc. 34th Europ. Workshop on Comp. Geom. (EuroCG ’18). Mar. 2018.

[Rei+15] J. Reininghaus et al. “A Stable Multi-Scale Kernel for Topological Machine Learning.” In: Proc. 2015 IEEE Conf. Comp. Vision & Pat. Rec. (CVPR ’15). Boston, MA, USA, June 2015, pp. 4741–4748.