SLIDE 1

A primer in persistent homology

Bastian Rieck

SLIDE 2

Motivation

What is the ‘shape’ of data?

Bastian Rieck A primer in persistent homology 1

SLIDE 3

A simple example

What is the shape of this set of points?

Technically, a set of points does not have a ‘shape’. Still, we perceive the points to be arranged in a circle. How can we quantify this?

SLIDE 4

A simple example

What is the shape of this set of points?

We can ‘squint’ our eyes and look at how the connectivity of the points changes. The more we squint, the more connections we see.

SLIDE 8

What did we see?

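The points appear to be arranged in a circle as long as the radius of the disks we use to cover them lies within a certain range: large enough for neighbouring disks to overlap, but below a critical threshold at which the disks fill the interior of the circle. How can we formulate this more precisely?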

SLIDE 9

Algebraic topology

The branch of mathematics that is concerned with finding invariant properties of high-dimensional objects.

Simple invariants

1. Dimension: ℝ² ≇ ℝ³ (they are not homeomorphic) because 2 ≠ 3.
2. Determinant: If matrices A and B are similar, their determinants are equal.

SLIDE 10

Betti numbers

A topological invariant

Informally, they count the number of holes in different dimensions that occur in an object:

β0: connected components
β1: tunnels
β2: voids
…

Space     β0  β1  β2
Point      1   0   0
Circle     1   1   0
Sphere     1   0   1
Torus      1   2   1

SLIDE 11

Calculating Betti numbers

The kth Betti number βk is the rank of the kth homology group Hk(X) of the topological space X. To define this formally, we require a notion of ‘holes’ in simplicial complexes. This, in turn, requires the concepts of boundaries and cycles.

Technically, I should write simplicial homology group every time. I am not going to do this. Instead, let us first talk about simplicial complexes.

SLIDE 12

Simplicial complexes

A set V of vertices together with a collection K of non-empty subsets of V is called an abstract simplicial complex if:

1. {v} ∈ K for all v ∈ V.
2. If σ ∈ K and τ ⊆ σ with τ ≠ ∅, then τ ∈ K.

The elements of a simplicial complex are called simplices. A k-simplex consists of k + 1 vertices; for example, a triangle {a, b, c} is a 2-simplex.
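Both axioms can be checked mechanically. The following minimal Python sketch represents simplices as frozensets of vertex labels; the function names `faces`, `is_simplicial_complex`, and `dimension` are hypothetical choices made for illustration only.

```python
from itertools import combinations


def faces(simplex):
    """All non-empty subsets of a simplex, including the simplex itself."""
    s = sorted(simplex)
    return {frozenset(c) for k in range(1, len(s) + 1) for c in combinations(s, k)}


def is_simplicial_complex(vertices, K):
    """Check the two axioms for a vertex set V and a collection K of subsets of V."""
    V = set(vertices)
    K = {frozenset(s) for s in K}
    # Every simplex must be a non-empty subset of V.
    if any(not s or not s <= V for s in K):
        return False
    # Axiom 1: every vertex forms a 0-simplex.
    if any(frozenset([v]) not in K for v in V):
        return False
    # Axiom 2: K is closed under taking non-empty faces.
    return all(f in K for s in K for f in faces(s))


def dimension(simplex):
    """A k-simplex consists of k + 1 vertices."""
    return len(simplex) - 1


# A hollow triangle is valid; a filled triangle without its edges is not.
hollow = [{"a"}, {"b"}, {"c"}, {"a", "b"}, {"b", "c"}, {"a", "c"}]
print(is_simplicial_complex("abc", hollow))  # True
print(is_simplicial_complex("abc", [{"a"}, {"b"}, {"c"}, {"a", "b", "c"}]))  # False
```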

SLIDE 13

Simplicial complexes

[Figure: examples of a valid and an invalid simplicial complex]

SLIDE 14

Chain groups

Given a simplicial complex K, the pth chain group Cp of K contains all linear combinations of p-simplices in the complex. Coefficients are in Z2, hence all elements of Cp are of the form

$$\sum_j \sigma_j, \quad \sigma_j \in K,$$

where each σj is a p-simplex. The group operation is addition with Z2 coefficients, so a simplex that occurs twice cancels out (σ + σ = 0). We need chain groups to algebraically express the concept of a boundary.
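Because coefficients live in Z2, a p-chain can be stored simply as the set of p-simplices whose coefficient is 1, and chain addition becomes the symmetric difference of sets. A minimal Python sketch; the helper names `simplex` and `add_chains` are hypothetical.

```python
def simplex(*vertices):
    """A simplex as an immutable set of vertex labels."""
    return frozenset(vertices)


def add_chains(c1, c2):
    """Addition in C_p over Z2: coefficients add modulo 2, i.e. symmetric difference."""
    return frozenset(c1) ^ frozenset(c2)


# (ab + bc) + (bc + cd) = ab + cd, since bc + bc = 0 in Z2.
c1 = {simplex("a", "b"), simplex("b", "c")}
c2 = {simplex("b", "c"), simplex("c", "d")}
print(sorted(sorted(s) for s in add_chains(c1, c2)))  # [['a', 'b'], ['c', 'd']]
```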

SLIDE 15

Boundary homomorphism

Given a simplicial complex K, the pth boundary homomorphism assigns to each p-simplex σ = {v0, . . . , vp} ∈ K its boundary:

$$\partial_p \sigma = \sum_{i=0}^{p} \{v_0, \dots, \hat{v}_i, \dots, v_p\}$$

In the equation above, $\hat{v}_i$ indicates that the set does not contain the ith vertex. Extended linearly to all p-chains, ∂p : Cp → Cp−1 is thus a homomorphism between the chain groups. For example, ∂2{a, b, c} = {b, c} + {a, c} + {a, b}.
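Over Z2, the boundary of a simplex is the set of its faces of one dimension lower, and linearity means that faces occurring an even number of times cancel. A minimal sketch of ∂p for arbitrary chains, with simplices as frozensets; `boundary_simplex` and `boundary` are hypothetical names.

```python
from itertools import combinations


def boundary_simplex(sigma):
    """Boundary of a single p-simplex: drop each of its p + 1 vertices in turn."""
    s = sorted(sigma)
    if len(s) == 1:
        return frozenset()  # the boundary of a vertex is 0
    return frozenset(frozenset(f) for f in combinations(s, len(s) - 1))


def boundary(chain):
    """Extend the boundary linearly to chains; over Z2 repeated faces cancel."""
    result = frozenset()
    for sigma in chain:
        result ^= boundary_simplex(sigma)
    return result


triangle = frozenset("abc")
print(sorted(sorted(f) for f in boundary({triangle})))
# [['a', 'b'], ['a', 'c'], ['b', 'c']]
```

For two triangles sharing the edge {b, c}, that edge appears twice and cancels, so the boundary of the chain is the outer rim only.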

SLIDE 16

Fundamental lemma & chain complex

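For all p, we have ∂p−1 ∘ ∂p = 0: boundaries do not have a boundary themselves. This leads to the chain complex:

$$\cdots \xrightarrow{\partial_{n+1}} C_n \xrightarrow{\partial_n} C_{n-1} \xrightarrow{\partial_{n-1}} \cdots \xrightarrow{\partial_2} C_1 \xrightarrow{\partial_1} C_0 \xrightarrow{\partial_0} 0$$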
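The identity ∂p−1 ∘ ∂p = 0 can be checked numerically by writing each ∂p as a 0/1 matrix over Z2 and multiplying. A small pure-Python sketch for a filled triangle; the names `closure`, `boundary_matrix`, and `matmul_mod2` are hypothetical.

```python
from itertools import combinations


def closure(*simplices):
    """All non-empty faces of the given simplices, as frozensets."""
    return {frozenset(f) for s in simplices
            for k in range(1, len(s) + 1) for f in combinations(sorted(s), k)}


def boundary_matrix(K, p):
    """Matrix of the boundary map C_p -> C_{p-1} over Z2, for p >= 1.

    Rows are indexed by (p-1)-simplices, columns by p-simplices."""
    rows = sorted((s for s in K if len(s) == p), key=sorted)
    cols = sorted((s for s in K if len(s) == p + 1), key=sorted)
    index = {s: i for i, s in enumerate(rows)}
    M = [[0] * len(cols) for _ in rows]
    for j, sigma in enumerate(cols):
        for face in combinations(sorted(sigma), p):
            M[index[frozenset(face)]][j] = 1
    return M


def matmul_mod2(A, B):
    """Matrix product over Z2 (A is m x n, B is n x q)."""
    q = len(B[0]) if B else 0
    return [[sum(a * B[k][j] for k, a in enumerate(row)) % 2 for j in range(q)]
            for row in A]


K = closure("abc")  # filled triangle: 3 vertices, 3 edges, 1 triangle
d1, d2 = boundary_matrix(K, 1), boundary_matrix(K, 2)
print(matmul_mod2(d1, d2))  # [[0], [0], [0]]
```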

SLIDE 17

Cycle and boundary groups

Cycle group: Zp = ker ∂p
Boundary group: Bp = im ∂p+1

Because ∂p ∘ ∂p+1 = 0, every element of im ∂p+1 lies in ker ∂p. Hence Bp ⊆ Zp in the group-theoretical sense (Bp is a subgroup of Zp). In other words, every boundary is also a cycle.

SLIDE 18

Homology groups & Betti numbers

The pth homology group Hp is a quotient group, defined by ‘removing’ cycles that are boundaries from a higher dimension:

$$H_p = Z_p / B_p = \ker \partial_p / \operatorname{im} \partial_{p+1}$$

With this definition, we may finally calculate the pth Betti number:

$$\beta_p = \operatorname{rank} H_p$$

Intuitively: calculate all cycles, remove the cycles that are boundaries of higher-dimensional simplices, and count what is left.
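Over Z2 the homology groups are vector spaces, so βp = dim Zp − dim Bp = (dim Cp − rank ∂p) − rank ∂p+1. The sketch below computes Betti numbers this way, storing boundary columns as integer bit masks and using Gaussian elimination over Z2; all helper names are hypothetical. For a hollow triangle (topologically a circle) it yields β0 = β1 = 1.

```python
from itertools import combinations


def closure(*simplices):
    """All non-empty faces of the given simplices, as frozensets."""
    return {frozenset(f) for s in simplices
            for k in range(1, len(s) + 1) for f in combinations(sorted(s), k)}


def rank_mod2(vectors):
    """Rank over Z2 of vectors given as integer bit masks (Gaussian elimination)."""
    pivots = {}  # leading bit -> basis vector with that leading bit
    for v in vectors:
        while v:
            lead = v.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = v
                break
            v ^= pivots[lead]
    return len(pivots)


def boundary_columns(K, p):
    """Columns of the p-th boundary map (p >= 1) as bit masks over (p-1)-simplices."""
    faces = [s for s in K if len(s) == p]
    index = {s: i for i, s in enumerate(faces)}
    return [sum(1 << index[frozenset(f)] for f in combinations(sorted(sigma), p))
            for sigma in K if len(sigma) == p + 1]


def betti(K, p):
    """beta_p = dim C_p - rank d_p - rank d_{p+1}, with rank d_0 = 0."""
    dim_cp = sum(1 for s in K if len(s) == p + 1)
    rank_p = rank_mod2(boundary_columns(K, p)) if p > 0 else 0
    return dim_cp - rank_p - rank_mod2(boundary_columns(K, p + 1))


circle = closure("ab", "bc", "ca")  # hollow triangle, topologically a circle
disk = closure("abc")               # filled triangle
print([betti(circle, p) for p in range(2)])  # [1, 1]
print([betti(disk, p) for p in range(2)])    # [1, 0]
```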

SLIDE 19

Real-world multivariate data

Often:

• Unstructured point clouds
• n items with D attributes, i.e. an n × D matrix
• Non-random sample from ℝ^D

Manifold hypothesis

There is an unknown d-dimensional manifold M ⊆ ℝ^D, with d ≪ D, from which our data have been sampled.

SLIDE 20

Converting unstructured data into a simplicial complex

Rips graph Rε

Use a distance measure dist(·,·), such as the Euclidean distance, and a threshold parameter ε. Connect u and v if dist(u, v) ≤ ε.
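A direct transcription of this rule, assuming points are coordinate tuples and using Python's `math.dist` for the Euclidean distance; the function name `rips_graph` is hypothetical.

```python
from itertools import combinations
from math import dist


def rips_graph(points, eps):
    """Edges (u, v), u < v, of the Rips graph R_eps under the Euclidean distance."""
    return [(u, v) for u, v in combinations(range(len(points)), 2)
            if dist(points[u], points[v]) <= eps]


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(rips_graph(square, 1.0))       # [(0, 1), (0, 3), (1, 2), (2, 3)]
print(len(rips_graph(square, 1.5)))  # 6: the diagonals (length ~1.414) now connect too
```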

SLIDE 21

How to get a simplicial complex from Rε?

Construct the Vietoris–Rips complex Vε by adding a k-simplex whenever all of its (k − 1)-dimensional faces are present; equivalently, whenever all of its vertices are pairwise connected in Rε.
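Since a simplex enters exactly when all of its edges are present, the Vietoris–Rips complex can be sketched by brute force: enumerate vertex subsets and keep those whose points are pairwise within ε. This is a hypothetical, exponential-time illustration (`vietoris_rips` and `max_dim` are our own names), not how dedicated libraries build the complex.

```python
from itertools import combinations
from math import dist


def vietoris_rips(points, eps, max_dim=2):
    """Simplices (sorted vertex tuples) of V_eps up to dimension max_dim."""
    n = len(points)

    def close(u, v):
        return dist(points[u], points[v]) <= eps

    simplices = [(v,) for v in range(n)]
    for k in range(2, max_dim + 2):  # k = number of vertices = dimension + 1
        for sigma in combinations(range(n), k):
            if all(close(u, v) for u, v in combinations(sigma, 2)):
                simplices.append(sigma)
    return simplices


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(len(vietoris_rips(square, 1.0)))  # 8: 4 vertices, 4 edges, no triangles
print(len(vietoris_rips(square, 1.5)))  # 14: 4 vertices, 6 edges, 4 triangles
```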

SLIDE 22

How to calculate Betti numbers?

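Direct calculations are unstable: the value of β1 computed for a single threshold depends strongly on the choice of ε.

[Figure: Vietoris–Rips complexes for ε = 0.35, 0.53, 0.88, 1.05, and a plot of β1 as a function of ε]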

SLIDE 23

Persistent homology

Note that the ‘correct’ Betti number of the data persists over a certain range of the threshold parameter ε. To formalize this, assume that simplices in the Vietoris–Rips complex are added one after the other, each with an associated weight (for instance, the value of ε at which the simplex first appears). This gives rise to a filtration

$$\emptyset = K_0 \subseteq K_1 \subseteq \cdots \subseteq K_{n-1} \subseteq K_n = K,$$

such that each Ki is a valid simplicial subcomplex of K. We write w(Ki) to denote the weight of Ki.
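For the Vietoris–Rips complex, one natural weight for a simplex is its diameter, i.e. the largest pairwise distance among its vertices, which is the smallest ε at which it appears. Sorting by weight and breaking ties by dimension lists every face before its cofaces, so each prefix is a valid subcomplex. A sketch under these assumptions; `rips_filtration` is a hypothetical name.

```python
from itertools import combinations
from math import dist


def rips_filtration(points, max_dim=2):
    """(weight, simplex) pairs ordered so that every prefix is a subcomplex."""
    n = len(points)
    filtration = []
    for k in range(1, max_dim + 2):
        for sigma in combinations(range(n), k):
            # Diameter of the simplex; vertices have no edges and get weight 0.
            w = max((dist(points[u], points[v]) for u, v in combinations(sigma, 2)),
                    default=0.0)
            filtration.append((w, sigma))
    # A face never has a larger diameter than its cofaces; ties go to lower dimension.
    filtration.sort(key=lambda item: (item[0], len(item[1])))
    return filtration


for w, sigma in rips_filtration([(0, 0), (1, 0), (0, 1)]):
    print(round(w, 3), sigma)
```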

SLIDE 24

Similar to what we have previously seen, the inclusions Ki ⊆ Kj (for i ≤ j) give rise to a family of homomorphisms

$$f_p^{i,j} \colon H_p(K_i) \to H_p(K_j),$$

and a sequence of homology groups:

$$0 = H_p(K_0) \xrightarrow{f_p^{0,1}} H_p(K_1) \xrightarrow{f_p^{1,2}} \cdots \xrightarrow{f_p^{n-2,n-1}} H_p(K_{n-1}) \xrightarrow{f_p^{n-1,n}} H_p(K_n) = H_p(K),$$

where p denotes the dimension of the homology groups.

SLIDE 25

Persistent homology group

Given two indices i ≤ j, the pth persistent homology group $H_p^{i,j}$ is defined as

$$H_p^{i,j} := Z_p(K_i) \,/\, \bigl(B_p(K_j) \cap Z_p(K_i)\bigr),$$

which contains all the homology classes of Ki that are still present in Kj. We may now track the different homology classes through the individual homology groups.

SLIDE 26

Tracking of homology classes

Creation in Ki: $c \in H_p(K_i)$, but $c \notin H_p^{i-1,i}$.

Destruction in Kj: $f_p^{i,j-1}(c) \notin H_p^{i-1,j-1}$ and $f_p^{i,j}(c) \in H_p^{i-1,j}$.

The persistence of a class c that is created in Ki and destroyed in Kj is defined as

$$\operatorname{pers}(c) = \lvert w(K_j) - w(K_i) \rvert,$$

and measures the ‘scale’ at which a certain topological feature occurs.
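In practice, creation and destruction are rarely evaluated from the definitions directly. A widely used alternative is the standard reduction algorithm: list all simplices in filtration order, build the boundary matrix over Z2, and reduce its columns from left to right by adding earlier columns. If column j ends up with lowest nonzero row i, the class created by simplex i is destroyed by simplex j. A minimal sketch, assuming the filtration is given as (weight, sorted vertex tuple) pairs with faces before cofaces; `persistence_pairs` is a hypothetical name.

```python
def persistence_pairs(filtration):
    """Standard column reduction over Z2.

    filtration: list of (weight, simplex), simplex a sorted vertex tuple,
    ordered so that every face precedes its cofaces.
    Returns (dimension, birth weight, death weight) triples; classes that
    are never destroyed get death = float("inf").
    """
    index = {simplex: i for i, (_, simplex) in enumerate(filtration)}
    # Column j of the boundary matrix: indices of the faces of simplex j.
    columns = [
        {index[s[:i] + s[i + 1:]] for i in range(len(s))} if len(s) > 1 else set()
        for _, s in filtration
    ]
    owner = {}  # lowest row index -> column that has this lowest entry
    pairs = []
    for j, col in enumerate(columns):
        # Add earlier columns (mod 2, in place) until the lowest entry is unique or col is 0.
        while col and max(col) in owner:
            col ^= columns[owner[max(col)]]
        if col:
            i = max(col)  # simplex i creates the class that simplex j destroys
            owner[i] = j
            pairs.append((len(filtration[i][1]) - 1, filtration[i][0], filtration[j][0]))
    for i, (w, s) in enumerate(filtration):
        # Zero column that never became a lowest entry: the class lives forever.
        if not columns[i] and i not in owner:
            pairs.append((len(s) - 1, w, float("inf")))
    return pairs


# Hollow triangle whose last edge appears at weight 2 and is filled at weight 3.
F = [(0.0, (0,)), (0.0, (1,)), (0.0, (2,)),
     (1.0, (0, 1)), (1.0, (1, 2)), (2.0, (0, 2)), (3.0, (0, 1, 2))]
print(persistence_pairs(F))
# [(0, 0.0, 1.0), (0, 0.0, 1.0), (1, 2.0, 3.0), (0, 0.0, inf)]
```

Pairs with equal birth and death weight (zero persistence) can occur and are usually discarded; the remaining (birth, death) values are exactly the coordinates used for a persistence diagram.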

SLIDE 27

[Figure: Vietoris–Rips complexes for ε = 0.35, 0.53, 0.88, 1.05]

Here, the topological feature is the circle that underlies the data. It persists from ε = 0.53 to ε = 1.05, so its persistence is pers = 1.05 − 0.53 = 0.52. In general, high persistence indicates a relevant feature, whereas features with low persistence are often caused by noise.

SLIDE 28

How to represent topological information?

Persistence diagram

Given a topological feature created in Ki and destroyed in Kj, add a point with coordinates (w(Ki), w(Kj)) to a diagram. Since a feature cannot be destroyed before it is created, all points lie on or above the diagonal. This summarizing description is always two-dimensional, regardless of the dimensionality of the input data!

SLIDE 29

Uses for persistence diagrams

Well-defined distance measures

Persistence diagrams from the same object: some noise has been added to the object, resulting in spurious topological features. Large-scale features remain the same, though!

SLIDE 30

Distance measure

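Second Wasserstein distance

$$W_2(X, Y) = \sqrt{\inf_{\eta\colon X \to Y} \sum_{x \in X} \lVert x - \eta(x) \rVert^2}$$

Here, X and Y are persistence diagrams and η ranges over all bijections between them; to ensure that such bijections exist, the points of the diagonal are usually added to both diagrams (with infinite multiplicity).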
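For very small diagrams, the infimum can be evaluated by brute force over all bijections. The sketch below assumes two common conventions that the slide does not spell out: the ground distance is the L∞ norm, and each diagram is augmented with the diagonal projections of the other diagram's points, with diagonal-to-diagonal matches costing nothing. The function names are hypothetical, and the factorial running time makes this suitable for illustration only; practical implementations solve an assignment problem instead.

```python
from itertools import permutations
from math import inf, sqrt


def linf(p, q):
    """Assumed ground distance: L-infinity norm in the plane."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def diagonal_projection(p):
    """Closest diagonal point to (birth, death) under the L-infinity norm."""
    m = (p[0] + p[1]) / 2
    return (m, m)


def wasserstein2(X, Y):
    """W2 between two small diagrams via brute force over all bijections."""
    # Augment each diagram with the diagonal projections of the other's points.
    A = [(p, False) for p in X] + [(diagonal_projection(q), True) for q in Y]
    B = [(q, False) for q in Y] + [(diagonal_projection(p), True) for p in X]
    best = inf
    for perm in permutations(range(len(B))):
        cost = 0.0
        for (a, a_on_diag), j in zip(A, perm):
            b, b_on_diag = B[j]
            if not (a_on_diag and b_on_diag):  # diagonal-to-diagonal is free
                cost += linf(a, b) ** 2
        best = min(best, cost)
    return sqrt(best)


X = [(0.0, 1.0)]
Y = [(0.0, 1.5)]
print(wasserstein2(X, Y))   # 0.5
print(wasserstein2(X, []))  # 0.5: (0, 1) is matched to the diagonal point (0.5, 0.5)
```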

SLIDE 31

Stability

Theorem

Let f and g be two Lipschitz-continuous functions. There are constants k and C that depend on the input space and on the Lipschitz constants of f and g such that

$$W_2(X, Y) \le C \, \lVert f - g \rVert_\infty^{1 - \frac{k}{2}}, \tag{1}$$

where X and Y refer to the persistence diagrams of f and g.

SLIDE 32

Summarizing statistics

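Given a persistence diagram D, there are various summary statistics that we can calculate:

∞-norm:

$$\lVert D \rVert_\infty = \max_{(x, y) \in D} \lvert x - y \rvert$$

p-norm:

$$\lVert D \rVert_p = \Bigl( \sum_{(x, y) \in D} \lvert x - y \rvert^p \Bigr)^{1/p}$$

Total persistence:

$$\operatorname{pers}(D)_p = \sum_{(x, y) \in D} \lvert x - y \rvert^p$$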
SLIDE 33

Scalar field analysis

Climate research

SLIDE 34

Scalar field analysis

What are the issues?

• We need to know about large-scale and small-scale differences in the qualitative behaviour of the fields.
• Similar phenomena may appear in different regions of the data.
• Time-varying aspects: which time steps are outliers, with properties markedly different from those of the remaining ones?

Using a 2D simplicial complex (the surface of the Earth), we can only find topological features in dimensions 0, 1, and 2.

SLIDE 35

Combined persistence diagram

1460 time steps, dimension 1

SLIDE 36

Combined persistence diagram

Outliers

SLIDE 37

What do the outliers represent?

Time steps in the simulation with extreme temperature phenomena at different places in the world. Apart from visual inspection, other methods cannot detect this!

SLIDE 38

Analysis of cyclical behaviour using summary statistics

Embedding based on the Wasserstein distance

[Figure: embedding of the persistence diagrams of all time steps]

Outliers can easily be spotted; cyclical behaviour is indicated by points of different colours that are situated next to each other.

SLIDE 39

Analysis of cyclical behaviour using summary statistics

Heatmap visualization of the sorted distance matrix

Cyclical structure is hinted at by the block structure.

SLIDE 40

2-norm of all persistence diagrams

[Figure: 2-norm of the persistence diagram of each time step]

Detection of cyclical behaviour (seasons, micro-climate), regardless of the physical location.

SLIDE 41

2-norm vs. ∞-norm

All time steps

[Figure: 2-norm vs. ∞-norm of each persistence diagram (∞-norm axis scaled by 10⁻²)]

SLIDE 42

2-norm vs. ∞-norm

Interesting time steps: large 2-norm, small ∞-norm, i.e. many features of moderate persistence but no single dominant one.

[Figure: 2-norm vs. ∞-norm of each persistence diagram (∞-norm axis scaled by 10⁻²)]

SLIDE 43

Conclusion

Take-away messages

1. Persistent homology is a new way of looking at complex data.
2. It has a rich mathematical theory and many desirable properties (robustness, invariance).
3. Lots of interesting applications.

Interested? Drop me a line at bastian.rieck@iwr.uni-heidelberg.de!
