SLIDE 1

1/20 Application Theory Analysis

Statistical topological data analysis using persistence landscapes applied to brain arteries

CANSSI–SAMSI Workshop: Geometric Topological and Graphical Model Methods in Statistics

Peter Bubenik

Department of Mathematics, Cleveland State University
p.bubenik@csuohio.edu
http://academic.csuohio.edu/bubenik_p/

May 23, 2014
Funded by AFOSR

Peter Bubenik Persistence landscapes

SLIDE 2

2/20 Application Theory Analysis

Statistical topological data analysis

The plan: Data → Geometric object → Topological summary → Statistical analysis

SLIDE 3

3/20 Application Theory Analysis Brain arteries Geometry Topology

Brain arteries

Joint work with Ezra Miller (Duke/SAMSI), J.S. Marron (UNC-CH), Paul Bendich (Duke) and Sean Skwerer (UNC-CH).

SLIDE 4

3/20 Application Theory Analysis Brain arteries Geometry Topology

Brain arteries

Goal: Analyze the shape of brain arteries in order to
  • understand normal changes with respect to age
  • detect and locate pathology (tumors)
  • predict stroke risk

SLIDE 5

4/20 Application Theory Analysis Brain arteries Geometry Topology

The data

Bullitt and Aylward (2002) MRA → Tubes

SLIDE 6

5/20 Application Theory Analysis Brain arteries Geometry Topology

Filling the arteries – increasing sublevel sets

SLIDE 27

6/20 Application Theory Analysis Brain arteries Geometry Topology

Mathematical viewpoint

Let X be a graph representing the brain arteries of one subject:
  • vertices with (x, y, z, r) coordinates
  • edges connecting adjacent vertices

Let $X_t$ denote the full subgraph on the vertices with z coordinate at most t.

$\emptyset = X_0 \subseteq X_1 \subseteq X_2 \subseteq \cdots \subseteq X_N = X$

Take homology in degree 0.

$H_0(X_0) \to H_0(X_1) \to H_0(X_2) \to \cdots \to H_0(X_N)$
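The degree-0 homology of this filtration tracks connected components as the height threshold t rises. A minimal sketch of how the resulting (birth, death) pairs could be computed with a union-find sweep is given here. The function name `h0_persistence`, the dictionary-based graph encoding, and the tie-breaking rule are assumptions made for illustration, not the code used in the talk.

```python
from typing import Dict, List, Tuple


def h0_persistence(z: Dict[int, float],
                   edges: List[Tuple[int, int]]) -> List[Tuple[float, float]]:
    """Degree-0 persistence pairs (birth, death) for the filtration by z.

    z maps each vertex id to its z coordinate; edges lists vertex pairs.
    """
    parent: Dict[int, int] = {}
    birth: Dict[int, float] = {}  # root -> z value at which its component appeared

    def find(v: int) -> int:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    nbrs: Dict[int, List[int]] = {v: [] for v in z}
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)

    pairs: List[Tuple[float, float]] = []
    for v in sorted(z, key=lambda u: z[u]):  # sweep upward in z
        parent[v] = v
        birth[v] = z[v]
        for w in nbrs[v]:
            if w not in parent:  # w has a larger z value, so it is not in X_t yet
                continue
            rv, rw = find(v), find(w)
            if rv == rw:
                continue
            # Elder rule: the component that was born later dies at this merge.
            old, young = (rv, rw) if birth[rv] <= birth[rw] else (rw, rv)
            if birth[young] < z[v]:  # skip zero-length intervals
                pairs.append((birth[young], z[v]))
            parent[young] = old

    # Components that never merge persist forever.
    roots = {find(v) for v in z}
    pairs.extend((birth[r], float("inf")) for r in roots)
    return sorted(pairs)
```

For a path graph with heights 0, 3, 1 along the path, the component born at height 1 merges into the older one at height 3, and the oldest component never dies.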

SLIDE 28

7/20 Application Theory Analysis Setup Persistence landscape Mean Banach space

More general setup

For each t, have
  • a simplicial complex $X_t$
  • a vector space $H(X_t)$

For $t \le t'$, have
  • an inclusion $X_t \subseteq X_{t'}$
  • a linear map $H(X_t) \to H(X_{t'})$

Persistent homology is the image of this map. This set of vector spaces and linear maps is called a persistence module.

We want a summary of the persistence module that is amenable to statistical analysis.

SLIDE 29

8/20 Application Theory Analysis Setup Persistence landscape Mean Banach space

Persistence landscape

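Recall that the persistence module consisted of linear maps $H(X_t) \to H(X_{t'})$, for $t \le t'$.

For k = 1, 2, 3, . . ., define $\lambda_k : \mathbb{R} \to \mathbb{R}$ by

$\lambda_k(t) = \max\{\, h \ge 0 \mid \operatorname{rank}(H(X_{t-h}) \to H(X_{t+h})) \ge k \,\}$

We can combine these to get one function $\lambda : \mathbb{N} \times \mathbb{R} \to \mathbb{R}$, where $\lambda(k, t) = \lambda_k(t)$.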
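When the persistence module is described by a list of (birth, death) intervals, the rank of $H(X_{t-h}) \to H(X_{t+h})$ counts the intervals that contain $[t-h, t+h]$. So $\lambda_k(t)$ is the k-th largest value of $\max(0, \min(t - b, d - t))$ over the intervals $(b, d)$. The sketch below evaluates this on a grid of t values. The function name `landscape`, the grid `ts`, and the depth cutoff `k_max` are illustrative assumptions.

```python
import numpy as np


def landscape(pairs, ts, k_max):
    """Return lam with shape (k_max, len(ts)), where lam[k-1, j] = lambda_k(ts[j])."""
    pairs = np.asarray(pairs, dtype=float).reshape(-1, 2)
    ts = np.asarray(ts, dtype=float)
    b, d = pairs[:, [0]], pairs[:, [1]]
    # Tent of each interval: the largest h with b <= t - h and t + h <= d, or 0.
    tents = np.clip(np.minimum(ts - b, d - ts), 0.0, None)  # (n_pairs, n_t)
    tents = -np.sort(-tents, axis=0)  # sort each column in descending order
    lam = np.zeros((k_max, ts.size))
    m = min(k_max, tents.shape[0])
    lam[:m] = tents[:m]  # lambda_k is zero when k exceeds the number of intervals
    return lam
```

Intervals with infinite death give unbounded tents, so they would need to be truncated or dropped before computing norms; the slides do not say how this was handled.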

SLIDE 30

9/20 Application Theory Analysis Setup Persistence landscape Mean Banach space

Persistence landscape examples

SLIDE 35

10/20 Application Theory Analysis Setup Persistence landscape Mean Banach space

Mean landscapes

Persistence landscapes $\lambda^{(1)}, \ldots, \lambda^{(n)}$ have mean

$\bar{\lambda} = \frac{1}{n} \sum_{i=1}^{n} \lambda^{(i)}$.

That is,

$\bar{\lambda}_k(t) = \frac{1}{n} \sum_{i=1}^{n} \lambda^{(i)}_k(t)$.
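Because the mean is taken pointwise in (k, t), it reduces to an array average once every landscape is sampled on the same t grid. The following is a minimal sketch under that assumption. The name `mean_landscape` and the zero-padding of shallower landscapes are illustrative choices, justified by $\lambda_k = 0$ for k larger than the number of intervals.

```python
import numpy as np


def mean_landscape(landscapes):
    """Pointwise mean of landscapes given as arrays of shape (depth_i, n_t)."""
    depth = max(lam.shape[0] for lam in landscapes)
    n_t = landscapes[0].shape[1]
    padded = np.zeros((len(landscapes), depth, n_t))
    for i, lam in enumerate(landscapes):
        padded[i, :lam.shape[0]] = lam  # missing levels are identically zero
    return padded.mean(axis=0)
```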

SLIDE 36

11/20 Application Theory Analysis Setup Persistence landscape Mean Banach space

Mean landscape for brain arteries

[Figure: mean persistence landscape for the brain arteries]

SLIDE 37

12/20 Application Theory Analysis Setup Persistence landscape Mean Banach space

Summary space

Let $1 \le p < \infty$. Then

$\|\lambda\|_p = \left( \sum_k \|\lambda_k\|_p^p \right)^{1/p}$.

We assume $\|\lambda\| := \|\lambda\|_p < \infty$. That is, $\lambda \in L^p(\mathbb{N} \times \mathbb{R})$. So λ is a random variable with values in a Banach space.

SLIDE 38

13/20 Application Theory Analysis Setup Persistence landscape Mean Banach space

Asymptotics

$\lambda \in L^p(\mathbb{N} \times \mathbb{R})$, and $\|\lambda\|$ is a real random variable. If $E\|\lambda\| < \infty$ then there exists $E(\lambda) \in L^p(\mathbb{N} \times \mathbb{R})$ such that $E(f(\lambda)) = f(E(\lambda))$ for all continuous linear functionals f.

Theorem (Strong Law of Large Numbers (SLLN))
$\bar{\lambda}^{(n)} \to E(\lambda)$ almost surely if and only if $E\|\lambda\| < \infty$.

Theorem (Central Limit Theorem (CLT))
Assume $p \ge 2$. If $E\|\lambda\| < \infty$ and $E(\|\lambda\|^2) < \infty$ then $\sqrt{n}\,[\bar{\lambda}^{(n)} - E(\lambda)]$ converges weakly to a Gaussian random variable with the same covariance structure as λ.

SLIDE 39

14/20 Application Theory Analysis Setup Persistence landscape Mean Banach space

Weighted norms

Recall that

$\|\lambda\|_p = \left( \sum_k \|\lambda_k\|_p^p \right)^{1/p}$.

Fix $i \le j$. Define

$\|\lambda\|_{p,i,j} = \left( \sum_{k=i}^{j} \|\lambda_k\|_p^p \right)^{1/p}$.

The previous SLLN and CLT also apply to this weighted norm.
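For a landscape sampled on a uniform t grid, both norms can be approximated by a Riemann sum over the selected levels. The sketch below assumes an array `lam` of shape (depth, n_t) and grid spacing `dt`. The function name `landscape_norm` and the 1-indexed defaults are hypothetical conveniences.

```python
import numpy as np


def landscape_norm(lam, dt, p=1.0, i=1, j=None):
    """Approximate (sum_{k=i}^{j} ||lambda_k||_p^p)^(1/p) on a uniform grid.

    With the defaults i=1 and j=None (all stored levels) this is ||lambda||_p.
    """
    lam = np.asarray(lam, dtype=float)
    j = lam.shape[0] if j is None else j
    block = np.abs(lam[i - 1:j])  # rows for levels k = i, ..., j (1-indexed)
    return float((np.sum(block ** p) * dt) ** (1.0 / p))
```

Calling `landscape_norm(lam, dt, p=1, i=5, j=5)` would correspond to the statistic written $\|\lambda\|_{1,5,5}$, which uses only $\lambda_5$.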

SLIDE 40

15/20 Application Theory Analysis Correlation with age PCA

Correlation with age

Pearson’s correlation coefficient of age with statistics derived from the brain arteries

Previous study without topology: Dan Shen et al (2014), r = 0.25

Using persistence landscape:

topological statistic        r
$\|\lambda\|_1$              0.5077
$\|\lambda\|_{1,2,57}$       0.5214
$\|\lambda\|_{1,5,5}$        0.5582
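Each r value pairs one number per subject (a landscape norm) with that subject's age. A minimal sketch of the coefficient itself follows. The function name `pearson_r` is an assumption, and the numbers in the usage lines are made-up placeholders, not the brain artery data.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # center both samples
    yc = y - y.mean()
    return float(np.dot(xc, yc) / np.sqrt(np.dot(xc, xc) * np.dot(yc, yc)))


# Hypothetical values for illustration only.
ages = [23.0, 35.0, 41.0, 58.0, 66.0]
norms = [410.0, 395.0, 402.0, 371.0, 360.0]
r = pearson_r(ages, norms)
```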

SLIDE 41

16/20 Application Theory Analysis Correlation with age PCA

Correlation of age with $\|\lambda\|_{1,i,j}$

[Figure: correlation plot]

SLIDE 42

17/20 Application Theory Analysis Correlation with age PCA

Principal Component Analysis

[Figures: principal component analysis plots, slides 42–46]

SLIDE 47

18/20 Application Theory Analysis Correlation with age PCA

Correlation with age

Pearson’s correlation coefficient of age with statistics derived from the brain arteries

Previous study without topology: Dan Shen et al (2014), r = 0.25

Values of r using statistics derived from persistence landscape:

landscapes used                         1-norm    first principal component
$\lambda_1, \ldots, \lambda_{57}$       0.5077    0.5216
$\lambda_2, \ldots, \lambda_{57}$       0.5214    0.5666
$\lambda_5$ (i = j = 5)                 0.5582    0.6000
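The slides do not state how the principal components were computed. One plausible reading is that each subject's discretized landscape is flattened into a vector, the vectors are centered, and each subject's score on the first principal component is then correlated with age. The sketch below follows that reading. The function name `first_pc_scores` and the use of an SVD are assumptions.

```python
import numpy as np


def first_pc_scores(landscapes):
    """Score of each subject on the first principal component.

    landscapes: list of arrays with a common shape (depth, n_t), one per subject.
    """
    X = np.stack([np.asarray(lam, dtype=float).ravel() for lam in landscapes])
    Xc = X - X.mean(axis=0)  # center each grid value across subjects
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[0]  # projection onto the leading right singular vector
```

The sign of a principal component is arbitrary, so only the magnitude of its correlation with age, for example `abs(np.corrcoef(scores, ages)[0, 1])`, is meaningful.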

SLIDE 48

19/20 Application Theory Analysis Correlation with age PCA

Correlation of age with PCA1 on weighted norms

[Figure: correlation of age with PCA1 on weighted norms]

SLIDE 49

20/20 Application Theory Analysis Correlation with age PCA

Summary

  • Topology: promising tool for analyzing data
  • Persistence landscapes: easy to combine with standard statistical techniques
  • Looking for collaborators

SLIDE 50

20/20 Application Theory Analysis Correlation with age PCA

Summary

Thank you!