Software for TDA ACM-BCB Workshop on TDA October 2, 2016 by - - PowerPoint PPT Presentation




slide-1
SLIDE 1

Open Source Software for TDA

ACM-BCB Workshop on TDA October 2, 2016

by Svetlana Lockwood

slide-3
SLIDE 3

Topological Data Analysis

  • 1. Persistence-Way
  • Topological analysis using persistent homology
  • Finds topological invariants in data (# of connected components, enclosed voids, etc.)
  • 2. Mapper-Way
  • Applies a filter function to project data onto a lower-dimensional space
  • Performs partial clustering in the level sets

Example invariants (Betti numbers): 𝛾0 = 1, 𝛾1 = 0, 𝛾2 = 1 (a sphere); 𝛾0 = 1, 𝛾1 = 2, 𝛾2 = 1 (a torus)
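
A minimal sketch of where such counts come from, assuming Python with NumPy (this is not code from the talk, and the helper names `boundary_matrix` and `betti_numbers` are illustrative). It builds the boundary matrices of a small simplicial complex and reads off each invariant from their ranks, using a hollow tetrahedron as a stand-in for a sphere.

```python
from itertools import combinations

import numpy as np


def boundary_matrix(k_simplices, faces):
    """Matrix of the boundary map from k-simplices (columns) to (k-1)-simplices (rows)."""
    row = {face: i for i, face in enumerate(faces)}
    mat = np.zeros((len(faces), len(k_simplices)), dtype=int)
    for col, simplex in enumerate(k_simplices):
        for i in range(len(simplex)):
            face = simplex[:i] + simplex[i + 1:]  # drop the i-th vertex
            mat[row[face], col] = (-1) ** i       # alternating orientation sign
    return mat


def betti_numbers(simplices_by_dim):
    """Rank-nullity: b_k = (#k-simplices) - rank(d_k) - rank(d_{k+1})."""
    ranks = [0]  # the boundary map on vertices has rank 0
    for k in range(1, len(simplices_by_dim)):
        d_k = boundary_matrix(simplices_by_dim[k], simplices_by_dim[k - 1])
        ranks.append(int(np.linalg.matrix_rank(d_k)))
    ranks.append(0)  # nothing above the top dimension
    return [len(s) - ranks[k] - ranks[k + 1] for k, s in enumerate(simplices_by_dim)]


# Hollow tetrahedron: every vertex, edge and triangle on 4 points, but no solid interior.
sphere = [list(combinations(range(4), k + 1)) for k in range(3)]
betti = betti_numbers(sphere)
print(betti)  # [1, 0, 1]: one component, no loops, one enclosed void
```

Floating-point rank computation is fine for a toy complex like this one; the libraries named later in the talk use dedicated matrix reduction instead.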

slide-4
SLIDE 4

TDA: the Persistence-Way (# 1)

  • Several free software packages have appeared recently
  • R package – “TDA”
  • A number of benefits:
  • Familiar R environment
  • Implements 2 types of representation (barcodes & birth-death diagrams)
  • R interface to the efficient C++ libraries GUDHI, Dionysus, and PHAT
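
To make the two representations concrete, here is a small Python sketch (not the R package's API; the intervals and the helper names `render_barcode` and `diagram_points` are made up for illustration). The same list of (birth, death) intervals is shown once as horizontal bars and once as points whose height above the diagonal is the persistence.

```python
# Hypothetical intervals (birth, death) for three topological features.
bars = [(0.0, 0.4), (0.0, 2.0), (0.5, 0.7)]


def render_barcode(bars, scale_max, width=40):
    """Barcode view: one text line per feature, spanning birth to death."""
    lines = []
    for birth, death in bars:
        start = round(birth / scale_max * width)
        end = round(death / scale_max * width)
        lines.append(" " * start + "-" * max(end - start, 1))
    return lines


def diagram_points(bars):
    """Birth-death view: one point per feature, plus its persistence (death - birth)."""
    return [(birth, death, death - birth) for birth, death in bars]


for line in render_barcode(bars, scale_max=2.0):
    print(line)
for point in diagram_points(bars):
    print(point)
```

Long bars, or equivalently points far from the diagonal, are the features usually read as signal; short ones are usually read as noise.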

slide-5
SLIDE 5

TDA: the Persistence-Way (# 1)

  • The TDA package for R is developed by
  • Brittany T. Fasy, Jisu Kim, Fabrizio Lecci, Clément Maria, Vincent Rouvreau
  • Some of the examples are from:
  • Fasy, Brittany Terese, Jisu Kim, Fabrizio Lecci, and Clément Maria. "Introduction to the R package TDA." arXiv preprint arXiv:1411.1830 (2014).
  • Kim, Jisu. "Tutorial on the R package TDA."
slide-9
SLIDE 9

TDA: the Persistence-Way (# 1)

  • Goal: to discover the underlying shape of data

Data → Topological Features

Ghrist, R., 2008. Barcodes: the persistent topology of data.

  • (switch to R)
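
As a rough stand-in for the step from data to topological features (this is not the R session from the talk), the Python sketch below computes the 0-dimensional bars of a Vietoris-Rips filtration on a toy point cloud. It assumes Euclidean distance and uses a union-find structure: every point starts as its own component at scale 0, and a bar ends at the scale where its component merges into another. The function name `h0_barcode` and the sample points are illustrative.

```python
import itertools
import math


def h0_barcode(points):
    """0-dimensional persistence bars (birth, death) of a Vietoris-Rips filtration."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the component representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all pairwise distances from shortest to longest.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    deaths = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # two components merge: one bar ends at this scale
            parent[ri] = rj
            deaths.append(dist)
    # All components are born at scale 0; the last surviving one never dies.
    return [(0.0, d) for d in deaths] + [(0.0, math.inf)]


# Two well-separated groups of points.
cloud = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1)]
bars = h0_barcode(cloud)
for bar in bars:
    print(bar)
```

For this cloud, three bars end at scale 1 and one lasts until scale 9, the gap between the two groups; that long bar is the kind of feature a barcode is meant to expose.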
slide-10
SLIDE 10

Plasmids Data

(switch to R)

  • Plasmids are mobile elements
  • They exchange genetic material
  • 831 plasmids (see table)
  • Original data: 831 plasmids by 81898 features
  • Computed pairwise genetic distance: 831 x 831 matrix
  • Want to see if there is any “interesting” structure

Pictures adapted from http://www.scienceprofonline.com

Subgroup            Count
1. Alpha            159
2. Beta             85
3. Gamma            519
4. Delta/epsilon    68
Total plasmids      831
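
The slides do not say which genetic distance was used. As a sketch only, the Python snippet below assumes binary presence/absence features and the Jaccard distance, and runs on a small random matrix in place of the real 831 x 81898 table; `jaccard_distance_matrix` is an illustrative name.

```python
import numpy as np

# Toy stand-in for the plasmid-by-feature table (6 samples, 50 binary features).
rng = np.random.default_rng(0)
features = rng.integers(0, 2, size=(6, 50)).astype(bool)


def jaccard_distance_matrix(x):
    """Pairwise Jaccard distance, 1 - |A and B| / |A or B|, between rows of a binary matrix."""
    x = x.astype(int)
    shared = x @ x.T                                   # features present in both rows
    counts = x.sum(axis=1)
    union = counts[:, None] + counts[None, :] - shared
    with np.errstate(divide="ignore", invalid="ignore"):
        similarity = np.where(union > 0, shared / union, 1.0)  # two empty rows count as identical
    return 1.0 - similarity


dist = jaccard_distance_matrix(features)
print(dist.shape)
```

A square symmetric matrix of this kind is the form of input that distance-based persistence and Mapper computations typically start from.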

slide-11
SLIDE 11

Plasmids Data

[Figure: plasmids data, with labels 351, 471, 292, 570]

slide-16
SLIDE 16

Other Software For Persistent Homology

Other open source software is available for computing persistent homology

Software   Installation  Complex  Boundary matrix  Barcodes  Visualization  Data Set Size  Ease of Use
JavaPlex                                                                    small          easy
Perseus                                                                     small          easy
Dionysus   --                                                --             medium         medium
DIPHA      -                                                                large          hard
GUDHI      -                                                 --             large          hard

Interface to Matlab/Octave

Source: N. Otter, M. A. Porter, U. Tillmann, P. Grindrod, H. A. Harrington, arXiv, 2015

slide-21
SLIDE 21

TDA: the Mapper-Way (# 2)

  • Apply a filter function to project data onto a lower-dimensional space
  • Perform partial clustering in the level sets, applying standard clustering algorithms to subsets of the original data
  • Goal: to understand how the partial clusters formed in this way interact with each other
  • A few open source packages exist
  • However, all have some limitations
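
The steps above can be sketched in a few lines of Python with NumPy. This is a toy illustration, not the code of any package mentioned in the talk: the interval cover, the threshold-based single-linkage clustering, and the names `interval_cover`, `threshold_clusters`, and `mapper_graph` are all assumptions made for the example. Nodes are partial clusters; two nodes are linked when they share data points.

```python
import numpy as np


def interval_cover(lo, hi, n_intervals, overlap):
    """Equal-length intervals covering [lo, hi]; neighbours share `overlap` of their length."""
    length = (hi - lo) / (1 + (n_intervals - 1) * (1 - overlap))
    step = length * (1 - overlap)
    return [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]


def threshold_clusters(points, members, eps):
    """Single-linkage clustering: connected components of the 'closer than eps' graph."""
    clusters, unvisited = [], set(members)
    while unvisited:
        seed = unvisited.pop()
        component, stack = {seed}, [seed]
        while stack:
            p = stack.pop()
            near = [q for q in unvisited if np.linalg.norm(points[p] - points[q]) < eps]
            unvisited.difference_update(near)
            component.update(near)
            stack.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(points, filter_values, n_intervals=4, overlap=0.3, eps=0.3):
    nodes = []
    tol = 1e-9  # guard against rounding at the interval ends
    cover = interval_cover(filter_values.min(), filter_values.max(), n_intervals, overlap)
    for a, b in cover:
        # Level set: the points whose filter value falls in this interval.
        members = np.flatnonzero((filter_values >= a - tol) & (filter_values <= b + tol))
        nodes.extend(threshold_clusters(points, members.tolist(), eps))
    # Link two partial clusters whenever they have a data point in common.
    edges = {(i, j) for i in range(len(nodes))
             for j in range(i + 1, len(nodes)) if nodes[i] & nodes[j]}
    return nodes, edges


# 40 points on a circle, filtered by their x-coordinate.
angles = np.linspace(0, 2 * np.pi, 40, endpoint=False)
circle = np.column_stack([np.cos(angles), np.sin(angles)])
nodes, edges = mapper_graph(circle, filter_values=circle[:, 0])
print(len(nodes), "nodes,", len(edges), "edges")
```

For the circle, the middle intervals each split into an upper and a lower arc, so the resulting graph should close up into a single loop, mirroring the shape of the data.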
slide-23
SLIDE 23

TDA: the Mapper-Way (# 2)

  • I’ll present a Python-based version developed by MLWave, with examples from https://github.com/MLWave/kepler-mapper
  • Pros:
  • Simple programming interface
  • Makes use of existing Python ML libraries
  • Nice visualizations
  • Cons:
  • Limited coloring
  • Not completely automated
slide-24
SLIDE 24

Python Mappers: Prerequisites

  • I highly recommend installing Anaconda
  • Saves a lot of trouble
  • Comes with SciPy, NumPy, scikit-learn
  • Includes a Python IDE and a package manager (pip)
  • Copy km.py from MLWave into the Anaconda Lib folder

slide-26
SLIDE 26

Intro Mapper Example: MNIST digits

(switch to Python)

Intro example from MLWave

  • The MNIST database of handwritten digits
  • Thousands of digits
  • Each digit is represented by an 8x8 pixel image
  • Goal: cluster handwritten digits according to their value

slide-27
SLIDE 27

Plasmids Network

Overlap – 10%

slide-28
SLIDE 28

Plasmids Network

Overlap – 30%

slide-29
SLIDE 29

Plasmids Network

Overlap – 50%

slide-30
SLIDE 30

Plasmids Network

Overlap – 70%

slide-31
SLIDE 31

Plasmids Network

Overlap – 90%
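
To give a feel for what the overlap setting changes, the Python sketch below builds a cover of the filter range [0, 1] by 10 equal intervals for each of the percentages shown on these slides. It assumes "overlap" means the fraction of an interval's length shared with its neighbour; the exact definition used by the Mapper software may differ, and `interval_cover` is an illustrative helper.

```python
def interval_cover(lo, hi, n_intervals, overlap):
    """Equal-length intervals covering [lo, hi]; neighbours share `overlap` of their length."""
    length = (hi - lo) / (1 + (n_intervals - 1) * (1 - overlap))
    step = length * (1 - overlap)
    return [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]


summary = {}
for overlap in (0.1, 0.3, 0.5, 0.7, 0.9):
    cover = interval_cover(0.0, 1.0, n_intervals=10, overlap=overlap)
    length = cover[0][1] - cover[0][0]
    hits = sum(a <= 0.5 <= b for a, b in cover)  # intervals containing the midpoint
    summary[overlap] = (length, hits)
    print(f"overlap {overlap:.0%}: interval length {length:.3f}, "
          f"intervals containing 0.5: {hits}")
```

Under this assumption, higher overlap means longer intervals and more intervals sharing each data point, so more partial clusters have points in common and the network gains edges.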

slide-32
SLIDE 32

Other Mapper Software

  • Mapper by Daniel Müllner
  • Installation and the list of dependencies:
  • http://danifold.net/mapper/installation/
  • The website also contains the Mapper documentation
  • Nice GUI (show)
  • More complex
slide-33
SLIDE 33

Other Mapper Software

  • R package “TDAmapper”
  • A walkthrough and a tutorial by Frederic Chazal and Bertrand Michel at
  • http://www.lsta.upmc.fr/michelb/Enseignements/TDA/Mapper_solutions.html
  • Familiar R environment
  • Visualizations are somewhat limited (show)
slide-34
SLIDE 34

References

1. Fasy, Brittany Terese, Jisu Kim, Fabrizio Lecci, and Clément Maria. "Introduction to the R package TDA." arXiv preprint arXiv:1411.1830 (2014).

2. Kim, Jisu. "Tutorial on the R package TDA."

3. Daniel Müllner’s Mapper http://danifold.net/mapper/installation/

4. TDAmapper in R http://www.lsta.upmc.fr/michelb/Enseignements/TDA/Mapper_solutions.html

5. Python Mapper by MLWave https://github.com/MLWave/kepler-mapper

6. Ghrist, R., 2008. Barcodes: the persistent topology of data. Bulletin of the American Mathematical Society, 45(1), pp.61-75.

slide-35
SLIDE 35

Thank You! Questions?