Outline Why Topology? Simplicial Complex Persistent Homology
An Introduction to Topological Data Analysis
Yuan Yao
Department of Mathematics HKUST
April 22, 2020
1 Why Topological Methods?
2 Simplicial Complex for Data Representation
3 Persistent Homology
1 Why Topological Methods?
Methods for Visualizing a Data Geometry
Figure panels: Average Linkage, Complete Linkage, Single Linkage. Source: ... Learning with Applications in R.
1 Start with each data point as its own cluster;
2 Repeatedly merge two “closest” clusters, where notions of
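The two steps above can be sketched in a few lines. This is a minimal illustration, not the implementation behind the slides: it assumes Euclidean distance and takes "closest" to mean one of the three linkages named in the figure panels (single, complete, average). The function names `linkage_distance` and `agglomerate` are hypothetical.

```python
from itertools import combinations
from math import dist


def linkage_distance(a, b, points, linkage):
    """Distance between clusters a and b (lists of point indices)."""
    pair_d = [dist(points[i], points[j]) for i in a for j in b]
    if linkage == "single":      # closest pair
        return min(pair_d)
    if linkage == "complete":    # farthest pair
        return max(pair_d)
    if linkage == "average":     # mean over all pairs
        return sum(pair_d) / len(pair_d)
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerate(points, linkage="single"):
    """Return the merge history [(cluster_a, cluster_b, height), ...]."""
    clusters = [[i] for i in range(len(points))]   # step 1: singletons
    merges = []
    while len(clusters) > 1:                        # step 2: merge closest pair
        a, b = min(
            combinations(clusters, 2),
            key=lambda ab: linkage_distance(ab[0], ab[1], points, linkage),
        )
        height = linkage_distance(a, b, points, linkage)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(sorted(a + b))
        merges.append((a, b, height))
    return merges


points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0)]
single = agglomerate(points, "single")
complete = agglomerate(points, "complete")
print([h for _, _, h in single])
print([h for _, _, h in complete])
```

The merge heights are what a dendrogram draws on its vertical axis, so changing the linkage changes the tree even when the merge order stays the same.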
Figure labels: Pluripotent cells, Neural precursors, Progenitors, Neur...s; scale log2(1+TPM) for Group 1a, Group 1b, Group 2, and Group 3 genes.
2 Simplicial Complex for Data Representation
Simplicial Complex
Nerve, Reeb Graph, and Mapper
Figure labels: points a1, ..., a5 and b1, ..., b5; function h.
$\sum_{x' \in X} d(x, x')^p$
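The sum of p-th powers of distances from x to every other data point can be evaluated directly. The sketch below assumes Euclidean distance for d. The second function, which divides by the number of points and takes a p-th root, is an assumption about how the fragment is normalized (it follows the eccentricity filter commonly used with Mapper) and is not stated in the surviving text. Both function names are hypothetical.

```python
from math import dist


def power_distance_sum(x, X, p=1.0):
    """Sum over x' in X of d(x, x')**p, with d the Euclidean distance."""
    return sum(dist(x, xp) ** p for xp in X)


def eccentricity(x, X, p=1.0):
    """Assumed normalization: (mean of d(x, x')**p) ** (1/p)."""
    return (power_distance_sum(x, X, p) / len(X)) ** (1.0 / p)


X = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
sums = [power_distance_sum(x, X, p=1) for x in X]
print(sums)
```

Points near the center of the data get small values and points near the periphery get large ones, which is why a function of this kind is a natural filter.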
Applications of Mapper Graph
104:712-6]
$K = \begin{pmatrix} \exp(-d_{11}) & \exp(-d_{12}) & \cdots \\ \exp(-d_{21}) & \exp(-d_{22}) & \cdots \\ \vdots & \vdots & \exp(-d_{nn}) \end{pmatrix}$ → row sum → clustering → graph
1 Kernel density estimation $h(x) = \sum_i K(x, x_i)$ with Hamming distance
2 Rank the data by h and divide the data into n overlapped sets
3 Single-linkage clustering on each level set
4 Graphical representation
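The four steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used for the figures: data points are equal-length strings, the kernel is exp(−d) with d the Hamming distance, the sets are consecutive rank windows widened by a fixed `overlap`, and single-linkage clustering is cut at a threshold `eps`. The parameters `n_sets`, `overlap`, and `eps` and all function names are hypothetical.

```python
from math import exp


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(ca != cb for ca, cb in zip(a, b))


def single_linkage_components(idx, d, eps):
    """Single-linkage clusters at threshold eps: connected components of
    the graph on idx with an edge whenever d[i][j] <= eps."""
    remaining, comps = set(idx), []
    while remaining:
        stack = [remaining.pop()]
        comp = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in remaining if d[i][j] <= eps}
            remaining -= near
            comp |= near
            stack.extend(near)
        comps.append(frozenset(comp))
    return comps


def mapper(data, n_sets=3, overlap=1, eps=1):
    n = len(data)
    d = [[hamming(a, b) for b in data] for a in data]
    K = [[exp(-d[i][j]) for j in range(n)] for i in range(n)]
    h = [sum(row) for row in K]                       # step 1: row sums of K
    order = sorted(range(n), key=lambda i: h[i])      # step 2: rank by h
    size = -(-n // n_sets)                            # ceil(n / n_sets)
    nodes = []
    for s in range(n_sets):
        lo = max(0, s * size - overlap)
        hi = min(n, (s + 1) * size + overlap)
        nodes += single_linkage_components(order[lo:hi], d, eps)  # step 3
    # step 4: one node per cluster, one edge per pair of clusters sharing a point
    edges = {(a, b) for a in range(len(nodes)) for b in range(a + 1, len(nodes))
             if nodes[a] & nodes[b]}
    return h, nodes, edges


data = ["0000", "0001", "0011", "0111", "1111"]
h, nodes, edges = mapper(data, n_sets=3, overlap=1, eps=1)
print(nodes, edges)
```

Edges come only from the overlap between neighboring sets, so the overlap width controls how connected the resulting graph is.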
Figure panel labels: Day 2, Day 3, Day 4, Day 5, Day 6.
Figure labels: samples GBM9-1, GBM9-2, GBM9-R1, GBM9-R2, and Germline; alterations EGFR amp, EGFR (A289T, G598V, vIII), EGFR:SEPT14 fusion, ARID2, NF1 (S1078, L2593), PIK3CA F1016C, CDKN2A del, PTEN del; counts 103, 63, 3, 96, 76, 93, 215.
reappeared on the left side. Genomic analysis shows that the initial tumors were seeded by two independent, but related clones. The recurrent tumor was genetically similar to the left one. Jin-Ku Lee et al. Nature Genetics 49.4 (2017): 594-599.
Figure labels: Left, Right, Recurrence; EGFR, mitotic markers; TPM (log scale), average TPM.
Čech, Vietoris-Rips, and Witness Complexes
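Strong witness complex: $W^{*}_{\epsilon} = \{U_I \subseteq V : \exists x \in X, \forall \alpha \in I, d(x, t_\alpha) \le d(x, V) + \epsilon\}$.
Weak witness complex: $W^{*}_{\epsilon} = \{U_I \subseteq V : \exists x \in X, \forall \alpha \in I, d(x, t_\alpha) \le d(x, V_{-I}) + \epsilon\}$.
In both cases, $W^{*}_{\epsilon} \subseteq W^{*}_{\epsilon'}$ if $\epsilon \le \epsilon'$.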
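The two witness conditions can be checked by brute force on a small example. The sketch below assumes Euclidean distance, landmarks V given as a list of points t_α, and d(x, V) defined as the distance from x to its nearest landmark. When I contains every landmark, V_{-I} is empty and its distance is treated as infinite. The flag name `strong` follows the usual strong/weak terminology and is a labeling assumption; the function tests the witness condition simplex by simplex and does not enforce closure under faces.

```python
from itertools import combinations
from math import dist, inf


def witness_complex(X, V, eps, max_dim=1, strong=True):
    """Simplices (tuples of landmark indices) that have a witness in X.

    strong=True  compares against d(x, V), the nearest landmark overall.
    strong=False compares against d(x, V_{-I}), the nearest landmark not in I.
    """
    simplices = []
    for k in range(1, max_dim + 2):            # k vertices = dimension k - 1
        for I in combinations(range(len(V)), k):
            rest = [j for j in range(len(V)) if j not in I]
            for x in X:
                if strong:
                    ref = min(dist(x, v) for v in V)
                else:
                    ref = min((dist(x, V[j]) for j in rest), default=inf)
                if all(dist(x, V[a]) <= ref + eps for a in I):
                    simplices.append(I)        # x witnesses the simplex I
                    break
    return simplices


X = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
V = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
strong_0 = witness_complex(X, V, eps=0.0)
strong_2 = witness_complex(X, V, eps=2.0)
weak_0 = witness_complex(X, V, eps=0.0, strong=False)
print(strong_0, strong_2, weak_0)
```

Raising ε only relaxes the inequality, so every simplex found at a smaller ε is still found at a larger one, which is the nesting property that makes the family a filtration.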
3 Persistent Homology
Betti Numbers
Betti Number at Different Scales
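As a small illustration of Betti numbers changing with scale, the sketch below computes β0 (connected components) and β1 (loops) of a Vietoris-Rips complex at several values of ε. It rests on assumptions not stated in the slides: Euclidean distance, an edge whenever the distance is at most ε (some texts use 2ε), and homology with coefficients in GF(2). The function names are hypothetical.

```python
from itertools import combinations
from math import dist


def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are ints used as bitmasks."""
    pivots = {}                      # highest set bit -> reduced row
    for r in rows:
        while r:
            top = r.bit_length() - 1
            if top not in pivots:
                pivots[top] = r
                break
            r ^= pivots[top]         # clear the top bit and keep reducing
    return len(pivots)


def betti_rips(points, eps):
    """Return (beta_0, beta_1) of the Vietoris-Rips complex at scale eps."""
    n = len(points)
    edges = [e for e in combinations(range(n), 2)
             if dist(points[e[0]], points[e[1]]) <= eps]
    edge_id = {e: k for k, e in enumerate(edges)}
    triangles = [t for t in combinations(range(n), 3)
                 if all(f in edge_id for f in combinations(t, 2))]
    # Boundary of an edge is its two vertices; of a triangle, its three edges.
    d1 = [(1 << i) | (1 << j) for i, j in edges]
    d2 = [sum(1 << edge_id[f] for f in combinations(t, 2)) for t in triangles]
    r1, r2 = rank_gf2(d1), rank_gf2(d2)
    return n - r1, (len(edges) - r1) - r2


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
for eps in (0.5, 1.0, 1.5):
    print(eps, betti_rips(square, eps))
```

On the four corners of a unit square, the loop exists only in the range of scales where the sides are present but the diagonals are not. Features that survive over a long range of ε are the ones persistent homology treats as signal.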
Applications: H1N1 Evolution, Sensor Network Coverage, Natural Image Patches
Figure 5.16 Left: Reassortments in viruses lead to incompatibility between trees. Reticulate network representing the reassortment of three parental strains. The reticulate network results from merging the three parental phylogenetic trees. Source: [100]. Right: Indeed, incompatibility between tree topologies inferred from different genes is a criterion used for the identification of events of genomic material
From Joseph Minhow Chan, Gunnar Carlsson, and Raúl Rabadán, ‘Topology of viral evolution’, Proceedings of the National Academy of Sciences 110.46 (2013): 18566–18571. Reprinted with Permission from Proceedings of the National Academy of Sciences.
Figure labels: hemagglutinin, neuraminidase, matrix, ion channel; genome segments PB2, PB1, PA, HA, NP, NA, M, NS.
Figure 5.14 Influenza A is an antisense single-stranded RNA virus whose genome is composed of eight different segments containing one or two genes per segment. This virus contains an envelope borrowed from the infected cell that expressed two viral proteins, hemagglutinin and neuraminidase. When circulating viruses co-infect the same cell, new viruses can be created that contain segments from both parents. This phenomenon, called reassortment, can lead to dramatic adaptations to novel environments, and it is thought to be one of the contributing factors to human influenza pandemics.
Figure panels A and B. Lineage labels: 2009 Human H1N1, Eurasian swine, Classic swine H1N1, Human H3N2, Avian, North American swine H3N2, North American swine H1N2. Time axis: 1990, 2000, 2009. Isolate labels include A/California/05/2009, A/Mexico/4108/2009, A/swine/Iowa/15/1930, A/South Carolina/1/1918, and A/duck/Alberta/35/76, among others.
virus was reconstructed. It was related to viruses that circulated in pigs potentially since the 1918 H1N1 pandemic. These viruses had diverged since that date into various independent strains, infecting humans and swine. Major reassortments between strains led to new sets of segments from different sources. In 1998, triple reassortant viruses were found infecting pigs in North America. These triple reassortant viruses contained segments that were circulating in swine, humans and birds. Further reassortment of these viruses with other swine viruses created the ancestors of this pandemic. Until this day, it is unclear how, where or when these reassortments happened. Source: [506]. From New England Journal of Medicine, Vladimir Trifonov, Hossein Khiabanian, and Raúl Rabadán, Geographic dependence, surveillance, and origins of the 2009 influenza A (H1N1) virus, 361.2, 115–119.
Figure panels A, B, C. Labels: Betti Number 0; hemagglutinin subtypes H1, H3 to H13, H15 to H17; horizontal axis in Base Pairs (ticks 100 to 600).
gene of influenza A, in this case hemagglutinin, the only significant homology occurs in dimension zero (panel A). The barcode represents a summary of a clustering procedure (panel B), that recapitulates the known phylogenetic relation between different hemagglutinin types (panel C). Source: [100]. From Joseph Minhow Chan, Gunnar Carlsson, and Raúl Rabadán, ‘Topology of viral evolution’, Proceedings of the National Academy of Sciences 110.46 (2013): 18566–18571.
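The link between a dimension-zero barcode and clustering can be made concrete: every bar is born at scale 0 and ends at the scale where its component merges into another, exactly as in single-linkage clustering. The sketch below uses a union-find structure and Euclidean distance on toy points as a stand-in for the genetic distances of the caption; it is an illustration under those assumptions, and `h0_barcode` is a hypothetical name.

```python
from itertools import combinations
from math import dist, inf


def h0_barcode(points):
    """Dimension-zero barcode of the Vietoris-Rips filtration on points."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    # Process pairs in order of increasing distance.
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                 # two components merge: one bar ends here
            parent[ri] = rj
            bars.append((0.0, length))
    bars.append((0.0, inf))          # the last component never dies
    return bars


points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0)]
bars = h0_barcode(points)
print(bars)
```

The finite bar endpoints are the merge heights of the single-linkage dendrogram on the same points, which is the sense in which the barcode summarizes a clustering procedure.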
Figure 5.18 Influenza evolves through mutations and reassortment. When the persistent homology approach is applied to finite metric spaces derived from only one segment, up to small noise, the homology is zero-dimensional, suggesting a tree-like process (left). However, when different segments are put together, the structure is more complex, revealing non-trivial homology at different dimensions (right). 3105 influenza whole genomes were analyzed. Data from isolates collected between 1956 and 2012; all influenza A subtypes.
influenza segments was measured by testing a null model of equal reassortment. Significant cosegregation was identified within PA, PB1, PB2, NP, consistent with the cooperative function of the polymerase complex. Source: [100]. Right: The persistence diagram for whole-genome avian flu sequences revealed bimodal topological structure. Annotating each interval as intra- or inter-subtype clarified a genetic barrier to reassortment at intermediate scales. From Joseph Minhow Chan, Gunnar Carlsson, and Raúl Rabadán, ‘Topology of viral evolution’, Proceedings of the National Academy of Sciences 110.46 (2013): 18566–18571.
Edelsbrunner, Letscher, and Zomorodian (2002) Topological Persistence and Simplification.
Ghrist, R. (2007) Barcodes: the Persistent Topology of Data. Bulletin of AMS, 45(1):61-75.
Edelsbrunner, Harer (2008) Persistent Homology - a survey. Contemporary Mathematics.
Carlsson, G. (2009) Topology and Data. Bulletin of AMS, 46(2):255-308.
Camara et al. (2016) Topological Data Analysis Generates High-Resolution, Genome-wide Maps of Human Recombination. Cell Systems, 3(1):83–94.
Wei, Guowei (2017) Persistent Homology Analysis of Biomolecular Data. SIAM News.
Rabadan, Raul and Blumberg, Andrew J. (2020) Topological Data Analysis for Genomics and Evolution. Cambridge University Press.