by Joshua Tan, for Ufora & NYU Capstone, 12/16/2014
topology and data
a library for topological data analysis and manifold learning
what is dimensionality reduction
Given some input space X and a sample set S, dimensionality reduction seeks to find a lower-dimensional manifold M s.t. S ⊂ M ⊂ X.
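A minimal sketch of the definition above, using plain linear PCA as the reduction method (an assumption made for illustration; the sample set, its size, and the variable names are made up). S is a circle sitting in a tilted plane inside X = R^3, and PCA recovers that plane as M, so S ⊂ M ⊂ X. Note that PCA only finds the 2-dimensional plane, not the 1-dimensional circle itself, which is one motivation for nonlinear methods.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample set S: 200 points on a circle lying in a tilted plane inside X = R^3.
theta = rng.uniform(0.0, 2.0 * np.pi, size=200)
circle = np.column_stack([np.cos(theta), np.sin(theta)])   # shape (200, 2)
basis = np.linalg.qr(rng.normal(size=(3, 2)))[0]            # orthonormal 3x2 frame
S = circle @ basis.T                                        # shape (200, 3)

# PCA via the SVD of the centred data.
centred = S - S.mean(axis=0)
_, singular_values, Vt = np.linalg.svd(centred, full_matrices=False)
explained = singular_values**2 / np.sum(singular_values**2)

# Coordinates of S in the 2-dimensional subspace M spanned by the top two components.
embedding = centred @ Vt[:2].T

print("explained variance per component:", np.round(explained, 6))
```

The third component should carry essentially no variance, which is how PCA signals that the data lives in a 2-dimensional subspace.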
❖ Kernel PCA projects up into the feature space, projects
❖ Isomap (i.e. MDS) embeds high-d points to low-d space
❖ Projection pursuit projects to the most “interesting”
❖ DBSCAN, which considers not only distances but some
❖ Like DBSCAN, Mapper is a clustering/dimensionality
❖ Unlike DBSCAN, Mapper is designed to be less
from Nicolau et al. 2011
For filter functions with a higher-dimensional range, e.g. f : X → R^2, the output is no longer just a graph but a higher-dimensional simplicial complex.
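A rough sketch of the Mapper construction for a one-dimensional filter f : X → R, where the output is a graph. This is not the Ufora or Python Mapper implementation; the function names, the single-linkage clustering step, and every parameter value are assumptions chosen to keep the example small. The steps: cover the range of f with overlapping intervals, cluster the preimage of each interval, and connect two clusters whenever they share a point.

```python
import numpy as np
from itertools import combinations

def connected_components(points, eps):
    """Single-linkage clusters: link points closer than eps, return a label per point."""
    n = len(points)
    labels = -np.ones(n, dtype=int)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero((dist[i] < eps) & (labels == -1)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels

def mapper_graph(points, filter_values, n_intervals, overlap, eps):
    """Return (nodes, edges): nodes are sets of point indices, edges join overlapping nodes."""
    lo, hi = filter_values.min(), filter_values.max()
    length = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        # Interval k, widened on both sides so that neighbouring intervals overlap.
        a = lo + k * length - overlap * length
        b = lo + (k + 1) * length + overlap * length
        idx = np.flatnonzero((filter_values >= a) & (filter_values <= b))
        if idx.size == 0:
            continue
        labels = connected_components(points[idx], eps)
        for c in range(labels.max() + 1):
            nodes.append(frozenset(idx[labels == c].tolist()))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]}
    return nodes, edges

# Example: points on a circle, filtered by their x-coordinate.
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta)])
nodes, edges = mapper_graph(X, X[:, 0], n_intervals=6, overlap=0.3, eps=0.1)
print(len(nodes), "nodes,", len(edges), "edges")
```

For the circle, the resulting graph is itself a single loop, so the graph keeps the one feature that matters about the shape. With a filter into R^2 the cover elements can overlap three or more at a time, and those higher overlaps are what produce triangles and higher simplices.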
figures borrowed from Michael Lesnick, on IAS eNews
❖ Persistent homology is a technique—read, a technical tool
❖ In some sense, the global counterpart to Mapper
❖ Take your point cloud S and turn it into a nested sequence of
simplicial complexes, a.k.a. a filtration.
computing the homology of a filtered d-dimensional simplicial complex K, assuming we evaluate the homology over a field.
❖ This returns a “persistence barcode”.
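A small sketch of the idea for the simplest case only: 0-dimensional homology (connected components) of a Vietoris-Rips filtration, computed with union-find. This is a simplification and not the general algorithm from the references; the function name and the example points are made up. Every point is born at scale 0, and a bar ends at the pairwise distance at which its component merges into another.

```python
import numpy as np

def h0_barcode(points):
    """0-dimensional persistence bars (birth, death) of the Vietoris-Rips filtration."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Edges enter the filtration in order of increasing length.
    edges = sorted((dist[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    bars = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, float(d)))   # one component dies at scale d
    bars.append((0.0, float("inf")))       # the last component never dies
    return bars

# Example: two well-separated clusters of three points each.
S = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
bars = h0_barcode(S)
long_bars = [bar for bar in bars if bar[1] > 1.0]
print(len(bars), "bars,", len(long_bars), "of them long")
```

The two long bars correspond to the two clusters; the short bars are the points within each cluster merging almost immediately. Reading off long bars as real features and short bars as noise is the basic way a barcode is interpreted.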
Data from Mumford et al.: 4167 images; randomly sample 5000 3×3-pixel patches from each
8,000,000 points in R^9.
contrast images (those away from the origin). Obtain points on S^7.
density as measured by δ_k (the k-nn distance).
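A sketch of this kind of preprocessing on synthetic stand-in data, not the actual image patches. Several things here are assumptions: the random Gaussian "patches", subtracting the mean intensity before normalising (which is what makes the unit vectors land on a 7-sphere inside R^9), the Euclidean norm as a stand-in for contrast, and the cutoffs 0.8, k = 15 and 0.3, which are hypothetical values and not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for flattened 3x3 patches: 2000 random points in R^9 (not real image data).
patches = rng.normal(size=(2000, 9))

# Remove each patch's mean intensity, then measure contrast as the Euclidean norm.
centred = patches - patches.mean(axis=1, keepdims=True)
contrast = np.linalg.norm(centred, axis=1)

# Keep only high-contrast patches (those far from the origin); 0.8 is a hypothetical cutoff.
keep = contrast >= np.quantile(contrast, 0.8)
high = centred[keep] / contrast[keep, None]    # unit vectors with zero mean: points on S^7

# Density via delta_k, the distance to the k-th nearest neighbour (small delta_k = dense).
k = 15                                         # hypothetical choice of k
dist = np.linalg.norm(high[:, None, :] - high[None, :, :], axis=-1)
delta_k = np.sort(dist, axis=1)[:, k]          # column 0 is the distance to itself
dense = high[delta_k <= np.quantile(delta_k, 0.3)]   # hypothetical: densest 30%

print(high.shape, dense.shape)
```

The densest subset is the point cloud one would then hand to a persistent homology computation.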
❖ Ufora is a data analytics startup based in NYC
❖ For my project, I implemented both the Mapper
❖ https://dev.ufora.com/#/projects/mapper/HEAD/
❖ Carlsson, Gunnar. “Topology and data”.
❖ Zomorodian, Afra. “Computing persistent homology”.
❖ Ghrist, Robert. “Barcodes: the persistent homology of data”.
❖ Singh, Gurjeet. “Topological methods for the analysis of high dimensional data sets and 3D object recognition”.
❖ Müllner, Daniel. Python Mapper at danifold.net/mapper
❖ Blum, Avrim. “Thoughts on clustering”.