topology and data

topology and data: topological data analysis and manifold learning

by Joshua Tan, for Ufora & NYU Capstone, 12/16/2014


  1. by Joshua Tan, for Ufora & NYU Capstone, 12/16/2014. a library for topology and data: topological data analysis and manifold learning

  2. what is… dimensionality reduction
     Given some input space X and a sample set S, dimensionality reduction seeks to find a lower-dimensional manifold M s.t. S ⊂ M ⊂ X.
     Also known as manifold learning.

  3. examples
     ❖ Kernel PCA maps the data up into a feature space, projects down onto the principal components, and ranks them by eigenvalue
     ❖ Isomap (built on MDS) embeds high-d points in a low-d space while preserving a dissimilarity matrix of graph-geodesic distances
     ❖ Projection pursuit projects onto the most “interesting” components according to some objective function
     ❖ DBSCAN, a clustering method, considers not only distances but also “density-reachability” from a cluster
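As an illustration of the kernel PCA bullet, here is a minimal NumPy sketch. The RBF kernel, the `gamma` parameter, and the function name `kernel_pca` are assumptions made for this example; they are not taken from the slides or from the Fora library.

```python
import numpy as np

def kernel_pca(points, n_components, gamma=1.0):
    """Embed `points` (n x d array) using kernel PCA with an RBF kernel (assumed)."""
    # "Project up": the kernel matrix holds inner products in the feature space.
    sq_dist = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-gamma * sq_dist)

    # Center the kernel matrix, i.e. center the data in the feature space.
    n = len(points)
    one = np.full((n, n), 1.0 / n)
    K_centered = K - one @ K - K @ one + one @ K @ one

    # "Rank by eigenvalues": keep the n_components largest.
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # "Project down": coordinates of each sample along the kept components.
    embedding = eigvecs * np.sqrt(np.maximum(eigvals, 0.0))
    return embedding, eigvals

rng = np.random.default_rng(0)
sample = rng.normal(size=(30, 5))
embedding, eigvals = kernel_pca(sample, n_components=2)
```

The embedding has one row per input point and one column per retained component.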

  4. Mapper
     ❖ Like DBSCAN, Mapper is a clustering/dimensionality reduction algorithm based on varying both a distance parameter and a “density” parameter
     ❖ Unlike DBSCAN, Mapper is designed to be less dependent on the choice of parameters

  5. example: breast cancer, from Nicolau et al. 2011

  6. computing Mapper
     1. generate a sample data set as a DataFrame object
     2. compute a 1-d dissimilarity matrix of distances
     3. evaluate the points using a k-nearest-neighbors (kNN) filter function
     4. define a covering of the resulting image
     5. use the pre-image of this covering to define a covering of the original data
     6. from the covering, generate a clustering of the data
     7. visualize the result as a graph
     For more complicated filter functions f : X → R^2, the output is no longer just a graph but a higher-dimensional simplicial complex.
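The steps above can be sketched in Python with NumPy. This is an illustrative sketch, not the Fora implementation: it uses a plain array instead of a DataFrame and a full distance matrix, and it clusters each pre-image by single linkage at a fixed threshold `eps`, which is an assumption since the slides do not name a clustering method. All function and parameter names are hypothetical.

```python
import numpy as np

def pairwise_distances(points):
    """Step 2: full Euclidean distance matrix."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def knn_filter(dist, k):
    """Step 3: distance from each point to its k-th nearest neighbor."""
    return np.sort(dist, axis=1)[:, k]  # column 0 is the point itself

def interval_cover(values, n_intervals, overlap):
    """Step 4: equal-length intervals covering the filter range; consecutive
    intervals share the fraction `overlap` (0 <= overlap < 1) of their length."""
    lo, hi = values.min(), values.max()
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    return [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]

def linkage_clusters(dist, members, eps):
    """Step 6: connected components of the graph joining members closer than eps."""
    unvisited = {int(i) for i in members}
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        component, stack = {seed}, [seed]
        while stack:
            i = stack.pop()
            near = [j for j in unvisited if dist[i, j] <= eps]
            unvisited.difference_update(near)
            component.update(near)
            stack.extend(near)
        clusters.append(frozenset(component))
    return clusters

def mapper(points, k, n_intervals, overlap, eps):
    dist = pairwise_distances(points)
    values = knn_filter(dist, k)
    tol = 1e-9 * (1.0 + values.max() - values.min())  # guard against rounding
    nodes = []
    for a, b in interval_cover(values, n_intervals, overlap):
        # Step 5: pre-image of one cover element, then cluster it.
        members = np.flatnonzero((values >= a - tol) & (values <= b + tol))
        nodes.extend(linkage_clusters(dist, members, eps))
    # Step 7: nodes are clusters; an edge joins two clusters sharing a point.
    edges = {(i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))
             if nodes[i] & nodes[j]}
    return nodes, edges

rng = np.random.default_rng(0)
points = rng.normal(size=(60, 2))  # step 1: a sample data set
nodes, edges = mapper(points, k=3, n_intervals=4, overlap=0.3, eps=0.5)
```

The returned `nodes` and `edges` can be handed to any graph-drawing tool; shared points between clusters arise only from the overlap between neighbouring intervals.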

  7. “connecting” the dots (figures borrowed from Michael Lesnick, on IAS eNews)

  8. persistent homology
     ❖ Persistent homology is a technique (read: a technical tool) for computing the “shape” of data sets
     ❖ In some sense, the global counterpart to Mapper

  9. computing persistent homology
     ❖ Take your point cloud S and turn it into a nested sequence of simplicial complexes, a.k.a. a filtration.
     ❖ Zomorodian and Carlsson (2004) specify a natural algorithm for computing the homology of a filtered d-dimensional simplicial complex K, assuming we evaluate the homology over a field.
     ❖ This returns a “persistent barcode”.
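A small sketch of this pipeline, under stated assumptions: the filtration is a Vietoris-Rips filtration (the slides do not say which construction is used), the field is Z/2, and the barcode is computed with the standard boundary-matrix column reduction, a common formulation that is not necessarily the exact presentation in Zomorodian and Carlsson. The names `rips_filtration` and `persistence_barcode` are hypothetical.

```python
import math
from itertools import combinations

def rips_filtration(points, max_dim=2):
    """Vietoris-Rips filtration: a simplex enters at its largest pairwise distance."""
    simplices = []
    for dim in range(max_dim + 1):
        for simplex in combinations(range(len(points)), dim + 1):
            value = max((math.dist(points[a], points[b])
                         for a, b in combinations(simplex, 2)), default=0.0)
            simplices.append((value, simplex))
    # Sorting by (value, dimension) places every face before its cofaces.
    simplices.sort(key=lambda vs: (vs[0], len(vs[1])))
    return simplices

def persistence_barcode(filtration):
    """Return bars (dimension, birth, death) over Z/2.

    `filtration` is a list of (value, simplex) pairs in which every face
    appears before its cofaces; each simplex is a tuple of vertex ids.
    """
    index = {tuple(sorted(s)): i for i, (_, s) in enumerate(filtration)}
    # Boundary of each simplex as a set of row indices (Z/2 coefficients).
    columns = []
    for _, s in filtration:
        s = tuple(sorted(s))
        faces = combinations(s, len(s) - 1) if len(s) > 1 else []
        columns.append({index[f] for f in faces})

    pivot_owner = {}  # lowest row index -> column that owns it
    bars = []
    for j, col in enumerate(columns):
        # Add earlier reduced columns (mod 2) until the lowest entry is new.
        while col and max(col) in pivot_owner:
            col ^= columns[pivot_owner[max(col)]]
        if col:  # simplex j kills the class born at simplex max(col)
            i = max(col)
            pivot_owner[i] = j
            bars.append((len(filtration[i][1]) - 1, filtration[i][0], filtration[j][0]))

    # Simplices that create a class which is never killed give infinite bars.
    paired = set(pivot_owner) | set(pivot_owner.values())
    for i, (value, s) in enumerate(filtration):
        if i not in paired:
            bars.append((len(s) - 1, value, math.inf))
    return bars

# Hand-built filtration: a triangle whose edges appear one by one, then is filled in.
triangle = [(0.0, (0,)), (0.0, (1,)), (0.0, (2,)),
            (1.0, (0, 1)), (2.0, (1, 2)), (3.0, (0, 2)),
            (4.0, (0, 1, 2))]
triangle_bars = sorted(persistence_barcode(triangle))
```

Because `max_dim` truncates the Rips complex, classes in the top dimension may show up as infinite bars simply because the simplices that would kill them were never added.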

  10. example: natural image statistics
      Data from Mumford et al.: 4167 images; randomly sample 5000 patches of 3 pixels by 3 pixels from each image. Take the ones with highest contrast to obtain 8,000,000 points in R^9.
      Normalize w.r.t. mean intensity, keep the high-contrast patches (those away from the origin), and project them onto the unit sphere. Obtain a set M of points on S^7.
      M[k,T] is the subset of M in the upper T percent of density as measured by δ_k (the k-nn distance).
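The preprocessing and the density cut M[k,T] can be sketched as follows. This is an illustration on random stand-in data, not the original pipeline: contrast is measured here with the Euclidean norm of the mean-centered patch, and the `keep_fraction` cutoff is a hypothetical parameter, since the slide only says "highest contrast". Function names are likewise assumptions.

```python
import numpy as np

def normalize_patches(patches, keep_fraction=0.2):
    """Mean-center 3x3 patches (rows of length 9), keep the highest-contrast
    ones, and scale them to unit norm so that they lie on a 7-sphere."""
    centered = patches - patches.mean(axis=1, keepdims=True)
    contrast = np.linalg.norm(centered, axis=1)  # assumed contrast measure
    n_keep = max(1, int(round(keep_fraction * len(patches))))
    top = np.argsort(contrast)[-n_keep:]
    return centered[top] / contrast[top, None]

def densest_subset(M, k, T):
    """M[k,T]: the T percent of points with the smallest k-nn distance delta_k,
    i.e. the points lying in the densest regions."""
    diff = M[:, None, :] - M[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    delta_k = np.sort(dist, axis=1)[:, k]  # column 0 is the point itself
    n_keep = max(1, int(round(len(M) * T / 100)))
    return M[np.argsort(delta_k)[:n_keep]]

rng = np.random.default_rng(0)
patches = rng.random(size=(200, 9))  # stand-in for real image patches
M = normalize_patches(patches)
M_dense = densest_subset(M, k=5, T=25)
```

A small δ_k means a point's k-th nearest neighbor is close, so "upper T percent of density" corresponds to the smallest δ_k values. The full distance matrix used here would not scale to millions of points; a spatial index would be needed for data of that size.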

  11. Ufora
      ❖ Ufora is a data analytics startup based in NYC
      ❖ For my project, I implemented both the Mapper algorithm and a persistent homology library in their proprietary language, Fora
      ❖ https://dev.ufora.com/#/projects/mapper/HEAD/mapper

  12. future directions

  13. bibliography
      ❖ Carlsson, Gunnar. “Topology and data”.
      ❖ Zomorodian, Afra. “Computing persistent homology”.
      ❖ Ghrist, Robert. “Barcodes: the persistent topology of data”.
      ❖ Singh, Gurjeet. “Topological methods for the analysis of high dimensional data sets and 3D object recognition”.
      ❖ Müllner, Daniel. Python Mapper at danifold.net/mapper
      ❖ Blum, Avrim. “Thoughts on clustering”.
