SLIDE 1

Efficient Krylov Approximation for Manifold Learning

Shinjae Yoo, Computational Science Initiative

SLIDE 2

Outline

  • Projects at BNL
  • Big data and unsupervised learning
  • Challenges of manifold learning in Big data
  • Diverse Power Iteration Embedding
  • Streaming version


SLIDE 3

Extreme Scale Spatio-Temporal Learning

  • Fusing theory, simulation, experiments, and ML
  • Interplay of simulation, observation and ML

[Photo: Long Island Solar Farm]

SLIDE 4

Analysis on the Wire

  • Selectively and transparently perform generic computations on data while in transit in the network fabric
  • Process streaming data (e.g., imagery) for early decision-making and reduced downstream bandwidth requirements
  • Extract data analytics, perform generic computations, and use distributed computing capabilities
  • Examples: forecasting, deep learning, pattern recognition (e.g., cyber security, automation)

SLIDE 5

Big Data

  • Volume
  • Variety
  • Veracity
  • Velocity

SLIDE 6

Brookhaven National Laboratory

Research facilities: RHIC, NSRL, Computing Facility, Interdisciplinary Energy Science Building, Computational Science Initiative, CFN, NSLS-II, Long Island Solar Farm

Unsupervised Learning Tasks

SLIDE 7

Manifold Learning


SLIDE 8

MapReduce: Not a Complete Solution in 2010

  • Task: find cluster patterns in Doppler radar spectra
  • Data: 1 hr ≈ 130 MB, 1 yr ≈ 1 TB, 2004–2008 ≈ 5 TB
  • MapReduce (K-Means), sketched after this list
    • Map: find the closest centroid for each point
    • Reduce: update the centroids
  • MapReduce (Spectral Clustering)
    • Distributed affinity matrix computation: O(n²)
    • Distributed Lanczos method to compute the EVD
  • Scalability analysis
    • 12 cores (1 node): spectral clustering took one week for one month of data
    • 616 cores (77 nodes): spectral clustering took under two hours for three months (~300 GB)
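A minimal single-machine sketch of the MapReduce K-Means pattern above, with NumPy standing in for the map and reduce phases; the function names and empty-cluster guard are illustrative, not the BNL implementation:

```python
import numpy as np

def kmeans_map(points, centroids):
    """Map phase: assign each point to its closest centroid."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans_reduce(points, assignments, centroids):
    """Reduce phase: recompute each centroid as the mean of its assigned points."""
    new = centroids.copy()
    for j in range(len(centroids)):
        members = points[assignments == j]
        if len(members):              # keep the old centroid if its cluster emptied
            new[j] = members.mean(axis=0)
    return new

def kmeans(points, k, iters=20, seed=0):
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        assignments = kmeans_map(points, centroids)
        centroids = kmeans_reduce(points, assignments, centroids)
    return centroids, assignments
```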

SLIDE 9

Power-iteration-based Method


  • F. Lin, W. Cohen, “Power Iteration Clustering”, (ICML 2010)

Writing the starting vector v in the eigenbasis {e_i} of W, with coefficients a_i and eigenvalues λ₁ ≥ λ₂ ≥ … ≥ λₙ, repeated multiplication gives

$$W^{t} v \;=\; \sum_{i=1}^{n} a_i \lambda_i^{t} e_i \;=\; a_1\lambda_1^{t}\left(e_1 + \frac{a_2}{a_1}\Big(\frac{\lambda_2}{\lambda_1}\Big)^{t} e_2 + \cdots + \frac{a_n}{a_1}\Big(\frac{\lambda_n}{\lambda_1}\Big)^{t} e_n\right),$$

so the iterate is quickly dominated by the leading eigenvectors.
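A minimal sketch of the iteration, assuming a row-normalized affinity matrix W as in Power Iteration Clustering; the iteration count and rescaling are illustrative choices:

```python
import numpy as np

def power_iteration(W, iters=50, seed=0):
    """Repeatedly apply W to a random vector, rescaling each step."""
    rng = np.random.default_rng(seed)
    v = rng.random(W.shape[0])
    v /= np.abs(v).sum()
    for _ in range(iters):
        v = W @ v
        v /= np.abs(v).sum()   # rescale so the iterate neither explodes nor vanishes
    return v                   # a pseudo-eigenvector: a mix of the leading eigenvectors
```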

SLIDE 10

Power-iteration-based Method

  • Limitations
    • Applications with a large number of clusters
    • Limited use of the learned manifold: a single pseudo-eigenvector is not enough for anomaly detection, feature selection, or dimensionality reduction

[Figure: 1st eigenvector vs. PIE-1, PIE-2, PIE-3]

Huang et al. ICDM ‘14 and TKDE ‘16

SLIDE 11

Diverse Power Iteration Embedding (DPIE)

DPIE draws k diverse embeddings from power iteration: each new iterate v_i^t is regressed on the embeddings f'_{1:k-1} found so far, and the residual becomes the next embedding,

$$\alpha^{*} \;=\; \arg\min_{\alpha}\ \big\lVert v_i^{t} - f'_{1:k-1}\,\alpha \big\rVert^{2}, \qquad f'_{k} \;=\; v_i^{t} - f'_{1:k-1}\,\alpha^{*}.$$

Huang et al. ICDM ‘14 and TKDE ‘16

Cost: O(nmT + ne) for n points, m dimensions, T power iterations, and e embeddings, versus O(n³) for a full eigendecomposition.
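A sketch of DPIE under the reading above: several power-iteration runs, each followed by a regression-residual step against the embeddings found so far. Illustrative NumPy, not the authors' reference code:

```python
import numpy as np

def dpie(W, k, iters=30, seed=0):
    """Diverse embeddings via power iteration plus regression residuals."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    F = np.zeros((n, 0))                    # embeddings found so far, f'_{1:k-1}
    for _ in range(k):
        v = rng.random(n)
        for _ in range(iters):              # power iteration: v <- Wv / ||Wv||
            v = W @ v
            v /= np.linalg.norm(v)
        if F.shape[1]:                      # keep the part of v not explained by F
            alpha, *_ = np.linalg.lstsq(F, v, rcond=None)
            v = v - F @ alpha
        F = np.column_stack([F, v / np.linalg.norm(v)])
    return F
```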

SLIDE 12

DPIE: Efficient Space Learning

Space Efficiency: Cosine Similarity. With row-normalized data X, the affinity matrix W and degree matrix D never need to be stored: products with W can be computed as Wv = X(Xᵀv), and the degrees as D = diag(X(Xᵀ1)), where 1 is a constant vector of all 1's and Xᵀ denotes the transpose of X.

Gaussian Similarity Approximation: using the equations listed for cosine similarity, by replacing X with R.

Huang et al. ICDM ‘14 and TKDE ‘16
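A minimal sketch of the implicit products, assuming row-normalized X so that W = XXᵀ; no n-by-n matrix is ever formed:

```python
import numpy as np

def normalize_rows(X):
    """Row-normalize so that W = X @ X.T is the cosine-similarity matrix."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def affinity_matvec(X, v):
    """Wv = X(X^T v), computed without materializing W."""
    return X @ (X.T @ v)

def degrees(X):
    """Diagonal of D as X(X^T 1), again without forming W."""
    return X @ (X.T @ np.ones(X.shape[0]))
```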

SLIDE 13

Diverse Power Iteration Value (DPIV)


Huang et al. TKDE ‘16
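The body of this slide did not survive extraction. As a loudly hypothetical illustration: if each Diverse Power Iteration Value pairs an eigenvalue-like score with a DPIE embedding, a natural candidate is the Rayleigh quotient under W. This is my assumption, not the paper's stated formula:

```python
import numpy as np

def dpiv(W, F):
    """Rayleigh quotients f^T W f / f^T f for each embedding column f of F.

    Hypothetical reading of 'power iteration values': eigenvalue-like
    scores paired with the DPIE embeddings.
    """
    return np.array([f @ (W @ f) / (f @ f) for f in F.T])
```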

SLIDE 14

Diverse Power Iteration Value (DPIV)


Huang et al. TKDE ‘16

SLIDE 15

DPIE: Choice of regression types


Huang et al. TKDE ‘16
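Only the title survives extraction. Assuming "regression types" refers to the fitting step in the residual computation, here is a sketch contrasting ordinary least squares with a ridge-regularized variant; the penalty lam is my illustrative knob, not a value from the talk:

```python
import numpy as np

def residual_ols(F, v):
    """Residual of v after ordinary-least-squares regression on columns of F."""
    alpha, *_ = np.linalg.lstsq(F, v, rcond=None)
    return v - F @ alpha

def residual_ridge(F, v, lam=1e-2):
    """Ridge variant: the penalty stabilizes the fit when F is ill-conditioned."""
    k = F.shape[1]
    alpha = np.linalg.solve(F.T @ F + lam * np.eye(k), F.T @ v)
    return v - F @ alpha
```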

SLIDE 16

DPIE: Orthogonalization


Huang et al. TKDE ‘16
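Again only the title remains; a minimal sketch of one standard choice, modified Gram-Schmidt over the embedding columns, assumed here rather than taken from the slide:

```python
import numpy as np

def orthogonalize(F, eps=1e-12):
    """Modified Gram-Schmidt on the columns of F; drops near-zero columns."""
    Q = []
    for f in F.T:
        for q in Q:
            f = f - (q @ f) * q      # remove the component along q
        norm = np.linalg.norm(f)
        if norm > eps:
            Q.append(f / norm)
    return np.column_stack(Q)
```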

SLIDE 17

Experiment

  • Evaluation Metrics (a minimal sketch of both follows this list)
    • Clustering and Feature Selection: NMI (Normalized Mutual Information)
    • Anomaly Detection: AUC
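A minimal illustration of both metrics using scikit-learn; the labels and scores below are toy placeholders:

```python
from sklearn.metrics import normalized_mutual_info_score, roc_auc_score

# NMI compares a predicted clustering against ground-truth labels.
true_labels = [0, 0, 1, 1, 2, 2]
pred_labels = [0, 0, 1, 2, 2, 2]
print(normalized_mutual_info_score(true_labels, pred_labels))  # NMI in [0, 1]

# AUC ranks anomaly scores against ground-truth anomaly flags.
is_anomaly    = [0, 0, 0, 1, 1]
anomaly_score = [0.1, 0.2, 0.3, 0.8, 0.9]
print(roc_auc_score(is_anomaly, anomaly_score))                # AUC in [0, 1]
```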


SLIDE 18

Experiment: Clustering


SLIDE 19

Experiment: Anomaly Detection


SLIDE 20

Experiment: Feature Selection


SLIDE 21

Summary

  • Clustering: about 4,000 times faster, reaching 95% of the best clustering performance
  • Anomaly Detection: about 5,000 times faster, reaching 103% of the best performance
  • Feature Selection: about 4,000 times faster, with performance comparable to the best algorithms
  • Provides DPIV and orthogonalization for various applications

SLIDE 22

Streaming Approximations

[Figure: a high-dimensional stream feeding anomaly detection, feature selection, and clustering]
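The slide carries only the pipeline figure. As a hypothetical illustration of why the implicit cosine-similarity formulation suits streams: the per-point degree needs only a running sum of the normalized rows, so it can be maintained in O(m) memory per update. This is my sketch, not the talk's streaming algorithm:

```python
import numpy as np

class StreamingDegrees:
    """Maintain cosine-similarity degrees over a stream of points."""

    def __init__(self, dim):
        self.s = np.zeros(dim)       # running sum of normalized rows seen so far

    def update(self, x):
        x = x / np.linalg.norm(x)    # row-normalize the arriving point
        self.s += x
        return x @ self.s            # degree of x w.r.t. all points so far
```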

SLIDE 23

Questions?
