SLIDE 1

Efficient Krylov Approximation for Manifold Learning

Shinjae Yoo, Computational Science Initiative

SLIDE 2

Outline

  • Projects at BNL
  • Big data and unsupervised learning
  • Challenges of manifold learning in Big data
  • Diverse Power Iteration Embedding
  • Streaming version


SLIDE 3

Extreme Scale Spatio-Temporal Learning

  • Fusing theory, simulation, experiments, and ML
  • Interplay of simulation, observation and ML

[Photo: Long Island Solar Farm]

SLIDE 4

Analysis on the Wire

  • Selectively and transparently perform generic computations on data while in transit in the network fabric
  • Process streaming data (e.g., imagery) for early decision-making and reduced downstream bandwidth requirements
  • Extract data analytics, perform generic computations, and use distributed computing capabilities
  • Examples: forecasting, deep learning, pattern recognition (e.g., cyber security, automation)

SLIDE 5

Big Data

  • Volume
  • Variety
  • Veracity
  • Velocity

SLIDE 6

Brookhaven National Laboratory

Research facilities: RHIC, NSRL, Computing Facility, Interdisciplinary Energy Science Building, Computational Science Initiative, CFN, NSLS-II, Long Island Solar Farm

Unsupervised Learning Tasks

SLIDE 7

Manifold Learning


SLIDE 8

MapReduce: Not a Complete Solution in 2010

  • Task: find cluster patterns in Doppler radar spectra
  • Data: 1 hr ≈ 130 MB, 1 yr ≈ 1 TB, 2004–2008 ≈ 5 TB
  • MapReduce (K-Means), sketched after this list
    • Map: find the closest centroid for each point
    • Reduce: update the centroids
  • MapReduce (Spectral Clustering)
    • Distributed affinity matrix computation: O(n²)
    • Distributed Lanczos method to compute the EVD
  • Scalability analysis
    • 12 cores (1 node): spectral clustering took one week for one month of data
    • 616 cores (77 nodes): spectral clustering took under two hours for three months (~300 GB)
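A minimal single-machine sketch of the MapReduce K-Means pattern above, with NumPy standing in for the map and reduce phases; the function names and empty-cluster guard are illustrative, not the BNL implementation:

```python
import numpy as np

def kmeans_map(points, centroids):
    """Map phase: assign each point to its closest centroid."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans_reduce(points, assignments, centroids):
    """Reduce phase: recompute each centroid as the mean of its assigned points."""
    new = centroids.copy()
    for j in range(len(centroids)):
        members = points[assignments == j]
        if len(members):              # keep the old centroid if its cluster emptied
            new[j] = members.mean(axis=0)
    return new

def kmeans(points, k, iters=20, seed=0):
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        assignments = kmeans_map(points, centroids)
        centroids = kmeans_reduce(points, assignments, centroids)
    return centroids, assignments
```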

SLIDE 9

Power-iteration-based Method


  • F. Lin, W. Cohen, “Power Iteration Clustering”, (ICML 2010)

Writing the starting vector v in the eigenbasis {e_i} of W, with coefficients a_i and eigenvalues λ₁ ≥ λ₂ ≥ … ≥ λₙ, repeated multiplication gives

$$W^{t} v \;=\; \sum_{i=1}^{n} a_i \lambda_i^{t} e_i \;=\; a_1\lambda_1^{t}\left(e_1 + \frac{a_2}{a_1}\Big(\frac{\lambda_2}{\lambda_1}\Big)^{t} e_2 + \cdots + \frac{a_n}{a_1}\Big(\frac{\lambda_n}{\lambda_1}\Big)^{t} e_n\right),$$

so the iterate is quickly dominated by the leading eigenvectors.
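A minimal sketch of the iteration, assuming a row-normalized affinity matrix W as in Power Iteration Clustering; the iteration count and rescaling are illustrative choices:

```python
import numpy as np

def power_iteration(W, iters=50, seed=0):
    """Repeatedly apply W to a random vector, rescaling each step."""
    rng = np.random.default_rng(seed)
    v = rng.random(W.shape[0])
    v /= np.abs(v).sum()
    for _ in range(iters):
        v = W @ v
        v /= np.abs(v).sum()   # rescale so the iterate neither explodes nor vanishes
    return v                   # a pseudo-eigenvector: a mix of the leading eigenvectors
```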

SLIDE 10

Power-iteration-based Method

  • Limitations
    • Applications with a large number of clusters
    • Limited use of the learned manifold: a single pseudo-eigenvector is not enough for anomaly detection, feature selection, or dimensionality reduction

[Figure: 1st eigenvector vs. PIE-1, PIE-2, PIE-3]

Huang et al. ICDM ‘14 and TKDE ‘16

SLIDE 11

Diverse Power Iteration Embedding (DPIE)

DPIE draws k diverse embeddings from power iteration: each new iterate v_i^t is regressed on the embeddings f'_{1:k-1} found so far, and the residual becomes the next embedding,

$$\alpha^{*} \;=\; \arg\min_{\alpha}\ \big\lVert v_i^{t} - f'_{1:k-1}\,\alpha \big\rVert^{2}, \qquad f'_{k} \;=\; v_i^{t} - f'_{1:k-1}\,\alpha^{*}.$$

Huang et al. ICDM ‘14 and TKDE ‘16

Cost: O(nmT + ne) for n points, m dimensions, T power iterations, and e embeddings, versus O(n³) for a full eigendecomposition.
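A sketch of DPIE under the reading above: several power-iteration runs, each followed by a regression-residual step against the embeddings found so far. Illustrative NumPy, not the authors' reference code:

```python
import numpy as np

def dpie(W, k, iters=30, seed=0):
    """Diverse embeddings via power iteration plus regression residuals."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    F = np.zeros((n, 0))                    # embeddings found so far, f'_{1:k-1}
    for _ in range(k):
        v = rng.random(n)
        for _ in range(iters):              # power iteration: v <- Wv / ||Wv||
            v = W @ v
            v /= np.linalg.norm(v)
        if F.shape[1]:                      # keep the part of v not explained by F
            alpha, *_ = np.linalg.lstsq(F, v, rcond=None)
            v = v - F @ alpha
        F = np.column_stack([F, v / np.linalg.norm(v)])
    return F
```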

SLIDE 12

DPIE: Efficient Space Learning

Space Efficiency: Cosine Similarity. With row-normalized data X, the affinity matrix W and degree matrix D never need to be stored: products with W can be computed as Wv = X(Xᵀv), and the degrees as D = diag(X(Xᵀ1)), where 1 is a constant vector of all 1's and Xᵀ denotes the transpose of X.

Gaussian Similarity Approximation: using the equations listed for cosine similarity, by replacing X with R.

Huang et al. ICDM ‘14 and TKDE ‘16
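A minimal sketch of the implicit products, assuming row-normalized X so that W = XXᵀ; no n-by-n matrix is ever formed:

```python
import numpy as np

def normalize_rows(X):
    """Row-normalize so that W = X @ X.T is the cosine-similarity matrix."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def affinity_matvec(X, v):
    """Wv = X(X^T v), computed without materializing W."""
    return X @ (X.T @ v)

def degrees(X):
    """Diagonal of D as X(X^T 1), again without forming W."""
    return X @ (X.T @ np.ones(X.shape[0]))
```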

SLIDE 13

Diverse Power Iteration Value (DPIV)


Huang et al. TKDE ‘16
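The body of this slide did not survive extraction. As a loudly hypothetical illustration: if each Diverse Power Iteration Value pairs an eigenvalue-like score with a DPIE embedding, a natural candidate is the Rayleigh quotient under W. This is my assumption, not the paper's stated formula:

```python
import numpy as np

def dpiv(W, F):
    """Rayleigh quotients f^T W f / f^T f for each embedding column f of F.

    Hypothetical reading of 'power iteration values': eigenvalue-like
    scores paired with the DPIE embeddings.
    """
    return np.array([f @ (W @ f) / (f @ f) for f in F.T])
```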

SLIDE 14

Diverse Power Iteration Value (DPIV)


Huang et al. TKDE ‘16

SLIDE 15

DPIE: Choice of regression types


Huang et al. TKDE ‘16
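Only the title survives extraction. Assuming "regression types" refers to the fitting step in the residual computation, here is a sketch contrasting ordinary least squares with a ridge-regularized variant; the penalty lam is my illustrative knob, not a value from the talk:

```python
import numpy as np

def residual_ols(F, v):
    """Residual of v after ordinary-least-squares regression on columns of F."""
    alpha, *_ = np.linalg.lstsq(F, v, rcond=None)
    return v - F @ alpha

def residual_ridge(F, v, lam=1e-2):
    """Ridge variant: the penalty stabilizes the fit when F is ill-conditioned."""
    k = F.shape[1]
    alpha = np.linalg.solve(F.T @ F + lam * np.eye(k), F.T @ v)
    return v - F @ alpha
```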

SLIDE 16

DPIE: Orthogonalization


Huang et al. TKDE ‘16
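Again only the title remains; a minimal sketch of one standard choice, modified Gram-Schmidt over the embedding columns, assumed here rather than taken from the slide:

```python
import numpy as np

def orthogonalize(F, eps=1e-12):
    """Modified Gram-Schmidt on the columns of F; drops near-zero columns."""
    Q = []
    for f in F.T:
        for q in Q:
            f = f - (q @ f) * q      # remove the component along q
        norm = np.linalg.norm(f)
        if norm > eps:
            Q.append(f / norm)
    return np.column_stack(Q)
```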

SLIDE 17

Experiment

  • Evaluation Metrics (a minimal sketch of both follows this list)
    • Clustering and Feature Selection: NMI (Normalized Mutual Information)
    • Anomaly Detection: AUC
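A minimal illustration of both metrics using scikit-learn; the labels and scores below are toy placeholders:

```python
from sklearn.metrics import normalized_mutual_info_score, roc_auc_score

# NMI compares a predicted clustering against ground-truth labels.
true_labels = [0, 0, 1, 1, 2, 2]
pred_labels = [0, 0, 1, 2, 2, 2]
print(normalized_mutual_info_score(true_labels, pred_labels))  # NMI in [0, 1]

# AUC ranks anomaly scores against ground-truth anomaly flags.
is_anomaly    = [0, 0, 0, 1, 1]
anomaly_score = [0.1, 0.2, 0.3, 0.8, 0.9]
print(roc_auc_score(is_anomaly, anomaly_score))                # AUC in [0, 1]
```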


SLIDE 18

Experiment: Clustering


SLIDE 19

Experiment: Anomaly Detection


SLIDE 20

Experiment: Feature Selection


SLIDE 21

Summary

  • Clustering: about 4,000 times faster, reaching 95% of the best clustering performance
  • Anomaly Detection: about 5,000 times faster, reaching 103% of the best performance
  • Feature Selection: about 4,000 times faster, with performance comparable to the best algorithms
  • Provides DPIV and orthogonalization for various applications

SLIDE 22

Streaming Approximations

[Figure: a high-dimensional stream feeding anomaly detection, feature selection, and clustering]
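The slide carries only the pipeline figure. As a hypothetical illustration of why the implicit cosine-similarity formulation suits streams: the per-point degree needs only a running sum of the normalized rows, so it can be maintained in O(m) memory per update. This is my sketch, not the talk's streaming algorithm:

```python
import numpy as np

class StreamingDegrees:
    """Maintain cosine-similarity degrees over a stream of points."""

    def __init__(self, dim):
        self.s = np.zeros(dim)       # running sum of normalized rows seen so far

    def update(self, x):
        x = x / np.linalg.norm(x)    # row-normalize the arriving point
        self.s += x
        return x @ self.s            # degree of x w.r.t. all points so far
```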

SLIDE 23

Questions?
