Tensor Methods: A New Paradigm for Probabilistic Models and Feature Learning (PowerPoint PPT Presentation)


SLIDE 1

Tensor Methods: A New Paradigm for Probabilistic Models and Feature Learning

Anima Anandkumar

U.C. Irvine

SLIDE 2

Learning with Big Data

SLIDE 3

Data vs. Information

SLIDE 4

Data vs. Information

SLIDE 5

Data vs. Information

Missing observations, gross corruptions, outliers.

SLIDE 6

Data vs. Information

Missing observations, gross corruptions, outliers. Learning useful information is like finding a needle in a haystack!

SLIDE 7

Matrices and Tensors as Data Structures

Multi-modal and multi-relational data. Matrices: pairwise relations. Tensors: higher order relations.

Multi-modal data figure from Lise Getoor's slides.

SLIDE 8

Spectral Decomposition of Tensors

Matrix: M2 = Σ_i λ_i u_i ⊗ v_i = λ_1 u_1 ⊗ v_1 + λ_2 u_2 ⊗ v_2 + ...
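The matrix case above is just the singular value decomposition: M2 is a weighted sum of rank-1 outer products. A minimal NumPy sketch (illustrative, not from the slides):

```python
import numpy as np

# Minimal sketch: a matrix written as a weighted sum of rank-1 outer
# products, M2 = sum_i lambda_i * u_i (x) v_i, recovered via the SVD.
rng = np.random.default_rng(0)
M2 = rng.standard_normal((5, 4))

U, s, Vt = np.linalg.svd(M2, full_matrices=False)

# Rebuild M2 one rank-1 term at a time: s[i] * outer(U[:, i], Vt[i])
M2_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s)))

assert np.allclose(M2, M2_rebuilt)
```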

SLIDE 9

Spectral Decomposition of Tensors

Matrix: M2 = Σ_i λ_i u_i ⊗ v_i = λ_1 u_1 ⊗ v_1 + λ_2 u_2 ⊗ v_2 + ...

Tensor: M3 = Σ_i λ_i u_i ⊗ v_i ⊗ w_i = λ_1 u_1 ⊗ v_1 ⊗ w_1 + λ_2 u_2 ⊗ v_2 ⊗ w_2 + ...

We have developed efficient methods to solve tensor decomposition.

SLIDE 10

Strengths of Tensor Methods

Fast and accurate, orders of magnitude faster than previous methods. Embarrassingly parallel and suited for cloud systems, e.g. Spark. Exploit optimized linear algebra libraries. Exploit parallelism of GPU systems.

SLIDE 11

Strengths of Tensor Methods

Fast and accurate, orders of magnitude faster than previous methods. Embarrassingly parallel and suited for cloud systems, e.g. Spark. Exploit optimized linear algebra libraries. Exploit parallelism of GPU systems.

[Plot: running time (secs) vs. number of communities k, on log-log axes, comparing MATLAB Tensor Toolbox (CPU), CULA Standard Interface (GPU), CULA Device Interface (GPU), and Eigen Sparse (CPU).]

SLIDE 12

Outline

1. Introduction
2. Learning Probabilistic Models
3. Experiments
4. Feature Learning with Tensor Methods
5. Conclusion

SLIDE 13

Latent variable models

Incorporate hidden or latent variables. Information structures: relationships between latent variables and observed data.
SLIDE 14

Latent variable models

Incorporate hidden or latent variables. Information structures: relationships between latent variables and observed data.

Basic Approach: mixtures/clusters

Hidden variable is categorical.

SLIDE 15

Latent variable models

Incorporate hidden or latent variables. Information structures: relationships between latent variables and observed data.

Basic Approach: mixtures/clusters

Hidden variable is categorical.

Advanced: Probabilistic models

Hidden variables have more general distributions. Can model mixed membership/hierarchical groups.

[Diagram: observed nodes x1–x5 connected to hidden nodes h1–h3.]
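To make the structure concrete, here is a toy generative sketch of the basic mixture case described above, where a categorical hidden variable h selects the component that generates x (all parameter values are illustrative, not from the talk):

```python
import numpy as np

# Toy latent variable model: a categorical hidden variable h selects
# which Gaussian component generates the observation x.
rng = np.random.default_rng(2)

weights = np.array([0.3, 0.7])               # P(h)
means = np.array([[0.0, 0.0], [5.0, 5.0]])   # E[x | h]

def sample(n):
    h = rng.choice(len(weights), size=n, p=weights)  # hidden state
    x = means[h] + rng.standard_normal((n, 2))       # observation given h
    return h, x

h, x = sample(10000)
# The hidden variable is never observed: learning must recover
# `weights` and `means` from x alone.
assert abs(h.mean() - weights[1]) < 0.02
```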

SLIDE 16

Challenges in Learning LVMs

Computational Challenges

Maximum likelihood is a non-convex optimization problem and is NP-hard in general. In practice, local search approaches such as gradient descent, EM, and Variational Bayes have no consistency guarantees: they can get stuck in bad local optima, suffer poor convergence rates, and are hard to parallelize.

Tensor methods yield guaranteed learning for LVMs

SLIDE 17

Unsupervised Learning of LVMs

GMM, HMM, ICA, Multiview and Topic Models.

[Diagrams: hidden variables h1, …, hk over observed variables x1, …, xd for each model class.]

SLIDE 18

Overall Framework for Unsupervised Learning

[Diagram: Unlabeled Data → Probabilistic admixture models → Tensor Method → Inference, with the tensor shown as a sum of rank-1 terms.]

SLIDE 19

Outline

1. Introduction
2. Learning Probabilistic Models
3. Experiments
4. Feature Learning with Tensor Methods
5. Conclusion

SLIDE 20

Demo for Learning Gaussian Mixtures
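As a stand-in for the demo, a minimal method-of-moments sketch for a toy one-dimensional symmetric mixture: with x = s·μ + z, s = ±1 equally likely, and unit-variance noise z, we have E[x²] = μ² + 1, so μ is recovered from a single moment with no EM and no local optima (toy example, not the talk's actual demo):

```python
import numpy as np

# Method-of-moments for a toy symmetric Gaussian mixture:
#   x = s * mu + z,  s = +/-1 with probability 1/2,  z ~ N(0, 1)
# Then E[x^2] = mu^2 + 1, so mu comes straight from the second moment.
rng = np.random.default_rng(3)

mu = 2.0
n = 100_000
s = rng.choice([-1.0, 1.0], size=n)
x = s * mu + rng.standard_normal(n)

mu_hat = np.sqrt(np.mean(x**2) - 1.0)
assert abs(mu_hat - mu) < 0.05
```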

SLIDE 21

NYTimes Demo

SLIDE 22

Experimental Results on Yelp and DBLP

Lowest error business categories & largest weight businesses

Rank | Category | Business | Stars | Review Counts
1 | Latin American | Salvadoreno Restaurant | 4.0 | 36
2 | Gluten Free | P.F. Chang’s China Bistro | 3.5 | 55
3 | Hobby Shops | Make Meaning | 4.5 | 14
4 | Mass Media | KJZZ 91.5FM | 4.0 | 13
5 | Yoga | Sutra Midtown | 4.5 | 31

SLIDE 23

Experimental Results on Yelp and DBLP

Lowest error business categories & largest weight businesses

Rank | Category | Business | Stars | Review Counts
1 | Latin American | Salvadoreno Restaurant | 4.0 | 36
2 | Gluten Free | P.F. Chang’s China Bistro | 3.5 | 55
3 | Hobby Shops | Make Meaning | 4.5 | 14
4 | Mass Media | KJZZ 91.5FM | 4.0 | 13
5 | Yoga | Sutra Midtown | 4.5 | 31

Top-5 bridging nodes (businesses)

Business | Categories
Four Peaks Brewing | Restaurants, Bars, American, Nightlife, Food, Pubs, Tempe
Pizzeria Bianco | Restaurants, Pizza, Phoenix
FEZ | Restaurants, Bars, American, Nightlife, Mediterranean, Lounges, Phoenix
Matt’s Big Breakfast | Restaurants, Phoenix, Breakfast & Brunch
Cornish Pasty Co | Restaurants, Bars, Nightlife, Pubs, Tempe

SLIDE 24

Experimental Results on Yelp and DBLP

Lowest error business categories & largest weight businesses

Rank | Category | Business | Stars | Review Counts
1 | Latin American | Salvadoreno Restaurant | 4.0 | 36
2 | Gluten Free | P.F. Chang’s China Bistro | 3.5 | 55
3 | Hobby Shops | Make Meaning | 4.5 | 14
4 | Mass Media | KJZZ 91.5FM | 4.0 | 13
5 | Yoga | Sutra Midtown | 4.5 | 31

Top-5 bridging nodes (businesses)

Business | Categories
Four Peaks Brewing | Restaurants, Bars, American, Nightlife, Food, Pubs, Tempe
Pizzeria Bianco | Restaurants, Pizza, Phoenix
FEZ | Restaurants, Bars, American, Nightlife, Mediterranean, Lounges, Phoenix
Matt’s Big Breakfast | Restaurants, Phoenix, Breakfast & Brunch
Cornish Pasty Co | Restaurants, Bars, Nightlife, Pubs, Tempe

Error (E) and Recovery ratio (R)

Dataset | k̂ | Method | Running Time | E | R
DBLP sub (n=1e5) | 500 | ours | 10,157 | 0.139 | 89%
DBLP sub (n=1e5) | 500 | variational | 558,723 | 16.38 | 99%
DBLP (n=1e6) | 100 | ours | 5,407 | 0.105 | 95%

SLIDE 25

Discovering Gene Profiles of Neuronal Cell Types

Learning a mixture of point processes of cells through tensor methods. Components of the mixture are candidates for neuronal cell types.

SLIDE 26

Discovering Gene Profiles of Neuronal Cell Types

Learning a mixture of point processes of cells through tensor methods. Components of the mixture are candidates for neuronal cell types.

SLIDE 27

Hierarchical Tensors for Healthcare Analytics

[Diagram: hierarchical tensor decomposition, with each node's tensor decomposed into a sum of rank-1 terms.]

SLIDE 28

Hierarchical Tensors for Healthcare Analytics

[Diagram: hierarchical tensor decomposition, with each node's tensor decomposed into a sum of rank-1 terms.]

CMS dataset: 1.6 million patients, 15.8 million events. Mining disease inferences from patient records.

SLIDE 29

Outline

1. Introduction
2. Learning Probabilistic Models
3. Experiments
4. Feature Learning with Tensor Methods
5. Conclusion

SLIDE 30

Feature Learning For Efficient Classification

Find good transformations of the input for improved classification.

Figures attributed to Fei-Fei Li, Rob Fergus, Antonio Torralba, et al.

SLIDE 31

Tensor Methods for Training Neural Networks

First order score function (illustrated); the m-th order score function for input pdf p(x) is

S_m(x) := (−1)^m ∇^(m) p(x) / p(x)

Score functions capture local variations in the data.

Algorithm for Training Neural Networks

1. Estimate the score functions using an autoencoder.
2. Decompose the tensor E[y ⊗ S_m(x)], for m ≥ 3, to obtain the weights.
3. Recursively estimate the score function of the autoencoder and repeat.
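The role of the score function can be seen already at first order: for Gaussian input, S_1(x) = x, and Stein's identity gives E[y · S_1(x)] = E[σ′(⟨a, x⟩)] · a for a single neuron y = σ(⟨a, x⟩), so the cross-moment with the score recovers the weight direction. A hedged single-neuron sketch (the talk's method uses m ≥ 3 and tensor decomposition to handle multiple neurons; this is just illustration):

```python
import numpy as np

# Score-function trick at first order (m = 1). For x ~ N(0, I) the score
# is S1(x) = x, and Stein's identity gives
#   E[y * S1(x)] = E[sigma'(<a, x>)] * a   for y = sigma(<a, x>),
# so the cross-moment with the score points along the weight vector a.
rng = np.random.default_rng(4)

d, n = 10, 200_000
a = rng.standard_normal(d)
a /= np.linalg.norm(a)                    # true weight direction

x = rng.standard_normal((n, d))
y = 1.0 / (1.0 + np.exp(-x @ a))          # sigmoid neuron output

m1 = (y[:, None] * x).mean(axis=0)        # empirical E[y * S1(x)]
a_hat = m1 / np.linalg.norm(m1)

assert abs(a_hat @ a) > 0.95              # direction recovered up to sign
```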

SLIDE 32

Tensor Methods for Training Neural Networks

Second order score function (illustrated); the m-th order score function for input pdf p(x) is

S_m(x) := (−1)^m ∇^(m) p(x) / p(x)

Score functions capture local variations in the data.

Algorithm for Training Neural Networks

1. Estimate the score functions using an autoencoder.
2. Decompose the tensor E[y ⊗ S_m(x)], for m ≥ 3, to obtain the weights.
3. Recursively estimate the score function of the autoencoder and repeat.

SLIDE 33

Tensor Methods for Training Neural Networks

Third order score function (illustrated); the m-th order score function for input pdf p(x) is

S_m(x) := (−1)^m ∇^(m) p(x) / p(x)

Score functions capture local variations in the data.

Algorithm for Training Neural Networks

1. Estimate the score functions using an autoencoder.
2. Decompose the tensor E[y ⊗ S_m(x)], for m ≥ 3, to obtain the weights.
3. Recursively estimate the score function of the autoencoder and repeat.

SLIDE 34

Demo: Training Neural Networks

SLIDE 35

Combining Probabilistic Models with Deep Learning

Multi-object Detection in Computer vision

Deep learning is able to extract good features, but not context. Probabilistic models capture contextual information. Hierarchical models + pre-trained deep learning features. State-of-the-art results on Microsoft COCO.

SLIDE 36

Outline

1. Introduction
2. Learning Probabilistic Models
3. Experiments
4. Feature Learning with Tensor Methods
5. Conclusion

SLIDE 37

Conclusion: Tensor Methods for Learning

Tensor Decomposition

Efficient sample and computational complexities. Better performance compared to EM, Variational Bayes, etc.

In practice

Scalable and embarrassingly parallel: handles large datasets. Efficient performance: validated via perplexity or ground truth.

SLIDE 38

My Research Group and Resources

Furong H. Majid J. Hanie S. Niranjan U.N. Forough A. Tejaswi N. Hao L. Yang S.

ML summer school lectures available at http://newport.eecs.uci.edu/anandkumar/MLSS.html