SLIDE 1
Tensor Methods: A New Paradigm for Probabilistic Models and Feature Learning
Anima Anandkumar, U.C. Irvine
SLIDE 2
Learning with Big Data
SLIDE 3
Data vs. Information
SLIDE 4
Data vs. Information
SLIDE 5
Data vs. Information
Missing observations, gross corruptions, outliers.
SLIDE 6
Data vs. Information
Missing observations, gross corruptions, outliers. Learning useful information is like finding a needle in a haystack!
SLIDE 7
Matrices and Tensors as Data Structures
Multi-modal and multi-relational data. Matrices represent pairwise relations; tensors represent higher-order relations.
Multi-modal data figure from Lise Getoor's slides.
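As a sketch of this distinction, on a hypothetical toy corpus: a matrix stores pairwise word co-occurrence counts, while a third-order tensor stores triple co-occurrence counts and retains higher-order relations the matrix collapses.

```python
import numpy as np
from itertools import permutations

# Hypothetical toy corpus, purely for illustration.
vocab = ['cat', 'dog', 'runs']
docs = [['cat', 'dog', 'runs'], ['cat', 'runs', 'runs']]
idx = {w: i for i, w in enumerate(vocab)}
d = len(vocab)

M2 = np.zeros((d, d))          # pairwise relations (matrix)
M3 = np.zeros((d, d, d))       # higher-order relations (tensor)
for doc in docs:
    ids = [idx[w] for w in doc]
    for a, b in permutations(ids, 2):      # ordered pairs of word positions
        M2[a, b] += 1
    for a, b, c in permutations(ids, 3):   # ordered triples of word positions
        M3[a, b, c] += 1
```

Both arrays are symmetric by construction; the tensor additionally distinguishes which third word accompanied each pair.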
SLIDE 8
Spectral Decomposition of Tensors
Matrix: M2 = Σ_i λ_i u_i ⊗ v_i = λ_1 u_1 ⊗ v_1 + λ_2 u_2 ⊗ v_2 + ···
SLIDE 9
Spectral Decomposition of Tensors
Matrix: M2 = Σ_i λ_i u_i ⊗ v_i = λ_1 u_1 ⊗ v_1 + λ_2 u_2 ⊗ v_2 + ···
Tensor: M3 = Σ_i λ_i u_i ⊗ v_i ⊗ w_i = λ_1 u_1 ⊗ v_1 ⊗ w_1 + λ_2 u_2 ⊗ v_2 ⊗ w_2 + ···
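A minimal NumPy sketch of this low-rank form, with randomly chosen (hypothetical) factor vectors:

```python
import numpy as np

# Build M3 = sum_i lambda_i * u_i ⊗ v_i ⊗ w_i from rank-1 terms.
rng = np.random.default_rng(0)
d, k = 5, 2
lam = np.array([2.0, 1.0])
U = rng.standard_normal((d, k))
V = rng.standard_normal((d, k))
W = rng.standard_normal((d, k))

# einsum sums over the component index i, leaving a d x d x d tensor.
M3 = np.einsum('i,ai,bi,ci->abc', lam, U, V, W)

# The same tensor built term by term: lam[i] * (u_i ⊗ v_i ⊗ w_i).
manual = sum(lam[i] * np.multiply.outer(np.multiply.outer(U[:, i], V[:, i]), W[:, i])
             for i in range(k))
```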
We have developed efficient methods for computing tensor decompositions.
SLIDE 10
Strengths of Tensor Methods
Fast and accurate: orders of magnitude faster than previous methods. Embarrassingly parallel and suited for cloud systems, e.g. Spark. Exploits optimized linear algebra libraries and the parallelism of GPU systems.
SLIDE 11
Strengths of Tensor Methods
Fast and accurate: orders of magnitude faster than previous methods. Embarrassingly parallel and suited for cloud systems, e.g. Spark. Exploits optimized linear algebra libraries and the parallelism of GPU systems.
[Log-log plot: running time (secs) vs. number of communities k, comparing MATLAB Tensor Toolbox (CPU), CULA Standard Interface (GPU), CULA Device Interface (GPU), and Eigen Sparse (CPU).]
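One reason these methods map well onto optimized linear algebra libraries and GPUs: the core primitive of the tensor power method is a repeated multilinear contraction, essentially batched matrix-vector work. A minimal sketch for a symmetric tensor with orthogonal components (an assumed setting; in general a whitening step reduces to it):

```python
import numpy as np

def tensor_power_iteration(T, n_restarts=10, n_iters=100, rng=None):
    """Find the top eigenpair of a symmetric 3rd-order tensor T
    (assumes orthogonally decomposable components)."""
    rng = rng or np.random.default_rng(0)
    d = T.shape[0]
    best_lam, best_u = -np.inf, None
    for _ in range(n_restarts):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        for _ in range(n_iters):
            u = np.einsum('abc,b,c->a', T, u, u)   # contraction T(I, u, u)
            u /= np.linalg.norm(u)
        lam = np.einsum('abc,a,b,c->', T, u, u, u)  # eigenvalue T(u, u, u)
        if lam > best_lam:
            best_lam, best_u = lam, u
    return best_lam, best_u

# Example: T = 3 * e1⊗e1⊗e1 + 1 * e2⊗e2⊗e2; the top component is e1.
d = 4
T = np.zeros((d, d, d))
T[0, 0, 0] = 3.0
T[1, 1, 1] = 1.0
lam, u = tensor_power_iteration(T)
```

Each iteration is a single dense contraction, which is exactly the kind of operation BLAS-style libraries and GPUs accelerate.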
SLIDE 12
Outline
1. Introduction
2. Learning Probabilistic Models
3. Experiments
4. Feature Learning with Tensor Methods
5. Conclusion
SLIDE 13
Latent variable models
Incorporate hidden or latent variables. Information structures: relationships between latent variables and observed data.
SLIDE 14
Latent variable models
Incorporate hidden or latent variables. Information structures: relationships between latent variables and observed data.
Basic Approach: mixtures/clusters
Hidden variable is categorical.
SLIDE 15
Latent variable models
Incorporate hidden or latent variables. Information structures: relationships between latent variables and observed data.
Basic Approach: mixtures/clusters
Hidden variable is categorical.
Advanced: Probabilistic models
Hidden variables have more general distributions. Can model mixed-membership/hierarchical groups. [Graphical model: hidden nodes h1, h2, h3 connected to observed nodes x1, ..., x5.]
SLIDE 16
Challenges in Learning LVMs
Computational Challenges
Maximum likelihood: non-convex optimization, NP-hard in general. In practice: local search approaches such as gradient descent, EM, and variational Bayes have no consistency guarantees, can get stuck in bad local optima, have poor convergence rates, and are hard to parallelize.
Tensor methods yield guaranteed learning for LVMs
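A concrete instance of why moments give guarantees, sketched with hypothetical parameters: in a single-topic model with three exchangeable views, the third cross-moment is exactly a CP decomposition in the topic parameters, E[x1 ⊗ x2 ⊗ x3] = Σ_i w_i μ_i ⊗ μ_i ⊗ μ_i, so decomposing the empirical moment recovers them.

```python
import numpy as np

# Hypothetical single-topic model: topic h ~ w; each of three views is
# a one-hot word drawn i.i.d. from the topic's word distribution mu[h].
rng = np.random.default_rng(0)
d, k, n = 6, 2, 50000
w = np.array([0.7, 0.3])                   # topic proportions
mu = rng.dirichlet(np.ones(d), size=k)     # per-topic word distributions

h = rng.choice(k, size=n, p=w)
X = [np.zeros((n, d)) for _ in range(3)]
for t in range(k):
    rows = np.where(h == t)[0]
    for v in range(3):
        words = rng.choice(d, size=len(rows), p=mu[t])
        X[v][rows, words] = 1.0            # one-hot encode each view

# Empirical third cross-moment vs. its analytic CP form.
M3_hat = np.einsum('na,nb,nc->abc', X[0], X[1], X[2]) / n
M3 = np.einsum('i,ia,ib,ic->abc', w, mu, mu, mu)
```

The empirical moment concentrates around the CP-structured population moment, which is what makes consistent recovery of (w, mu) possible.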
SLIDE 17
Unsupervised Learning of LVMs
GMM
HMM [Markov chain h1 → h2 → h3 emitting x1, x2, x3]
ICA [hidden sources h1, ..., hk mixed into observations x1, ..., xd]
Multiview and Topic Models
SLIDE 18
Overall Framework for Unsupervised Learning
[Pipeline: Unlabeled Data → Probabilistic admixture models → Tensor decomposition (sum of rank-1 terms) → Inference]
SLIDE 19
Outline
1. Introduction
2. Learning Probabilistic Models
3. Experiments
4. Feature Learning with Tensor Methods
5. Conclusion
SLIDE 20
Demo for Learning Gaussian Mixtures
SLIDE 21
NYTimes Demo
SLIDE 22
Experimental Results on Yelp and DBLP
Lowest error business categories & largest weight businesses
Rank | Category | Business | Stars | Review Counts
1 | Latin American | Salvadoreno Restaurant | 4.0 | 36
2 | Gluten Free | P.F. Chang’s China Bistro | 3.5 | 55
3 | Hobby Shops | Make Meaning | 4.5 | 14
4 | Mass Media | KJZZ 91.5FM | 4.0 | 13
5 | Yoga | Sutra Midtown | 4.5 | 31
SLIDE 23
Experimental Results on Yelp and DBLP
Lowest error business categories & largest weight businesses
Rank | Category | Business | Stars | Review Counts
1 | Latin American | Salvadoreno Restaurant | 4.0 | 36
2 | Gluten Free | P.F. Chang’s China Bistro | 3.5 | 55
3 | Hobby Shops | Make Meaning | 4.5 | 14
4 | Mass Media | KJZZ 91.5FM | 4.0 | 13
5 | Yoga | Sutra Midtown | 4.5 | 31
Top-5 bridging nodes (businesses)
Business | Categories
Four Peaks Brewing | Restaurants, Bars, American, Nightlife, Food, Pubs, Tempe
Pizzeria Bianco | Restaurants, Pizza, Phoenix
FEZ | Restaurants, Bars, American, Nightlife, Mediterranean, Lounges, Phoenix
Matt’s Big Breakfast | Restaurants, Phoenix, Breakfast & Brunch
Cornish Pasty Co | Restaurants, Bars, Nightlife, Pubs, Tempe
SLIDE 24
Experimental Results on Yelp and DBLP
Lowest error business categories & largest weight businesses
Rank | Category | Business | Stars | Review Counts
1 | Latin American | Salvadoreno Restaurant | 4.0 | 36
2 | Gluten Free | P.F. Chang’s China Bistro | 3.5 | 55
3 | Hobby Shops | Make Meaning | 4.5 | 14
4 | Mass Media | KJZZ 91.5FM | 4.0 | 13
5 | Yoga | Sutra Midtown | 4.5 | 31
Top-5 bridging nodes (businesses)
Business | Categories
Four Peaks Brewing | Restaurants, Bars, American, Nightlife, Food, Pubs, Tempe
Pizzeria Bianco | Restaurants, Pizza, Phoenix
FEZ | Restaurants, Bars, American, Nightlife, Mediterranean, Lounges, Phoenix
Matt’s Big Breakfast | Restaurants, Phoenix, Breakfast & Brunch
Cornish Pasty Co | Restaurants, Bars, Nightlife, Pubs, Tempe
Error (E) and Recovery ratio (R)
Dataset | k̂ | Method | Running Time | E | R
DBLP sub (n=1e5) | 500 | ours | 10,157 | 0.139 | 89%
DBLP sub (n=1e5) | 500 | variational | 558,723 | 16.38 | 99%
DBLP (n=1e6) | 100 | ours | 5,407 | 0.105 | 95%
SLIDE 25
Discovering Gene Profiles of Neuronal Cell Types
Learning a mixture of point processes of cells through tensor methods. Components of the mixture are candidates for neuronal cell types.
SLIDE 26
Discovering Gene Profiles of Neuronal Cell Types
Learning a mixture of point processes of cells through tensor methods. Components of the mixture are candidates for neuronal cell types.
SLIDE 27
Hierarchical Tensors for Healthcare Analytics
[Diagram: hierarchical tensor decomposition; tensors at each level of the hierarchy are decomposed into sums of rank-1 terms.]
SLIDE 28
Hierarchical Tensors for Healthcare Analytics
[Diagram: hierarchical tensor decomposition; tensors at each level of the hierarchy are decomposed into sums of rank-1 terms.]
CMS dataset: 1.6 million patients, 15.8 million events. Mining disease inferences from patient records.
SLIDE 29
Outline
1. Introduction
2. Learning Probabilistic Models
3. Experiments
4. Feature Learning with Tensor Methods
5. Conclusion
SLIDE 30
Feature Learning For Efficient Classification
Find good transformations of input for improved classification
Figures attributed to Fei-Fei Li, Rob Fergus, Antonio Torralba, et al.
SLIDE 31
Tensor Methods for Training Neural Networks
First-order score function; in general, the m-th order score function of the input pdf p(x) is S_m(x) := (−1)^m ∇^(m) p(x) / p(x). Score functions capture local variations in the data.
Algorithm for Training Neural Networks
Estimate score functions using an autoencoder. Decompose the tensor E[y ⊗ S_m(x)] for m ≥ 3 to obtain the weights. Recursively estimate the score function of the autoencoder and repeat.
SLIDE 32
Tensor Methods for Training Neural Networks
Second-order score function; in general, the m-th order score function of the input pdf p(x) is S_m(x) := (−1)^m ∇^(m) p(x) / p(x). Score functions capture local variations in the data.
Algorithm for Training Neural Networks
Estimate score functions using an autoencoder. Decompose the tensor E[y ⊗ S_m(x)] for m ≥ 3 to obtain the weights. Recursively estimate the score function of the autoencoder and repeat.
SLIDE 33
Tensor Methods for Training Neural Networks
Third-order score function; in general, the m-th order score function of the input pdf p(x) is S_m(x) := (−1)^m ∇^(m) p(x) / p(x). Score functions capture local variations in the data.
Algorithm for Training Neural Networks
Estimate score functions using an autoencoder. Decompose the tensor E[y ⊗ S_m(x)] for m ≥ 3 to obtain the weights. Recursively estimate the score function of the autoencoder and repeat.
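A sketch of the tensor being decomposed, under the simplifying assumption of standard Gaussian input (the slides instead estimate score functions with an autoencoder for general p(x)). For x ~ N(0, I), S_3(x) is the third Hermite tensor, and by Stein's identity E[y ⊗ S_3(x)] = E[∇³y(x)]; the toy label function y and the weight vector w below are hypothetical.

```python
import numpy as np

def S3(x):
    """Third-order score function of a standard Gaussian:
    the Hermite tensor x⊗x⊗x minus the symmetrized x⊗I terms."""
    d = len(x)
    I = np.eye(d)
    t = np.einsum('a,b,c->abc', x, x, x)
    t -= np.einsum('a,bc->abc', x, I)
    t -= np.einsum('b,ac->abc', x, I)
    t -= np.einsum('c,ab->abc', x, I)
    return t

rng = np.random.default_rng(1)
d, n = 3, 2000
X = rng.standard_normal((n, d))
w = np.array([1.0, -1.0, 0.5])      # hypothetical first-layer weight vector
y = (X @ w) ** 3                    # toy nonlinear label

# Empirical cross-moment E[y * S3(x)]; its CP decomposition would
# recover w. By Stein's identity, T ≈ E[∇³y] = 6 * w⊗w⊗w,
# up to sampling noise.
T = np.mean([y[i] * S3(X[i]) for i in range(n)], axis=0)
```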
SLIDE 34
Demo: Training Neural Networks
SLIDE 35
Combining Probabilistic Models with Deep Learning
Multi-object Detection in Computer vision
Deep learning extracts good features, but not context. Probabilistic models capture contextual information. Hierarchical models + pre-trained deep learning features. State-of-the-art results on Microsoft COCO.
SLIDE 36
Outline
1. Introduction
2. Learning Probabilistic Models
3. Experiments
4. Feature Learning with Tensor Methods
5. Conclusion
SLIDE 37
Conclusion: Tensor Methods for Learning
Tensor Decomposition
Efficient sample and computational complexities. Better performance compared to EM, variational Bayes, etc.
In practice
Scalable and embarrassingly parallel: handles large datasets. Efficient performance, validated via perplexity or against ground truth.
SLIDE 38