Tensor Methods: A New Paradigm for Probabilistic Models and Feature Learning (PowerPoint PPT Presentation)


SLIDE 1

Tensor Methods: A New Paradigm for Probabilistic Models and Feature Learning

Anima Anandkumar

U.C. Irvine

SLIDE 2

Learning with Big Data

SLIDE 3

Data vs. Information

SLIDE 4

Data vs. Information

SLIDE 5

Data vs. Information

Missing observations, gross corruptions, outliers.

SLIDE 6

Data vs. Information

Missing observations, gross corruptions, outliers. Learning useful information is like finding a needle in a haystack!

SLIDE 7

Matrices and Tensors as Data Structures

Multi-modal and multi-relational data. Matrices: pairwise relations. Tensors: higher order relations.

Multi-modal data figure from Lise Getoor's slides.

SLIDE 8

Spectral Decomposition of Tensors

Matrix: M2 = Σ_i λ_i u_i ⊗ v_i = λ_1 u_1 ⊗ v_1 + λ_2 u_2 ⊗ v_2 + ...
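The matrix case above is just the singular value decomposition: M2 is a weighted sum of rank-1 outer products. A minimal NumPy sketch (illustrative, not from the slides):

```python
import numpy as np

# Minimal sketch: a matrix written as a weighted sum of rank-1 outer
# products, M2 = sum_i lambda_i * u_i (x) v_i, recovered via the SVD.
rng = np.random.default_rng(0)
M2 = rng.standard_normal((5, 4))

U, s, Vt = np.linalg.svd(M2, full_matrices=False)

# Rebuild M2 one rank-1 term at a time: s[i] * outer(U[:, i], Vt[i])
M2_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s)))

assert np.allclose(M2, M2_rebuilt)
```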

SLIDE 9

Spectral Decomposition of Tensors

Matrix: M2 = Σ_i λ_i u_i ⊗ v_i = λ_1 u_1 ⊗ v_1 + λ_2 u_2 ⊗ v_2 + ...

Tensor: M3 = Σ_i λ_i u_i ⊗ v_i ⊗ w_i = λ_1 u_1 ⊗ v_1 ⊗ w_1 + λ_2 u_2 ⊗ v_2 ⊗ w_2 + ...

We have developed efficient methods to solve tensor decomposition.

SLIDE 10

Strengths of Tensor Methods

Fast and accurate, orders of magnitude faster than previous methods. Embarrassingly parallel and suited for cloud systems, e.g. Spark. Exploit optimized linear algebra libraries. Exploit parallelism of GPU systems.

SLIDE 11

Strengths of Tensor Methods

Fast and accurate, orders of magnitude faster than previous methods. Embarrassingly parallel and suited for cloud systems, e.g. Spark. Exploit optimized linear algebra libraries. Exploit parallelism of GPU systems.

[Plot: running time (secs) vs. number of communities k, on log-log axes, comparing MATLAB Tensor Toolbox (CPU), CULA Standard Interface (GPU), CULA Device Interface (GPU), and Eigen Sparse (CPU).]

SLIDE 12

Outline

1. Introduction
2. Learning Probabilistic Models
3. Experiments
4. Feature Learning with Tensor Methods
5. Conclusion

SLIDE 13

Latent variable models

Incorporate hidden or latent variables. Information structures: relationships between latent variables and observed data.
SLIDE 14

Latent variable models

Incorporate hidden or latent variables. Information structures: relationships between latent variables and observed data.

Basic Approach: mixtures/clusters

Hidden variable is categorical.

SLIDE 15

Latent variable models

Incorporate hidden or latent variables. Information structures: relationships between latent variables and observed data.

Basic Approach: mixtures/clusters

Hidden variable is categorical.

Advanced: Probabilistic models

Hidden variables have more general distributions. Can model mixed membership/hierarchical groups.

[Diagram: observed nodes x1–x5 connected to hidden nodes h1–h3.]
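To make the structure concrete, here is a toy generative sketch of the basic mixture case described above, where a categorical hidden variable h selects the component that generates x (all parameter values are illustrative, not from the talk):

```python
import numpy as np

# Toy latent variable model: a categorical hidden variable h selects
# which Gaussian component generates the observation x.
rng = np.random.default_rng(2)

weights = np.array([0.3, 0.7])               # P(h)
means = np.array([[0.0, 0.0], [5.0, 5.0]])   # E[x | h]

def sample(n):
    h = rng.choice(len(weights), size=n, p=weights)  # hidden state
    x = means[h] + rng.standard_normal((n, 2))       # observation given h
    return h, x

h, x = sample(10000)
# The hidden variable is never observed: learning must recover
# `weights` and `means` from x alone.
assert abs(h.mean() - weights[1]) < 0.02
```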

SLIDE 16

Challenges in Learning LVMs

Computational Challenges

Maximum likelihood is a non-convex optimization problem and is NP-hard in general. In practice, local search approaches such as gradient descent, EM, and Variational Bayes have no consistency guarantees: they can get stuck in bad local optima, suffer poor convergence rates, and are hard to parallelize.

Tensor methods yield guaranteed learning for LVMs

SLIDE 17

Unsupervised Learning of LVMs

GMM, HMM, ICA, Multiview and Topic Models.

[Diagrams: hidden variables h1, …, hk over observed variables x1, …, xd for each model class.]

SLIDE 18

Overall Framework for Unsupervised Learning

[Diagram: Unlabeled Data → Probabilistic admixture models → Tensor Method → Inference, with the tensor shown as a sum of rank-1 terms.]

SLIDE 19

Outline

1. Introduction
2. Learning Probabilistic Models
3. Experiments
4. Feature Learning with Tensor Methods
5. Conclusion

SLIDE 20

Demo for Learning Gaussian Mixtures
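As a stand-in for the demo, a minimal method-of-moments sketch for a toy one-dimensional symmetric mixture: with x = s·μ + z, s = ±1 equally likely, and unit-variance noise z, we have E[x²] = μ² + 1, so μ is recovered from a single moment with no EM and no local optima (toy example, not the talk's actual demo):

```python
import numpy as np

# Method-of-moments for a toy symmetric Gaussian mixture:
#   x = s * mu + z,  s = +/-1 with probability 1/2,  z ~ N(0, 1)
# Then E[x^2] = mu^2 + 1, so mu comes straight from the second moment.
rng = np.random.default_rng(3)

mu = 2.0
n = 100_000
s = rng.choice([-1.0, 1.0], size=n)
x = s * mu + rng.standard_normal(n)

mu_hat = np.sqrt(np.mean(x**2) - 1.0)
assert abs(mu_hat - mu) < 0.05
```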

SLIDE 21

NYTimes Demo

SLIDE 22

Experimental Results on Yelp and DBLP

Lowest error business categories & largest weight businesses

Rank | Category | Business | Stars | Review Counts
1 | Latin American | Salvadoreno Restaurant | 4.0 | 36
2 | Gluten Free | P.F. Chang’s China Bistro | 3.5 | 55
3 | Hobby Shops | Make Meaning | 4.5 | 14
4 | Mass Media | KJZZ 91.5FM | 4.0 | 13
5 | Yoga | Sutra Midtown | 4.5 | 31

SLIDE 23

Experimental Results on Yelp and DBLP

Lowest error business categories & largest weight businesses

Rank | Category | Business | Stars | Review Counts
1 | Latin American | Salvadoreno Restaurant | 4.0 | 36
2 | Gluten Free | P.F. Chang’s China Bistro | 3.5 | 55
3 | Hobby Shops | Make Meaning | 4.5 | 14
4 | Mass Media | KJZZ 91.5FM | 4.0 | 13
5 | Yoga | Sutra Midtown | 4.5 | 31

Top-5 bridging nodes (businesses)

Business | Categories
Four Peaks Brewing | Restaurants, Bars, American, Nightlife, Food, Pubs, Tempe
Pizzeria Bianco | Restaurants, Pizza, Phoenix
FEZ | Restaurants, Bars, American, Nightlife, Mediterranean, Lounges, Phoenix
Matt’s Big Breakfast | Restaurants, Phoenix, Breakfast & Brunch
Cornish Pasty Co | Restaurants, Bars, Nightlife, Pubs, Tempe

SLIDE 24

Experimental Results on Yelp and DBLP

Lowest error business categories & largest weight businesses

Rank | Category | Business | Stars | Review Counts
1 | Latin American | Salvadoreno Restaurant | 4.0 | 36
2 | Gluten Free | P.F. Chang’s China Bistro | 3.5 | 55
3 | Hobby Shops | Make Meaning | 4.5 | 14
4 | Mass Media | KJZZ 91.5FM | 4.0 | 13
5 | Yoga | Sutra Midtown | 4.5 | 31

Top-5 bridging nodes (businesses)

Business | Categories
Four Peaks Brewing | Restaurants, Bars, American, Nightlife, Food, Pubs, Tempe
Pizzeria Bianco | Restaurants, Pizza, Phoenix
FEZ | Restaurants, Bars, American, Nightlife, Mediterranean, Lounges, Phoenix
Matt’s Big Breakfast | Restaurants, Phoenix, Breakfast & Brunch
Cornish Pasty Co | Restaurants, Bars, Nightlife, Pubs, Tempe

Error (E) and Recovery ratio (R)

Dataset | k̂ | Method | Running Time | E | R
DBLP sub (n=1e5) | 500 | ours | 10,157 | 0.139 | 89%
DBLP sub (n=1e5) | 500 | variational | 558,723 | 16.38 | 99%
DBLP (n=1e6) | 100 | ours | 5,407 | 0.105 | 95%

SLIDE 25

Discovering Gene Profiles of Neuronal Cell Types

Learning a mixture of point processes of cells through tensor methods. Components of the mixture are candidates for neuronal cell types.

SLIDE 26

Discovering Gene Profiles of Neuronal Cell Types

Learning a mixture of point processes of cells through tensor methods. Components of the mixture are candidates for neuronal cell types.

SLIDE 27

Hierarchical Tensors for Healthcare Analytics

[Diagram: hierarchical tensor decomposition, with each node's tensor decomposed into a sum of rank-1 terms.]

SLIDE 28

Hierarchical Tensors for Healthcare Analytics

[Diagram: hierarchical tensor decomposition, with each node's tensor decomposed into a sum of rank-1 terms.]

CMS dataset: 1.6 million patients, 15.8 million events. Mining disease inferences from patient records.

SLIDE 29

Outline

1. Introduction
2. Learning Probabilistic Models
3. Experiments
4. Feature Learning with Tensor Methods
5. Conclusion

SLIDE 30

Feature Learning For Efficient Classification

Find good transformations of the input for improved classification.

Figures attributed to Fei-Fei Li, Rob Fergus, Antonio Torralba, et al.

SLIDE 31

Tensor Methods for Training Neural Networks

First order score function (illustrated); the m-th order score function for input pdf p(x) is

S_m(x) := (−1)^m ∇^(m) p(x) / p(x)

Score functions capture local variations in the data.

Algorithm for Training Neural Networks

1. Estimate the score functions using an autoencoder.
2. Decompose the tensor E[y ⊗ S_m(x)], for m ≥ 3, to obtain the weights.
3. Recursively estimate the score function of the autoencoder and repeat.
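The role of the score function can be seen already at first order: for Gaussian input, S_1(x) = x, and Stein's identity gives E[y · S_1(x)] = E[σ′(⟨a, x⟩)] · a for a single neuron y = σ(⟨a, x⟩), so the cross-moment with the score recovers the weight direction. A hedged single-neuron sketch (the talk's method uses m ≥ 3 and tensor decomposition to handle multiple neurons; this is just illustration):

```python
import numpy as np

# Score-function trick at first order (m = 1). For x ~ N(0, I) the score
# is S1(x) = x, and Stein's identity gives
#   E[y * S1(x)] = E[sigma'(<a, x>)] * a   for y = sigma(<a, x>),
# so the cross-moment with the score points along the weight vector a.
rng = np.random.default_rng(4)

d, n = 10, 200_000
a = rng.standard_normal(d)
a /= np.linalg.norm(a)                    # true weight direction

x = rng.standard_normal((n, d))
y = 1.0 / (1.0 + np.exp(-x @ a))          # sigmoid neuron output

m1 = (y[:, None] * x).mean(axis=0)        # empirical E[y * S1(x)]
a_hat = m1 / np.linalg.norm(m1)

assert abs(a_hat @ a) > 0.95              # direction recovered up to sign
```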

SLIDE 32

Tensor Methods for Training Neural Networks

Second order score function (illustrated); the m-th order score function for input pdf p(x) is

S_m(x) := (−1)^m ∇^(m) p(x) / p(x)

Score functions capture local variations in the data.

Algorithm for Training Neural Networks

1. Estimate the score functions using an autoencoder.
2. Decompose the tensor E[y ⊗ S_m(x)], for m ≥ 3, to obtain the weights.
3. Recursively estimate the score function of the autoencoder and repeat.

SLIDE 33

Tensor Methods for Training Neural Networks

Third order score function (illustrated); the m-th order score function for input pdf p(x) is

S_m(x) := (−1)^m ∇^(m) p(x) / p(x)

Score functions capture local variations in the data.

Algorithm for Training Neural Networks

1. Estimate the score functions using an autoencoder.
2. Decompose the tensor E[y ⊗ S_m(x)], for m ≥ 3, to obtain the weights.
3. Recursively estimate the score function of the autoencoder and repeat.

SLIDE 34

Demo: Training Neural Networks

SLIDE 35

Combining Probabilistic Models with Deep Learning

Multi-object Detection in Computer vision

Deep learning is able to extract good features, but not context. Probabilistic models capture contextual information. Hierarchical models + pre-trained deep learning features. State-of-the-art results on Microsoft COCO.

SLIDE 36

Outline

1. Introduction
2. Learning Probabilistic Models
3. Experiments
4. Feature Learning with Tensor Methods
5. Conclusion

SLIDE 37

Conclusion: Tensor Methods for Learning

Tensor Decomposition

Efficient sample and computational complexities. Better performance compared to EM, Variational Bayes, etc.

In practice

Scalable and embarrassingly parallel: handles large datasets. Efficient performance: validated via perplexity or ground truth.

SLIDE 38

My Research Group and Resources

Furong H. Majid J. Hanie S. Niranjan U.N. Forough A. Tejaswi N. Hao L. Yang S.

ML summer school lectures available at http://newport.eecs.uci.edu/anandkumar/MLSS.html