Anima Anandkumar
ROLE OF TENSORS IN MACHINE LEARNING
TRINITY OF AI/ML: DATA + ALGORITHMS + COMPUTE
EXAMPLE AI TASK: IMAGE CLASSIFICATION
Example labels: Maple, Tree, Villa, Backyard, Plant, Potted Plant, Garden, Swimming Pool, Water
DATA: LABELED IMAGES FOR TRAINING AI
Picture credits: Image-net.org, ZDnet.com
➢ ImageNet: 14 million images, 1,000 categories.
➢ The largest database of labeled images.
➢ Example: images in the Fish category capture many variations of fish.
MODEL: CONVOLUTIONAL NEURAL NETWORK
Figure: the CNN maps an input image to class probabilities, e.g. p(cat) = 0.02, p(dog) = 0.85.
➢ Deep learning: many layers give the model large capacity to learn from data.
➢ Inductive bias: prior knowledge about natural images.
MOORE’S LAW: A SUPERCHARGED LAW
➢ More than a billion operations per image.
➢ NVIDIA GPUs enable parallel operations.
➢ Enables large-scale AI.
COMPUTE INFRASTRUCTURE FOR AI: GPU
PROGRESS IN TRAINING IMAGENET
Statista: Statistics Portal
Figure: top-5 error (%) on ImageNet by year, 2010-2015; the error in making 5 guesses about the image category dropped below human level by 2015.
Need the Trinity of AI: Data + Algorithms + Compute
TENSORS PLAY A CENTRAL ROLE: DATA + ALGORITHMS + COMPUTE
TENSOR: EXTENSION OF A MATRIX
A scalar is an order-0 tensor, a vector is order-1, a matrix is order-2; a tensor extends this to any number of dimensions (modes).
WHY TENSORS?
TENSORS FOR DATA ENCODE MULTI-DIMENSIONALITY
- Image: 3 dimensions (Width × Height × Channels)
- Video: 4 dimensions (Width × Height × Channels × Time)
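A minimal NumPy sketch of these data tensors (shapes are illustrative, not from the slides):

import numpy as np

image = np.zeros((224, 224, 3))        # Width x Height x Channels: an order-3 tensor
video = np.zeros((224, 224, 3, 100))   # Width x Height x Channels x Time: an order-4 tensor
print(image.ndim, video.ndim)          # 3 and 4 modes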
INDEXING A TENSOR
Notion of a fiber
- Fibers generalize the rows and columns of a matrix
- Obtained by fixing all indices but one (see the sketch below)
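A minimal NumPy sketch of extracting a fiber (the tensor and indices are illustrative):

import numpy as np

T = np.arange(24).reshape(2, 3, 4)   # an order-3 tensor
fiber = T[:, 1, 2]                   # fix all indices but the first: a mode-0 fiber of shape (2,)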
INDEXING A TENSOR
Notion of a slice
- Slices are obtained by fixing all indices but two
- Useful for viewing a tensor as a stack of matrices (see the sketch below)
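A matching sketch for slices, again with an illustrative tensor:

import numpy as np

T = np.arange(24).reshape(2, 3, 4)   # an order-3 tensor
frontal = T[0, :, :]                 # fix all indices but two: a 3x4 matrix
# Stacking the slices T[0], T[1] along the first mode recovers the tensor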
TENSOR DIAGRAMS
Succinct notation
- Represent only variables and indices (dimensions)
- Tensors = vertices, modes = edges, order = degree of the vertex
TENSOR OPERATIONS: THE TENSOR CONTRACTION PRIMITIVE
TENSOR DIAGRAMS
Succinct notation
- Contraction on a given dimension: simply link together the indices over which to contract! See the sketch below.
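A minimal contraction sketch using NumPy's einsum (shapes illustrative); linking the shared index k in the subscripts is the textual analogue of linking edges in the diagram:

import numpy as np

A = np.random.rand(3, 4, 5)
B = np.random.rand(5, 6)
C = np.einsum('ijk,kl->ijl', A, B)   # contract over k: result has shape (3, 4, 6)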
EXAMPLE: DISCOVERING HIDDEN FACTORS
A Matrix of Measurements
EXAMPLE: DISCOVERING HIDDEN FACTORS
Matrix Decomposition Methods
- Find a low-rank approximation of the matrix.
- Each component is a latent factor (see the SVD sketch below).
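A minimal NumPy sketch of this idea via the truncated SVD (matrix and rank are illustrative):

import numpy as np

X = np.random.rand(100, 50)                      # matrix of measurements
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 5
X_k = (U[:, :k] * S[:k]) @ Vt[:k, :]             # best rank-k approximation
# Each rank-1 term S[i] * outer(U[:, i], Vt[i, :]) is one latent factor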
EXAMPLE: DISCOVERING HIDDEN FACTORS
Adding more dimensions to data through tensors
- Collect more data in another dimension.
- Represent it as a tensor.
- How do we exploit this additional dimension?
EXAMPLE: DISCOVERING HIDDEN FACTORS
Low-rank approximations of a tensor
- Decompose the tensor into rank-1 components.
- Declare each component a hidden factor (see the CP sketch below).
- Why is this more powerful than a matrix decomposition?
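A minimal CP-decomposition sketch with TensorLy (tensor and rank are illustrative; recent TensorLy versions return a CP tensor that unpacks into weights and factor matrices):

import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

T = tl.tensor(np.random.rand(10, 10, 10))
weights, factors = parafac(T, rank=3)   # T approximated by a sum of 3 rank-1 components
# Column i of each factor matrix defines the i-th rank-1 component (a hidden factor)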
MATRIX VS TENSOR DECOMPOSITION
Conditions for a unique decomposition?
- Matrix decomposition: unique only when the components are orthogonal.
- Tensor decomposition: unique when the components are merely linearly independent.
TENSOR DIAGRAMS
Notation for Tensor CP decomposition
- Contraction on a given dimension: simply link together the indices over which to contract!
TENSORS FOR HIGHER ORDER MOMENTS: WHY IS IT MORE POWERFUL?
- Pairwise correlations form a matrix; third-order correlations form a tensor (see the sketch below).
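A minimal NumPy sketch of second- vs. third-order moments (data is illustrative):

import numpy as np

X = np.random.rand(1000, 5)                          # n samples in dimension d
M2 = np.einsum('ni,nj->ij', X, X) / len(X)           # pairwise moments: a d x d matrix
M3 = np.einsum('ni,nj,nk->ijk', X, X, X) / len(X)    # third-order moments: a d x d x d tensor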
PRINCIPAL COMPONENT ANALYSIS (PCA)
Low-rank approximation of the covariance matrix
- Problem: find the best rank-k projection of the (centered) data.
- Solution: the top eigencomponents of the covariance matrix (see the sketch below).
- Limitation: uses only the first two moments, i.e. a Gaussian approximation.
- But real data tends to be far from Gaussian.
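A minimal NumPy sketch of this PCA recipe (data and k are illustrative):

import numpy as np

X = np.random.rand(1000, 20)
X = X - X.mean(axis=0)                 # center the data
C = (X.T @ X) / len(X)                 # covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
k = 5
W = eigvecs[:, -k:]                    # top-k eigencomponents
X_proj = X @ W                         # best rank-k projection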
UNSUPERVISED LEARNING: TOPIC MODELS THROUGH TENSORS
Example topics: Justice, Education, Sports
TENSORS FOR MODELING: TOPIC DETECTION IN TEXT
- Co-occurrence of word triplets.
Figure: the word-triplet co-occurrence tensor decomposes into one component per topic (Topic 1, Topic 2, ...).
WHY TENSORS?
Statistical reasons:
- Incorporate higher-order relationships in data.
- Discover hidden topics (not possible with matrix methods).
Computational reasons:
- Tensor algebra is parallelizable, like linear algebra.
- Faster than other algorithms for LDA.
- Flexible: training and inference are decoupled.
- Guaranteed in theory to converge to the global optimum.
A. Anandkumar et al., "Tensor Decompositions for Learning Latent Variable Models," JMLR 2014.
TENSOR-BASED TOPIC MODELING IS FASTER
- Mallet is an open-source framework for topic modeling.
- Benchmarks run on the AWS SageMaker platform.
- Built into the AWS Comprehend NLP service.
Figure: training time (minutes) vs. number of topics, spectral (tensor) method vs. Mallet.
- NYTimes corpus: 300,000 documents. PubMed corpus: 8 million documents.
- The spectral method is 12x-22x faster on average across the two corpora.
TENSOR OPERATIONS: THE TENSOR CONTRACTION PRIMITIVE
TENSORS FOR MODELS: STANDARD CNNs USE LINEAR ALGEBRA
Jean Kossaifi, Zack Chase Lipton, Aran Khanna, Tommaso Furlanello, A. Anandkumar. Jupyter notebooks: https://github.com/JeanKossaifi/tensorly-notebooks
TENSORS FOR MODELS: TENSORIZED NEURAL NETWORKS
SPACE SAVING IN DEEP TENSORIZED NETWORKS
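A back-of-the-envelope sketch of where the savings come from (all shapes hypothetical): Tucker-factorizing a convolution kernel's weight tensor replaces one large tensor with a small core plus factor matrices.

# Hypothetical parameter counts for one convolution kernel (out x in x h x w)
full_params = 256 * 256 * 3 * 3                     # 589,824 weights, uncompressed
core_params = 64 * 64 * 3 * 3                       # Tucker core: 36,864 weights
factor_params = 256 * 64 + 256 * 64                 # factor matrices on the two channel modes
print(full_params / (core_params + factor_params))  # roughly 8.5x fewer parameters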
TUCKER DECOMPOSITION
Generalizing the tensor CP decomposition (see the sketch below)
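A minimal Tucker sketch with TensorLy (tensor and ranks illustrative; recent TensorLy versions return a core and a list of factors, and tucker_to_tensor takes them as a pair):

import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

T = tl.tensor(np.random.rand(16, 16, 16))
core, factors = tucker(T, rank=[4, 4, 4])     # small core contracted with one factor per mode
T_hat = tl.tucker_to_tensor((core, factors))  # reconstruction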
TENSOR DIAGRAMS
Notation for Tucker Decomposition
- Contraction on a given dimension: simply link together the indices over which to contract!
TENSORS FOR LONG-TERM FORECASTING
Difficulties in long-term forecasting:
- Long-term dependencies
- High-order correlations
- Error propagation
RNNS: FIRST-ORDER MARKOV MODELS
Input $x_t$, hidden state $h_t$, output $y_t$: $h_t = f(x_t, h_{t-1}; \theta)$, $y_t = g(h_t; \theta)$
TENSOR-TRAIN RNNS AND LSTMS
- Seq2seq architecture with TT-LSTM cells
TENSOR DIAGRAMS
Notation for Tensor Train
- Contraction on a given dimension: simply link together the indices over which to contract! See the sketch below.
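A minimal tensor-train sketch with TensorLy (tensor and TT-ranks illustrative, assuming a recent TensorLy version):

import numpy as np
import tensorly as tl
from tensorly.decomposition import tensor_train

T = tl.tensor(np.random.rand(8, 8, 8, 8))
tt = tensor_train(T, rank=[1, 4, 4, 4, 1])  # chain of order-3 cores linked by the TT-ranks
T_hat = tl.tt_to_tensor(tt)                 # reconstruction from the cores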
TENSOR LSTM FOR LONG-TERM FORECASTING
Figure: forecasting results on a climate dataset and a traffic dataset.
Rose Yu, Stephan Zheng, Yisong Yue
APPROXIMATION GUARANTEES FOR TT-RNN
Theorem: a TT-RNN with m units approximates the target function with error ε, where m depends on:
- the input dimension d, the tensor-train rank r, and the window size p;
- the smoothness of the target: derivatives bounded up to order k, smoothness constant C.
- The approximation error is the bias of the best model in the function class.
- No such guarantees exist for standard RNNs.
- Smooth, analytic functions are easier to approximate.
- Higher rank and a bigger window make the approximation more efficient.
TENSORLY: HIGH-LEVEL API FOR TENSOR ALGEBRA
- Python programming
- User-friendly API
- Multiple backends: flexible + scalable
- Example notebooks
Jean Kossaifi
TENSORLY WITH PYTORCH BACKEND
import torch
from torch.autograd import Variable
import tensorly as tl
from tensorly import tucker_to_tensor
from tensorly.random import tucker_tensor  # older TensorLy API; newer versions use tensorly.random.random_tucker

tl.set_backend('pytorch')  # set PyTorch backend

# Target tensor and hyperparameters (not defined on the slide; values illustrative)
tensor = tl.tensor(torch.rand(5, 5, 5))
lr, n_iter = 1e-2, 1000

# Tucker tensor form: random core and factor matrices
core, factors = tucker_tensor((5, 5, 5), rank=(3, 3, 3))

# Attach gradients
core = Variable(core, requires_grad=True)
factors = [Variable(f, requires_grad=True) for f in factors]

# Set optimizer
optimiser = torch.optim.Adam([core] + factors, lr=lr)

for i in range(1, n_iter):
    optimiser.zero_grad()
    rec = tucker_to_tensor(core, factors)    # reconstruct the full tensor from Tucker form
    loss = (rec - tensor).pow(2).sum()       # squared reconstruction error
    for f in factors:
        loss = loss + 0.01 * f.pow(2).sum()  # L2 penalty on the factors
    loss.backward()
    optimiser.step()
TENSORS FOR COMPUTE: THE TENSOR CONTRACTION PRIMITIVE
TENSOR PRIMITIVES? HISTORY & FUTURE
- 1969 – BLAS Level 1: Vector-Vector
- 1972 – BLAS Level 2: Matrix-Vector
- 1980 – BLAS Level 3: Matrix-Matrix
- Now? – BLAS Level 4: Tensor-Tensor
$y \leftarrow \alpha x + y$ (Level 1); $y \leftarrow \alpha A x + \beta y$ (Level 2); $C \leftarrow \alpha A B + \beta C$ (Level 3); $C_{ijl} \leftarrow \alpha \sum_k A_{ijk} B_{kl} + \beta C_{ijl}$ (Level 4: tensor contraction)
- Better hardware utilization
- More complex data accesses
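A minimal NumPy sketch lining up the BLAS levels with the tensor contraction that a "Level 4" would standardize (all shapes illustrative):

import numpy as np

alpha, beta = 2.0, 0.5
x, y = np.random.rand(4), np.random.rand(4)
A, B, C = np.random.rand(4, 4), np.random.rand(4, 4), np.random.rand(4, 4)
y = alpha * x + y                    # Level 1 (axpy): vector-vector
y = alpha * (A @ x) + beta * y       # Level 2 (gemv): matrix-vector
C = alpha * (A @ B) + beta * C       # Level 3 (gemm): matrix-matrix
T = np.random.rand(4, 4, 4)
C4 = np.einsum('ijk,kl->ijl', T, B)  # "Level 4": tensor-tensor contraction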
Kim, Jinsung, et al. "Optimizing Tensor Contractions in CCSD(T) for Efficient Execution on GPUs." (2018).