
ROLE OF TENSORS IN MACHINE LEARNING



  1. Anima Anandkumar ROLE OF TENSORS IN MACHINE LEARNING

  2. TRINITY OF AI/ML: ALGORITHMS, COMPUTE, DATA

  3. EXAMPLE AI TASK: IMAGE CLASSIFICATION. Example labels: Maple Tree, Villa, Backyard, Plant, Potted Plant, Garden, Swimming Pool, Water

  4. DATA: LABELED IMAGES FOR TRAINING AI ➢ 14 million images and 1000 categories. ➢ Images in the Fish category. ➢ Largest database of labeled images. ➢ Captures variations of fish. Picture credits: Image-net.org, ZDnet.com

  5. MODEL: CONVOLUTIONAL NEURAL NETWORK. Output probabilities: p(cat) = 0.02, p(dog) = 0.85. ➢ Deep learning: many layers give the model large capacity to learn from data. ➢ Inductive bias: prior knowledge about natural images.
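
As a rough illustration (not the network from the slide), here is a minimal PyTorch convolutional classifier that maps an image tensor to class probabilities such as p(dog) and p(cat); the architecture, input size, and class count are invented for the sketch.

    import torch
    import torch.nn as nn

    # Minimal convolutional classifier sketch (illustrative only).
    class TinyCNN(nn.Module):
        def __init__(self, num_classes=1000):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),                                  # 32x32 -> 16x16
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),                                  # 16x16 -> 8x8
            )
            self.classifier = nn.Linear(32 * 8 * 8, num_classes)

        def forward(self, x):
            x = self.features(x)          # stacked convolutional layers
            x = x.flatten(1)              # flatten all but the batch dimension
            return self.classifier(x)     # class scores (logits)

    model = TinyCNN()
    images = torch.randn(4, 3, 32, 32)              # batch of 4 fake RGB images
    probs = torch.softmax(model(images), dim=1)     # e.g. p(dog), p(cat), ...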

  6. COMPUTE INFRASTRUCTURE FOR AI: GPU ➢ More than a billion operations per image. ➢ NVIDIA GPUs enable parallel operations. ➢ Enables large-scale AI. MOORE'S LAW: A SUPERCHARGED LAW

  7. PROGRESS IN TRAINING IMAGENET. Chart: error when making 5 guesses about the image category (top-5 error), 2010 through 2015, compared against human-level performance. Need the Trinity of AI: Data + Algorithms + Compute. Source: Statista, the Statistics Portal.

  8. TENSORS PLAY A CENTRAL ROLE: ALGORITHMS, COMPUTE, DATA

  9. TENSOR: EXTENSION OF MATRIX

  10. WHY TENSORS?

  11. TENSORS FOR DATA ENCODE MULTI-DIMENSIONALITY. Image: 3 dimensions (Width * Height * Channels). Video: 4 dimensions (Width * Height * Channels * Time).
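
A quick NumPy illustration of these shapes (the specific resolution and frame count are arbitrary):

    import numpy as np

    # An RGB image is a 3rd-order tensor; a video is a 4th-order tensor.
    image = np.zeros((1920, 1080, 3))        # width x height x channels
    video = np.zeros((1920, 1080, 3, 120))   # width x height x channels x time
    print(image.ndim, video.ndim)            # 3 4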

  12. INDEXING A TENSOR: notion of a fiber. • Fibers generalize the concept of rows and columns of a matrix. • Obtained by fixing all indices but one.
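
For example, with a small NumPy tensor, fibers are obtained by fixing all indices but one (a sketch, not from the slides):

    import numpy as np

    # Fibers generalize rows and columns: fix all indices but one.
    T = np.arange(24).reshape(2, 3, 4)
    mode1_fiber = T[:, 1, 2]   # fix the 2nd and 3rd index -> length-2 vector
    mode2_fiber = T[0, :, 2]   # fix the 1st and 3rd index -> length-3 vector
    mode3_fiber = T[0, 1, :]   # fix the 1st and 2nd index -> length-4 vector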

  13. INDEXING A TENSOR: notion of a slice. • Slices are obtained by fixing all indices but two. • Useful for building examples by stacking matrices.
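
Continuing the same toy example, slices fix all indices but two, and stacking the frontal slices rebuilds the tensor:

    import numpy as np

    # Slices are matrices obtained by fixing all indices but two.
    T = np.arange(24).reshape(2, 3, 4)
    horizontal_slice = T[0, :, :]   # 3 x 4 matrix
    lateral_slice    = T[:, 1, :]   # 2 x 4 matrix
    frontal_slice    = T[:, :, 2]   # 2 x 3 matrix

    # Stacking all frontal slices along the last mode gives back the tensor.
    T_rebuilt = np.stack([T[:, :, k] for k in range(T.shape[2])], axis=2)
    assert np.array_equal(T, T_rebuilt)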

  14. TENSOR DIAGRAMS: succinct notation. • Represent only variables and indices (dimensions). • Tensors = vertices, modes = edges, order = degree of the vertex.

  15. TENSOR OPERATIONS: THE TENSOR CONTRACTION PRIMITIVE

  16. TENSOR DIAGRAMS: succinct notation. • Contraction on a given dimension: simply link together the indices over which to contract!
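
A small NumPy sketch of the contraction primitive: linking a shared index means summing over it, as np.einsum makes explicit (shapes are arbitrary):

    import numpy as np

    # Contract a 3rd-order tensor with a matrix along the shared index k
    # (a mode-3 product); matrix multiplication is the matrix special case.
    T = np.random.rand(2, 3, 4)
    M = np.random.rand(4, 5)
    C = np.einsum('ijk,kl->ijl', T, M)   # sum over the linked index k
    print(C.shape)                       # (2, 3, 5)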

  17. EXAMPLE: DISCOVERING HIDDEN FACTORS. A matrix of measurements.

  18. EXAMPLE: DISCOVERING HIDDEN FACTORS. Matrix decomposition methods: • Find a low-rank approximation of the matrix. • Each component is a latent factor.
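
For instance, a truncated SVD gives such a low-rank approximation, with each rank-1 component read off as a latent factor (a NumPy sketch with made-up data):

    import numpy as np

    X = np.random.rand(100, 50)            # matrix of measurements (illustrative)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = 5                                  # number of latent factors kept
    X_lowrank = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation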

  19. EXAMPLE: DISCOVERING HIDDEN FACTORS. Adding more dimensions to data through tensors: • Collect more data in another dimension. • Represent it as a tensor. • How do we exploit this additional dimension?

  20. EXAMPLE: DISCOVERING HIDDEN FACTORS. Low-rank approximations of a tensor: • Decompose the tensor into rank-1 components. • Declare each component a hidden factor. • Why is this more powerful than a matrix decomposition?
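
A minimal TensorLy sketch of this low-rank tensor (CP) decomposition, assuming a recent TensorLy version in which parafac returns a (weights, factors) pair:

    import numpy as np
    import tensorly as tl
    from tensorly.decomposition import parafac

    # Decompose a tensor into rank-1 components; each component (one column
    # per factor matrix) is declared a hidden factor.
    T = tl.tensor(np.random.rand(10, 8, 6))
    weights, factors = parafac(T, rank=3)
    print([f.shape for f in factors])   # [(10, 3), (8, 3), (6, 3)]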

  21. MATRIX VS TENSOR DECOMPOSITION. Conditions for a unique decomposition? Matrix decomposition: unique only when the components are orthogonal. Tensor decomposition: unique when the components are linearly independent.

  22. TENSOR DIAGRAMS: notation for the tensor CP decomposition. • Contraction on a given dimension: simply link together the indices over which to contract!

  23. TENSORS FOR HIGHER-ORDER MOMENTS. Why is it more powerful? Pairwise correlations (a matrix) vs. third-order correlations (a tensor).
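
Concretely, the second moment of the data is a matrix of pairwise correlations while the third moment is a 3rd-order tensor of third-order correlations; a NumPy sketch with made-up data:

    import numpy as np

    X = np.random.rand(1000, 20)                       # 1000 samples, 20 features
    M2 = np.einsum('ni,nj->ij', X, X) / len(X)         # second moment: pairwise correlations
    M3 = np.einsum('ni,nj,nk->ijk', X, X, X) / len(X)  # third-order moment tensor
    print(M2.shape, M3.shape)                          # (20, 20) (20, 20, 20)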

  24. PRINCIPAL COMPONENT ANALYSIS (PCA): low-rank approximation of the covariance matrix. • Problem: find the best rank-k projection of (centered) data. • Solution: top eigencomponents of the covariance matrix. • Limitation: uses only the first two moments (a Gaussian approximation), but data tends to be far from Gaussian.
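
A minimal NumPy sketch of the PCA recipe described above (random data stands in for real measurements):

    import numpy as np

    X = np.random.rand(500, 20)
    Xc = X - X.mean(axis=0)                  # center the data
    cov = Xc.T @ Xc / (len(Xc) - 1)          # covariance matrix (second moment)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    k = 3
    top_k = eigvecs[:, -k:]                  # top-k eigencomponents
    X_proj = Xc @ top_k                      # best rank-k projection of the data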

  25. UNSUPERVISED LEARNING: TOPIC MODELS THROUGH TENSORS. Example topics: Justice, Education, Sports.

  26. UNSUPERVISED LEARNING: TOPIC MODELS THROUGH TENSORS

  27. TENSORS FOR MODELING: TOPIC DETECTION IN TEXT. Co-occurrence of word triplets (figure shows Topic 1 and Topic 2).
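
As a toy sketch (the vocabulary and documents are invented here), the third-order statistic in question is a tensor of word-triplet co-occurrences:

    import numpy as np
    from itertools import permutations

    vocab = {'ball': 0, 'game': 1, 'court': 2, 'vote': 3, 'law': 4}
    docs = [['ball', 'game', 'court'],
            ['vote', 'law', 'court'],
            ['ball', 'game', 'vote']]

    V = len(vocab)
    M3 = np.zeros((V, V, V))                 # word-triplet co-occurrence tensor
    for doc in docs:
        ids = [vocab[w] for w in doc]
        for i, j, k in permutations(ids, 3): # every ordered triplet in the doc
            M3[i, j, k] += 1
    M3 /= M3.sum()                           # empirical triplet probabilities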

  28. WHY TENSORS? Statistical reasons: • Incorporate higher-order relationships in data. • Discover hidden topics (not possible with matrix methods). Computational reasons: • Tensor algebra is parallelizable, like linear algebra. • Faster than other algorithms for LDA. • Flexible: training and inference are decoupled. • Guaranteed in theory to converge to the global optimum. A. Anandkumar et al., Tensor Decompositions for Learning Latent Variable Models, JMLR 2014.

  29. TENSOR-BASED TOPIC MODELING IS FASTER. Charts: training time in minutes vs. number of topics for the spectral (tensor) method and Mallet, on NYTimes (300,000 documents) and PubMed (8 million documents); the tensor method is 22x and 12x faster on average on the two benchmarks. • Mallet is an open-source framework for topic modeling. • Benchmarked on the AWS SageMaker platform. • Built into the AWS Comprehend NLP service.

  30. TENSOR OPERATIONS: THE TENSOR CONTRACTION PRIMITIVE

  31. TENSORS FOR MODELS: STANDARD CNNs USE LINEAR ALGEBRA

  32. TENSORS FOR MODELS: TENSORIZED NEURAL NETWORKS. Jean Kossaifi, Zack Chase Lipton, Aran Khanna, Tommaso Furlanello, Anima Anandkumar. Jupyter notebooks: https://github.com/JeanKossaifi/tensorly-notebooks

  33. SPACE SAVING IN DEEP TENSORIZED NETWORKS

  34. TUCKER DECOMPOSITION: generalizing the tensor CP decomposition.
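
A short TensorLy sketch of a Tucker decomposition (a small core tensor plus one factor matrix per mode; CP is the special case of a diagonal core), assuming a recent TensorLy version:

    import numpy as np
    import tensorly as tl
    from tensorly.decomposition import tucker

    T = tl.tensor(np.random.rand(10, 8, 6))
    core, factors = tucker(T, rank=(4, 3, 2))
    print(core.shape, [f.shape for f in factors])   # (4, 3, 2) [(10, 4), (8, 3), (6, 2)]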

  35. TENSOR DIAGRAMS: notation for the Tucker decomposition. • Contraction on a given dimension: simply link together the indices over which to contract!

  36. TENSORS FOR LONG-TERM FORECASTING. Difficulties in long-term forecasting: • Long-term dependencies • Higher-order correlations • Error propagation

  37. RNNS: FIRST-ORDER MARKOV MODELS. Input $x_t$, hidden state $h_t$, output $y_t$: $h_t = f(x_t, h_{t-1}; \theta)$, $y_t = g(h_t; \theta)$.
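
A plain first-order recurrent cell in PyTorch makes this dependence explicit: the new hidden state is a function only of the current input and the previous hidden state (sizes here are arbitrary):

    import torch
    import torch.nn as nn

    cell = nn.RNNCell(input_size=8, hidden_size=16)
    x = torch.randn(20, 4, 8)        # 20 time steps, batch of 4, 8 input features
    h = torch.zeros(4, 16)           # initial hidden state
    outputs = []
    for t in range(x.shape[0]):
        h = cell(x[t], h)            # h_t = tanh(W x_t + U h_{t-1} + b)
        outputs.append(h)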

  38. TENSOR-TRAIN RNNs AND LSTMs: seq2seq architecture with TT-LSTM cells.

  39. TENSOR DIAGRAMS: notation for the tensor train. • Contraction on a given dimension: simply link together the indices over which to contract!
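
A brief TensorLy sketch of the tensor-train format itself (a chain of third-order cores, one per mode). This assumes TensorLy >= 0.4.4, where the routine is named tensor_train; it only decomposes a tensor and is not the TT-RNN model:

    import numpy as np
    import tensorly as tl
    from tensorly.decomposition import tensor_train

    T = tl.tensor(np.random.rand(10, 8, 6))
    factors = tensor_train(T, rank=[1, 3, 3, 1])   # boundary TT-ranks are 1
    T_hat = tl.tt_to_tensor(factors)               # contract the cores back together
    print(T_hat.shape)                             # (10, 8, 6)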

  40. TENSOR LSTM FOR LONG-TERM FORECASTING. Results on a traffic dataset and a climate dataset. Yisong Yue, Stephan Zheng, Rose Yu.

  41. APPROXIMATION GUARANTEES FOR TT-RNN • Approximation error: the bias of the best model in the function class. • No such guarantees exist for standard RNNs. • Theorem: a TT-RNN with m units approximates the target function with error ε, where m depends on the dimension d, the tensor-train rank r, the window p, the order k of bounded derivatives, and the smoothness C. • The function is easier to approximate if it is smooth and analytic. • Higher rank and a bigger window are more efficient.

  42. TENSORLY: HIGH-LEVEL API FOR TENSOR ALGEBRA • Python programming • User-friendly API • Multiple backends: flexible + scalable • Example notebooks. Jean Kossaifi

  43. TENSORLY WITH PYTORCH BACKEND (the variables tensor, lr and n_iter are assumed to be defined elsewhere):

    import torch
    from torch.autograd import Variable
    import tensorly as tl
    from tensorly import tucker_to_tensor
    from tensorly.random import tucker_tensor

    tl.set_backend('pytorch')                        # set PyTorch backend

    # Random tensor in Tucker form: a core plus one factor matrix per mode
    core, factors = tucker_tensor((5, 5, 5), rank=(3, 3, 3))

    # Attach gradients
    core = Variable(core, requires_grad=True)
    factors = [Variable(f, requires_grad=True) for f in factors]

    # Set optimizer over the core and the factors
    optimiser = torch.optim.Adam([core] + factors, lr=lr)

    for i in range(1, n_iter):
        optimiser.zero_grad()
        rec = tucker_to_tensor(core, factors)        # reconstruct the full tensor
        loss = (rec - tensor).pow(2).sum()           # squared reconstruction error
        for f in factors:
            loss = loss + 0.01 * f.pow(2).sum()      # L2 penalty on the factors
        loss.backward()
        optimiser.step()

  44. TENSORS FOR COMPUTE: THE TENSOR CONTRACTION PRIMITIVE

  45. TENSOR PRIMITIVES? History & Future • 1969 – BLAS Level 1: vector-vector operations (e.g. y = αx + y) • 1972 – BLAS Level 2: matrix-vector operations (e.g. y = Ax) • 1980 – BLAS Level 3: matrix-matrix operations (e.g. C = AB) • Now? – BLAS Level 4: tensor-tensor contractions. Each level offers better hardware utilization but requires more complex data accesses. Kim, Jinsung, et al. "Optimizing Tensor Contractions in CCSD(T) for Efficient Execution on GPUs." (2018).


  47. Thank you
