ROLE OF TENSORS IN MACHINE LEARNING


SLIDE 1

Anima Anandkumar

ROLE OF TENSORS IN MACHINE LEARNING

SLIDE 2

TRINITY OF AI/ML: DATA, COMPUTE, ALGORITHMS

SLIDE 3

EXAMPLE AI TASK: IMAGE CLASSIFICATION

[Example image labels: Maple, Tree, Villa, Backyard, Plant, Potted Plant, Garden, Swimming Pool, Water]

SLIDE 4

DATA: LABELED IMAGES FOR TRAINING AI

Picture credits: Image-net.org, ZDnet.com

➢ 14 million images and 1,000 categories.
➢ Largest database of labeled images.
➢ Example: images in the "Fish" category capture variations of fish.

SLIDE 5

MODEL: CONVOLUTIONAL NEURAL NETWORK

[Figure: CNN outputs class probabilities, e.g. p(cat) = 0.02, p(dog) = 0.85]

➢ Deep learning: many layers give the model large capacity to learn from data.
➢ Inductive bias: prior knowledge about natural images.

SLIDE 6

COMPUTE INFRASTRUCTURE FOR AI: GPU

MOORE'S LAW: A SUPERCHARGED LAW

➢ More than a billion operations per image.
➢ NVIDIA GPUs enable parallel operations.
➢ Enables large-scale AI.

SLIDE 7

PROGRESS IN TRAINING IMAGENET

[Chart: error in making 5 guesses about the image category (top-5 error) on ImageNet, 2010-2015, compared to human performance. Source: Statista.]

Need the Trinity of AI: Data + Algorithms + Compute

SLIDE 8

TENSORS PLAY A CENTRAL ROLE: DATA, COMPUTE, ALGORITHMS

SLIDE 9

TENSOR: EXTENSION OF MATRIX

SLIDE 10

WHY TENSORS?

SLIDE 11

TENSORS FOR DATA ENCODE MULTI-DIMENSIONALITY

Image: 3 dimensions (Width × Height × Channels)
Video: 4 dimensions (Width × Height × Channels × Time)
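As a concrete illustration (not from the slides; the sizes below are arbitrary), NumPy array shapes make this correspondence explicit:

    import numpy as np

    # An image is an order-3 tensor: width x height x channels
    image = np.zeros((224, 224, 3))
    # A video adds a time mode, giving an order-4 tensor
    video = np.zeros((224, 224, 3, 120))

    print(image.ndim, image.shape)   # 3 (224, 224, 3)
    print(video.ndim, video.shape)   # 4 (224, 224, 3, 120)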

SLIDE 12

INDEXING A TENSOR

Notion of a fiber

  • Fibers = generalization of the concept of rows and columns for matrices
  • Obtained by fixing all indices but one
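A minimal sketch of fibers in NumPy (the tensor and its shape are illustrative assumptions):

    import numpy as np

    T = np.arange(2 * 3 * 4).reshape(2, 3, 4)   # order-3 tensor of shape (2, 3, 4)

    # Fix all indices but one to obtain a fiber along each mode
    mode0_fiber = T[:, 1, 2]    # vary the first index only  -> length 2
    mode1_fiber = T[0, :, 3]    # vary the second index only -> length 3
    mode2_fiber = T[1, 2, :]    # vary the third index only  -> length 4
    print(mode0_fiber.shape, mode1_fiber.shape, mode2_fiber.shape)   # (2,) (3,) (4,)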
SLIDE 13

INDEXING A TENSOR

Notion of a slice

  • Slices are obtained by fixing all indices but two
  • Useful for making examples by stacking matrices
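A small sketch of slices with the same illustrative NumPy tensor; stacking the frontal slices rebuilds the original tensor:

    import numpy as np

    T = np.arange(2 * 3 * 4).reshape(2, 3, 4)

    # Fix all indices but two to obtain matrix slices
    horizontal = T[0, :, :]   # 3 x 4 matrix
    lateral    = T[:, 1, :]   # 2 x 4 matrix
    frontal    = T[:, :, 2]   # 2 x 3 matrix

    # Stacking all frontal slices along the last mode recovers the tensor
    restacked = np.stack([T[:, :, k] for k in range(T.shape[2])], axis=2)
    assert np.array_equal(restacked, T)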
SLIDE 14

TENSOR DIAGRAMS

Succinct notation

  • Represent only variables and indices (dimensions)
  • Tensors = vertices, modes = edges, order = degree of the vertex
SLIDE 15

TENSOR OPERATIONS: TENSOR CONTRACTION PRIMITIVE

SLIDE 16

TENSOR DIAGRAMS

Succinct notation

  • Contraction on a given dimension: simply link together the indices over which to contract!
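For illustration (an assumption on top of the slide, using NumPy's einsum), linking an index in the diagram corresponds to summing over that index:

    import numpy as np

    A = np.random.rand(4, 5, 6)   # order-3 tensor with modes (i, j, k)
    B = np.random.rand(6, 7)      # matrix with modes (k, l)

    # Contract over the shared (linked) index k
    C = np.einsum('ijk,kl->ijl', A, B)
    print(C.shape)                # (4, 5, 7)

    # Matrix multiplication is the special case with a single linked index
    M, N = np.random.rand(3, 4), np.random.rand(4, 2)
    assert np.allclose(np.einsum('ij,jk->ik', M, N), M @ N)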

SLIDE 17

EXAMPLE: DISCOVERING HIDDEN FACTORS

A Matrix of Measurements

SLIDE 18

EXAMPLE: DISCOVERING HIDDEN FACTORS

Matrix Decomposition Methods

  • Find a low-rank approximation of the matrix.
  • Each component is a latent factor.
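A minimal sketch of this matrix route (illustrative random data, with a truncated SVD as the low-rank method):

    import numpy as np

    X = np.random.rand(100, 50)            # matrix of measurements, e.g. users x items
    U, s, Vt = np.linalg.svd(X, full_matrices=False)

    k = 5                                  # number of latent factors to keep
    X_k = (U[:, :k] * s[:k]) @ Vt[:k, :]   # best rank-k approximation: sum of k rank-1 terms
    factors = [(s[r], U[:, r], Vt[r, :]) for r in range(k)]   # each rank-1 term = one latent factor
    print(np.linalg.norm(X - X_k) / np.linalg.norm(X))        # relative approximation error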

SLIDE 19

EXAMPLE: DISCOVERING HIDDEN FACTORS

Adding more dimensions to data through tensors

  • Collect more data in another dimension.
  • Represent it as a tensor.
  • How do we exploit this additional dimension?

SLIDE 20

EXAMPLE: DISCOVERING HIDDEN FACTORS

Low rank approximations of a tensor

  • Decompose the tensor into rank-1 components.
  • Declare each component as a hidden factor.
  • Why is this more powerful than a matrix decomposition?
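A hedged sketch of this tensor route using TensorLy's CP (PARAFAC) decomposition; the tensor is random and the function names follow recent TensorLy releases (older releases return the factors without the weight vector):

    import numpy as np
    import tensorly as tl
    from tensorly.decomposition import parafac

    T = tl.tensor(np.random.rand(10, 8, 6))

    # Decompose into 3 rank-1 components; each component gives one vector per mode
    # and is declared a hidden factor.
    cp = parafac(T, rank=3)
    weights, factors = cp
    print([f.shape for f in factors])        # [(10, 3), (8, 3), (6, 3)]

    T_hat = tl.cp_to_tensor(cp)              # reconstruct the rank-3 approximation
    print(tl.norm(T - T_hat) / tl.norm(T))   # relative error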

SLIDE 21

MATRIX VS TENSOR DECOMPOSITION

Conditions for unique decomposition?

Tensor decomposition: unique when components are linearly independent.
Matrix decomposition: unique only when components are orthogonal.
SLIDE 22

TENSOR DIAGRAMS

Notation for Tensor CP decomposition

  • Contraction on a given dimension: simply link together the indices over which to contract!

SLIDE 23

WHY IS IT MORE POWERFUL? TENSORS FOR HIGHER-ORDER MOMENTS

[Figure: pairwise correlations (a matrix) vs. third-order correlations (a tensor)]

SLIDE 24

PRINCIPAL COMPONENT ANALYSIS (PCA)

Low-rank approximation of Covariance Matrix

  • Problem: Find best rank-k projection of (centered) data
  • Solution: Top eigenvectors of the covariance matrix
  • Limitation: Uses first two moments. Gaussian approx.
  • But data tends to be far from Gaussian.
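A minimal sketch of PCA as described above (random data, purely illustrative):

    import numpy as np

    X = np.random.rand(500, 20)              # n samples x d features
    Xc = X - X.mean(axis=0)                  # center the data
    cov = Xc.T @ Xc / (len(X) - 1)           # d x d covariance matrix (second moment)

    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    k = 3
    W = eigvecs[:, -k:]                      # top-k eigenvectors of the covariance
    Z = Xc @ W                               # best rank-k projection of the centered data
    print(Z.shape)                           # (500, 3)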
SLIDE 25

UNSUPERVISED LEARNING: TOPIC MODELS THROUGH TENSORS

[Figure: example topics such as Justice, Education, and Sports]

SLIDE 26

UNSUPERVISED LEARNING: TOPIC MODELS THROUGH TENSORS

SLIDE 27

TENSORS FOR MODELING: TOPIC DETECTION IN TEXT

Co-occurrence of word triplets

[Figure: triplet co-occurrence tensor decomposing into Topic 1 and Topic 2 components]
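As an illustrative sketch only (random word counts, and ignoring the centering/correction terms the exact method of moments uses), the triplet co-occurrence statistics form an order-3 tensor:

    import numpy as np

    vocab, n_docs = 50, 1000
    X = np.random.poisson(0.2, size=(n_docs, vocab)).astype(float)   # bag-of-words counts
    X /= np.maximum(X.sum(axis=1, keepdims=True), 1.0)               # word frequencies per document

    M2 = np.einsum('ni,nj->ij', X, X) / n_docs         # pairwise co-occurrences (matrix)
    M3 = np.einsum('ni,nj,nk->ijk', X, X, X) / n_docs  # triplet co-occurrences (order-3 tensor)
    print(M2.shape, M3.shape)                          # (50, 50) (50, 50, 50)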

SLIDE 28

WHY TENSORS?

Statistical reasons:

  • Incorporate higher order relationships in data
  • Discover hidden topics (not possible with matrix methods)

Computational reasons:

  • Tensor algebra is parallelizable like linear algebra.
  • Faster than other algorithms for LDA
  • Flexible: Training and inference decoupled
  • Guaranteed in theory to converge to global optimum
  • A. Anandkumar et al., "Tensor Decompositions for Learning Latent Variable Models," JMLR 2014.
SLIDE 29

TENSOR-BASED TOPIC MODELING IS FASTER

  • Mallet is an open-source framework for topic modeling
  • Benchmarks on AWS SageMaker Platform
  • Built into the AWS Comprehend NLP service.

[Charts: training time (minutes) vs. number of topics for the NYTimes corpus (300,000 documents) and the PubMed corpus (8 million documents), comparing the spectral (tensor) method with Mallet; the spectral method is 12x and 22x faster on average on the two corpora.]

SLIDE 30

TENSOR OPERATIONS: TENSOR CONTRACTION PRIMITIVE

SLIDE 31

TENSORS FOR MODELS: STANDARD CNNs USE LINEAR ALGEBRA

SLIDE 32

Jean Kossaifi, Zack Chase Lipton, Aran Khanna, Tommaso Furlanello. Jupyter notebooks: https://github.com/JeanKossaifi/tensorly-notebooks

TENSORS FOR MODELS: TENSORIZED NEURAL NETWORKS

SLIDE 33

SPACE SAVING IN DEEP TENSORIZED NETWORKS
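To make the space saving concrete, here is an illustrative parameter count only; the layer size, the reshaping into modes, and the Tucker ranks below are assumptions, not numbers from the talk:

    import numpy as np

    # A dense 4096 x 4096 weight matrix
    d_in, d_out = 4096, 4096
    dense_params = d_in * d_out                      # 16,777,216 parameters

    # Reshape the weight into an order-4 tensor (64*64 x 64*64) and store it in Tucker form
    modes = (64, 64, 64, 64)
    ranks = (8, 8, 8, 8)                             # assumed Tucker ranks
    tucker_params = int(np.prod(ranks)) + sum(m * r for m, r in zip(modes, ranks))

    print(dense_params, tucker_params)               # 16777216 vs 6144
    print(dense_params / tucker_params)              # roughly 2700x fewer parameters for these ranks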

SLIDE 34

TUCKER DECOMPOSITION

Generalizing Tensor CP decomposition

SLIDE 35

TENSOR DIAGRAMS

Notation for Tucker Decomposition

  • Contraction on a given dimension: simply link together the indices over which to contract!

SLIDE 36

TENSORS FOR LONG-TERM FORECASTING

Difficulties in long-term forecasting:

  • Long-term dependencies
  • High-order correlations
  • Error propagation

SLIDE 37

RNNS: FIRST-ORDER MARKOV MODELS

Input y_t, hidden state h_t, output z_t: h_t = g(y_t, h_{t-1}; θ), z_t = f(h_t; θ)

SLIDE 38

TENSOR-TRAIN RNNS AND LSTMS

Seq2seq architecture with TT-LSTM cells

SLIDE 39

TENSOR DIAGRAMS

Notation for Tensor Train

  • Contraction on a given dimension: simply link together the indices over which to contract!
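A small sketch of the tensor-train format itself (illustrative shapes): an order-3 tensor stored as a chain of cores and recovered by contracting over the linked rank indices:

    import numpy as np

    d1, d2, d3 = 4, 5, 6
    r1, r2 = 3, 3                        # tensor-train ranks
    G1 = np.random.rand(d1, r1)          # first core
    G2 = np.random.rand(r1, d2, r2)      # middle core
    G3 = np.random.rand(r2, d3)          # last core

    # Contract the cores over the linked rank indices a and b
    T = np.einsum('ia,ajb,bk->ijk', G1, G2, G3)
    print(T.shape)                       # (4, 5, 6)

    # Storage: d1*r1 + r1*d2*r2 + r2*d3 numbers instead of d1*d2*d3
    print(G1.size + G2.size + G3.size, d1 * d2 * d3)   # 75 vs 120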

SLIDE 40

TENSOR LSTM FOR LONG-TERM FORECASTING

[Results on the Climate dataset and the Traffic dataset]

Rose Yu, Stephan Zheng, Yisong Yue

SLIDE 41

APPROXIMATION GUARANTEES FOR TT-RNN

Theorem: a TT-RNN with m hidden units achieves approximation error ε.

  • Dimension d, tensor-train rank r, window p.
  • Bounded derivatives up to order k, smoothness C.
  • Approximation error: bias of the best model in the function class.
  • No such guarantees exist for standard RNNs.
  • Easier to approximate if the function is smooth and analytic.
  • Higher rank and a bigger window are more efficient.
SLIDE 42

TENSORLY: HIGH-LEVEL API FOR TENSOR ALGEBRA

  • Python programming
  • User-friendly API
  • Multiple backends: flexible + scalable
  • Example notebooks

Jean Kossaifi

SLIDE 43

TENSORLY WITH PYTORCH BACKEND

    import torch
    from torch.autograd import Variable
    import tensorly as tl
    from tensorly import tucker_to_tensor
    from tensorly.random import tucker_tensor

    tl.set_backend('pytorch')                              # set PyTorch backend

    # Tucker tensor form: random core and factor matrices
    core, factors = tucker_tensor((5, 5, 5), rank=(3, 3, 3))

    # Attach gradients
    core = Variable(core, requires_grad=True)
    factors = [Variable(f, requires_grad=True) for f in factors]

    # Set optimizer (assumes a learning rate lr, an iteration count n_iter,
    # and a target `tensor` are defined elsewhere, as on the slide)
    optimiser = torch.optim.Adam([core] + factors, lr=lr)

    for i in range(1, n_iter):
        optimiser.zero_grad()
        rec = tucker_to_tensor(core, factors)              # reconstruct from the Tucker form
        loss = (rec - tensor).pow(2).sum()                 # squared reconstruction error
        for f in factors:
            loss = loss + 0.01 * f.pow(2).sum()            # L2 penalty on the factors
        loss.backward()
        optimiser.step()

SLIDE 44

TENSORS FOR COMPUTE: TENSOR CONTRACTION PRIMITIVE

SLIDE 45

TENSOR PRIMITIVES?

  • 1969 – BLAS Level 1: Vector-Vector
  • 1972 – BLAS Level 2: Matrix-Vector
  • 1980 – BLAS Level 3: Matrix-Matrix
  • Now? – BLAS Level 4: Tensor-Tensor

History & Future

= 𝛽 + = ∗ = ∗ = ∗

Better Hardware utilization More complex data acceses

Kim, Jinsung, et al. "Optimizing Tensor Contractions in CCSD(T) for Efficient Execution on GPUs." 2018.
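A sketch mapping these levels onto NumPy calls (the proposed "Level 4" tensor-tensor contraction is expressed with einsum; shapes are arbitrary):

    import numpy as np

    x, y = np.random.rand(4), np.random.rand(4)
    A, B = np.random.rand(4, 4), np.random.rand(4, 4)
    T, S = np.random.rand(4, 4, 4), np.random.rand(4, 4, 4)

    level1 = 2.0 * x + y                       # Level 1: vector-vector (axpy)
    level2 = A @ x                             # Level 2: matrix-vector (gemv)
    level3 = A @ B                             # Level 3: matrix-matrix (gemm)
    level4 = np.einsum('ijk,kjl->il', T, S)    # "Level 4": tensor-tensor contraction
    print(level1.shape, level2.shape, level3.shape, level4.shape)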

SLIDE 46

SLIDE 47

Thank you