

SLIDE 1

Sparse Tensor Factorization: Algorithms, Data Structures, and Challenges

Shaden Smith & George Karypis

University of Minnesota
Department of Computer Science & Engineering
shaden@cs.umn.edu

SLIDE 2

Talk Outline

1. Introduction
2. Compressed Sparse Fiber
3. Cache-Friendly Reordering & Tiling
4. Distributed-Memory MTTKRP
5. Conclusions

SLIDE 3

Table of Contents

1. Introduction
2. Compressed Sparse Fiber
3. Cache-Friendly Reordering & Tiling
4. Distributed-Memory MTTKRP
5. Conclusions

SLIDE 4

Tensor Introduction

Tensors are the generalization of matrices to ≥ 3 dimensions. Tensors have m dimensions (or modes) and are I1 × · · · × Im.

◮ We’ll usually stick to I×J×K in this talk

[Figure: a three-mode tensor with modes labeled users, items, and contexts]

SLIDE 5

Applications

Dataset     I      J      K      nnz
NELL-2      12K    9K     28K    77M
Beer        33K    66K    960K   94M
Netflix     480K   18K    2K     100M
Delicious   532K   17M    3M     140M
NELL-1      3M     2M     25M    143M
Amazon      5M     18M    2M     1.7B

SLIDE 6

Canonical Polyadic Decomposition (CPD)

We compute matrices A, B, C, each with F columns

◮ We will use A(1), . . . , A(m) when ≥ 3 modes

[Figure: X ≈ a sum of F rank-one tensors formed from the columns of A, B, and C]

Usually computed via alternating least squares (ALS)

SLIDE 7

Matricized Tensor Times Khatri-Rao Product

MTTKRP

MTTKRP is the core computation of each iteration:

A = X(1) (C ⊙ B)

where X(1) is the I × (J·K) matricized tensor and C ⊙ B is the (J·K) × F Khatri-Rao product.

SLIDE 8

Alternating Least Squares

1: while not converged do
2:    A⊺ = (C⊺C ∗ B⊺B)⁻¹ [X(1) (C ⊙ B)]⊺
3:    B⊺ = (C⊺C ∗ A⊺A)⁻¹ [X(2) (C ⊙ A)]⊺
4:    C⊺ = (B⊺B ∗ A⊺A)⁻¹ [X(3) (B ⊙ A)]⊺
5: end while
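To make line 2 concrete, here is a minimal C sketch of forming the F×F normal-equations matrix C⊺C ∗ B⊺B; this is an illustration, not SPLATT's code, and the function names and row-major layout are assumptions of the sketch.

```c
#include <stdlib.h>

/* Gram matrix G = M^T M for a row-major n-by-F matrix M; G is F-by-F. */
static void gram(const double *M, size_t n, size_t F, double *G)
{
    for (size_t q = 0; q < F * F; q++) G[q] = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t a = 0; a < F; a++)
            for (size_t b = 0; b < F; b++)
                G[a * F + b] += M[i * F + a] * M[i * F + b];
}

/* Normal-equations matrix for the A-update (line 2 above):
 * S = (C^T C) * (B^T B), with '*' elementwise. S is only F-by-F,
 * so the solve that follows is cheap next to the MTTKRP term. */
void als_normal_matrix(const double *B, size_t J, const double *C, size_t K,
                       size_t F, double *S)
{
    double *BtB = malloc(F * F * sizeof *BtB);
    double *CtC = malloc(F * F * sizeof *CtC);
    gram(B, J, F, BtB);
    gram(C, K, F, CtC);
    for (size_t q = 0; q < F * F; q++)
        S[q] = CtC[q] * BtB[q];
    free(BtB);
    free(CtC);
}
```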

SLIDE 9

Tensor Storage – Coordinate Form

i  j  k  l  val
1  1  1  2  1.0
1  1  1  3  1.0
1  2  1  3  3.0
1  2  2  1  8.0
2  2  1  1  1.0
2  2  1  3  3.0
2  2  2  2  8.0
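As a sketch, coordinate storage is just parallel arrays of indices and values (C, illustrative; not SPLATT's exact structures):

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_MODES 8

/* Coordinate (COO) storage: one index per mode per nonzero, plus the
 * value. 64-bit indices, since unfolded dimensions overflow 32 bits. */
typedef struct {
    int       nmodes;           /* m                                 */
    size_t    nnz;              /* number of nonzeros                */
    uint64_t *ind[MAX_MODES];   /* ind[d][x] = mode-d index of nnz x */
    double   *val;              /* val[x]    = value of nonzero x    */
} coo_tensor;
```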

Why don’t we unfold?

We need a representation of X for each mode. NELL has dimensions 3M×2M×25M.

◮ Add a fourth mode and we exceed 2⁶⁴

SLIDE 10

MTTKRP

[Figure: a nonzero X(i, j, k) reading row j of B and row k of C and updating row i of A]

A(i, :) ← A(i, :) + X(i, j, k) [B(j, :) ∗ C(k, :)]
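As a sketch of what this update looks like in code (C, illustrative; not SPLATT's implementation, and the row-major layout is an assumption), coordinate-form MTTKRP is a single pass over the nonzeros:

```c
#include <stddef.h>
#include <stdint.h>

/* Coordinate-form MTTKRP for a 3-mode tensor:
 *   A(i,:) += X(i,j,k) * [B(j,:) .* C(k,:)]
 * A, B, C are row-major with F columns; A must be zeroed beforehand.
 * Every nonzero touches three factor rows, which is why this kernel
 * is memory-bandwidth bound. */
void mttkrp_coo(size_t nnz,
                const uint64_t *I, const uint64_t *J, const uint64_t *K,
                const double *val,
                double *A, const double *B, const double *C, size_t F)
{
    for (size_t x = 0; x < nnz; x++) {
        double       *a = A + I[x] * F;
        const double *b = B + J[x] * F;
        const double *c = C + K[x] * F;
        for (size_t f = 0; f < F; f++)
            a[f] += val[x] * b[f] * c[f];
    }
}
```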

Limitations

Memory bandwidth
Parallelism

SLIDE 11

Table of Contents

1. Introduction
2. Compressed Sparse Fiber
3. Cache-Friendly Reordering & Tiling
4. Distributed-Memory MTTKRP
5. Conclusions

SLIDE 12

Can we do better?

Consider three nonzeros in the fiber X(i, j, :) (a vector):

A(i, :) ← A(i, :) + X(i, j, k1) [B(j, :) ∗ C(k1, :)]
A(i, :) ← A(i, :) + X(i, j, k2) [B(j, :) ∗ C(k2, :)]
A(i, :) ← A(i, :) + X(i, j, k3) [B(j, :) ∗ C(k3, :)]

SLIDE 13

Can we do better?

Consider three nonzeros in the fiber X(i, j, :) (a vector):

A(i, :) ← A(i, :) + X(i, j, k1) [B(j, :) ∗ C(k1, :)]
A(i, :) ← A(i, :) + X(i, j, k2) [B(j, :) ∗ C(k2, :)]
A(i, :) ← A(i, :) + X(i, j, k3) [B(j, :) ∗ C(k3, :)]

A little factoring...

A(i, :) ← A(i, :) + B(j, :) ∗ Σx=1..3 X(i, j, kx) C(kx, :)
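A minimal C sketch of the factored loop, assuming the nonzeros have been grouped fiber-by-fiber with a CSR-style fptr array (all names here are illustrative):

```c
#include <stddef.h>
#include <stdint.h>

/* Fiber-factored MTTKRP: accumulate acc = sum_x X(i,j,k_x) * C(k_x,:)
 * over the fiber, then apply B(j,:) once. fptr delimits each fiber's
 * nonzeros; fiber_i/fiber_j give its (i, j). */
void mttkrp_fibers(size_t nfibers, const size_t *fptr,
                   const uint64_t *fiber_i, const uint64_t *fiber_j,
                   const uint64_t *K, const double *val,
                   double *A, const double *B, const double *C, size_t F)
{
    double acc[64];                       /* sketch assumes F <= 64 */
    for (size_t fib = 0; fib < nfibers; fib++) {
        for (size_t f = 0; f < F; f++) acc[f] = 0.0;
        for (size_t x = fptr[fib]; x < fptr[fib + 1]; x++)
            for (size_t f = 0; f < F; f++)
                acc[f] += val[x] * C[K[x] * F + f];
        double       *a = A + fiber_i[fib] * F;
        const double *b = B + fiber_j[fib] * F;
        for (size_t f = 0; f < F; f++)
            a[f] += b[f] * acc[f];    /* one B(j,:) multiply per fiber */
    }
}
```

The multiply by B(j, :) now happens once per fiber instead of once per nonzero.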

SLIDE 14

SPLATT: The Surprisingly ParalleL spArse Tensor Toolkit

[Smith, Ravindran, Sidiropoulos, and Karypis 2015]

Fibers are sparse vectors. Slice X(i, :, :) is almost a CSR matrix... But we need m representations of X.

SLIDE 15

Compressed Sparse Fiber (CSF)

[Figure: the 4-mode coordinate listing compressed into CSF: one tree per root index, with repeated indices merged at each level]

[Smith and Karypis 2015]

Modes are recursively compressed. Values are stored in the leaves (not shown).
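A plausible C layout for CSF is sketched below; this is an illustration of the recursive compression, not SPLATT's exact structures:

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_MODES 8

/* Each level d is a CSR-like array pair: fids[d][n] is the index of
 * node n at depth d, and fptr[d][n] .. fptr[d][n+1] delimit its
 * children at depth d+1. The last level holds one value per leaf. */
typedef struct {
    int       nmodes;
    size_t    nnodes[MAX_MODES];  /* nodes per level              */
    uint64_t *fids[MAX_MODES];    /* node indices per level       */
    size_t   *fptr[MAX_MODES];    /* child ranges (levels 0..m-2) */
    double   *vals;               /* nonzero values at the leaves */
} csf_tensor;
```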

SLIDE 16

MTTKRP with a CSF Tensor

Objective

We want to perform MTTKRP on each tensor mode with only one CSF representation. There are three types of nodes in a tree: root, internal, and leaf.

◮ Each will have a tailored algorithm
◮ root and leaf are special cases of internal
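Using the csf_tensor sketch from the CSF slide, a 3-mode root-mode MTTKRP might look like the following (again an illustrative sketch, assuming F ≤ 64):

```c
/* Root-mode MTTKRP over a 3-mode csf_tensor (struct from the CSF
 * slide): CSF-LEAF work accumulates C rows at the leaves, the result
 * is scaled by B(j,:) one level up, and each tree's sum lands in one
 * row of A. */
void mttkrp_csf_root(const csf_tensor *X,
                     double *A, const double *B, const double *C, size_t F)
{
    double accJ[64], accK[64];  /* per-level buffers; assumes F <= 64 */
    for (size_t s = 0; s < X->nnodes[0]; s++) {          /* tree (i) */
        for (size_t f = 0; f < F; f++) accJ[f] = 0.0;
        for (size_t j = X->fptr[0][s]; j < X->fptr[0][s + 1]; j++) {
            for (size_t f = 0; f < F; f++) accK[f] = 0.0;
            for (size_t k = X->fptr[1][j]; k < X->fptr[1][j + 1]; k++) {
                const double *c = C + X->fids[2][k] * F;
                for (size_t f = 0; f < F; f++)
                    accK[f] += X->vals[k] * c[f];        /* leaves   */
            }
            const double *b = B + X->fids[1][j] * F;
            for (size_t f = 0; f < F; f++)
                accJ[f] += b[f] * accK[f];               /* push up  */
        }
        double *a = A + X->fids[0][s] * F;
        for (size_t f = 0; f < F; f++)
            a[f] += accJ[f];                             /* write A  */
    }
}
```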

SLIDES 17-27

CSF-LEAF

[Figure (animation): traversal of the CSF tree Z against factor matrices A(1), A(2), A(3), A(4)]

The leaf nodes determine the output location. Hadamard products are pushed down the tree, leaves designate write locations, and the traversal continues over the remaining trees.

SLIDES 28-38

CSF-ROOT

[Figure (animation): traversal of the CSF tree Z against factor matrices A(1), A(2), A(3), A(4)]

Inner products are accumulated in a buffer, and Hadamard products are then propagated up the CSF tree. Results are written to A(1); partial results are kept in the buffer as the traversal continues over the remaining trees.

SLIDES 39-45

CSF-INTERNAL

[Figure (animation): traversal of the CSF tree Z against factor matrices A(1), A(2), A(3), A(4)]

Internal nodes use a combination of CSF-ROOT and CSF-LEAF: Hadamard products are pushed down to the output level, and CSF-ROOT then pulls partial results up to the output level.

SLIDE 46

Parallelism – Challenges?

[Figure: the coordinate listing compressed into CSF trees, as before]

CSF-ROOT can be parallelized over the trees. CSF-INTERNAL and CSF-LEAF require more thought...

SLIDE 47

Parallelism – Tiling

For p threads, we do a p-way tiling of each tensor mode. Distributing the tiles allows us to eliminate the need for mutexes.

[Figure: A, B, and C partitioned into tiles across threads]
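A simplified sketch of the idea in C with OpenMP, reduced to a 1D tiling of the output mode so that each thread owns disjoint rows of A (the slides' scheme tiles every mode; tile_ptr and the tile-ordered nonzeros are assumptions of this sketch):

```c
#include <stddef.h>
#include <stdint.h>
#include <omp.h>

/* Nonzeros are stored tile-by-tile and tile_ptr delimits them; tile t
 * contains only output indices I[x] owned by thread t, so no two
 * threads ever write the same row of A and no mutexes are needed. */
void mttkrp_tiled(size_t p, const size_t *tile_ptr,
                  const uint64_t *I, const uint64_t *J, const uint64_t *K,
                  const double *val,
                  double *A, const double *B, const double *C, size_t F)
{
    #pragma omp parallel for num_threads((int)p)
    for (size_t t = 0; t < p; t++)          /* one tile per thread */
        for (size_t x = tile_ptr[t]; x < tile_ptr[t + 1]; x++)
            for (size_t f = 0; f < F; f++)
                A[I[x] * F + f] += val[x] * B[J[x] * F + f]
                                          * C[K[x] * F + f];
}
```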

SLIDE 48

Datasets

Dataset     I      J      K      nnz
NELL-2      12K    9K     28K    77M
Beer        33K    66K    960K   94M
Netflix     480K   18K    2K     100M
Delicious   532K   17M    3M     140M
NELL-1      3M     2M     25M    143M
Amazon      5M     18M    2M     1.7B

SLIDE 49

Storage Comparison

[Figure: tensor storage in GB (log scale) per dataset for CSFx3, COORD, CSF-M, and CSF-T]

SLIDE 50

Serial MTTKRP

[Figure: serial MTTKRP speedup over COORD with 1 core, per dataset, for CSFx3, CSF-M, and CSF-T]

SLIDE 51

Parallel MTTKRP

[Figure: parallel MTTKRP speedup over COORD with 16 cores, per dataset, for CSFx3, CSF-M, and CSF-T]

SLIDE 52

Table of Contents

1. Introduction
2. Compressed Sparse Fiber
3. Cache-Friendly Reordering & Tiling
4. Distributed-Memory MTTKRP
5. Conclusions

SLIDE 53

Tensor Reordering

[Figure: a sparse matrix before and after reordering, with rows and columns grouped by partition]

We reorder the tensor to improve the access patterns on the factors.

SLIDE 54

Tensor Reordering

[Figure: a small coordinate tensor and its tripartite graph model]

Graph Partitioning

We model the sparsity structure of X with a tripartite graph

◮ Slices are vertices, nonzeros connect slices with a triangle

Partitioning the graph finds regions with shared indices. We reorder the tensor to group indices in the same partition; a sketch of the graph construction follows.
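A sketch of the graph construction in C (illustrative): each nonzero (i, j, k) yields a triangle among its three slice vertices, and the resulting edge list could be handed to a partitioner such as METIS after collapsing duplicate edges into weights.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Vertices 0..I-1 are i-slices, I..I+J-1 are j-slices, and
 * I+J..I+J+K-1 are k-slices. Each nonzero (i, j, k) emits the
 * triangle {vi, vj, vk}. */
void emit_tripartite_edges(size_t nnz,
                           const uint64_t *Ix, const uint64_t *Jx,
                           const uint64_t *Kx, uint64_t I, uint64_t J)
{
    for (size_t x = 0; x < nnz; x++) {
        unsigned long long vi = Ix[x];
        unsigned long long vj = I + Jx[x];
        unsigned long long vk = I + J + Kx[x];
        printf("%llu %llu\n", vi, vj);   /* i-slice -- j-slice */
        printf("%llu %llu\n", vj, vk);   /* j-slice -- k-slice */
        printf("%llu %llu\n", vi, vk);   /* i-slice -- k-slice */
    }
}
```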

SLIDE 55

Cache Blocking over Tensors

Sparsity is Hard

Tiling lets us schedule nonzeros to reuse indices already in cache. Cost: more fibers. Tensor sparsity forces us to grow tiles.

SLIDE 56

Scaling: NELL-2, Speedup vs Untiled

[Figure: speedup relative to untiled CSFx3 on 1, 2, 4, 8, 16 cores; series: ideal, CSFx3, CSFx3+cache]

SLIDE 57

Scaling: Netflix, Speedup vs Untiled

[Figure: speedup relative to untiled CSFx3 on 1, 2, 4, 8, 16 cores; series: ideal, CSFx3, CSFx3+cache]

SLIDE 58

Table of Contents

1. Introduction
2. Compressed Sparse Fiber
3. Cache-Friendly Reordering & Tiling
4. Distributed-Memory MTTKRP
5. Conclusions

SLIDE 59

MTTKRP Communication

[Figure: two nonzeros (i, j1, k) and (i, j2, k) touching rows of A, B, and C]

A(i, :) ← A(i, :) + X(i, j1, k) [B(j1, :) ∗ C(k, :)]
A(i, :) ← A(i, :) + X(i, j2, k) [B(j2, :) ∗ C(k, :)]

SLIDE 60

Coarse-Grained Decomposition

[Figure: A, B, and C distributed by rows across processes]

[Choi & Vishwanathan 2014, Shin & Kang 2014]

Processes own complete slices of X and the aligned factor rows. I/p rows are communicated to p−1 processes after each update.

SLIDE 61

Fine-Grained Decomposition

[Kaya & Uçar 2015]

Most flexible: non-zeros are individually assigned to processes. Two communication steps:

1. Aggregate partial computations after MTTKRP
2. Exchange new factor values

Hypergraph partitioning is used to minimize communication

◮ Non-zeros mapped to vertices
◮ I+J+K hyperedges

SLIDE 62

Medium-Grained Decomposition

[Figure: a medium-grained grid decomposition with factor blocks A1, A2; B1, B2, B3; C1, C2]

[Smith & Karypis 2016]

Distribute over a grid of p = q×r×s partitions. r×s processes divide each of A1, . . . , Aq. Two communication steps, as in fine-grained:

◮ O(I/p) rows communicated to r×s processes

SLIDE 63

Average Communication Volume

[Figure: average communication volume relative to coarse-grained, per dataset (Netflix, Delicious, NELL, Amazon, Random1, Random2), for medium- and fine-grained decompositions]

SLIDE 64

Maximum Communication Volume

[Figure: maximum communication volume relative to coarse-grained, per dataset (Netflix, Delicious, NELL, Amazon, Random1, Random2), for medium- and fine-grained decompositions]

SLIDE 65

Strong Scaling: Netflix

[Figure: strong scaling on Netflix; time per iteration (log scale) on 8-1024 cores for DFacTo, coarse, medium, fine, and ideal]

SLIDE 66

Strong Scaling: Amazon

[Figure: strong scaling on Amazon; time per iteration (log scale) on 64-1024 cores for DFacTo, coarse, medium, and ideal]

SLIDE 67

Table of Contents

1. Introduction
2. Compressed Sparse Fiber
3. Cache-Friendly Reordering & Tiling
4. Distributed-Memory MTTKRP
5. Conclusions

SLIDE 68

Wrapping Up

SPLATT is 40× to 80× faster than competing distributed-memory codes, and has 50× to 300× faster single-node performance than Matlab.

◮ > 1000× faster with a supercomputer!

New applications possible:

1. Healthcare
2. Recommender systems
3. Yours!

SLIDE 69

Future Work

Still many open problems!

Manycore architectures
Coupled factorization
What’s beyond ALS?

Where does Intel fit in?

Intel is in a unique position to make significant contributions. Kernels have unstructured access patterns and are memory-bound.

◮ Mostly :-)

High-bandwidth memory and hardware-transactional memory are exciting technologies for tensor folk.

http://cs.umn.edu/~splatt/

SLIDE 70

Questions?

SLIDE 71

Backup Slides

SLIDE 72

Convergence Check

1: while not converged do
2:    . . .
3: end while

Checking for convergence is not trivial!

||X − Z||²F = ⟨X, X⟩ − 2⟨X, Z⟩ + ⟨Z, Z⟩

◮ ⟨X, X⟩ is constant
◮ ⟨Z, Z⟩ = ||Z||²F
◮ ⟨X, Z⟩ = ?

SLIDE 73

Convergence Check – Tensor Norm

||Z||²F = λ⊺ (A⊺A ∗ B⊺B ∗ C⊺C) λ

The cost is negligible if we have cached A⊺A, etc.

◮ O(F²) vs O(IF²) flops
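A C sketch of this norm computation from cached Gram matrices (row-major F×F arrays; the names are illustrative):

```c
#include <stddef.h>

/* ||Z||_F^2 = lambda^T (A^T A .* B^T B .* C^T C) lambda with cached
 * row-major F-by-F Gram matrices: O(F^2) work, no pass over X. */
double cpd_norm_sq(const double *AtA, const double *BtB, const double *CtC,
                   const double *lambda, size_t F)
{
    double norm = 0.0;
    for (size_t a = 0; a < F; a++)
        for (size_t b = 0; b < F; b++)
            norm += lambda[a] * lambda[b]
                  * AtA[a * F + b] * BtB[a * F + b] * CtC[a * F + b];
    return norm;
}
```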

SLIDE 74

Convergence Check – Inner Product

⟨X, Z⟩ = Σf=1..F λ(f) Σnnz(X) X(i, j, k) A(i, f) B(j, f) C(k, f)

Cost: O(F · nnz(X)), with a higher constant than MTTKRP

SLIDE 75

Convergence Check – Inner Product

⟨X, Z⟩ = Σf=1..F λ(f) Σnnz(X) X(i, j, k) A(i, f) B(j, f) C(k, f)

Does this look familiar?

SLIDE 76

Convergence Check – Inner Product

[Smith and Karypis 2016]

Keep the MTTKRP result from the last mode, Ĉ

◮ Ĉ has the latest A and B values

Ĉ(k, :) = Σnnz(X(:,:,k)) X(i, j, k) [A(i, :) ∗ B(j, :)]

SLIDE 77

Convergence Check – Inner Product

[Smith and Karypis 2016]

Keep the MTTKRP result from the last mode, Ĉ

◮ Ĉ has the latest A and B values

Ĉ(k, :) = Σnnz(X(:,:,k)) X(i, j, k) [A(i, :) ∗ B(j, :)]

Now we just need to account for λ and the new C values:

⟨X, Z⟩ = 1⊺ (C ∗ Ĉ) λ

SLIDE 78

Convergence Check – Inner Product

[Smith and Karypis 2016]

Keep the MTTKRP result from the last mode, Ĉ

◮ Ĉ has the latest A and B values

Ĉ(k, :) = Σnnz(X(:,:,k)) X(i, j, k) [A(i, :) ∗ B(j, :)]

Now we just need to account for λ and the new C values:

⟨X, Z⟩ = 1⊺ (C ∗ Ĉ) λ

Cost: O(IF), much cheaper than O(nnz(X)F)
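A C sketch of the final formula (illustrative; Chat denotes Ĉ stored row-major):

```c
#include <stddef.h>

/* <X, Z> = 1^T (C .* Chat) lambda: a Hadamard product with the cached
 * MTTKRP result and a weighted column sum, instead of a pass over X. */
double cpd_inner(const double *C, const double *Chat,
                 const double *lambda, size_t K, size_t F)
{
    double inner = 0.0;
    for (size_t k = 0; k < K; k++)
        for (size_t f = 0; f < F; f++)
            inner += C[k * F + f] * Chat[k * F + f] * lambda[f];
    return inner;
}
```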

SLIDE 79

Tensor Reordering – Mode Dependent

[Figure: a hypergraph with fiber vertices α, β, γ, δ and slice hyperedges i1, i2, j1, j2, k1, k2]

Hypergraph Partitioning

Instead, create a new reordering for each mode of computation. Fibers are now vertices and slices are hyperedges. Overheads?

SLIDE 80

Choosing the Shape of the Decomposition

Objective

We need to find q, r, s such that q×r×s = p. Tensor modes are often very skewed (480K Netflix users vs 2K days).

◮ We want to assign processes proportionally
◮ 1D decompositions actually work well for many tensors

Algorithm

1. Start with a 1×1×1 shape
2. Compute the prime factorization of p
3. For each prime factor f, starting from the largest, multiply the most imbalanced mode by f (a sketch follows)
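A C sketch of this heuristic (illustrative; it assumes the most imbalanced mode is the one with the largest rows-per-process ratio dims[m]/grid[m]):

```c
#include <stddef.h>

/* Factor p into primes (ascending), then hand each prime, largest
 * first, to the mode with the largest rows-per-process ratio. */
void choose_grid(unsigned long p, const unsigned long dims[3],
                 unsigned long grid[3])
{
    unsigned long primes[64];
    size_t n = 0;
    for (unsigned long d = 2; d * d <= p; d++)
        while (p % d == 0) { primes[n++] = d; p /= d; }
    if (p > 1) primes[n++] = p;         /* leftover factor is prime */

    grid[0] = grid[1] = grid[2] = 1;
    while (n-- > 0) {                   /* walk primes largest-first */
        int m = 0;
        for (int t = 1; t < 3; t++)
            if (dims[t] / grid[t] > dims[m] / grid[m])
                m = t;
        grid[m] *= primes[n];
    }
}
```

For Netflix (480K × 18K × 2K) and p = 16, every factor of 2 lands on the first mode, giving a 16×1×1 shape, consistent with the note above that 1D decompositions often work well.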
