

SLIDE 1

A Medium-Grained Algorithm for Distributed Sparse Tensor Factorization

Shaden Smith George Karypis

University of Minnesota Department of Computer Science & Engineering shaden@cs.umn.edu

http://cs.umn.edu/~splatt/ Medium-Grained Sparse Tensor Factorization 1 / 24

SLIDE 2

Table of Contents

1. Preliminaries
2. Related Work: Coarse- and Fine-Grained Algorithms
3. A Medium-Grained Algorithm
4. Experiments
5. Conclusions

SLIDE 3

Tensor Introduction

Tensors are the generalization of matrices to three or more dimensions. A tensor has m dimensions (or modes) and is of size I1 × . . . × Im.

◮ We'll stick to m = 3 in this talk and call the dimensions I, J, K

[Figure: an example patients × diagnoses × procedures tensor.]
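Sparse tensors like the one pictured are typically stored in coordinate (COO) form: one (i, j, k) index triple per nonzero, plus its value. A minimal sketch (all sizes and entries below are made up for illustration, not from the talk):

```python
import numpy as np

# COO storage for a sparse 3-mode tensor: parallel index arrays plus values.
# All sizes and entries are illustrative.
I, J, K = 4, 5, 3
ii = np.array([0, 1, 3])              # mode-1 (i) index of each nonzero
jj = np.array([1, 4, 2])              # mode-2 (j) index
kk = np.array([2, 0, 1])              # mode-3 (k) index
vals = np.array([1.0, 2.0, 3.0])      # X(i, j, k) for each triple

nnz = len(vals)
density = nnz / (I * J * K)           # 3 / 60 = 0.05
```

Real tensors are far sparser: the datasets later in the talk have millions of rows per mode but only 100M to 1.7B nonzeros.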

SLIDE 4

Canonical Polyadic Decomposition (CPD)

We compute factor matrices A, B, C, each with F columns

◮ F is assumed to be small, on the order of 10 or 50

[Figure: X approximated as a sum of F rank-one tensors.]

Usually computed via alternating least squares (ALS). As a result, computations are mode-centric.

SLIDE 5

CPD-ALS

Algorithm 1 CPD-ALS

1: while not converged do
2:   A⊺ ← (C⊺C ∗ B⊺B)⁻¹ (X(1)(C ⊙ B))⊺
3:   B⊺ ← (C⊺C ∗ A⊺A)⁻¹ (X(2)(C ⊙ A))⊺
4:   C⊺ ← (B⊺B ∗ A⊺A)⁻¹ (X(3)(B ⊙ A))⊺
5: end while

Here X(n) is the mode-n unfolding of X, ∗ is the Hadamard (elementwise) product, and ⊙ is the Khatri-Rao product.
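To make the updates concrete, here is a dense toy implementation of the ALS sweep in NumPy. This is a sketch only: the sizes and data are made up, the einsum computes the MTTKRP densely, and a real solver would exploit sparsity as the rest of the talk describes.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, F = 6, 5, 4, 3                   # toy sizes, not from the talk

# Dense toy tensor; real inputs are sparse, but the ALS algebra is identical.
X = rng.random((I, J, K))
A, B, C = (rng.random((d, F)) for d in (I, J, K))

def als_update(X, M1, M2, spec):
    """One mode of CPD-ALS: MTTKRP (via einsum) then the F x F normal equations."""
    mttkrp = np.einsum(spec, X, M1, M2)       # e.g. Ahat(i,f) = sum_jk X(i,j,k) B(j,f) C(k,f)
    gram = (M1.T @ M1) * (M2.T @ M2)          # Hadamard product of Gram matrices
    return np.linalg.solve(gram, mttkrp.T).T  # solves gram @ A.T = Ahat.T

for _ in range(20):                            # a few ALS sweeps
    A = als_update(X, B, C, 'ijk,jf,kf->if')
    B = als_update(X, A, C, 'ijk,if,kf->jf')
    C = als_update(X, A, B, 'ijk,if,jf->kf')

# Relative reconstruction error; each ALS update is a least-squares solve,
# so this stays below 1 and decreases monotonically.
Xhat = np.einsum('if,jf,kf->ijk', A, B, C)
err = np.linalg.norm(X - Xhat) / np.linalg.norm(X)
```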

SLIDE 6

A Closer Look...

Algorithm 2 One mode of CPD-ALS

1: Â ← X(1)(C ⊙ B)              ⊲ O(F · nnz(X))
2: LL⊺ ← Cholesky(C⊺C ∗ B⊺B)    ⊲ O(F³)
3: A⊺ ← (LL⊺)⁻¹Â⊺               ⊲ O(IF²)
4: Compute A⊺A                  ⊲ O(IF²)

Step 1 is the most expensive and the focus of this talk.

SLIDE 7

Matricized Tensor Times Khatri-Rao Product (MTTKRP)

Â(i, :) ← Â(i, :) + X(i, j, k) [B(j, :) ∗ C(k, :)]

[Figure: one nonzero X(i, j, k) combining row j of B and row k of C into row i of Â.]
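This update maps directly onto COO storage: one scatter-add per nonzero. A minimal NumPy sketch (indices, sizes, and values are illustrative), checked against a dense reference:

```python
import numpy as np

# MTTKRP over a COO sparse tensor: for each nonzero X(i, j, k),
#   Ahat(i, :) += X(i, j, k) * (B(j, :) * C(k, :))    (elementwise product)
# All sizes and indices are illustrative.
I, J, K, F = 4, 5, 3, 2
ii = np.array([0, 1, 3])
jj = np.array([1, 4, 2])
kk = np.array([2, 0, 1])
vals = np.array([1.0, 2.0, 3.0])

rng = np.random.default_rng(0)
B = rng.random((J, F))
C = rng.random((K, F))

Ahat = np.zeros((I, F))
np.add.at(Ahat, ii, vals[:, None] * B[jj] * C[kk])  # unbuffered scatter-add per nonzero

# Dense check against the direct triple sum.
X = np.zeros((I, J, K))
X[ii, jj, kk] = vals
ref = np.einsum('ijk,jf,kf->if', X, B, C)
assert np.allclose(Ahat, ref)
```

The scatter-add is exactly why distribution matters: whichever process owns a nonzero needs rows of B and C, and contributes a partial row of Â.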

SLIDE 8

MTTKRP Communication

Â(i, :) ← Â(i, :) + X(i, j1, k) [B(j1, :) ∗ C(k, :)]
Â(i, :) ← Â(i, :) + X(i, j2, k) [B(j2, :) ∗ C(k, :)]

[Figure: two nonzeros X(i, j1, k) and X(i, j2, k) sharing row i of Â and row k of C.]

SLIDE 9

Table of Contents

1. Preliminaries
2. Related Work: Coarse- and Fine-Grained Algorithms
3. A Medium-Grained Algorithm
4. Experiments
5. Conclusions

SLIDE 10

Coarse-Grained Decomposition

[Figure: A, B, and C each split into contiguous blocks of rows.]

[Choi & Vishwanathan 2014, Shin & Kang 2014]

Processes own complete slices of X and the aligned factor rows. I/p rows are communicated to p−1 processes after each update.

SLIDE 11

Fine-Grained Decomposition

[Kaya & Uçar 2015]

Most flexible: nonzeros are individually assigned to processes. Two communication steps:

1. Aggregate partial computations after the MTTKRP
2. Exchange new factor values

Factors can be assigned to minimize communication.

SLIDE 12

Finding a Fine-Grained Decomposition

Some options:

◮ Random assignment
◮ Hypergraph partitioning
◮ Multi-constraint hypergraph partitioning

In Practice: Hypergraph Model

nnz(X) vertices and I+J+K hyperedges give a tight approximation of communication volume and load balance.

◮ The distribution of factors must also be considered: in practice a greedy solution works well
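The model itself is straightforward to materialize: one vertex per nonzero, and one hyperedge per factor row that any nonzero touches (partitioning the result, e.g. with a tool like PaToH, is the expensive part and is not shown). A sketch with made-up indices:

```python
from collections import defaultdict

# Fine-grained hypergraph model: each nonzero is a vertex; each row index of
# each mode is a hyperedge containing every nonzero touching that row.
# The (i, j, k) triples below are illustrative.
nonzeros = [(0, 1, 2), (1, 4, 0), (3, 2, 1), (0, 4, 1)]

hyperedges = defaultdict(list)           # key ('mode', index) -> vertex ids
for v, (i, j, k) in enumerate(nonzeros):
    hyperedges[('i', i)].append(v)
    hyperedges[('j', j)].append(v)
    hyperedges[('k', k)].append(v)

# The number of hyperedges is (distinct i) + (distinct j) + (distinct k),
# bounded by I + J + K as on the slide.
print(len(hyperedges))  # 9
```

Cutting a hyperedge corresponds to a factor row needed by more than one process, which is why the cut size tracks communication volume.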

SLIDE 13

Table of Contents

1. Preliminaries
2. Related Work: Coarse- and Fine-Grained Algorithms
3. A Medium-Grained Algorithm
4. Experiments
5. Conclusions

SLIDE 14

Medium-Grained Decomposition

[Figure: X split into blocks aligned with A1, A2; B1, B2, B3; and C1, C2.]

Distribute X over a grid of p = q×r×s partitions. The r×s processes in each layer divide each of A1, . . . , Aq. Two communication steps, like fine-grained:

◮ O(I/p) rows communicated to r×s processes

SLIDE 15

Medium-Grained Decomposition

[Figure: process (2, 3, 1) owns one block of X and pieces of A2, B3, and C1.]

Each process owns roughly I/p rows of each factor. Like before, a greedy algorithm works well.

SLIDE 16

Finding a Medium-Grained Decomposition

Greedy Algorithm

1. Apply a random relabeling to the modes of X
2. Choose a decomposition dimension (algorithm in paper)
3. Compute 1D partitionings of each mode
   ◮ Greedily chosen with a load-balance objective
4. Intersect!
5. Distribute factors with the objective of reducing communication
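Steps 3 and 4 can be sketched as follows. `partition_1d` is a hypothetical helper (not the paper's code) that greedily cuts one mode into contiguous chunks of roughly equal nonzero count; intersecting the three chunk ids then assigns each nonzero to a grid block. All names, sizes, and data are illustrative.

```python
import numpy as np

def partition_1d(counts, parts):
    """Greedy prefix partitioning: cut indices so each part holds ~nnz/parts."""
    target = counts.sum() / parts
    bounds, acc, cut = [0], 0, 1
    for idx, c in enumerate(counts):
        acc += c
        if acc >= cut * target and len(bounds) < parts:
            bounds.append(idx + 1)       # close the current chunk here
            cut += 1
    bounds.append(len(counts))
    return np.array(bounds)

rng = np.random.default_rng(0)
I, J, K, nnz = 30, 20, 10, 200           # toy tensor
ii, jj, kk = (rng.integers(0, d, nnz) for d in (I, J, K))

q, r, s = 2, 2, 2                        # a 2x2x2 grid, p = 8 processes
bi = partition_1d(np.bincount(ii, minlength=I), q)
bj = partition_1d(np.bincount(jj, minlength=J), r)
bk = partition_1d(np.bincount(kk, minlength=K), s)

# Intersect: a nonzero's block is the triple of chunk ids its indices fall in.
owner = (np.searchsorted(bi, ii, side='right') - 1,
         np.searchsorted(bj, jj, side='right') - 1,
         np.searchsorted(bk, kk, side='right') - 1)
```

Because each mode is cut independently on nonzero counts, the 3D blocks are only approximately balanced; the greedy objective keeps the worst block close to nnz/p.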

SLIDE 17

Table of Contents

1. Preliminaries
2. Related Work: Coarse- and Fine-Grained Algorithms
3. A Medium-Grained Algorithm
4. Experiments
5. Conclusions

SLIDE 18

Datasets

Dataset     I     J     K     nnz
Netflix     480K  18K   2K    100M
Delicious   532K  17M   3M    140M
NELL        3M    2M    25M   143M
Amazon      5M    18M   2M    1.7B
Random1     20M   20M   20M   1.0B
Random2     50M   5M    5M    1.0B

SLIDE 19

Load Balance

Table: Load imbalance with 64 and 128 processes.

            coarse        medium        fine
Dataset     64    128     64    128     64    128
Netflix     1.03  1.18    1.00  1.00    1.00  1.00
Delicious   1.21  1.41    1.01  1.06    1.00  1.05
NELL        1.12  1.29    1.01  1.01    1.00  1.00
Amazon      2.17  3.86    1.08  1.08    —     —

SLIDE 20

Communication Volume

[Bar charts: average and maximum communication volume of the medium- and fine-grained decompositions, relative to coarse-grained, for Netflix, Delicious, NELL, Amazon, Random1, and Random2.]

SLIDE 21

Strong Scaling: Netflix

[Plot: time per iteration vs. number of cores (8 to 1024) for DFacTo, coarse, medium, and fine, with an ideal-scaling line.]

SLIDE 22

Strong Scaling: Amazon

[Plot: time per iteration vs. number of cores (64 to 1024) for DFacTo, coarse, and medium, with an ideal-scaling line.]

SLIDE 23

Table of Contents

1. Preliminaries
2. Related Work: Coarse- and Fine-Grained Algorithms
3. A Medium-Grained Algorithm
4. Experiments
5. Conclusions

SLIDE 24

Wrapping Up...

Medium-grained decompositions are a good middle ground:

◮ 1.5× to 5× faster than fine-grained decompositions with hypergraph partitioning
◮ DMS is 40× to 80× faster than DFacTo, the fastest publicly available software

http://cs.umn.edu/~splatt/

SLIDE 25

Choosing the Shape of the Decomposition

Objective

We need to find q, r, s such that q×r×s = p. Tensor modes are often very skewed (480K Netflix users vs. 2K days).

◮ We want to assign processes proportionally
◮ 1D decompositions actually work well for many tensors

Algorithm

1. Start with a 1×1×1 shape
2. Compute the prime factorization of p
3. For each prime factor f, starting from the largest, multiply the most imbalanced mode by f
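A sketch of this heuristic in Python. Interpreting "most imbalanced" as the mode with the most rows per process so far is my reading of the slide, and the function names are made up; the mode sizes in the example are the Netflix dimensions from the talk.

```python
def prime_factors(p):
    """Prime factorization of p, smallest factor first."""
    f, out = 2, []
    while f * f <= p:
        while p % f == 0:
            out.append(f)
            p //= f
        f += 1
    if p > 1:
        out.append(p)
    return out

def choose_shape(dims, p):
    """Grow a 1x1x1 grid by giving each prime factor of p, largest first,
    to the mode with the most rows per process so far."""
    shape = [1, 1, 1]
    for f in sorted(prime_factors(p), reverse=True):
        m = max(range(3), key=lambda t: dims[t] / shape[t])
        shape[m] *= f
    return shape

# Netflix-like skew: 480K users x 18K movies x 2K days, p = 128 processes.
print(choose_shape([480_000, 18_000, 2_000], 128))  # [64, 2, 1]
```

Note how the skew drives the result toward a nearly 1D shape, matching the observation that 1D decompositions work well for many tensors.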
