SLIDE 1

Motivation Gilbert Evaluation Conclusion

Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Systems

Till Rohrmann¹, Sebastian Schelter², Tilmann Rabl², Volker Markl²

¹Apache Software Foundation  ²Technische Universität Berlin

March 8, 2017

SLIDE 2

Motivation

SLIDE 3

Information Age

• Collected data grows exponentially
• Valuable information is stored in that data
• Need for scalable analytical methods

SLIDE 4

Distributed Computing and Data Analytics

• Writing parallel algorithms is tedious and error-prone
• Huge existing code base in the form of libraries
• Need for a parallelization tool

SLIDE 5

Requirements

• Linear algebra is the lingua franca of analytics
• Parallelize programs automatically to simplify development
• Sparse operations to support sparse problems efficiently

Goal: development of a distributed sparse linear algebra system

SLIDE 6

Gilbert

SLIDE 7

Gilbert in a Nutshell

SLIDE 8

System architecture

SLIDE 9

Gilbert Language

• Subset of the MATLAB® language
• Support of basic linear algebra operations
• Fixpoint operator serves as a side-effect-free loop abstraction
• Expressive enough to implement a wide variety of machine learning algorithms

    A = rand(10, 2);
    B = eye(10);
    A' * B;
    f = @(x) x .^ 2.0;
    eps = 0.1;
    c = @(p, c) norm(p - c, 2) < eps;
    fixpoint(1/2, f, 10, c);
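The semantics of the fixpoint operator can be sketched in a few lines of Python (an illustrative stand-in, not Gilbert's distributed runtime; the deck's own examples are MATLAB): apply f repeatedly from an initial value, stopping after a maximum number of iterations or when the optional convergence predicate fires.

```python
def fixpoint(initial, f, max_iterations, converged=None):
    """Side-effect-free loop: repeatedly apply f, stopping after
    max_iterations or when converged(previous, current) is True.
    A sketch of the operator's semantics only."""
    current = initial
    for _ in range(max_iterations):
        previous, current = current, f(current)
        if converged is not None and converged(previous, current):
            break
    return current

# Mirrors the MATLAB example above: f = @(x) x .^ 2.0, eps = 0.1,
# c = @(p, c) norm(p - c, 2) < eps, fixpoint(1/2, f, 10, c)
eps = 0.1
result = fixpoint(0.5,
                  lambda x: x ** 2.0,
                  10,
                  lambda p, c: abs(p - c) < eps)
```

Starting from 0.5, the iterates 0.25, 0.0625, 0.00390625 converge quickly; the predicate stops the loop once consecutive values differ by less than eps.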

SLIDE 10

Gilbert Typer

• MATLAB is dynamically typed
• Dataflow systems require type knowledge at compile time
• Automatic type inference using the Hindley-Milner type inference algorithm
• Matrix dimensions are inferred as well, enabling optimizations

    A = rand(10, 2)                    : Matrix(Double, 10, 2)
    B = eye(10)                        : Matrix(Double, 10, 10)
    A' * B                             : Matrix(Double, 2, 10)
    f = @(x) x .^ 2.0                  : N -> N
    eps = 0.1                          : Double
    c = @(p, c) norm(p - c, 2) < eps   : (N, N) -> Boolean
    fixpoint(1/2, f, 10, c)            : Double
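The dimension part of this inference can be illustrated with a toy shape checker (a hypothetical Python sketch, not Gilbert's actual typer): given the inferred shapes of the operands, it derives the shape of a transpose or product and rejects mismatches before anything runs.

```python
def infer_transpose(shape):
    """Shape of A' given the shape of A."""
    rows, cols = shape
    return (cols, rows)

def infer_matmul(left, right):
    """Shape of A * B; raises at 'compile time' when the inner
    dimensions disagree."""
    if left[1] != right[0]:
        raise TypeError(f"dimension mismatch: {left} * {right}")
    return (left[0], right[1])

# A = rand(10, 2), B = eye(10): A' * B is inferred as a 2 x 10 matrix
a_shape, b_shape = (10, 2), (10, 10)
result_shape = infer_matmul(infer_transpose(a_shape), b_shape)
```

Catching `A * B` (shapes (10, 2) and (10, 10)) as an error before execution is exactly what the dataflow backends need, since they must know operand types and sizes when the plan is built.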

SLIDE 11

Intermediate Representation & Gilbert Optimizer

• Language-independent representation of linear algebra programs
• Abstraction layer facilitates easy extension with new programming languages (such as R)
• Enables language-independent optimizations:
  • Transpose push-down
  • Matrix multiplication re-ordering
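Transpose push-down rewrites (A·B)' into B'·A', so the plan transposes the (smaller) operands instead of materializing the product and then transposing it. The identity behind the rewrite, checked numerically (a NumPy sketch; Gilbert applies the rule on its intermediate representation, not on arrays):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((3, 4))
B = rng.random((4, 5))

# (A B)' == B' A': the transpose is pushed down to the inputs
pushed_down = B.T @ A.T
materialized = (A @ B).T
```

Both expressions yield the same 5 × 3 result, but the pushed-down form never creates the un-transposed product as a separate intermediate.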

SLIDE 12

Distributed Matrices

[Figure: (a) row partitioning, (b) quadratic block partitioning]

Which partitioning is better suited for matrix multiplication?

    io_cost_row   = O(n³)
    io_cost_block = O(n² · √n)

Quadratic block partitioning communicates asymptotically less data, so it is the better fit for matrix multiplication.
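Quadratic block partitioning can be sketched locally (a hypothetical helper in Python, not Gilbert's distributed matrix representation): the matrix is cut into square blocks keyed by block coordinates, which is the layout that lets multiplication ship O(√n) blocks per result block instead of entire rows.

```python
import numpy as np

def to_blocks(matrix, block_size):
    """Partition a dense matrix into square blocks keyed by
    (block_row, block_col) -- a toy stand-in for a distributed,
    block-partitioned matrix."""
    rows, cols = matrix.shape
    blocks = {}
    for i in range(0, rows, block_size):
        for j in range(0, cols, block_size):
            blocks[(i // block_size, j // block_size)] = \
                matrix[i:i + block_size, j:j + block_size]
    return blocks

# A 4 x 4 matrix with block size 2 yields four 2 x 2 blocks;
# block (1, 0) holds rows 2-3, columns 0-1
matrix = np.arange(16.0).reshape(4, 4)
blocks = to_blocks(matrix, 2)
```

In the distributed setting each block would live on some worker, and operations pair up blocks by their (row, col) key.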
SLIDE 13

Distributed Operations: Addition

Apache Flink and Apache Spark offer a MapReduce-like API with additional operators: join, coGroup, cross
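Blockwise addition maps naturally onto a join: blocks sharing the same (row, col) key are paired and summed. A minimal local sketch (plain dicts standing in for Flink/Spark datasets; names are hypothetical):

```python
import numpy as np

def blockwise_add(a_blocks, b_blocks):
    """Join two block-partitioned matrices on their block key and add
    the paired blocks -- the dataflow version expresses this pairing
    with a join or coGroup operator."""
    return {key: a_blocks[key] + b_blocks[key] for key in a_blocks}

a = {(0, 0): np.eye(2), (0, 1): np.zeros((2, 2))}
b = {(0, 0): np.eye(2), (0, 1): np.ones((2, 2))}
c = blockwise_add(a, b)
```

Because addition only ever touches blocks with identical keys, no data re-partitioning is needed when both operands use the same blocking.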

SLIDE 14

Evaluation

SLIDE 15

Gaussian Non-Negative Matrix Factorization

• Given V ∈ R^(d×w), find W ∈ R^(d×t) and H ∈ R^(t×w) such that V ≈ WH
• Used in many fields: computer vision, document clustering, topic modeling
• Efficient distributed implementation for MapReduce systems

Algorithm:

    H ← randomMatrix(t, w)
    W ← randomMatrix(d, t)
    while ‖V − WH‖² > eps do
        H ← H · (WᵀV / WᵀWH)
        W ← W · (VHᵀ / WHHᵀ)
    end while
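The multiplicative updates translate almost line-for-line into NumPy. The following is a local sketch of the algorithm, not Gilbert's distributed execution; the small epsilon guarding the divisions is an added safeguard against division by zero:

```python
import numpy as np

def gnmf(V, t, iterations=100, seed=0):
    """Gaussian NMF via the slide's multiplicative updates:
    H <- H * (W'V / W'WH),  W <- W * (VH' / WHH')."""
    rng = np.random.default_rng(seed)
    d, w = V.shape
    W = rng.random((d, t))
    H = rng.random((t, w))
    tiny = 1e-12  # avoid division by zero
    for _ in range(iterations):
        H *= (W.T @ V) / (W.T @ W @ H + tiny)
        W *= (V @ H.T) / (W @ H @ H.T + tiny)
    return W, H

# Factor a small random matrix; the reconstruction error should be
# well below the norm of V itself
rng = np.random.default_rng(42)
V = rng.random((20, 30))
W, H = gnmf(V, t=5)
error = np.linalg.norm(V - W @ H)
```

Since the updates only multiply by non-negative ratios, W and H stay non-negative throughout, which is the defining property of NMF.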

SLIDE 16

Testing Setup

• t = 10, w = 100000
• V ∈ R^(d×100000) with sparsity 0.001
• Block size: 500 × 500
• Number of cores: 64
• Flink 1.1.2 & Spark 2.0.0
• Gilbert implementation: 5 lines
• Distributed GNMF on Flink: 70 lines

    V = rand($rows, 100000, 0, 1, 0.001);
    H = rand(10, 100000, 0, 1);
    W = rand($rows, 10, 0, 1);
    nH = H .* ((W' * V) ./ (W' * W * H))
    nW = W .* (V * nH') ./ (W * nH * nH')

SLIDE 17

Gilbert Optimizations

[Plot: execution time t (s) over rows d of V; series: Optimized Spark, Optimized Flink, Non-optimized Spark, Non-optimized Flink]

SLIDE 18

Optimizations Explained

Matrix updates:

    H ← H · (WᵀV / (WᵀW)H)
    W ← W · (VHᵀ / (WH)Hᵀ)

Non-optimized matrix multiplications (evaluated left to right):

    (WᵀW) ∈ R^(10×10) times H ∈ R^(10×100000)   — small intermediate
    (WH) ∈ R^(d×100000) times Hᵀ                — huge intermediate

Optimized matrix multiplications (re-ordered):

    W ∈ R^(d×10) times (HHᵀ) ∈ R^(10×10)        — small intermediate
SLIDE 19

GNMF Step: Scaling Problem Size

[Plot: execution time t (s) over number of rows of matrix V; series: Flink, SP Flink, Spark, SP Spark, Local]

• Distributed Gilbert execution handles much larger problem sizes than local execution
• The specialized implementation is slightly faster than Gilbert

SLIDE 20

GNMF Step: Weak Scaling

[Plot: execution time t (s) over number of cores; series: Flink, Spark]

Both distributed backends show good weak scaling behaviour

SLIDE 21

PageRank

• Ranking between entities with reciprocal quotations and references

    PR(p_i) = d · Σ_{p_j ∈ L(p_i)} PR(p_j) / D(p_j) + (1 − d) / N

  N — number of pages
  d — damping factor
  L(p_i) — set of pages that link to p_i
  D(p_j) — number of outgoing links of p_j
  M — transition matrix derived from the adjacency matrix

• Vector form: R = d · M · R + ((1 − d) / N) · 𝟙
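The vector form can be checked on a tiny graph (a NumPy sketch with a hypothetical three-page link structure; the deck's own implementation on the next slide is MATLAB):

```python
import numpy as np

# Adjacency: A[i, j] = 1 when page i links to page j (hypothetical graph)
A = np.array([[0., 1., 1.],
              [0., 0., 1.],
              [1., 0., 0.]])
n, d = A.shape[0], 0.85

# Column-stochastic transition matrix, as in M = (diag(1 ./ outdeg) * A)'
M = (np.diag(1.0 / A.sum(axis=1)) @ A).T

# Power iteration on R = d * M R + (1 - d) / N
r = np.ones(n) / n
for _ in range(50):
    r = d * (M @ r) + (1 - d) / n

total = r.sum()
```

Because M is column-stochastic, each iteration preserves the total mass: the ranks remain a probability distribution summing to 1.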

SLIDE 22

PageRank Implementation

MATLAB®:

    it = 10;
    d = sum(A, 2);
    M = (diag(1 ./ d) * A)';
    r = ones(n, 1) / n;
    e = ones(n, 1) / n;
    for i = 1:it
        r = .85 * M * r + .15 * e;
    end

Gilbert

    it = 10;
    d = sum(A, 2);
    M = (diag(1 ./ d) * A)';
    r_0 = ones(n, 1) / n;
    e = ones(n, 1) / n;
    fixpoint(r_0,
             @(r) .85 * M * r + .15 * e,
             it)

SLIDE 23

PageRank: 10 Iterations

[Plot: execution time t (s) over number of vertices n; series: Spark, Flink, SP Flink, SP Spark]

• Gilbert backends show similar performance
• The specialized implementation is faster because it can fuse operations

SLIDE 24

Conclusion

SLIDE 25

Conclusion

• Easy-to-use sparse linear algebra environment for people familiar with MATLAB®
• Scales to data sizes exceeding a single computer
• High-level linear algebra optimizations improve runtime
• Slower than specialized implementations due to abstraction overhead
