MadLINQ: Large-Scale Distributed Matrix Computation for the Cloud




SLIDE 1

MadLINQ: Large-Scale Distributed Matrix Computation for the Cloud

Zhengping Qian, Xiuwei Chen, Nanxi Kang, Mingcheng Chen, Yuan Yu, Thomas Moscibroda, Zheng Zhang Philip Leonard October 27, 2014

Philip Leonard (University of Cambridge) MadLINQ October 27, 2014 1 / 17

SLIDE 2

Motivation

MapReduce's and DryadLINQ's relational algebra operators are not well suited to linear algebra computations. Demand for efficient matrix computation comes from:

◮ Machine learning
◮ Graph algorithms (graphs boil down to sparse matrices)

Previous attempts failed to deliver:

◮ ScaLAPACK [2] is too low level (MPI knowledge required)
◮ HAMA is built on top of MapReduce (still restrictive)

SLIDE 3

Key Components of MadLINQ

◮ Simple programming model for matrix computation
◮ New Fine-Grained Pipelining (FGP) model
◮ Fault tolerance for FGP
◮ Integration with DryadLINQ [3]


SLIDE 4

Tile Algorithms

A tile is a sub-matrix; the entire matrix is partitioned into a grid of tiles. This idea is what gives rise to parallelism in matrix computation. The aim is to maximise cache locality by exploiting the structured access patterns of matrix algorithms.
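The partitioning step can be sketched in a few lines of Python/NumPy (a hypothetical `to_tiles` helper for illustration only; MadLINQ itself is a C# system and its tile representation is not shown on the slide):

```python
import numpy as np

def to_tiles(A, b):
    """Partition matrix A into a grid of b-by-b tiles.
    Assumes b divides both dimensions of A exactly."""
    n_rows, n_cols = A.shape
    return [[A[i:i + b, j:j + b] for j in range(0, n_cols, b)]
            for i in range(0, n_rows, b)]

A = np.arange(16, dtype=float).reshape(4, 4)
tiles = to_tiles(A, 2)   # a 2x2 grid of 2x2 tiles
```

Each tile is a view into the original matrix, so tile operations can work on contiguous sub-blocks without copying.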


SLIDE 5

Computation Example: Cholesky Decomposition

Takes a symmetric positive-definite matrix, partitioned into tiles. On the k-th iteration, tile operations are employed to factorise:

◮ the diagonal tile (DPOTRF)
◮ the n − k tiles below it (DTRSM)
◮ the trailing tiles to the right (DSYRK and DGEMM)
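The per-iteration structure above can be sketched as a sequential Python/NumPy version (NumPy calls stand in for the LAPACK kernels DPOTRF/DTRSM/DSYRK/DGEMM; in MadLINQ each tile operation would instead be a distributed vertex):

```python
import numpy as np

def tiled_cholesky(A, b):
    """Right-looking tiled Cholesky: overwrite A's lower triangle with L
    such that L @ L.T == A. Sequential sketch of the tile algorithm."""
    n = A.shape[0] // b                       # number of tile rows/columns
    T = lambda i, j: A[i*b:(i+1)*b, j*b:(j+1)*b]   # view of tile (i, j)
    for k in range(n):
        # DPOTRF: factorise the diagonal tile
        T(k, k)[:] = np.linalg.cholesky(T(k, k))
        for i in range(k + 1, n):
            # DTRSM: triangular solve for each of the n - k tiles below
            T(i, k)[:] = np.linalg.solve(T(k, k), T(i, k).T).T
        for j in range(k + 1, n):
            for i in range(j, n):
                # DSYRK (i == j) / DGEMM (i != j): update trailing tiles
                T(i, j)[:] -= T(i, k) @ T(j, k).T
    return np.tril(A)

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
S = M @ M.T + 4.0 * np.eye(4)     # symmetric positive-definite test matrix
L = tiled_cholesky(S.copy(), 2)   # two tile rows/columns of size 2
```

Note how each trailing-update tile depends only on the tiles in column k: this is exactly the dependency structure the DAG on a later slide captures.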

SLIDE 6

Cholesky Decomposition Iteration


SLIDE 7

Programming Model

◮ C# constructs allow DryadLINQ and MadLINQ integration
◮ A Matrix data abstraction in C# encapsulates the tile representation
◮ Programs are expressed in a sequential fashion
◮ Linear algebra library in C#


SLIDE 8

Example Application: Collaborative Filtering

How to predict what other movies users will like, given the ratings they have already made.

R[i, j] is user j’s rating of movie i.

CF Equation

(R · Rᵀ) · R becomes:

CF MadLINQ Code

Matrix similarity = R.Multiply(R.Transpose());
Matrix scores = similarity.Multiply(R).Normalize();

* The matrix goes from sparse (users haven't seen most movies) to dense (every user gets a predicted score for every movie)
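The same computation in a toy NumPy sketch (the ratings matrix is invented, and since the slide does not specify Normalize()'s semantics, a per-user column normalisation is assumed here):

```python
import numpy as np

# Toy ratings: R[i, j] is user j's rating of movie i (0 = unseen).
R = np.array([[5., 0., 3.],
              [4., 2., 0.],
              [0., 5., 4.]])

similarity = R @ R.T                    # movie-movie co-rating similarity
scores = similarity @ R                 # every entry non-zero: sparse -> dense
scores = scores / scores.sum(axis=0)    # assumed per-user (column) normalisation
```

Even on this tiny example, every zero in R becomes a non-zero predicted score, which is the sparse-to-dense transition the slide notes.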

SLIDE 9

CF: Integration with DryadLINQ

◮ DryadLINQ processes the Netflix dataset into a MadLINQ Matrix
◮ MadLINQ performs the transposition, matrix multiplication and normalisation of R to get the scores
◮ DryadLINQ generates a top-5 list of movies for each user


SLIDE 10

MadLINQ Architecture


SLIDE 11

Directed Acyclic Graph (DAG)

The DAG is dynamically expanded through symbolic execution to prevent state explosion (O(n³) vertices for Cholesky decomposition). f1 through f4 are the tile operators discussed earlier (DPOTRF, DTRSM, DSYRK and DGEMM respectively).
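The on-demand expansion idea can be illustrated with a generator that enumerates Cholesky's tile operations lazily (a Python sketch; MadLINQ's actual symbolic execution of the C# program is more general than this hand-written enumeration):

```python
from itertools import islice

def cholesky_tasks(n):
    """Lazily enumerate the tile operations of an n-by-n tiled Cholesky.
    Yields (operator, output_tile, input_tiles) triples; nothing is
    materialised until asked for, mirroring on-demand DAG expansion."""
    for k in range(n):
        yield ("POTRF", (k, k), [(k, k)])
        for i in range(k + 1, n):
            yield ("TRSM", (i, k), [(k, k), (i, k)])
        for j in range(k + 1, n):
            for i in range(j, n):
                op = "SYRK" if i == j else "GEMM"
                yield (op, (i, j), [(i, k), (j, k), (i, j)])

# Even for large n, only the requested prefix of the O(n^3) DAG is built:
head = list(islice(cholesky_tasks(1000), 4))
```

The input-tile lists encode the edges of the DAG: a scheduler can dispatch a task once all its input tiles have been produced.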


SLIDE 12

FGP & Fault Tolerance

◮ Parallelism fluctuates during matrix computations
◮ Pipelining exploits vertex parallelism by increasing data granularity (recursively tiling matrices)
◮ Failure handling: input blocks can be reconstructed from output blocks; dependencies are calculated to reduce recovery cost


SLIDE 13

Optimisations & Configuration

From the authors' experience, several optimisations were made:

◮ Prefetching of vertex data for a node that is close to terminating
◮ Specifying the order of matrix data (column- or row-first?)
◮ Auto-switching between sparse (compressed) and dense matrices

Configuration:

◮ Smaller tiles ⇒ higher parallelism
◮ The granularity of computation is a block
◮ Block size is determined by the number of non-zero elements
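A minimal sketch of such a density-based representation switch (the threshold and the COO-style triple are assumptions for illustration; the slide does not give MadLINQ's actual format or cut-off):

```python
import numpy as np

DENSITY_THRESHOLD = 0.25   # assumed cut-off; the real value is not on the slide

def choose_representation(tile):
    """Store a tile densely when enough entries are non-zero,
    otherwise as a compressed (rows, cols, values) triple."""
    density = np.count_nonzero(tile) / tile.size
    if density >= DENSITY_THRESHOLD:
        return ("dense", tile)
    rows, cols = np.nonzero(tile)
    return ("sparse", (rows, cols, tile[rows, cols]))

almost_empty = np.zeros((4, 4))
almost_empty[0, 0] = 1.0
kind, _ = choose_representation(almost_empty)   # -> "sparse"
```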

SLIDE 14

Observations & Applications

Observations:

◮ Pipelining performs better on larger problems
◮ The pipelined approach is on average 14.4% faster than ScaLAPACK
◮ ScaLAPACK failed consistently at 32 cores (it has no fault tolerance)

Real-world applications:

◮ MadLINQ is more efficient than MapReduce
◮ For collaborative filtering (recall (R · Rᵀ) · R) on a 20k × 500k matrix (the Netflix challenge), Mahout over Hadoop took over 800 minutes, as opposed to MadLINQ's 16 (albeit Mahout produces results of higher accuracy)


SLIDE 15

Conclusion & Related Work

DAGuE makes a similar use of DAGs for tiled algorithms, but provides neither fault tolerance nor resource dynamics. Future research ideas:

◮ Auto-tiling of matrices for matrix algorithms
◮ Dynamic re-tiling (changing tile sizes on the fly, e.g. for graph algorithms)
◮ Sparse matrices cause load imbalance; methods are required for handling these well

The paper concludes that MadLINQ fills the void of large-scale distributed matrix and graph processing using linear algebra.


SLIDE 16

References I

[1] Z. Qian, X. Chen, N. Kang, M. Chen, Y. Yu, T. Moscibroda, Z. Zhang. MadLINQ: Large-Scale Distributed Matrix Computation for the Cloud. In EuroSys, 2012.

[2] J. Choi, J. Dongarra, R. Pozo, D. Walker. ScaLAPACK: A Scalable Linear Algebra Library for Distributed Memory Concurrent Computers. In Symposium on the Frontiers of Massively Parallel Computation, 1992.

[3] Y. Yu, M. Isard, D. Fetterly, et al. DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language. In OSDI, 2008.


SLIDE 17

Questions
