MadLINQ: Large-Scale Distributed Matrix Computation for the Cloud


MadLINQ: Large-Scale Distributed Matrix Computation for the Cloud. By Zhengping Qian, Xiuwei Chen


slide-1
SLIDE 1

MadLINQ: Large-Scale Distributed Matrix Computation for the Cloud

By Zhengping Qian, Xiuwei Chen, Nanxi Kang, Mingcheng Chen, Yuan Yu, Thomas Moscibroda, Zheng Zhang

Microsoft Research Asia, Shanghai Jiaotong University, Microsoft Research Silicon Valley

Presenter: Haikal Pribadi (hp356)

slide-2
SLIDE 2

MadLINQ

Motivation

Contribution

Evaluation

Future Work

slide-3
SLIDE 3

Motivation
slide-4
SLIDE 4

Distributed Engines – Good and Bad

Success

– Strong subset of relational operators
  • Filtering, projection, aggregation, sorting and joins
  • Extensions via user-defined functions

– Adopts directed acyclic graph (DAG) execution model
  • Scalable and resilient

Problematic

– Deep analysis and manipulation of data

– Requires linear algebra and matrix computation
slide-5
SLIDE 5

Distributed Engines - Problem

Linear algebra and matrix computation

– Machine Learning
  • Multiplication, SVD, LU factorization
  • Cholesky factorization

– Ranking or classification algorithm

– Social web-mining or information retrieval

– Hard to capture in relational algebra operators

– Real world matrix and data mining algorithms are extremely hard to implement

slide-6
SLIDE 6

High Performance Computing

Solution to matrix computation

However

– Involves low level primitives to develop algorithms

– Single Process Multiple Data (SPMD) execution model

– Problem maintained in memory

– Constrains programmability, scalability and robustness

– Not applicable for web-scale big data analysis

slide-7
SLIDE 7

HAMA – Matrix Operation on MapReduce

Removes the constraint of problem size

MapReduce interface is restrictive

– Difficult to program real world linear algebra

– Implicitly synchronized

– Fails to take advantage of semantics of matrix operations

slide-8
SLIDE 8

Contribution
slide-9
SLIDE 9

Matrix Computation System

Unified programming model

– Matrix development language

– Application development library

Integrate with data-parallel computing system

Maintain scalability and robustness of DAG

– Fine-grained pipelining (FGP)

– Lightweight fault-tolerance protocol
slide-10
SLIDE 10
slide-11
SLIDE 11

Programming Model - Matrix

Develop matrix algorithms

Matrix optimizations

Based on tile abstraction

– Square sub-matrices

– Indexed grid of tiles form a matrix

– Matrices expressed naturally

– Structural characteristic of matrices
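
The tile abstraction above can be sketched in a few lines. This is a plain-Python illustration, not MadLINQ's API: the function names `to_tiles` and `from_tiles`, the dictionary keyed by tile index, and the toy 4 x 4 matrix are all assumptions made for the example.

```python
def to_tiles(matrix, tile_size):
    """Split a matrix into an indexed grid of square tiles (sub-matrices)."""
    rows, cols = len(matrix), len(matrix[0])
    if rows % tile_size or cols % tile_size:
        raise ValueError("matrix dimensions must be multiples of tile_size")
    grid = {}
    for ti in range(rows // tile_size):
        for tj in range(cols // tile_size):
            grid[ti, tj] = [
                row[tj * tile_size:(tj + 1) * tile_size]
                for row in matrix[ti * tile_size:(ti + 1) * tile_size]
            ]
    return grid


def from_tiles(grid, tile_size):
    """Reassemble the full matrix from its grid of tiles."""
    tile_rows = max(ti for ti, _ in grid) + 1
    tile_cols = max(tj for _, tj in grid) + 1
    matrix = []
    for ti in range(tile_rows):
        for r in range(tile_size):
            row = []
            for tj in range(tile_cols):
                row.extend(grid[ti, tj][r])
            matrix.append(row)
    return matrix


# Toy 4 x 4 matrix split into a 2 x 2 grid of 2 x 2 tiles.
matrix = [[r * 4 + c for c in range(4)] for r in range(4)]
tiles = to_tiles(matrix, 2)
print(tiles[0, 1])
```

Addressing a matrix as `tiles[i, j]` is what lets the later examples treat a tile, rather than a single element, as the unit of computation.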

slide-12
SLIDE 12

Programming Model - Matrix

Matrix multiplication code example:

MadLINQ.For(0, m, 1, i =>
{
    MadLINQ.For(0, p, 1, j =>
    {
        c[i, j] = 0;
        MadLINQ.For(0, n, 1, k => c[i, j] += a[i, k] * b[k, j]);
    });
});
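
To make the loop structure concrete, here is a minimal plain-Python sketch of the same triple loop over tiles. It assumes tiles are square nested lists stored in dictionaries keyed by `(row, col)`; the helper names `tile_zeros`, `tile_multiply_add` and `tiled_matmul` are hypothetical and only mirror the slide's `c[i, j] += a[i, k] * b[k, j]`.

```python
def tile_zeros(size):
    """A size x size tile filled with zeros."""
    return [[0.0] * size for _ in range(size)]


def tile_multiply_add(acc, left, right):
    """Return acc + left @ right for square tiles stored as nested lists."""
    size = len(acc)
    return [
        [acc[r][c] + sum(left[r][x] * right[x][c] for x in range(size))
         for c in range(size)]
        for r in range(size)
    ]


def tiled_matmul(a, b, m, n, p, tile_size):
    """a is an m x n grid of tiles, b an n x p grid; returns the m x p grid c."""
    c = {}
    for i in range(m):
        for j in range(p):
            c[i, j] = tile_zeros(tile_size)
            for k in range(n):
                c[i, j] = tile_multiply_add(c[i, j], a[i, k], b[k, j])
    return c


# A 1 x 2 grid times a 2 x 1 grid of 2 x 2 tiles.
a = {(0, 0): [[1, 0], [0, 1]], (0, 1): [[2, 0], [0, 2]]}
b = {(0, 0): [[1, 2], [3, 4]], (1, 0): [[5, 6], [7, 8]]}
c = tiled_matmul(a, b, m=1, n=2, p=1, tile_size=2)
print(c[0, 0])
```

Each `(i, j, k)` step touches whole tiles, so the inner kernel could be swapped for an optimized dense routine without changing the outer loops.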

slide-13
SLIDE 13

Programming Model - Matrix

Cholesky tile-algorithm implementation

MadLINQ.For(0, n, 1, k =>
{
    L[k, k] = A[k, k].DPOTRF();
    MadLINQ.For(k + 1, n, 1, l => L[l, k] = Tile.DTRSM(L[k, k], A[l, k]));
    MadLINQ.For(k + 1, n, 1, m =>
    {
        A[m, m] = Tile.DSYRK(A[m, k], A[m, m]);
        MadLINQ.For(m + 1, n, 1, l => A[l, m] = Tile.DGEMM(A[l, k], A[m, k], A[l, m]));
    });
});
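
The following is a rough, runnable sketch of a right-looking tiled Cholesky factorization in plain Python, intended only to show what the four kernels in the loop do. The functions `dpotrf`, `dtrsm`, `dsyrk` and `dgemm` are pure-Python stand-ins named after the LAPACK/BLAS routines, not the MadLINQ `Tile` methods. The sketch follows the textbook formulation, in which the trailing updates read the already computed `L` tiles; that is an assumption about the intended semantics, since the slide passes `A[m, k]` and `A[l, k]`.

```python
import math


def matmul_transposed(x, y):
    """Return x @ y^T for nested-list tiles."""
    return [[sum(a * b for a, b in zip(rx, ry)) for ry in y] for rx in x]


def subtract(x, y):
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def dpotrf(a):
    """Unblocked Cholesky of one tile: lower-triangular l with l @ l^T == a."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        l[j][j] = math.sqrt(a[j][j] - sum(l[j][s] ** 2 for s in range(j)))
        for i in range(j + 1, n):
            l[i][j] = (a[i][j] - sum(l[i][s] * l[j][s] for s in range(j))) / l[j][j]
    return l


def dtrsm(l_kk, a_lk):
    """Solve x @ l_kk^T == a_lk for x, where l_kk is lower triangular."""
    n = len(l_kk)
    x = [[0.0] * n for _ in a_lk]
    for r, row in enumerate(a_lk):
        for c in range(n):
            partial = sum(x[r][s] * l_kk[c][s] for s in range(c))
            x[r][c] = (row[c] - partial) / l_kk[c][c]
    return x


def dsyrk(l_mk, a_mm):
    """Symmetric rank-k update: a_mm - l_mk @ l_mk^T."""
    return subtract(a_mm, matmul_transposed(l_mk, l_mk))


def dgemm(l_lk, l_mk, a_lm):
    """General update: a_lm - l_lk @ l_mk^T."""
    return subtract(a_lm, matmul_transposed(l_lk, l_mk))


def tiled_cholesky(a, n):
    """a maps (row, col) to tiles of the lower triangle of an n x n tile grid."""
    a = dict(a)  # work on a copy; trailing tiles are overwritten
    l = {}
    for k in range(n):
        l[k, k] = dpotrf(a[k, k])
        for i in range(k + 1, n):
            l[i, k] = dtrsm(l[k, k], a[i, k])
        for m in range(k + 1, n):
            a[m, m] = dsyrk(l[m, k], a[m, m])
            for i in range(m + 1, n):
                a[i, m] = dgemm(l[i, k], l[m, k], a[i, m])
    return l


# Lower-triangle tiles of a 4 x 4 symmetric positive definite matrix.
spd_tiles = {
    (0, 0): [[4.0, 2.0], [2.0, 5.0]],
    (1, 0): [[2.0, 3.0], [0.0, 2.0]],
    (1, 1): [[6.0, 3.0], [3.0, 6.0]],
}
factor = tiled_cholesky(spd_tiles, 2)
print(factor[1, 0])
```

The toy input was built from a known lower-triangular factor, so the result can be checked by multiplying the tiles back together.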

slide-14
SLIDE 14

Programming Model – Application ex.

Collaborative Filtering

– Baseline algorithm with dataset from Netflix

– Dataset: matrix R records users' ratings on movies
  • similarity = R x R^t (sparse matrix)
  • scores = similarity x R (dense matrix)

Matrix similarity = R.Multiply(R.Transpose());
Matrix scores = similarity.Multiply(R).Normalize();
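
The two lines can be traced on a tiny example. The sketch below is plain Python with made-up ratings, not Netflix data. It assumes rows of R are movies and columns are users, and that `Normalize()` rescales each row to sum to 1; the slide does not specify the normalization, so that choice is an assumption.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def multiply(x, y):
    """Dense matrix product of nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def normalize_rows(m):
    """Scale each row to sum to 1 (assumed meaning of Normalize)."""
    result = []
    for row in m:
        total = sum(row)
        result.append([v / total if total else 0.0 for v in row])
    return result


# Toy ratings: 3 movies (rows) x 3 users (columns), 0 means "not rated".
ratings = [
    [5, 0, 1],
    [4, 0, 0],
    [0, 3, 0],
]
similarity = multiply(ratings, transpose(ratings))        # movie x movie
scores = normalize_rows(multiply(similarity, ratings))    # movie x user
print(similarity)
print(scores)
```

Movies 0 and 1 share a rater, so their similarity entry is non-zero, while movie 2 is similar only to itself.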

slide-15
SLIDE 15

Programming Model – Application ex.

Markov Clustering

– Adjacency matrix to represent graphs

MadLINQ.For(0, DEPTH, 1, i =>
{
    // Expansion
    G = G.Multiply(G);
    // Inflate: element-wise x^2 and row-based normalization
    G = G.EWiseMult(G).Normalize().Prune();
});
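
As an illustration of the expansion and inflation steps, here is a small plain-Python sketch of the same loop. The helper names and the pruning threshold of `1e-6` are assumptions; the slide does not say how `Prune()` decides which entries to drop.

```python
def multiply(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def ewise_mult(x, y):
    """Element-wise product of two equally shaped matrices."""
    return [[a * b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def normalize_rows(m):
    result = []
    for row in m:
        total = sum(row)
        result.append([v / total if total else 0.0 for v in row])
    return result


def prune(m, threshold):
    """Zero out entries below the threshold (assumed pruning rule)."""
    return [[v if v >= threshold else 0.0 for v in row] for row in m]


def markov_cluster(g, depth, threshold=1e-6):
    for _ in range(depth):
        g = multiply(g, g)                                        # expansion
        g = prune(normalize_rows(ewise_mult(g, g)), threshold)   # inflation
    return g


# Two disconnected pairs of nodes, with self-loops, already row-normalized.
graph = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 0.5],
]
clusters = markov_cluster(graph, depth=3)
print(clusters)
```

On this input the matrix is a fixed point of the iteration, and the two diagonal blocks correspond to the two clusters.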

slide-16
SLIDE 16

Programming Model – Application ex.

Regularized Latent Semantic Index (RLSI)

– web-mining algorithm to derive approximate topic model for Web docs

– Only 1 LoC while SCOPE's adoption of MapReduce takes 11+ LoC

MadLINQ.For(0, T, 1, i =>
{
    // Update U
    Matrix S = V.Multiply(V.Transpose());
    Matrix R = D.Multiply(V.Transpose());
    // Assume tile size >= K
    MadLINQ.For(0, U.M, 1, m => U[m, 0] = Tile.UpdateU(S[0,0], R[m,0]));
    // Update V
    Matrix Phi = U.Transpose().Multiply(D);
    V = U.Transpose()
         .Multiply(U)
         .Add(TiledMatrix<double>.EYE(U.N, lambda2))
         .CholeskySolve(Phi);
});

slide-17
SLIDE 17

Integration with DryadLINQ

// The input datasets
var ratings = PartitionedTable.Get<LineRecord>(NetflixRating);

// Step 1: Process the Netflix dataset in DryadLINQ
Matrix R = ratings.Select(x => CreateEntry(x))
                  .GroupBy(x => x.col)
                  .SelectMany((g, i) => g.Select(x => new Entry(x.row, i, x.val)))
                  .ToMadLINQ(MovieCnt, UserCnt, tileSize);

// Step 2: Compute the scores of movies for each user
Matrix similarity = R.Multiply(R.Transpose());
Matrix scores = similarity.Multiply(R).Normalize();

// Step 3: Create the result report
var result = scores.ToDryadLinq();
result.GroupBy(x => x.col).Select(g => g.OrderBy().Take(5));
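
The relational steps around the matrix computation (Step 1 and Step 3) can be sketched in plain Python as follows. The record layout `(movie, user, rating)`, the function names, sorting users to get deterministic column indexes, and ranking in descending score order are all assumptions; the slide's `OrderBy()` does not state a key or a direction.

```python
from collections import defaultdict


def to_entries(records):
    """Step 1: group ratings by user and replace user ids with dense column indexes."""
    by_user = defaultdict(list)
    for movie, user, rating in records:
        by_user[user].append((movie, rating))
    entries = []
    for col, user in enumerate(sorted(by_user)):
        for movie, rating in by_user[user]:
            entries.append((movie, col, rating))
    return entries


def top_k_per_column(scores, k):
    """Step 3: for each column (user), the k rows (movies) with the highest score."""
    report = {}
    for col in range(len(scores[0])):
        ranked = sorted(range(len(scores)),
                        key=lambda row: scores[row][col],
                        reverse=True)
        report[col] = ranked[:k]
    return report


records = [(0, "u42", 5.0), (1, "u42", 3.0), (0, "u7", 4.0)]
entries = to_entries(records)

toy_scores = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.3]]  # 3 movies x 2 users
report = top_k_per_column(toy_scores, k=2)
print(entries)
print(report)
```

The point of the integration is that the same program moves from record-oriented processing to matrix form and back without leaving the language.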

slide-18
SLIDE 18
slide-19
SLIDE 19

Fine Grained Pipelining (FGP)

A vertex is ready when each of its input channels has partial results; it executes while consuming input

– Data input/output at finer granularity

– Example, adding matrix A and B:
  • Each divided into a 4x4 grid = 16 tiles
  • Each tile is divided into 16 blocks
  • Vertices can stream inputs of blocks of A and B
  • Vertices can stream output of C blocks

The inferior mode of execution:

– Staged execution: a vertex is ready when its parents have produced all data
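
The difference between the two modes can be illustrated with Python generators. This is a toy model, not the MadLINQ scheduler: each "vertex" is a function, a tile is a list of blocks, and the event log exists only to show the order in which blocks are produced and consumed.

```python
def produce_blocks(tile, log, name):
    """Upstream vertex: emit a tile one block at a time."""
    for index, block in enumerate(tile):
        log.append(f"{name} produced block {index}")
        yield block


def add_pipelined(a_blocks, b_blocks, log):
    """FGP: consume one block per input channel, emit one output block."""
    for index, (a, b) in enumerate(zip(a_blocks, b_blocks)):
        log.append(f"C emitted block {index}")
        yield [x + y for x, y in zip(a, b)]


def add_staged(a_blocks, b_blocks, log):
    """Staged: wait until both parents have produced all of their data."""
    a_all, b_all = list(a_blocks), list(b_blocks)
    result = []
    for index, (a, b) in enumerate(zip(a_all, b_all)):
        log.append(f"C emitted block {index}")
        result.append([x + y for x, y in zip(a, b)])
    return result


tile_a = [[1, 2], [3, 4]]        # two blocks per tile, for brevity
tile_b = [[10, 20], [30, 40]]

pipelined_log = []
pipelined = list(add_pipelined(produce_blocks(tile_a, pipelined_log, "A"),
                               produce_blocks(tile_b, pipelined_log, "B"),
                               pipelined_log))

staged_log = []
staged = add_staged(produce_blocks(tile_a, staged_log, "A"),
                    produce_blocks(tile_b, staged_log, "B"),
                    staged_log)

print(pipelined_log)
print(staged_log)
```

In the pipelined log the first output block appears as soon as the first block of each input exists; in the staged log it appears only after every input block has been produced.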

slide-20
SLIDE 20

Fault Tolerance Protocol for FGP

Long chain of vertices

Re-execution recomputes all descendants

High overhead

Thus: only recompute needed blocks

– Recovering vertex queries downstream for needed blocks

– Requests specifically needed blocks from upstream
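
A minimal sketch of the recovery idea, under simplifying assumptions: each output block depends on exactly one input block with the same index, upstream blocks are still available, and the downstream vertex can report which blocks it already holds. The function names are hypothetical and do not come from the MadLINQ implementation.

```python
def recover_vertex(upstream_blocks, downstream_received, compute):
    """Recompute only the output blocks the downstream vertex is still missing."""
    # Query downstream: which block indexes does it still need?
    needed = [i for i in sorted(upstream_blocks) if i not in downstream_received]
    # Request exactly those input blocks from upstream and recompute them.
    return {i: compute(upstream_blocks[i]) for i in needed}


calls = []


def double_block(block):
    """Toy per-block computation; records each call to count recomputation work."""
    calls.append(block)
    return [2 * v for v in block]


upstream_blocks = {0: [1, 2], 1: [3, 4], 2: [5, 6], 3: [7, 8]}
# Before the failure, downstream had already received output blocks 0 and 1.
downstream_received = {0: [2, 4], 1: [6, 8]}

recovered = recover_vertex(upstream_blocks, downstream_received, double_block)
print(recovered)
print(len(calls))
```

Only two of the four blocks are recomputed here, whereas re-executing the whole vertex would redo all four.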

slide-21
SLIDE 21

Evaluation
slide-22
SLIDE 22

Effects of FGP and Fault Tolerance

CPU utilization on execution of Cholesky, on 96K x 96K dense matrix, 128 cores (16 nodes)

FGP being 15.9% faster

slide-23
SLIDE 23

Effects of FGP and Fault Tolerance

Aggregated network traffic volumes

With pipelining, traffic is spread more evenly

slide-24
SLIDE 24

Effects of FGP and Fault Tolerance

Comparison with ScaLAPACK, dense matrix of 128K x 128K

FGP consistently performs better than ScaLAPACK by an average 14.4%

slide-25
SLIDE 25

Real World Applications

Regularized Latent Semantic Index (RLSI)

                    16 nodes    32 nodes
SCOPE               6000s
MadLINQ - FGP       1838s       1188s
MadLINQ - staged    2053s       1260s
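
The ratios implied by the MadLINQ rows can be worked out directly. The snippet below only does arithmetic on the numbers in the table; the variable names are illustrative, and the SCOPE figure is left out because the slide does not make clear which node count it belongs to.

```python
# Running times in seconds, copied from the table above.
fgp = {16: 1838, 32: 1188}
staged = {16: 2053, 32: 1260}

# How much longer staged execution takes than FGP at each cluster size.
staged_over_fgp = {nodes: staged[nodes] / fgp[nodes] for nodes in fgp}

# Speedup obtained by doubling the node count under FGP.
fgp_scaling = fgp[16] / fgp[32]

for nodes, ratio in staged_over_fgp.items():
    print(f"{nodes} nodes: staged / FGP = {ratio:.2f}")
print(f"FGP, 16 -> 32 nodes: {fgp_scaling:.2f}x")
```

Doubling the nodes gives well under a 2x speedup, so the RLSI run does not scale linearly on this cluster.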

slide-26
SLIDE 26

Real World Applications

Collaborative Filtering

Compared against Mahout over Hadoop

                      M = R x R^t (sparse)    M x R (dense)
Mahout over Hadoop    630s                    780min (after R was broken into 10, otherwise cannot complete)
MadLINQ               347s                    9.5min
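
For a sense of scale, the two columns can be turned into speedup factors. This is simple arithmetic on the reported figures, with the dense step converted from minutes to seconds; the dictionary layout is just for illustration.

```python
# Reported times; the dense step is given in minutes on the slide.
mahout = {"sparse_s": 630, "dense_min": 780}
madlinq = {"sparse_s": 347, "dense_min": 9.5}

sparse_speedup = mahout["sparse_s"] / madlinq["sparse_s"]
dense_speedup = (mahout["dense_min"] * 60) / (madlinq["dense_min"] * 60)

print(f"sparse step: {sparse_speedup:.2f}x")
print(f"dense step:  {dense_speedup:.2f}x")
```

The gap is modest for the sparse product and close to two orders of magnitude for the dense one.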

slide-27
SLIDE 27

Related Work

slide-28
SLIDE 28

Criticism

Prototype Software

Heavy configuration on parameters and settings

Parallelism depends on good tile algorithms

Not having a solid benchmark

DryadLINQ no longer active

slide-29
SLIDE 29

Future Work

slide-30
SLIDE 30

Future Work

Auto-tiling

– Vertex is currently pipelineable iff it represents a tile algorithm

– Currently done manually

Dynamic re-tiling/blocking

– Matrices may evolve and require different block and tile size

Sparse matrices

– Handling sparse matrix is still difficult

– Non-zero distribution causes load imbalance
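
The load-imbalance point can be illustrated by counting non-zeros per tile. The sketch below is plain Python on a made-up sparse matrix given as `(row, col, value)` entries; the imbalance measure (largest tile load divided by the mean load) is one possible choice, not something taken from the slides.

```python
def nonzeros_per_tile(entries, tile_size):
    """Count non-zero entries falling into each tile of the grid."""
    counts = {}
    for row, col, _value in entries:
        key = (row // tile_size, col // tile_size)
        counts[key] = counts.get(key, 0) + 1
    return counts


def imbalance(counts, tiles_per_side):
    """Largest tile load divided by the mean load over all tiles in the grid."""
    total_tiles = tiles_per_side * tiles_per_side
    mean_load = sum(counts.values()) / total_tiles
    return max(counts.values()) / mean_load


# A 4 x 4 sparse matrix whose non-zeros cluster in the top-left corner.
entries = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0), (3, 3, 5.0)]
counts = nonzeros_per_tile(entries, tile_size=2)
ratio = imbalance(counts, tiles_per_side=2)
print(counts)
print(ratio)
```

With equally sized tiles, the vertex that owns the top-left tile does most of the work while two tiles are empty, which is the imbalance the slide refers to.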