SLIDE 1

Matrix Multiply in Hadoop

Botong Huang and You Wu (Will)

SLIDE 2

Content

  • Dense Matrix Multiplication
  • Previous Work
  • Our Approach and Strategy
  • Analysis
  • Experiment
  • Sparse Matrix Multiplication
  • Our Approach
  • Experiment
SLIDE 3

Previous Work in Dense Matrix Multiplication

  • Hama project -- a distributed scientific package based on Hadoop for massive matrix and graph data. “HAMA: An Efficient Matrix Computation with the MapReduce Framework”, IEEE 2010 CloudCom Workshop
  • “A MapReduce Algorithm for Matrix Multiplication”, John Norstad, Northwestern University

SLIDE 4

Our Approach

  • Try to push the computation ahead into the map phase without data preprocessing, finishing the task in a single map/reduce job
  • Provide the Mapper with information from the two matrix files when generating the splits
  • Modified classes include: FileSplit, FileInputFormat, and RecordReader
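The bullets above describe pushing the whole block multiplication into a single map/reduce job. The slides do not show the modified Hadoop classes themselves, so the following is only an in-memory Python sketch of the data flow (block grid, map, shuffle, reduce); all function names are illustrative, not the authors' code.

```python
from collections import defaultdict

def split_blocks(M, n, mat):
    """Partition an M x M matrix (list of lists) into an n x n grid of blocks."""
    b = M // n
    return {(i, j): [row[j*b:(j+1)*b] for row in mat[i*b:(i+1)*b]]
            for i in range(n) for j in range(n)}

def block_matmul(X, Y):
    """Multiply two equally sized square blocks."""
    size = len(X)
    return [[sum(X[r][t] * Y[t][c] for t in range(size)) for c in range(size)]
            for r in range(size)]

def one_job_multiply(A, B, M, n):
    """In-memory simulation of the one-job flow: each logical 'mapper'
    handles one (i, j, k) triple and multiplies block A[i][k] by block
    B[k][j]; the 'shuffle' groups the n partial products by output block
    (i, j); the 'reducer' sums them into C."""
    Ab, Bb = split_blocks(M, n, A), split_blocks(M, n, B)
    shuffle = defaultdict(list)
    for i in range(n):                       # map phase: n^3 logical mappers
        for j in range(n):
            for k in range(n):
                shuffle[(i, j)].append(block_matmul(Ab[(i, k)], Bb[(k, j)]))
    b = M // n
    C = [[0] * M for _ in range(M)]
    for (i, j), parts in shuffle.items():    # reduce phase: sum partial blocks
        for part in parts:
            for r in range(b):
                for c in range(b):
                    C[i*b + r][j*b + c] += part[r][c]
    return C
```

For example, `one_job_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]], 2, 2)` returns `[[19, 22], [43, 50]]`.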

SLIDE 5

Strategy 1

  • M: matrix size
  • n: number of blocks per line/column
  • N: number of physical map slots

SLIDE 6

Strategy 2

SLIDE 7

Strategy 3

SLIDE 8

ANALYSIS

SLIDE 9

Comparing the Three Strategies

                                  Strategy 1   Strategy 2   Strategy 3
  Mapper input traffic (total)    2M²n         2M²n         M²n
  Mapper input traffic (average)  2M²/n²       2M²/n        M²/n
  Shuffle traffic                 M²n          M²           M²n
  Computation per mapper          1            n            n
  Memory per mapper               3M²/n²       2M²/n        2M²/n
  Number of (logical) mappers     n³           n²           n²
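One internal check on the table: for each strategy, the average mapper input should equal the total input divided by the number of logical mappers. A throwaway Python check, with arbitrary sample values for M and n:

```python
# Sanity-check the table: average mapper input = total input / #mappers.
M, n = 1000, 4

strategies = {
    # name: (total input, #logical mappers, claimed average input)
    "Strategy 1": (2 * M*M * n, n**3, 2 * M*M / n**2),
    "Strategy 2": (2 * M*M * n, n**2, 2 * M*M / n),
    "Strategy 3": (M*M * n,     n**2, M*M / n),
}

for name, (total, mappers, avg) in strategies.items():
    assert total / mappers == avg, name
```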

SLIDE 10

Comparing the Three Strategies

  • Fix number of physical map slots = N

                                  Strategy 1    Strategy 2    Strategy 3
  n                               N^(1/3)       N^(1/2)       N^(1/2)
  Mapper input traffic (total)    2M²N^(1/3)    2M²N^(1/2)    M²N^(1/2)
  Mapper input traffic (average)  2M²N^(-2/3)   2M²N^(-1/2)   M²N^(-1/2)
  Shuffle traffic                 M²N^(1/3)     M²            M²N^(1/2)
  Computation per mapper          1             N^(1/2)       N^(1/2)
  Memory per mapper               3M²N^(-2/3)   2M²N^(-1/2)   2M²N^(-1/2)

SLIDE 11

EXPERIMENTS

SLIDE 12

Impact of block size on running time

  • 4 nodes – 12 map slots

[Plot: running time (sec) vs. number of blocks per line (2-10), for M = 1000, 2000, 3000, 4000, 5000]

SLIDE 13

Impact of block size on running time

  • 8 nodes – 24 map slots

[Plot: running time (sec) vs. number of blocks per line (2-10), for M = 1000, 2000, 3000, 4000, 5000]

SLIDE 14

Impact of block size on running time

  • 16 nodes – 48 map slots

[Plot: running time (sec) vs. number of blocks per line (2-10), for M = 1000, 2000, 3000, 4000, 5000]

SLIDE 15

Impact of map slots on running time

[Plot: running time (sec) vs. number of nodes (4, 8, 16), for M = 1000, 2000, 3000, 4000, 5000]

SLIDE 16

Comparing the three strategies

[Plot: running time (sec) vs. M (1000-5000), for Strategies 1, 2, and 3]

SLIDE 17

Others

  • Comparing with existing work

    – Northwestern’s 2-job algorithm: an analogy to Strategy 1
    – Took them 2365 s (about 40 min) to multiply two 5000-by-5000 matrices, with 48 map/reduce slots
    – Took our program only 485 s (about 8 min)

  • Scalability

    – Took us 3916 s (about 65 min) to multiply two 10k-by-10k matrices, with 48 map/reduce slots

SLIDE 18

SPARSE MATRIX

SLIDE 19

An Example

Saved in file (one line per row: row index, number of non-zeros, then column/value pairs):

0 1 2 20
1 2 0 18 3 25
2 1 1 28
3 1 3 30
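The file layout above is inferred from the example, not stated explicitly on the slide. Under that assumption, a minimal parser sketch:

```python
def parse_sparse_rows(text):
    """Parse lines of the form: row nnz col1 val1 col2 val2 ...
    into a dict {(row, col): value}. The layout is inferred from the
    example on this slide; it is not a documented Hadoop format."""
    entries = {}
    for line in text.strip().splitlines():
        nums = [int(x) for x in line.split()]
        row, nnz, rest = nums[0], nums[1], nums[2:]
        assert len(rest) == 2 * nnz, "malformed row record"
        for t in range(nnz):
            entries[(row, rest[2*t])] = rest[2*t + 1]
    return entries
```

Applied to the example file, this yields the five non-zero entries (0,2)=20, (1,0)=18, (1,3)=25, (2,1)=28, (3,3)=30.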

SLIDE 20

Our Approach

[Diagram: input matrices A and B]

Each Mapper will be assigned some number of lines in A, so that the total number of non-zero values in those lines is about the same across Mappers
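The slides do not show how the line ranges are chosen; one simple way to realize "about the same number of non-zeros per Mapper" is a greedy contiguous partition. This is only a sketch of the idea, not the authors' split generator:

```python
def balanced_splits(nnz_per_row, num_mappers):
    """Assign contiguous runs of rows of A to mappers so that each
    mapper gets roughly the same number of non-zero values.
    Greedy: close the current split once it reaches the per-mapper
    target; the last mapper takes whatever remains."""
    total = sum(nnz_per_row)
    target = total / num_mappers
    splits, current, acc = [], [], 0
    for row, nnz in enumerate(nnz_per_row):
        current.append(row)
        acc += nnz
        if acc >= target and len(splits) < num_mappers - 1:
            splits.append(current)
            current, acc = [], 0
    if current:
        splits.append(current)
    return splits
```

For the 4-row example matrix (row non-zero counts 1, 2, 1, 1) and two Mappers, this yields rows [0, 1] for one Mapper and [2, 3] for the other, each covering roughly half the non-zeros.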

SLIDE 21

Experiment

  • We set the number of Mappers to be slightly smaller than the number of physical map slots, so that the map phase can be done in one wave, minimizing the overhead
  • The test matrices have M log₂ M non-zero values
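For scale, M log₂ M grows only slightly faster than linearly; a one-line helper (illustrative, not from the slides) computes the non-zero count for a given matrix size:

```python
import math

def experiment_nnz(M):
    """Number of non-zero values for a size-M sparse test matrix:
    M * log2(M), as stated on the slide."""
    return int(M * math.log2(M))
```

For example, `experiment_nnz(1024)` returns `10240`.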
SLIDE 22

Impact of Matrix Size on Running Time

[Plot: running time (sec) vs. matrix size (100000-400000), for 4, 8, and 16 nodes]

SLIDE 23

Future Work

  • Use more nodes to run the experiment, to differentiate the performance of the three strategies
  • In dense matrix multiply, take the number of physical slots into account when generating splits, so the map phase finishes in one wave, minimizing the overhead
  • Run experiments on real-world data, both dense and sparse
  • More work could be done on the sparse case