Matrix Multiply in Hadoop
Botong Huang and You Wu (Will)
Content
- Dense Matrix Multiplication
- Previous Work
- Our Approach and Strategy
- Analysis
- Experiment
- Sparse Matrix Multiplication
- Our Approach
- Experiment
Previous Work in Dense Matrix Multiplication
- Hama project – a distributed scientific package based on Hadoop for massive matrix and graph data. “HAMA: An Efficient Matrix Computation with the MapReduce Framework”, IEEE 2010 CloudCom Workshop
- “A MapReduce Algorithm for Matrix Multiplication”, John Norstad, Northwestern University
Our Approach
- Push the computation into the map phase without any data preprocessing, so the whole task finishes in a single MapReduce job
- Provide each Mapper with information from both matrix files when generating the input splits
- Modified classes: FileSplit, FileInputFormat, and RecordReader
Strategy 1
M: matrix size (each input matrix is M x M)
n: number of blocks per row/column
N: number of physical map slots
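All three strategies build on the same block decomposition: each M x M matrix is partitioned into an n x n grid of (M/n) x (M/n) blocks, and C_ij = sum over k of A_ik * B_kj. A minimal in-memory sketch of that decomposition (illustrative only, not the Hadoop implementation):

```python
def matmul(A, B):
    """Reference triple-loop product of two square matrices (lists of lists)."""
    m = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(m)]
            for i in range(m)]

def blocked_matmul(A, B, n):
    """Multiply two M x M matrices via an n x n block grid.

    C_ij = sum_k A_ik * B_kj over (M/n) x (M/n) blocks; Strategy 1
    assigns one logical mapper to each of the n^3 block products.
    """
    M = len(A)
    assert M % n == 0, "M must be divisible by n"
    s = M // n  # block side length
    C = [[0] * M for _ in range(M)]
    for i in range(n):          # block row of C
        for j in range(n):      # block column of C
            for k in range(n):  # one block product A_ik * B_kj
                for r in range(s):
                    for c in range(s):
                        C[i * s + r][j * s + c] += sum(
                            A[i * s + r][k * s + t] * B[k * s + t][j * s + c]
                            for t in range(s))
    return C

# Sanity check against the plain triple loop:
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
assert blocked_matmul(A, B, n=2) == matmul(A, B) == [[19, 22], [43, 50]]
```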
Strategy 2
Strategy 3
ANALYSIS
Comparing the Three Strategies
                              Strategy 1   Strategy 2   Strategy 3
Mapper input traffic (total)  2M^2 n       2M^2 n       M^2 n
Mapper input traffic (avg.)   2M^2 / n^2   2M^2 / n     M^2 / n
Shuffle traffic               M^2 n        M^2          M^2 n
Computation per mapper        1            n            n
Memory per mapper             3M^2 / n^2   2M^2 / n     2M^2 / n
Number of (logical) mappers   n^3          n^2          n^2
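Strategy 1's counts are consistent with assigning one logical mapper per (i, j, k) block triple: each mapper reads two blocks (2M^2/n^2 average input), computes one block product, and shuffles it under key (i, j); the reducer sums the n partial blocks per output block, giving M^2 n total shuffle traffic. A toy simulation of that key/value flow (our reading of the table, not the actual Hadoop code; scalar "blocks" stand in for submatrices):

```python
from collections import defaultdict

def strategy1(A_blocks, B_blocks, n):
    """Simulate Strategy 1: one logical mapper per (i, j, k) triple.

    A_blocks[i][k] and B_blocks[k][j] are scalar 'blocks' here for
    brevity; real blocks are (M/n) x (M/n) submatrices.
    """
    # Map phase: n^3 mappers, each emits one partial product keyed by (i, j).
    shuffle = defaultdict(list)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                shuffle[(i, j)].append(A_blocks[i][k] * B_blocks[k][j])
    # Reduce phase: sum the n partial blocks of each output block.
    return {key: sum(parts) for key, parts in shuffle.items()}

# A 2 x 2 grid of scalar "blocks" (n = 2):
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
assert strategy1(A, B, 2) == {(0, 0): 19, (0, 1): 22, (1, 0): 43, (1, 1): 50}
```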
Comparing the Three Strategies
- Fix number of physical map slots = N
                              Strategy 1     Strategy 2     Strategy 3
n                             N^(1/3)        N^(1/2)        N^(1/2)
Mapper input traffic (total)  2M^2 N^(1/3)   2M^2 N^(1/2)   M^2 N^(1/2)
Mapper input traffic (avg.)   2M^2 N^(-2/3)  2M^2 N^(-1/2)  M^2 N^(-1/2)
Shuffle traffic               M^2 N^(1/3)    M^2            M^2 N^(1/2)
Computation per mapper        1              N^(1/2)        N^(1/2)
Memory per mapper             3M^2 N^(-2/3)  2M^2 N^(-1/2)  2M^2 N^(-1/2)
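The substitutions fix n so that the number of logical mappers matches the N physical slots: n^3 = N for Strategy 1 (n = N^(1/3)) and n^2 = N for Strategies 2 and 3 (n = N^(1/2)). A quick evaluation of the resulting traffic terms (M and N values are hypothetical, for illustration):

```python
def traffic(M, N):
    """Evaluate the per-strategy cost formulas once n is tied to N."""
    n1 = round(N ** (1 / 3))   # Strategy 1: n^3 logical mappers = N slots
    n23 = round(N ** 0.5)      # Strategies 2/3: n^2 logical mappers = N slots
    return {
        "s1": {"input_total": 2 * M**2 * n1,  "shuffle": M**2 * n1},
        "s2": {"input_total": 2 * M**2 * n23, "shuffle": M**2},
        "s3": {"input_total": M**2 * n23,     "shuffle": M**2 * n23},
    }

# Example: M = 1000, N = 64 slots gives n = 4 (Strategy 1) and n = 8 (2/3);
# Strategy 2 trades higher input traffic for the smallest shuffle (M^2).
t = traffic(1000, 64)
assert t["s2"]["shuffle"] == 1_000_000
```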
EXPERIMENTS
Impact of block size on running time
- 4 nodes – 12 map slots
[Figure: running time (sec) vs. number of blocks per dimension (2–10), one curve per M = 1000, 2000, 3000, 4000, 5000]
Impact of block size on running time
- 8 nodes – 24 map slots
[Figure: running time (sec) vs. number of blocks per dimension (2–10), one curve per M = 1000, 2000, 3000, 4000, 5000]
Impact of block size on running time
- 16 nodes – 48 map slots
[Figure: running time (sec) vs. number of blocks per dimension (2–10), one curve per M = 1000, 2000, 3000, 4000, 5000]
Impact of map slots on running time
[Figure: running time (sec) vs. number of nodes (4, 8, 16), one curve per M = 1000, 2000, 3000, 4000, 5000]
Comparing the three strategies
[Figure: running time (sec) vs. matrix size M (1000–5000), one curve per strategy]
Others
- Comparing with existing work
– Northwestern’s 2-job algorithm: an analogy to Strategy 1
– Took them 2365 s (~40 min) to multiply two 5000-by-5000 matrices, with 48 map/reduce slots
– Took our program only 485 s (~8 min)
- Scalability
– Took us 3916 s (~65 min) to multiply two 10k-by-10k matrices, with 48 map/reduce slots
SPARSE MATRIX
An Example
Saved in file:
0 1 2 20
1 2 0 18 3 25
2 1 1 28
3 1 3 30
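One plausible reading of this file layout (an assumption on our part — the slide does not spell it out): each line holds a row index, the count of non-zeros in that row, then alternating (column, value) pairs. A small parser under that assumption:

```python
def parse_sparse_line(line):
    """Parse 'row nnz col1 val1 col2 val2 ...' into (row, [(col, val), ...]).

    The layout is guessed from the example above; adjust the slicing
    if the real file format differs.
    """
    nums = [int(x) for x in line.split()]
    row, nnz, rest = nums[0], nums[1], nums[2:]
    entries = [(rest[2 * i], rest[2 * i + 1]) for i in range(nnz)]
    return row, entries

# Under this reading, row 1 has two non-zeros: 18 in column 0, 25 in column 3.
assert parse_sparse_line("1 2 0 18 3 25") == (1, [(0, 18), (3, 25)])
```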
Our Approach
[Figure: sparse matrices A and B]
Each Mapper is assigned some number of rows of A, chosen so that the total number of non-zero values per Mapper is roughly equal across Mappers
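One simple way to realize this balancing (a sketch of the idea, not the actual split generator): sweep the rows in order, accumulating non-zero counts, and cut a new split whenever the running total reaches total_nnz / num_mappers.

```python
def balanced_splits(nnz_per_row, num_mappers):
    """Group consecutive rows of A into splits with roughly equal
    total non-zero counts, one split per Mapper."""
    target = sum(nnz_per_row) / num_mappers
    splits, current, acc = [], [], 0
    for row, nnz in enumerate(nnz_per_row):
        current.append(row)
        acc += nnz
        # Cut here once the running total reaches the per-Mapper target,
        # but always leave at least one split for the remaining rows.
        if acc >= target and len(splits) < num_mappers - 1:
            splits.append(current)
            current, acc = [], 0
    if current:
        splits.append(current)
    return splits

# Rows with skewed non-zero counts still land in near-equal groups:
assert balanced_splits([5, 1, 1, 1, 4, 4], 3) == [[0, 1], [2, 3, 4], [5]]
```

Contiguous row ranges keep each split describable by a simple (start, length) pair in a FileSplit-style record.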
Experiment
- We set the number of Mappers slightly smaller than the number of physical map slots, so the map phase completes in a single wave, minimizing overhead
- Test matrices contain M log2(M) non-zero values
Impact of Matrix Size on Running Time
[Figure: running time (sec) vs. matrix size (100,000–400,000), one curve per cluster size: 4, 8, 16 nodes]
Future Work
- Use more nodes in the experiments to better differentiate the performance of the three strategies
- In dense matrix multiply, take the number of physical map slots into account when generating splits, so the map phase finishes in one wave and overhead is minimized
- Run experiments on real-world data, both dense and sparse
- More optimization work remains for the sparse case