Matrix Multiply in Hadoop
Botong Huang and You Wu (Will)
Content
- Dense Matrix Multiplication
- Previous Work
- Our Approach and Strategy
- Analysis
- Experiment
- Sparse Matrix Multiplication
- Our Approach
- Experiment
Previous Work in Dense Matrix Multiplication
- Hama project – a distributed scientific package based on Hadoop for massive matrix and graph data. “HAMA: An Efficient Matrix Computation with the MapReduce Framework”, IEEE 2010 CloudCom Workshop
- “A MapReduce Algorithm for Matrix Multiplication”, John Norstad, Northwestern University
Our Approach
- Push the computation into the map phase without any data preprocessing, so the whole task finishes in a single MapReduce job
- Provide each Mapper with information from both matrix files when generating the input splits
- Modified classes: FileSplit, FileInputFormat, and RecordReader
Strategy 1
M: matrix size (each input matrix is M x M)
n: number of blocks per row/column
N: number of physical map slots
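All three strategies build on the same block decomposition: each M x M matrix is partitioned into an n x n grid of (M/n) x (M/n) blocks, and C_ij = sum over k of A_ik * B_kj. A minimal in-memory sketch of that decomposition (illustrative only, not the Hadoop implementation):

```python
def matmul(A, B):
    """Reference triple-loop product of two square matrices (lists of lists)."""
    m = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(m)]
            for i in range(m)]

def blocked_matmul(A, B, n):
    """Multiply two M x M matrices via an n x n block grid.

    C_ij = sum_k A_ik * B_kj over (M/n) x (M/n) blocks; Strategy 1
    assigns one logical mapper to each of the n^3 block products.
    """
    M = len(A)
    assert M % n == 0, "M must be divisible by n"
    s = M // n  # block side length
    C = [[0] * M for _ in range(M)]
    for i in range(n):          # block row of C
        for j in range(n):      # block column of C
            for k in range(n):  # one block product A_ik * B_kj
                for r in range(s):
                    for c in range(s):
                        C[i * s + r][j * s + c] += sum(
                            A[i * s + r][k * s + t] * B[k * s + t][j * s + c]
                            for t in range(s))
    return C

# Sanity check against the plain triple loop:
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
assert blocked_matmul(A, B, n=2) == matmul(A, B) == [[19, 22], [43, 50]]
```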
Strategy 2
Strategy 3
ANALYSIS
Comparing the Three Strategies
                              Strategy 1   Strategy 2   Strategy 3
Mapper input traffic (total)  2M^2 n       2M^2 n       M^2 n
Mapper input traffic (avg.)   2M^2 / n^2   2M^2 / n     M^2 / n
Shuffle traffic               M^2 n        M^2          M^2 n
Computation per mapper        1            n            n
Memory per mapper             3M^2 / n^2   2M^2 / n     2M^2 / n
Number of (logical) mappers   n^3          n^2          n^2
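Strategy 1's counts are consistent with assigning one logical mapper per (i, j, k) block triple: each mapper reads two blocks (2M^2/n^2 average input), computes one block product, and shuffles it under key (i, j); the reducer sums the n partial blocks per output block, giving M^2 n total shuffle traffic. A toy simulation of that key/value flow (our reading of the table, not the actual Hadoop code; scalar "blocks" stand in for submatrices):

```python
from collections import defaultdict

def strategy1(A_blocks, B_blocks, n):
    """Simulate Strategy 1: one logical mapper per (i, j, k) triple.

    A_blocks[i][k] and B_blocks[k][j] are scalar 'blocks' here for
    brevity; real blocks are (M/n) x (M/n) submatrices.
    """
    # Map phase: n^3 mappers, each emits one partial product keyed by (i, j).
    shuffle = defaultdict(list)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                shuffle[(i, j)].append(A_blocks[i][k] * B_blocks[k][j])
    # Reduce phase: sum the n partial blocks of each output block.
    return {key: sum(parts) for key, parts in shuffle.items()}

# A 2 x 2 grid of scalar "blocks" (n = 2):
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
assert strategy1(A, B, 2) == {(0, 0): 19, (0, 1): 22, (1, 0): 43, (1, 1): 50}
```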
Comparing the Three Strategies
- Fix number of physical map slots = N
                              Strategy 1     Strategy 2     Strategy 3
n                             N^(1/3)        N^(1/2)        N^(1/2)
Mapper input traffic (total)  2M^2 N^(1/3)   2M^2 N^(1/2)   M^2 N^(1/2)
Mapper input traffic (avg.)   2M^2 N^(-2/3)  2M^2 N^(-1/2)  M^2 N^(-1/2)
Shuffle traffic               M^2 N^(1/3)    M^2            M^2 N^(1/2)
Computation per mapper        1              N^(1/2)        N^(1/2)
Memory per mapper             3M^2 N^(-2/3)  2M^2 N^(-1/2)  2M^2 N^(-1/2)
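The substitutions fix n so that the number of logical mappers matches the N physical slots: n^3 = N for Strategy 1 (n = N^(1/3)) and n^2 = N for Strategies 2 and 3 (n = N^(1/2)). A quick evaluation of the resulting traffic terms (M and N values are hypothetical, for illustration):

```python
def traffic(M, N):
    """Evaluate the per-strategy cost formulas once n is tied to N."""
    n1 = round(N ** (1 / 3))   # Strategy 1: n^3 logical mappers = N slots
    n23 = round(N ** 0.5)      # Strategies 2/3: n^2 logical mappers = N slots
    return {
        "s1": {"input_total": 2 * M**2 * n1,  "shuffle": M**2 * n1},
        "s2": {"input_total": 2 * M**2 * n23, "shuffle": M**2},
        "s3": {"input_total": M**2 * n23,     "shuffle": M**2 * n23},
    }

# Example: M = 1000, N = 64 slots gives n = 4 (Strategy 1) and n = 8 (2/3);
# Strategy 2 trades higher input traffic for the smallest shuffle (M^2).
t = traffic(1000, 64)
assert t["s2"]["shuffle"] == 1_000_000
```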
EXPERIMENTS
Impact of block size on running time
- 4 nodes – 12 map slots
[Figure: running time (sec) vs. number of blocks per dimension (2–10), one curve per M = 1000, 2000, 3000, 4000, 5000]
Impact of block size on running time
- 8 nodes – 24 map slots
[Figure: running time (sec) vs. number of blocks per dimension (2–10), one curve per M = 1000, 2000, 3000, 4000, 5000]
Impact of block size on running time
- 16 nodes – 48 map slots
[Figure: running time (sec) vs. number of blocks per dimension (2–10), one curve per M = 1000, 2000, 3000, 4000, 5000]
Impact of map slots on running time
[Figure: running time (sec) vs. number of nodes (4, 8, 16), one curve per M = 1000, 2000, 3000, 4000, 5000]
Comparing the three strategies
[Figure: running time (sec) vs. matrix size M (1000–5000), one curve per strategy]
Others
- Comparing with existing work
– Northwestern’s 2-job algorithm: an analogy to Strategy 1
– Took them 2365 s (~40 min) to multiply two 5000-by-5000 matrices, with 48 map/reduce slots
– Took our program only 485 s (~8 min)
- Scalability
– Took us 3916 s (~65 min) to multiply two 10k-by-10k matrices, with 48 map/reduce slots
SPARSE MATRIX
An Example
Saved in file:
0 1 2 20
1 2 0 18 3 25
2 1 1 28
3 1 3 30
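One plausible reading of this file layout (an assumption on our part — the slide does not spell it out): each line holds a row index, the count of non-zeros in that row, then alternating (column, value) pairs. A small parser under that assumption:

```python
def parse_sparse_line(line):
    """Parse 'row nnz col1 val1 col2 val2 ...' into (row, [(col, val), ...]).

    The layout is guessed from the example above; adjust the slicing
    if the real file format differs.
    """
    nums = [int(x) for x in line.split()]
    row, nnz, rest = nums[0], nums[1], nums[2:]
    entries = [(rest[2 * i], rest[2 * i + 1]) for i in range(nnz)]
    return row, entries

# Under this reading, row 1 has two non-zeros: 18 in column 0, 25 in column 3.
assert parse_sparse_line("1 2 0 18 3 25") == (1, [(0, 18), (3, 25)])
```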
Our Approach
[Figure: sparse matrices A and B]
Each Mapper is assigned some number of rows of A, chosen so that the total number of non-zero values per Mapper is roughly equal across Mappers
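One simple way to realize this balancing (a sketch of the idea, not the actual split generator): sweep the rows in order, accumulating non-zero counts, and cut a new split whenever the running total reaches total_nnz / num_mappers.

```python
def balanced_splits(nnz_per_row, num_mappers):
    """Group consecutive rows of A into splits with roughly equal
    total non-zero counts, one split per Mapper."""
    target = sum(nnz_per_row) / num_mappers
    splits, current, acc = [], [], 0
    for row, nnz in enumerate(nnz_per_row):
        current.append(row)
        acc += nnz
        # Cut here once the running total reaches the per-Mapper target,
        # but always leave at least one split for the remaining rows.
        if acc >= target and len(splits) < num_mappers - 1:
            splits.append(current)
            current, acc = [], 0
    if current:
        splits.append(current)
    return splits

# Rows with skewed non-zero counts still land in near-equal groups:
assert balanced_splits([5, 1, 1, 1, 4, 4], 3) == [[0, 1], [2, 3, 4], [5]]
```

Contiguous row ranges keep each split describable by a simple (start, length) pair in a FileSplit-style record.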
Experiment
- We set the number of Mappers slightly smaller than the number of physical map slots, so the map phase completes in a single wave, minimizing overhead
- Test matrices contain M log2(M) non-zero values
Impact of Matrix Size on Running Time
[Figure: running time (sec) vs. matrix size (100,000–400,000), one curve per cluster size: 4, 8, 16 nodes]
Future Work
- Use more nodes in the experiments to better differentiate the performance of the three strategies
- In dense matrix multiply, take the number of physical map slots into account when generating splits, so the map phase finishes in one wave and overhead is minimized
- Run experiments on real-world data, both dense and sparse
- More optimization work remains for the sparse case