

SLIDE 1

Building the next Generation of MapReduce Programming Models over MPI to Fill the Gaps between Data Analytics and Supercomputers

Michela Taufer

University of Delaware Collaborators: Tao Gao, Boyu Zhang (University of Delaware) Pavan Balaji, Yanfei Guo (Argonne National Laboratory) BingQiang Wang, Yutong Lu (Guangzhou Supercomputer Center) Pietro Cicotti (San Diego Supercomputer Center) Yanjie Wei (Shenzhen Institute of Advanced Technologies)

SLIDE 2

MapReduce Programming Model

  • MapReduce runtime handles the parallel job execution, communication, and data movement

  • Users provide map and reduce functions

WordCount example: the input "Hello World Hello World" is split across two map tasks, which emit <Hello,1>, <World,1>, <Hello,1>, <World,1>; after the shuffle, two reduce tasks emit <Hello,2> and <World,2>.
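To make the division of labor concrete, below is a minimal sketch of the two user-supplied functions for WordCount. The Emitter interface and the callback signatures are illustrative assumptions; every MapReduce-over-MPI framework defines its own callback types, but the user's job is the same: emit <word, 1> pairs in map and sum them in reduce, while the runtime decides where each pair is sent and when the grouped values reach reduce.

#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical emit interface: the framework would collect the <key, value>
// pairs produced by the user functions (names and signatures are illustrative).
struct Emitter {
    virtual void emit(const std::string& key, std::int64_t value) = 0;
    virtual ~Emitter() = default;
};

// Map: split one line of input into words and emit <word, 1> for each word.
void wordcount_map(const std::string& line, Emitter& out) {
    std::istringstream words(line);
    std::string word;
    while (words >> word) out.emit(word, 1);
}

// Reduce: sum the counts gathered for one word and emit <word, total>.
void wordcount_reduce(const std::string& word,
                      const std::vector<std::int64_t>& counts, Emitter& out) {
    std::int64_t total = 0;
    for (std::int64_t c : counts) total += c;
    out.emit(word, total);
}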

SLIDE 3

WordCount: A Concrete Example

Map

Input text:

When tweetle beetles fight, it’s called a tweetle beetle battle. And when they battle in a puddle, it’s a tweetle beetle puddle battle. And when tweetle beetles battle with paddles in a puddle, They call it a tweetle beetle puddle paddle battle.

Map output (<key, value> pairs, one list per input line):

(When, 1), (tweetle, 1), (beetles, 1), (fight, 1)
(it’s, 1), (called, 1), (a, 1), (tweetle, 1), (beetle, 1), (battle, 1)
(And, 1), (when, 1), (they, 1), (battle, 1), (in, 1), (a, 1), (puddle, 1)
(it’s, 1), (a, 1), (tweetle, 1), (beetle, 1), (puddle, 1), (battle, 1)
(And, 1), (when, 1), (tweetle, 1), (beetles, 1), (battle, 1), (with, 1), (paddles, 1), (in, 1), (a, 1), (puddle, 1)
(They, 1), (call, 1), (it, 1), (a, 1), (tweetle, 1), (beetle, 1), (puddle, 1), (paddle, 1), (battle, 1)

SLIDE 4

WordCount: A Concrete Example

Reduce

The shuffled <key, value> pairs are grouped by key and reduced to counts:

(tweetle, 1), (tweetle, 1), (tweetle, 1), (tweetle, 1), (tweetle, 1) → (tweetle, 5)
(battle, 1), (battle, 1), (battle, 1), (battle, 1), (battle, 1) → (battle, 5)
(puddle, 1), (puddle, 1), (puddle, 1), (puddle, 1) → (puddle, 4)
(beetle, 1), (beetle, 1), (beetle, 1) → (beetle, 3)
(beetles, 1), (beetles, 1) → (beetles, 2)
(when, 1), (when, 1) → (when, 2)
(When, 1) → (When, 1)

SLIDE 5

Data Generation on HPC Systems

From: https://xdmod.ccr.buffalo.edu

SLIDE 6

Is MapReduce over MPI an appealing way to handle big data processing on HPC systems?
SLIDE 7

Data Processing on HPC Systems

  • Key differences between Cloud computing and HPC systems rule out the naïve reuse of Cloud methods

[Diagram: HPC systems pair processors with a fast interconnect and a shared disk array and are programmed with MPI/OpenMP; Cloud computing systems pair each processor with local disk over Ethernet and are programmed with Hadoop/Spark.]

SLIDE 8

A Fundamentally Correct MapReduce (MR) over MPI

  • Support the logical map-shuffle-reduce workflow in four phases

§ Map, aggregate, convert, and reduce [1]

[Diagram: every process P0 … Pn runs map → barrier → aggregate → barrier → convert → barrier → reduce → barrier → output; intermediate data moves from <key, value> to <key, list<value>> pairs.]

[1] S. J. Plimpton and K. D. Devine. MapReduce in MPI for Large-Scale Graph Algorithms. Parallel Computing, 37(9):610–632, 2011.
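For concreteness, here is a minimal skeleton of how a user might drive this four-phase flow. MapReduceJob and its methods are hypothetical stand-ins, not the MR-MPI API of [1]; the stub bodies only mark the barrier that ends each explicitly invoked phase, which is exactly the synchronization cost examined on the next slides.

#include <mpi.h>
#include <cstdio>

// Hypothetical skeleton of a four-phase MapReduce-over-MPI driver (not the
// MR-MPI API). Phase bodies are stubs; each ends with a barrier to make the
// synchronization point after every explicitly invoked phase visible.
class MapReduceJob {
public:
    explicit MapReduceJob(MPI_Comm comm) : comm_(comm) {}

    void map()       { /* run user map on this process's input */    MPI_Barrier(comm_); }
    void aggregate() { /* all-to-all shuffle of <key,value> pairs */  MPI_Barrier(comm_); }
    void convert()   { /* group values into <key, list<value>> */     MPI_Barrier(comm_); }
    void reduce()    { /* run user reduce on the grouped pairs */     MPI_Barrier(comm_); }

private:
    MPI_Comm comm_;
};

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    MapReduceJob job(MPI_COMM_WORLD);
    job.map();        // phase 1: map
    job.aggregate();  // phase 2: aggregate (extra synchronization)
    job.convert();    // phase 3: convert (extra data staging)
    job.reduce();     // phase 4: reduce

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) std::printf("four-phase MapReduce job finished\n");

    MPI_Finalize();
    return 0;
}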

SLIDE 9

Extra Synchronizations

  • Aggregate and convert must be invoked explicitly by the user

§ Cost: extra synchronizations

[Same workflow diagram as the previous slide, with the barriers between the four phases highlighted.]
SLIDE 10

Extra Data Staging

  • Aggregate and convert need to store intermediate data

§ Cost: extra data staging

[Same workflow diagram, with the intermediate <key, value> and <key, list<value>> data staged between phases highlighted.]
SLIDE 11

Extra Memory Usage and Poor Data Management

  • Zooming in on the map / aggregate operations

[Same workflow diagram, zooming in on the map and aggregate operations.]

SLIDE 12


Extra Memory Usage and Poor Data Management

  • Allocating additional memory buffers for metadata

§ Cost: extra memory use

  • If the in-memory buffer is full → spill data to disk

§ Cost: poor data management

[Diagram: with static allocation, each process (P0, P1) runs map into fixed send buffers, a staging area, and receive buffers.]

SLIDE 13

Tackling Shortcomings of a Correct MR Model

Shortcomings: extra synchronizations, extra data staging, extra memory use, and poor data management.

A journey to design and implement Mimir, an efficient MR over MPI framework, that also addresses:

  • Memory inefficiency
  • Load balancing issues
  • I/O variability
SLIDE 14

Impact: Out-of-memory Operations

Single-node execution time of WordCount (Wikipedia) with MR-MPI on Comet (128 GB memory) [1]

Out-of-memory processing

  • Existing MapReduce over MPI implementations still struggle with memory limits: they can process only 4 GB of data in memory on a 128 GB node

[1] T. Gao, Y. Guo, B. Zhang, P. Cicotti, Y. Lu, P. Balaji, and M. Taufer. Mimir: Memory-Efficient and Scalable MapReduce for Large Supercomputing Systems. In Proceedings of the IPDPS, 2017.

SLIDE 15

Reduce Synchronization and Extra Data Staging

  • Interleave operations: e.g., map interleaves with aggregate

[Diagram: on each process P0 … Pn, map interleaves with aggregate, so the two phases no longer need a barrier between them; convert and reduce follow.]

Improvements: 1. Reduce synchronization; 2. Reduce extra data staging
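A minimal sketch of one way to interleave map with aggregate, assuming (as an illustration, not Mimir's actual code) that emitted pairs go straight into per-destination send buffers and that all ranks periodically agree, via MPI_Allreduce, whether to flush them with a collective exchange. The point is that the shuffle overlaps the map phase instead of waiting behind a global barrier after it.

#include <mpi.h>
#include <functional>
#include <string>
#include <vector>

// Illustrative sketch of interleaving map with aggregate. Each emitted
// <key,value> pair is appended to the send buffer of the process that owns
// the key; at regular points all ranks agree whether any buffer is full and,
// if so, exchange buffers collectively, overlapping shuffle with map.
class InterleavedShuffle {
public:
    InterleavedShuffle(MPI_Comm comm, std::size_t buf_limit)
        : comm_(comm), limit_(buf_limit) {
        MPI_Comm_size(comm_, &nprocs_);
        send_bufs_.resize(nprocs_);
    }

    // Called by the user map function for every <key,value> pair.
    void emit(const std::string& key, const std::string& value) {
        int dest = static_cast<int>(std::hash<std::string>{}(key) % nprocs_);
        std::string& buf = send_bufs_[dest];
        buf += key; buf += '\t'; buf += value; buf += '\n';
        if (buf.size() >= limit_) need_flush_ = 1;
    }

    // Called collectively at regular points during the map phase.
    void maybe_exchange() {
        int any_full = 0;
        MPI_Allreduce(&need_flush_, &any_full, 1, MPI_INT, MPI_MAX, comm_);
        if (!any_full) return;
        // Exchange buffer sizes; moving the buffers with MPI_Alltoallv and
        // unpacking them into the local KV container is omitted for brevity.
        std::vector<int> send_counts(nprocs_), recv_counts(nprocs_);
        for (int i = 0; i < nprocs_; ++i)
            send_counts[i] = static_cast<int>(send_bufs_[i].size());
        MPI_Alltoall(send_counts.data(), 1, MPI_INT,
                     recv_counts.data(), 1, MPI_INT, comm_);
        for (auto& b : send_bufs_) b.clear();
        need_flush_ = 0;
    }

private:
    MPI_Comm comm_;
    std::size_t limit_;
    int nprocs_ = 1;
    int need_flush_ = 0;
    std::vector<std::string> send_bufs_;
};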

SLIDE 16

(Animation of the previous slide: the same interleaved workflow, now shown with the barrier between map and aggregate removed.)

SLIDE 17

Optimizing Intermediate Data Management

  • Use the send buffer directly as the output of map

§ Avoid extra buffer usage

  • Use the KV/KMV container as the staging area

§ Dynamically allocate one or multiple pages

[Diagram: with dynamic allocation, each process (P0, P1) maps directly into its send buffer and stages intermediate data in a KV container that grows page by page.]

Improvements: 3. Avoid extra memory buffer usage; 4. Manage intermediate data more efficiently
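A sketch of a page-based KV container, assuming (illustratively, not Mimir's actual data structure) that records are appended to fixed-size pages allocated on demand, so memory grows with the data actually staged rather than being reserved statically up front.

#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Illustrative page-based KV container: pages are allocated only when the
// current one is full, so the footprint tracks the staged data.
class KVContainer {
public:
    explicit KVContainer(std::size_t page_size = 64u << 20)  // 64 MB pages
        : page_size_(page_size) {}

    // Append one <key,value> record; allocates a new page when the current
    // one cannot hold it. (Records larger than a page are not handled here.)
    void add(const char* key, std::size_t klen,
             const char* val, std::size_t vlen) {
        std::size_t need = klen + vlen + 2;  // record plus two separators
        if (pages_.empty() || used_ + need > page_size_) {
            pages_.push_back(std::make_unique<char[]>(page_size_));
            used_ = 0;
        }
        char* dst = pages_.back().get() + used_;
        std::memcpy(dst, key, klen);            dst[klen] = '\0';
        std::memcpy(dst + klen + 1, val, vlen); dst[klen + 1 + vlen] = '\0';
        used_ += need;
    }

    std::size_t num_pages() const { return pages_.size(); }

private:
    std::size_t page_size_;
    std::size_t used_ = 0;
    std::vector<std::unique_ptr<char[]>> pages_;
};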

SLIDE 18

Mimir vs. MR-MPI: WordCount on Comet

  • Single-node execution (24 processes, 128 GB memory)

§ Benchmark: WordCount (WC) with the Wikipedia dataset
§ Settings: MR-MPI (64 MB page and 512 MB page); Mimir (64 MB page)

[Chart: execution time vs. dataset size, annotated with 64X and 4X; Mimir can handle a 4X larger dataset.]

[1] T. Gao, Y. Guo, B. Zhang, P. Cicotti, Y. Lu, P. Balaji, and M. Taufer. Mimir: Memory-Efficient and Scalable MapReduce for Large Supercomputing Systems. In Proceedings of the IPDPS, 2017.

SLIDE 19

Impact: Load Imbalance

  • <key,value> pairs are NOT distributed evenly among processes

§ Imbalanced <key,value> pairs may cause poor resource usage

[Left chart: execution time (total time, sec) of WordCount (Wikipedia) with Mimir on Tianhe-2 without load balancing. Right chart: number of <key,value> pairs per process for WordCount (Wikipedia) on 768 processes (x-axis: process ids).]

SLIDE 20

Impact: Load Imbalance

  • <key,value> pairs are NOT distributed evenly among processes

§ Imbalanced <key,value> pairs may cause poor resource usage

[Left chart: balance ratio (max memory / min memory across all processes) for WordCount (Wikipedia) with Mimir on Tianhe-2. Right chart: number of <key,value> pairs per process for WordCount (Wikipedia) on 768 processes (x-axis: process ids).]

SLIDE 21

Combining <key,value> Pairs

  • Combiner operations:

§ Merge <key,value> pairs with the same key before the shuffle
§ Merge <key,value> pairs with the same key after the shuffle

  • Application dependent:

§ WordCount → YES
§ Join → NO

[Diagram: a combine step runs on each process P0 … Pn both before and after the interleaved map/aggregate exchange of <key, value> pairs.]

SLIDE 22

Combining <key,value> Pairs

  • Merge <key,value> pairs with the same key before shuffle
  • Merge <key,value> pairs with the same key after shuffle

WordCount example with a combiner: the input lines "Hello World", "Hi World", "Hello World", and "Hello" are mapped to <Hello,1>, <World,1>, <Hi,1>, … pairs; local combining produces partial counts such as <Hello,2> and <World,2> before the shuffle, and the reduce side emits <Hello,3>, <World,3>, and <Hi,1>.
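A minimal sketch of a map-side combiner for WordCount, under the assumption (illustrative, not Mimir's implementation) that each process keeps a small hash map of partial counts and forwards one combined pair per distinct word at shuffle time. Combining applies only when the reduction is associative and commutative, which is why WordCount benefits and Join does not.

#include <cstdint>
#include <string>
#include <unordered_map>

// Illustrative local combiner: sums counts per key on each process before
// the shuffle, so only one <word, partial_count> pair per word is sent.
class WordCountCombiner {
public:
    // Called instead of emitting <word, 1> directly to the send buffer.
    void emit(const std::string& word, std::int64_t count) {
        partial_[word] += count;   // merge pairs with the same key
    }

    // At shuffle time, forward one combined pair per key, then reset.
    template <typename Sender>
    void flush(Sender&& send) {
        for (const auto& [word, count] : partial_) send(word, count);
        partial_.clear();
    }

private:
    std::unordered_map<std::string, std::int64_t> partial_;
};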

SLIDE 23

Combiner Results: WordCount on Tianhe-2

[Charts: total time (sec), balance ratio, and memory usage (GB) vs. number of KV pairs (1e8), comparing Mimir with Mimir + combiner.]

T. Gao, Y. Guo, B. Zhang, P. Cicotti, Y. Lu, P. Balaji, and M. Taufer. Skew Mitigation in MapReduce for Supercomputing Systems. In preparation, 2017.

SLIDE 24

Files and IO Variability

  • Sizes of different input files can be very different

§ e.g., sequence file sizes in the genomes dataset

  • I/O performance variability lets a few slow processes hold back the progress of all processes

[Left chart: sequence file sizes (GB) in the 1000 Genomes dataset vary from a few MB to a few hundred GB. Right chart: read time (sec) per process rank when reading 6 TB of genomics data with 768 processes (8 GB per process) on Tianhe-2.]

SLIDE 25

Streaming IO Model

  • Files are viewed as segments of one continuous data stream

§ Files are cut into many equal-size chunks
§ The MR application provides a chunking function to Mimir (see the sketch below)

[Diagram: a sequence file distributed across processes P0 … P3; in the discrete I/O model each process reads whole files, while in the stream I/O model equal-size chunks feed the map tasks, whose <key,value> pairs flow through Mimir's shuffle communication.]
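A sketch of the chunk-boundary handling a streaming I/O model needs, assuming (illustratively) a text-like format whose records end at a separator character; the function name and signature are made up for this example. A chunk is read at a fixed offset and size and then extended to the next record separator so no record is split; a consumer of a non-initial chunk would symmetrically skip everything up to its first separator.

#include <cstddef>
#include <cstdio>
#include <string>

// Illustrative chunk reader for a streaming I/O model. Reads chunk_size
// bytes starting at offset, then extends the chunk to the next record
// separator so records are never split across chunks.
std::string read_chunk(std::FILE* f, long offset, long chunk_size,
                       char record_sep = '\n') {
    std::fseek(f, offset, SEEK_SET);

    std::string chunk(static_cast<std::size_t>(chunk_size), '\0');
    std::size_t got = std::fread(&chunk[0], 1, chunk.size(), f);
    chunk.resize(got);

    // Only extend if we filled the whole chunk (i.e., we did not hit EOF).
    if (got == static_cast<std::size_t>(chunk_size)) {
        int c;
        while ((c = std::fgetc(f)) != EOF) {
            chunk.push_back(static_cast<char>(c));
            if (c == record_sep) break;
        }
    }
    return chunk;
}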

SLIDE 26

Work Stealing: General Idea

  • File chunks are statically partitioned across processes initially
  • Once a process finishes its own work, it tries to steal a chunk from a chosen victim process using fetch_and_add
  • Each process manages a chunk map to keep track of where the “stolen” chunks have gone
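A sketch of the stealing step with MPI one-sided communication, assuming (illustratively) that each process exposes a 64-bit counter of its next unprocessed chunk in an RMA window created with MPI_Win_allocate; the helper below is made up for this example, not Mimir's code. MPI_Fetch_and_op atomically reads and increments the victim's counter, so two thieves can never claim the same chunk. The chunk-map update shown on the next slides (an atomic PUT recording where the stolen chunk went) is omitted here.

#include <mpi.h>
#include <cstdint>

// Illustrative helper for stealing a chunk index from a victim process.
// Assumes each process exposes one std::int64_t counter (the index of its
// next unprocessed chunk) at displacement 0 of an RMA window.
std::int64_t steal_chunk(MPI_Win win, int victim, std::int64_t victim_num_chunks) {
    const std::int64_t one = 1;
    std::int64_t claimed = 0;

    MPI_Win_lock(MPI_LOCK_SHARED, victim, 0, win);
    // Atomically fetch the victim's "next chunk" counter and add 1 to it,
    // so no other thief (or the victim itself) can claim the same index.
    MPI_Fetch_and_op(&one, &claimed, MPI_INT64_T, victim,
                     /*target_disp=*/0, MPI_SUM, win);
    MPI_Win_unlock(victim, win);

    if (claimed < victim_num_chunks) return claimed;  // this chunk is ours
    return -1;  // victim has no work left; the caller picks another victim
}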

SLIDE 27

Work Stealing: Implementation

  • Processes acquire a local chunk

[Diagram: processes P0 and P1, each with a steal offset, chunk ids, and a chunk map; a local chunk is acquired with a fetch-and-op (FOP) on the steal offset followed by an atomic PUT into the chunk map.]

SLIDE 28

Work Stealing: Implementation

  • Processes steal a remote chunk

[Diagram: stealing a remote chunk combines an atomic GET on the victim's chunk map, a fetch-and-op (FOP) on the victim's steal offset, and an atomic PUT recording the new owner in the chunk map.]
SLIDE 29

Real Bioinformatics Application: K-mer Counting

  • A fundamental operation in genome analytics with several use cases:

§ K-mer counting is fundamental to analyzing or estimating genome assembly (the number of k-mers determines the graph size)
§ Core tool for understanding similarities in genomic samples

  • E.g., the rate of increase in k-mer counts explains how similar multiple genomes are

§ Error validation tool

  • When a newly processed genome is merged with a reference genome, a drastic increase in k-mer counts indicates many errors

SLIDE 30

Integration in Real Bioinformatics Application

  • Bloomfish combines single-node k-mer counting (i.e., JellyFish) with our MapReduce framework (i.e., Mimir)

[Diagram: sequence files feed map tasks that emit <mer, NULL> pairs; JellyFish hash arrays on each process perform in-situ analysis, and Mimir's shuffle communication combines the partial results into the output files.]
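A sketch of the map-side k-mer extraction, an illustrative stand-in for what JellyFish does inside Bloomfish (the function name and the use of std::string keys are assumptions; real k-mer counters pack bases into compact 2-bit encodings). Each read is decomposed into its overlapping k-mers, and each k-mer becomes a key, matching the <mer, NULL> pairs in the diagram above.

#include <cstddef>
#include <string>
#include <vector>

// Illustrative k-mer extraction: decompose one read into its overlapping
// k-mers; each returned k-mer would be emitted as a <mer, NULL> key.
std::vector<std::string> extract_kmers(const std::string& read, std::size_t k = 22) {
    std::vector<std::string> kmers;
    if (read.size() < k) return kmers;            // read shorter than k
    kmers.reserve(read.size() - k + 1);
    for (std::size_t i = 0; i + k <= read.size(); ++i) {
        kmers.push_back(read.substr(i, k));       // one overlapping k-mer
    }
    return kmers;
}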
SLIDE 31

Analyze Large-scale DNA Dataset

Data source: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp//phase3/data/

[Chart: weak scalability (8 GB/process) for the 1000 Genomes dataset on up to 24 processes (3 TB) of Tianhe-2.]

Jellyfish can count 22-mers of a 3 TB dataset with 24 processes in about 24 hours.

SLIDE 32

Analyze Large-scale DNA Dataset

[Chart: weak scalability (8 GB/process) for the 1000 Genomes dataset on up to 3,072 processes (24 TB) of Tianhe-2.]

Bloomfish (Jellyfish on top of Mimir) can count 22-mers of a 24 TB dataset with 3,072 processes in about one hour.

T. Gao, Y. Guo, Y. Wei, B. Wang, Y. Lu, P. Cicotti, P. Balaji, and M. Taufer. Bloomfish: A Highly Scalable Distributed K-mer Counting Framework. In Proceedings of the ICPADS, 2017.
SLIDE 33

Lessons Learned

  • We present our journey to build Mimir, a memory-efficient and scalable MapReduce over MPI

§ Mimir can handle a 16X larger dataset in memory compared with MR-MPI
§ Mimir scales to 16,384 processes
§ Mimir is open-source: https://github.com/TauferLab/Mimir.git

  • We co-design I/O in Mimir to support a stream I/O model, work stealing, and nonblocking collective communication

  • We integrate Mimir into Bloomfish, a highly scalable k-mer counting framework, and show substantial data scaling

Join the panel today at 15:30 to discuss MPI on Post-Exascale Systems