
Mrs. · Iterative MapReduce · Performance and Case Studies

Mrs: High Performance MapReduce for Iterative and Asynchronous Algorithms in Python
Jeff Lund, Chace Ashcraft, Andrew McNabb and Kevin Seppi
Brigham Young University
November 14, 2016

What is Mrs?
- Simple, easy-to-use MapReduce framework
- Implemented in pure Python
- Designed with scientific computing in mind

MapReduce
[Figure: several Input splits, each feeding a Map task, with Map outputs flowing into Reduce tasks]

Example: WordCount

wordcount.py:

    import mrs

    class WordCount(mrs.MapReduce):
        def map(self, line_num, line_text):
            # Emit (word, 1) for every whitespace-separated word in the line.
            for word in line_text.split():
                yield (word, 1)

        def reduce(self, word, counts):
            # All counts for one word arrive together; emit their total.
            yield sum(counts)

    if __name__ == '__main__':
        mrs.main(WordCount)
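The WordCount class only supplies map and reduce; grouping values by key is left to the framework. As a rough illustration of those semantics (not of how Mrs actually schedules tasks or stores data), this sketch runs the same map and reduce logic in memory. The names run_local, wc_map and wc_reduce are hypothetical and introduced only for this example.

```python
from collections import defaultdict


def wc_map(line_num, line_text):
    # Same logic as WordCount.map, written as a plain function.
    for word in line_text.split():
        yield (word, 1)


def wc_reduce(word, counts):
    # Same logic as WordCount.reduce: one total per word.
    yield sum(counts)


def run_local(lines, map_func, reduce_func):
    """Hypothetical in-memory driver: map every line, group values by key, reduce."""
    groups = defaultdict(list)
    for line_num, line_text in enumerate(lines):
        for key, value in map_func(line_num, line_text):
            groups[key].append(value)  # the "shuffle": collect values per key
    return {key: list(reduce_func(key, values))
            for key, values in sorted(groups.items())}


print(run_local(["the cat", "the hat"], wc_map, wc_reduce))
# {'cat': [1], 'hat': [1], 'the': [2]}
```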

Why Python?
- Python is nearly ubiquitous
- Mrs needs no dependencies outside of the standard library
- Familiarity and readability
- Easy interoperability
- Debugging and testing

Iterative MapReduce
[Figure: each input passes through a repeated chain of stages, Map → Reduce → Map → Reduce → …]
Performance challenges:
- CPU-bound problems
- Communication time
- Task management

Proposed Solutions
- Infrequent checkpointing
- Reduce-map tasks
- Generator-callback model
- Asynchronous scheduling model

How Often to Checkpoint
Let X be a random variable indicating that a failure occurred during an iteration. Then

$$X \sim \mathrm{Bernoulli}\left(\frac{1}{f}\left(t + \frac{c}{n}\right)\right)$$

where
- n: number of iterations between checkpoints
- t: time to perform each iteration
- c: extra time required for a checkpointed iteration
- f: mean time between failures in the cluster

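How Often to Checkpoint
If Y ∼ Uniform(n) is the number of iterations since the last checkpoint and r is the time needed to recover from a failure, then (treating X and Y as independent, with E[Y] = n/2) the expected number of seconds of extra work in an iteration is

$$E[X(r + Yt)] = \frac{1}{f}\left(t + \frac{c}{n}\right)\left(r + \frac{n}{2}t\right)$$

and the breakeven number of iterations, at which the amortized checkpoint cost c/n equals this expected extra work, is

$$n = \max\left(1, \frac{1}{t}\left(\sqrt{\left(\frac{c}{2} + r\right)^2 - 2c(r - f)} - \left(\frac{c}{2} + r\right)\right)\right).$$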
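The two checkpointing formulas can be checked numerically. This sketch assumes f is the mean time between failures and r the recovery time after a failure, all in the same time unit as t and c; expected_extra_work and breakeven_interval are hypothetical helper names, and the cluster parameters in the usage lines are made up for illustration.

```python
import math


def expected_extra_work(n, t, c, r, f):
    """E[X(r + Yt)] = (1/f)(t + c/n)(r + n t / 2), in seconds per iteration."""
    p_failure = (t + c / n) / f  # Bernoulli parameter of X
    return p_failure * (r + n * t / 2.0)


def breakeven_interval(t, c, r, f):
    """Interval n at which the amortized checkpoint cost c/n equals the
    expected extra work; clamped to at least one iteration."""
    a = c / 2.0 + r
    root = math.sqrt(a * a - 2.0 * c * (r - f))
    return max(1.0, (root - a) / t)


# Hypothetical cluster: 10 s iterations, 30 s checkpoint overhead,
# 60 s recovery, one failure every 10 hours on average.
n = breakeven_interval(t=10.0, c=30.0, r=60.0, f=36000.0)
print(round(n, 1), 30.0 / n, expected_extra_work(n, 10.0, 30.0, 60.0, 36000.0))
```

At the returned n the two printed costs agree, which is the breakeven condition. For smaller n the per-iteration checkpoint cost c/n exceeds the expected extra work from failures; for larger n the reverse holds.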

Iterative MapReduce: ReduceMap
[Figure: the iterative chain Input → Map → Reduce → Map → Reduce → … is replaced by Input → Map → ReduceMap → ReduceMap → …, where each ReduceMap task combines one iteration's reduce with the next iteration's map]
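Fusing a reduce with the following map changes where the work runs, not what it computes. In this sketch (make_reduce_map, separate_tasks, fused_tasks and the toy total and fan_out functions are hypothetical, not part of Mrs), the fused version feeds each reduce output straight into the map without materializing an intermediate dataset, yet produces the same pairs as running the two tasks separately.

```python
from collections import defaultdict


def shuffle(pairs):
    """Group (key, value) pairs by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def make_reduce_map(reduce_func, map_func):
    """Fuse a reduce and the next iteration's map into one task function."""
    def reduce_map(key, values):
        for reduced in reduce_func(key, values):
            # Feed each reduce output directly into the map in the same task,
            # instead of writing it out and scheduling a separate map task.
            yield from map_func(key, reduced)
    return reduce_map


def separate_tasks(pairs, reduce_func, map_func):
    intermediate = [(k, v) for k, vs in shuffle(pairs).items()
                    for v in reduce_func(k, vs)]
    return sorted(p for k, v in intermediate for p in map_func(k, v))


def fused_tasks(pairs, reduce_func, map_func):
    reduce_map = make_reduce_map(reduce_func, map_func)
    return sorted(p for k, vs in shuffle(pairs).items()
                  for p in reduce_map(k, vs))


def total(key, values):
    yield sum(values)


def fan_out(key, value):
    yield (key, value)
    yield (key + "!", value * 10)


pairs = [("a", 1), ("b", 2), ("a", 3)]
print(fused_tasks(pairs, total, fan_out) == separate_tasks(pairs, total, fan_out))  # True
```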

Generator-Callback Model

    def run_batches():
        data_path = input_path
        for iteration in range(MAX_ITERATIONS):
            output_path = make_temp_path()
            job = new_job(data_path, map_func, reduce_func, output_path)
            job.wait_for_completion()  # the driver blocks on every iteration
            data_path = output_path
            if iteration % CHECK_FREQUENCY == 0:
                data = read_all(data_path)
                perform_output(data)
                if converged(data):
                    break

Generator-Callback Model

    def generator(queue):
        dataset = input_data
        for iteration in range(MAX_ITERATIONS):
            output_path = make_temp_path()
            # Submitting returns a dataset handle; nothing waits for this iteration.
            dataset = mapreduce(dataset, map_func, reduce_func, output_path)
            if iteration % CHECK_FREQUENCY == 0:
                queue.submit(dataset, callback)
            else:
                queue.submit(dataset, None)

    def callback(data):
        data.read_all()
        perform_output(data)
        return not converged(data)  # returning False stops the computation
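The generator-callback pseudocode can be made concrete with a toy, single-process scheduler. Everything here (the Dataset class, drive, the lookahead parameter) is a hypothetical stand-in rather than the Mrs API: submitting an iteration only creates a handle, the driver completes iterations in order while the generator runs a few iterations ahead, and a callback returning False stops the run.

```python
from collections import deque


class Dataset:
    """Hypothetical handle for MapReduce output that is computed on demand."""

    def __init__(self, compute):
        self._compute = compute
        self._value = None
        self._ready = False

    def read_all(self):
        if not self._ready:  # materialize once, then reuse
            self._value = self._compute()
            self._ready = True
        return self._value


def generator(x0, step, max_iterations, check_frequency, converged, log):
    dataset = Dataset(lambda: x0)
    for iteration in range(max_iterations):
        # Submitting only builds a handle; the generator never waits here.
        dataset = Dataset(lambda prev=dataset: step(prev.read_all()))
        if iteration % check_frequency == 0:
            def callback(data, it=iteration):
                value = data.read_all()
                log.append((it, value))
                return not converged(value)  # False means "stop"
            yield dataset, callback
        else:
            yield dataset, None


def drive(pairs, lookahead=4):
    """Toy scheduler: keeps up to `lookahead` iterations submitted ahead of the
    one being completed; returns how many were submitted before stopping."""
    pending = deque()
    submitted = 0

    def complete_oldest():
        data, callback = pending.popleft()
        data.read_all()
        return callback is None or callback(data)

    for pair in pairs:
        pending.append(pair)
        submitted += 1
        if len(pending) > lookahead and not complete_oldest():
            return submitted
    while pending:
        if not complete_oldest():
            return submitted
    return submitted


log = []
submitted = drive(generator(0, lambda x: x + 1, 100, 5, lambda v: v >= 12, log))
print(log, submitted)  # [(0, 1), (5, 6), (10, 11), (15, 16)] 20
```

With lookahead=4, convergence is detected on iteration 15 after 20 iterations have been submitted; in this toy, those extra submissions are the cost of never blocking the generator on a check.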

Task Dependencies: Synchronous MapReduce

Task Dependencies: Asynchronous MapReduce

Task Execution Traces
[Figure: task execution traces for synchronous and asynchronous scheduling]

Performance and Case Studies
We demonstrate on two different problems:
- Particle Swarm Optimization: minimize the 250-dimensional Rosenbrock function
- Expectation Maximization: a mixture of multinomials model for clustering text documents

Particle Swarm Optimization
- Inspired by simulations of flocking birds
- Particles interact while exploring
- Map: motion and function evaluation
- Reduce: communication
- CPU-bound problem
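A serial sketch of the decomposition listed above: move_and_evaluate plays the role of the map (motion plus function evaluation for one particle), and taking the minimum over personal bests plays the role of the reduce (communication). The global-best topology and the commonly used constriction-derived constants w = 0.7298 and c = 1.49618 are assumptions, not details from the slides; dims defaults to 2 to keep the example fast, whereas the case study uses 250 dimensions.

```python
import random


def rosenbrock(x):
    """Rosenbrock function; its minimum value 0 is at x = (1, ..., 1)."""
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
               for i in range(len(x) - 1))


def move_and_evaluate(particle, gbest, rng, w=0.7298, c=1.49618):
    """'Map' step for one particle: update velocity and position, then evaluate."""
    pos, vel, pbest, pbest_val = particle
    new_vel = [w * v + c * rng.random() * (pb - x) + c * rng.random() * (gb - x)
               for x, v, pb, gb in zip(pos, vel, pbest, gbest)]
    new_pos = [x + v for x, v in zip(pos, new_vel)]
    val = rosenbrock(new_pos)
    if val < pbest_val:  # keep the best position this particle has seen
        pbest, pbest_val = new_pos, val
    return new_pos, new_vel, pbest, pbest_val


def pso(dims=2, n_particles=20, iterations=100, seed=0):
    rng = random.Random(seed)
    swarm = []
    for _ in range(n_particles):
        pos = [rng.uniform(-5.0, 5.0) for _ in range(dims)]
        swarm.append((pos, [0.0] * dims, pos, rosenbrock(pos)))
    history = []
    for _ in range(iterations):
        # 'Reduce' step (communication): every particle learns the swarm's best.
        gbest, gbest_val = min(((p[2], p[3]) for p in swarm), key=lambda b: b[1])
        history.append(gbest_val)
        swarm = [move_and_evaluate(p, gbest, rng) for p in swarm]
    return min(p[3] for p in swarm), history


best, history = pso()
print(best)
```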

Particle Swarm Optimization
[Figure: parallel efficiency (0 to 1) vs. number of subiterations (log scale, 10^0 to 10^3) for reduce-map tasks, rare checks, concurrent checks, no redundant storage, and redundant storage]
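The PSO plot and the EM table both report parallel efficiency. Assuming the usual definition (speedup divided by the number of processors; the slides do not spell it out), it can be computed as follows. The function name and the timings are hypothetical.

```python
def parallel_efficiency(serial_time, parallel_time, processors):
    """Speedup divided by processor count: 1.0 means perfect scaling."""
    speedup = serial_time / parallel_time
    return speedup / processors


# Hypothetical timings: 100 s on one processor, 10 s on 16 processors.
print(parallel_efficiency(100.0, 10.0, 16))  # 0.625
```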

Particle Swarm Optimization: Asynchronous
[Figure: average tasks per second vs. standard deviation of subiterations (0 to 20) for asynchronous and synchronous scheduling]
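A toy model helps explain why variation in subiteration times matters. With a barrier, every iteration waits for its slowest task; without one, each chain of dependent tasks proceeds at its own pace. The names makespans and tasks_per_second are hypothetical, and the normally distributed task times and per-chain dependencies are assumptions, not the measured workload.

```python
import random


def makespans(durations):
    """durations[i][j] is the run time of task j in iteration i."""
    # Synchronous: a barrier ends every iteration, so each iteration
    # lasts as long as its slowest task.
    synchronous = sum(max(row) for row in durations)
    # Asynchronous (toy assumption): task j of iteration i waits only for
    # task j of iteration i - 1, so each column is an independent chain.
    asynchronous = max(sum(column) for column in zip(*durations))
    return synchronous, asynchronous


def tasks_per_second(iterations, workers, mean, stdev, seed=0):
    """Throughput of both schedules for normally distributed task times."""
    rng = random.Random(seed)
    durations = [[max(0.01, rng.gauss(mean, stdev)) for _ in range(workers)]
                 for _ in range(iterations)]
    synchronous, asynchronous = makespans(durations)
    total = iterations * workers
    return total / synchronous, total / asynchronous


for stdev in (0.0, 0.5, 1.0):
    print(stdev, tasks_per_second(iterations=100, workers=16, mean=1.0, stdev=stdev))
```

With stdev=0.0 the two throughputs coincide. In this model the asynchronous throughput can never be lower, because no single chain takes longer than the sum of the per-iteration maxima.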

Particle Swarm Optimization: Asynchronous
[Figure: average tasks per second vs. number of processors (16 to 768) for synchronous and asynchronous scheduling]

Expectation Maximization

Feature set size  | 80    | 252   | 8000  | 25298
Reduce-map tasks  | 0.411 | 0.357 | 0.277 | 0.193
Rare checks       | 0.362 | 0.314 | 0.253 | 0.18
Redundant storage | 0.013 | 0.013 | 0.013 | 0.012

Parallel efficiency per iteration of EM for various feature set sizes.
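One natural MapReduce split of a mixture-of-multinomials EM iteration works like this. The map computes each document's cluster responsibilities (E-step) and emits expected word counts keyed by cluster. The reduce sums them per cluster and re-estimates that cluster (M-step). This is a hedged sketch of that split, not the Mrs implementation; the function names, the add-alpha smoothing and the tiny corpus are assumptions.

```python
import math
from collections import defaultdict


def e_step_map(doc, weights, word_probs):
    """'Map': cluster responsibilities for one document (a word -> count dict).
    The multinomial coefficient is omitted because it cancels across clusters."""
    log_post = [math.log(pi) + sum(n * math.log(word_probs[k][w]) for w, n in doc.items())
                for k, pi in enumerate(weights)]
    top = max(log_post)
    post = [math.exp(lp - top) for lp in log_post]  # stabilized exponentiation
    z = sum(post)
    for k, p in enumerate(post):
        r = p / z
        yield k, (r, {w: r * n for w, n in doc.items()})


def m_step_reduce(stats, vocab, alpha=1.0):
    """'Reduce': sum expected counts for one cluster, then re-estimate it
    (add-alpha smoothing is an assumption of this sketch)."""
    total_r = 0.0
    counts = defaultdict(float)
    for r, expected in stats:
        total_r += r
        for w, c in expected.items():
            counts[w] += c
    denom = sum(counts.values()) + alpha * len(vocab)
    return total_r, {w: (counts[w] + alpha) / denom for w in vocab}


def em_iteration(docs, weights, word_probs, vocab):
    grouped = defaultdict(list)
    for doc in docs:
        for k, stat in e_step_map(doc, weights, word_probs):
            grouped[k].append(stat)
    new_weights, new_probs = [], []
    for k in range(len(weights)):
        total_r, probs = m_step_reduce(grouped[k], vocab)
        new_weights.append(total_r / len(docs))
        new_probs.append(probs)
    return new_weights, new_probs


vocab = ["a", "b", "c"]
docs = [{"a": 3, "b": 1}, {"a": 2}, {"c": 4, "b": 1}, {"c": 3}]
weights = [0.5, 0.5]
word_probs = [{"a": 0.5, "b": 0.3, "c": 0.2}, {"a": 0.2, "b": 0.3, "c": 0.5}]
weights, word_probs = em_iteration(docs, weights, word_probs, vocab)
print(weights)
```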

Conclusion
By taking the following approaches, we have considerably improved performance for iterative parallel algorithms in Mrs:
- Infrequent checkpointing
- Reduce-map tasks
- Generator-callback model
- Asynchronous model

Where to find Mrs
Mrs homepage, with links to source, documentation, mailing list, etc.: https://github.com/byu-aml-lab/mrs-mapreduce
In case you forget the URL, just google "mrs mapreduce" :)
