


SLIDE 1

HAMA: An Efficient Matrix Computation with the MapReduce Framework

IEEE CLOUDCOM 2010 Workshop, Indianapolis, USA

Sangwon Seo, Computer Architecture Lab, KAIST, 1 Dec. 2010

SLIDE 2

Introduction

 The volume of information is increasing as modern science evolves

 Data-intensive processing is required

 MapReduce is one of the data-intensive programming models

 Massive matrix/graph computations are often used as primary functionalities

 Thus, we provide an easy-to-use tool for data-intensive scientific computation, called HAMA

SLIDE 3

Apache HAMA

 Now an incubating project in the ASF

 The fundamental design has changed from matrix computation with MapReduce to graph processing with BSP

 A mimic of Pregel running on HDFS

 Uses ZooKeeper as a synchronization barrier

 Officially sponsored by Ahems Inc.

SLIDE 4

Our Focus

 This paper is about the previous version of HAMA

 It focuses only on matrix computation with MapReduce

 It presents simple case studies

SLIDE 5

The HAMA Architecture

 We propose a distributed scientific framework called HAMA (based on HPMR)

 Provides transparent matrix/graph primitives

[Architecture diagram] The HAMA Shell and the HAMA API sit on top of the HAMA Core; the Computation Engine is pluggable (plugged in/out: MapReduce / HPMR, BSP, Dryad); Storage Systems (HDFS, File, RDBMS, HBase) hold the data; ZooKeeper provides distributed locking.
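The pluggable layering can be sketched as a small interface, purely as an illustration; ComputationEngine, InMemoryEngine, and HamaApiSketch are hypothetical names, not HAMA's real API:

```java
// Illustrative sketch only: these class names are hypothetical, not HAMA's
// actual classes. The point is the architecture: the API stays fixed while
// the computation engine (MapReduce/HPMR, BSP, Dryad, ...) is plugged in/out.
interface ComputationEngine {
    double[][] multiply(double[][] a, double[][] b);
}

class InMemoryEngine implements ComputationEngine {
    // A trivial stand-in engine: plain triple-loop matrix multiplication.
    public double[][] multiply(double[][] a, double[][] b) {
        double[][] c = new double[a.length][b[0].length];
        for (int i = 0; i < a.length; i++)
            for (int k = 0; k < b.length; k++)
                for (int j = 0; j < b[0].length; j++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }
}

public class HamaApiSketch {
    private final ComputationEngine engine; // plugged in/out

    public HamaApiSketch(ComputationEngine engine) { this.engine = engine; }

    public double[][] mult(double[][] a, double[][] b) {
        return engine.multiply(a, b);
    }

    public static void main(String[] args) {
        HamaApiSketch api = new HamaApiSketch(new InMemoryEngine());
        double[][] c = api.mult(new double[][]{{1, 2}}, new double[][]{{3}, {4}});
        System.out.println(c[0][0]); // 1*3 + 2*4 = 11.0
    }
}
```

Swapping in a different engine (say, a BSP-backed one) would leave callers of `mult` untouched, which is the design intent the diagram conveys.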

SLIDE 6

Case Study

 With a case study approach, we introduce two basic primitives with the MapReduce model running on HAMA

 Matrix multiplication and finding a linear solution

 And compare them with MPI versions of these primitives

SLIDE 7

Case Study

 Representing matrices

 By default, HAMA uses HBase (a NoSQL database)

 HBase is modeled after Google’s Bigtable
 A column-oriented, semi-structured distributed database with high scalability

SLIDE 8

Case Study – Multiplication (1/3)

 Iterative approach (Algorithm)

INPUT: key,     /* the row index of B */
       value,   /* the column vector of the row */
       context  /* IO interface (HBase) */

void map(ImmutableBytesWritable key, Result value, Context context) {
  double ijth = currVector.get(key);
  /* Multiplication */
  SparseVector mult = new SparseVector(value).scale(ijth);
  context.write(nKey, mult.getEntries());
}

INPUT: key,     /* key emitted by the map task */
       value,   /* values emitted by the map task */
       context  /* IO interface (HBase) */

void reduce(IntWritable key, Iterable<MapWritable> values, Context context) {
  SparseVector sum = new SparseVector();
  for (MapWritable value : values) {
    sum.add(new SparseVector(value));
  }
  context.write(key, sum.getEntries()); /* emit the accumulated row */
}
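To make the row-wise scheme concrete, here is a minimal in-memory simulation of the same map/shuffle/reduce flow in plain Java (no Hadoop or HBase; the shuffle buffer is just a HashMap): for C = A·B, the map phase scales row j of B by a[i][j] and keys the result by output row i, and the reduce phase sums all vectors that share a key.

```java
import java.util.*;

// In-memory simulation of the iterative map/reduce multiplication scheme
// (illustrative only, no Hadoop): map scales rows of B, reduce sums them.
public class IterativeMult {
    public static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length, m = b[0].length;
        // "shuffle" buffer: output row index -> list of partial row vectors
        Map<Integer, List<double[]>> shuffled = new HashMap<>();
        // map phase: each entry a[i][j] scales row j of B, keyed by row i
        for (int i = 0; i < n; i++)
            for (int j = 0; j < a[i].length; j++) {
                double[] scaled = new double[m];
                for (int k = 0; k < m; k++) scaled[k] = a[i][j] * b[j][k];
                shuffled.computeIfAbsent(i, x -> new ArrayList<>()).add(scaled);
            }
        // reduce phase: sum the partial vectors for each output row
        double[][] c = new double[n][m];
        for (Map.Entry<Integer, List<double[]>> e : shuffled.entrySet())
            for (double[] v : e.getValue())
                for (int k = 0; k < m; k++) c[e.getKey()][k] += v[k];
        return c;
    }

    public static void main(String[] args) {
        double[][] c = multiply(new double[][]{{1, 2}, {3, 4}},
                                new double[][]{{5, 6}, {7, 8}});
        System.out.println(Arrays.deepToString(c)); // [[19.0, 22.0], [43.0, 50.0]]
    }
}
```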

SLIDE 9

Case Study – Multiplication (2/3)

 Block approach

 Minimize data movement (network cost)

C_block(1,1) = A_block(1,1) * B_block(1,1) + A_block(1,2) * B_block(2,1)

SLIDE 10

Case Study – Multiplication (3/3)

 Block approach (Algorithm)

INPUT: key,     /* the row index of B */
       value,   /* the column vector of the row */
       context  /* IO interface (HBase) */

void map(ImmutableBytesWritable key, Result value, Context context) {
  SubMatrix a = new SubMatrix(value, 0);
  SubMatrix b = new SubMatrix(value, 1);
  SubMatrix c = a.mult(b); /* In-memory */
  context.write(new BlockID(key.get()), new BytesWritable(c.getBytes()));
}

INPUT: key,     /* key emitted by the map task */
       value,   /* values emitted by the map task */
       context  /* IO interface (HBase) */

void reduce(BlockID key, Iterable<BytesWritable> values, Context context) {
  SubMatrix s = null;
  for (BytesWritable value : values) {
    SubMatrix b = new SubMatrix(value);
    if (s == null) { s = b; } else { s = s.add(b); }
  }
  context.write(...);
}
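A minimal in-memory analogue of the block scheme (illustrative only, no Hadoop; `multiplyBlocked` is a hypothetical helper, not HAMA's API): the inner block multiplication plays the role of one map task over an (A-block, B-block) pair, and the accumulation plays the role of the reduce over a shared block ID, e.g. C(0,0) = A(0,0)·B(0,0) + A(0,1)·B(1,0) in zero-based indexing.

```java
// In-memory sketch of block matrix multiplication: map = multiply one block
// pair, reduce = sum partial products that target the same output block.
public class BlockMult {
    static double[][] mult(double[][] x, double[][] y) {
        double[][] z = new double[x.length][y[0].length];
        for (int i = 0; i < x.length; i++)
            for (int k = 0; k < y.length; k++)
                for (int j = 0; j < y[0].length; j++)
                    z[i][j] += x[i][k] * y[k][j];
        return z;
    }

    static double[][] add(double[][] x, double[][] y) {
        double[][] z = new double[x.length][x[0].length];
        for (int i = 0; i < x.length; i++)
            for (int j = 0; j < x[0].length; j++)
                z[i][j] = x[i][j] + y[i][j];
        return z;
    }

    // A and B are grids of sub-matrices: A[blockRow][blockCol] is one block.
    public static double[][][][] multiplyBlocked(double[][][][] A, double[][][][] B) {
        int p = A.length, r = A[0].length, q = B[0].length;
        double[][][][] C = new double[p][q][][];
        for (int i = 0; i < p; i++)
            for (int j = 0; j < q; j++)
                for (int k = 0; k < r; k++) {
                    double[][] partial = mult(A[i][k], B[k][j]);  // "map": one block pair
                    C[i][j] = (C[i][j] == null) ? partial         // "reduce": sum partials
                                                : add(C[i][j], partial);
                }
        return C;
    }

    public static void main(String[] args) {
        double[][] one = {{1}};
        double[][][][] A = {{one, one}}, B = {{one}, {one}};
        // C(0,0) = A(0,0)*B(0,0) + A(0,1)*B(1,0) = 1 + 1
        System.out.println(multiplyBlocked(A, B)[0][0][0][0]); // 2.0
    }
}
```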

SLIDE 11

Case Study - Finding linear solution

 Finding a linear solution

 Cramer’s rule
 Conjugate Gradient Method

SLIDE 12

Case Study - Finding linear solution

 Cramer’s rule

x_j = det(A_j) / det(A), where A_j is A with its j-th column replaced by b

[Diagram] For each parallel task j from 1 to n, a CramerMapper reads input splits from HBase; MapReducers compute det(A_j) and one MapReducer computes det(A); the CramerReducer combines them into the output result.
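A sequential sketch of the same rule (illustrative only, no MapReduce; determinants are expanded directly, which is only practical for small n): each of the n "parallel tasks" would compute det(A_j), and the reducer would divide each by det(A).

```java
// Cramer's rule: x_j = det(A_j) / det(A), where A_j is A with column j
// replaced by b. det() uses cofactor expansion (fine for small matrices).
public class Cramer {
    static double det(double[][] m) {
        int n = m.length;
        if (n == 1) return m[0][0];
        double d = 0;
        for (int c = 0; c < n; c++) {
            double[][] minor = new double[n - 1][n - 1];
            for (int i = 1; i < n; i++)
                for (int j = 0, k = 0; j < n; j++)
                    if (j != c) minor[i - 1][k++] = m[i][j];
            d += ((c % 2 == 0) ? 1 : -1) * m[0][c] * det(minor);
        }
        return d;
    }

    public static double[] solve(double[][] a, double[] b) {
        int n = a.length;
        double detA = det(a);
        double[] x = new double[n];
        for (int j = 0; j < n; j++) {   // one "parallel task" per column j
            double[][] aj = new double[n][];
            for (int i = 0; i < n; i++) {
                aj[i] = a[i].clone();
                aj[i][j] = b[i];        // replace column j of A with b
            }
            x[j] = det(aj) / detA;
        }
        return x;
    }

    public static void main(String[] args) {
        // 2x + y = 5, x + 3y = 10  ->  x = 1, y = 3
        double[] x = solve(new double[][]{{2, 1}, {1, 3}}, new double[]{5, 10});
        System.out.println(x[0] + " " + x[1]); // 1.0 3.0
    }
}
```

Each det(A_j) is independent of the others, which is why the slide can assign one mapper/reducer chain per column.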

SLIDE 13

Case Study - Finding linear solution

 Conjugate Gradient Method

 Find a direction (conjugate direction)
 Find a step size (line search)

SLIDE 14

Case Study - Finding linear solution

 Conjugate Gradient Method (Algorithm)

/* Using nested-map interface */
void map(ImmutableBytesWritable key, Result value, Context context) {
  /* For line search */
  g = g.add(-1.0, mult(x).getRow(0));
  alpha_new = g.transpose().mult(d) / d.transpose().mult(q);

  /* Find the conjugate direction */
  d = g.mult(-1).add(d.mult(alpha));
  q = A.mult(d);
  alpha = g.transpose().mult(d) / d.transpose().mult(q);

  /* Update x with step size (alpha) along the direction */
  x = x.add(d.mult(alpha));

  /* Termination check: the length of the direction is sufficiently
     small, or x has converged to a fixed value */
  if (checkTermination(d, x.getRow(0))) {
    /* Pass the solution (x) to the reducer */
    context.write(new BlockID(key.get()), new BytesWritable(x.getBytes()));
  } else {
    context.write(new BlockID(key.get()), null);
  }
}
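For reference, the same method in plain sequential Java (illustrative only, no MapReduce; this is the standard residual-based formulation rather than the slide's exact update order), solving A x = b for a symmetric positive-definite A:

```java
// Sequential conjugate gradient: alternate a line search (step size alpha)
// with a conjugate-direction update (coefficient beta) until the residual
// is small enough.
public class ConjugateGradient {
    static double[] matVec(double[][] a, double[] x) {
        double[] y = new double[a.length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < x.length; j++) y[i] += a[i][j] * x[j];
        return y;
    }

    static double dot(double[] u, double[] v) {
        double s = 0;
        for (int i = 0; i < u.length; i++) s += u[i] * v[i];
        return s;
    }

    public static double[] solve(double[][] a, double[] b, int maxIter, double tol) {
        int n = b.length;
        double[] x = new double[n];
        double[] r = b.clone();          // residual r = b - A x (x starts at 0)
        double[] d = r.clone();          // initial conjugate direction
        double rsOld = dot(r, r);
        for (int it = 0; it < maxIter && Math.sqrt(rsOld) > tol; it++) {
            double[] q = matVec(a, d);
            double alpha = rsOld / dot(d, q);            // step size (line search)
            for (int i = 0; i < n; i++) { x[i] += alpha * d[i]; r[i] -= alpha * q[i]; }
            double rsNew = dot(r, r);
            double beta = rsNew / rsOld;                 // conjugate-direction coefficient
            for (int i = 0; i < n; i++) d[i] = r[i] + beta * d[i];
            rsOld = rsNew;
        }
        return x;
    }

    public static void main(String[] args) {
        // A = [[4,1],[1,3]], b = [1,2]  ->  x = [1/11, 7/11]
        double[] x = solve(new double[][]{{4, 1}, {1, 3}}, new double[]{1, 2}, 100, 1e-10);
        System.out.println(x[0] + " " + x[1]);
    }
}
```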

SLIDE 15

Evaluations

 TUSCI (TU Berlin SCI) Cluster

 16 nodes, each with two Intel P4 Xeon processors and 1 GB of memory
 Connected by an SCI (Scalable Coherent Interface) network interface in a 2D torus topology
 Running on OpenCCS (an environment similar to HOD)

 Test sets

Workload                 | HAMA        | MPI
Matrix Multiplication    | Hadoop HPMR | CXML (HP)
Conjugate Gradient (CG)  | Hadoop HPMR | CXML (HP)

SLIDE 16

Evaluations

 The comparison of average execution time and scaleup with Matrix Multiplication

scaleup = log(T(dimension) / T(500))
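The scaleup metric can be read as a one-liner (assuming natural log; the slide does not state the base): identical timings give a scaleup of 0, and runs that slow down at larger dimensions give negative values.

```java
// scaleup = log(T(dimension) / T(500)): 0 means the execution time matched
// the 500-dimension baseline; negative means it was slower.
public class Scaleup {
    public static double scaleup(double tDim, double tBase) {
        return Math.log(tDim / tBase);
    }

    public static void main(String[] args) {
        System.out.println(scaleup(10.0, 10.0)); // 0.0
        System.out.println(scaleup(20.0, 10.0) > 0); // true: faster-than-baseline ratio > 1
    }
}
```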

SLIDE 17

Evaluations

 The comparison of average execution time and scaleup with CG

scaleup = log(T(dimension) / T(500))

SLIDE 18

Evaluations

 The comparison of average execution time with CG, when a single node is overloaded

SLIDE 19

Conclusion

 HAMA provides an easy-to-use tool for data-intensive computations

 Matrix computation with MapReduce
 Graph computation with BSP

 We are going to provide the HAMA package as a SaaS on a cloud platform

SLIDE 20

Q & A

SLIDE 21