SLIDE 1

Nap: Network-Aware Data Partitions for Efficient Distributed Processing

  • Mr. Or Raz, Prof. Chen Avin, Prof. Stefan Schmid

School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
Faculty of Computer Science, University of Vienna, Vienna, Austria

September 26, 2019


Hello everyone, my name is Or Raz. I am a Master's graduate from the School of Electrical and Computer Engineering at Ben-Gurion University of the Negev, Israel. This research was done with the support of Professors Chen Avin and Stefan Schmid, and my thesis is mainly based on this work. Today, I will talk about Nap, a scheme that takes the network into consideration when partitioning the data, and thereby minimizes the completion time in distributed processing frameworks, such as Hadoop.

SLIDE 2

Outline

1. Introduction and Motivation
2. Model and Problem
3. Nap
4. Proof-of-Concept and Conclusion


First, I introduce the motivation in general, then with a join example, and I give some empirical motivation. Next, I cover the model for the problem and the problem itself. Then, I go over the Nap scheme and its relation to Young's lattice. In the end, I go over the implementation and its difficulties, and introduce some points for future work.

SLIDE 3

Introduction

Nowadays, we are living in the Big Data era. Data is processed and stored in geographically distributed datacenters. Traditional query optimizations neglect the network.


  • The amount of data queried and processed by emerging applications is growing explosively (in many fields such as health, business, and science).
  • Traditionally, data processing frameworks were designed to run in homogeneous environments or within a single datacenter, but today this is less common, and processing is increasingly geographically distributed. This is because the data itself, at scale, is generated in a geographically distributed fashion (IoT).
  • Therefore, to maximize performance, we need to consider the available network resources, which have been neglected in the optimization analysis; otherwise we could get poor performance (wide-area analytics).
SLIDE 6

Introduction

Nowadays, we are living in the Big Data era. Data is processed and stored in geographically distributed datacenters. Traditional query optimizations neglect the network.

Contribution

Nap, a network-aware and adaptive mechanism for fast large-scale data processing based on MapReduce, such as joins.


  • Our contribution is Nap, a mechanism which minimizes the completion time in a network-aware manner and is optimized to the current network conditions. In addition, it doesn't require any logic modifications: it only fools the application into a better partitioning of the data.
  • We are particularly interested in workloads based on relational databases and consider the most fundamental operation in distributed data processing: joins.

SLIDE 7

Multiway Join

ACM Tables Example

Consider a small database of Papers, Papers-Authors, and Authors tables that we want to join: X(v,p) ⋈ Y(p,a) ⋈ Z(a,n).

X (v, p):
  Venue    Paper
  SIGMOD   SkewTune
  EuroSys  Riffle
  OSDI     MapReduce

Y (p, a):
  Paper      Author
  MapReduce  1
  MapReduce  2
  HaLoop     5
  S2RDF      3
  Riffle     4
  Kraken     6

Z (a, Name):
  Author  Name
  1       J. Dean
  7       Y. Kwon
  4       H. Zhang
  8       D. Ullman
  2       S. Ghemawat


First, let's take a look at these three tables, which have two join attributes, p and a.

SLIDE 8

Multiway Join


We consider an operation which joins all of these tables, X(v,p) ⋈ Y(p,a) ⋈ Z(a,n), where ⋈ denotes the join operator. Attributes: v is the venue, p the paper ID, a the author ID, and n the author name.

SLIDE 9

Multiway Join

ACM Tables Example

The cascade (sequential) join produces:

X ⋈ Y (v, p, a):
  Venue    Paper      Author
  OSDI     MapReduce  1
  OSDI     MapReduce  2
  EuroSys  Riffle     4

(X ⋈ Y) ⋈ Z (v, p, a, Name):
  Venue    Paper      Author  Name
  OSDI     MapReduce  1       J. Dean
  OSDI     MapReduce  2       S. Ghemawat
  EuroSys  Riffle     4       H. Zhang


The traditional way to join these tables is a cascade join, or sequential join, which takes a few phases, or only two for this example. This fundamental operation and others are done in a distributed manner when the tables are too large to store on one computer.

SLIDE 10

Multiway Join

ACM Tables Example

A cascade join is one option, but can we do better?


This raises the question: can we do better? Back in 2011, Afrati and Ullman, in "Optimizing multiway joins", showed a way to join all the tables in one phase with the MapReduce paradigm: a multiway join. This method involves replication of the tables. I am going to focus on this example today, due to its clear efficiency and because the join operation is used in many "Big Data" frameworks, including implementations in MapReduce.

SLIDE 11

Multiway Join

ACM Tables Example

Yes, in some cases a join of all the tables in one phase, a multiway join, is better!


SLIDE 12

MapReduce Model [DG08]

The model consists of four main phases: map, partition, shuffle, and reduce. There are m mapper and r reducer processes for the MapReduce job.

[Figure: mappers M1, ..., Mm feeding reducers R1, ..., Rr]


  • This is a state-of-the-art programming model for processing large data sets using parallelism and decentralization concepts.
  • It has a Master-Slave architecture.
  • All the phases are performed locally except the shuffle, which transfers data between the processes and thus can be a bottleneck for the whole execution.
  • It is a highly useful and efficient tool for large-scale, fault-tolerant data analysis with a crash recovery mechanism.
  • Apache Hadoop (batch processing with massive amounts of data) and Apache Spark (overcoming latency and the inability to stream data by exploiting memory) are software frameworks that use this paradigm.


SLIDE 14

Empirical Motivation

Hadoop cluster on AWS with 25 mappers transferring 1.6 GB to 3 reducers. Join of three tables from the ACM digital library, X(v,p) ⋈ Y(p,a) ⋈ Z(a,n).

[Figure: cluster setup spanning California, Virginia, and London]


Now we can present the essence of the network in wide-area processing with the following example. We perform a multiway join of three tables, using one master and three workers that transfer 1.6 GB between them. The workers share the same compute capabilities while being spread geographically across different places, and California's downlink is limited to 0.5 Gbps.

SLIDE 15

Comparison between Non-Adaptive and Adaptive Partitions

In the non-adaptive case, the data is partitioned uniformly.

[Figure: non-adaptive data partition in MB (100–700) per reducer: Virginia, California, London]


Originally, Hadoop partitions the data equally among its computers (reducers), but in this cluster we should consider the network when partitioning the data. We call this partition non-adaptive, as it does not adapt to the network.


SLIDE 18

Comparison between Non-Adaptive and Adaptive Partitions

In the non-adaptive case, California is the bottleneck of the join computation. In the adaptive case, the completion time is reduced by 20%.

[Figure: data partition (MB, 100–700) and completion time (sec, 50–250; shuffle, merge, reduce) per reducer, non-adaptive vs. adaptive]


We show the average results for ten runs of Hadoop jobs. (Explain the axes carefully.) The 1.6 GB is distributed equally, which leads to California's slow completion time, making it the bottleneck, and to Virginia's very fast time. The main difference between the three lies in California's slow downlink, which can clearly be seen from the dominance of the shuffle time in the completion time. Reduce and merge functions are performed locally while the shuffle involves communication; Virginia's shuffle and completion times are 111 and 147 seconds, whereas California's are 186 and 225 seconds. Thus, we modify the partitioning of the data, more wisely, to be aware of the reducers' downlinks. The reduce time does matter: it varies from 20 to 40 seconds, which is close to 14% of the completion time C. Nap might distribute the keys in a non-uniform way, which results in reducers receiving a non-equal share of the shuffled data to process, and a longer reduce function.
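To make the adaptive idea concrete, here is a minimal sketch (my own illustration, not the paper's code, and the rates below are made up): split the shuffled volume among the reducers in proportion to their reduce rates, so that every reducer would finish at roughly the same time.

// Hypothetical sketch: reducer i receives total * f_i / W, where W is the
// sum of the reduce rates, so all reducers finish together in an ideal run.
public class AdaptiveShares {
    static double[] shares(double totalMB, double[] rates) {
        double W = 0;                            // W = sum of the reduce rates
        for (double f : rates) W += f;
        double[] out = new double[rates.length];
        for (int i = 0; i < rates.length; i++)
            out[i] = totalMB * rates[i] / W;     // proportional share for reducer i
        return out;
    }

    public static void main(String[] args) {
        // Illustrative rates only (not the measured ones): California's slow
        // downlink gives it a small rate, hence a small share of the 1.6 GB.
        double[] mb = shares(1600.0, new double[]{4.0, 1.0, 4.0});
        System.out.println(java.util.Arrays.toString(mb)); // [711.1..., 177.7..., 711.1...]
    }
}

With equal rates this reduces to Hadoop's uniform split; the 20% improvement reported above comes from skewing the shares toward the faster downlinks.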

SLIDE 19

Outline

1. Introduction and Motivation
2. Model and Problem
3. Nap
4. Proof-of-Concept and Conclusion


SLIDE 20

MapReduce Multiway Join Model [DG08, AU11]

There are i tables to join, and m mapper and r reducer processes for the MapReduce join. The reduce rates vector is f̄ = {f1, ..., fr}.

[Figure: tables T1, ..., Ti split among mappers M1, ..., Mm, shuffled over the network to reducers R1, ..., Rr with rates f1, ..., fr]


The following model is for the multiway join case study, and it is based on the MapReduce architecture. We assume that the reducers are the bottlenecks, and in particular their reduce rates, which can be the downlink or the processing rates. Every reducer i has a positive reduce rate fi, where f̄ is sorted in decreasing order (fr = 1). Both mappers and reducers are processes and are considered workers in the model. The dashed lines emphasize that tables are split among mappers, where each table can be located on one machine or split across many machines. The cloud (and its links) represents the communication network, which is needed for routing the data from the mappers to the reducers. This model is an extension of Afrati and Ullman's model, in which all the reducers had the same rate (fi = 1 for all i), and in the paper it is presented for a more general use case.

SLIDE 21

Problem Definition

Problem

Consider a MapReduce job (multiway join) J with r reducers and a reduce rates vector f̄. Our goal is to partition the data according to f̄, and thus minimize the (job) completion time C.


While traditionally the optimizations have been done for the computational power and the amount of data, the structure and load of the network have been ignored. C is our primary metric of interest.

SLIDE 22

Outline

1. Introduction and Motivation
2. Model and Problem
3. Nap
4. Proof-of-Concept and Conclusion


SLIDE 23

Network-Aware and Adaptive Multiway Join

Smartly assign virtual reducers (chunks of data) to the reducers. Reducer i hosts vi virtual reducers. B is the total communication cost, W the sum of the reduce rates, and C the straggler's finish time.

[Figure: the multiway join model with virtual reducers; e.g., four reducers R1, ..., R4 hosting v = (4, 4, 2, 1) virtual reducers]


The basic idea of Nap is simple: exploit the reduce rate of each reducer, and thereby minimize the completion time. This time is defined as the completion time of the whole process, and it is determined by the last reducer to complete the job, i.e., the straggler. We achieve this by fooling Hadoop with the introduction of virtual reducers as our "new" workers in the MapReduce operation, located inside the "physical" reduce processes. Instead of partitioning the data uniformly between the reducers, we uniformly partition it between the virtual reducers, and only decide how many virtual reducers (small chunks) should be placed in each reducer. Using more virtual reducers than reducers divides the transferred data into smaller pieces, which makes it easier to tune the partition of the data and reduce the completion time.
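A minimal sketch of this trick, with my own naming (this is not the repository's code): keys are hashed uniformly over the virtual reducers, and each virtual reducer is owned by one physical reducer according to the assignment vector v.

import java.util.Arrays;

// Hypothetical sketch: route a key to its virtual reducer, then to the
// physical reducer that hosts that virtual reducer.
public class VirtualReducerMap {
    private final int[] owner; // owner[j] = physical reducer hosting virtual reducer j

    VirtualReducerMap(int[] v) { // v[i] = number of virtual reducers on reducer i
        int total = Arrays.stream(v).sum();
        owner = new int[total];
        int j = 0;
        for (int i = 0; i < v.length; i++)
            for (int k = 0; k < v[i]; k++) owner[j++] = i;
    }

    int reducerFor(Object key) {
        int virtual = (key.hashCode() & Integer.MAX_VALUE) % owner.length; // uniform
        return owner[virtual];
    }
}

With v = (4, 4, 2, 1), as in the figure above, the slow reducer R4 receives only 1/11 of the keys while R1 and R2 receive 4/11 each.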


SLIDE 27

What is the Optimal Partition?

The completion time with r reducers is lower bounded by O(B/W).

Do we need to use all the r reducers or not?

Finding the optimal partition of virtual reducers that minimizes the completion time involves a connection to Integer Partition and Young's lattice.


Given this model, what is the best outcome we should hope for? A uniform finish time, and we show in the paper that when we use all the resources, all r reducers, the completion time is lower bounded. Sometimes we do not need to use all the reducers, because they cannot help with the processing, and we prove this in the paper as well. The multiway join method involves replication of the tables, so the total communication cost is a function of the number of virtual reducers; using fewer virtual reducers generates less replication, and therefore it can be beneficial.
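A short derivation of the bound in my notation, where s_i is the share of the B units of shuffled data sent to reducer i:

\[
  C \;=\; \max_i \frac{s_i}{f_i}
  \;\ge\; \frac{\sum_{i=1}^{r} s_i}{\sum_{i=1}^{r} f_i}
  \;=\; \frac{B}{W},
  \qquad W = \sum_{i=1}^{r} f_i,
\]

with equality exactly when s_i = B f_i / W, i.e., when all the reducers finish at the same time.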


SLIDE 30

What is the Optimal Partition?

The completion time with r reducers is lower bounded by O(B/W).

Do we need to use all the r reducers or not? No, due to a slow reducer (e.g., two reducers with rates 100 and 1).

Finding the optimal partition of virtual reducers that minimizes the completion time involves a connection to Integer Partition and Young's lattice.

[Figure: the Integer Partition of n = 7 for f̄ = (4, 2, 1)]


Finding the optimal partition is not trivial, but using Integer Partition and Young's lattice we can find a solution. We offer an alternative that is based on greedily searching for the optimal partition.
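To see what a level of Young's lattice contains, here is a small helper (hypothetical, mine) that enumerates the integer partitions of n into at most r parts, i.e., the candidate virtual-reducer assignments on level n:

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: all non-increasing partitions of n into at most r parts,
// zero-padded so each result is an assignment vector over r reducers.
public class Partitions {
    static List<int[]> of(int n, int r) {
        List<int[]> out = new ArrayList<>();
        gen(n, n, new ArrayList<>(), r, out);
        return out;
    }

    private static void gen(int n, int max, List<Integer> acc, int r, List<int[]> out) {
        if (n == 0) {                             // complete partition: pad with zeros
            int[] p = new int[r];
            for (int i = 0; i < acc.size(); i++) p[i] = acc.get(i);
            out.add(p);
            return;
        }
        if (acc.size() == r) return;              // more than r parts is not allowed
        for (int k = Math.min(n, max); k >= 1; k--) {
            acc.add(k);
            gen(n - k, k, acc, r, out);           // keep the parts non-increasing
            acc.remove(acc.size() - 1);
        }
    }

    public static void main(String[] args) {
        for (int[] p : of(7, 3))                  // level n = 7 with r = 3, as in the slide
            System.out.println(java.util.Arrays.toString(p));
    }
}

For n = 7 and r = 3 this prints eight diagrams, (7,0,0) through (3,2,2); among them (4,2,1) matches f̄ = (4,2,1) and yields the uniform finish time.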

SLIDE 31

Young Lattice and Optimal Walk

Optimal Path Example

Consider a multiway join job J using three reducers, r = 3, with a downlink vector f̄ = {4, 2, 1}. The edges emphasize the insertion of one box (virtual reducer). The optimal walk is highlighted in red. The order of boxes in each diagram corresponds to the order of the reducers.

[Figure: Young's lattice of assignments (v1, v2, v3) for v = 1, ..., 7, including the diagrams (2, 1, 1), (2, 2, 0), and (2, 2, 1)]


Each diagram (i.e., virtual reducer assignment) can be translated to a completion time, and every level n contains all the possible partitions of the integer n. So on each level, which denotes the given number of virtual reducers, we color in red the best assignments, i.e., the assignments with the minimal completion time based on the rates vector. Note that several assignments can achieve the minimum. Accordingly, edges that are directed into such optimal assignments are also colored in red. For instance, the leftmost diagram on level four (integer partitions of n = 4) has two virtual reducers on R1, one on R2, and one on R3, i.e., (2,1,1); to its right there is a partition of four virtual reducers with two on R1 and two on R2, i.e., (2,2,0). These walks are the basis for a greedy search for an optimal assignment, in which we greedily insert virtual reducers based on f̄.

SLIDE 32

Young Lattice and Optimal Walk

Optimal Path Example

[Figure: the optimal path through Young's lattice for f̄ = {4, 2, 1}, v = 1, ..., 7]


One significant diagram in the figure is the leftmost diagram on level five, (2,2,1), which does not have an ancestor with minimal completion time on level four. It is not a minimal partition: as a partition of the integer five it has two stragglers, while there are other optimal partitions (on the same level) with only one straggler.

SLIDE 33

Endless Loop?

Iteratively "walking" along the optimal path results in an optimal partition, but when should we stop? We can stop after W − r comparisons, and eventually the running time would be O(W · log r).


Finding the optimal partition for a partition of one is simple by brute force. We can add optimally at each step, but when should we stop?

SLIDE 34

Endless Loop?


We greedily add one virtual reducer after another until we reach W virtual reducers, where each addition compares r options for the insertion of one virtual reducer.
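A sketch of this greedy walk under stated assumptions (my code, not the paper's; the cost model takes reducer i's finish time as proportional to (v_i / V) / f_i, and the stopping rule is simplified to a fixed target V instead of the paper's W − r comparisons):

// Hypothetical sketch: place virtual reducers one at a time, always on the
// reducer that minimizes the resulting straggler finish time.
public class GreedyWalk {
    static int[] assign(double[] f, int V) {   // f sorted in decreasing order
        int r = f.length;
        int[] v = new int[r];
        v[0] = 1;                              // level one: one box on the fastest reducer
        for (int n = 2; n <= V; n++) {         // add one virtual reducer per level
            int best = 0;
            double bestC = Double.MAX_VALUE;
            for (int i = 0; i < r; i++) {      // r candidate insertions per step
                v[i]++;
                double c = completionTime(v, f, n);
                if (c < bestC) { bestC = c; best = i; }
                v[i]--;
            }
            v[best]++;
        }
        return v;
    }

    static double completionTime(int[] v, double[] f, int V) {
        double c = 0;                          // straggler's (relative) finish time
        for (int i = 0; i < v.length; i++)
            c = Math.max(c, (double) v[i] / V / f[i]);
        return c;
    }

    public static void main(String[] args) {
        int[] v = assign(new double[]{4, 2, 1}, 7); // f = {4, 2, 1}, seven boxes
        System.out.println(java.util.Arrays.toString(v)); // [4, 2, 1]
    }
}

For f̄ = {4, 2, 1} and V = 7 the walk ends at v = (4, 2, 1), the assignment with a uniform finish time.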

SLIDE 35

Outline

1. Introduction and Motivation
2. Model and Problem
3. Nap
4. Proof-of-Concept and Conclusion


SLIDE 36

Implementation

Problem

How do we partition the data (the map output) according to the reducers' downlinks while we do not know where the containers are?


The idea is to decrease the completion time, the straggler's finish time, by sending less data to the straggler and more data to some other reducers. But how can we do that when we do not know where they are (without updating the ResourceManager (RM) code for a different scheduling of the containers in the cluster)?

SLIDE 37

Implementation

Problem

How do we partition the data (the map output) according to the reducers' downlinks while we do not know where the containers are?
  • Modify Hadoop.
  • Modify the Partitioner class.
  • Modify YARN parameters.


  • Write the containers' locations to HDFS in the heartbeat function.
  • Override the getPartition function (see the sketch below).
  • Enable using the new Partitioner class, and uniformly distribute the mappers.

Upon job execution, the RM decides where to allocate each container inside the workers. This scheduling process is oblivious to the end user, and to update it I had to understand the core of the scheduling (which is not easy and not well documented on the web) and then accomplish the three goals above. We changed the starting point of the reduce containers and the shuffle phase to the same time as the allocation of the mapper containers; therefore all the containers must be allocated in parallel, and this enables our code (the Partitioner class) to distribute the data according to each computer's downlink.
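As a hedged illustration of the getPartition override (my sketch, not the code in the Nap repository; the class name and the hard-coded assignment vector are mine), the virtual-reducer mapping from before can be dropped into Hadoop's Partitioner hook:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Hypothetical sketch: Hadoop calls getPartition for every map-output record;
// here keys are hashed over the virtual reducers and routed to the physical
// reducer that hosts the chosen virtual reducer.
public class NapPartitioner extends Partitioner<Text, Text> {
    // Assumed assignment vector (in practice it would be computed from the
    // measured downlinks and read from configuration or HDFS).
    private static final int[] V = {4, 4, 2, 1};

    @Override
    public int getPartition(Text key, Text value, int numReduceTasks) {
        int total = 0;
        for (int vi : V) total += vi;                     // total virtual reducers
        int virtual = (key.hashCode() & Integer.MAX_VALUE) % total;
        int i = 0;
        while (virtual >= V[i]) { virtual -= V[i]; i++; } // owning physical reducer
        return i % numReduceTasks;
    }
}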

SLIDE 38

Implementation

[Figure: boxplots of completion times (sec, 100–350) per reducer (Virginia, California, London) and for the straggler, non-adaptive vs. adaptive]


The boxplot displays the mean value as a black strip, the median value as a white strip, and two other strips for the boundaries of each box. In fact, the slowest job in the adaptive scenario (10, 211) is roughly as fast as the fastest non-adaptive job (1, 205).

Completion time statistics (sec):
  Statistic  Non-adaptive  Adaptive
  Mean       236           192
  Variance   1462          143
  Median     224           191
  Max        331           221
  Min        205           174

*There is a minor difference in time between the results, due to the time from submitting the job to allocating the reduce containers. Although the reducers are allocated right after the mappers, and there is no slow start for the shuffle or for the allocation of the reducers, the elapsed time of the straggler among all the reducers will not include the few seconds before the reducers start, e.g., the scheduling time of the map containers or even of the AM container.

SLIDE 39

Implementation

For more, see my Nap repository on GitHub: https://github.com/razo7/Nap.


SLIDE 40

Conclusion and Future Work

Conclusion

This work presents Nap, a simple network-aware approach to improve distributed data processing performance in heterogeneous environments by adapting the data partition, and hence minimizing the completion time.

Future Work:
  • Explore scenarios with more complex bottlenecks.
  • Perform a placement optimization of the containers.
  • Study other join operators, and even jointly optimize the network with the query plans.


This kind of work relates mainly to Hadoop's shortcomings due to its lack of network consideration. We presented a formal performance analysis of our approach and reported on a proof-of-concept implementation. We believe that our work opens several interesting avenues for future research, in addition to the remarks mentioned before:
  • Consider a different implementation of the multiway join, and try to optimize the shuffle phase also for Apache Spark.
  • Given our work with AD∗, try to find a suggestion for the number of reducers, r.

SLIDE 41

Thank you


SLIDE 42

Outline

5. MapReduce
6. Related Work
7. Non-Adaptive Multiway Join (NO)
8. Adaptive Multiway Join Idea
9. More Results


SLIDE 43

MapReduce Paradigm [DG08, White15]

State-of-the-art programming model for processing large data sets using parallelism and decentralization concepts. Master-Slave architecture. The model consists of four main phases: map, partition, shuffle, and reduce. Hadoop is a software framework that uses this paradigm together with other tools such as HDFS and YARN.


  • This paradigm is a highly useful and efficient tool for large-scale, fault-tolerant data analysis with a crash recovery mechanism. The model has a Master-Slave architecture with one master that manages all the slaves/workers.
  • All the phases are performed locally except the shuffle, where there is communication between mappers and reducers; this is the most time-consuming phase and can be a bottleneck for the whole execution.
  • Apache Hadoop (batch processing with massive amounts of data) and Apache Spark (overcoming latency and the inability to stream data by exploiting memory) are software frameworks that use this paradigm.

SLIDE 44

Wordcount Example

*Partition: Hash(k2) % r = reducer identifier.


A well-known example: counting the words in a corpus (a minimal code sketch follows after these notes).
  • Map phase: each machine takes a split of the corpus, and for each word a key-value tuple (k2, v2) is created. The key is the actual word from the corpus, and the value is one, the number of appearances of that word so far (Combiner).
  • Shuffle phase: tuples are sent from the mappers to the reducers according to the Partitioner, which uses the tuples' key to determine the destination of each mapper's output. The shuffle is performed in a way that does not account for the communication; thus it cannot balance the load on each reducer, which results in a longer C.
  • Reduce phase: the reducers collect all the tuples and perform the reduce function, which aggregates all the tuples by their key. Then the reducer saves the result locally and in the distributed file system.
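For concreteness, here is the classic Hadoop word-count mapper and reducer (essentially the standard textbook example, abridged); the Partitioner sits between them, applying Hash(k2) % r by default:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {
    // Map phase: emit (word, 1) for every word in this machine's input split.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer it = new StringTokenizer(value.toString());
            while (it.hasMoreTokens()) {
                word.set(it.nextToken());
                context.write(word, ONE);   // shuffled to reducer Hash(word) % r
            }
        }
    }

    // Reduce phase: sum the counts collected for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }
}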


SLIDE 46

Related Work

Past Work

  • CliqueSquare: Flat Plans for Massively Parallel RDF Queries [GKMQZ15].
  • WANalytics: Analytics for a Geo-Distributed Data-Intensive World [VCGKV15].
  • Low Latency Analytics of Geo-distributed Data in the Wide Area [PABKABS15].
  • Network-aware resource management for scalable data analytics frameworks [RTK15].
  • HaLoop: efficient iterative data processing on large clusters [BBEH10].
  • Riffle: optimized shuffle service for large-scale data analytics [ZCSCF18].
  • Handling data skew in join algorithms using MapReduce [MSYL16].


Nap Related Work Past Work Our idea and contribution cover many topics that have been studied intensively over the years in the quest to minimize C. Worth mentioning is the work of CliqueSquare on flattening the operator tree to improve response time for RDF queries, since RDF queries tend to involve many joins.

Resource Description Framework (RDF) is a flexible data model introduced for the Semantic Web, and its database can be seen as a directed labelled graph. RDF queries tend to involve more joins than a relational query computing the same result.

slide-47
SLIDE 47


Past Work


Nap Related Work Past Work The two Microsoft works (WANalytics and the low-latency analytics system Iridium) capture the importance of the network in a geo-distributed cluster with a placement optimization in Spark. The designers' model includes a proxy layer (with Apache Hive) and a cache for the optimization in each DC. The analyst sends analytical queries (SQL) to a WANalytics command layer, which creates a distributed execution over the partitions (a partition includes some DCs); then the proxy layer (in each DC) manages the analytics stack and cache and supports data transfer optimally. Iridium is a system for low-latency geo-distributed analytics, and it uses a greedy heuristic for the data and task placement of queries. It is implemented on the Spark framework with HDFS; for task placement they override the default Spark scheduler, and the experiments used 8 EC2 instances in different regions around the globe for geo-distribution.

slide-48
SLIDE 48


Past Work


Nap Related Work Past Work The article "Network-aware resource management for scalable data analytics frameworks" focuses on the importance of sharing cluster resources between multiple workloads using a network-aware container placement approach. Current frameworks base placement solely on compute resource profiles, without taking information about the network topology and input data locations into account (they also try to load-balance by spreading the containers over many nodes). The solution uses a weighted cost function which considers data locality, container closeness, and balance over the available resources.

slide-49
SLIDE 49


Past Work


Nap Related Work Past Work Next, there are works on Hadoop's network problem, more specifically the shuffle phase, with I/O overhead or data skew, and even skew in join algorithms, where the data is not load-balanced. For instance, HaLoop initiated the work on optimizing current frameworks for iterative jobs; it decreases the running time and the amount of shuffled data for a workflow of iterative jobs. HaLoop jointly reduces wasted time (from I/O, CPU, and network bandwidth) and detects a termination condition of loops. The last article suggests multi-dimensional range partitioning (MDRP), which combines the ideas of the previous methods and decreases the skew. It samples the data and does not repartition data cells that produce no output (i.e., that cannot contribute to the join).
slide-50
SLIDE 50


More Related Work

SmartJoin: A network-aware multiway join for MapReduce [SHCY14].
Optimizing multiway joins in a map-reduce environment [AU12].


Nap Related Work More Related Work There are two articles that are much closer to our work (AU and SmartJoin). SmartJoin also presents a network-aware multiway join algorithm for map-reduce, but for a different multiway join: it joins two large tables using one joint attribute, in a reduce-side-join fashion, and its network awareness relates to a late join between many small tables (in the reduce function) using a hash join between the reducers. SmartJoin dynamically redistributes tuples directly between reducers, while we optimize the way the data is partitioned in the shuffle phase (where there is a replication problem), and we join tables with two joint attributes. Moreover, we relate to the network through the downlinks of each worker, whereas SmartJoin relates to the structure of the cluster, e.g., the switches' bandwidth between nodes and racks.

slide-51
SLIDE 51


More Related Work


Nap Related Work More Related Work Afrati and Ullman's scheme, which we show next, is oblivious to the downlinks vector (i.e., it assumes all downlinks are the same); thus we consider it a non-adaptive scheme and denote it by NO. Afrati and Ullman present a model for computing multiway joins in map-reduce, accounting for communication costs by changing the data partition. However, their approach is non-adaptive, as it assumes that all link capacities are equal; we suggest removing this restriction and generalizing the model. Their work inspired our multiway join analysis and implementation, as we will see in the next section.

slide-52
SLIDE 52


Outline

5. MapReduce
6. Related Work
7. Non-Adaptive Multiway Join (NO)
8. Adaptive Multiway Join Idea
9. More Results


Nap Non-Adaptive Multiway Join (NO) Outline

slide-53
SLIDE 53


Non-Adaptive Multiway Join (NO)

Hadoop-based reduce-side join. Repartition join algorithm, map output → {key, value}. A set $\bar{s}$ of $s$ share variables, $\bar{s} = \{s_1, s_2, \ldots, s_s\}$, and $s$ hash functions, one for each of the joint attributes. Each key/chunk represents one reducer, where key = $H_{NO}(t, \bar{s})$ and value = $t$.

H-NO Function


Nap Non-Adaptive Multiway Join (NO) The NO scheme performs a cascade join in the reducers, but for a proper join each reducer must receive all the rows with matching joint attributes. Thus the map output pair {key, value}, for each row in the tables, has a key with $s$ values, where $s$ is the number of joint attributes in the join, and the row itself as the value. To build the key they use $H_{NO}(t, \bar{s})$, a set of $s$ hash functions. There is a share-variables vector with one variable per joint attribute, where each variable defines a degree of replication for the corresponding joint attribute (the number of buckets the attribute is hashed to). Therefore, the rows are duplicated according to the size of the share variable of each row's missing joint attribute, as we can see on the next slide.
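
A minimal sketch of how such a composite key could be computed (the tuple layout, hash choice, and class name are illustrative, not the authors' code):

import java.util.Arrays;

// Illustrative H_NO: hash each joint attribute of a tuple into its bucket,
// using one hash function (here, String.hashCode) per joint attribute.
// shares[i] is the share variable s_i, the number of buckets for attribute i.
public class HNo {
    // Returns one bucket index per joint attribute; a missing attribute
    // (null) means the row must be replicated over all shares[i] buckets.
    public static int[] key(String[] jointAttributes, int[] shares) {
        int[] key = new int[shares.length];
        for (int i = 0; i < shares.length; i++) {
            if (jointAttributes[i] == null) {
                key[i] = -1; // wildcard: replicate over buckets 0..shares[i]-1
            } else {
                key[i] = (jointAttributes[i].hashCode() & Integer.MAX_VALUE) % shares[i];
            }
        }
        return key;
    }

    public static void main(String[] args) {
        // A row of Y(p, a) has both joint attributes; a row of X(v, p) misses a.
        System.out.println(Arrays.toString(key(new String[]{"p7", "a3"}, new int[]{3, 3})));
        System.out.println(Arrays.toString(key(new String[]{"p7", null}, new int[]{3, 3})));
    }
}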

slide-54
SLIDE 54


Metrics

Definition: Total communication cost (B) is the amount of data transferred from the mappers to the reducers [bits].

Definition: Completion time (C) is the elapsed time from submitting the MapReduce job until it finishes [sec].


Nap Non-Adaptive Multiway Join (NO) Metrics One can assume that the local computation time is negligible in comparison to the communication time (the transfer of the data), due to the rapid improvement in computing capabilities. There is a trade-off between these two metrics when performing the multiway join, as formalized below:

1. Minimum B at the cost of a long C: perform a local cascade join, do not distribute the work, and get little concurrency (a queue might build up, since everything depends on the downlink of a single reducer).

2. Maximum B with a short C: replicate the tables to all the reducers at the cost of sending a vast amount of data that goes unutilized. This yields high concurrency between computers (and not only processes), but at some point it no longer decreases C and only produces a large output from each reducer.
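
Written out, the downlink model behind this trade-off can be sketched as follows (notation consistent with the talk: $f_i$ is the downlink rate of reducer $i$ and $B_i$ the data shuffled to it):

% Total communication cost and completion time under the downlink model.
B = \sum_{i=1}^{r} B_i, \qquad C = \max_{1 \le i \le r} \frac{B_i}{f_i}
% With unit downlinks (f_i = 1) and a uniform partition (B_i = B/r),
% this reduces to C = B/r, the quantity the NO analysis minimizes.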

slide-55
SLIDE 55


Minimization of the Completion Time-1

Minimize B → Minimize C.

Minimize: $B = x \cdot s_1 + y \cdot s_2 + z \cdot s_3$, with the following constraints:
Constraint 1: $s_1 \cdot s_2 \cdot s_3 = r$.
Constraint 2: $s_1, s_2, s_3 \in \mathbb{N}^+$.

Lagrangian Equations


Nap Non-Adaptive Multiway Join (NO) Minimization of the Completion Time-1 This leads to the Afrati and Ullman approach of minimizing the completion time by first minimizing the total communication cost. This is an analysis example of the multiway join problem for three tables, where each table is missing exactly one joint attribute.


slide-57
SLIDE 57


Minimization of the Completion Time-1


Nap Non-Adaptive Multiway Join (NO) Minimization of the Completion Time-1 Also, it uses the share variables to define the replication size for each table, based on its missing joint attribute. Note that these two constraints connect the number of table replications to the number of reducers: using more reducers results in a larger replication of the tables and a larger total communication cost. Afrati and Ullman use a Lagrange multiplier (a method in mathematical optimization), λ, to find local minima of the function subject to the equality constraint.
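
For concreteness, the Lagrangian computation can be written out as follows (a standard derivation; it reproduces the closed form shown on the next slide):

\mathcal{L}(s_1, s_2, s_3, \lambda) = x s_1 + y s_2 + z s_3 - \lambda (s_1 s_2 s_3 - r)
% Setting \partial\mathcal{L}/\partial s_i = 0 and multiplying each equation by s_i:
x s_1 = y s_2 = z s_3 = \lambda r
% Multiplying the three equalities and substituting s_1 s_2 s_3 = r:
(\lambda r)^3 = xyz \cdot r \;\Rightarrow\; \lambda r = \sqrt[3]{xyz \cdot r}
% Hence the minimal total communication cost:
B_{NO} = x s_1 + y s_2 + z s_3 = 3\lambda r = 3\sqrt[3]{x y z \cdot r}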

slide-58
SLIDE 58


Minimization of the Completion Time-2

$$B_{NO} = 3\sqrt[3]{x \cdot y \cdot z \cdot r} = B_c \cdot r^{1/3} = O(r^{1/3}) \quad (1)$$

$$C_{NO} = \frac{B_{NO}}{r} = O\!\left(\frac{r^{1/3}}{r}\right) = O(r^{-2/3}) \quad (2)$$

When we increase r, we can achieve a lower completion time at the cost of increasing the replication, $B_{NO}$.

Min C plot


Nap Non-Adaptive Multiway Join (NO) Minimization of the Completion Time-2 Eventually they find that the total communication cost grows as $r^{1/3}$, with a communication constant that captures the sizes of the tables, $B_c = 3\sqrt[3]{xyz}$. $B_{NO}$ increases with the number of reducers. I would like to mention that this nice result might be a product of rounding, and approximations have to be made for the constraints to remain credible.

slide-59
SLIDE 59


Minimization of the Completion Time-2


Nap Non-Adaptive Multiway Join (NO) Minimization of the Completion Time-2 Then they assume hash functions that partition the records uniformly; when using the default, uniform Partitioner, the minimal C achieved by NO is as in Equation (2). Each reducer receives the same amount of data, an equal share, and since all reducers are alike, they finish together at the same time. There is a trade-off between the two metrics.

slide-60
SLIDE 60


Minimal Completion Time Comparison between NO and AD

[Figure: completion time (sec) as a function of the number of virtual reducers v, comparing NO[9] (blue), AD[W] (green), and AD[v] (diamonds); vertical lines on the X axis mark W, r, and W - r, with arrows at the minima of AD[v]'s C; Bc = 1.]


Nap Non-Adaptive Multiway Join (NO)

Minimal Completion Time Comparison between NO and AD

For further discussion, and to highlight the theoretical results of the next sections, we consider again the multiway join example from the Introduction, with the $\bar{f}$ vector as in the last figure and, for simplicity, $B_c = 1$. The sum of downlinks W, the number of reducers r, and their difference W - r are highlighted in purple on the X axis with three vertical lines. Two arrows point towards the local and global minima of AD[v]'s C, at v = W = 36 and at v = 2. Each point in the figure shows the minimal completion time over all the λ partitions of v. The blue line is when we partition the data equally between the nine reducers, whereas the green line is the completion time for AD[W], which utilizes all the downlinks of the nine reducers. AD[v] outperforms NO[9] because it can partition the data so as to utilize the network better. The global minimum, AD*, is achieved by selecting only two virtual reducers, on R1 and R2, which leads to the lowest C and an identical finish time between them. Overall, CAD[2] outperforms CAD[W] by 50% and CNO[9] by 75%.

slide-61
SLIDE 61


Outline

5. MapReduce
6. Related Work
7. Non-Adaptive Multiway Join (NO)
8. Adaptive Multiway Join Idea
9. More Results


Nap Adaptive Multiway Join Idea Outline

slide-62
SLIDE 62


$H_{NO}(t, \bar{s}')$ - Mapping Rows to Virtual Reducers

[Figure: two grids of virtual-reducer cells $l_{ij}$, showing how rows are hashed under NO (left: $h_1(p)$, $h_2(a)$ over a 3x3 grid) and under AD (right: $h'_1(p)$, $h'_2(a)$ over a 6x6 grid of virtual reducers).]

Join X(v,p) ⋈ Y(p,a) ⋈ Z(a,n)

$\bar{f} = (8,8,6,4,4,2,2,1,1)$ downlink vector. $\bar{s} = \{3,3\}$, r = 9 reducers. $\bar{s}' = \{6,6\}$, v = W = 36 virtual reducers/keys.


Nap Adaptive Multiway Join Idea $H_{NO}(t, \bar{s}')$ - Mapping Rows to Virtual Reducers Now we return to the example we had for the NO scheme, where we implicitly assumed that the downlink rates are all equal to one and each reducer is identified with a single key. On the right is the AD scheme, which assumes the downlinks are known to be $\bar{f} = (8,8,6,4,4,2,2,1,1)$ for r = 9 reducers; thus it uses v = W = 36 virtual reducers. Now each cell in the matrix represents one virtual reducer, and there are two different hash functions, $h'_1(p)$ and $h'_2(a)$, and two different share variables, $s'_1 = 6$ and $s'_2 = 6$. The map output keys represent the virtual reducers, and afterwards the partitioner uses a new function to partition the keys/virtual reducers among the reducers according to the reducers' downlinks. This way R1, which had the single key (0,0) before, now has 8 virtual reducers, while R9, which had the key (2,2), receives only 1 virtual reducer since its downlink is much slower. Afterwards, the basic join method in the reducers stays the same as in Afrati and Ullman: every two rows that need to join will end up at a unique virtual reducer, and in turn at a unique reducer.
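
A minimal sketch of this proportional assignment of virtual reducers to reducers (class and method names are illustrative; it only assumes the downlink vector is known):

// Illustrative assignment of the v = W virtual reducers to the r reducers in
// proportion to their downlinks: f = (8,8,6,4,4,2,2,1,1) gives R1 eight
// virtual reducers and R9 a single one.
public class VirtualReducerAssignment {
    // lambda[i] = number of virtual reducers assigned to reducer i; here each
    // unit of downlink rate receives exactly one virtual reducer.
    public static int[] lambdaFromDownlinks(int[] downlinks) {
        return downlinks.clone();
    }

    // Map a virtual-reducer id (0..v-1) to the reducer that owns it under lambda.
    public static int ownerOf(int virtualId, int[] lambda) {
        int upper = 0;
        for (int i = 0; i < lambda.length; i++) {
            upper += lambda[i];
            if (virtualId < upper) return i;
        }
        throw new IllegalArgumentException("virtual id out of range: " + virtualId);
    }

    public static void main(String[] args) {
        int[] lambda = lambdaFromDownlinks(new int[]{8, 8, 6, 4, 4, 2, 2, 1, 1});
        System.out.println(ownerOf(0, lambda));  // prints 0: owned by R1
        System.out.println(ownerOf(35, lambda)); // prints 8: owned by R9
    }
}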

slide-63
SLIDE 63


AD Scheme- Algorithm

λ = {v1, v2, v3, ..., vr} is a partition of the v virtual reducers among the r reducers.

Algorithm 2 AD(Q, R, λ)
1: Compute the share-variable vectors $\bar{s}'$, $\bar{s}$ using v, r (|R| = r) and Q.
2: Create MapNO($\bar{s}'$).
   a: Create a table with rows {$H_{NO}(t, \bar{s}')$, t} for each record t ∈ Ti.
3: Create PartitionAD($\bar{s}$, λ).
4: Create ReduceNO(Q).
5: MapReduce(MapNO($\bar{s}'$), PartitionAD($\bar{s}$, λ), ReduceNO(Q), R).

AD-Opt NO Scheme AD


Nap Adaptive Multiway Join Idea AD Scheme- Algorithm The AD(Q,R,λ) scheme extends the NO scheme with a changed partition function, PartitionAD($\bar{s}$, λ), and the addition of the virtual reducers' input partition λ. The map function creates a {key, value} pair with the same value as we have seen before, but with a different key: each table row t is mapped to new keys and replicated towards the v virtual reducers using the $\bar{s}'$ vector. Next, the new partition function, PartitionAD($\bar{s}$, λ), maps from the new keys to NO keys and then to the reducers according to λ, the partition of the virtual reducers, and the $\bar{s}$ vector. This results in a partitioning of the new keys that follows the network through λ, while PartitionAD($\bar{s}$, λ) ensures that we distribute the values (rows) to r reducers instead of v virtual reducers. The scheme ends with running a MapReduce job with these components.

slide-64
SLIDE 64


Outline

5. MapReduce
6. Related Work
7. Non-Adaptive Multiway Join (NO)
8. Adaptive Multiway Join Idea
9. More Results


Nap More Results Outline

slide-65
SLIDE 65


Setup

Modified version of Hadoop on an AWS multi-region cluster. EC2 t2.xlarge and m4.xlarge instances. Wonder Shaper to fix a downlink rate. HDFS and YARN daemons.

EmpSetup Implementation


Nap More Results Setup We implemented a prototype of Nap and conducted some basic experiments on EC2 that serve as a proof-of-concept. Wonder Shaper is a command-line utility that limits the network adapter's bandwidth. The master instance is in charge of the whole computation, running the NameNode (NN) and Resource Manager (RM) daemons, while the workers are responsible for storing the data and running the workload within containers. HDFS has a replication factor of three; thus the input resides on every worker, and we even managed to split the input (almost) evenly across the 25 mappers (which was not straightforward). Why is the master only "mastering"? → The NN and RM daemons were placed on a separate machine, the master machine, because we wanted to separate the monitoring work from the workers, keeping them less busy, and to avoid prioritizing any local communication in one of the regions (e.g., AM with RM or DN with NN). Default block size (b) = 128 MB.

slide-66
SLIDE 66


Results- Completion time

The non-adaptive jobs have high variance in comparison to the adaptive ones. The slowest adaptive job, 211 seconds, is roughly as fast as the fastest non-adaptive job, 205 seconds.

[Figure: boxplots of completion time (sec) for the Non-Adaptive and Adaptive partitions (left), and per-job completion times for Jobs 1-10 (right).]

Implementation


Nap More Results Results- Completion time We used the Job History server REST API to gather all the statistics and Wolfram Mathematica for the plots. Non-Adaptive (uniform) is purple; Adaptive (non-uniform, λ = (7, 6, 6)) is orange; 1.6 GB of shuffled data. The boxplot displays the mean value as a black strip, the median value as a white strip, and two other strips for the boundaries of each box. In fact, the slowest job in the adaptive scenario (job 10, 211 sec) is roughly as fast as the fastest non-adaptive job (job 1, 205 sec).

Partition      Non-adaptive   Adaptive
Mean [sec]     236            192
Variance       1462           143
Median [sec]   224            191
Max [sec]      331            221
Min [sec]      205            174

*There is a minor difference between the reported times, due to the time from submitting the job to allocating the reduce containers.


slide-68
SLIDE 68


Results- Elapsed Reducer’s Time by Region

Elapsed reducer’s time- shuffle + merge + reduce times. In the non-adaptive, Virginia’s reducer is always the fastest. In the adaptive, London’s reducer is on average the straggler.

[Figure: elapsed reducer time (sec) per job (Jobs 1-10) by region (Virginia, California, London); left panel: Non-Adaptive, right panel: Adaptive.]


Nap More Results Results- Elapsed Reducer's Time by Region We saw an average of these results in the empirical motivation, but now we can see that, for the non-adaptive partition, California's average finish time is more than 30% higher than Virginia's and more than 15% higher than London's (147, 225, and 187 seconds, respectively). Virginia is consistently the fastest, and there are large time fluctuations in jobs 5, 8, and 9 for California's reducer (elapsed times of 237, 246, and 328 seconds, respectively). The adaptive partition reduces these large fluctuations by sending about 100 MB less towards California, which acts as the bottleneck in the non-adaptive partition. The reducers finish at almost identical times, with London's reducer on average the straggler.


slide-70
SLIDE 70


Hadoop Versions

Hadoop version 1 shortcomings: scalability, cluster utilization, locality awareness, and input diversity. In version 2, there are Application Master (AM) and Resource Manager (RM) daemons, which "break" the old JobTracker into two components. The scheduling process has also changed: it begins with the AM container (a JVM), one per job, which runs on one of the workers and communicates with the RM to allocate the next containers (map and reduce containers).

Implementation


Nap More Results Hadoop Versions

slide-71
SLIDE 71


getPartition

Algorithm 2 getPartition
1: reducer ← 0
2: if v = 0 then
3:   reducer ← Hash(k) % r
4: else
5:   pRes ← Hash(k) % v
6:   pc ← pick computer given pRes, λ and v
7:   reducer ← pick uniformly a reducer from the list of pc's reducers
8: end if
9: return reducer

Implementation


Nap More Results getPartition The modification of Hadoop and our partition class can be used to modify the map output keys of any Hadoop job, not necessarily a multiway join. This can be used as a standalone system for adaptive, network-aware Hadoop jobs, where the programmer only needs to select, beforehand, the λ of the keys over the reducers. Our current implementation requires clusters with enough RAM to allocate all the containers in parallel, because setConf waits to read an HDFS file that is written once, right after all the containers are running. Thus, if setConf does not find this HDFS file, it does not continue to the rest of the code, and the whole job is stuck. Therefore, it can make sense to study more memory-efficient solutions. Running Hadoop with our multiway join, Nap, begins with splitting each line of input (from each table) by the input split and record reader. For example, the map function of table X computes a hash value for the joint attribute, p, by using the hashCode function (from class String) and taking it modulo s1. Afterward, the result...
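
A runnable sketch of Algorithm 2 as a Hadoop Partitioner (how λ and v are passed in via the job configuration is illustrative here; the prototype wires them differently through setConf and an HDFS file, as noted above):

import java.util.Random;
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Sketch of Algorithm 2: map a key to a virtual reducer, find the computer
// that owns it under lambda, then pick one of that computer's reducers.
public class AdaptivePartitioner extends Partitioner<Text, Text> implements Configurable {
    private Configuration conf;
    private int v;            // number of virtual reducers (0 = uniform fallback)
    private int[] lambda;     // lambda[i] = virtual reducers assigned to computer i
    private int[][] reducers; // reducers[i] = reducer ids hosted on computer i
    private final Random rnd = new Random();

    @Override
    public int getPartition(Text key, Text value, int r) {
        int hash = key.hashCode() & Integer.MAX_VALUE;
        if (v == 0) {
            return hash % r;                          // line 3: uniform fallback
        }
        int pRes = hash % v;                          // line 5: virtual reducer id
        int pc = 0, upper = lambda[0];
        while (pRes >= upper) upper += lambda[++pc];  // line 6: owning computer
        int[] local = reducers[pc];                   // line 7: uniform pick among
        return local[rnd.nextInt(local.length)];      //         pc's reducers
    }

    @Override
    public void setConf(Configuration conf) {
        this.conf = conf;
        // Illustrative wiring: read v and lambda from the job configuration.
        this.v = conf.getInt("nap.virtual.reducers", 0);
        String[] parts = conf.getStrings("nap.lambda", new String[0]);
        this.lambda = new int[parts.length];
        this.reducers = new int[parts.length][];
        for (int i = 0; i < parts.length; i++) {
            lambda[i] = Integer.parseInt(parts[i]);
            reducers[i] = new int[]{i};               // assume one reducer per computer
        }
    }

    @Override
    public Configuration getConf() { return conf; }
}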

slide-72
SLIDE 72


YARN - Workflow of a MapReduce Job

Implementation Drawbacks


Nap More Results YARN - Workflow of a MapReduce Job

slide-73
SLIDE 73


Drawbacks

Speculative execution. First-fit algorithm for mapper allocation. Unnecessary duplication.

Implementation Conclusion YARN


Nap More Results Drawbacks We emphasize that our prototype implementation and experimental results should be understood as a proof-of-concept. Our main contribution lies on the conceptual and theoretical side. In particular, the prototype still has many limitations, and our experimental results are not representative. Modifying Hadoop: the RM monitors the performance and progress of the containers through the heartbeat messages between the RM and the NM of each computer in the cluster. These messages are sent every second (by default), and if one of the containers does not respond with a heartbeat for a threshold amount of time, or its progress percentage is below some threshold, then the RM allocates a new container on a different computer, a speculative container. These containers race with the original containers, and when one of them finishes, the other is killed. Consequently, a cluster with slow containers would have many speculative containers that could help reduce the straggler's time at the expense of overloading the servers.

slide-74
SLIDE 74


Drawbacks


Nap More Results Drawbacks Every Hadoop job begins with the RM waiting for a heartbeat message from each NM; then the RM allocates all the requested containers on each computer that is capable, in the order the heartbeat messages are received, which can also be seen as a First-fit algorithm from bin packing. By default, Hadoop version 2 uses the Capacity scheduler, whose yarn.scheduler.capacity.per-node-heartbeat.maximum-container-assignments field is set to infinity. The AM is the first container to be allocated, the mappers are next, and the reducers are initialized after some slow-start time (a percentage of the whole map progress, mapreduce.job.reduce.slowstart.completedmaps). Thus, setting the maximum number of containers per heartbeat to m/#nodes results in an equal sharing of the m mappers over the cluster, a better distribution of mappers than the default one. But this is only possible when each computer can allocate m/#nodes containers.
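
As a sketch, the two knobs mentioned above could be set programmatically when submitting a job (the property names are the ones cited in the text; the helper class and its parameters are illustrative):

import org.apache.hadoop.conf.Configuration;

// Illustrative tuning: spread the m mappers evenly over the cluster and
// delay reducers until the whole map phase is done.
public class JobTuning {
    public static Configuration tuned(int numMappers, int numNodes) {
        Configuration conf = new Configuration();
        // Cap container assignments per node heartbeat at m / #nodes.
        conf.setInt("yarn.scheduler.capacity.per-node-heartbeat.maximum-container-assignments",
                Math.max(1, numMappers / numNodes));
        // Start reducers only after 100% of the maps have completed.
        conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 1.0f);
        return conf;
    }
}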

slide-75
SLIDE 75


Drawbacks


Nap More Results Drawbacks When using more keys than the number of reducers, there is a possibility of sending data to different virtual reducers that reside on the same computer; thus, adding more logic to the mappers to check for this kind of duplication before writing the map output could reduce this overhead. Furthermore, it introduces a trade-off between the cost of increasing the map function's time and the benefit of shuffling less data and reducing the completion time.

Improvement: The MapReduce requirement that some of the keys must go to the same reducer, for efficiency and correctness of the job, limits the programmer's options for partitioning and for controlling the number of tuples under each key in the reduce function. Our adaptive way of spreading the data replicates the input to more buckets (keys) than the non-adaptive one, since we use v > r buckets. However, this can be improved when finding...