Wave Computing in the Cloud

Bingsheng He †, Mao Yang †, Zhenyu Guo †, Rishan Chen †‡, Wei Lin †, Bing Su †, Hongyi Wang †, Lidong Zhou †
† Microsoft Research Asia ‡ Beijing University

ABSTRACT

We introduce the new Wave model for exposing the temporal relationship among the queries in data-intensive distributed computing. The model defines the notion of query series to capture the recurrent nature of batched computation on periodically updated input streams. This seemingly simple concept captures a significant portion of the queries we observed in a production system. The recurring nature of the computation on the same input stream opens up surprisingly significant opportunities for achieving better performance and higher resource utilization.

1 INTRODUCTION

Recent work on data-intensive distributed computing (e.g., MapReduce [4], Dryad [7], and Hadoop [6]) has enabled large-scale data analysis as a query to execute in parallel on a large cluster of machines, despite failures during the computation. While the emergence of high-level languages, such as Sawzall [11], Pig [9], SCOPE [3], and DryadLINQ [14], has further reduced programming complexity, the research remains largely centered on individual queries. In reality, we are facing the challenging system problem of executing a large number of potentially complicated queries on a large amount of data every day on a large-scale cluster. Questions naturally arise: is the system doing a good job of utilizing the resources fully? Is the system executing the queries in a globally optimal way? We have not yet been able to answer such basic system questions satisfactorily or even to define the system goals precisely.

Our experience with a production computing cluster shows that we are far from reaching the ideal. For example, in the cluster we investigate, we have seen significant redundancy in computation across queries; that is, the same computation is performed multiple times on the same data for different queries, resulting in wasted I/O and computation. Load imbalance is also evident over time, with periods of system overload and periods of resource under-utilization. Those can be attributed to inadequate data and resource management in the system.

Performance and resource optimization through the management of data and resources has been studied extensively in database systems and (distributed) operating systems for decades. It is natural for us to look for solutions and inspirations from those fields, as proposed by Olston et al. [1, 8]. For example, the notion of views in databases and the view matching [15] techniques are particularly effective in identifying common computations or sub-computations across queries and in allowing the results to be reused.

While leveraging the proven concepts in fields such as databases is clearly a step in the right direction, applying those concepts in the current computing environment is particularly challenging due to the inherent complexity and unpredictability in the system. For example, query optimization in databases hinges on a cost model. For a query in our environment, the system often has little knowledge about the data being processed; a query could use custom functions with unknown performance characteristics; a query is often complicated and contains sub-queries, resulting in computation consisting of multiple distributed steps. All of these make a reliable cost model nearly impossible.

With challenges also come opportunities. We observe that log data mining has been the original motivation for such data-intensive distributed computing systems and it remains a dominant workload in such systems. We therefore introduce a new Wave model that captures the key properties of log mining. In Wave computing, we model the data not as a static file, but as a stream that is periodically updated. The stream is append-only and partitioned on multiple machines. A segment is the data from a single bulk update, e.g., the daily generated log. We further define the notion of query series to refer to recurrent computations on a stream, with each performed on one or more stream segments. Query series captures a sequence of the same computation on different sets of segments of the same stream and explicitly exposes the correlations among the queries in the query series in terms of both data and computation.

This seemingly simple notion of query series brings predictability into the system, and opens up new research opportunities by making previously unsolvable problems tractable. For example, with query series, the system knows the queries that need to be executed as data streams are updated. Query series makes the occurrence of these queries predictable. This offers flexibility in the scheduling decisions: queries in different query series might share the same I/O to scan the input data and might even share common computation. Those queries could be scheduled to run together as a single combined query by removing redundancies. Furthermore, query series makes the construction of a reliable cost model a
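
To make the terms stream, segment, and query series concrete, the following is a minimal single-machine sketch in Python. It is an illustration under stated assumptions, not the system described in the paper: the class names (`Segment`, `Stream`, `QuerySeries`), the `window` parameter (how many of the most recent segments each query in a series reads), and the `run_combined` helper are all hypothetical, and partitioning across machines is omitted. The sketch shows how two query series on the same stream could be served by scanning each needed segment once.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Segment:
    """Data from a single bulk update, e.g. one day's log (hypothetical type)."""
    index: int
    records: Tuple[str, ...]


@dataclass
class Stream:
    """Append-only stream made of segments; partitioning is omitted here."""
    name: str
    segments: List[Segment] = field(default_factory=list)

    def append(self, records: List[str]) -> Segment:
        # New data only ever arrives as a new segment at the end.
        seg = Segment(len(self.segments), tuple(records))
        self.segments.append(seg)
        return seg


@dataclass
class QuerySeries:
    """The same computation, rerun on a moving set of segments.

    `window` is an assumption of this sketch: each query reads the
    `window` most recent segments.
    """
    name: str
    stream: Stream
    window: int
    compute: Callable[[List[str]], object]

    def segments_for(self, latest: int) -> List[int]:
        first = max(0, latest - self.window + 1)
        return list(range(first, latest + 1))


def run_combined(series_list: List[QuerySeries], latest: int):
    """Run every query due at segment `latest`, scanning each segment once."""
    stream = series_list[0].stream  # assumes all series read the same stream
    needed = sorted({i for s in series_list for i in s.segments_for(latest)})
    scanned: Dict[int, Tuple[str, ...]] = {
        i: stream.segments[i].records for i in needed  # the shared scan
    }
    results = {}
    for s in series_list:
        records = [r for i in s.segments_for(latest) for r in scanned[i]]
        results[s.name] = s.compute(records)
    return results, len(needed)


log = Stream("clicks")
log.append(["a", "b"])
log.append(["b", "c", "c"])
log.append(["a"])

daily = QuerySeries("daily_count", log, window=1, compute=len)
three_day = QuerySeries("three_day_distinct", log, window=3,
                        compute=lambda rs: len(set(rs)))

results, segments_scanned = run_combined([daily, three_day], latest=2)
print(results, segments_scanned)
```

Run separately, the two queries would read four segments in total (one plus three); the combined run reads three, because the newest segment is scanned once and used by both. Because the series declare in advance which segments each future query reads, this overlap is known before the data arrives.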
