Wave Computing in the Cloud

Bingsheng He, Microsoft Research Asia
Joint work with Mao Yang, Zhenyu Guo, Rishan Chen, Wei Lin, Bing Su, Hongyi Wang, Lidong Zhou
5/18/2009

My Dream: Wave Computing

But, Today, Wave Computing is Actually…
The Wave model is a new paradigm for cloud computing.

State of the Art in the Cloud
- We provide scalability and fault tolerance on thousands of machines.
- We provide a query interface using high-level languages.
(MapReduce and its brothers: G.Y.M.)

Are G.Y.M.’s Executions Optimal?
- We looked at a query trace from a production system (20 thousand queries, 29 million machine hours).
- We focused on I/O and computation efficiency.
(Mr. Leopard)

Our Finding: “Far From Ideal”
[Pie chart] Redundant I/O on input data: 33%; distinct I/O: 67%
[Bar chart] Normalized total I/O, current production system vs. ideal system (results from simulation); the chart carries a 46% label
[Pie chart] Common computation steps: 30%; other computation steps: 70%

I/O Redundancy
• Two sample workloads
– Obtaining the top ten hottest Chinese pages daily
– Obtaining the top ten hottest English pages daily
[Diagram] Current system: each workload runs its own pipeline, Extract → Filter (“Chinese” or “English”) → Compute Top Ten → Output.
[Diagram] Ideal system: a single Extract feeds both Filter → Compute Top Ten → Output pipelines.

Computation Redundancy
• Two sample workloads
– Obtaining the top ten hottest Chinese pages daily
– Obtaining the top ten hottest Chinese pages weekly
[Diagram] Every day and every week, the same pipeline runs: Extract → Filter (“Chinese”) → Compute Top Ten. Ideally, the common computation on the per-day log is shared.

Why? Correlations among queries
– Temporal correlations among queries (a series of queries with recurrent computation)
[Pie chart] Recurring vs. non-recurring queries; the slices are labelled 2% and 98%.

Why? Correlations among queries
– Spatial correlations among queries (input data are targeted by multiple individual queries)
[Pie chart] Accesses to the top ten files vs. accesses to other files; the slices are labelled 25% and 75%.

How To Exploit the Correlations?
Err… This is a little tricky. What about developing these?
- a probabilistic model for scheduling input data accesses
- a predictive cache server
- a speculative query decomposer
(G.Y.M.)
No… Let’s K.I.S.S.:
- Since correlations are inherent, we need a notion to capture them.
- Our solution is the Wave model, which captures the correlations for both the user and the system.
(Mr. Leopard)

The Wave Model
• Key concepts capturing the correlation among queries
– Data: not a static file, but a stream that is periodically updated (append-only)
– Query: computation on the input stream
– Query series: recurrent computation on the stream
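The three concepts above can be sketched as plain data structures. This is a minimal illustration, not Comet's representation: the class names `Stream`, `Query`, and `QuerySeries`, the notion of a per-period "segment", and the `period`/`span` parameters are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Stream:
    """Append-only input: one segment of records per update period (e.g. per day)."""
    name: str
    segments: List[list] = field(default_factory=list)

    def append(self, segment):
        # Existing segments are never modified; new data only arrives at the end.
        self.segments.append(list(segment))

    def window(self, start, end):
        """All records in segments [start, end)."""
        return [r for seg in self.segments[start:end] for r in seg]


@dataclass
class Query:
    """A computation over one window of the input stream."""
    compute: Callable
    start: int
    end: int

    def run(self, stream):
        return self.compute(stream.window(self.start, self.end))


@dataclass
class QuerySeries:
    """The same computation, re-issued every `period` segments over the last `span` segments."""
    compute: Callable
    period: int
    span: int

    def queries(self, n_segments):
        return [Query(self.compute, end - self.span, end)
                for end in range(self.span, n_segments + 1, self.period)]
```

With seven daily segments, a daily series (`period=1, span=1`) expands to seven queries and a weekly series (`period=7, span=7`) to one query covering all seven segments.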

Optimization Opportunities in Waves
• Shared scan
– Identifies the same input stream accesses among queries
• Shared computation
– Identifies common computation steps among queries
• Query decomposition
– Decomposes a query into a series of smaller queries
– Uncovers more opportunities for shared scan and computation
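To make shared scan and shared computation concrete, here is a small sketch built on the two sample workloads from the I/O redundancy slide (top pages for two languages over the same log). The log format `"<language> <url>"`, the `CountingLog` wrapper, and both function names are hypothetical; the point is only that the second version reads and extracts each line once, however many queries consume it.

```python
from collections import Counter


class CountingLog:
    """Wraps raw log lines and counts how many lines are read in total."""

    def __init__(self, lines):
        self.lines = lines
        self.lines_read = 0

    def scan(self):
        for line in self.lines:
            self.lines_read += 1
            yield line


def extract(line):
    # Assumed log format: "<language> <url>"
    lang, url = line.split(" ", 1)
    return lang, url


def top_pages_separately(log, languages, k=10):
    """One full scan and one extract pass per query."""
    results = {}
    for lang in languages:
        counts = Counter(url for l, url in map(extract, log.scan()) if l == lang)
        results[lang] = counts.most_common(k)
    return results


def top_pages_shared(log, languages, k=10):
    """A single scan and extract pass feeds every query's filter."""
    counts = {lang: Counter() for lang in languages}
    for l, url in map(extract, log.scan()):
        if l in counts:
            counts[l][url] += 1
    return {lang: c.most_common(k) for lang, c in counts.items()}
```

Both functions return the same rankings; with two queries, the separate version reads every line twice while the shared version reads it once.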

Query Optimizations in Wave Computing
• Decomposition
• Form jumbo queries
• Optimizations on jumbo queries
• Shared scan and computation
[Diagram] Series 1 (daily), Series 2 (daily), and Series 3 (weekly) plotted over time steps 1 to 9, with one group of queries marked as a jumbo query.
Query series 1: Obtaining the top ten hottest Chinese pages daily
Query series 2: Obtaining the top ten hottest English pages daily
Query series 3: Obtaining the top ten hottest Chinese pages weekly
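One way to read the steps above is as a grouping problem: after decomposition, every series contributes a per-day sub-query, and the sub-queries that read the same day's data are bundled into one jumbo query. The sketch below follows that reading. It is an assumption about how jumbo queries could be formed, not a description of Comet's optimizer, and `form_jumbo_queries` and its string labels are invented for illustration.

```python
from collections import defaultdict


def form_jumbo_queries(series_periods, n_days):
    """Group per-day sub-queries of all series by the day they read.

    series_periods maps a series name to its period in days
    (1 = daily, 7 = weekly). Returns {day: [sub-query labels]}.
    """
    jumbo = defaultdict(list)
    for name, period in series_periods.items():
        for day in range(1, n_days + 1):
            # Decomposition: every series contributes a sub-query on each day's segment.
            jumbo[day].append(f"{name}: partial(day {day})")
            # A multi-day series also needs a combining step at the end of each period.
            if period > 1 and day % period == 0:
                jumbo[day].append(f"{name}: combine(days {day - period + 1}-{day})")
    return dict(jumbo)


jumbo = form_jumbo_queries({"series1": 1, "series2": 1, "series3": 7}, n_days=9)
```

Under this reading, each day's jumbo query holds three sub-queries over the same segment, which is where shared scan and shared computation apply; day 7 additionally holds the combining step for the weekly series.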

Ultimate (Wave+Cloud)
Individual query series + Time = Jumbo queries

Comet: Integration into DryadLINQ
• Translation: query to logical representation (expression tree)
– Query normalization
• Transformation: logical->physical
– More rules; views
– Cost model
• Encapsulation: physical->Dryad execution graph
– Shared scan/partitioning
– Code generation

An Example of Query Decomposition in DryadLINQ
• Decompose an operator: Q → seven daily queries + one combining query
– Daily query: views (cost estimation)
– Combining query: combine all the views
• Automatic query decomposition is challenging.
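The decomposition of a weekly query into seven daily queries plus one combining query can be illustrated on the weekly top-ten workload. In this sketch the "view" is assumed to be a per-day table of hit counts per page; the slides do not say what Comet's views contain. Note that daily top-ten lists alone would not be enough, because a page outside every daily top ten can still rank in the weekly top ten, so the view keeps full counts. The function names and the `(language, url)` record shape are hypothetical.

```python
from collections import Counter


def daily_view(day_log, language):
    """Daily query: per-page hit counts for one language (the materialized view)."""
    return Counter(url for lang, url in day_log if lang == language)


def top_k(view, k=10):
    """The daily top-k result can be read straight off the daily view."""
    return view.most_common(k)


def weekly_top_k(views, k=10):
    """Combining query: sum the daily views, then rank."""
    total = Counter()
    for view in views:
        total.update(view)
    return total.most_common(k)


# Seven days of toy data: (language, url) records per day.
week = [[("zh", "a"), ("zh", "b"), ("en", "x")]] * 3 + [[("zh", "b"), ("zh", "b")]] * 4
views = [daily_view(day, "zh") for day in week]
weekly = weekly_top_k(views)
```

The same daily views serve both the daily series and the weekly series, so the weekly query never rescans the raw logs.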

Micro Benchmark
• Overall effectiveness
– Logical optimization in Comet reduces total I/O by 12.3%.
– Full optimization (logical + physical) in Comet reduces total I/O by 42.3%.
[Bar chart] Total I/O (GB) per day, days 1 to 7, for Original, Logical, and Full.
(Running three sample queries on one week of data, around 120 GB, on a cluster of 40 machines)

Summary
• The Wave model is a new paradigm for capturing the query correlations in the cloud.
• The Wave model enables significant opportunities in improving performance and resource utilization.
• Comet: our ongoing project integrating Wave computing into DryadLINQ.
