Wave Computing in the Cloud
Bingsheng He Microsoft Research Asia
Joint work with Mao Yang, Zhenyu Guo, Rishan Chen, Wei Lin, Bing Su, Hongyi Wang, Lidong Zhou
5/18/2009 1
My Dream: Wave Computing
But, Today, Wave Computing is
The Wave model is a new paradigm for cloud computing.
(MapReduce and its brothers: G.Y.M.)
tolerance on thousands of machines.
using high-level languages.
(Mr. Leopard)
[Pie chart: 33% / 67%; legend: Redundant I/O on input data, Distinct I/O]
[Pie chart: 30% / 70%; legend: Common computation steps, Other computation steps]
[Bar chart: Normalized Total I/O, Current Production System vs. Ideal System; annotated 46% (results from simulation)]
Current system:
  Extract -> Filter: “Chinese” -> Compute Top Ten -> Output
  Extract -> Filter: “English” -> Compute Top Ten -> Output
Ideal system:
  Extract -> Filter: “Chinese” -> Compute Top Ten -> Output
          -> Filter: “English” -> Compute Top Ten -> Output
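The difference between the two plans can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the log records, the `stats` counter, and the function names are hypothetical and do not come from the slides or from any real system.

```python
from collections import Counter

# Hypothetical log records of the form (language, url).
LOG = [
    ("Chinese", "a.cn"), ("English", "x.com"), ("Chinese", "a.cn"),
    ("Chinese", "b.cn"), ("English", "y.com"), ("English", "x.com"),
]

# Counts how many full passes over the input have been started.
stats = {"scans": 0}


def extract(log):
    """Simulate the Extract step: one full pass over the input data."""
    stats["scans"] += 1
    yield from log


def top_ten_separate(log, language):
    """Current system: each query runs its own Extract."""
    counts = Counter(url for lang, url in extract(log) if lang == language)
    return counts.most_common(10)


def top_ten_shared(log, languages):
    """Ideal system: one Extract feeds every Filter / Compute Top Ten."""
    counts = {lang: Counter() for lang in languages}
    for lang, url in extract(log):
        if lang in counts:
            counts[lang][url] += 1
    return {lang: c.most_common(10) for lang, c in counts.items()}
```

Running `top_ten_separate` once per language reads the log twice, while `top_ten_shared` produces the same answers from a single pass.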
Every day:  Extract -> Filter: “Chinese” -> Compute Top Ten
Every week: Extract -> Filter: “Chinese” -> Compute Top Ten
Common computation on per-day log (ideally)
[Pie chart: 98% / 2%; legend: Recurring queries, Non-recurring queries]
[Pie chart: 75% / 25%; legend: Accesses to top ten files, Accesses to other files]
Err… This is a little tricky. What about developing these?
the input data access
(G.Y.M.) (Mr. Leopard) No… Let’s K.I.S.S.:
need a notion to capture them.
capture the correlation for both the user and the system.
[Figure: Series 1 (daily), Series 2 (daily), and Series 3 (weekly) plotted against an axis with ticks 1 to 9]
Query series 1: Obtaining the top ten hottest Chinese pages daily.
Query series 2: Obtaining the top ten hottest English pages daily.
Query series 3: Obtaining the top ten hottest Chinese pages weekly.
[Figure: individual query series over time, grouped into jumbo queries]
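One way to picture the grouping is the sketch below. It rests on an assumption the slides do not spell out: that a series with period p issues a query on days p, 2p, 3p, and so on, and that the queries issued on the same day form one jumbo query. The `QuerySeries` class and `jumbo_queries` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuerySeries:
    name: str
    period_days: int  # 1 = daily, 7 = weekly


SERIES = [
    QuerySeries("Series 1: top ten Chinese pages", 1),
    QuerySeries("Series 2: top ten English pages", 1),
    QuerySeries("Series 3: top ten Chinese pages", 7),
]


def jumbo_queries(series, days):
    """Map each day to the names of the queries grouped together on that day.

    Assumption: a series with period p issues a query on days p, 2p, 3p, ...
    """
    return {
        day: [s.name for s in series if day % s.period_days == 0]
        for day in range(1, days + 1)
    }
```

Under this assumption, days 1 to 6 each group the two daily queries, and day 7 groups all three.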
Translation: query to logical representation (expression tree)
Transformation: logical -> physical
Encapsulation: physical -> Dryad execution graph
Code generation
Query normalization
More rules; Views
Shared scan/partitioning
Cost model
Decompose an
Q seven daily queries + one combining query
Daily query
Combining
Automatic query decomposition is challenging.
Views (Cost estimation) Combine all the views
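The decomposition into seven daily queries plus one combining query can be sketched as follows. This is a minimal illustration, not Comet's implementation: the function names and the (language, url) record format are hypothetical, and each daily view is assumed to hold full per-URL counts.

```python
from collections import Counter


def daily_view(day_log, language):
    """Daily query: per-URL hit counts for one language on one day's log."""
    return Counter(url for lang, url in day_log if lang == language)


def top_ten(view):
    """Rank the URLs in a view by hit count and keep the first ten."""
    return view.most_common(10)


def weekly_top_ten(daily_views):
    """Combining query: merge the daily views, then take the top ten."""
    total = Counter()
    for view in daily_views:
        total.update(view)  # adds the counts of each daily view
    return top_ten(total)
```

The views keep full counts rather than each day's top ten because a page can reach the weekly top ten without appearing in every daily top ten. The daily result is then just `top_ten(daily_view(...))`, so the daily and weekly series share the same per-day computation.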
– Logical optimization in Comet reduces total I/O by 12.3%.
– Full optimization (logical + physical) in Comet reduces total I/O by 42.3%.
[Figure: Total I/O (GB) per day for days 1 to 7; series: Original, Logical, Full]
(Running three sample queries on one week of data, around 120 GB, on a cluster of 40 machines)