Online Aggregation for Large MapReduce Jobs
Niketan Pansare1, Vinayak Borkar2, Chris Jermaine1, Tyson Condie3
1Rice University, 2UC Irvine, 3Yahoo! Research
Outline: Motivation, Implementation, Experiments, Conclusion
Motivation
As the job runs, the output range estimate narrows:
- [0, 2000] with 95% probability
- [900, 1100] with 95% probability
- [950, 1040] with 95% probability
- [990, 1010] with 95% probability
- [995, 1005] with 95% probability
- [999, 1001.5] with 95% probability
- Final answer: 1000
If the user is satisfied with the range estimate [990, 1015] at 95% probability, the job can be stopped early: the exact 'Answer 1000' vs. the approximate 'Estimate 1002.5'.
In a self-managed DB, costs are fixed; in the cloud, you pay for the amount of hardware used.
- The user needs to justify the cost to the organization
- Reasoning about that cost: notoriously difficult in a traditional DB, much simpler in the cloud
Confidence interval found using classical sampling theory:
- Tuples are bundled into blocks
- Blocks arrive in random order
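Because blocks arrive in random order, the blocks seen so far behave like a simple random sample of all blocks, so a classical confidence interval applies. The sketch below is illustrative only (it is not the paper's estimator): it scales the mean per-block sum up to the full population and uses a normal approximation with a finite-population correction. The function name and block values are hypothetical.

```python
import math
import random

def estimate_sum(block_sums, n_total, z=1.96):
    """Estimate the total from the per-block sums of the k blocks
    seen so far (k >= 2), out of n_total blocks in random order.
    Returns (estimate, half_width) for a ~95% confidence interval."""
    k = len(block_sums)
    mean = sum(block_sums) / k
    # sample variance of the per-block sums
    var = sum((x - mean) ** 2 for x in block_sums) / (k - 1)
    est = n_total * mean
    # finite-population correction: we sample without replacement
    half = z * n_total * math.sqrt(var / k * (1 - k / n_total))
    return est, half

random.seed(0)
blocks = [[5, 9], [7, 4, 2], [8, 3], [1, 10, 6]]
random.shuffle(blocks)
seen = [sum(b) for b in blocks[:3]]        # first 3 of 4 blocks processed
est, half = estimate_sum(seen, len(blocks))
print(f"estimate {est:.1f} +/- {half:.1f}  (true answer 55)")
```

With only 4 blocks the interval is crude; the interesting regime is thousands of blocks, as in the experiments later in the talk.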
Example: blocks {5, 9}, {7, 4, 2}, {8, 3}, {1, 10, 6} arrive in random order and are processed one at a time: first {7, 4, 2}, then {8, 3}, then {5, 9}, and finally {1, 10, 6}.
In practice, however, blocks are non-uniform (in size, locality, machine, and network), so the processing time of a block can be large and highly variable.
Example (long time to process = red, short time to process = green): with the same blocks {7, 4, 2}, {8, 3}, {5, 9}, {1, 10, 6}, a slow (red) block is still being processed while the fast (green) blocks finish, so its tuples are missing from any estimate taken in the meantime.
Example: a count query.
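The danger with a count query can be seen in a small simulation. The workload below is hypothetical: each block's processing time is set equal to its tuple count, i.e. processing time and value are strongly correlated. Many workers pull blocks from a shared randomized queue; at a mid-job snapshot, each worker's in-flight block is disproportionately likely to be a big, slow one, so a naive estimate over only the finished blocks tends to undercount.

```python
import random

random.seed(7)
W = 50                                        # parallel workers
# hypothetical workload: block value = tuple count; processing
# time is taken equal to the value (strong correlation)
sizes = [random.choice([1, 1, 1, 10]) for _ in range(2000)]
blocks = sizes[:]
random.shuffle(blocks)                        # randomized queue order

# list scheduling: each block goes to the earliest-free worker,
# which is equivalent to workers pulling from the queue when free
free_at = [0.0] * W                           # next free time per worker
finish = []                                   # (finish time, value)
for b in blocks:
    w = min(range(W), key=lambda i: free_at[i])
    free_at[w] += b                           # processing time == value
    finish.append((free_at[w], b))

T = max(free_at) / 2                          # snapshot halfway through
done = [b for f, b in finish if f <= T]       # blocks finished by T
true_total = sum(blocks)
# naive estimate: scale the finished blocks' total by blocks/finished
naive = sum(done) * len(blocks) / len(done)
print(f"true {true_total}, naive {naive:.0f} ({naive / true_total:.1%})")
```

The naive estimate typically comes out several percent below the true count: the excluded in-flight blocks are length-biased toward long processing times, and here long processing time means a large contribution to the count.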
Solution: ship timing information along with the values. This:
- allows for correlation between processing time and values, and
- takes into account the processing time of blocks that are currently being processed.
Implementation
Candidate platforms: Hadoop, or Hyracks (developed at UC Irvine).
The system consists of 2 modifications to MapReduce (on Hyracks) plus a Bayesian estimator.
- Maintains a random ordering of blocks
- Assigns the block at the head of the queue; when a block comes to the head of the queue, a timer starts (measuring its processing time)
- Data file → values; metadata file → timing information; both reach the reducer through its shuffle phase
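The two scheduler duties above — hand out blocks in random order, and time each block from the moment it is handed out — can be sketched as follows. This is a minimal illustration, not the Hyracks implementation; the class and method names are hypothetical, and the real system additionally writes the values and timings to separate data and metadata files for the shuffle phase.

```python
import random
import time
from collections import deque

class RandomizedBlockQueue:
    """Sketch: a block queue that (1) hands out blocks in random
    order and (2) starts a timer when a block leaves the head, so
    each block's processing time can be reported."""

    def __init__(self, block_ids, seed=None):
        ids = list(block_ids)
        random.Random(seed).shuffle(ids)      # random processing order
        self._queue = deque(ids)
        self._started = {}                    # block id -> start time

    def next_block(self):
        """Assign the block at the head of the queue to a worker."""
        block = self._queue.popleft()
        self._started[block] = time.monotonic()   # timer starts
        return block

    def report_done(self, block):
        """Return the measured processing time of a finished block."""
        return time.monotonic() - self._started.pop(block)

q = RandomizedBlockQueue(["Blk%d" % i for i in range(1, 8)], seed=42)
b = q.next_block()        # e.g. hand a block to Worker 1
t = q.report_done(b)      # tprocess for that block
```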
Walkthrough: the client submits a job to the master. The input has 7 blocks:
Blk1: <MSFT, 2>, <AAPL, 4>; Blk2: <ORCL, 3>; Blk3: <AAPL, 4>; Blk4: <MSFT, 2>; Blk5: <ORCL, 3>; Blk6: <MSFT, 2>; Blk7: <AAPL, 4>
- The blocks are placed in a randomized queue: Blk6, Blk5, Blk3, Blk1, Blk4, Blk7, Blk2
- Two workers pull from the head of the queue; one worker takes Blk6, processes it, and emits <MSFT, 2> with tprocess = 4, then starts on Blk5
- The reducer's shuffle phase receives <MSFT, 2> and routes it to Reducer-MSFT for the reduce phase
- At a random time instance, estimation is triggered; Blk5 is still being processed, so only a lower bound on its processing time is known: tprocess > 3
- The estimation code therefore sees: Blk6: tprocess = 4, <MSFT, 2>; Blk5: tprocess > 3
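At a snapshot, the estimator thus receives two kinds of records: finished blocks with an exact processing time and value, and in-flight blocks with only a lower bound (a censored time) and no value yet. A minimal sketch, mirroring the walkthrough (Blk6 finished with tprocess = 4; Blk5 in flight with tprocess > 3); the record layout and function name are hypothetical, not the system's actual file format:

```python
def snapshot_records(finished, in_flight, now):
    """Build the records visible to the estimator at time `now`.
    finished:  list of (block, value, start, end) tuples
    in_flight: list of (block, start) tuples (no value yet)"""
    recs = []
    for blk, value, start, end in finished:
        recs.append({"block": blk, "value": value,
                     "tprocess": end - start, "censored": False})
    for blk, start in in_flight:
        # only a lower bound: the block has run for now - start so far
        recs.append({"block": blk, "value": None,
                     "tprocess": now - start, "censored": True})
    return recs

recs = snapshot_records(
    finished=[("Blk6", ("MSFT", 2), 0.0, 4.0)],   # tprocess = 4
    in_flight=[("Blk5", 4.0)],                    # running since t = 4
    now=7.0)                                      # => tprocess > 3
```

Treating the in-flight time as censored rather than discarding it is what lets the estimator avoid the bias discussed earlier.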
The Bayesian estimator:
- Allows for correlation between processing time and values, and takes into account the processing time of blocks still being processed
- C++ code using the GNU Scientific Library and Minuit2
- Input: the data file and metadata file from the reducer
- Output: a confidence interval, e.g. [995, 1005] with 95% probability
For each block the estimator sees timing information (Tprocess, Tscheduling) along with the value X.
- Classical sampling theory: estimate from the observed values alone
- Our approach: model the values jointly with the timing information; inference is done with a Gibbs sampler
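The paper's Gibbs sampler targets the posterior of a joint model over (Tprocess, Tscheduling, X), which is too involved to reproduce here. The toy below only illustrates the Gibbs mechanic itself on the simplest correlated case — a standard bivariate normal, standing in for a (processing time, value) pair — by alternately sampling each coordinate from its full conditional; it is not the paper's model.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_iter=10000, seed=3):
    """Gibbs sampler for a standard bivariate normal with
    correlation rho, alternating the two full conditionals:
      x | y ~ N(rho*y, 1 - rho^2)
      y | x ~ N(rho*x, 1 - rho^2)"""
    rng = random.Random(seed)
    sd = math.sqrt(1 - rho ** 2)
    x = y = 0.0
    xs, ys = [], []
    for _ in range(n_iter):
        x = rng.gauss(rho * y, sd)
        y = rng.gauss(rho * x, sd)
        xs.append(x)
        ys.append(y)
    return xs, ys

xs, ys = gibbs_bivariate_normal(rho=0.8)
# the sample correlation should come out near 0.8
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / len(xs)
vx = sum((a - mx) ** 2 for a in xs) / len(xs)
vy = sum((b - my) ** 2 for b in ys) / len(ys)
print(round(cov / math.sqrt(vx * vy), 2))
```

In the real estimator the same alternation runs over model parameters and the unobserved values of censored (in-flight) blocks, yielding draws from the posterior of the query answer, from which the confidence interval is read off.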
Experiments
Questions: Is the randomized queue required? Does allowing correlation between processing time and value help? Do the estimates converge?

Setup:
- Query: select sum(page_count) from wikipedia_log group by language
- Data: 6 months of Wikipedia logs (220 GB compressed, 3960 blocks)
- Cluster: 11 nodes (4 disks, 4 cores, 12 GB RAM each); 80 mappers and 10 reducers
- Uniform configuration of machines and blocks; a non-uniform configuration is also used to increase the correlation between processing time and value

Results:
- With a non-randomized queue, the estimate is inaccurate after 10% and 20% of the data is processed; non-randomized ordering gives inaccurate estimates in general
- While a large block is being processed, no correlation is detected; once correlation is detected, the correlation-aware estimator is slightly more accurate and unbiased with respect to the true answer
- The estimates converge to the true answer as more data is processed
Conclusion
Statistically robust estimates for large MapReduce jobs.