Online Aggregation for Large MapReduce Jobs

Online Aggregation for Large MapReduce Jobs. Niketan Pansare¹, Vinayak Borkar², Chris Jermaine¹, Tyson Condie³. ¹Rice University, ²UC Irvine, ³Yahoo! Research. Outline: Motivation, Implementation, Experiments, Conclusion.


  1–3. OLA over a single machine. The confidence interval is found using classical sampling theory. Tuples are bundled into blocks, and the blocks arrive in random order. Example: find the SUM of the values in the blocks {7, 4, 2}, {8, 3}, {5, 9}, {1, 10, 6}. After three of the four blocks have arrived, the sample of block sums is {13, 11, 14} and the estimate is (13 + 11 + 14) × 4 / 3 = 50.67. Once the fourth block arrives, the sample is {13, 11, 14, 17} and the estimate is (13 + 11 + 14 + 17) × 4 / 4 = 55, the exact answer.
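A minimal sketch of this block-level estimator, in Python; the scale-up and the normal-approximation interval below are illustrative assumptions, not necessarily the paper's exact formulas.

```python
import random
import statistics

# Toy data: tuples bundled into blocks; blocks arrive in random order.
blocks = [[7, 4, 2], [8, 3], [5, 9], [1, 10, 6]]
random.shuffle(blocks)                 # random arrival order

N = len(blocks)                        # total number of blocks
seen = blocks[:3]                      # suppose only 3 of the 4 blocks have arrived
sample_sums = [sum(blk) for blk in seen]

n = len(sample_sums)
estimate = sum(sample_sums) * N / n    # scale the observed block sums up to N blocks
# Rough normal-approximation interval on the SUM (illustrative only;
# no finite-population correction).
stderr = statistics.stdev(sample_sums) * N / n ** 0.5
print(estimate, (estimate - 1.96 * stderr, estimate + 1.96 * stderr))
```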

  4. Extend existing approaches. OLA over a single machine: confidence interval from classical sampling theory; tuples are bundled into blocks, which arrive in random order. OLA over multiple machines: blocks are non-uniform (size, locality, machine, network), and the processing time for a block can be large and highly variable. Why won't the single-machine approach work, and how do we deal with these issues?

  5–17. OLA over multiple machines. Blocks are non-uniform (size, locality, machine, network), and the processing time for a block can be large and highly variable. Example: find the SUM of the values in the blocks {7, 4, 2}, {8, 3}, {5, 9}, {1, 10, 6}. [Figure: the blocks are laid out on a timeline whose x-axis is processing time; blocks that take a long time to process are drawn in red, blocks that finish quickly in green; arrows mark random time instances at which the blocks are polled.]

  18–19. OLA over multiple machines (same example). Notice that there are more arrows in the red regions than in the green regions. This is the inspection paradox: at any random time t, you are (stochastically) more likely to be processing the blocks that take a long time.

  20. Extend existing approaches (recap). Next question: why won't the single-machine approach work?

  21–24. Why won't the previous approach work? Because of the inspection paradox, at the time of estimation you are disproportionately processing the longer blocks. A correlation between processing time and value is also possible (e.g., for a count query, a block that takes longer to process typically contributes a larger count). Together these produce biased estimates, so the current techniques won't work; the effect was found experimentally in the 'MapReduce Online' paper. Therefore, we need to deal with the inspection paradox in a principled fashion.
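To make the bias concrete, here is a toy single-worker simulation (not from the paper); the block times, the value-equals-time "count query" setup, and the naive scale-up estimator are all assumptions for illustration.

```python
import random

TIMES = [1, 1, 1, 1, 1, 1, 1, 50]   # one "straggler" block among 8
VALUES = TIMES[:]                    # value proportional to time (e.g. a COUNT)

def one_trial():
    order = list(range(len(TIMES)))
    random.shuffle(order)                    # single worker, random block order
    poll = random.uniform(0, sum(TIMES))     # random polling time during the job
    elapsed, finished, in_flight = 0.0, [], 0.0
    for i in order:
        if elapsed + TIMES[i] > poll:
            in_flight = TIMES[i]             # block being processed at the poll
            break
        elapsed += TIMES[i]
        finished.append(VALUES[i])
    naive = sum(finished) * len(TIMES) / len(finished) if finished else None
    return in_flight, naive

trials = [one_trial() for _ in range(20000)]
naive = [n for _, n in trials if n is not None]
print("mean block time:", sum(TIMES) / len(TIMES))
print("mean time of the in-flight block:", sum(t for t, _ in trials) / len(trials))
print("true SUM:", sum(VALUES), " mean naive estimate:", sum(naive) / len(naive))
# The polled (in-flight) block is far longer than an average block -- the
# inspection paradox -- and the naive scale-up of finished blocks sits far
# below the true SUM, because long, high-value blocks rarely finish early.
```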

  25. Extend existing approaches (recap). Next question: how do we deal with these issues?

  26. How do we deal with the inspection paradox? Capture timing information (the processing time of each block) along with the values. Then, instead of using classical sampling theory, output estimates using a Bayesian model that allows for correlation between processing time and values and that also takes into account the processing time of the block currently being processed.

  27. Outline: Motivation, Implementation, Experiments, Conclusion.

  28–30. Implementation overview. Framework for distributed systems: MapReduce. Hadoop: staged processing → online; Hyracks (developed at UC Irvine): pipelining → "online", with an architecture (and API) similar to Hadoop (http://code.google.com/p/hyracks/). For estimates of the aggregation, two pieces are needed: (1) modifications to MapReduce (Hyracks), and (2) a Bayesian estimator.

  31. Modifications to MapReduce (Hyracks). Master: maintains a random ordering of the blocks (a logical queue, not a physical one) and assigns the block at the head of the queue; when a block reaches the head of the queue, a timer starts (its processing time). Two intermediate sets of files: a data file holding the values and a metadata file holding the timing information, consumed in the shuffle phase of the reducer.
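A minimal sketch of the master-side change (a randomized logical block queue with per-block timing), assuming a single shared queue; the class and method names are hypothetical, not the Hyracks API.

```python
import random
import time
from collections import deque

class ToyMaster:
    """Toy sketch of the master-side change: a randomized logical block queue
    plus per-block timing. Names are illustrative, not Hyracks APIs."""

    def __init__(self, block_ids):
        order = list(block_ids)
        random.shuffle(order)              # randomize once; logical, not physical
        self.queue = deque(order)
        self.started = {}                  # block id -> wall-clock start time

    def assign_next(self):
        """Hand the block at the head of the queue to the next idle worker."""
        if not self.queue:
            return None
        blk = self.queue.popleft()
        self.started[blk] = time.time()    # timer starts: this feeds t_process
        return blk

    def elapsed(self, blk):
        """Processing time so far (exact once the block finishes,
        a lower bound while it is still in flight)."""
        return time.time() - self.started[blk]

# Usage: master = ToyMaster(["Blk1", "Blk2", "Blk3"]); blk = master.assign_next()
```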

  32–48. Modifications to MapReduce (Hyracks): example walkthrough. The client submits the query select sum(stock_price) from nasdaq_db group by company; the input consists of blocks Blk1 {MSFT 2, AAPL 4}, Blk2 {ORCL 3}, Blk3 {AAPL 4}, Blk4 {MSFT 2}, Blk5 {ORCL 3}, Blk6 {MSFT 2}, Blk7 {AAPL 4}.
t = 1: the Master builds a logical queue of the blocks and randomizes it: Blk6, Blk5, Blk3, Blk1, Blk4, Blk7, Blk2.
t = 2: the Master forks the workers (Worker 1 and Worker 2).
t = 3: the workers request blocks.
t = 4: the Master reads the head of the queue and assigns Blk6 to Worker 1.
t = 5: Worker 1 starts reading Blk6.
t = 6: the Master assigns Blk5 to Worker 2.
t = 7: Worker 1 runs its map task on Blk6, producing <MSFT, 2>.
t = 8: Blk6 finishes with t_process = 4; <MSFT, 2> enters the reducer's shuffle phase (followed by the reduce phase).
t = 9: a random time instance triggers estimation. Reducer-MSFT has received <MSFT, 2> from Blk6 (t_process = 4), while Blk5 is still being processed by Worker 2 (t_process > 3). The estimation code is given Blk6: <MSFT, 2>, Blk6: t_process = 4, and Blk5: t_process > 3, and reports the interval [5.8, 8].
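The walkthrough ends with the reducer handing per-block values and timing to the estimation code. The record layout below is invented for illustration; the paper only specifies that there is a data file (values) and a metadata file (timing information).

```python
# Hypothetical record layout: field names and structure are illustrative only.
data_file = [
    ("Blk6", "MSFT", 2),            # value produced by a finished block
]
metadata_file = [
    ("Blk6", "finished", 4.0),      # t_process for a completed block
    ("Blk5", "in_flight", 3.0),     # lower bound: still running after 3 time units
]

def estimator_input(group, data, metadata, total_blocks):
    """Assemble, for one reduce group, what an estimator would need:
    observed (value, t_process) pairs plus lower bounds for in-flight blocks."""
    values = {blk: v for blk, g, v in data if g == group}
    finished, in_flight = [], []
    for blk, status, t in metadata:
        if status == "finished":
            finished.append((values.get(blk, 0), t))
        else:
            in_flight.append(t)      # only a censored observation of t_process
    return {"finished": finished, "in_flight": in_flight,
            "unseen_blocks": total_blocks - len(finished) - len(in_flight)}

print(estimator_input("MSFT", data_file, metadata_file, total_blocks=7))
# {'finished': [(2, 4.0)], 'in_flight': [3.0], 'unseen_blocks': 5}
```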

  49. Implementation overview (recap). For estimates of the aggregation we need (1) the modifications to MapReduce (Hyracks) described above and (2) a Bayesian estimator, described next.

  50–52. Bayesian estimator. Why? To deal with the inspection paradox. How? It allows for correlation between processing time and values, and it also takes into account the processing time of the block currently being processed. Implementation: C++ code using the GNU Scientific Library and Minuit2; input: the data file and metadata file from the reducer; output: a confidence interval, e.g. [995, 1005] with 95% probability.

  53–59. Bayesian estimator (model). A parameterized model over the timing information, T_process and T_scheduling, and the value X. Classical sampling theory works with the underlying distribution f(X); our approach models the joint distribution f(X, T_process, T_scheduling). This allows correlation between X, T_process, and T_scheduling, so that, for example, f(X | T_process > 100000000, T_scheduling = 22) ≠ f(X). Estimation uses Bayesian machinery: a Gibbs sampler, with the probability (update) equations developed for it. Detailed discussion is in the paper.
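As a toy illustration of why the joint distribution matters (this is not the paper's model; the lognormal block sizes and the threshold are assumptions), conditioning on a long processing time visibly shifts the distribution of X:

```python
import random

# A toy joint distribution f(X, T_process): both driven by block size, so they
# are positively correlated. This is NOT the paper's model, just an illustration
# of why f(X | T_process, T_scheduling) differs from f(X).
def draw():
    size = random.lognormvariate(0, 1)        # heavy-tailed block size
    t_process = size * random.uniform(0.8, 1.2)
    x = size * 10 * random.uniform(0.9, 1.1)  # block value grows with size
    return x, t_process

samples = [draw() for _ in range(200000)]
all_x = [x for x, _ in samples]
big_t = [x for x, t in samples if t > 5.0]    # condition on a long processing time

print("E[X]               ~", sum(all_x) / len(all_x))
print("E[X | T_process>5] ~", sum(big_t) / len(big_t))
# The conditional mean is several times larger: ignoring timing when a long
# block is observed (or still in flight) at estimation time biases the answer,
# which is why the estimator models X jointly with T_process and T_scheduling.
```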

  60. Outline: Motivation, Implementation, Experiments, Conclusion.

  61–64. Experiments. Hypotheses: a randomized queue is required; the estimator must allow correlation between processing time and value; the estimates converge. Experiment 1 (real dataset): select sum(page_count) from wikipedia_log group by language over 6 months of Wikipedia logs (220 GB compressed, 3,960 blocks); an 11-node cluster (4 disks, 4 cores, 12 GB RAM); uniform configuration of machines and blocks; 80 mappers and 10 reducers. Reading the figures: results are plotted against the percentage of data processed. Experiment 2 (simulated dataset): increased correlation between processing time and value (non-uniform configuration).
