
Clarinet: WAN-Aware Optimization for Analytics Queries (Raajay) - PowerPoint PPT Presentation

Clarinet: WAN-Aware Optimization for Analytics Queries. Raajay Viswanathan, Ganesh Ananthanarayanan, Aditya Akella.

Overview: Web apps are hosted on multiple DCs to give end users low-latency access.


  1. WAN-Aware Query Optimization

  [Figure: three DCs connected by WAN links of 40, 80, and 100 Gbps — DC 1 holds table T1, DC 2 holds T2, DC 3 holds T3. T1, T2, and T3 are tables storing click logs.]

  QUERY:
  SELECT T1.user, T1.latency, T2.latency, T3.latency
  FROM T1, T2, T3
  WHERE T1.user == T2.user AND T1.user == T3.user
    AND T1.device == T2.device == T3.device == "mobile";

  The WAN is the bottleneck: a transfer that takes 1 s inside a DC can take 40 s across the WAN. Clarinet is a WAN-aware query optimizer that uses network transfer duration to choose query plans. Each candidate plan applies selections (σ_Mobile) to the 200 GB base tables and differs in join order; the intermediate join outputs are 10 GB (Plan A), 12 GB (Plan B), and 16 GB (Plan C), with estimated running times of 41 s, 20.96 s, and 17.6 s respectively. The network-agnostic query optimizer chooses Plan A — the plan with the smallest intermediate output — which is the slowest of the three.
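The slide's cost intuition can be sketched in a few lines. This is an illustrative model, not Clarinet's actual cost estimator: a plan's WAN time is taken as the slowest of its inter-DC transfers, each computed as size over link bandwidth.

```python
# Illustrative sketch (not Clarinet's real cost model): estimate a plan's
# WAN time from its inter-DC transfers, where each transfer's duration is
# data size divided by link bandwidth, and parallel transfers finish when
# the slowest one does.

def transfer_seconds(gigabytes: float, gbps: float) -> float:
    """Time to move `gigabytes` of data over a `gbps` (gigabits/s) link."""
    return gigabytes * 8 / gbps

def plan_runtime(transfers) -> float:
    """`transfers` is a list of (size_GB, link_Gbps) pairs running in
    parallel; the plan's transfer phase ends with the slowest of them."""
    return max(transfer_seconds(s, b) for s, b in transfers)

# A 200 GB table over the 40 Gbps link takes 40 s -- the slide's WAN-only
# bottleneck -- while the same table over the 100 Gbps link takes 16 s.
assert transfer_seconds(200, 40) == 40.0
assert plan_runtime([(200, 40), (200, 100)]) == 40.0
```

This is why join order matters across the WAN: different orders push different table sizes over different links, so the bottleneck term changes per plan.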

  2. Outline
     1. Motivation
     2. Challenges in choosing a query plan based on WAN transfer durations
     3. Solution
        - Single query
        - Multiple simultaneous queries
     4. Experimental Evaluation

  3. Other factors also affect query plan run time

  [Figure: the three-DC topology again — T1 in DC 1, T2 in DC 2, T3 in DC 3, with the 40, 80, and 100 Gbps WAN links.]

  4. Other factors also affect query plan run time — the figure adds one step of a plan: selections (σ) over T1 and T2 (200 GB each) feeding the join T1 ⋈ T2.

  5. Other factors also affect query plan run time — the join step is executed as a MapReduce job: MAP tasks run the SELECTs over the two 200 GB inputs, and REDUCE tasks run the JOIN.

  6. Other factors also affect query plan run time — with all tasks placed in a single DC, the step takes 20 s.

  7. Other factors also affect query plan run time — with tasks placed uniformly across DC 1 and DC 2, the same step takes 10 s.

  8. Other factors also affect query plan run time — recall the WAN-only estimates while evaluating the different query plans: 1. Plan A: 41 s; 2. Plan B: 20.96 s; 3. Plan C: 17.6 s.

  9. Other factors also affect query plan run time — re-evaluating with task placement taken into account: 1. Plan A: 41 s → 20.5 s; 2. Plan B: 20.96 s → 11.2 s; 3. Plan C: 17.6 s.

  10–11. Other factors also affect query plan run time — in addition, one of the WAN links is used by a high-priority application, which constrains the bandwidth available to the query.

  12. Other factors also affect query plan run time — choose the query plan based on: 1. the best available task placements.

  13. Other factors also affect query plan run time — choose the query plan based on: 1. the best available task placements; 2. the schedule of network transfers.
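The placement effect the slides walk through can be illustrated numerically. The topology and sizes below are made up for illustration (they do not reproduce the slide's 20 s / 10 s figures exactly): placement decides which links carry data, and the step's time is the worst link's load over its bandwidth.

```python
# Illustrative sketch of why placement matters: the same join step can run
# with its reduce tasks in one DC or split across DCs, and the bottleneck
# link load (hence the transfer time) changes. Numbers are invented.

def runtime(link_load_gb, link_gbps) -> float:
    """Transfer time when each link carries the given load (GB), with
    all loaded links draining in parallel."""
    return max(gb * 8 / link_gbps[link] for link, gb in link_load_gb.items())

link_gbps = {("DC1", "DC2"): 40, ("DC2", "DC1"): 40}

# All reduce tasks in DC2: the whole 200 GB remote input crosses DC1 -> DC2.
single_dc = runtime({("DC1", "DC2"): 200}, link_gbps)

# Reduce tasks split evenly: each direction carries half the remote input.
split = runtime({("DC1", "DC2"): 100, ("DC2", "DC1"): 100}, link_gbps)

assert single_dc == 40.0   # 200 GB * 8 / 40 Gbps
assert split == 20.0       # 100 GB per direction, drained in parallel
```

Because placement can halve (or worse) a plan's bottleneck, comparing plans by WAN-only estimates — without the best available placement — ranks them incorrectly.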

  14. Joint plan selection, placement, and scheduling

  15. Joint plan selection, placement, and scheduling — the query optimizer generates multiple query plans (join orders) per query (SELECT * FROM … WHERE … ;).

  16. Joint plan selection, placement, and scheduling — parallelism is then assigned for each stage, turning the logical plan into a physical plan.

  17. Joint plan selection, placement, and scheduling — Clarinet performs network-aware task placement and scheduling for each query plan, then chooses the plan with the smallest run time for execution.

  18. Joint plan selection, placement, and scheduling — Clarinet binds the query to a plan lower in the stack: query optimizer → physical plan (per-stage parallelism) → network-aware placement and scheduling → choose the plan with the smallest run time.
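The late-binding idea above reduces to a simple selection step once every candidate has been costed. The interface below is hypothetical (the real system evaluates plans via placement and scheduling, not a lookup table):

```python
# Late-binding sketch (hypothetical interface): rather than committing to
# one plan up front, evaluate every candidate physical plan with a
# network-aware runtime estimator, then bind the query to the fastest one.

def choose_plan(physical_plans, estimate_runtime):
    """`physical_plans`: candidates from the query optimizer.
    `estimate_runtime`: runs placement + scheduling, returns seconds."""
    return min(physical_plans, key=estimate_runtime)

# Toy usage: a dict of precomputed runtimes stands in for the estimator.
runtimes = {"Plan A": 41.0, "Plan B": 20.96, "Plan C": 17.6}
assert choose_plan(runtimes, runtimes.get) == "Plan C"
```

The point of binding late is that `estimate_runtime` can see current link conditions and placements, which a traditional optimizer, deciding at parse time, cannot.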

  19. Network-aware placement and scheduling

  [Figure: a physical plan — SELECT stages over T1, T2, and T3 feeding two JOIN stages.]

  20. Network-aware placement and scheduling — task placement is decided greedily, one stage at a time, minimizing per-stage run time.

  21. Network-aware placement and scheduling — scheduling of network transfers determines the start times of inter-DC network transfers.

  22. Network-aware placement and scheduling — the scheduling problem is formulated as a Binary Integer Linear Program, which factors in transfer dependencies.
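The greedy per-stage placement step can be sketched as follows. This is a simplification of what the slides describe (it places a whole stage in one DC and ignores the BILP transfer-scheduling part entirely); all DC names and numbers are invented.

```python
# Sketch of greedy per-stage placement: for one stage, try each candidate
# DC and pick the one minimizing that stage's worst inter-DC read time.
# A simplification of the slides' approach; the real system also solves a
# binary integer linear program for transfer start times.

def place_stage(inputs_gb, dcs, link_gbps):
    """`inputs_gb`: list of (source_dc, size_GB) for the stage's inputs.
    Returns the DC minimizing the slowest remote-input transfer."""
    def time_at(dc):
        return max((gb * 8 / link_gbps[(src, dc)]
                    for src, gb in inputs_gb if src != dc), default=0.0)
    return min(dcs, key=time_at)

link_gbps = {("DC1", "DC2"): 40, ("DC2", "DC1"): 40,
             ("DC1", "DC3"): 100, ("DC2", "DC3"): 100}
inputs = [("DC1", 200), ("DC2", 200)]

# Hosting the join in DC1 or DC2 pulls 200 GB over a 40 Gbps link (40 s);
# DC3 pulls both inputs over 100 Gbps links in parallel (16 s) and wins.
assert place_stage(inputs, ["DC1", "DC2", "DC3"], link_gbps) == "DC3"
```

Running this greedily stage by stage, bottom-up through the plan tree, yields a full placement whose transfers the scheduler can then order.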

  23. How do we extend the late-binding strategy to multiple queries?

  24. Queries affect each other's run time

  [Figure: the three-DC topology with T1, T2, T3 and the 40, 80, and 100 Gbps WAN links.]

  25. Queries affect each other's run time — QUERY 1: SELECT … device == "mobile" … ; QUERY 2: SELECT … genre == "pc" … ;

  26. Queries affect each other's run time — the same query plan (Plan C) is used for Query 1 and Query 2. [Figure: both plans apply selections (σ_Mobile for Query 1, σ_PC for Query 2) to the 200 GB tables and produce 16 GB intermediates over the same links.]

  27. Queries affect each other's run time — contention increases query run time.

  28. Queries affect each other's run time — different query plans for Query 1 (Plan C) and Query 2 (Plan B). [Figure: the two plans produce 16 GB and 12 GB intermediates over different links.]

  29. Queries affect each other's run time — with different plans there is no contention on the network links.

  30. Queries affect each other's run time — choosing execution plans jointly for multiple queries improves performance.
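The contention effect can be made concrete with a toy calculation (the numbers are invented, not from the slides): two transfers fair-sharing one link finish no sooner than that link can drain their combined bytes, while plans using disjoint links let each transfer run at full speed.

```python
# Toy illustration of link contention between concurrent queries.
# Numbers are made up; the point is the shape of the effect.

def shared_link_time(sizes_gb, gbps) -> float:
    """Completion time when the transfers fair-share a single link:
    the link drains the total bytes at rate `gbps`."""
    return sum(sizes_gb) * 8 / gbps

# Both queries push 160 GB over the same 80 Gbps link ...
contended = shared_link_time([160, 160], 80)
# ... vs. each query getting its own 80 Gbps link.
disjoint = 160 * 8 / 80

assert contended == 32.0   # both queries finish at 32 s
assert disjoint == 16.0    # each finishes at 16 s with no sharing
```

This is why per-query plan selection is insufficient: the plan that is fastest in isolation can be the slowest once another query occupies the same link.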

  31. Iterative Shortest Job First — queries A, B, and C each pass through a query optimizer (QO). The best plan combination minimizes average completion time, but finding it is computationally intractable.

  32. Iterative Shortest Job First — Clarinet's scheduling heuristic, Iterative Shortest Job First (SJF): 1. pick the shortest physical query plan in each iteration.

  33–34. Iterative Shortest Job First — iteration 1 considers the candidate physical plans of all queries (run times 10, 18, 12, 5, 8, 20, 30 s) and picks the shortest (5 s).

  35–36. Iterative Shortest Job First — 2. reserve bandwidth for the chosen plan to guarantee its completion time. [Gantt: transfer B1 occupies Link 1 from t = 0 to t = 5.]

  37–38. Iterative Shortest Job First — iteration 2: the remaining candidates' run times (15, 18, 17, 25, 30 s) already reflect the reserved bandwidth.

  39. Iterative Shortest Job First — the next shortest plan's transfers are scheduled around the reservation. [Gantt: B1 on Link 1 (0–5), then A1 on Link 1 and A2 on Link 2, finishing by t = 15.]
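The selection loop of iterative SJF can be sketched in a few lines. This toy version (with hypothetical query/plan groupings; the grouping of the slide's numbers into queries is my assumption) skips the bandwidth-reservation and re-estimation steps and just shows the ordering logic:

```python
# Sketch of the iterative-SJF selection loop: in each iteration, commit
# the query whose best remaining physical plan is shortest. The real
# heuristic then reserves bandwidth for it, which raises the other
# queries' plan estimates before the next iteration (omitted here).

def iterative_sjf(plan_lengths):
    """`plan_lengths` maps query -> list of candidate plan run times.
    Returns (query, chosen_runtime) pairs in commit order."""
    remaining = dict(plan_lengths)
    order = []
    while remaining:
        q = min(remaining, key=lambda q: min(remaining[q]))
        order.append((q, min(remaining[q])))
        del remaining[q]
    return order

# Hypothetical grouping of the slide's numbers into three queries.
plans = {"A": [10, 18, 12], "B": [5, 8], "C": [20, 30]}
assert iterative_sjf(plans) == [("B", 5), ("A", 10), ("C", 20)]
```

Committing shortest-first is the classic way to reduce average completion time; the bandwidth reservation is what makes each commitment's run time trustworthy.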

  40. Avoid fragmentation and improve completion time

  41. Avoid fragmentation and improve completion time — SJF with bandwidth reservation leads to bandwidth fragmentation.

  42. Avoid fragmentation and improve completion time — [Gantt, SJF order: B1, then A1 and A2 across Links 1–2; transfers end at t = 10, 12, and 22.]

  43. Avoid fragmentation and improve completion time — dominant transfers execute sequentially.

  44. Avoid fragmentation and improve completion time — Link 2 sees extended idling.

  45. Avoid fragmentation and improve completion time — an alternate schedule with the same query plans overlaps the transfers. [Gantt: A1, B1, and A2 rearranged; transfers end at t = 2 and t = 12.]

  46. Avoid fragmentation and improve completion time — re-arranging transfers, even when it deviates from the SJF schedule, can help.
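The gain from re-arranging can be quantified with a toy average-completion-time comparison. The finish times below are made up, only loosely following the slide's Gantt charts:

```python
# Toy illustration: under strict SJF order the dominant transfers run
# back-to-back on one link while the other link idles; re-arranging
# overlaps the work and lowers average completion time, at the cost of
# deviating from SJF order. Finish times are invented.

def avg_completion(finish_times) -> float:
    return sum(finish_times.values()) / len(finish_times)

# SJF order: B finishes at 10, A's last transfer finishes at 22.
sjf = {"A": 22, "B": 10}
# Re-arranged schedule, same query plans: both finish by 12.
rearranged = {"A": 12, "B": 12}

assert avg_completion(sjf) == 16.0
assert avg_completion(rearranged) == 12.0
```

Note the trade-off: the rearranged schedule delays the shortest job (B finishes later), yet the average over all jobs improves because idle bandwidth is reclaimed.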

  47. k-Shortest Jobs First Heuristic — [Figure: an offline schedule of transfers across Links 1…n over time.]

  48. k-Shortest Jobs First Heuristic — identify the transfers of the k shortest as-yet-incomplete jobs.

  49. k-Shortest Jobs First Heuristic — relax their transfer schedule: a transfer starts as soon as its link is free and its task is available.

  50. k-Shortest Jobs First Heuristic — the best k is found from prior observations or through offline simulations.
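The relaxation step can be sketched as follows. The data model here is hypothetical (a flat list of transfers with offline start times), and the "task is available" condition is simplified away:

```python
# Sketch of the k-SJF relaxation: take the offline schedule, pick the
# transfers of the k shortest incomplete jobs, and let those start as
# soon as their link is free instead of at the offline start time.
# Hypothetical data model; task-availability constraints are omitted.

def relax(transfers, k):
    """`transfers`: list of (job, link, offline_start, duration).
    Returns job -> finish time after relaxing the k shortest jobs."""
    job_len = {}
    for job, _, start, dur in transfers:
        job_len[job] = max(job_len.get(job, 0), start + dur)
    relaxed_jobs = sorted(job_len, key=job_len.get)[:k]

    link_free, finish = {}, {}
    for job, link, start, dur in sorted(transfers, key=lambda t: t[2]):
        # Relaxed jobs ignore the offline start time; others keep it.
        begin = (link_free.get(link, 0) if job in relaxed_jobs
                 else max(start, link_free.get(link, 0)))
        link_free[link] = begin + dur
        finish[job] = max(finish.get(job, 0), begin + dur)
    return finish

# Job "B" was scheduled at t=5 on link L1, but L1 is free at t=0, so the
# relaxation pulls it forward.
assert relax([("B", "L1", 5, 4), ("A", "L2", 0, 10)], k=1) == {"B": 4, "A": 10}
```

Limiting the relaxation to the k shortest jobs keeps the schedule close to SJF (protecting average completion time) while reclaiming the idle bandwidth that strict reservations leave behind.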

  51. Clarinet Implementation — a batch of queries (QUERY 1, QUERY 2, QUERY 3) passes through existing query optimizers (QO).

  52. Clarinet Implementation — Hive is modified to generate multiple plans.

  53. Clarinet Implementation — the QOs control the set of generated plans; existing optimizations (select push-down, partition pruning) still apply.

  54. Clarinet Implementation — the execution framework enforces Clarinet's schedule.

  55. Clarinet Implementation — Tez's DAGScheduler is modified.

  56. Clarinet Implementation — the design also handles online query arrivals, with fairness guarantees.

  57–61. Evaluation — Clarinet is compared with the following GDA approaches:
     1. Hive: WAN-agnostic task placement and scheduling
     2. Hive + Iridium: WAN-aware task placement across DCs
     3. Hive + Reducers in a single DC: distributed filtering with central aggregation

  62. Evaluation — geo-distributed analytics stack deployed across 10 EC2 regions.

  63. Evaluation — workload: 30 batches of 12 randomly chosen TPC-DS queries.

  64. Evaluation: Reduction in average completion time

     GDA Approach                    Average gains vs. Hive
     Clarinet                        2.7x
     Hive + Iridium                  1.5x
     Hive + Reducers in single DC    0.6x

  65. Evaluation: Reduction in average completion time — Clarinet chooses a different plan for 75% of the queries.

  66–68. Evaluation: Reduction in average completion time — [Figure: CDF over link IDs sorted by bandwidth, comparing the WAN bandwidth distribution, the Hive bytes distribution, and the Clarinet bytes distribution; data from a single batch of 12 queries.]

  69. Evaluation: Optimization overhead

  70. Evaluation: Optimization overhead — 1. generating multiple query plans; 2. iterative multi-query plan selection.

