
Lube: Mitigating Bottlenecks in Wide Area Data Analytics - PowerPoint PPT Presentation



  1. HotCloud'17. Lube: Mitigating Bottlenecks in Wide Area Data Analytics. Hao Wang*, Baochun Li (iQua)

  2. Wide Area Data Analytics
     [diagram: a datacenter (DC) with a Master node running the Namenode and Worker nodes running Datanodes]

  3. Wide Area Data Analytics
     Why wide area data analytics?
     • Data volume
     • User distribution
     • Regulation policy …
     Problems:
     • Widely shared resources ‣ fluctuating available provision
     • Distributed runtime environment ‣ heterogeneous utilizations
     [diagram: DC #1, DC #2, …, DC #n, each with a Master/Namenode and Workers/Datanodes]

  4. Fluctuating WAN Bandwidths
     [plot: bandwidth (Mbps) over two days (Jan 1 to Jan 2) between node pairs 10.6.3.3 (VC), 10.8.3.3 (CT), 10.12.3.32 (TR), 10.4.3.5 (WT), 10.2.3.4 (TR)]
     Measured by iperf on the SAVI testbed, https://www.savinetwork.ca/
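
A minimal sketch of how such pair-wise bandwidth samples could be collected. It assumes iperf3 (chosen here only for its JSON output; the slide's measurement used iperf) with a server already running on each peer; the host list and duration are illustrative, not the testbed's actual configuration.

```python
# Hypothetical sketch: sample pair-wise bandwidth with iperf3.
# Assumes `iperf3 -s` is already running on every peer host.
import json
import subprocess
import time

PEERS = ["10.6.3.3", "10.8.3.3", "10.12.3.32", "10.4.3.5", "10.2.3.4"]  # illustrative

def measure_mbps(host, seconds=10):
    out = subprocess.run(
        ["iperf3", "-c", host, "-t", str(seconds), "-J"],   # -J: JSON report
        capture_output=True, text=True, check=True,
    )
    report = json.loads(out.stdout)
    return report["end"]["sum_received"]["bits_per_second"] / 1e6

for peer in PEERS:
    print(time.strftime("%H:%M"), peer, f"{measure_mbps(peer):.1f} Mbps")
```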

  5. Heterogeneous Memory Utilization
     Nodes in different DCs may have different resource utilizations.
     [plot: memory utilization of node_1 through node_4 over roughly 2100 s]
     Running the Berkeley Big Data Benchmark on AWS EC2, 4 nodes across 4 regions; collected by jvmtop.

  6. Runtime Bottlenecks
     Bottlenecks emerge at runtime: fluctuation and heterogeneity mean bottlenecks can appear at any time, on any node, and on any resource.
     Impact on data analytics performance:
     • Long completion times
     • Low resource utilization
     • Invalidated optimizations

  7. Optimization of Data Analytics
     Existing optimization methods do not consider runtime bottlenecks:
     • Clarinet [OSDI'16] considers the heterogeneity of available WAN bandwidth
     • Iridium [SIGCOMM'15] trades off between completion time and WAN bandwidth usage
     • Geode [NSDI'15] saves WAN usage via data placement and query-plan selection
     • SWAG [SoCC'15] reorders jobs across datacenters
     "Much of this performance work has been motivated by three widely-accepted mantras about the performance of data analytics — network, disk and straggler."
     Making Sense of Performance in Data Analytics Frameworks, NSDI'15, Kay Ousterhout

  8. Mitigating Bottlenecks at Runtime
     Mitigating bottlenecks:
     • How to detect bottlenecks?
     • How to overcome the scheduling delay?
     • How to enforce the bottleneck mitigation?
     [diagram: resource queue and task queue on a bottlenecked worker]

  9. Architecture of Lube
     Three major components:
     • Performance monitors
     • Bottleneck detecting module
     • Bottleneck-aware scheduler
     [diagram: the Lube Client runs lightweight performance monitors (network I/O, JVM, disk I/O, more metrics) and an online bottleneck detector with a training model; it reports to the Lube Master, which keeps a bottleneck info cache, an available worker pool of (worker, intensity) entries with pool updates, and a submitted task queue feeding the Lube Scheduler's bottleneck-aware scheduling]

  10. Detecting Bottlenecks — ARIMA
     ARIMA combines an autoregressive (AR) model over historical states with a moving-average (MA) model over past errors:
     y_t = θ_0 + φ_1 y_{t-1} + φ_2 y_{t-2} + … + φ_p y_{t-p} + ε_t − θ_1 ε_{t-1} − θ_2 ε_{t-2} − … − θ_q ε_{t-q}
     where y_t is the current state, φ and θ are coefficients, and ε is the random error.
     Input: (time_1, mem_util), (time_2, mem_util), …, (time_t-1, mem_util); the ARIMA(p, d, q) model outputs (time_t, mem_util).
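
As a rough illustration of the input/output mapping above, here is a minimal one-step-ahead forecast of memory utilization, assuming the statsmodels library; the (p, d, q) order and the synthetic sample series are illustrative assumptions, not values from the paper.

```python
# Hedged sketch: one-step-ahead prediction of memory utilization with ARIMA.
# The order (p, d, q) and the sample series below are illustrative.
from statsmodels.tsa.arima.model import ARIMA

def predict_next(history, order=(2, 1, 1)):
    """history: memory-utilization samples, oldest first; returns the predicted y_t."""
    fitted = ARIMA(history, order=order).fit()
    return fitted.forecast(steps=1)[0]   # predicted (time_t, mem_util) value

# Example: samples (time_1 .. time_t-1, mem_util) from the performance monitor
samples = [0.20, 0.22, 0.25, 0.24, 0.28, 0.31, 0.30, 0.34, 0.37, 0.36,
           0.40, 0.43, 0.41, 0.45, 0.48, 0.50, 0.49, 0.53, 0.56, 0.55]
print(predict_next(samples))
```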

  11. Detecting Bottlenecks — HMM
     Hidden Markov Model:
     • Hidden states Q: q_1, q_2, …, q_i, q_j
     • Observation states O: O_1, O_2, …, O_d, O_k
     • Transition probabilities A(a_ij)
     • Emission probabilities B(b_j(k))
     To make the HMM online: the Sliding Hidden Markov Model (SlidHMM) uses
     • a sliding window for new observations, and
     • a moving-average approximation for outdated observations.
     Each observation is a sample {time_stamp: mem, net, cpu, disk}.
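
A minimal sketch of the sliding-window idea, assuming the hmmlearn library. It simply refits a Gaussian HMM on the most recent window, whereas SlidHMM as described also folds outdated observations into a moving-average approximation instead of discarding them; the window size and number of hidden states are illustrative.

```python
# Hedged sketch of an online, sliding-window HMM over {mem, net, cpu, disk}.
# Refitting on the window is a simplification of SlidHMM, which keeps a
# moving-average approximation of outdated observations.
import numpy as np
from hmmlearn.hmm import GaussianHMM

WINDOW = 120   # keep the most recent 120 samples (illustrative)
STATES = 3     # hidden states, e.g. idle / busy / bottlenecked (illustrative)

class SlidingWindowHMM:
    def __init__(self):
        self.window = []   # each entry: [mem, net, cpu, disk]
        self.model = GaussianHMM(n_components=STATES, covariance_type="diag")

    def observe(self, mem, net, cpu, disk):
        self.window.append([mem, net, cpu, disk])
        if len(self.window) > WINDOW:
            self.window.pop(0)                    # slide the window forward
        if len(self.window) >= 10 * STATES:       # enough data to (re)fit
            self.model.fit(np.asarray(self.window))

    def current_state(self):
        """Decode the hidden state of the newest observation (after fitting)."""
        return self.model.predict(np.asarray(self.window))[-1]
```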

  12. Bottleneck-Aware Scheduling
     Built-in task schedulers consider:
     • Data locality
     Bottleneck-aware scheduler considers:
     • Data locality
     • Bottlenecks at runtime
     [plots: memory and CPU utilization of executor processes; network and disk (SSD) utilization of datanode processes, over time]
     Observation: a single worker node may be bottlenecked continuously, while all nodes are rarely bottlenecked at the same time.
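
A minimal sketch of a greedy placement rule in the spirit of this slide: avoid workers predicted to be bottlenecked, then prefer data-local workers among the rest. All names and the threshold are illustrative assumptions, not Lube's actual interfaces.

```python
# Hedged sketch: greedy, bottleneck-aware task placement.
# `intensity` maps a worker to its predicted bottleneck intensity in [0, 1].
def pick_worker(preferred_hosts, idle_workers, intensity, threshold=0.8):
    # drop workers predicted to be bottlenecked, unless that leaves nothing
    healthy = [w for w in idle_workers if intensity.get(w, 0.0) < threshold] or idle_workers
    # among the remaining workers, prefer data-local ones
    local = [w for w in healthy if w in preferred_hosts]
    return min(local or healthy, key=lambda w: intensity.get(w, 0.0))

# Example: node_2 holds the task's input but is predicted to be bottlenecked,
# so the task is placed on the least-loaded healthy worker instead.
print(pick_worker({"node_2"}, ["node_1", "node_2", "node_3"],
                  {"node_1": 0.1, "node_2": 0.95, "node_3": 0.3}))
```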

  13. Implementation & Deployment
     Implementation:
     • Spark 1.6.1 (scheduler)
     • Redis database (cache)
     • Python scikit-learn and Keras (ML)
     APIs (see the sketch below):
     • Master node / Lube Scheduler ↔ Master Redis Server: HGET worker_id time; HSET worker_id {time: {metric: val_ob, val_inf}}
     • Worker nodes: the bottleneck detection module issues SUBSCRIBE metric_1 metric_2 …; monitors (nethogs, jvmtop, iotop) issue PUBLISH + HSET metric {time: val} to the Worker Redis Server (e.g., iotop {time: I/O})
     Deployment:
     • 37 EC2 m4.2xlarge instances across 9 regions
     • Berkeley Big Data Benchmark
     • A 1.1 TB dataset
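
A minimal sketch of the Redis exchange named above, assuming the redis-py client; the key/channel names and the JSON encoding are illustrative assumptions, not Lube's exact wire format.

```python
# Hedged sketch of the monitor / detector / scheduler exchange over Redis,
# mirroring the PUBLISH + HSET, SUBSCRIBE, and HGET calls on the slide.
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def report_metric(metric, timestamp, value):
    """Worker monitor (nethogs/jvmtop/iotop wrapper): cache and announce a sample."""
    r.hset(metric, timestamp, value)                              # HSET metric {time: val}
    r.publish(metric, json.dumps({"time": timestamp, "val": value}))

def consume_metrics(metrics):
    """Worker-side bottleneck detection module: stream samples as they arrive."""
    sub = r.pubsub()
    sub.subscribe(*metrics)                                       # SUBSCRIBE metric_1 metric_2 ...
    for msg in sub.listen():
        if msg["type"] == "message":
            yield msg["channel"], json.loads(msg["data"])

def lookup_worker(worker_id, timestamp):
    """Master-side Lube Scheduler: fetch a worker's cached metric values."""
    return r.hget(worker_id, timestamp)                           # HGET worker_id time
```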

  14. Evaluation — Accuracy
     hit rate = #((time, detection) ∩ (time, observation)) / #(time, detection)
     [plots: hit rates (%) of ARIMA and SlidHMM for Query-1 through Query-4]
     ARIMA ignores nonlinear patterns.
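
For concreteness, the hit-rate formula above amounts to the following, treating each detection and observation as a (time, bottleneck) pair; the pair encoding is an illustrative assumption.

```python
# Hedged sketch of the hit-rate metric: the fraction of detected
# (time, bottleneck) pairs that also appear among the observed ones.
def hit_rate(detections, observations):
    detected, observed = set(detections), set(observations)
    return len(detected & observed) / len(detected) if detected else 0.0

print(hit_rate([(1, "mem"), (2, "net"), (3, "cpu")],
               [(1, "mem"), (3, "cpu"), (4, "disk")]))   # 2/3 ≈ 0.67
```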

  15. Evaluation — Completion Times
     [CDF plots: task completion times for Query-1 through Query-4 under Pure Spark, Lube-SlidHMM, and Lube-ARIMA]
     Task completion times:
     • Lube-ARIMA: average 12.454 s, 75th percentile 22.075 s
     • Lube-SlidHMM: average 14.783 s, 75th percentile 27.469 s

  16. Evaluation — Completion Times
     [box plots: query completion times (s) for Query-1 through Query-4 under Pure Spark, ARIMA + Spark, SlidHMM + Spark, Lube-ARIMA, and Lube-SlidHMM]
     Query completion times:
     • Lube-ARIMA and Lube-SlidHMM reduce median query response time by up to 33%
     Control groups for overhead:
     • ARIMA + Spark
     • SlidHMM + Spark
     • Negligible overhead

  17. Conclusion
     • Runtime performance bottleneck detection ‣ ARIMA, HMM
     • A simple greedy bottleneck-aware task scheduler ‣ jointly considers data locality and bottlenecks
     • Lube, a closed-loop framework mitigating bottlenecks at runtime

  18. The End Thank You

  19. Discussion
     Bottleneck detection models:
     • More performance metrics could be explored
     • More efficient models for time-series prediction, e.g., reinforcement learning, LSTM
     Bottleneck-aware scheduling:
     • Fine-grained scheduling with specific resource awareness
     WAN conditions:
     • We measure pair-wise WAN bandwidths by a cron job running iperf locally
     • Try to exploit support from SDN interfaces
