SLIDE 1

Lube: Mitigating Bottlenecks in Wide Area Data Analytics

Hao Wang* Baochun Li

iQua

HotCloud’17

SLIDE 2

Wide Area Data Analytics

2

[Diagram: a single DC, with a Master/Namenode and Workers/Datanodes]

SLIDE 3

Wide Area Data Analytics

[Diagram: DC #1 runs the Master/Namenode; DC #2 through DC #n run Workers/Datanodes]

Why wide area data analytics?

  • Data Volume
  • User Distribution
  • Regulation Policy

Problems

  • Widely shared resources
  • Fluctuating available provision
  • Distributed runtime environment
  • Heterogeneous utilizations

2

SLIDE 4

Fluctuating WAN Bandwidths

[Plot: pair-wise bandwidth (Mbps, 100–500) over Jan 1–2 among 10.8.3.3 (CT), 10.12.3.32 (TR), 10.4.3.5 (WT), 10.2.3.4 (TR), and 10.6.3.3 (VC)]

Measured by iperf on SAVI testbed https://www.savinetwork.ca/

3

SLIDE 5

Heterogeneous Memory Utilization

[Plot: memory utilization over time (s, 1–2101) for node_1 through node_4]

Running Berkeley Big Data Benchmark

  • On AWS EC2: 4 nodes across 4 regions

Collected by jvmtop

Nodes in different DCs may have different resource utilizations

4

SLIDE 6

Bottlenecks

Runtime Bottlenecks

Bottlenecks emerge at runtime

  • Any time
  • Any node
  • Any resource

Fluctuation + heterogeneity degrade data analytics performance:

  • Long completion times
  • Low resource utilization
  • Invalidated optimizations

5

SLIDE 7

Optimization of Data Analytics

“Much of this performance work has been motivated by three widely-accepted mantras about the performance of data analytics — network, disk and straggler.”

Making Sense of Performance in Data Analytics Frameworks, NSDI’15, Kay Ousterhout

Existing optimization methods do not consider runtime bottlenecks:

  • Clarinet [OSDI’16] considers the heterogeneity of available WAN bandwidth
  • Iridium [SIGCOMM’15] trades off between time and WAN bandwidth usage
  • Geode [NSDI’15] saves WAN usage via data placement and query plan selection
  • SWAG [SoCC’15] reorders jobs across datacenters

6

SLIDE 8

Mitigating Bottlenecks at Runtime

[Diagram: a task queue matched against a resource queue, with one worker in bottleneck]

Mitigating bottlenecks

  • How to detect bottlenecks?
  • How to overcome the scheduling delay?
  • How to enforce the bottleneck mitigation?

7


SLIDE 9

[Architecture diagram: the Lube Master holds a Bottleneck Info. Cache, the Lube Scheduler, an Available Worker Pool, and the Submitted Task Queue; each Lube Client runs Lightweight Performance Monitors (Network I/O, Disk I/O, JVM, more metrics) feeding an Online Bottleneck Detector with a Training Pool and Model Update; clients report (worker, intensity) pairs for bottleneck-aware scheduling]

Architecture of Lube

Three major components

  • Performance monitors
  • Bottleneck detecting module
  • Bottleneck-aware scheduler

8

SLIDE 10

Detecting Bottlenecks — ARIMA

y_t = θ_0 + φ_1·y_{t−1} + φ_2·y_{t−2} + … + φ_p·y_{t−p} + ε_t − θ_1·ε_{t−1} − θ_2·ε_{t−2} − … − θ_q·ε_{t−q}

  • y_t: current state
  • θ, φ: coefficients
  • ε: random error
  • Historical states: Autoregressive (AR) + Moving Average (MA)

Input: (time_1, mem_util), (time_2, mem_util), …, (time_{t−1}, mem_util)
Output: (time_t, mem_util)

9

ARIMA(p, d, q)
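The slide's equation can be made concrete with a minimal pure-Python sketch. This is not Lube's actual predictor (the coefficients below are made up; in practice they would be fit from the metric history, e.g., with a statistics library), it simply evaluates the ARMA recurrence for a one-step forecast.

```python
def arma_predict(y, eps, phi, theta, theta0=0.0):
    """One-step ARMA forecast following the slide's equation:
    y_t = theta0 + sum_i phi_i*y_{t-i} + eps_t - sum_j theta_j*eps_{t-j}.
    y: recent observations, newest last; eps: recent residuals, newest last.
    The current shock eps_t is unknown at forecast time and taken as 0."""
    ar = sum(phi[i] * y[-1 - i] for i in range(len(phi)))       # AR part
    ma = sum(theta[j] * eps[-1 - j] for j in range(len(theta)))  # MA part
    return theta0 + ar - ma

# Hypothetical memory-utilization history (fractions of heap in use)
y = [0.40, 0.45, 0.50, 0.55]
eps = [0.01, -0.02, 0.005, 0.01]
# AR(2) + MA(1) with made-up coefficients, purely for illustration
pred = arma_predict(y, eps, phi=[0.6, 0.3], theta=[0.2], theta0=0.05)
```

The differencing step of ARIMA(p, d, q) (the `d`) is omitted here; it would be applied to the series before this recurrence and inverted afterward.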

SLIDE 11

Detecting Bottlenecks — HMM

[Diagram: a sliding window moving from past to future over observations {time_stamp: mem, net, cpu, disk}; hidden states q_i transition with probabilities A(a_ij) and emit observations O_k with probabilities B(b_j(k))]
Hidden Markov Model

  • Hidden states: Q
  • Observation states: O
  • Transition probability: A
  • Emission probability: B

Sliding Hidden Markov Model (to make the HMM online)

  • A sliding window for new observations
  • A moving average approximation for outdated observations
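The window bookkeeping behind this idea can be sketched in a few lines. This is only an illustration of the sliding window plus moving-average summary; the actual SlidHMM algorithm incrementally updates the HMM's A and B matrices, which is not shown here.

```python
from collections import deque

class SlidingWindow:
    """Sketch: keep the newest observations in a fixed-size window and
    summarize observations that slide out with an incremental moving
    average, so old data need not be stored or revisited."""
    def __init__(self, size):
        self.size = size
        self.window = deque()
        self.retired_avg = None   # moving-average summary of outdated observations
        self.retired_count = 0

    def push(self, obs):
        self.window.append(obs)
        if len(self.window) > self.size:
            old = self.window.popleft()
            self.retired_count += 1
            if self.retired_avg is None:
                self.retired_avg = old
            else:
                # incremental moving average over retired observations
                self.retired_avg += (old - self.retired_avg) / self.retired_count

w = SlidingWindow(3)
for obs in [0.2, 0.4, 0.6, 0.8, 1.0]:
    w.push(obs)
# window now holds the 3 newest observations; 0.2 and 0.4 were retired
```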

10

SLIDE 12

Bottleneck-Aware Scheduling

[Plots over time (s): memory and CPU utilization of executor processes; network and disk (SSD) utilization of datanode processes]

Built-in task schedulers:

  • Data-locality

Bottleneck-aware scheduler:

  • Data-locality
  • Bottlenecks at runtime

A single worker node may stay bottlenecked continuously, but all nodes are rarely bottlenecked at the same time
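A greedy policy of this kind can be sketched as follows. Everything here is hypothetical (the function, the intensity scale in [0, 1], and the 0.8 threshold are illustrative, not Lube's actual parameters); it only shows the idea of preferring data-local workers while skipping those predicted to be bottlenecked.

```python
def pick_worker(task, workers, intensity, threshold=0.8):
    """Greedy sketch of bottleneck-aware scheduling: among the task's
    preferred (data-local) workers, take the least bottlenecked one that
    is under the threshold; if all local workers are bottlenecked, fall
    back to the globally least bottlenecked worker.
    `intensity` maps worker -> predicted bottleneck intensity in [0, 1]."""
    local = [w for w in workers if w in task["preferred"]]
    for w in sorted(local, key=lambda w: intensity[w]):
        if intensity[w] < threshold:
            return w  # data-local and not predicted to be bottlenecked
    # no healthy data-local worker: trade locality for a healthy node
    return min(workers, key=lambda w: intensity[w])

task = {"preferred": {"w1", "w2"}}          # hypothetical locality preferences
intensity = {"w1": 0.9, "w2": 0.3, "w3": 0.1}
choice = pick_worker(task, ["w1", "w2", "w3"], intensity)  # picks "w2"
```

Because all nodes are rarely bottlenecked simultaneously (the observation above), the fallback branch usually finds a lightly loaded worker.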

11

SLIDE 13

Implementation & Deployment

Bottleneck Detection Module APIs:

  • PUBLISH + HSET metric {time: val} (e.g., iotop {time: I/O})
  • SUBSCRIBE metric_1 metric_2 …
  • HSET worker_id {time: {metric: val_ob, val_inf}}
  • HGET worker_id time

[Diagram: each worker node runs monitors (iotop, jvmtop, nethogs) reporting to a local Redis server; the master node's Redis server feeds the Lube Scheduler and the Bottleneck Detection Module]
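The key layout implied by these calls can be illustrated with a tiny in-memory stand-in (Lube itself uses a real Redis server and its PUBLISH/SUBSCRIBE channels; the class, worker id, and values below are invented for illustration).

```python
class MetricStore:
    """In-memory stand-in for the Redis hash layout on the slide:
    HSET worker_id {time: {metric: (val_ob, val_inf)}} / HGET worker_id time.
    val_ob is the observed value, val_inf the inferred (predicted) one."""
    def __init__(self):
        self.hashes = {}

    def hset(self, worker_id, time, fields):
        self.hashes.setdefault(worker_id, {})[time] = fields

    def hget(self, worker_id, time):
        return self.hashes.get(worker_id, {}).get(time)

store = MetricStore()
# hypothetical sample: observed vs. predicted utilization per metric
store.hset("worker-3", 1700000000, {"mem": (0.62, 0.71), "net": (0.30, 0.28)})
snapshot = store.hget("worker-3", 1700000000)
```

Keeping both the observed and inferred value per (worker, time, metric) is what lets the scheduler act on predictions while the detector keeps validating them.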

Implementation

  • Spark-1.6.1 (scheduler)
  • Redis database (cache)
  • Python scikit-learn, Keras (ML)

Deployment

  • 37 EC2 m4.2xlarge instances
  • 9 regions
  • Berkeley Big Data Benchmark
  • A 1.1 TB dataset

12

SLIDE 14

Evaluation — Accuracy

[Bar charts: bottleneck detection hit rate (%, 60–100) for Query-1 through Query-4, comparing ARIMA and SlidHMM]

ARIMA ignores nonlinear patterns.

Hit rate calculation:

hit rate = #((time, detection) ∩ (time, observation)) / #(time, detection)
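In code, this metric is just the fraction of timestamped detections that match an actually observed bottleneck (the sample pairs below are invented for illustration):

```python
def hit_rate(detections, observations):
    """Hit rate as defined on the slide: |detections ∩ observations| /
    |detections|, where each element is a (time, bottleneck) pair."""
    det, obs = set(detections), set(observations)
    return len(det & obs) / len(det)

detections = [(1, "mem"), (2, "net"), (3, "disk"), (4, "mem")]
observations = [(1, "mem"), (2, "net"), (4, "cpu")]
rate = hit_rate(detections, observations)  # 2 of 4 detections confirmed
```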

13

SLIDE 15

Evaluation — Completion Times

[Plots: task completion time (ms, up to 4×10^5) distributions for Query-1 through Query-4, comparing Pure Spark, Lube-ARIMA, and Lube-SlidHMM]

14

Task completion times:

                Average    75th percentile
Lube-ARIMA      12.454 s   22.075 s
Lube-SlidHMM    14.783 s   27.469 s

SLIDE 16

Evaluation — Completion Times

[Plots: query completion times (s) for Query-1 through Query-4, comparing Pure Spark, ARIMA + Spark, SlidHMM + Spark, Lube-ARIMA, and Lube-SlidHMM]

15

Query completion times

  • Lube-ARIMA
  • Lube-SlidHMM
  • Reduce median query response time by up to 33%

Control groups for overhead

  • ARIMA + Spark
  • SlidHMM + Spark
  • Negligible overhead
SLIDE 17

Conclusion

  • Runtime performance bottleneck detection
  • ARIMA, HMM
  • A simple greedy bottleneck-aware task scheduler
  • Jointly consider data-locality and bottlenecks
  • Lube, a closed-loop framework mitigating bottlenecks at runtime.

16

SLIDE 18

The End Thank You

SLIDE 19

Discussion

Bottleneck detection models

  • More performance metrics could be explored
  • More efficient models for time-series prediction, e.g., Reinforcement Learning, LSTM

Bottleneck-aware scheduling

  • Fine-grained scheduling with specific resource awareness

WAN conditions

  • We measure pair-wise WAN bandwidths by a cron job running iperf locally
  • Try to exploit support from SDN interfaces

18