Sol
Fast Distributed Computation Over Slow Networks
Fan Lai, Jie You, Xiangfeng Zhu, Harsha V. Madhyastha, Mosharaf Chowdhury
Distributed Data Processing is Ubiquitous
Efforts for Computation in LAN
Efforts for Computation over WAN: Google Spanner, Iridium, CLARINET, Azure Cosmos DB, Tetrium
Execution Engine: Core of Big Data Stack
Applications: SQL Queries (Select * FROM …;), AI/ML (K-means, SVM), Stream Processing (WordCount, TopKCount), …
Execution Planner: typical job execution plans (Job 1, Job 2)
Execution Engine: Coordinator, Worker1, Worker2, …, WorkerN
Resource Scheduler
Storage System
Efforts for Computation in LAN
Efforts for Computation over WAN: Google Spanner, Iridium, CLARINET, Azure Cosmos DB, Tetrium
While network conditions are diverse in practice, execution engines remain the same
Outline
Impact of Networks on Latency-sensitive Jobs
Queries from 100 GB TPC Benchmarks
[Figure: CDF across queries of query completion time (s) under four network settings: 10 Gbps, O(1) ms; 1 Gbps, O(1) ms; 10 Gbps, O(100) ms; 1 Gbps, O(100) ms. Plot annotation: 4.9X]
Slow job execution in high-latency networks
Problem #1
Control Plane Inefficiency Due to High Latency
[Timeline, O(1) ms between Coordinator and Worker: Launch(■), Busy, Complete(■), Launch(■), Busy, Complete(■)]
[Timeline, O(100) ms between Coordinator and Worker: Launch(■), Busy, Complete(■), Idle, Launch(■), Busy]
Late-binding of tasks postpones scheduling
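To make the cost of late-binding concrete, here is a rough back-of-the-envelope sketch. It assumes, beyond what the slides state, that a worker runs one task at a time, that every task takes the same time, and that the worker waits exactly one coordinator round trip between Complete(■) and the next Launch(■). The function name and its parameters are hypothetical.

```python
def late_binding_utilization(task_ms: float, rtt_ms: float) -> float:
    """Fraction of time a worker is busy when it idles one RTT per task."""
    # Each cycle is: run the task (busy), then wait one round trip (idle).
    return task_ms / (task_ms + rtt_ms)


if __name__ == "__main__":
    for rtt_ms in (1, 100):  # O(1) ms vs. O(100) ms, as on the slide
        util = late_binding_utilization(task_ms=100, rtt_ms=rtt_ms)
        print(f"RTT {rtt_ms:>3} ms -> worker busy {util:.0%} of the time")
```

Under these assumptions, a worker running 100 ms tasks is busy about 99% of the time at a 1 ms round trip, but only 50% of the time at a 100 ms round trip.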
Impact of Networks on Bandwidth-intensive Jobs
Query25 on 1TB TPC benchmark: Stage 1, Stage 2, and Stage 3, with data transfers between stages
[Figure: Resource utilization throughout the job. Percentage of the total (%) for Occupied CPUs, CPU Util., and B/w Util. over time (s), across Stage 1, Stage 2, and Stage 3. Plot annotation: Low CPU util.]
CPU underutilization in low-bandwidth networks
Data Plane Inefficiency Due to Low Bandwidth
Tasks hog CPUs throughout the lifespan
Problem #2
Outline
Problem #1: High latency → Idleness of workers
Problem #2: Low b/w → CPU underutilization
A federated execution engine for diverse network conditions w/
Sol: A Federated Execution Engine
Sol Architecture
[Diagram: the Sol Coordinator receives task arrivals and connects over the WAN, O(100) ms, to Site 1, Site 2, and Site 3. Within a site, a Site Manager connects over the LAN to Workers, each shown with a Task Manager.]
Outline
Problem #1: High latency → Idleness of workers
Push tasks proactively to reduce worker idle time
Task Early-binding in Control Plane
Existing designs
[Timeline: Coordinator and Worker are O(100) ms apart. After Complete(■), the Worker is Idle until the next Launch(■) arrives, then Busy again.]
[Timeline with a Site Manager: the Coordinator sends Launch(■ ■) to the Site Manager over O(100) ms. The Site Manager sends Launch(■) to the Worker over O(1) ms and receives Complete(■), so the Worker stays Busy.]
→ Guarantee high utilization
→ Retain precise views
Challenge 1.1: How Many Tasks to Push?
[Diagram: Coordinator and Site Manager, one RTT apart, each holding a queue of tasks]
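The slides pose this question without giving a formula. One plausible way to reason about it, offered only as a sketch and not as Sol's actual policy, is the same argument used to size a network window: keep enough tasks queued at the site to cover one coordinator round trip. The sketch assumes identical task durations and that the coordinator sends one replacement task per completion. Both function names are hypothetical.

```python
def tasks_to_push(task_ms: int, rtt_ms: int, workers: int = 1) -> int:
    """Queue depth at the site that hides one coordinator round trip."""
    if task_ms <= 0 or rtt_ms < 0 or workers <= 0:
        raise ValueError("task_ms and workers must be positive, rtt_ms non-negative")
    # One task is running; ceil(rtt / task) more cover the refill delay.
    per_worker = 1 + -(-rtt_ms // task_ms)
    return workers * per_worker


def utilization(depth: int, task_ms: int, rtt_ms: int) -> float:
    """Steady-state busy fraction of one worker for a given queue depth."""
    # Each queued slot is refilled one task time plus one RTT after it starts.
    return min(1.0, depth * task_ms / (task_ms + rtt_ms))


if __name__ == "__main__":
    depth = tasks_to_push(task_ms=50, rtt_ms=100)
    print(depth, utilization(depth, task_ms=50, rtt_ms=100))
```

In this model a depth of 1 is the late-binding case, and pushing more than the computed depth does not raise utilization further. It only lengthens the site queue, which works against the goal of retaining precise views.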
Challenge 1.2: How to Push Tasks w/ Dependencies?
Task Dependencies: T1 and T2 produce Output1 and Output2, which T3 consumes
Design in Existing Engines
[Timeline across C, S1, S2: T1 and T2 run; Complete(T1) and Complete(T2) reach C; C sends Launch(T3); Start(T3)]
W/o full knowledge, pushing leads to tradeoff
Push w/ Correct Speculations
[Timeline across C, S1, S2: T3 is pushed while T1 and T2 run; Complete(T1), Complete(T2); Activate(T3)]
Sol saves RTTs
Push under Mispredictions
[Timeline across C, S1, S2: T3 is pushed while T1 and T2 run; Complete(T1); Cancel(T3); Complete(T2); Re-Schedule(T3); Start(T3)]
Sol does not make things worse
Task Early-binding in Control Plane
Push under Mispredictions: Sol retains good scheduling quality
Push w/ Correct Speculations: Sol improves utilization
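The Activate, Cancel, and Re-Schedule messages in the timelines above can be read as a small state machine on the site side. The following is a minimal sketch of that reading, not Sol's implementation: the class, the state names, and the `resolve` helper are all hypothetical, and the choice of the "best" site is simply passed in rather than computed.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    INACTIVE = "inactive"    # pushed early, waiting for upstream outputs
    RUNNING = "running"
    CANCELLED = "cancelled"


@dataclass
class SiteManager:
    """Hypothetical site-side record of speculatively pushed tasks."""
    name: str
    tasks: dict = field(default_factory=dict)

    def push(self, task_id: str) -> None:
        # Early-binding: the task arrives before its inputs are ready.
        self.tasks[task_id] = State.INACTIVE

    def activate(self, task_id: str) -> None:
        # Correct speculation: start locally without another WAN round trip.
        if self.tasks.get(task_id) is State.INACTIVE:
            self.tasks[task_id] = State.RUNNING

    def cancel(self, task_id: str) -> None:
        # Misprediction: drop the task so it can be placed elsewhere.
        if self.tasks.get(task_id) is State.INACTIVE:
            self.tasks[task_id] = State.CANCELLED


def resolve(task_id: str, guessed: SiteManager, best: SiteManager) -> SiteManager:
    """Called once upstream tasks complete and the best site is known."""
    if guessed is best:
        guessed.activate(task_id)      # Activate(T3)
    else:
        guessed.cancel(task_id)        # Cancel(T3)
        best.push(task_id)             # Re-Schedule(T3)
        best.activate(task_id)         # Start(T3)
    return best


if __name__ == "__main__":
    s1, s2 = SiteManager("S1"), SiteManager("S2")
    s1.push("T3")                      # speculate that T3 belongs on S1
    resolve("T3", guessed=s1, best=s2)
    print(s1.tasks["T3"], s2.tasks["T3"])
```

In the misprediction branch the task ends up where a late-binding scheduler would have placed it anyway, which matches the slide's claim that a wrong guess does not make things worse.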
Outline
Problem #2: Low b/w → CPU underutilization
Decouple resource provisioning to improve CPU utilization
Resource Decoupling in Data Plane
Sol scales down CPU requirements and reclaims unused CPUs
Challenge 2: How to Manage Jobs?
For bandwidth-intensive task
[Flowchart: Control flow of decoupling. Nodes: Incoming tasks, Large remote read? (Y/N), Create, Complete, Pending tasks, Available CPUs (Y/N)]
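The edges of the flowchart did not survive extraction, so the following is only one possible reading of it, written as a sketch. It assumes that a task with a large remote read first holds a small CPU share while it fetches data, then waits as a pending task until a whole CPU is available. The threshold, the CPU share, and every class and method name are assumptions made for illustration, not values or APIs from Sol.

```python
from collections import deque
from dataclasses import dataclass

LARGE_READ_BYTES = 64 * 2**20   # assumed threshold for a "large remote read"
FETCH_CPU = 0.25                # assumed CPU share held while only fetching


@dataclass
class Task:
    name: str
    remote_read_bytes: int


class Worker:
    """Hypothetical worker that decouples data fetching from computation."""

    def __init__(self, cpus: float):
        self.free_cpus = cpus
        self.incoming = deque()   # large-read tasks whose fetch has not started
        self.fetching = {}        # name -> Task, each holding FETCH_CPU
        self.pending = deque()    # input is local, waiting for a whole CPU
        self.running = {}         # name -> Task, each holding one CPU

    def submit(self, task: Task) -> None:
        if task.remote_read_bytes >= LARGE_READ_BYTES:
            self.incoming.append(task)
        else:
            self.pending.append(task)
        self._dispatch()

    def fetch_complete(self, name: str) -> None:
        # Release the fetch share and queue the computation.
        self.free_cpus += FETCH_CPU
        self.pending.append(self.fetching.pop(name))
        self._dispatch()

    def compute_complete(self, name: str) -> None:
        del self.running[name]
        self.free_cpus += 1
        self._dispatch()

    def _dispatch(self) -> None:
        # Computations first, then fetches, while capacity remains.
        while self.pending and self.free_cpus >= 1:
            task = self.pending.popleft()
            self.free_cpus -= 1
            self.running[task.name] = task
        while self.incoming and self.free_cpus >= FETCH_CPU:
            task = self.incoming.popleft()
            self.free_cpus -= FETCH_CPU
            self.fetching[task.name] = task
```

For example, on a 2-CPU worker two fetching tasks hold only 0.5 CPU between them, so a third compute-only task can start immediately instead of waiting for a fetch to finish.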
Evaluation
With a prototype supporting generic data processing
Deployment over WAN
How does Sol perform:
1. compared to existing engines?
2. across design space?
3. under uncertainties?
Sol Improves Job Performance and Resource Util. (WAN)
Benchmark: multi-job execution
[Figure: Job Performance. CDF of job completion time (s), log scale from 10^0 to 10^3, for Sol, Sol – (Sol w/o decoupling), and the Spark baseline]
16.4x improvement on average
Control-plane benefits (2.6x on avg.)
Data-plane benefits
Control Plane: Early-binding → Less idle time
Data Plane: Decoupling → Less under-util.
16.4x better job completion + 1.8x better CPU util.
Sol Performs Well Across Design Space (LAN)
High-bandwidth setting (10 Gbps): 1.3x improvement on average
Low-bandwidth setting (1 Gbps): 3.9x improvement on average
[Figures: CDF of job completion time (s), log scale from 10^0 to 10^2, for Spark and Sol in each setting]
https://github.com/SymbioticLab/Sol
A federated execution engine for diverse network conditions with
Improve CPU util.:
before task executions → Early-binding of tasks
during task executions → Decoupling of resource provisioning