Edge Replication Strategies for Wide-Area Distributed Processing
Niklas Semmler, Matthias Rost, Georgios Smaragdakis, Anja Feldmann
Generate data
Local processing & temp. storage
Internet
Heavy processing & content distribution
How do we reduce the transferred data volume?
Limited bandwidth & pay for transfers
App
Many large
Few small non-
Option A: Transfer query results. Cost: per query result (cumulative).
Option B: Replicate raw data. Cost: replication (one time).
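The trade-off between the two options can be sketched in a few lines. This is a minimal illustration, assuming every query result has a known transfer cost and replication is a single up-front cost; the function names and the numbers are hypothetical and do not come from the slides.

```python
from itertools import accumulate


def transfer_cost(result_costs):
    """Option A: pay for every query result, so the cost accumulates."""
    return sum(result_costs)


def break_even_index(result_costs, replication_cost):
    """Index of the first query at which the cumulative transfer cost
    reaches the one-time replication cost (Option B), or None if it
    never does."""
    for i, total in enumerate(accumulate(result_costs)):
        if total >= replication_cost:
            return i
    return None


# Hypothetical numbers, for illustration only.
results = [4, 4, 4, 4]
print(transfer_cost(results))         # cumulative cost of Option A
print(break_even_index(results, 10))  # query at which Option B would have paid off
```

Finding the break-even point this way requires the whole sequence of queries, which is only available in hindsight.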
Timeline: Past, Now, Future
Future demand is not known in advance!
... demand is larger than replication cost.
Data-dependent. Requires knowledge of the future.
Can we do better?
a Global 2000 company.
[Figure: axes "sent over time window" and "replication cost factor"; regions labeled "cheap replication" and "costly replication". Note: logarithmic color scale!]
>50% potential reduction
Replication cost factor depends
replicate!
Why do we need more than this?
A strategy that has a bounded worst-case performance in comparison to the optimal offline strategy.
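One classic strategy with such a bound is the break-even rule from the ski-rental problem, which the slides name further on. The sketch below is an assumption about how that rule maps onto replication, not necessarily the exact variant the authors evaluate: keep transferring query results until the next transfer would push the cumulative transfer cost to the replication cost or beyond, then replicate instead. The function names are hypothetical.

```python
def ski_rental(result_costs, replication_cost):
    """Online break-even rule: sees one query at a time, no future knowledge.

    Returns (total_cost, replicated)."""
    transferred = 0
    for cost in result_costs:
        if transferred + cost >= replication_cost:
            # Paying for this transfer would reach the replication cost:
            # replicate now, all later queries are answered locally.
            return transferred + replication_cost, True
        transferred += cost
    return transferred, False


def offline_optimum(result_costs, replication_cost):
    """Best decision in hindsight: replicate up front or never."""
    return min(sum(result_costs), replication_cost)


# Hypothetical numbers, for illustration only.
costs = [3, 3, 3, 3]
online, replicated = ski_rental(costs, 10)
print(online, replicated, offline_optimum(costs, 10))
```

With non-negative transfer costs, this rule pays less than the replication cost in transfers before it replicates, so its total stays below twice the offline optimum whatever the future demand turns out to be.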
< 1% of partitions have > 100k accesses
> 50% of partitions have < 1k accesses
Does popularity depend on location? Similar activity
Do popular partitions exhibit patterns of activity? Repeating patterns
Skewed distribution: An accessed partition is more likely to be accessed again in the future than not. Ski-rental does not use this!
Classifier
min(Replicate-all, Replicate-nothing)
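One way to read this expression is as a baseline: the cheaper of two static strategies, replicating every partition up front or replicating none and transferring every query result. The sketch below assumes that reading; the data layout (a list of per-partition replication costs and query result costs) and the function name are hypothetical.

```python
def baseline_cost(partitions):
    """Cheaper of the two static strategies.

    partitions: list of (replication_cost, result_costs) tuples,
    one tuple per partition."""
    replicate_all = sum(rep_cost for rep_cost, _ in partitions)
    replicate_nothing = sum(sum(results) for _, results in partitions)
    return min(replicate_all, replicate_nothing)


# Hypothetical numbers, for illustration only.
partitions = [
    (10, [3, 3]),  # rarely accessed partition
    (10, [20]),    # heavily accessed partition
]
print(baseline_cost(partitions))
```

A per-partition strategy can beat this baseline when demand is skewed, because it can replicate only the partitions whose transfers would cost more than their replication.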
[Figure: regions labeled "costly replication" and "cheap replication"; results marked "worse than baseline" or "better than baseline"]
Insights
Hybrid: Slight improvement in replication timing.
Both traces
Interested in the performance on your data? Please contact us: niklas.semmler@sap.com