Edge Replication Strategies for Wide-Area Distributed Processing

Niklas Semmler, Matthias Rost, Georgios Smaragdakis, Anja Feldmann


  1. Edge Replication Strategies for Wide-Area Distributed Processing Niklas Semmler, Matthias Rost, Georgios Smaragdakis, Anja Feldmann

  2. World: Generate data. Edge: Local processing & temp. storage. Datacenter: Heavy processing & content distribution.
     Internet: Limited bandwidth & pay for transfers.
     How do we reduce the transferred data volume?

  3. Setting
     [Diagram: App, Query, Result, Replication; World, Edge, Datacenter]
     • Option A: Transfer query results.
       • Cost: per query result (cumulative).
       • Good for few small, non-overlapping results.
     • Option B: Replicate raw data.
       • Cost: replication cost (one time).
       • Good for many large, overlapping results.

  4. Problem
     [Timeline: Past, Now, Future (???)]
     Future demand is not known in advance!

  5. Replication strategy
     Strategy determines when data is replicated given a record of its past accesses.
     Naïve (data-dependent):
     • Replicate immediately.
     • Replicate never.
     Optimal Offline (requires knowledge of future):
     • Replicate immediately, if future demand is larger than replication cost.
     Can we do better?
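The three reference strategies on this slide can be sketched for a single partition as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, a trace is assumed to be the list of query result sizes sent during one time window, and sizes and replication cost are assumed to share one unit.

```python
def replicate_immediately(result_sizes, replication_cost):
    # Naive: pay the one-time replication cost up front, no transfers after that.
    return replication_cost


def replicate_never(result_sizes, replication_cost):
    # Naive: transfer every query result, never replicate.
    return sum(result_sizes)


def optimal_offline(result_sizes, replication_cost):
    # Knows the whole trace in advance: replicate immediately only if the
    # future demand is larger than the replication cost.
    demand = sum(result_sizes)
    return replication_cost if demand > replication_cost else demand


busy = [5, 5, 5, 5, 5, 5]   # hypothetical trace with high demand
quiet = [1, 1]              # hypothetical trace with low demand
for trace in (busy, quiet):
    print(replicate_immediately(trace, 10),
          replicate_never(trace, 10),
          optimal_offline(trace, 10))
```

Which naive strategy is cheaper depends on the trace, which is why the slide calls them data-dependent.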

  6. Data Organization: Partition
     • Data is immutable.
       • e.g., machine logs
     • Data is partitioned.
       • Space: e.g., by machine, by location, etc.
     • A partition is accessible for a time window.
       • then removed or archived.

  7. Dataset
     • Trace of an ERP database of a Global 2000 company.
     • Accesses at row-level.
     • Partition := 10k rows
     • Time window := 1 day
     [Figure] Note: logarithmic color-scale!
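One way to map row-level accesses onto the partitions and time windows defined above is sketched here. The trace format (timestamp, row id), the helper names, and the choice of calendar days as the 1-day window are assumptions made for illustration; the slides do not describe the trace layout.

```python
from collections import Counter
from datetime import datetime

ROWS_PER_PARTITION = 10_000  # from the slide: Partition := 10k rows


def partition_key(timestamp, row_id):
    # Assumption: consecutive row ids share a partition and a time window
    # is one calendar day.
    return (row_id // ROWS_PER_PARTITION, timestamp.date())


def count_accesses(trace):
    # Number of accesses per (partition, day) pair.
    return Counter(partition_key(ts, row) for ts, row in trace)


# Hypothetical row-level trace.
trace = [
    (datetime(2017, 1, 1, 9, 0), 42),
    (datetime(2017, 1, 1, 17, 30), 9_999),
    (datetime(2017, 1, 2, 8, 15), 10_000),
]
counts = count_accesses(trace)
print(counts)
```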

  8. [Figure: potential reduction, from cheap replication to costly replication; annotation: >50% potential reduction]
     • Cumulative cost :=
       • Sum of query result sizes sent over time window
     • Replication cost :=
       • Partition size × replication cost factor
     Replication cost factor depends on compression, overhead, ...
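The two cost definitions on this slide translate directly into code. The numbers below are made up to show how the replication cost factor moves a partition between the cheap and costly replication regimes; the function names are hypothetical.

```python
def cumulative_cost(result_sizes):
    # Sum of query result sizes sent over the time window.
    return sum(result_sizes)


def replication_cost(partition_size, cost_factor):
    # Partition size x replication cost factor.
    return partition_size * cost_factor


result_sizes = [40, 30, 50]   # hypothetical query result sizes
partition_size = 100          # hypothetical partition size, same unit

cheap = replication_cost(partition_size, 0.5)    # e.g. good compression
costly = replication_cost(partition_size, 2.0)   # e.g. high overhead

print(cumulative_cost(result_sizes) > cheap)    # replication pays off
print(cumulative_cost(result_sizes) > costly)   # transferring results is cheaper
```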

  9. Replication Strategies
     I. Competitive
       • Guaranteed worst-case performance.
     II. Heuristic
       • Exploit access traces.
     III. Hybrid
       • Combination of above.

  10. Strategies: Competitive
      Competitive Strategy: A strategy that has a bounded worst-case performance in comparison to the optimal offline strategy.
      Ski-rental (Karlin et al.)
      • Use threshold to decide replication.
      • If past transfer cost > replication cost: replicate!
      • 2-competitive algorithm.
      • Provably best worst-case bound.
      Why do we need more than this?
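The ski-rental rule above can be sketched as an online loop over one partition's queries. This is an illustrative reading of the slide, not the authors' implementation: it assumes the check happens when a query arrives, that replication then serves this and all later queries locally, and that result sizes and replication cost share one unit. With discrete result sizes the threshold can be overshot by part of one result before replication triggers.

```python
def ski_rental(result_sizes, replication_cost):
    """Return (total cost, index of the query that triggered replication or None)."""
    transferred = 0
    for i, size in enumerate(result_sizes):
        # Decide using only the past: replicate once past transfers exceed
        # the replication cost.
        if transferred > replication_cost:
            return transferred + replication_cost, i
        transferred += size  # otherwise send the query result
    return transferred, None  # never replicated


print(ski_rental([4, 4, 4, 4], 10))
print(ski_rental([3, 3], 10))
```

The strategy never looks at which partition it is handling or at earlier time windows, which is the gap the heuristic strategies try to fill.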

  11. Dataset Insights
      • > 50% partitions have < 1k accesses.
      • < 1% partitions have > 100k accesses.
      • Skewed distribution: Accessed partition is more likely to be accessed in the future than not. Ski-rental does not use this!
      • Similar activity: Does popularity depend on location?
      • Repeating patterns: Do popular partitions exhibit patterns of activity?

  12. Strategies: Heuristics
      • Last-partition
        • Replicate if partition in previous time window exceeded replication cost.
      • Last-threshold
        • Compute best threshold over partitions in past time window.
      • Machine learning classifier (Random Forest)
        • Classify patterns into exceeding/not exceeding replication cost.
        • Replicate if access pattern matches.
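The first two heuristics can be sketched as follows. The slides do not spell out how the learned threshold is applied, so this sketch assumes a ski-rental-style rule in which the threshold replaces the replication cost in the comparison, and it searches only over thresholds that could change a decision on the past traces. All names are hypothetical.

```python
def last_partition(previous_window_sizes, replication_cost):
    # Replicate if the same partition's transfers in the previous time
    # window exceeded the replication cost.
    return sum(previous_window_sizes) > replication_cost


def threshold_cost(result_sizes, replication_cost, threshold):
    # Ski-rental-style rule with a tunable threshold (assumption).
    transferred = 0
    for size in result_sizes:
        if transferred > threshold:
            return transferred + replication_cost
        transferred += size
    return transferred


def last_threshold(past_traces, replication_cost):
    # Candidate thresholds: 0, every prefix sum seen in the past window,
    # and infinity (never replicate).
    candidates = {0, float("inf")}
    for trace in past_traces:
        running = 0
        for size in trace:
            running += size
            candidates.add(running)
    # Pick the cheapest threshold; ties go to the smallest one.
    return min(
        candidates,
        key=lambda t: (sum(threshold_cost(tr, replication_cost, t)
                           for tr in past_traces), t),
    )


past = [[5, 5, 5, 5, 5, 5], [1, 1]]   # hypothetical traces of the past window
print(last_partition(past[0], 10), last_partition(past[1], 10))
print(last_threshold(past, 10))
```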

  13. Strategies: Hybrid
      • Replicate if either Ski-rental OR Classifier replicates.
      • Configure ML to be conservative.
      • Goal: Replicate earlier than pure Ski-rental → avoid transfers.
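The OR combination can be sketched by adding a second trigger to the ski-rental loop. The classifier here is a hypothetical stand-in (a hand-written rule on the sizes seen so far), not the Random Forest from the slides; any callable that maps the access history to a yes/no decision would fit the same slot.

```python
def hybrid(result_sizes, replication_cost, predicts_exceed):
    """Return (total cost, index of the query that triggered replication or None)."""
    transferred = 0
    history = []
    for i, size in enumerate(result_sizes):
        # Replicate if either the ski-rental threshold or the classifier fires.
        if transferred > replication_cost or predicts_exceed(history):
            return transferred + replication_cost, i
        transferred += size
        history.append(size)
    return transferred, None


def toy_classifier(history):
    # Hypothetical conservative rule: fire only after two large results.
    return len(history) >= 2 and all(size >= 4 for size in history)


def never(history):
    # With a classifier that never fires, hybrid reduces to pure ski-rental.
    return False


trace = [5, 5, 5, 5]
print(hybrid(trace, 10, toy_classifier))
print(hybrid(trace, 10, never))
```

In this made-up trace the classifier triggers replication one query earlier than ski-rental alone, which saves one transfer.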

  14. Replication Strategies
      I. Competitive
        • Ski-rental
      II. Heuristic
        • Last-partition
        • Last-threshold
        • Classifier
      III. Hybrid
        • Ski-rental OR Classifier
      VS
      • Naïve Baseline: min(Replicate-all, Replicate-nothing)
      • Optimal Offline
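The two reference points can be computed over a set of partitions as sketched below. The sketch assumes one replication cost shared by all partitions and reports reduction as the relative saving against the naive baseline; both are simplifications chosen for illustration, and the function names are hypothetical.

```python
def reference_costs(traces, replication_cost):
    # Naive baseline: the better of replicating every partition and
    # replicating none of them.
    replicate_all = replication_cost * len(traces)
    replicate_nothing = sum(sum(trace) for trace in traces)
    baseline = min(replicate_all, replicate_nothing)
    # Optimal offline: make the best choice separately for each partition.
    optimal = sum(min(sum(trace), replication_cost) for trace in traces)
    return baseline, optimal


def reduction(cost, baseline):
    # Relative transfer cost reduction; negative means worse than baseline.
    return 1 - cost / baseline


traces = [[5, 5, 5, 5, 5, 5], [1, 1]]   # hypothetical partitions
baseline, optimal = reference_costs(traces, 10)
print(baseline, optimal, reduction(optimal, baseline))
```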

  15. Transfer Cost Reduction
      [Figure: transfer cost reduction, from cheap replication to costly replication; regions: worse than baseline, better than baseline]
      Insights
      1. Ski-rental achieves 38% reduction on average. Up to 50% for some cases.
      2. Last-partition performs poorly.
      3. Last-threshold close to ski-rental.
      4. Classifier worse than ski-rental.
      5. Hybrid: Small improvement.

  16. Transfer Cost Reduction
      Hybrid: Slight improvement in replication timing.

  17. Conclusion
      • Introduced replication strategies.
      • Ski-rental reduces transfers by 22%/50% on average/best-case.
      [Annotation: Both traces]
      • Hybrid strategy improves performance by 25%/51%.
      Ongoing work
      • Improve machine learning.
      • Include other cost factors (storage, etc.)
      Interested in the performance on your data? Please contact us: niklas.semmler@sap.com

  18. Thank you!
