Aggregation and Degradation in JetStream: Streaming Analytics in the Wide Area

Ariel Rabkin, Princeton University, asrabkin@cs.princeton.edu. Work done with Matvey Arye, Siddhartha Sen, Vivek S. Pai, and Michael J. Freedman.


  1. Aggregation and Degradation in JetStream: Streaming Analytics in the Wide Area. Ariel Rabkin, Princeton University (asrabkin@cs.princeton.edu). Work done with Matvey Arye, Siddhartha Sen, Vivek S. Pai, and Michael J. Freedman.

  2. Today’s Analytics Architectures. [Diagram labels: MillWheel (Google), Storm.] Backhaul is inefficient and inflexible.

  3. Tomorrow’s Architecture: JetStream. Backhaul is inefficient and inflexible. Goal: optimize use of WAN links by exposing them to the streaming system.

  4. Backhaul is Intrinsically Inefficient. [Figure: bandwidth over time (two days), comparing the bandwidth available with the bandwidth needed for backhaul.] Buyer’s remorse: wasted bandwidth. Analyst’s remorse: system overload or missing data.

  5. Stream Processing Basics. [Diagram: input data at Site A and at Site B passes through stream operators at each site, then on to a stream operator at Site C.] Some operators in JetStream: filtering (count > 100), quantiles (95th percentile), sampling (drop 90% of data), querying stored data, image compression.
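
As a rough illustration of three of the operators named on this slide, here is a minimal Python sketch. It is not JetStream's API: the function names, the `(url, count)` record shape, and the nearest-rank quantile method are all assumptions made for the example.

```python
import math
import random

# Hypothetical record shape for this sketch: (url, count).


def filter_counts(records, threshold=100):
    """Filtering: keep only records whose count exceeds the threshold."""
    return (r for r in records if r[1] > threshold)


def sample(records, drop_fraction=0.9, seed=0):
    """Sampling: drop roughly `drop_fraction` of the records at random."""
    rng = random.Random(seed)
    return (r for r in records if rng.random() >= drop_fraction)


def quantile(values, q=0.95):
    """Quantile of a finite batch using the nearest-rank method."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("quantile of an empty batch is undefined")
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]
```

Because the first two operators are generators, they can be chained, for example `sample(filter_counts(records))`, which loosely mirrors how operators are composed into a dataflow.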

  6. The JetStream System. What: streaming with aggregation and degradation as first-class primitives. Where: storage and processing at the edge. Why: maximize goodput using aggregation and degradation. How: data cubes and feedback control.

  7. An Example Query: How popular is every URL? [Diagram: two CDN sites, each receiving streams of requests.]

  8. Mechanism 1: Storage with Aggregation. [Diagram: requests at each CDN site feed local aggregation and storage.] Every minute, compute request counts by URL.

  9. Mechanism 2: Adaptive Degradation. [Diagram: requests at each CDN site feed local aggregation and storage, followed by adjustable filtering.] Every minute, compute request counts by URL.

  10. Requirements for Storage Abstraction. [Diagram label: Data Representation.]
      - Update-able (locally and incrementally): Stored Data += Data
      - Data size is reducible (with predictable accuracy cost)
      - Merge-able (without accuracy penalty): Data + Data = Merged Data
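
To make the update-able and merge-able requirements concrete, here is a small Python sketch using per-URL request counts. Counts are only one example of a representation that meets both requirements. The helper names `update` and `merge` are illustrative, not taken from JetStream.

```python
from collections import Counter


def update(stored, url, n=1):
    """Update-able: fold new data into the stored data in place (Stored Data += Data)."""
    stored[url] += n


def merge(a, b):
    """Merge-able: combine two partial results without losing accuracy."""
    merged = Counter(a)
    merged.update(b)  # adds counts key by key
    return merged


# Two sites count requests independently, then their results are merged.
site_a, site_b = Counter(), Counter()
for url in ["/a", "/b", "/a"]:
    update(site_a, url)
for url in ["/a", "/c"]:
    update(site_b, url)

combined = merge(site_a, site_b)
```

Merging is exact here because addition is associative and commutative, so the merged result equals what a single site would have computed from all the requests.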

  11. The Data Cube Model. Cube: a multidimensional array, indexed by a set of dimensions, whose cells hold aggregates.
      Counts by URL       12:00  12:01  12:02
      www.mysite.com/a        3      5      0
      www.mysite.com/b        0      2      0
      www.yoursite.com        5      4      …
      www.her-site.com        8     12      …
      Aggregation is used for: updates, roll-ups, merging cubes, summarizing cubes.
      Cubes have an aggregation function that combines two cells into one: Agg(cell, cell) → cell.

  12. Cubes can be “Rolled Up”. Cube: a multidimensional array, indexed by a set of dimensions, whose cells hold aggregates.
      Original cube:
      Counts by URL       12:00  12:01  12:02
      www.mysite.com/a        3      5      0
      www.mysite.com/b        0      2      0
      www.yoursite.com        5      4      …
      www.her-site.com        8     12      …
      Rolled up over time:
      Counts by URL           *
      www.mysite.com/a        8
      www.mysite.com/b        2
      www.yoursite.com        9
      www.her-site.com       20
      Rolled up over URL:
      Counts by URL       12:00  12:01  12:02
      *                      16     23      …
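
The roll-ups on this slide can be reproduced with a short Python sketch. The dictionary layout and the `roll_up` helper are assumptions made for illustration. The cells shown as "…" on the slide are left out rather than guessed, so the 12:02 total below covers only the two known cells.

```python
from collections import defaultdict

# Cells from the slide, keyed by (url, minute). Elided cells ("…") are omitted.
cube = {
    ("www.mysite.com/a", "12:00"): 3,
    ("www.mysite.com/a", "12:01"): 5,
    ("www.mysite.com/a", "12:02"): 0,
    ("www.mysite.com/b", "12:00"): 0,
    ("www.mysite.com/b", "12:01"): 2,
    ("www.mysite.com/b", "12:02"): 0,
    ("www.yoursite.com", "12:00"): 5,
    ("www.yoursite.com", "12:01"): 4,
    ("www.her-site.com", "12:00"): 8,
    ("www.her-site.com", "12:01"): 12,
}


def roll_up(cells, keep):
    """Sum away every dimension except the one at index `keep` (0 = URL, 1 = time)."""
    totals = defaultdict(int)
    for key, count in cells.items():
        totals[key[keep]] += count
    return dict(totals)


by_url = roll_up(cube, keep=0)   # rolled up over time
by_time = roll_up(cube, keep=1)  # rolled up over URL
```

With these cells, `by_url` gives 8, 2, 9, and 20 for the four URLs, and `by_time` gives 16 for 12:00 and 23 for 12:01, matching the slide.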

  13. Cubes Unify Storage and Aggregation. [Diagram: updates flow into the stored data; a standing query sends updates downstream; a one-off query reads the stored data.]

  14. Degradation: The Big Picture. [Diagram labels: local data, dataflow operators, summarized or approximated data, network, dataflow operators, feedback control.] Level of degradation is auto-tuned to match bandwidth. Challenge: supporting mergeability and flexible policies.

  15. Mergeability Imposes Constraints. [Diagram: time windows of different periods and offsets.]
      Every 5:            01-05, 06-10, 11-15, 16-20, 21-25, 26-30
      Every 6:            01-06, 07-12, 13-18, 19-24, 25-30
      Every 5 (offset):   02-06, 07-11, 12-16, 17-21, 22-26, 27-31
      Every 10:           01-10, 11-20, 21-30
      Every 30:           01-30
      Annotations on the slide: "??????" (between the every-5 and every-6 rows) and "Every 30??".
      Insight: degradation may be discontinuous.
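
One way to read the windows on this slide: a finer set of windows can be merged exactly into a coarser set only if every fine window falls entirely inside one coarse window. The Python sketch below checks that condition by brute force. The helpers `windows` and `nests` are hypothetical names, and the nesting rule is an interpretation of the slide rather than a statement of JetStream's actual check.

```python
def windows(period, start, end):
    """Inclusive (first, last) windows of `period` units, beginning at `start`."""
    return [(s, s + period - 1) for s in range(start, end + 1, period)]


def nests(fine, coarse):
    """True if every fine window lies entirely inside some coarse window."""
    return all(
        any(c0 <= f0 and f1 <= c1 for c0, c1 in coarse)
        for f0, f1 in fine
    )


every5 = windows(5, 1, 30)          # 01-05, 06-10, ..., 26-30
every6 = windows(6, 1, 30)          # 01-06, 07-12, ..., 25-30
every5_offset = windows(5, 2, 31)   # 02-06, 07-11, ..., 27-31
every10 = windows(10, 1, 30)        # 01-10, 11-20, 21-30
every30 = windows(30, 1, 30)        # 01-30

print(nests(every5, every10))         # True
print(nests(every5, every6))          # False
print(nests(every5_offset, every10))  # False
print(nests(every6, every30))         # True
```

Under this reading, a source that starts at a 5-unit period can step up to 10 or 30 but not to 6, which is one sense in which the available degradation levels are discontinuous.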

  16. There Are Many Ways to Degrade Data. Can coarsen a dimension. Can drop low-rank values.
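
The two degradations named here can be sketched in Python on the per-URL totals from the roll-up slide. The function names and the choice to coarsen URLs to domains are assumptions for the example. The URL parsing is deliberately naive and assumes URLs without a scheme, as they appear on the slides.

```python
from collections import Counter

# Per-URL totals taken from the rolled-up cube on the earlier slide.
counts = Counter({
    "www.mysite.com/a": 8,
    "www.mysite.com/b": 2,
    "www.yoursite.com": 9,
    "www.her-site.com": 20,
})


def coarsen_to_domain(url_counts):
    """Coarsen the URL dimension: aggregate counts per domain."""
    out = Counter()
    for url, n in url_counts.items():
        domain = url.split("/", 1)[0]  # naive: everything before the first "/"
        out[domain] += n
    return out


def drop_low_rank(url_counts, k):
    """Drop low-rank values: keep only the k entries with the largest counts."""
    return Counter(dict(Counter(url_counts).most_common(k)))


by_domain = coarsen_to_domain(counts)
top2 = drop_low_rank(by_domain, k=2)
```

Coarsening keeps the total count but loses per-URL detail. Dropping low-rank values keeps detail for the popular entries but loses the tail entirely.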

  17. Coarsening Does Not Always Help. [Figure: two panels plotting savings from aggregation (log scale, 1 to 256) against aggregation time period (5 s, minute, 5 min, hour, day), with series labeled Domains and URLs.]

  18. Degradations Have Trade-offs.
      Name                        Fixed BW savings            Fixed accuracy cost   Parameter
      Dimension coarsening        Usually no                  Yes                   Dimension scale
      Drop values (locally)       Yes                         No                    Cut-off
      Drop values (globally)      No, multi-round protocol    Yes                   Cut-off
      Audiovisual downsampling    Yes                         Yes                   Sample rate
      Histogram coarsening        Yes                         Yes                   Number of buckets

  19. A Simple Idea that Does Not Work. [Diagram: incoming data passes through a coarsening operator, and the sampled data goes to the network, which is sending 4x too much.] We have sensors that report congestion. Have operators read the sensor and adjust themselves?

  20. A Simple Idea that Does Not Work (continued). [Diagram: as on the previous slide, with the operator's policy added.] Policy: increase the aggregation period up to 10 sec; if insufficient, use sampling. We have sensors that report congestion. Have operators read the sensor and adjust themselves?

  21. Challenge: Composite Policies. [Diagram: incoming data passes through a coarsening operator and a sampling operator to the network, which is sending 4x too much.] Chaos if two operators are simultaneously responding to the same sensor.

  22. Interfacing with Operators. [Diagram: incoming data passes through a coarsening operator and a sampling operator to the network, with a controller alongside.] Messages shown: "Shrinking data by 50%"; "Sending 4x too much"; "Possible levels: [0%, 50%, 75%, 95%, …]"; "Go to level 75%".
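
The exchange on this slide suggests a controller that learns the overload factor and each operator's possible levels, then tells the operator which level to use. The Python sketch below is one hypothetical selection rule, not JetStream's controller. It assumes the overload factor is measured against undegraded output, and it uses only the four levels listed on the slide, leaving the elided ones out.

```python
def required_reduction(overload_factor):
    """Fraction of data to shed so the send rate fits the link.

    Assumption: the factor is relative to undegraded output, so sending
    4x too much means keeping 1/4 of the data, a 75% reduction.
    """
    if overload_factor <= 1:
        return 0.0
    return 1.0 - 1.0 / overload_factor


def choose_level(levels, overload_factor):
    """Pick the mildest advertised level that sheds enough data.

    Falls back to the most aggressive level if none is sufficient.
    """
    need = required_reduction(overload_factor)
    for level in sorted(levels):
        if level >= need:
            return level
    return max(levels)


levels = [0.0, 0.5, 0.75, 0.95]  # the levels listed on the slide
print(choose_level(levels, overload_factor=4))  # 0.75
```

Having one controller pick levels from the operators' advertised lists avoids the situation on the previous slide, where two operators react independently to the same congestion sensor.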

  23. Experimental Setup. 80 nodes on the VICCI testbed at three sites (Seattle, Atlanta, and Germany). [Diagram label: Princeton.] Policy: drop data if insufficient BW.

  24. Without Degradation. [Figure: bandwidth (Mbits/sec) over experiment time (minutes), with a bandwidth drop marked; three panels of latency (sec) over elapsed time (minutes), showing median, 95th percentile, and maximum latency.]

  25. Degradation Keeps Latency Bounded. [Figure: bandwidth (Mbits/sec) over experiment time (minutes), with bandwidth shaping marked; two panels of latency (sec) over elapsed time (minutes), showing median and 95th percentile latency.]

  26. Showing maximum latencies. [Figure: latency (sec) over elapsed time (minutes), showing median, 95th percentile, and maximum latency.]

  27. Programming Ease.
      Scenario                      Lines of code
      Slow requests                 5
      Requests by URL               5
      Bandwidth by node             15
      Bad referrers                 16
      Latency and size quantiles    25
      Success by domain             30
      Top 10 domains by period      40
      Big Requests                  97

  28. Conclusions and Future Work.
      - Useful to embed aggregation and degradation abstractions in streaming systems.
      - Aggregation can be unified with storage.
      - System must accommodate degradation semantics.
      - Open questions:
        - How to guide users to the right degradation policy?
        - How to embed abstractions in a higher-level language?
