Aggregation and Degradation in JetStream: Streaming analytics in - - PowerPoint PPT Presentation


NSDI 2014. Aggregation and Degradation in JetStream: Streaming analytics in the wide area. Ariel Rabkin, Matvey Arye, Siddhartha Sen, Vivek S. Pai, and Michael J. Freedman, Princeton University. 2014-12-11.


slide-1
SLIDE 1

Aggregation and Degradation in JetStream: Streaming analytics in the wide area

Ariel Rabkin, Matvey Arye, Siddhartha Sen, Vivek S. Pai, and Michael J. Freedman Princeton University

Presenter: Shen Yijie (申毅杰), December 11, 2014
NSDI 2014

slide-2
SLIDE 2

Outline

  • Motivation
  • Solutions
      – Aggregation
      – Degradation
  • Experiment
  • Related work
  • Conclusions

slide-3
SLIDE 3

Motivation

  • Target
      – Analyze data that is continuously created across wide-area networks
  • Challenges
      – Queries have real-time requirements
      – Available bandwidth is limited and changes over time
  • Goal
      – Optimize the use of WAN links by exposing them to the streaming system

slide-4
SLIDE 4

Limitations of Current Systems

  • Address latency within a single datacenter with high bandwidth
      – E.g. Google MillWheel, Storm, Spark Streaming
      – Edge nodes backhaul all potentially useful data to a central location
          • High bandwidth demand
          • Limited use of edge nodes’ storage & computation
      – Developers must specify everything based on pessimistic assumptions about bandwidth
          • Bandwidth is not used efficiently

slide-5
SLIDE 5

JetStream’s Methodology

  • Reducing the data being transferred
      – Aggregation: store & process data at the edge
          • Data cube
      – Degradation: monitor available bandwidth & reduce data size at the expense of accuracy
          • Feedback control
  • Application Scenarios
      – Log processing across the globe
      – Smart electric grids, highways
      – Networks of video cameras

slide-6
SLIDE 6

An Example Query

slide-7
SLIDE 7

Mechanism 1: Storage with aggregation

[Figure: CDN requests at each site flow into local aggregation and storage. Example query: every minute, compute the request count by URL.]
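
The example query on this slide (every minute, compute the request count by URL) can be illustrated with a minimal Python sketch. This is not JetStream's API: the function name `count_requests_per_minute` and the `(timestamp_seconds, url)` record layout are assumptions made for illustration only.

```python
from collections import Counter

def count_requests_per_minute(requests):
    """Aggregate (timestamp_seconds, url) records into per-minute counts by URL."""
    counts = Counter()
    for timestamp, url in requests:
        minute = int(timestamp // 60) * 60  # start of the one-minute window
        counts[(minute, url)] += 1
    return counts

# Hypothetical request log collected at one edge site.
requests = [(0, "/a"), (10, "/a"), (59, "/b"), (60, "/a"), (125, "/b")]
local_counts = count_requests_per_minute(requests)
```

Only the aggregated counts, not the raw request records, would then need to cross the wide-area link.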

slide-8
SLIDE 8

Mechanism 2: Adaptive Degradation

[Figure: CDN requests at each site flow into local aggregation and storage, followed by a degradation operation. Example query: every minute, compute the request count by URL.]

slide-9
SLIDE 9

The Data Cube Model

  • Cube
      – A multi-dimensional array, indexed by a set of dimensions, whose cells hold aggregates

Aggregation is used in:

  • Updates
  • Roll-ups
  • Merging cubes
  • Summarizing cubes
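
A toy version of the cube model may help make updates and merging concrete. The sketch below is a simplification under stated assumptions: the `Cube` class, its method names, and the use of a plain count as the only aggregate are hypothetical and do not reflect JetStream's actual interface.

```python
class Cube:
    """Toy data cube: cells are keyed by a tuple of dimension values and hold a count."""

    def __init__(self, dimensions):
        self.dimensions = tuple(dimensions)
        self.cells = {}

    def update(self, key, value=1):
        """Fold one new value into the cell identified by key."""
        if len(key) != len(self.dimensions):
            raise ValueError("key must have one value per dimension")
        self.cells[key] = self.cells.get(key, 0) + value

    def merge(self, other):
        """Combine another cube with the same dimensions into this one."""
        if other.dimensions != self.dimensions:
            raise ValueError("cubes must share dimensions")
        for key, value in other.cells.items():
            self.update(key, value)

# Two hypothetical edge cubes and one central cube.
seattle = Cube(["minute", "url"])
seattle.update((0, "/a"))
seattle.update((0, "/a"))
atlanta = Cube(["minute", "url"])
atlanta.update((0, "/a"))
atlanta.update((0, "/b"))

central = Cube(["minute", "url"])
central.merge(seattle)
central.merge(atlanta)
```

Because an update and a merge both reduce to the same per-cell aggregation, the central cube ends up with the same cells regardless of which site is merged first.
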
slide-10
SLIDE 10

Aggregates on Cubes

  • Roll-up: aggregate along some dimension

Aggregate functions supported by JetStream should be deterministic and order-independent.
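
As a rough illustration of a roll-up, the sketch below sums a cube's cells along one dimension. The dictionary representation and the `roll_up` helper are assumptions for this example; summation is used because it is deterministic and order-independent.

```python
from collections import defaultdict
import random

def roll_up(cells, dimensions, drop):
    """Sum cells along the dimension named `drop`, returning a lower-dimensional cube."""
    index = dimensions.index(drop)
    result = defaultdict(int)
    for key, value in cells.items():
        reduced_key = key[:index] + key[index + 1:]  # remove the rolled-up dimension
        result[reduced_key] += value
    return dict(result)

dimensions = ("minute", "url")
cells = {(0, "/a"): 3, (0, "/b"): 1, (60, "/a"): 2}
per_url = roll_up(cells, dimensions, drop="minute")

# Order independence: summing the cell values in any order yields the same total.
values = list(cells.values())
shuffled = values[:]
random.shuffle(shuffled)
assert sum(values) == sum(shuffled)
```

An aggregate such as "last value seen" would fail the shuffle check, which is why it would not fit the requirement stated on this slide.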

slide-11
SLIDE 11

Cubes Unify Storage & Aggregation

  • Operators in traditional stream processing systems
      – Stateful, maintaining state internally
      – Store input tuples in a durable buffer
          • Replay to restore state in the face of node failure
          • Or, re-scan all the data on every query
  • Operators in JetStream
      – Query the cube each time and generate results
      – Cubes are stored where they are generated

slide-12
SLIDE 12

Degradation: The Big Picture

  • Level of degradation auto-tuned to match bandwidth

[Figure labels: Local Data; Operators; Summarized or Approximated Data; Network; Feedback Control]

slide-13
SLIDE 13

Degradation Mechanisms

  • Achieved via three components
      – Operators with multiple degradation levels
      – A congestion monitor that measures the available bandwidth
      – A policy that specifies how to adjust the degradation level to match the bandwidth

slide-14
SLIDE 14

Components of Degradation

  • Degradation Operator
      – Associated with a set of degradation levels
          • E.g. roll-up across different time intervals (1s, 5s, 10s)
      – Characterizes the levels by their bandwidth usage
          • E.g. [1, 0.2, 0.1]
  • Monitoring bandwidth
      – A monitor is attached to each queue in the system
      – Network congestion
          • Insert periodic markers & get responses
      – Storage bottleneck
          • Track queue length & measure the rate of queue growth
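
The interplay between degradation levels and monitoring can be sketched as follows. Both helpers are hypothetical: `choose_level` assumes the level list is ordered from least to most degraded and that available bandwidth is expressed as a fraction of the undegraded demand, and `queue_growth_rate` assumes evenly spaced queue-length samples. Neither is taken from JetStream's code.

```python
def choose_level(relative_bandwidths, available_fraction):
    """Return the index of the least-degraded level that fits the available bandwidth.

    relative_bandwidths: per-level bandwidth relative to the undegraded stream,
    ordered from least to most degraded, e.g. [1, 0.2, 0.1].
    available_fraction: available bandwidth as a fraction of the undegraded demand.
    """
    for index, cost in enumerate(relative_bandwidths):
        if cost <= available_fraction:
            return index
    return len(relative_bandwidths) - 1  # nothing fits: use the most degraded level

def queue_growth_rate(lengths, interval_seconds):
    """Average change in queue length per second over evenly spaced samples."""
    if len(lengths) < 2:
        return 0.0
    return (lengths[-1] - lengths[0]) / (interval_seconds * (len(lengths) - 1))

windows = ["1s", "5s", "10s"]   # roll-up intervals from the slide
levels = [1, 0.2, 0.1]          # their relative bandwidth usage from the slide
```

With half of the undegraded bandwidth available, `windows[choose_level(levels, 0.5)]` selects the 5s roll-up; a positive `queue_growth_rate` would signal that the current level still sends more than the link or disk can absorb.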

slide-15
SLIDE 15

Components of Degradation

  • Congestion response policies (inside a controller)
      – Several operators affect queue length
      – A single degradation technique is only useful up to a certain level
      – Degradation from several operators should be combined to meet a bandwidth limit
      – The policy controls prioritized or simultaneous degradation across multiple operators

slide-16
SLIDE 16

Example: degradation in image sending

  • By default, send all images at maximum fidelity from cameras to a central repository

[Figure labels: Cube to store video; Downsample; DropFrame; Network; Controller; Policy]

  • Send image at fidelity X, with X levels [50%, 75%]
  • Reduce frame rate by [25%, 50%, 75%]
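
One possible shape for a prioritized policy over the two operators in this example is sketched below. Several things here are assumptions rather than facts from the slides: the function `prioritized_policy`, the choice to degrade the frame rate before the image fidelity, and the simplification that bandwidth scales linearly with both fidelity and frame rate.

```python
def prioritized_policy(operators, available_fraction):
    """Degrade operators one at a time, in priority order, until the stream fits.

    operators: list of (name, factors), where factors are the bandwidth multipliers
    of each level, from undegraded (1.0) to most degraded.
    Returns the chosen factor per operator and the resulting combined multiplier.
    """
    chosen = {name: factors[0] for name, factors in operators}
    combined = 1.0
    for name, factors in operators:
        for factor in factors:
            chosen[name] = factor
            combined = 1.0
            for value in chosen.values():
                combined *= value  # assumed: operator savings multiply
            if combined <= available_fraction:
                return chosen, combined
    return chosen, combined  # everything fully degraded and still too large

# Levels from the slide, rewritten as bandwidth multipliers (assumed linear scaling).
operators = [
    ("DropFrame", [1.0, 0.75, 0.5, 0.25]),   # reduce frame rate by 25%, 50%, 75%
    ("Downsample", [1.0, 0.75, 0.5]),        # send images at 75% or 50% fidelity
]
plan, multiplier = prioritized_policy(operators, 0.2)
```

A simultaneous policy would instead step both operators together; which ordering is appropriate depends on the application.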

slide-17
SLIDE 17

Degradation methods

  • Coarsen a dimension
  • Drop low-rank values
  • Consistent sampling
  • Synopsis approximation
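
Three of the four methods above can be illustrated with short Python helpers; synopsis approximation is omitted because it depends on the chosen sketch structure. The function names, the `(timestamp, key)` cell layout, and the use of SHA-256 for sampling are all assumptions for this example, not JetStream's implementation.

```python
import hashlib

def coarsen_time(cells, window_seconds):
    """Coarsen the time dimension: re-bucket (timestamp, key) cells into wider windows."""
    result = {}
    for (timestamp, key), count in cells.items():
        bucket = (timestamp // window_seconds) * window_seconds
        result[(bucket, key)] = result.get((bucket, key), 0) + count
    return result

def drop_low_rank(cells, k):
    """Keep only the k cells with the largest counts."""
    top = sorted(cells.items(), key=lambda item: item[1], reverse=True)[:k]
    return dict(top)

def consistent_sample(cells, fraction):
    """Keep a cell if a hash of its non-time key falls below the sampling fraction.

    Hashing the key (rather than drawing random numbers) means every node and
    every time window keeps the same subset of keys.
    """
    kept = {}
    for (timestamp, key), count in cells.items():
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        position = int.from_bytes(digest[:8], "big")  # integer in [0, 2**64)
        if position < fraction * 2**64:
            kept[(timestamp, key)] = count
    return kept

cells = {(0, "/a"): 3, (1, "/a"): 2, (5, "/b"): 1, (7, "/a"): 4}
coarse = coarsen_time(cells, 5)
top_two = drop_low_rank(cells, 2)
sampled = consistent_sample(cells, 0.5)
```

In `sampled`, a given URL is either present in every time window or absent from all of them, which is what distinguishes consistent sampling from independent random sampling.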


slide-18
SLIDE 18

Challenge: Mergeability of heterogeneous data

  • Degradation levels vary over time and across the different nodes feeding into a single cube, so merging data degraded at different levels should incur no additional penalty
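
One way to read this requirement, offered as an illustration rather than as the authors' method, is with time roll-ups: if every window length divides the coarsest one, data from differently degraded nodes can be merged at the coarsest window without losing anything further. The helper `merge_windows` and its input layout are hypothetical, and window start times are assumed to be aligned to multiples of their window length.

```python
def merge_windows(streams):
    """Merge per-window counts reported at different time granularities.

    streams: list of (window_seconds, {window_start: count}).
    Assumes every window length divides the coarsest one, so each fine window
    falls entirely inside one coarse window.
    """
    coarsest = max(window for window, _ in streams)
    for window, _ in streams:
        if coarsest % window != 0:
            raise ValueError("window lengths must nest for a lossless merge")
    merged = {}
    for _, counts in streams:
        for start, count in counts.items():
            bucket = (start // coarsest) * coarsest
            merged[bucket] = merged.get(bucket, 0) + count
    return coarsest, merged

# One node reports 5s roll-ups, another was degraded to 10s roll-ups.
streams = [
    (5, {0: 2, 5: 3, 10: 1}),
    (10, {0: 4, 10: 6}),
]
window, merged = merge_windows(streams)
```

Window lengths that do not nest, such as 5s and 6s, are rejected here because their counts could only be combined at a much coarser common window.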

slide-19
SLIDE 19

Experiment Setup

  • 80 nodes on the VICCI testbed at three sites
      – Seattle
      – Atlanta
      – Germany
  • (Send images) to a single union node in Princeton
  • Degradation policy
      – Drop data if bandwidth is insufficient

slide-20
SLIDE 20

Without & with degradation


slide-21
SLIDE 21

Related Work

  • Single-datacenter stream processing
      – Google MillWheel, Spark Streaming, Storm
      – All rely on an underlying fault-tolerant storage system
      – Orthogonal to JetStream
  • Wide-area streaming systems
      – Use redundant paths for performance
      – Assume edge nodes have little computational ability

slide-22
SLIDE 22

Conclusion

  • It is useful to embed aggregation and degradation abstractions in streaming systems
  • Aggregation can be unified with storage
  • Degradation semantics are workflow-specific