Operator Placement for Stream-Processing Systems Written by Peter - - PowerPoint PPT Presentation


Network-Aware Operator Placement for Stream-Processing Systems Written by Peter Pietzuch, Jonathan Ledlie, Jeffrey Shneidman, Mema Roussopoulos, Matt Welsh, Margo Seltzer Presented by Ee Lee Ng February 11 Slides adapted from ICDE06 ppt


SLIDE 1

February 11 Slides adapted from ICDE06 ppt

Network-Aware Operator Placement for Stream-Processing Systems

Written by Peter Pietzuch, Jonathan Ledlie, Jeffrey Shneidman, Mema Roussopoulos, Matt Welsh, Margo Seltzer Presented by Ee Lee Ng

SLIDE 2

Large-Scale Stream-Processing

Many geographically distributed data sources

  • e.g., sensors, network routers, RFID tag readers, …
  • High volume of real-time stream data

Many users, submitting individual stream queries

  • Queries use the Internet for stream transport

Queries include operators for stream-processing

  • e.g., join, filter, aggregate, XPath, image analysis, …
  • Operators require nodes for execution
  • In-network processing can often reduce data volume
SLIDE 3

Stream-Based Overlay Network

[Figure: stream-based overlay network. Legend: SBON node, data source, user. Operators shown: aggregate, transform.]

SLIDE 4

Operator Placement Problem

How do you map operators to overlay nodes?

Efficiency

  • Node and network resources are limited and shared
  • Operator placement must be network-aware
  • Consider link latency, bandwidth, congestion, jitter, …
  • Filter and aggregate data close to sources

Scalability

  • Must scale to many sources, overlay nodes and queries
  • No global view of the system

Adaptability

  • Resource conditions change over time
SLIDE 5

Contributions

Stream-Based Overlay Network (SBON)

  • Generic layer between network and stream-processing apps
  • Shields applications from network complexity

Operator placement using a metric cost space

  • Decentralized framework for minimizing network impact
  • Relaxation placement algorithm for operator placement
  • Adaptive to change in network conditions

Deployment of SBON and sample applications (Borealis extension) on PlanetLab

SLIDE 6

Operator Placement Goals

Two conflicting optimization goals:

  1. Global system performance with concurrent queries
      • Minimize network usage
      • Balance node and network load
  2. Individual query performance
      • Minimize data delay
      • Maximize stream throughput

Goal adopted here: minimize global network usage

SLIDE 7

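Network Usage

In-flight traffic = ∑ (data rate × latency)

[Figure: two placements of the same query, with nodes labelled A and B]

  • First placement: in-flight traffic = 17 MB
      • Link 1: data rate = 50 MB/s, latency = 100 ms
      • Link 2: data rate = 10 MB/s, latency = 200 ms
      • Link 3: data rate = 75 MB/s, latency = 100 ms
      • Link 4: data rate = 50 MB/s, latency = 50 ms
  • Second placement: in-flight traffic = 5.8 MB
      • Link 1: data rate = 50 MB/s, latency = 30 ms
      • Link 2: data rate = 10 MB/s, latency = 80 ms
      • Link 3: data rate = 75 MB/s, latency = 20 ms
      • Link 4: data rate = 50 MB/s, latency = 40 ms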
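The two in-flight traffic totals on this slide can be checked by hand. The sketch below sums data rate × latency over the four links of each placement. The names `PLACEMENT_1`, `PLACEMENT_2` and `network_usage_mb` are illustrative and not from the slides; only the link figures are taken from the slide.

```python
# Each link is (data rate in MB/s, latency in ms), copied from the slide.
PLACEMENT_1 = [(50, 100), (10, 200), (75, 100), (50, 50)]
PLACEMENT_2 = [(50, 30), (10, 80), (75, 20), (50, 40)]

def network_usage_mb(links):
    """Sum of data rate x latency over all links: the data in flight, in MB."""
    return sum(rate_mb_s * latency_ms / 1000.0 for rate_mb_s, latency_ms in links)

print(round(network_usage_mb(PLACEMENT_1), 1))  # 17.0
print(round(network_usage_mb(PLACEMENT_2), 1))  # 5.8
```

The data rates are the same in both placements. Only the link latencies differ, which is why moving operators to better-connected nodes lowers the metric.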

SLIDE 8

Network-Aware Operator Placement

Perform operator placement in a decentralized fashion

  • Need information about data rate and latency

But measuring network metrics is expensive

  • All-pairs latency measurements are O(n²)
  • Network latencies change over time
  • No global knowledge of measurements

Idea: Approximate the optimal query placement using a cost space [NetDB’05]

  1. Build a metric cost space that encodes current network latencies
  2. Find the query placement with minimal network usage in the cost space
  3. Map the placement back to physical Internet nodes and instantiate the query
SLIDE 9

Cost Space

Embed latency measurements into a metric space

  • Assign each SBON node a coordinate in a cost space
  • Euclidean distance ≈ network latency
  • Vivaldi algorithm [MIT]
  • Repeated measurements to refine local coordinate

Advantages

  • Mathematical model for using geometric algorithms
  • Optimization decisions without global knowledge
  • Adaptive to change
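To make the coordinate refinement concrete, here is a simplified sketch of a Vivaldi-style update. It is not the SBON implementation. It assumes a 2-D space and a constant step size `delta`, and it leaves out the adaptive, confidence-weighted step of the full Vivaldi algorithm. The names `vivaldi_update`, `embed` and `LATENCY_MS` are illustrative, and the latency values are made up.

```python
import math
import random

def vivaldi_update(xi, xj, rtt_ms, delta=0.05, rng=random):
    """Move node i's 2-D coordinate so its distance to node j better matches rtt_ms."""
    dist = math.dist(xi, xj)
    if dist > 0.0:
        unit = [(a - b) / dist for a, b in zip(xi, xj)]
    else:
        # Coincident points: push apart in a random direction.
        angle = rng.uniform(0.0, 2.0 * math.pi)
        unit = [math.cos(angle), math.sin(angle)]
    error = rtt_ms - dist  # positive: nodes are too close in the cost space
    return [a + delta * error * u for a, u in zip(xi, unit)]

def embed(latency_ms, rounds=2000, seed=0):
    """Simulate all nodes repeatedly sampling one random peer each."""
    n = len(latency_ms)
    rng = random.Random(seed)
    coords = [[rng.uniform(0.0, 100.0), rng.uniform(0.0, 100.0)] for _ in range(n)]
    for _ in range(rounds):
        for i in range(n):
            j = rng.choice([k for k in range(n) if k != i])
            coords[i] = vivaldi_update(coords[i], coords[j], latency_ms[i][j], rng=rng)
    return coords

# Hypothetical pairwise latencies (ms) for three nodes.
LATENCY_MS = [[0, 30, 40], [30, 0, 50], [40, 50, 0]]
coords = embed(LATENCY_MS)
worst = max(abs(math.dist(coords[i], coords[j]) - LATENCY_MS[i][j])
            for i in range(3) for j in range(i + 1, 3))
print(f"worst embedding error: {worst:.3f} ms")
```

Each update uses only the node's own coordinate, the peer's reported coordinate and one latency sample, so no node needs the full latency matrix.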
SLIDE 10

Relaxation Placement

Find a location for an operator that reduces network usage

SLIDE 11

Relaxation Placement

[Figure labels: Latency, Datarate]

Use spring relaxation technique to find best location

  • Spring extension ≈ latency
  • Spring constant ≈ data rate
SLIDE 12

Relaxation Placement

Use spring relaxation technique to find best location

  • Springs “relax” to low energy state, minimizing network usage
  • Dynamically adapts to changes in cost space
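The spring analogy can be sketched for a single unpinned operator connected to fixed endpoints. The sketch assumes 2-D cost-space coordinates, and the names `relax_operator` and `ENDPOINTS` and all example values are hypothetical. Spring energy is quadratic in the extension, so the rest point minimizes ∑ data rate × latency². This approximates, but does not equal, the ∑ data rate × latency metric.

```python
def relax_operator(endpoints, start=(0.0, 0.0), step=0.1, iterations=200):
    """endpoints: list of ((x, y), data_rate). Returns the relaxed operator coordinate."""
    x, y = start
    total_rate = sum(rate for _, rate in endpoints)
    for _ in range(iterations):
        # Each spring pulls with force = spring constant (data rate) x extension.
        fx = sum(rate * (px - x) for (px, _), rate in endpoints)
        fy = sum(rate * (py - y) for (_, py), rate in endpoints)
        # Take a small step along the net force, normalized for stability.
        x += step * fx / total_rate
        y += step * fy / total_rate
    return x, y

# Two high-rate producers and one low-rate consumer (made-up values).
ENDPOINTS = [((0.0, 0.0), 10.0), ((0.0, 40.0), 10.0), ((100.0, 20.0), 2.0)]
op_x, op_y = relax_operator(ENDPOINTS)
print(f"operator settles near ({op_x:.2f}, {op_y:.2f})")
```

With fixed endpoints the rest point is the data-rate-weighted centroid, so the operator settles close to the two high-rate producers. This matches the earlier guidance to filter and aggregate close to the sources.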
SLIDE 13

Relaxation Placement

Uses k-nearest-neighbor search to map cost-space coordinates to physical nodes

  • Interesting problem in decentralized context
  • Geometric routing [HUJI], DHT range queries [UCB], …
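As a point of reference for the mapping step, the sketch below does a brute-force k-nearest-neighbor lookup over a local table of node coordinates. It is a centralized stand-in, not one of the decentralized techniques named on the slide. The function name `k_nearest_nodes`, the node names and the coordinates are all hypothetical.

```python
import heapq
import math

def k_nearest_nodes(target, node_coords, k=3):
    """Return the names of the k nodes whose coordinates lie closest to target."""
    return heapq.nsmallest(
        k, node_coords, key=lambda name: math.dist(node_coords[name], target)
    )

# Hypothetical cost-space coordinates of four SBON nodes.
NODES = {"n1": (0.0, 0.0), "n2": (10.0, 0.0), "n3": (9.0, 21.0), "n4": (50.0, 50.0)}
candidates = k_nearest_nodes((9.09, 20.0), NODES, k=2)
print(candidates)  # ['n3', 'n2']
```

Returning several candidates instead of only the nearest one leaves room to choose among them on other criteria, such as node load. That selection step is an assumption here and is not shown.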
SLIDE 14

Relaxation Placement

Any SBON node can perform the placement for a new query

  • Local computation without global state
  • Inputs are coordinates of nodes and data rates in query
  • Supports placement of arbitrarily complex queries
  • Model multiple queries as networks of springs
Each node is then responsible for the operators it is hosting

  • Periodically re-execute Relaxation placement
  • Dynamically migrate operator to reflect new placement
  • Adapts to changes in latency and data rate
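A query with several operators can be treated as a network of springs, with sources and users pinned and the operators free to move. The following is a minimal sketch under that assumption, using 2-D coordinates. `relax_query`, the node names and the data rates are illustrative only. Re-running it with fresh coordinates and rates corresponds to the periodic re-execution described above.

```python
def relax_query(pinned, free_ops, links, step=0.1, iterations=500):
    """pinned: name -> (x, y); free_ops: operator names; links: (a, b, data_rate)."""
    pos = {name: list(xy) for name, xy in pinned.items()}
    pos.update({op: [0.0, 0.0] for op in free_ops})
    for _ in range(iterations):
        for op in free_ops:
            force = [0.0, 0.0]
            total_rate = 0.0
            for a, b, rate in links:
                if op not in (a, b):
                    continue
                other = b if a == op else a
                # Spring force: data rate times displacement to the neighbor.
                force[0] += rate * (pos[other][0] - pos[op][0])
                force[1] += rate * (pos[other][1] - pos[op][1])
                total_rate += rate
            if total_rate > 0.0:
                pos[op][0] += step * force[0] / total_rate
                pos[op][1] += step * force[1] / total_rate
    return {op: tuple(pos[op]) for op in free_ops}

# source -> filter -> aggregate -> user; the filter cuts the rate from 4 to 1.
PINNED = {"source": (0.0, 0.0), "user": (30.0, 0.0)}
LINKS = [("source", "filter", 4.0), ("filter", "aggregate", 1.0),
         ("aggregate", "user", 1.0)]
placement = relax_query(PINNED, ["filter", "aggregate"], LINKS)
print(placement)
```

The high-rate link pulls the filter toward the source, while the aggregate settles between the filter and the user. Each operator's update needs only the coordinates of its direct neighbors and the data rates on its own links.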
SLIDE 15

Simulation Setup

Discrete event simulator to evaluate placement algorithms

  • GATech transit-stub topology with 1550 nodes
  • 10 transit domains and 150 stub domains
  • Realistic Internet routing tables
  • 1000 queries with 5 random endpoints
  • Comparison of Relaxation placement to 4 other algorithms

[Figure: example query; label: 1KB/s]

  Algorithm   Description
  Optimal     Exhaustive search
  Producer    Common strategy
  Consumer    Central data warehouse
  Random      Worst case

SLIDE 16

Global Network Usage

  • Relaxation placement performs close to Optimal

[Figure: plot of percentage of queries against network usage (in KB), one curve per placement algorithm]

  Placement algorithm   Avg. network usage penalty
  Optimal (NU)          0%
  Relaxation            15%
  Producer              43%
  Consumer              60%
  Random                81%

SLIDE 17

Application Delay Penalty

  • Consumer has smallest delay penalty
  • Relaxation has low delay penalty for an overlay network

[Figure: plot of percentage of queries against delay penalty, one curve per placement algorithm; the Consumer curve is labelled]

Delay penalty: longest path delay relative to IP delay

  Placement algorithm   Avg. delay penalty
  Optimal (NU)          13%
  Relaxation            24%
  Producer              75%
  Consumer              0%
  Random                76%

SLIDE 18

Operator Migration on PlanetLab

  • Migration decreased network usage for 75% of queries
  • 17% less network usage and 11% lower application delay
  • 48 concurrent queries on 130 nodes
  • ½ of the queries could migrate
  • Same initial placement for migrating and non-migrating queries
  • Change in network usage of migrating queries after 5 hours

[Figure: relative improvement in network usage by query pair number; regions labelled "Improved Network Usage" and "Worse Network Usage"]

SLIDE 19

Operator Reuse

Share operators between overlapping sub-queries

  • Use cost space to bound search effort for reuse
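One way to read "bound search effort" is to consider only operators already running within some radius of the new operator's computed position in the cost space. The sketch below illustrates that reading. The radius rule, the string signatures and the name `find_reusable` are assumptions and are not taken from the slides.

```python
import math

def find_reusable(target, signature, running_ops, radius):
    """running_ops: list of (signature, (x, y)).

    Return the coordinate of the closest running operator with a matching
    signature inside the radius, or None if there is none.
    """
    matches = [
        (math.dist(coord, target), coord)
        for sig, coord in running_ops
        if sig == signature and math.dist(coord, target) <= radius
    ]
    return min(matches)[1] if matches else None

# Hypothetical operators already deployed by other queries.
RUNNING = [("join(a,b)", (5.0, 5.0)), ("join(a,b)", (40.0, 40.0)),
           ("filter(a)", (1.0, 1.0))]
print(find_reusable((0.0, 0.0), "join(a,b)", RUNNING, radius=10.0))  # (5.0, 5.0)
print(find_reusable((0.0, 0.0), "join(a,b)", RUNNING, radius=5.0))   # None
```

A small radius keeps the search cheap and avoids reusing an operator so far away that the extra latency outweighs the saving from sharing.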
SLIDE 20

Related Work

Borealis [MIT, Brown, Brandeis], Medusa [MIT], Gates [Ohio]

  • Focus on high-availability and load management
  • Wide-area operator placement specified by user

SAND [Brown], PIER [UCB]

  • Operator placement at edge (producer/consumer) or in-network
  • Exploit DHT routing paths for operator placement
  • Can lead to poor placement efficiency [IPTPS’05]

IrisNet [Intel]

  • Hierarchical placement following DNS structure
SLIDE 21

Summary

Large-scale stream applications need new systems support

  • SBON: Infrastructure for stream-processing applications
  • Provides network-aware stream query optimization

Cost space approach for query optimization

  • Metric space for decentralized optimization decisions
  • Express query optimization as geometric problem

Relaxation placement algorithm for operator placement

  • Scalable placement decisions reducing network usage
  • Continuous optimization as network conditions change

Thank You. Any Questions?