An Internet-wide Distributed System for Data-stream Processing

SLIDE 1

Computer Science

An Internet-wide Distributed System for Data-stream Processing

Gabriel Parmer, Richard West, Xin Qi, Gerald Fry, and Yuting Zhang

Boston University Boston, MA gabep1@cs.bu.edu

SLIDE 2

Introduction

Internet growth has stimulated development of data- rather than CPU-intensive applications

e.g., streaming media delivery, interactive distance learning, webcasting (e.g., SHOUTcast)

Peer-to-peer (P2P) systems now popular

Can efficiently locate data, but are not used to deliver it

To date, limited work on scalable delivery & processing of data streams

Especially when these streams have QoS constraints!

Aim: Build an Internet-wide distributed system for delivery & processing of data streams considering QoS throughout

Implement a logical network of end-systems

Support multiple channels connecting publishers to 1000s of subscribers, each with individual QoS constraints

SLIDE 3

A Data-stream Processing Network

[Figure: a data-stream processing network with video sensors (publishers), intermediate nodes, static subscribers, and a mobile subscriber attached through a wireless access point, all connected by an overlay network]

SLIDE 4

Properties of k-ary n-cubes

[Figure: physical view (hosts A to H connected through routers R1 and R2, with link costs) and logical view (the same hosts arranged as a binary 3-cube with identifiers such as [000], [100], [111], with logical link costs)]

$M = k^n$ nodes in the graph

If $k = 2$, the degree of each node is $n$; if $k > 2$, the degree of each node is $2n$

Worst-case hop count between nodes:

$n \lfloor k/2 \rfloor$

Average case path length:

$A(k,n) = n \left\lfloor \frac{k^2}{4} \right\rfloor \frac{1}{k}$
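
The average-case formula follows from summing ring distances one dimension at a time. This derivation assumes wrap-around links in every dimension and averages over all $k^n$ destinations, the source itself included (the slides do not state which convention they use):

```latex
% One dimension is a ring of k nodes; from position 0 the distance to position j is min(j, k - j).
\sum_{j=0}^{k-1} \min(j,\, k-j) = \left\lfloor \frac{k^2}{4} \right\rfloor
\quad\Longrightarrow\quad
\text{average per dimension} = \frac{1}{k}\left\lfloor \frac{k^2}{4} \right\rfloor
% Hop counts add across the n independent dimensions:
A(k,n) = n \left\lfloor \frac{k^2}{4} \right\rfloor \frac{1}{k}
\qquad \Bigl(= \tfrac{nk}{4} \text{ for even } k\Bigr)
```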

Optimal dimensionality:

$n = \ln M$ minimizes $A(k,n)$ for a given number of nodes $M$
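
To sanity-check these properties, here is a small sketch that assumes wrap-around links in every dimension. The names `cube_properties`, `brute_force_avg_path`, and `factorizations` are hypothetical. The sketch compares the closed-form $A(k,n)$ against brute-force enumeration and lists $A(k,n)$ for every integer factorization $M = k^n$. The reasoning behind $n = \ln M$ goes as follows. For even $k$, $A(k,n) = nk/4 = \frac{\ln M}{4}\cdot\frac{k}{\ln k}$, and this expression is smallest at $k = e$, which gives $n = \ln M$. Integer $k$ can only approximate this optimum.

```python
import math
from typing import Dict, Iterator, Tuple


def cube_properties(k: int, n: int) -> Dict[str, float]:
    """Closed-form properties of a k-ary n-cube, assuming wrap-around links in every dimension."""
    return {
        "M": k ** n,                           # number of nodes, M = k^n
        "degree": n if k == 2 else 2 * n,      # k = 2: a single neighbour per dimension
        "worst_case_hops": n * (k // 2),       # n * floor(k/2)
        "avg_path": n * (k * k // 4) / k,      # A(k,n) = n * floor(k^2/4) * (1/k)
    }


def brute_force_avg_path(k: int, n: int) -> float:
    """Average hop count from node 0 to every node (itself included), by enumeration."""
    total = 0
    for idx in range(k ** n):
        rest = idx
        for _ in range(n):
            digit = rest % k                   # coordinate in this dimension
            rest //= k
            total += min(digit, k - digit)     # shorter way round the ring
    return total / k ** n


def factorizations(m: int) -> Iterator[Tuple[int, int]]:
    """Yield every (k, n) with k >= 2 and k**n == m."""
    for n in range(1, m.bit_length() + 1):
        k = round(m ** (1.0 / n))
        if k >= 2 and k ** n == m:
            yield k, n


if __name__ == "__main__":
    M = 4096
    for k, n in factorizations(M):
        print(f"k={k:5d} n={n:2d} A(k,n)={cube_properties(k, n)['avg_path']:.2f}")
    print(f"ln M = {math.log(M):.2f}")
```

For $M = 4096$, the smallest $A(k,n)$ among the exact factorizations occurs at $k = 2, n = 12$ and $k = 4, n = 6$. These two choices bracket $\ln 4096 \approx 8.3$.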

SLIDE 5

QoS considerations in k-ary n-cubes

Methods for considering QoS:

Routing algorithms

  • Ordered Dimensional Routing (ODR)
  • Random Ordering of Dimensions (Random)
  • Proximity-based Greedy Routing (Greedy)
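
Below is a minimal sketch of the three routing strategies. It assumes that logical IDs are n-digit base-k tuples with wrap-around links. It also assumes the greedy variant works as follows: among the neighbours that are one logical hop closer to the destination, it picks the one with the lowest physical link cost. The slides do not define the exact greedy rule, and the function names and the `cost` callback are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Node = Tuple[int, ...]  # one digit in range(k) per dimension


def direction(a: int, b: int, k: int) -> int:
    """+1 or -1 for the shorter way round a ring of size k from a to b (0 if a == b)."""
    if a == b:
        return 0
    forward = (b - a) % k
    return 1 if forward <= k - forward else -1


def hop(node: Node, dim: int, k: int, d: int) -> Node:
    """Neighbour of `node` one step along dimension `dim` in direction d."""
    digits = list(node)
    digits[dim] = (digits[dim] + d) % k
    return tuple(digits)


def route_in_order(src: Node, dst: Node, k: int, order: Sequence[int]) -> List[Node]:
    """Correct one dimension completely before moving on to the next."""
    path, cur = [src], src
    for dim in order:
        while cur[dim] != dst[dim]:
            cur = hop(cur, dim, k, direction(cur[dim], dst[dim], k))
            path.append(cur)
    return path


def odr(src: Node, dst: Node, k: int) -> List[Node]:
    """Ordered Dimensional Routing: dimensions fixed in the order 0, 1, ..., n-1."""
    return route_in_order(src, dst, k, range(len(src)))


def random_order(src: Node, dst: Node, k: int, rng: random.Random) -> List[Node]:
    """Random Ordering of Dimensions: a fresh random dimension order per route."""
    order = list(range(len(src)))
    rng.shuffle(order)
    return route_in_order(src, dst, k, order)


def greedy(src: Node, dst: Node, k: int, cost: Callable[[Node, Node], float]) -> List[Node]:
    """At each hop, take the physically cheapest neighbour that is one logical hop closer."""
    path, cur = [src], src
    while cur != dst:
        closer = [hop(cur, d, k, direction(cur[d], dst[d], k))
                  for d in range(len(cur)) if cur[d] != dst[d]]
        cur = min(closer, key=lambda nxt: cost(cur, nxt))  # key uses cur before reassignment
        path.append(cur)
    return path


if __name__ == "__main__":
    toy_cost = lambda u, v: float(sum(v))  # hypothetical physical link cost, illustration only
    src, dst = (0, 0, 0), (3, 1, 2)
    print(odr(src, dst, 4))
    print(random_order(src, dst, 4, random.Random(1)))
    print(greedy(src, dst, 4, toy_cost))
```

All three strategies return shortest logical paths with the same hop count. They differ only in which physical links those hops traverse, and that choice is where proximity-based greedy routing can gain.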

Dynamic node re-assignment

  • Subscribers can exchange their logical identifiers with nodes that are closer to the publisher of their data-stream
  • Fewer hops from publishers to subscribers on average
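
Dynamic re-assignment can be expressed as a pairwise swap rule. This sketch rests on assumptions the slides do not state. Two hosts swap logical IDs only when the swap lowers their combined logical hop count to the publishers of their streams, and publishers never move. `try_swap`, `ids`, and `publisher_of` are hypothetical names.

```python
from typing import Dict, Hashable, Optional, Tuple

Node = Tuple[int, ...]


def logical_hops(a: Node, b: Node, k: int) -> int:
    """Shortest hop count between two IDs in a k-ary n-cube with wrap-around links."""
    return sum(min((x - y) % k, (y - x) % k) for x, y in zip(a, b))


def hops_to_publisher(host: Hashable, ids: Dict[Hashable, Node],
                      publisher_of: Dict[Hashable, Hashable], k: int) -> int:
    """Hops from a host to the publisher of its stream (0 for hosts that subscribe to nothing)."""
    pub: Optional[Hashable] = publisher_of.get(host)
    return 0 if pub is None else logical_hops(ids[host], ids[pub], k)


def try_swap(a: Hashable, b: Hashable, ids: Dict[Hashable, Node],
             publisher_of: Dict[Hashable, Hashable], k: int) -> bool:
    """Swap the logical IDs of hosts a and b if that lowers their combined hops to their publishers."""
    if a in publisher_of.values() or b in publisher_of.values():
        return False  # assumption: publishers keep their IDs so other subscribers are unaffected
    before = hops_to_publisher(a, ids, publisher_of, k) + hops_to_publisher(b, ids, publisher_of, k)
    ids[a], ids[b] = ids[b], ids[a]
    after = hops_to_publisher(a, ids, publisher_of, k) + hops_to_publisher(b, ids, publisher_of, k)
    if after < before:
        return True
    ids[a], ids[b] = ids[b], ids[a]  # no gain: undo the swap
    return False
```

Consider a 4-ary 2-cube with a publisher at `(0, 0)`, a subscriber at `(2, 2)` (4 hops away), and an idle relay at `(0, 1)`. Here `try_swap` moves the subscriber to `(0, 1)`, which is 1 hop away.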

SLIDE 6

Optimizations via routing

[Figure: cumulative % of subscribers (10 to 100) vs. delay penalty relative to unicast (log scale, 1 to 512) for ODR, Random, and Greedy routing in 2x16 and 16x4 k-ary n-cubes]

Greedy routing is up to 40% better (lower delay penalty) than ODR and Random
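
The plotted curves can be computed as follows, assuming that a subscriber's delay penalty is its overlay path delay divided by the direct unicast delay from the publisher. The function names are hypothetical. The thresholds mirror the plot's x-axis values from 1 to 512, and the sample delays are illustrative, not measurements from the slides.

```python
from typing import Dict, List, Sequence


def delay_penalties(overlay_delay: Sequence[float], unicast_delay: Sequence[float]) -> List[float]:
    """Per-subscriber penalty: overlay path delay divided by direct unicast delay."""
    return [o / u for o, u in zip(overlay_delay, unicast_delay)]


def cumulative_percent(penalties: Sequence[float], thresholds: Sequence[float]) -> Dict[float, float]:
    """Percentage of subscribers whose penalty is at most each threshold (one point per x-axis tick)."""
    total = len(penalties)
    return {t: 100.0 * sum(p <= t for p in penalties) / total for t in thresholds}


if __name__ == "__main__":
    ticks = [2 ** i for i in range(10)]     # 1, 2, 4, ..., 512
    overlay = [12.0, 30.0, 45.0, 200.0]     # illustrative delays in ms
    unicast = [12.0, 20.0, 15.0, 20.0]
    print(cumulative_percent(delay_penalties(overlay, unicast), ticks))
```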

SLIDE 7

End-system Architecture

Modify COTS systems to support efficient and predictable execution of data-stream processing agents (SPAs)

Must consider QoS throughout, not only at the network level

User-level sandboxing for efficient SPAs:

  • Provide an efficient method for isolating and executing extensions
  • Provide an efficient method for passing data between user level and the network interface (e.g., by using DMA)

[Figure: end-system architecture on publisher, intermediate, and subscriber hosts, with app processes at user level, SPAs (e.g., routing agents) in a sandbox region, and kernel-level control/data channels (overlay management, resource monitoring)]

SLIDE 8

User-level Sandbox Implementation

Modify the address spaces of all processes to contain one or more shared pages of virtual addresses (the sandbox region)

  • Normally inaccessible at user level
  • Kernel upcalls execute sandbox extensions; each upcall also flips the protection bits so that sandboxed extensions always execute at user level, thus protecting the kernel

[Figure: processes P1 through Pn, each with a process-private address space plus the shared sandbox region holding mapped data and per-process SPAs (e.g., SPA for P2, SPA for Pn); kernel events make the sandbox region user-level accessible]

Avoids address-space context-switching costs when executing extensions, because the extensions are mapped into every address space
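
The following is a toy model, written in Python rather than kernel code, of the bookkeeping described above. Every address space holds a reference to the same sandbox region. An upcall opens the region, runs the owning process's SPA, and closes the region again, all without switching address spaces. The class and method names are hypothetical. The model does not reproduce hardware page protection or privilege levels.

```python
from typing import Any, Callable, Dict, Optional


class SandboxRegion:
    """Model of the shared pages mapped at the same virtual address in every process."""

    def __init__(self) -> None:
        self.user_accessible = False                     # normally inaccessible at user level
        self.spas: Dict[int, Callable[[Any], Any]] = {}  # SPA per owning process id


class AddressSpace:
    def __init__(self, pid: int, sandbox: SandboxRegion) -> None:
        self.pid = pid
        self.private: Dict[str, Any] = {}  # process-private part
        self.sandbox = sandbox             # the same object is shared by every address space


class Kernel:
    def __init__(self) -> None:
        self.sandbox = SandboxRegion()
        self.spaces: Dict[int, AddressSpace] = {}
        self.current: Optional[AddressSpace] = None
        self.address_space_switches = 0

    def spawn(self, pid: int) -> AddressSpace:
        self.spaces[pid] = AddressSpace(pid, self.sandbox)
        return self.spaces[pid]

    def switch_to(self, pid: int) -> None:
        if self.current is not self.spaces[pid]:
            self.address_space_switches += 1  # would reload page tables on real hardware
            self.current = self.spaces[pid]

    def upcall(self, owner_pid: int, event: Any) -> Any:
        """Run owner_pid's SPA in whatever address space is current: no switch is needed."""
        spa = self.sandbox.spas[owner_pid]
        self.sandbox.user_accessible = True       # flip protection bits: open the sandbox
        try:
            return spa(event)
        finally:
            self.sandbox.user_accessible = False  # close it again on return
```

In this model, invoking P2's SPA while P1 is current leaves `address_space_switches` unchanged. That unchanged counter corresponds to the saving the slide describes.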

SLIDE 9

SPA predictable execution support

User-level networking stack in sandbox

  • Interacts with the NIC via DMA
  • Can execute and process data at interrupt time, because the sandbox is resident in every address space

Eliminating extra copies improves efficiency

Interrupt-time execution provides isolation and predictability
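
The copy-avoidance idea can be illustrated with a toy model. It assumes a pre-allocated receive ring that stands in for the DMA region shared between the NIC and the sandboxed stack. `DmaRing`, `nic_receive`, and `handle_interrupt` are hypothetical names. The only copy is the simulated DMA write. The SPA then receives a `memoryview` slice of the ring rather than a fresh `bytes` object, so it can read or rewrite the packet in place.

```python
from typing import Callable, List


class DmaRing:
    """Toy receive ring standing in for a DMA region shared by the NIC and the sandboxed stack."""

    def __init__(self, slots: int, slot_size: int) -> None:
        self.buf = bytearray(slots * slot_size)  # allocated once, like a DMA buffer
        self.view = memoryview(self.buf)
        self.slots, self.slot_size = slots, slot_size
        self.lengths: List[int] = [0] * slots
        self.head = 0   # next slot the "NIC" fills
        self.tail = 0   # next slot the handler consumes
        self.count = 0  # filled, unconsumed slots

    def nic_receive(self, packet: bytes) -> bool:
        """Simulates the NIC writing one packet into the next free slot; drops if there is no room."""
        if self.count == self.slots or len(packet) > self.slot_size:
            return False
        start = self.head * self.slot_size
        self.view[start:start + len(packet)] = packet
        self.lengths[self.head] = len(packet)
        self.head = (self.head + 1) % self.slots
        self.count += 1
        return True

    def handle_interrupt(self, spa: Callable[[memoryview], None]) -> int:
        """Runs the SPA on each pending packet in place; returns how many were handled."""
        handled = 0
        while self.count:
            start = self.tail * self.slot_size
            spa(self.view[start:start + self.lengths[self.tail]])  # a view, not a copy
            self.tail = (self.tail + 1) % self.slots
            self.count -= 1
            handled += 1
        return handled
```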

SLIDE 10

Conclusions

Use ideas from overlay routing and user-level sandboxing to implement an Internet-wide distributed system

Provide efficient support for app-specific services and scalable data delivery

QoS is important throughout the entire system and should be considered at the network level as well as the end-host level