Computer Science
An Internet-wide Distributed System for Data-stream Processing
Gabriel Parmer, Richard West, Xin Qi, Gerald Fry, and Yuting Zhang
Boston University, Boston, MA
gabep1@cs.bu.edu
SLIDE 1
SLIDE 2
Introduction
Internet growth has stimulated development of data- rather than CPU-intensive applications
e.g., streaming media delivery, interactive distance learning, webcasting (e.g., SHOUTcast)
Peer-to-peer (P2P) systems now popular
Can efficiently locate data, but not used to deliver it
To date, limited work on scalable delivery & processing of data streams
Especially when these streams have QoS constraints!
Aim: Build an Internet-wide distributed system for delivery & processing of data streams considering QoS throughout
Implement logical network of end-systems
Support multiple channels connecting publishers to 1000s of subscribers with individual QoS constraints
SLIDE 3
A Data-stream Processing Network
[Figure: overlay network; labels: video sensors (publishers), intermediate nodes, static subscribers, wireless access point, mobile subscriber]
SLIDE 4
Properties of k-ary n-cubes
[Figure: physical view (hosts A to H and routers R1, R2 joined by weighted links) and logical view (hosts A to H joined by weighted overlay links, with node identifiers such as [000], [100], [111], [101], [010], [011])]
$M = k^n$ nodes in the graph
If $k = 2$, degree of each node is $n$
If $k > 2$, degree of each node is $2n$
Worst-case hop count between nodes: $n \lfloor k/2 \rfloor$
Average-case path length: $A(k,n) = n \lfloor k^2/4 \rfloor \frac{1}{k}$
Optimal dimensionality: $n = \ln M$
Minimizes $A(k,n)$ for a given number of nodes $M$
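The properties on this slide can be checked numerically. The sketch below reads the slide's formulas as M = k^n, worst-case hop count n * floor(k/2), and A(k,n) = n * floor(k^2/4) / k, and compares them with a brute-force enumeration of a small k-ary n-cube. All function names are illustrative and not taken from the presentation. `best_dimension` relies on a simplifying assumption: it drops the floor and lets k = M^(1/n) be non-integer, so it only indicates where the minimum lies.

```python
import itertools
import math


def ring_distance(a, b, k):
    """Shortest distance between positions a and b on a ring of k positions."""
    d = abs(a - b)
    return min(d, k - d)


def hop_count(u, v, k):
    """Hops between two nodes: sum of per-dimension ring distances."""
    return sum(ring_distance(a, b, k) for a, b in zip(u, v))


def degree(k, n):
    """Neighbors per node: n when k == 2 (both ring neighbors coincide), else 2n."""
    return n if k == 2 else 2 * n


def worst_case_hops(k, n):
    """Worst-case hop count, n * floor(k / 2)."""
    return n * (k // 2)


def average_path_length(k, n):
    """A(k, n) = n * floor(k^2 / 4) / k, averaged over all nodes including the source."""
    return n * (k * k // 4) / k


def brute_force_stats(k, n):
    """Enumerate every node and measure its distance from the origin.

    The topology is symmetric, so one source node is enough.
    Returns (maximum hops, mean hops).
    """
    origin = (0,) * n
    dists = [hop_count(origin, v, k)
             for v in itertools.product(range(k), repeat=n)]
    return max(dists), sum(dists) / len(dists)


def best_dimension(m, max_n=32):
    """Integer n minimizing the smooth approximation n * m**(1/n) / 4.

    Assumption: floor dropped and k = m**(1/n) allowed to be non-integer.
    """
    return min(range(1, max_n + 1), key=lambda n: n * m ** (1.0 / n) / 4)


if __name__ == "__main__":
    for k, n in [(2, 3), (3, 2), (4, 2)]:
        print(k, n, brute_force_stats(k, n),
              worst_case_hops(k, n), average_path_length(k, n))
    print(best_dimension(4096), math.log(4096))
```

Under that approximation the minimizing integer n for M = 4096 falls next to ln M, which is consistent with the slide's claim.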
SLIDE 5
QoS considerations in k-ary n-cubes
Methods for considering QoS
Routing algorithms
Ordered Dimensional Routing (ODR)
Random Ordering of Dimensions (Random)
Proximity-based Greedy Routing (Greedy)
Dynamic node re-assignment
Subscribers can exchange their logical identifier with nodes that are closer to the publisher of their data-stream
Fewer hops from publishers to subscribers on average
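The three routing algorithms can be sketched as follows. The slide gives only their names, so the details are assumptions: ODR is taken to correct dimensions in a fixed order, Random to correct them in a shuffled order, and Greedy to pick, at each hop, the lowest-cost neighbor among those that move one step closer to the destination. The `link_cost` callback and all function names are hypothetical.

```python
import random


def ring_step(a, b, k):
    """Direction (+1 or -1) of the shorter way around a k-ring from a to b."""
    forward = (b - a) % k
    return 1 if forward <= k - forward else -1


def move(node, dim, step, k):
    """Neighbor of node reached by one step along dimension dim."""
    nxt = list(node)
    nxt[dim] = (nxt[dim] + step) % k
    return tuple(nxt)


def route_by_order(src, dst, k, order):
    """Fully correct each dimension in the given order."""
    node, path = src, [src]
    for dim in order:
        while node[dim] != dst[dim]:
            node = move(node, dim, ring_step(node[dim], dst[dim], k), k)
            path.append(node)
    return path


def odr(src, dst, k):
    """Ordered Dimensional Routing: dimensions 0, 1, ..., n-1."""
    return route_by_order(src, dst, k, range(len(src)))


def random_order(src, dst, k, rng=random):
    """Random Ordering of Dimensions: one shuffled dimension order per route."""
    order = list(range(len(src)))
    rng.shuffle(order)
    return route_by_order(src, dst, k, order)


def greedy(src, dst, k, link_cost):
    """Greedy: cheapest next hop among neighbors that reduce the hop count.

    link_cost(u, v) is a hypothetical callback returning the measured
    cost (e.g., latency) of the overlay link from u to v.
    """
    node, path = src, [src]
    while node != dst:
        candidates = [
            move(node, d, ring_step(node[d], dst[d], k), k)
            for d in range(len(node)) if node[d] != dst[d]
        ]
        node = min(candidates, key=lambda nxt: link_cost(node, nxt))
        path.append(node)
    return path
```

All three variants take the same number of hops in this sketch. They differ only in which physical links the route crosses, and that choice is what affects the delay seen by subscribers.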
SLIDE 6
Optimizations via routing
[Figure: Cumulative % of Subscribers vs. Delay Penalty (relative to unicast); series: 2x16 ODR, 2x16 Random, 2x16 Greedy, 16x4 ODR, 16x4 Random, 16x4 Greedy]
Greedy routing up to 40% better
SLIDE 7
End-system Architecture
Modify COTS systems to support efficient and predictable methods for execution of data-stream processing agents (SPAs).
Must consider QoS throughout, not only on the network level
User-level sandboxing for efficient SPAs:
Provide efficient method for isolating and executing extensions
Provide efficient method for passing data between user-level and network interface (e.g., by using DMA)
[Figure: end-system architecture for publisher, intermediate, and subscriber nodes; labels: user level (app processes), sandbox region (SPAs, e.g., routing agents), kernel level (control / data channels), overlay management, resource monitoring]
SLIDE 8
User-level Sandbox Implementation
Modify address spaces of all processes to contain one or more shared pages of virtual addresses
Normally inaccessible at user-level
Kernel upcalls to execute sandbox extensions
This action also flips the protection bits so sandboxed extensions always execute at user-level, thus protecting the kernel
[Figure: processes P1, P2, ..., Pn at user level, each with a process-private address space and the sandbox region (shared virtual address space) holding mapped data and SPAs (SPA for P2, SPA for Pn); kernel events make the sandbox region user-level accessible]
Can avoid address-space context switching costs when executing extensions because they exist in all address spaces
SLIDE 9
SPA predictable execution support
User-level networking stack in sandbox
Interacts with the NIC via DMA
Can execute and process at interrupt-time because sandbox is resident in every address space
Elimination of extra copies allows for greater efficiency
Interrupt-time execution allows isolation and predictability
SLIDE 10