Yunhao Zhang, Rong Chen, Haibo Chen
Institute of Parallel and Distributed Systems (IPADS) Shanghai Jiao Tong University
Sub-millisecond Stateful Stream Querying over Fast-evolving Linked Data
Stream Query is Important
Multiple data sources are
[Figure: example linked-data graph. Labels shown: Rong, Yunhao, Haibo, Feed, Like, #SOSP#, member_of, IPADS, Cornell; query terms ?Feed and time.]
[Figure: timed updates to the example graph. Labels shown: Feed 12:30, Like 12:40, 12:31, Rong, Haibo, Yunhao, IPADS.]
OSDI’16 Stream Processing System Graph Store System
► Eliminate cross-system cost
► Global semantics for query optimization
► Better scalability by sharing data between the queries
► Hybrid Store: efficiently handle streaming data and fast-
► Stream Index: fast path to access streaming data in a
► Consistent Data View: through decentralized vector
[Figure: architecture. Continuous queries and one-shot queries are sent to a set of servers; each server has an Engine (serves queries) and a Store (holds a data partition).]
user-defined predicate
► Timeless Data: continuous persistent store
► Timed Data: time-based transient store
► Continuously absorb the timeless portion of streams
► Goal: support stateful continuous query and up-to-
► Timed data will only be accessed by relevant
► Goal: support fast garbage collection (GC) for the
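The split between timeless and timed data can be illustrated with a small sketch. It is not the Wukong+S implementation: the class HybridStore, its method names, the triple format, and the per-batch slice layout are all assumptions made for illustration. The idea shown is that timeless triples are absorbed into a persistent set, while timed triples are kept as whole time slices, so GC only has to drop expired slices from the front.

```python
from collections import deque


class HybridStore:
    """Toy hybrid store (hypothetical design, for illustration only)."""

    def __init__(self):
        # Timeless triples: absorbed continuously and never expired.
        self.persistent = set()
        # Timed triples: (batch_time, [triples]) slices in arrival order.
        self.transient = deque()

    def insert_batch(self, batch_time, timeless, timed):
        """Absorb one stream batch, split into timeless and timed parts."""
        self.persistent.update(timeless)
        self.transient.append((batch_time, list(timed)))

    def timed_in_window(self, start, end):
        """Return timed triples whose batch time lies in [start, end]."""
        return [t for ts, batch in self.transient
                if start <= ts <= end
                for t in batch]

    def gc(self, oldest_needed):
        """Drop whole slices older than oldest_needed; return slices dropped."""
        dropped = 0
        while self.transient and self.transient[0][0] < oldest_needed:
            self.transient.popleft()
            dropped += 1
        return dropped


# Usage with made-up triples loosely based on the example graph.
store = HybridStore()
store.insert_batch(1, [("Rong", "member_of", "IPADS")], [("Rong", "feed", "f1")])
store.insert_batch(2, [], [("Haibo", "like", "f1")])
store.gc(oldest_needed=2)  # the slice for batch 1 is no longer needed
```

Because expired data is removed one slice at a time, the cost of GC in this sketch depends on the number of expired batches rather than on the number of triples they contain.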
► Streaming data contains order information
► Early output from a stream source should always be
► No order relation across data sources
[Figure: vector timestamps over time (8:00, 8:01, 8:02). Labels shown: Source0 with batches 11 and 12, Source1 with batches 4 and 5, Server0 and Server1 each with a Local_VTS, and a Stable_VTS.]
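One way to read the Local_VTS and Stable_VTS labels is sketched below. The sketch assumes that each server's local vector timestamp records, per stream source, the latest batch it has inserted, and that the stable vector timestamp is the element-wise minimum over all servers. The function names and the assignment of batch numbers to servers are hypothetical; only the batch numbers (11, 12 for Source0 and 4, 5 for Source1) come from the slide.

```python
def stable_vts(local_vts_list):
    """Element-wise minimum over the servers' local vector timestamps.

    local_vts_list[i][s] is the latest batch of source s inserted on server i.
    """
    if not local_vts_list:
        raise ValueError("need at least one local VTS")
    width = len(local_vts_list[0])
    if any(len(v) != width for v in local_vts_list):
        raise ValueError("all local VTSs must cover the same sources")
    return [min(column) for column in zip(*local_vts_list)]


def is_visible(source, batch_no, stable):
    """A batch is safe to read once every server has inserted it."""
    return batch_no <= stable[source]


# Assumed state: Server0 has inserted up to batch 12 of Source0 and batch 4
# of Source1; Server1 has inserted up to batch 11 of Source0 and batch 5 of
# Source1.
server0 = [12, 4]
server1 = [11, 5]
stable = stable_vts([server0, server1])  # [11, 4]
```

Under these assumptions there is no ordering across sources, which is why each source gets its own entry in the vector instead of sharing one global counter.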
[Figure: Server0 example. Labels shown: Key; 11, 12; 4, 5; [4,10], [4,11], [4,12], [5,12].]
► Stream index & locality-aware partitioning
► Data-driven query trigger
► One-shot query execution
► Fault tolerance
► Leveraging RDMA
□ Wukong+S: sub-millisecond
□ 13.7X speedup vs. Storm+Wukong
□ 3 orders of magnitude speedup vs. Spark Streaming
[Figure: latency (msec) for queries L1 to L6.]
□ Cross-system cost for Storm+Wukong
□ Joining large stored data (3.75B) for Spark Streaming
[Figure: throughput (kilo query/second) vs. #machine (2 to 8), in two panels: Mixture of LSBench 1-6 and Mixture of LSBench 1-3.]
► Influence of different stream rates
► Data insertion latency
► Performance of one-shot queries
► Memory consumption
► Fault-tolerance overhead
http://ipads.se.sjtu.edu.cn/projects/wukong