HDES: A Dynamic Stream Processing Engine

Nico Duldhardt, Torben Meyer, Marvin Thiele, Anton von Weltzien
14.04.2020
Master's project (Masterprojekt), winter semester 2019/20, Data Engineering Systems

Agenda
1. Goals
2. Features
3. Architecture Overview
4. Query Transformation
5. Query Execution
6. Ad-hoc join processing
7. Benchmark Results

Goals
1. Build a standalone prototype of a stream processing engine with first-class support for dynamic query deployment and removal
2. Support the processing of simple queries and streams
3. Support online optimizations for efficient multi-query processing

Features
- Stream processing framework written in Java 11
- Ad-hoc addition and removal of arbitrary queries
- Single-node, multi-threaded execution
- Optimizations for joins and aggregations in multi-query execution
- Queries are defined in a Flink-like dataflow language
- Support for sliding and tumbling windows, with both event time and processing time
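Sliding and tumbling windows differ only in how a timestamp is mapped to window boundaries. The following Java sketch is a hypothetical helper, not part of HDES; it assumes millisecond timestamps and half-open windows [start, start + size) aligned to the epoch. With event time the timestamp is taken from the event itself; with processing time it would be the engine's clock, e.g. System.currentTimeMillis().

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not the HDES API): maps a timestamp in milliseconds to the
 * tumbling or sliding windows that contain it. A window is identified by its start
 * and covers the half-open interval [start, start + size).
 */
final class WindowAssigner {

    private WindowAssigner() {}

    /** Start of the single tumbling window of length {@code size} that contains {@code ts}. */
    static long tumblingStart(long ts, long size) {
        // floorMod keeps the result correct for negative timestamps as well.
        return ts - Math.floorMod(ts, size);
    }

    /**
     * Starts of all sliding windows of length {@code size}, advancing by {@code slide},
     * that contain {@code ts}, in ascending order.
     */
    static List<Long> slidingStarts(long ts, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long lastStart = ts - Math.floorMod(ts, slide); // latest window start <= ts
        for (long start = lastStart; start > ts - size; start -= slide) {
            starts.add(0, start); // prepend so the list stays ascending
        }
        return starts;
    }
}
```

For example, an event at 7,000 ms falls into the 5-second tumbling window starting at 5,000 ms, and into the two 10-second sliding windows (slide 5 seconds) starting at 0 and 5,000 ms. A tumbling window is simply the special case where slide equals size.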

Dataflow API Overview

Code Example

```java
import java.time.temporal.ChronoUnit;
// HDES imports (JobManager, NetworkSource, TopologyBuilder, AStream, ...) omitted.

// Create the JobManager and start the engine
JobManager jobManager = new JobManager();
jobManager.runEngine();

// Define the sources (remaining constructor arguments elided on the slide)
NetworkSource nws1 = new NetworkSource(7001, ...);
NetworkSource nws2 = new NetworkSource(7002, ...);

// Create a new query with the TopologyBuilder
TopologyBuilder builder = TopologyBuilder.newQuery();
AStream<Tuple3<String, Float, Long>> s1 = builder.streamOf(nws1);
AStream<Tuple3<String, Float, Long>> s2 = builder.streamOf(nws2);

// Define the query: join s1 with s2 in 5-second event-time tumbling windows
s1.window(TumblingWindow.ofEventTime(Time.seconds(5)))
  .join(s2,
        (t1, t2) -> new Tuple4<>(t1.v1, t1.v2, t2.v2, t1.v3), // combine two matching tuples
        Tuple3::v1,                                           // key selectors for s1 and s2
        Tuple3::v1,
        WatermarkGenerator.seconds(1, 1_000),
        t3 -> t3.v4)                                          // presumably the result's event-time timestamp
  .to(new FileSink("join"));

// Build and submit the query
Query joinQuery = builder.buildAsQuery();
jobManager.addQuery(joinQuery, 50, ChronoUnit.SECONDS);
```
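The final step hands the query to an already running engine, which is the core of dynamic query deployment. The slides do not explain the duration passed to addQuery; the sketch below assumes it is a lifetime after which the query is removed again. QueryRegistry and its methods are hypothetical names, not the HDES JobManager API; a concurrent map lets any thread add or remove queries while the engine keeps running.

```java
import java.time.Duration;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical query registry (not the HDES JobManager): queries can be added and
 * removed by any thread while the engine is running.
 */
final class QueryRegistry<Q> implements AutoCloseable {
    private final Map<Long, Q> active = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();
    // Daemon timer thread for automatic removal, so it never keeps the JVM alive.
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "query-expiry");
        t.setDaemon(true);
        return t;
    });

    /** Registers a query that stays active until it is removed; returns its id. */
    long add(Q query) {
        long id = nextId.incrementAndGet();
        active.put(id, query);
        return id;
    }

    /** Registers a query and schedules its removal after {@code lifetime} (assumed semantics). */
    long add(Q query, Duration lifetime) {
        long id = add(query);
        timer.schedule(() -> remove(id), lifetime.toMillis(), TimeUnit.MILLISECONDS);
        return id;
    }

    /** Removes a query; returns false if it was not (or is no longer) registered. */
    boolean remove(long id) {
        return active.remove(id) != null;
    }

    /** Snapshot of the ids of all currently active queries. */
    Set<Long> activeIds() {
        return Set.copyOf(active.keySet());
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```

Removal is idempotent: a second remove of the same id, or an expiry firing after a manual remove, simply returns false.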

3. Architecture Overview

4. Query Transformation

Transformation Pipeline

Operators
- Source: reads from a source and attaches metadata
- OneInputOperator: transforms a single event into n new events
- TwoInputOperator: transforms events from two different origins
- Sink: writes to a sink
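A minimal Java sketch of the four operator kinds, assuming operators emit events through a push-based collector. All names (Collector, Event, IteratorSource, ListSink) are hypothetical and not taken from HDES; the only metadata the source attaches here is an ingestion timestamp.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hand-off through which an operator emits events downstream (hypothetical name). */
interface Collector<T> {
    void collect(T event);
}

/** An event plus metadata attached by the source; here only an ingestion timestamp. */
final class Event<T> {
    final T value;
    final long timestamp;

    Event(T value, long timestamp) {
        this.value = value;
        this.timestamp = timestamp;
    }
}

/** Source: reads records from some input and attaches metadata. */
final class IteratorSource<T> {
    private final Iterator<T> input;

    IteratorSource(Iterator<T> input) {
        this.input = input;
    }

    /** Emits the next record, if any; returns false once the input is exhausted. */
    boolean emitNext(Collector<Event<T>> out) {
        if (!input.hasNext()) return false;
        out.collect(new Event<>(input.next(), System.currentTimeMillis()));
        return true;
    }
}

/** Transforms a single event into n (zero or more) new events. */
interface OneInputOperator<IN, OUT> {
    void process(IN event, Collector<OUT> out);
}

/** Transforms events arriving from two different origins. */
interface TwoInputOperator<L, R, OUT> {
    void processLeft(L event, Collector<OUT> out);
    void processRight(R event, Collector<OUT> out);
}

/** Sink: writes events to an external target; here simply an in-memory list. */
final class ListSink<T> implements Collector<T> {
    final List<T> written = new ArrayList<>();

    @Override
    public void collect(T event) {
        written.add(event);
    }
}
```

Because OneInputOperator has a single method, it can be written as a lambda, e.g. one that splits each line into words and thereby turns one event into n events.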

Source Operator

Logical Plan (figure: Source Nodes, OneInput Nodes, a BinaryInput Node, and Sink Nodes)

Execution Plan (figure: Source, OneInput and TwoInput operators connected through Slots, PushSlots and a PullSlot)

Transformation Properties
The layered architecture decouples query definition from query execution:
- Interchangeable query definition
- Interchangeable execution plan
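One way to keep query definition and execution interchangeable is to let the execution layer see the logical plan only as a DAG of nodes. The sketch below uses a hypothetical LogicalPlan class (not from HDES) and orders the nodes with Kahn's algorithm, so that a translator can create each execution operator after all of its inputs already exist; a different translator could build a different execution plan from the same order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical logical plan: nodes are identified by name, and edges point from an
 * upstream node to its downstream consumers.
 */
final class LogicalPlan {
    private final Map<String, List<String>> downstream = new LinkedHashMap<>();
    private final Map<String, Integer> inDegree = new LinkedHashMap<>();

    LogicalPlan node(String name) {
        downstream.putIfAbsent(name, new ArrayList<>());
        inDegree.putIfAbsent(name, 0);
        return this;
    }

    LogicalPlan edge(String from, String to) {
        node(from).node(to);
        downstream.get(from).add(to);
        inDegree.merge(to, 1, Integer::sum);
        return this;
    }

    /**
     * Kahn's algorithm: returns all nodes so that every node comes after its inputs.
     * Throws if the plan contains a cycle, which a valid dataflow must not have.
     */
    List<String> topologicalOrder() {
        Map<String, Integer> remaining = new LinkedHashMap<>(inDegree);
        Deque<String> ready = new ArrayDeque<>();
        remaining.forEach((n, d) -> {
            if (d == 0) ready.add(n); // sources have no inputs
        });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String n = ready.poll();
            order.add(n);
            for (String next : downstream.get(n)) {
                // A node becomes ready once all of its inputs have been placed.
                if (remaining.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (order.size() != remaining.size()) throw new IllegalStateException("plan contains a cycle");
        return order;
    }
}
```

For a plan shaped like two Source → OneInput chains feeding a BinaryInput node and then a Sink (node names illustrative), the order is source1, source2, oneInput1, oneInput2, binaryInput, sink.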

5. Query Execution

Routing

Slot (figure: operators passing events to one another via a Collector)

Slot Types (figure: Pull Slot and Push Slot; in the figure, a thread reads events from the slot buffer)
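A sketch of one possible reading of the two slot types, based only on the figure labels: a push slot hands each event to the downstream operator in the producing thread, while a pull slot buffers events until another thread reads them. The class names follow the slides, but the Slot interface and the bounded LinkedBlockingQueue are assumptions.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Connection between two operators through which events are routed (assumed interface). */
interface Slot<T> {
    void offer(T event) throws InterruptedException;
}

/** Push slot: the producing thread invokes the downstream operator directly. */
final class PushSlot<T> implements Slot<T> {
    private final Consumer<T> downstream;

    PushSlot(Consumer<T> downstream) {
        this.downstream = downstream;
    }

    @Override
    public void offer(T event) {
        downstream.accept(event); // no buffering, no thread hand-off
    }
}

/**
 * Pull slot: events are buffered in a bounded queue and read by a separate thread.
 * A full buffer blocks the producer, which acts as simple backpressure.
 */
final class PullSlot<T> implements Slot<T> {
    private final BlockingQueue<T> buffer;

    PullSlot(int capacity) {
        this.buffer = new LinkedBlockingQueue<>(capacity);
    }

    @Override
    public void offer(T event) throws InterruptedException {
        buffer.put(event);
    }

    /** Waits up to {@code timeoutMillis} for the next event; returns null on timeout. */
    T poll(long timeoutMillis) throws InterruptedException {
        return buffer.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /** Moves all currently buffered events into {@code target} without blocking. */
    int drainTo(List<T> target) {
        return buffer.drainTo(target);
    }
}
```

The trade-off in this reading: a push slot avoids a thread hand-off, while a pull slot lets producer and consumer run on separate threads at the cost of buffering.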

6. Ad-hoc join processing

Efficient Distributed Join Architecture (figure: Source Operator → Join operators 1 to N → Sink Operator)
- Source operator: windowing and indexing
- Join operators: set intersection of the join index
- Sink operator: joins matching tuples and pushes them to the output channels

AJoin in HDES (figure: two upstream Source operators feed the AJoin Join operator, which feeds a downstream Sink operator)

HDES AJoin Example (figure: an Orders source <OrderID, ItemID, …> and a Shipments source <ShipmentID, OrderID, …> feed the AJoin operator, whose sink emits Shipped Orders <OrderID, ShipmentID, ItemID, …>)

HDES AJoin Example: input
Orders <OrderID, ItemID, …>: <1, 5, ...>, <1, 8, ...>, <4, 7, ...>, <5, 2, ...>, <5, 7, ...>
Shipments <ShipmentID, OrderID, …>: <5, 7, ...>, <6, 8, ...>, <9, 1, ...>, <6, 4, ...>, <3, 5, ...>
Output: Shipped Orders <OrderID, ShipmentID, ItemID, …>

HDES AJoin Example: indexing
Orders bucket (keyed by OrderID):
  1 ← [<1, 5, ...>, <1, 8, ...>]
  4 ← [<4, 7, ...>]
  5 ← [<5, 2, ...>, <5, 7, ...>]
Shipment bucket (keyed by OrderID):
  7 ← [<5, 7, ...>]
  8 ← [<6, 8, ...>]
  1 ← [<9, 1, ...>]
  4 ← [<6, 4, ...>]
  5 ← [<3, 5, ...>]

HDES AJoin Example: matching buckets
Orders bucket: 1 ← [<1, 5, ...>, <1, 8, ...>] | 4 ← [<4, 7, ...>] | 5 ← [<5, 2, ...>, <5, 7, ...>]
Shipment bucket: 7 ← [<5, 7, ...>] | 8 ← [<6, 8, ...>] | 1 ← [<9, 1, ...>] | 4 ← [<6, 4, ...>] | 5 ← [<3, 5, ...>]
Keys present in both buckets:
  1 ← [<1, 5, ...>, <1, 8, ...>] and [<9, 1, ...>]
  5 ← [<5, 2, ...>, <5, 7, ...>] and [<3, 5, ...>]
  4 ← [<4, 7, ...>] and [<6, 4, ...>]
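The three steps of the example (indexing each window into buckets by OrderID, intersecting the key sets, and joining only the matching buckets) can be sketched in Java for a single window. This is a hypothetical single-threaded sketch; in the join architecture described earlier these steps are spread across the source, join and sink operators, and the names BucketJoin, index, matchingKeys and join are illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiFunction;
import java.util.function.Function;

/**
 * Single-window sketch of a bucket-based join: 1) index each input by join key,
 * 2) intersect the key sets, 3) combine tuples only from buckets whose keys matched.
 */
final class BucketJoin {
    private BucketJoin() {}

    /** Step 1 (indexing): group the tuples of one window into buckets by key. */
    static <T, K> Map<K, List<T>> index(List<T> window, Function<T, K> key) {
        Map<K, List<T>> buckets = new LinkedHashMap<>();
        for (T t : window) {
            buckets.computeIfAbsent(key.apply(t), k -> new ArrayList<>()).add(t);
        }
        return buckets;
    }

    /** Step 2 (set intersection): keys present in both indexes, in left-index order. */
    static <K> Set<K> matchingKeys(Map<K, ?> left, Map<K, ?> right) {
        Set<K> keys = new LinkedHashSet<>(left.keySet());
        keys.retainAll(right.keySet());
        return keys;
    }

    /** Step 3 (materialization): combine every pair of tuples from matching buckets. */
    static <L, R, K, O> List<O> join(Map<K, List<L>> left, Map<K, List<R>> right,
                                     BiFunction<L, R, O> combine) {
        List<O> out = new ArrayList<>();
        for (K k : matchingKeys(left, right)) {
            for (L l : left.get(k)) {
                for (R r : right.get(k)) {
                    out.add(combine.apply(l, r));
                }
            }
        }
        return out;
    }
}
```

Applied to the example tuples (Orders indexed by field 0, Shipments by field 1), the matching keys are 1, 4 and 5, and the result is five Shipped Orders tuples <OrderID, ShipmentID, ItemID>: <1,9,5>, <1,9,8>, <4,6,7>, <5,3,2> and <5,3,7>. Buckets without a partner (Shipments keys 7 and 8) are never touched in step 3.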
