Tutorial: Stream Processing Languages
Martin Hirzel, IBM Research AI 1 November 2017 Dagstuhl Seminar on Big Stream Processing Systems
DSLs and DSELs (aka EDSLs)
Modular Domain Specific Languages and Tools [ICSR’98]
Performance: fast streaming on a cluster
Productivity: hide the complexity of the distributed system
Generality: support diverse application domains
Different prioritization of these goals drives language diversity
The CQL continuous query language: semantic foundations and query execution [VLDBJ’06]
SQL, plus IStream and Window
River: An Intermediate Language for Stream Processing [SP&E'16]
Relational algebra, plus operators that convert streams from/to relations
S2R operators (windows): now, unbounded, sliding, tumbling, time, partitioned, …
R2R operators (classic relational algebra): select, project, union, group-by, aggregate, join, …
R2S operators: IStream (inserts), DStream (deletes), RStream (relations)
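The three operator families can be sketched in a few lines of Python. This is only an illustration, not CQL itself: relations are modeled as bags (`Counter`), the window is assumed to be time-based and to cover timestamps in (t - size, t], and the names `window`, `select`, `istream`, `dstream`, and `query` are hypothetical.

```python
from collections import Counter

def window(stream, t, size):
    """S2R: time-based sliding window, as the bag of values with timestamp in (t - size, t]."""
    return Counter(value for ts, value in stream if t - size < ts <= t)

def select(relation, predicate):
    """R2R: classic relational selection over a bag."""
    return Counter({v: n for v, n in relation.items() if predicate(v)})

def istream(curr, prev):
    """R2S: tuples inserted between the previous and the current instant."""
    return curr - prev  # bag difference

def dstream(curr, prev):
    """R2S: tuples deleted between the previous and the current instant."""
    return prev - curr

# A stream is a list of (timestamp, value) pairs (made-up sample data).
stream = [(1, 5), (2, 12), (3, 7), (4, 20)]

def query(t):
    """Relation at time t: values above 6 within the last 2 time units."""
    return select(window(stream, t, size=2), lambda v: v > 6)

inserted = {t: sorted(istream(query(t), query(t - 1)).elements()) for t in range(1, 6)}
deleted = {t: sorted(dstream(query(t), query(t - 1)).elements()) for t in range(1, 6)}
```

Here `inserted` and `deleted` map each instant to the output of IStream and DStream; RStream would simply emit `query(t)` at every instant.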
[Figure: window with a front and a back. Insert newest, evict oldest, trigger aggregation.]
Window size and granularity can each be given as time or count or delta
One configuration is known as tumbling
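The insert, evict, and trigger steps can be sketched for the count-based case. This is a minimal sketch under assumptions: the class name `SlidingWindow` and its parameters are hypothetical, size and granularity are both counts, and setting granularity equal to size is used here as the tumbling case (consecutive triggered windows do not overlap).

```python
from collections import deque

class SlidingWindow:
    """Count-based window: keeps the newest `size` items, fires every `granularity` inserts."""

    def __init__(self, size, granularity, aggregate):
        self.size = size
        self.granularity = granularity
        self.aggregate = aggregate
        self.items = deque()
        self.since_trigger = 0

    def insert(self, item):
        self.items.append(item)            # insert newest at the back
        if len(self.items) > self.size:
            self.items.popleft()           # evict oldest from the front
        self.since_trigger += 1
        if self.since_trigger == self.granularity:
            self.since_trigger = 0
            return self.aggregate(list(self.items))  # trigger aggregation
        return None

def run(window, inputs):
    """Feed all inputs and collect the aggregates that were triggered."""
    results = (window.insert(x) for x in inputs)
    return [r for r in results if r is not None]

sliding = run(SlidingWindow(size=3, granularity=1, aggregate=sum), [1, 2, 3, 4, 5])
tumbling = run(SlidingWindow(size=3, granularity=3, aggregate=sum), [1, 2, 3, 4, 5, 6])
```

A time-based or delta-based variant would change only the eviction and trigger conditions, not the overall structure.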
The CQL continuous query language: semantic foundations and query execution [VLDBJ’06]
StreamIt: A Compiler for Streaming Applications [MIT TR'05]

    float->float pipeline ABC {
      add float->float filter A() {
        work pop … push 2 { … }
      }
      add float->float filter B() {
        work pop 3 push 1 { … }
      }
      add float->float filter C() {
        work pop 2 push … { … }
      }
    }

[Figure: pipeline A → B → C. On edge A→B, A pushes 2 and B pops 3; on edge B→C, B pushes 1 and C pops 2.]
B pops 3 per firing
B pushes 1 per firing
Statically known push/pop rates
Dynamic Expressivity with Static Optimization for Streaming Languages [DEBS'13]
[Figure: pipeline A → B → C with rates 2, 3, 1, 2, and repeated copies of A, B, C]
Statically known firing schedule and FIFO queue sizes
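With static rates, a schedule and queue bounds can be computed ahead of time. The sketch below is an illustration, not the StreamIt compiler's algorithm: it solves the balance equations for the pipeline above (A pushes 2, B pops 3 and pushes 1, C pops 2), then simulates one steady-state iteration with a simple "fire the most downstream ready filter" policy. The function names and the policy are assumptions. Requires Python 3.9+ for `math.lcm`.

```python
from fractions import Fraction
from math import lcm

def repetition_vector(edges):
    """edges[i] = (push rate of filter i, pop rate of filter i+1).
    Returns the smallest firing counts that leave every queue unchanged."""
    reps = [Fraction(1)]
    for push, pop in edges:
        reps.append(reps[-1] * push / pop)   # balance: push * r[i] == pop * r[i+1]
    scale = lcm(*(r.denominator for r in reps))
    return [int(r * scale) for r in reps]

def schedule(names, edges, reps):
    """Simulate one steady-state iteration; return firing order and peak queue sizes."""
    queues = [0] * len(edges)
    peak = [0] * len(edges)
    left = list(reps)
    order = []
    while any(left):
        for i in reversed(range(len(names))):      # prefer downstream filters
            has_input = i == 0 or queues[i - 1] >= edges[i - 1][1]
            if left[i] and has_input:
                if i > 0:
                    queues[i - 1] -= edges[i - 1][1]   # pop from input queue
                if i < len(edges):
                    queues[i] += edges[i][0]           # push to output queue
                    peak[i] = max(peak[i], queues[i])
                left[i] -= 1
                order.append(names[i])
                break
        else:
            raise RuntimeError("no filter can fire: inconsistent rates")
    return order, peak

edges = [(2, 3), (1, 2)]                 # A->B and B->C
reps = repetition_vector(edges)
order, peak = schedule(["A", "B", "C"], edges, reps)
```

For these rates, A fires three times, B twice, and C once per iteration, and both queues return to empty at the end of the iteration.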
ibmstreams.github.io
SPL: An Extensible Language for Distributed Stream Processing [TOPLAS'17]
Kind of type   Type example                   Literal example
Number         int32                          42
String         ustring                        "Saarbrücken"
Boolean        boolean                        true
Enumeration    enum<error,info,trace>         LogLevel.info
XML            xml<"schemaURI">               '<x a="b">55</x>'x
Tuple          tuple<float64 x, float64 y>    {x=0.5, y=0.8}
Map            map<ustring, int32>            {"Mon": -1, "Fri": 1}
List           list<int32>                    [1, 2, 3]

Strongly and statically typed
Composite types (tuple, map, list) can nest
Streams can carry any tuple type
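As a rough analogy only, nested composite types and a typed stream might look as follows in Python type hints. This is not SPL: the names `Point`, `Reading`, and `readings` are hypothetical, and Python checks these annotations only with an external checker such as mypy, not at run time.

```python
from typing import Dict, Iterator, List, NamedTuple

class Point(NamedTuple):          # analogous to tuple<float64 x, float64 y>
    x: float
    y: float

class Reading(NamedTuple):        # a tuple type nesting a tuple, a map, and a list
    city: str
    position: Point
    offsets: Dict[str, int]
    samples: List[int]

def readings() -> Iterator[Reading]:
    """A 'stream' modeled as an iterator carrying one tuple type."""
    yield Reading("Saarbrücken", Point(0.5, 0.8), {"Mon": -1, "Fri": 1}, [1, 2, 3])

first = next(readings())
```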
http://www.cs.cornell.edu/bigreddata/cayuga/
M-shape (double-top) stock pattern
Series of rising peaks and troughs
Deep drop below start

Partition and Compose: Parallel Complex Event Processing [DEBS'12]
[Figure labels: composite events, simple events, regular expression, aggregation, key]
Operator only, no extensions to SPL syntax
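The ingredients named above (key, regular expression over simple events, aggregation into composite events) can be sketched in Python. This is an illustration under assumptions, not the operator from the paper: it works on a finished batch rather than incrementally, it encodes each price change as `U` (up) or `D` (not up), and the pattern `U+D+` (a single peak) is a hypothetical stand-in for a real pattern such as the M-shape.

```python
import re
from collections import defaultdict

PATTERN = re.compile(r"U+D+")   # hypothetical pattern: a rise followed by a drop

def match_peaks(events):
    """events: (key, price) simple events in arrival order.
    Returns composite events (key, peak price, number of simple events matched)."""
    by_key = defaultdict(list)
    for key, price in events:                 # partition by key
        by_key[key].append(price)
    composites = []
    for key, prices in by_key.items():
        # symbols[i] describes the change from prices[i] to prices[i + 1]
        symbols = "".join("U" if b > a else "D" for a, b in zip(prices, prices[1:]))
        for m in PATTERN.finditer(symbols):
            segment = prices[m.start(): m.end() + 1]
            composites.append((key, max(segment), len(segment)))  # aggregation
    return composites

events = [("IBM", 10), ("ACME", 5), ("IBM", 12), ("ACME", 4), ("IBM", 15),
          ("ACME", 6), ("IBM", 11), ("ACME", 3), ("IBM", 9), ("IBM", 13)]
composites = match_peaks(events)
```

Because matching is independent per key, the partitions could be processed in parallel.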
Stream Processing with a Spreadsheet [ECOOP'14] (Distinguished Paper Award)
[Figure: spreadsheet with scrolling ranges. Formulas shown: =SUM(C3:C10), =B3*C3, =B10*C10, =SUM(G3:G10), =A15, =C15, =C12/G12, =B15<G15]
Spreadsheets for Stream Processing with Unbounded Windows and Partitions (DEBS'16)
[Figure: dimensions labeled sheets, time, columns, rows]
Need more than two dimensions in practice
http://www.ibm.com/software/products/en/odm
a Client is a business entity identified by a name.
a Client is related to a marketer (a Marketer).
a Read Event is a business event time-stamped by a date.
a Read Event is related to a client (a Client).
a Read Event has a topic.
a Read Event has a length (a number).

'ClientRules' is an agent related to a Client , processing events :

when a Read Event occurs
if
    the length of this Read Event is more than 'Average Read Event Length'
then
    emit a new Alert where
        the client is 'the Client' ,
        the topic is the topic of this Read Event ,
        the marketer is the marketer of 'the Client' ;

define 'Average Read Event Length' as the average length of all Read Events during the last period of 6 hours .
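To make the rule's behavior concrete, here is a minimal Python sketch of one agent. It is not how ODM executes rules, and it rests on assumptions the slide leaves open: the average is taken per client, it covers earlier events from the last 6 hours and excludes the current one, and no alert is emitted when that history is empty. The class and method names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class ReadEvent:
    client: str
    topic: str
    length: float
    date: datetime

@dataclass
class Alert:
    client: str
    topic: str
    marketer: str

class ClientRules:
    """One agent per client, holding that client's recent read events."""
    WINDOW = timedelta(hours=6)

    def __init__(self, client: str, marketer: str):
        self.client = client
        self.marketer = marketer
        self.history = deque()

    def on_read_event(self, event: ReadEvent) -> Optional[Alert]:
        # Drop events that fall outside the last period of 6 hours.
        while self.history and event.date - self.history[0].date > self.WINDOW:
            self.history.popleft()
        alert = None
        if self.history:  # assumption: no alert without prior events
            average = sum(e.length for e in self.history) / len(self.history)
            if event.length > average:
                alert = Alert(self.client, event.topic, self.marketer)
        self.history.append(event)
        return alert

t0 = datetime(2017, 11, 1, 9, 0)
agent = ClientRules("acme", "alice")
lengths_by_hour = [(0, 10), (1, 20), (2, 12), (9, 5)]
alerts = [agent.on_read_event(ReadEvent("acme", "news", length, t0 + timedelta(hours=h)))
          for h, length in lengths_by_hour]
```

Only the second event (length 20 against an average of 10) produces an alert in this run.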
[Figure: event → event router → WXS shards, each with an event agent and an analytics agent → action]
Exactly-once event processing
Transaction
Replication
Shuffle, using X10
META: Middleware for Events, Transactions, and Analytics [IBMRD'16]

A Universal Calculus for Stream Processing Languages [ESOP'10]
Pure opaque functions, explicit state
From a Calculus to an Execution Environment for Stream Processing [DEBS'12] (Best Paper Award)
Atomic steps, non-determinism, fire on any port
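The calculus properties listed on these slides (pure functions, explicit state, atomic steps, non-deterministic choice of port) can be illustrated with a toy interpreter. This is a sketch in the spirit of those properties, not the calculus itself: the operator signature `(item, port, state) -> (outputs, new_state)` and the names `counter` and `run` are assumptions, and a seeded random generator stands in for non-determinism.

```python
import random
from collections import deque

def counter(item, port, state):
    """Pure operator function: no side effects, state is passed in and returned."""
    new_state = state + 1
    return [new_state], new_state

def run(queues, op, state, rng):
    """Repeatedly pick any non-empty input port and fire the operator atomically."""
    outputs = []
    while True:
        ready = [port for port, q in enumerate(queues) if q]
        if not ready:
            return outputs, state
        port = rng.choice(ready)            # non-deterministic choice of port
        item = queues[port].popleft()       # one atomic step: pop, compute, push
        produced, state = op(item, port, state)
        outputs.extend(produced)

queues = [deque(["a", "b", "c"]), deque(["x", "y"])]
outputs, final_state = run(queues, counter, state=0, rng=random.Random(0))
```

The order in which ports fire varies with the seed, but this particular operator yields the same outputs for every order, since it only counts items.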
Domains: medical, telco, science, finance, …
Insights and actions
Streaming engine
High-level programming experience