Tutorial: Stream Processing Languages, Martin Hirzel, IBM Research



SLIDE 1

Tutorial: Stream Processing Languages

Martin Hirzel, IBM Research AI 1 November 2017 Dagstuhl Seminar on Big Stream Processing Systems

SLIDE 2

DSLs ⊇ DSELs (aka EDSLs)

Modular Domain Specific Languages and Tools [ICSR’98]

SLIDE 3

Definitions

  • A stream is a conceptually infinite ordered sequence of data items.
  • A streaming application is a computer program that continuously ingests input streams and produces output streams.
  • A stream processing language is a DSL (domain-specific language) for writing streaming applications.
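The first two definitions can be illustrated with a minimal Python sketch. It is written in a general-purpose language, not a stream processing language, and the names `sensor_readings` and `running_max` are hypothetical, chosen only for this example. A generator stands in for a conceptually infinite stream, and a generator that consumes one stands in for a streaming application.

```python
import itertools


def sensor_readings():
    """A conceptually infinite ordered stream: yields one item after another, forever."""
    for i in itertools.count():
        yield i % 5  # made-up data, repeating 0, 1, 2, 3, 4


def running_max(stream):
    """A tiny streaming application: ingests an input stream, produces an output stream."""
    best = None
    for item in stream:
        best = item if best is None else max(best, item)
        yield best  # one output item per input item


# Only a finite prefix of the infinite output stream can ever be observed.
first_six = list(itertools.islice(running_max(sensor_readings()), 6))
```

Because both functions are lazy, nothing is computed until a consumer asks for the next item, which is what lets the input be unbounded.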

SLIDE 4

Requirements

  • Performance: fast streaming on a cluster
  • Productivity: hide complexity of distributed system
  • Generality: support diverse application domains

Different prioritization drives language diversity

SLIDE 5

Outline

  • Streaming SQL: CQL
  • Synchronous Dataflow: StreamIt
  • Explicit Stream Graph: SPL
  • Complex Events: MatchRegex
  • Reactive: ActiveSheets
  • Controlled Natural Language: META
  • Foundational Calculus: Brooklet


SLIDE 6

Streaming SQL: CQL

SQL, plus IStream and Window

The CQL continuous query language: semantic foundations and query execution [VLDBJ’06]

SLIDE 7

Streaming SQL: CQL

Relational algebra, plus operators that convert streams from/to relations

  • S2R operators (stream to relation): windows: now, unbounded, sliding, tumbling, time, partitioned, …
  • R2R operators (relation to relation): classic relational algebra: select, project, union, group-by, aggregate, join, …
  • R2S operators (relation to stream): IStream: inserts, DStream: deletes, RStream: relations

River: An Intermediate Language for Stream Processing [SP&E'16]
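The relation-to-stream operators can be sketched in a few lines of Python. This is an illustrative model, not CQL syntax or any engine's API: a time-varying relation is assumed to be given as a list of bags (one `Counter` per time step), and the function names are hypothetical. IStream emits what was inserted since the previous step, DStream what was deleted, and RStream the whole relation.

```python
from collections import Counter


def istream(prev, curr):
    """Inserts: tuples in the current relation but not the previous one."""
    return curr - prev  # Counter subtraction keeps only positive counts


def dstream(prev, curr):
    """Deletes: tuples in the previous relation but not the current one."""
    return prev - curr


def rstream(prev, curr):
    """The entire current relation."""
    return curr


def relation_to_stream(relations, op):
    """Apply an R2S operator to a time-varying relation, yielding (time, tuple) pairs."""
    prev = Counter()  # assumption: the relation is empty before the first step
    for time, curr in enumerate(relations):
        for item in sorted(op(prev, curr).elements()):
            yield (time, item)
        prev = curr


# A relation that holds {a}, then {a, b}, then {b}.
history = [Counter("a"), Counter("ab"), Counter("b")]
inserted = list(relation_to_stream(history, istream))
deleted = list(relation_to_stream(history, dstream))
snapshots = list(relation_to_stream(history, rstream))
```

Using bags rather than sets keeps duplicates, which matters once the R2R step includes projections or unions.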

SLIDE 8

Streaming SQL: CQL

[Figure: window with front and back. Labels: insert newest, evict oldest, trigger aggregation; L = size, T = granularity.]

  • In general: policies specified as time or count or delta
  • Special case T = L also known as tumbling

The CQL continuous query language: semantic foundations and query execution [VLDBJ’06]
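The insert, evict, and trigger steps can be sketched for the count-based case. This is a minimal Python sketch under stated assumptions, not CQL syntax: the function name `windowed_sums` is hypothetical, the aggregate is fixed to a sum, `size` plays the role of L, and `granularity` plays the role of T.

```python
from collections import deque


def windowed_sums(stream, size, granularity):
    """Count-based sliding window: keep the newest `size` items and
    emit their sum after every `granularity` insertions."""
    window = deque()
    results = []
    for count, item in enumerate(stream, start=1):
        window.append(item)            # insert newest
        if len(window) > size:
            window.popleft()           # evict oldest
        if count % granularity == 0:
            results.append(sum(window))  # trigger aggregation
    return results


sliding = windowed_sums([1, 2, 3, 4, 5, 6], size=3, granularity=1)
tumbling = windowed_sums([1, 2, 3, 4, 5, 6], size=3, granularity=3)
```

With `granularity == size`, consecutive results cover disjoint groups of items, which is the tumbling special case. A time-based or delta-based policy would replace the two count comparisons with conditions on timestamps or attribute values.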

SLIDE 9

Synchronous Dataflow: StreamIt

    float->float pipeline ABC {
      add float->float filter A() {
        work pop … push 2 { … }
      }
      add float->float filter B() {
        work pop 3 push 1 { … }
      }
      add float->float filter C() {
        work pop 2 push … { … }
      }
    }

[Figure: pipeline … → A → B → C → … with rates: A pushes 2, B pops 3, B pushes 1, C pops 2]

B pops 3 per firing
B pushes 1 per firing
Statically known push/pop rates

StreamIt: A Compiler for Streaming Applications [MIT TR'05]

SLIDE 10

Synchronous Dataflow: StreamIt

[Figure: the A → B → C pipeline with rates 2, 3, 1, 2, shown with repeated copies of A, B, C]

Statically known firing schedule and FIFO queue sizes

Dynamic Expressivity with Static Optimization for Streaming Languages [DEBS'13]
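To see why static rates give a static schedule, the balance equations for the ABC pipeline can be solved directly. This is a Python sketch, not the StreamIt compiler's algorithm: the function names are hypothetical and the greedy policy (fire the most downstream filter that has enough input) is just one valid choice. It uses only the rates between filters (A pushes 2, B pops 3 and pushes 1, C pops 2); the elided pop rate of A and push rate of C are not needed.

```python
from fractions import Fraction
from math import lcm  # Python 3.9+


def repetitions(edges):
    """Smallest firing counts per filter such that every queue is balanced.
    edges[i] = (push rate of filter i, pop rate of filter i + 1)."""
    ratios = [Fraction(1)]
    for push, pop in edges:
        ratios.append(ratios[-1] * push / pop)  # items produced == items consumed
    scale = lcm(*(r.denominator for r in ratios))
    return [int(r * scale) for r in ratios]


def schedule(edges, reps):
    """Greedy steady-state schedule; returns (firing order, peak queue sizes)."""
    remaining, queues = list(reps), [0] * len(edges)
    peak, order = [0] * len(edges), []
    while any(remaining):
        ready = [i for i in reversed(range(len(reps)))
                 if remaining[i] and (i == 0 or queues[i - 1] >= edges[i - 1][1])]
        if not ready:
            raise RuntimeError("no filter can fire")
        i = ready[0]
        if i > 0:
            queues[i - 1] -= edges[i - 1][1]       # pop inputs
        if i < len(edges):
            queues[i] += edges[i][0]               # push outputs
            peak[i] = max(peak[i], queues[i])
        remaining[i] -= 1
        order.append(i)
    return order, peak


abc_edges = [(2, 3), (1, 2)]            # A -> B, then B -> C
abc_reps = repetitions(abc_edges)
abc_order, abc_peak = schedule(abc_edges, abc_reps)
```

For these rates, A fires three times, B twice, and C once per steady-state iteration, and both queues are empty again at the end, so the same schedule can repeat indefinitely with bounded buffers.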

SLIDE 11

Explicit Stream Graph: SPL

ibmstreams.github.io

SLIDE 12

Explicit Stream Graph: SPL

Kind of type   Type example                   Literal example
Number         int32                          42
String         ustring                        "Saarbrücken"
Boolean        boolean                        true
Enumeration    enum<error,info,trace>         LogLevel.info
XML            xml<"schemaURI">               '<x a="b">55</x>'x
Tuple          tuple<float64 x, float64 y>    {x=0.5, y=0.8}
Map            map<ustring, int32>            {"Mon": -1, "Fri": 1}
List           list<int32>                    [1, 2, 3]

Strongly and statically typed
Composite types (tuple, map, list) can nest
Streams can carry any tuple type

SPL: An Extensible Language for Distributed Stream Processing [TOPLAS'17]

SLIDE 13

Complex Events: MatchRegex

[Figure: M-shape (double-top) stock pattern: series of rising peaks and troughs, deep drop below start; the match is marked]

http://www.cs.cornell.edu/bigreddata/cayuga/

SLIDE 14

Complex Events: MatchRegex

[Figure: annotated example with labels: composite events, simple events, regular expression, aggregation, key]

Operator only, no extensions to SPL syntax

Partition and Compose: Parallel Complex Event Processing [DEBS'12]
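The idea of matching a regular expression over simple events to produce composite events can be sketched in Python. This is not the MatchRegex operator or its SPL syntax: the function name `find_m_shapes`, the symbol alphabet, and the pattern are assumptions for illustration, and the sketch works on a finite batch, not incrementally. Each simple event is classified by a predicate into a symbol, events are partitioned by key, the standard `re` module matches the pattern, and an aggregation summarizes each match.

```python
import re
from collections import defaultdict

# rise, drop, rise, drop: a simplified M-like shape (an assumed pattern)
PATTERN = re.compile(r"r+d+r+d+")


def find_m_shapes(events):
    """events: iterable of (key, price) simple events in arrival order.
    Returns composite events (key, first price, max price, last price)."""
    by_key = defaultdict(list)
    for key, price in events:
        by_key[key].append(price)          # partition simple events by key
    composites = []
    for key, prices in by_key.items():
        # one symbol per event after the first: 'r' rise, 'd' drop, 's' same
        symbols = "".join(
            "r" if b > a else "d" if b < a else "s"
            for a, b in zip(prices, prices[1:])
        )
        for m in PATTERN.finditer(symbols):
            matched = prices[m.start(): m.end() + 1]   # events covered by the match
            composites.append((key, matched[0], max(matched), matched[-1]))  # aggregation
    return composites


ticks = [("IBM", 10), ("XYZ", 5), ("IBM", 12), ("XYZ", 6),
         ("IBM", 11), ("IBM", 13), ("XYZ", 7), ("IBM", 8)]
m_shapes = find_m_shapes(ticks)
```

A streaming implementation would advance an automaton per key as each event arrives, which is also what makes partitioning by key a natural unit of parallelism.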

SLIDE 15

Reactive: ActiveSheets

[Figure: spreadsheet with two scrolling regions; cell formulas include =SUM(C3:C10), =B3*C3, =B10*C10, =SUM(G3:G10), =A15, =C15, =C12/G12, =B15<G15]

Stream Processing with a Spreadsheet [ECOOP'14] (Distinguished Paper Award)

SLIDE 16

Reactive: ActiveSheets

[Figure: dimensions: sheets, time, columns, rows]

Need more than two dimensions in practice

Spreadsheets for Stream Processing with Unbounded Windows and Partitions [DEBS'16]

SLIDE 17

Controlled Natural Language: META

    --- data model ---
    a Client is a business entity identified by a name.
    a Client is related to a marketer (a Marketer).
    a Read Event is a business event time-stamped by a date.
    a Read Event is related to a client (a Client).
    a Read Event has a topic.
    a Read Event has a length (a number).

    --- agent descriptor ---
    'ClientRules' is an agent related to a Client ,
      processing events :
        - Read Event , where this Client comes from the client of this Read Event

    --- event-condition-action rule ---
    when a Read Event occurs
    if
      the length of this Read Event is more than 'Average Read Event Length'
    then
      emit a new Alert where
        the client is 'the Client' ,
        the topic is the topic of this Read Event ,
        the marketer is the marketer of 'the Client' ;

    --- global event query ---
    define 'Average Read Event Length' as
      the average length of all Read Events
      during the last period of 6 hours .

http://www.ibm.com/software/products/en/odm
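For readers unfamiliar with event-condition-action rules, the rule and the 6-hour query can be approximated in Python. This is a rough sketch of the logic only, not how META or ODM executes it: the function name `process`, the dictionary-based event layout, and the `marketer_of` lookup are hypothetical. It also assumes that the average covers earlier events strictly within the last 6 hours and excludes the event being tested, which the rule text does not specify.

```python
from collections import deque
from datetime import datetime, timedelta

WINDOW = timedelta(hours=6)


def process(events, marketer_of):
    """events: dicts with 'date', 'client', 'topic', 'length', in timestamp order.
    marketer_of: maps each client to its marketer. Returns the emitted alerts."""
    window = deque()   # (date, length) of read events in the last 6 hours
    total = 0          # sum of the lengths currently in the window
    alerts = []
    for event in events:                                  # when a Read Event occurs
        while window and window[0][0] <= event["date"] - WINDOW:
            total -= window.popleft()[1]                  # drop events older than 6 hours
        if window and event["length"] > total / len(window):   # if longer than average
            alerts.append({                               # then emit a new Alert
                "client": event["client"],
                "topic": event["topic"],
                "marketer": marketer_of[event["client"]],
            })
        window.append((event["date"], event["length"]))
        total += event["length"]
    return alerts


start = datetime(2017, 11, 1)
reads = [
    {"date": start, "client": "alice", "topic": "ai", "length": 10},
    {"date": start + timedelta(hours=1), "client": "bob", "topic": "db", "length": 30},
    {"date": start + timedelta(hours=2), "client": "alice", "topic": "ml", "length": 15},
    {"date": start + timedelta(hours=8), "client": "alice", "topic": "os", "length": 12},
    {"date": start + timedelta(hours=9), "client": "bob", "topic": "pl", "length": 13},
]
alerts = process(reads, {"alice": "m1", "bob": "m2"})
```

The average is global across clients while the rule fires per client, mirroring the split between the global event query and the agent in the rule text.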

SLIDE 18

Controlled Natural Language: META

[Figure: event router and three WXS shards, each with an analytics agent and an event agent; labels: event, action]

Exactly-once event processing
Transaction
Replication
Shuffle, using X10

META: Middleware for Events, Transactions, and Analytics [IBMRD'16]

SLIDE 19

Foundational Calculus: Brooklet

Pure opaque functions, explicit state

A Universal Calculus for Stream Processing Languages [ESOP'10]

SLIDE 20

Foundational Calculus: Brooklet

Atomic steps, non-determinism, fire on any port

From a Calculus to an Execution Environment for Stream Processing [DEBS'12] (Best Paper Award)
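These properties can be made concrete with a toy interpreter. This is a Python sketch in the spirit of the calculus, not Brooklet's formal semantics or syntax: the `run` function, the dictionary layout, and the `count` operator are hypothetical. Operators are pure functions that receive their state explicitly and return a new state; each loop iteration is one atomic step that pops one item from one randomly chosen non-empty input queue and fires the operator attached to it.

```python
import random
from collections import deque


def run(operators, queues, variables, rng):
    """operators: input queue name -> (pure function, name of its state variable).
    Each function maps (item, port, state) to (list of (output queue, item), new state)."""
    while True:
        ready = sorted(q for q in operators if queues[q])
        if not ready:
            return                          # no data left on any input port
        port = rng.choice(ready)            # non-determinism: fire on any port with data
        func, var = operators[port]
        item = queues[port].popleft()
        # One atomic step: consume one item, update state, emit outputs.
        outputs, variables[var] = func(item, port, variables[var])
        for out_queue, out_item in outputs:
            queues[out_queue].append(out_item)


def count(item, port, state):
    """Pure operator: no hidden state; the running count is passed in and returned."""
    return [("out", (port, state + 1))], state + 1


queues = {"a": deque([1, 2]), "b": deque([3]), "out": deque()}
variables = {"n": 0}
# The same operator reads two input ports and shares one state variable.
run({"a": (count, "n"), "b": (count, "n")}, queues, variables, random.Random(0))
```

Whatever interleaving the random choice produces, the counts on the output queue are 1, 2, 3; only the order in which ports `a` and `b` were served can differ between runs.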

SLIDE 21

Democratization of Streaming

[Figure labels: Medical, Telco, Science, Finance, …; Insights; Actions; Streaming engine; High-level programming experience]