
Tutorial: Stream Processing Languages, Martin Hirzel, IBM Research



  1. Tutorial: Stream Processing Languages
     Martin Hirzel, IBM Research AI
     1 November 2017, Dagstuhl Seminar on Big Stream Processing Systems

  2. DSLs ⊇ DSELs (aka EDSLs)
     Reference: Modular Domain Specific Languages and Tools [ICSR’98]

  3. Definitions
     • A stream is a conceptually infinite ordered sequence of data items.
     • A streaming application is a computer program that continuously ingests input streams and produces output streams.
     • A stream processing language is a DSL (domain-specific language) for writing streaming applications.

  4. Requirements
     • Performance: fast streaming on a cluster
     • Productivity: hide complexity of distributed system
     • Generality: support diverse application domains
     • Different prioritization drives language diversity

  5. Outline
     • Streaming SQL: CQL
     • Synchronous Dataflow: StreamIt
     • Explicit Stream Graph: SPL
     • Complex Events: MatchRegex
     • Reactive: ActiveSheets
     • Controlled Natural Language: META
     • Foundational Calculus: Brooklet

  6. Streaming SQL: CQL
     • SQL, plus IStream and Window
     Reference: The CQL continuous query language: semantic foundations and query execution [VLDBJ’06]

  7. Streaming SQL: CQL
     • S2R operators (windows): now, unbounded, sliding, tumbling, time, partitioned, …
     • R2R operators (classic relational algebra): select, project, union, group-by, aggregate, join, …
     • R2S operators: IStream (inserts), DStream (deletes), RStream (relations)
     • Relational algebra, plus convert streams from/to relations
     Reference: River: An Intermediate Language for Stream Processing [SP&E'16]
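The three operator classes on this slide can be illustrated with a small Python sketch. It is not CQL syntax and not how any CQL engine is implemented: relations are simplified to plain sets (no duplicates), a stream is a timestamp-ordered list of `(timestamp, item)` pairs, and the function names `unbounded_window`, `select`, `istream`, and `dstream` are chosen here for illustration only.

```python
def unbounded_window(stream):
    """S2R: map each timestamp to the set of all items seen so far."""
    snapshots, seen = {}, set()
    for ts, item in stream:           # stream is ordered by timestamp
        seen.add(item)
        snapshots[ts] = set(seen)     # relation snapshot at time ts
    return snapshots

def select(snapshots, predicate):
    """R2R: apply a relational selection to every snapshot."""
    return {ts: {x for x in rel if predicate(x)} for ts, rel in snapshots.items()}

def istream(snapshots):
    """R2S: emit (ts, x) when x is in the relation at ts but was not before."""
    out, prev = [], set()
    for ts in sorted(snapshots):
        cur = snapshots[ts]
        out.extend((ts, x) for x in sorted(cur - prev))
        prev = cur
    return out

def dstream(snapshots):
    """R2S: emit (ts, x) when x was in the relation before but is gone at ts."""
    out, prev = [], set()
    for ts in sorted(snapshots):
        cur = snapshots[ts]
        out.extend((ts, x) for x in sorted(prev - cur))
        prev = cur
    return out

stream = [(1, 5), (2, 12), (3, 7), (4, 20)]
large = istream(select(unbounded_window(stream), lambda x: x > 6))
print(large)
```

In this sketch, applying `istream` to an unbounded window returns the original stream, and `dstream` over an unbounded window is empty because nothing is ever evicted.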

  8. Streaming SQL: CQL
     • In general: policies specified as time or count or delta
     • Special case T = L also known as tumbling
     [Figure: window drawn as a queue with a back and a front. Labels: insert newest, evict oldest, trigger aggregation, size T, granularity L.]
     Reference: The CQL continuous query language: semantic foundations and query execution [VLDBJ’06]
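The insert, evict, and trigger steps can be sketched for the count-based case as follows. This is a minimal illustration, not code from the CQL paper: the parameter names `size` and `slide` are assumptions made here, the sketch only triggers once the window is full, and it assumes `slide <= size`.

```python
from collections import deque

def count_windows(items, size, slide):
    """Yield the window contents every `slide` insertions, keeping the newest `size` items."""
    window = deque()
    for n, item in enumerate(items, start=1):
        window.append(item)                 # insert newest at the back
        if len(window) > size:
            window.popleft()                # evict oldest from the front
        if n % slide == 0 and len(window) == size:
            yield list(window)              # trigger aggregation on this snapshot

sliding = list(count_windows(range(1, 9), size=4, slide=2))
tumbling = list(count_windows(range(1, 10), size=3, slide=3))
sums = [sum(w) for w in tumbling]           # one aggregate per trigger
print(sliding, tumbling, sums)
```

When `slide` equals `size`, consecutive windows do not overlap, which is the tumbling special case.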

  9. Synchronous Dataflow: StreamIt
       float->float pipeline ABC {
         add float->float filter A() {
           work pop … push 2 { … }
         }
         add float->float filter B() {
           work pop 3 push 1 { … }
         }
         add float->float filter C() {
           work pop 2 push … { … }
         }
       }
     [Figure: pipeline A → B → C. The edge from A to B is labelled 2 and 3; the edge from B to C is labelled 1 and 2. Callouts: B pops 3 per firing; B pushes 1 per firing.]
     • Statically known push/pop rates
     Reference: StreamIt: A Compiler for Streaming Applications [MIT TR'05]

  10. Synchronous Dataflow: StreamIt
     [Figure: pipeline A → B → C with edge rates 2, 3 and 1, 2, shown together with its firing schedule.]
     • Statically known firing schedule and FIFO queue sizes
     Reference: Dynamic Expressivity with Static Optimization for Streaming Languages [DEBS'13]
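To see why static rates give a static schedule, the sketch below computes a steady-state schedule for the A → B → C pipeline using only the rates that appear on the slides (A pushes 2, B pops 3 and pushes 1, C pops 2). It is an illustration in Python, not the StreamIt compiler's algorithm: the helper names `repetitions` and `schedule` and the "fire the most downstream ready stage" policy are assumptions made here.

```python
from fractions import Fraction
from math import lcm

def repetitions(rates):
    """rates[i] = (push rate of stage i, pop rate of stage i+1) in a linear pipeline.
    Returns the smallest firing counts that leave every queue balanced."""
    counts = [Fraction(1)]
    for push, pop in rates:
        counts.append(counts[-1] * push / pop)
    scale = lcm(*(c.denominator for c in counts))
    return [int(c * scale) for c in counts]

def schedule(names, rates):
    """Simulate one steady-state iteration; return firing order and peak queue sizes."""
    remaining = repetitions(rates)
    queues = [0] * len(rates)
    peak = [0] * len(rates)
    order = []
    while any(remaining):
        # Prefer the most downstream stage that has enough input.
        for i in reversed(range(len(names))):
            ready = i == 0 or queues[i - 1] >= rates[i - 1][1]
            if remaining[i] and ready:
                if i > 0:
                    queues[i - 1] -= rates[i - 1][1]    # pop from input queue
                if i < len(rates):
                    queues[i] += rates[i][0]            # push to output queue
                    peak[i] = max(peak[i], queues[i])
                remaining[i] -= 1
                order.append(names[i])
                break
    return order, peak

rates = [(2, 3), (1, 2)]          # A->B: push 2, pop 3; B->C: push 1, pop 2
reps = repetitions(rates)
order, peak = schedule(["A", "B", "C"], rates)
print(reps, order, peak)
```

Under these assumptions one steady-state iteration fires A three times, B twice, and C once, and both queues are empty again at the end, so the same schedule can repeat indefinitely with fixed queue capacities.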

  11. Explicit Stream Graph: SPL
     Reference: ibmstreams.github.io

  12. Explicit Stream Graph: SPL
     Kind of type | Type example                  | Literal example
     Number       | int32                         | 42
     String       | ustring                       | "Saarbrücken"
     Boolean      | boolean                       | true
     Enumeration  | enum<error,info,trace>        | LogLevel.info
     XML          | xml<"schemaURI">              | '<x a="b">55</x>'x
     Tuple        | tuple<float64 x, float64 y>   | {x=0.5, y=0.8}
     Map          | map<ustring, int32>           | {"Mon": -1, "Fri": 1}
     List         | list<int32>                   | [1, 2, 3]
     • Strongly and statically typed
     • Composite types (tuple, map, list) can nest
     • Streams can carry any tuple type
     Reference: SPL: An Extensible Language for Distributed Stream Processing [TOPLAS'17]

  13. Complex Events: MatchRegex
     [Figure: M-shape (double-top) stock pattern. Callouts: series of rising peaks and troughs; deep drop below start of match.]
     Reference: http://www.cs.cornell.edu/bigreddata/cayuga/

  14. Complex Events: MatchRegex
     [Figure: code example with callouts: composite events, simple events, regular expression, key, aggregation.]
     • Operator only, no extensions to SPL syntax
     Reference: Partition and Compose: Parallel Complex Event Processing [DEBS'12]
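The idea of matching a regular expression over simple events, per key, and aggregating over each match can be sketched in Python. This does not reproduce the MatchRegex operator or its pattern syntax, and it works on a finished batch rather than incrementally. The symbol encoding (`u` for up, `d` for down, `s` for same), the pattern `u+d+u+d+`, and the names `classify` and `find_m_shapes` are all assumptions made for this illustration.

```python
import re
from collections import defaultdict

def classify(prev, cur):
    """Turn a pair of consecutive prices into a one-letter simple event."""
    return "u" if cur > prev else "d" if cur < prev else "s"

M_SHAPE = re.compile(r"u+d+u+d+")   # rise, fall, rise, fall

def find_m_shapes(events):
    """events: iterable of (key, price). Returns one composite event per match."""
    by_key = defaultdict(list)
    for key, price in events:               # partition simple events by key
        by_key[key].append(price)
    matches = []
    for key, prices in by_key.items():
        symbols = "".join(classify(a, b) for a, b in zip(prices, prices[1:]))
        for m in M_SHAPE.finditer(symbols):
            span = prices[m.start(): m.end() + 1]   # symbol i spans prices i and i+1
            if span[-1] < span[0]:                  # final drop ends below the start
                matches.append({"key": key, "start": span[0],
                                "peak": max(span), "end": span[-1]})
    return matches

events = [("IBM", 10), ("XYZ", 5), ("IBM", 12), ("XYZ", 6),
          ("IBM", 11), ("IBM", 14), ("XYZ", 7), ("IBM", 8)]
found = find_m_shapes(events)
print(found)
```

Here `peak` plays the role of an aggregation computed over the events in a match, and the dictionary returned per match stands in for a composite event.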

  15. Reactive: ActiveSheets
     [Figure: spreadsheet with two scrolling regions of live data. Formulas shown: =B3*C3, =B10*C10, =SUM(C3:C10), =SUM(G3:G10), =A15, =C15, =C12/G12, =B15<G15.]
     Reference: Stream Processing with a Spreadsheet [ECOOP'14] (Distinguished Paper Award)

  16. Reactive: ActiveSheets
     [Figure: dimensions of a spreadsheet over streams: columns, rows, sheets, time.]
     • Need more than two dimensions in practice
     Reference: Spreadsheets for Stream Processing with Unbounded Windows and Partitions [DEBS'16]

  17. Controlled Natural Language: META
       --- data model ---
       a Client is a business entity identified by a name.
       a Client is related to a marketer (a Marketer).
       a Read Event is a business event time-stamped by a date.
       a Read Event is related to a client (a Client).
       a Read Event has a topic.
       a Read Event has a length (a number).
       --- agent descriptor ---
       'ClientRules' is an agent related to a Client,
       processing events:
         - Read Event, where this Client comes from the client of this Read Event
       --- event-condition-action rule ---
       when a Read Event occurs
       if
         the length of this Read Event is more than 'Average Read Event Length'
       then
         emit a new Alert where
           the client is 'the Client',
           the topic is the topic of this Read Event,
           the marketer is the marketer of 'the Client';
       --- global event query ---
       define 'Average Read Event Length' as
         the average length of all Read Events
         during the last period of 6 hours.
     Reference: http://www.ibm.com/software/products/en/odm
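For readers more used to general-purpose code, the rule and the global query above can be approximated in Python. This is only a sketch of the logic, not how META or ODM executes it: the class name `RuleEngine`, the use of seconds for timestamps, the `marketer_of` lookup table, and the choice to include the current event in the average are assumptions made here.

```python
from collections import deque
from dataclasses import dataclass

SIX_HOURS = 6 * 60 * 60   # window length in seconds (timestamps assumed to be seconds)

@dataclass
class ReadEvent:
    timestamp: float
    client: str
    topic: str
    length: float

class RuleEngine:
    def __init__(self, marketer_of):
        self.marketer_of = marketer_of    # client name -> marketer name
        self.recent = deque()             # Read Events from the last 6 hours
        self.total = 0.0                  # running sum of their lengths

    def on_read_event(self, ev):
        # Global event query: average length over the last 6 hours.
        while self.recent and self.recent[0].timestamp <= ev.timestamp - SIX_HOURS:
            self.total -= self.recent.popleft().length
        self.recent.append(ev)
        self.total += ev.length
        average = self.total / len(self.recent)
        # Event-condition-action rule: emit an Alert for unusually long reads.
        if ev.length > average:
            return {"client": ev.client, "topic": ev.topic,
                    "marketer": self.marketer_of[ev.client]}
        return None

engine = RuleEngine({"acme": "dana"})
first = engine.on_read_event(ReadEvent(0, "acme", "pricing", 100))
second = engine.on_read_event(ReadEvent(3600, "acme", "pricing", 300))
third = engine.on_read_event(ReadEvent(7 * 3600, "acme", "support", 150))
print(first, second, third)
```

The second event exceeds the running average and produces an alert. By the time of the third event, both earlier events have aged out of the 6-hour window, so it is compared only against itself and no alert is emitted.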

  18. Controlled Natural Language: META
     [Figure: architecture diagram. An event router shuffles incoming events, using X10, across three WXS shards, each hosting an event agent and an analytics agent. Other labels: exactly-once event processing, transaction, replication, action.]
     Reference: META: Middleware for Events, Transactions, and Analytics [IBMRD'16]

  19. Foundational Calculus: Brooklet
     • Pure opaque functions, explicit state
     Reference: A Universal Calculus for Stream Processing Languages [ESOP'10]

  20. Foundational Calculus: Brooklet
     • Atomic steps, non-determinism, fire on any port
     Reference: From a Calculus to an Execution Environment for Stream Processing [DEBS'12] (Best Paper Award)
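The properties named on the two Brooklet slides (pure functions, explicit state, atomic steps, non-deterministic choice, firing on any input port) can be illustrated with a toy interpreter. This Python sketch does not reproduce Brooklet's formal syntax or semantics: the dictionary layout for operators, the `run` loop, and the use of a seeded random generator to model non-determinism are assumptions made for illustration.

```python
import random
from collections import deque

def run(operators, queues, variables, rng):
    """Take atomic steps until no operator has data on any input queue."""
    while True:
        ready = [(op, port) for op in operators
                 for port in op["inputs"] if queues[port]]
        if not ready:
            return
        op, port = rng.choice(ready)             # non-deterministic choice
        item = queues[port].popleft()            # fire on a single port
        # The function is pure: state goes in as an argument and comes back
        # as a result; the interpreter owns the explicit variable store.
        outputs, new_state = op["func"](port, item, variables[op["var"]])
        variables[op["var"]] = new_state
        for out_queue, out_item in outputs:
            queues[out_queue].append(out_item)

def count_items(port, item, count):
    """Count items arriving on any port and emit the running count."""
    return [("out", count + 1)], count + 1

queues = {"a": deque(["x", "y"]), "b": deque(["z"]), "out": deque()}
variables = {"count": 0}
counter = {"inputs": ["a", "b"], "var": "count", "func": count_items}
run([counter], queues, variables, random.Random(0))
print(list(queues["out"]), variables)
```

The order in which items from `a` and `b` are consumed depends on the random choices, but because this particular function ignores the item contents, the output is the same for every interleaving.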

  21. Democratization of Streaming
     [Figure: labels: Telco, Medical, Science, Finance, …; high-level programming experience; streaming engine; insights; actions.]
