Formal Semantics for Composable Workflows for Scraper
Understanding Flows
Albert Schimpf
wiki.scraper.server1.link
Technische Universität Kaiserslautern (TUK), Kyoto University
Schimpf (TUK, Kyoto University) Scraper Master’s Thesis 18/19 1 / 26
◮ Resource-intensive (proxies, I/O bound)
◮ Resume-able, long-running
◮ Flexible stream of fresh data
◮ Easily modifiable structure

◮ CPU-intensive
◮ user interactive
◮ too much effort
◮ fragile
◮ code duplication ...

◮ modifications of sub-routines affected other programs
◮ mixed control-flow and data-flow hard to reason about
◮ language focused on control-flow less suited for data-flow problems
◮ ... connecting them (graph structure)?
◮ ... how data is passed around (API)?
◮ ... concurrent access?
◮ ... configuration?
◮ ... complex control-flow, data-parallelism?
◮ Guarantee concurrent access and processing of data at any time

◮ Errors only happen during initialization of the specification
◮ After initialization, errors are guaranteed to be of business-logic nature
◮ ... implement single unit of work
◮ ... forward data to another node
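The two node responsibilities above, together with the idea that structural errors should surface at initialization, can be sketched as follows. This is a minimal illustration and not Scraper's actual API: the names `Node`, `Graph`, `FlowMap`, `work`, and `forward` are hypothetical, and the sketch assumes a linear chain of nodes without cycle detection.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical name for the data passed from node to node.
FlowMap = Dict[str, object]


@dataclass
class Node:
    """One unit of work plus an optional forward target (illustrative only)."""
    name: str
    work: Callable[[FlowMap], FlowMap]
    forward: Optional[str] = None  # name of the next node, if any


class Graph:
    def __init__(self, nodes: List[Node]) -> None:
        self.nodes = {n.name: n for n in nodes}
        # Structural errors are reported here, at initialization,
        # instead of in the middle of a long-running flow.
        for n in nodes:
            if n.forward is not None and n.forward not in self.nodes:
                raise ValueError(
                    f"node {n.name!r} forwards to unknown node {n.forward!r}"
                )

    def run(self, start: str, data: FlowMap) -> FlowMap:
        current: Optional[str] = start
        while current is not None:
            node = self.nodes[current]
            # Each node receives its own copy of the incoming data.
            data = node.work(dict(data))
            current = node.forward
        return data


# Usage: a two-node chain with made-up work functions.
graph = Graph([
    Node("fetch", lambda m: {**m, "body": "<html>"}, forward="parse"),
    Node("parse", lambda m: {**m, "length": len(m["body"])}),
])
result = graph.run("fetch", {"url": "http://example.org"})
```

Keeping the work function free of any routing decision is what lets the graph be checked once, up front; a misspelled forward target fails in the `Graph` constructor rather than hours into a run.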
◮ Goal: type safety
◮ Time
◮ Map
◮ Map join
◮ Template expressions
  ◮ Key lookup @|τ : String|
    ⋆ @|a| : Simple template
    ⋆ @@|a| : Look inside maps (UnpackMapNode)
  ◮ Array lookup |τ : List<T>|[τ : Integer]
  ◮ String concatenation τ : String + τ : String
  ◮ Simple value
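To make the four template forms concrete, here is a small evaluator over a hand-built expression tree. It is a sketch under assumptions: the class names (`Value`, `KeyLookup`, `ArrayLookup`, `Concat`) and the `evaluate` function are hypothetical, the concrete `@|...|` syntax is not parsed, the `@@|a|` form is left out, and types are checked dynamically during evaluation, which is simpler than (and not the same as) checking them once at initialization.

```python
from dataclasses import dataclass
from typing import Any, Dict, Union


@dataclass
class Value:            # simple value
    value: Any


@dataclass
class KeyLookup:        # @|t| where t must evaluate to a String key
    key: "Template"


@dataclass
class ArrayLookup:      # |t1|[t2] where t1 : List<T> and t2 : Integer
    array: "Template"
    index: "Template"


@dataclass
class Concat:           # t1 + t2 where both sides are Strings
    left: "Template"
    right: "Template"


Template = Union[Value, KeyLookup, ArrayLookup, Concat]


def evaluate(t: Template, flow: Dict[str, Any]) -> Any:
    """Evaluate a template against the current flow data."""
    if isinstance(t, Value):
        return t.value
    if isinstance(t, KeyLookup):
        key = evaluate(t.key, flow)
        if not isinstance(key, str):
            raise TypeError("key lookup needs a String key")
        return flow[key]
    if isinstance(t, ArrayLookup):
        array = evaluate(t.array, flow)
        index = evaluate(t.index, flow)
        if not isinstance(array, list) or not isinstance(index, int):
            raise TypeError("array lookup needs a List and an Integer index")
        return array[index]
    if isinstance(t, Concat):
        left = evaluate(t.left, flow)
        right = evaluate(t.right, flow)
        if not (isinstance(left, str) and isinstance(right, str)):
            raise TypeError("concatenation needs two Strings")
        return left + right
    raise TypeError(f"unknown template: {t!r}")


# Usage: build host + paths[i] from made-up flow data.
flow = {"host": "example.org", "paths": ["/a", "/b"], "i": 1}
url = Concat(
    KeyLookup(Value("host")),
    ArrayLookup(KeyLookup(Value("paths")), KeyLookup(Value("i"))),
)
resolved = evaluate(url, flow)
```

Because every form states the types of its parts, the same tree could in principle be checked against declared key types before any flow runs; the sketch only shows where such a check would have to look.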
◮ Nodes don’t crash
◮ If configuration is well-typed, flows are guaranteed to finish

◮ Quasi-static graph makes reasoning about control-flow easy
◮ Business logic can be easily configured via templates without touching
◮ Remember: separation of control-flow and data-flow is forced
◮ Nodes are black boxes with ports, send/receive data
◮ Not typed
◮ No frameworks focused on concurrency
  ⋆ JavaFBP, NoFlo, C#FBP, other domain specific flow-based languages...

◮ Similar problems to FBP
◮ Processes can crash
◮ Message passing

◮ Nodes are processes again
◮ Too complex, even with DSL (incomplete functionality)

◮ Don’t want to write a program for each task
◮ No separation of control-flow and data-flow is forced