
Macroarea di Ingegneria, Dipartimento di Ingegneria Civile e Ingegneria Informatica
DSP Frameworks
Corso di Sistemi e Architetture per Big Data, A.A. 2018/19
Valeria Cardellini
Laurea Magistrale in Ingegneria Informatica


1. Heron API: shift to functional style
• Processing graphs consist of streamlets
  – One or more supplier streamlets inject data into the graph, to be processed by downstream operators
• Operations (similar to Spark's)

2. Heron API: shift to functional style
• Operations (continued); a minimal Streamlet example is sketched below
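A minimal sketch of a processing graph with the Java Streamlet API, assuming the Heron 0.17.x-era package layout; the topology name and the random-integer source are hypothetical:

```java
import com.twitter.heron.streamlet.Builder;
import com.twitter.heron.streamlet.Config;
import com.twitter.heron.streamlet.Runner;

import java.util.concurrent.ThreadLocalRandom;

public class RandomIntsTopology {
  public static void main(String[] args) {
    Builder builder = Builder.newBuilder();
    // Supplier streamlet injecting random integers into the graph
    builder.newSource(() -> ThreadLocalRandom.current().nextInt(1, 100))
        // Functional operations applied by downstream operators
        .map(i -> i * 10)
        .filter(i -> i > 500)
        .log();
    new Runner().run("random-ints", Config.defaultConfig(), builder);
  }
}
```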

3. Heron: topology lifecycle
• Topology lifecycle is managed through Heron's CLI tool
• Stages:
  – Submit the topology to the cluster
  – Activate the topology
  – Restart an active topology, e.g., after updating its configuration
  – Deactivate the topology
  – Kill a topology to completely remove it from the cluster

4. Heron topology: logical and physical plans
• Topology's logical plan: analogous to a database query plan, in that it maps out the basic operations associated with a topology
• Topology's physical plan: determines the "physical" execution logic of a topology, i.e., how topology processes are divided among Heron containers
• Logical and physical plans are automatically created by Heron

5. Heron architecture per topology
• Master-worker architecture
• One Topology Master (TM)
  – Manages the topology throughout its entire lifecycle
• Multiple containers
  – Each container runs multiple Heron Instances, a Stream Manager, and a Metrics Manager
  – A Heron Instance is a process that handles a single task of a spout or bolt
  – Containers communicate with the TM to ensure that the topology forms a fully connected graph

6. Heron architecture per topology

7. Heron architecture per topology
• Stream Manager (SM): routing engine for data streams
  – Each Heron Instance connects to its local SM, while all of the SMs in a given topology connect to one another to form a network
  – Responsible for propagating backpressure

8. Heron: topology submit sequence

9. Heron: self-adaptation
• Dhalion: framework on top of Heron to autonomously reconfigure topologies to meet throughput SLOs, scaling resource consumption up and down as needed
• Phases in Dhalion:
  – Symptom detection (backpressure, skew, …)
  – Diagnosis generation
  – Resolution
• Adaptation actions: parallelism changes

10. Heron environment
• Heron supports deployment on Apache Mesos
• Can also run on Mesos using Apache Aurora as the scheduler, or using a local scheduler

11. Batch processing vs. stream processing
• Batch processing is just a special case of stream processing

12. Batch processing vs. stream processing
• Batched/stateless: scheduled in batches
  – Short-lived tasks (Hadoop, Spark)
  – Distributed streaming over batches (Spark Streaming)
• Dataflow/stateful: scheduled once, run continuously (Storm, Flink, Heron)
  – Long-lived task execution
  – State is kept inside tasks

13. Native vs. non-native streaming

14. Apache Flink
• Distributed data flow processing system
• One common runtime for DSP applications and batch processing applications
  – Batch processing applications run efficiently as special cases of DSP applications
• Integrated with many other projects in the open-source data processing ecosystem
• Derives from the Stratosphere project by TU Berlin, Humboldt University and Hasso Plattner Institute
• Supports a Storm-compatible API

15. Flink: software stack
• Flink is a layered system
• On top: libraries with high-level APIs for different use cases
https://ci.apache.org/projects/flink/flink-docs-release-1.8/

16. Flink: programming model
• Data streams
  – Unbounded, partitioned, immutable sequences of events
• Stream operators
  – Stream transformations that take one or more streams as input and produce one or more output streams as a result

17. DSP and time
• Different notions of time in a DSP application:
  – Processing time: time at which events are observed in the system (local time of the machine executing the operator)
  – Event time: time at which events actually occurred
    • Usually described by a timestamp in the events
  – Ingestion time: time at which an event enters the dataflow at the source operator(s)
See https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101

18. Flink: time
• Flink supports all 3 notions of time
  – Internally, ingestion time is treated similarly to event time
• Event time makes it easy to compute over streams where events arrive out of order, and where events may arrive delayed
• How to measure the progress of event time?
  – Flink uses watermarks

19. Flink: backpressure
• Continuous streaming model with backpressure
  – Flink's streaming runtime provides flow control: slow data sinks backpressure faster sources
  – Flink's UI allows monitoring the backpressure behavior of running jobs
    • Backpressure warning (e.g., High) for an upstream operator

20. Flink: other features
• Highly flexible streaming windows
  – Also user-defined windows
• Exactly-once semantics for stateful computations
  – Based on two-phase commit

21. Flink: levels of abstraction
• Different levels of abstraction to develop streaming/batch applications
• APIs in Java and Scala

22. Flink: APIs and libraries
• Streaming data applications: DataStream API
  – Supports functional transformations on data streams, with user-defined state and flexible windows
  – Example: compute a sliding histogram of word occurrences over a data stream of texts (WindowWordCount in Flink's DataStream API), with a sliding time window of 5 sec length and 1 sec trigger interval; see the sketch below
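A minimal sketch of the sliding histogram in the Java DataStream API (Flink 1.8-era); `text` is assumed to be an existing `DataStream<String>`, and `Tokenizer` an assumed `FlatMapFunction` emitting `(word, 1)` tuples:

```java
DataStream<Tuple2<String, Integer>> histogram = text
    .flatMap(new Tokenizer())                      // assumed splitter: (word, 1) pairs
    .keyBy(0)                                      // group by word
    .timeWindow(Time.seconds(5), Time.seconds(1))  // 5 sec length, 1 sec slide
    .sum(1);                                       // count per word and window
```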

23. Flink: APIs and libraries
• Batch processing applications: DataSet API
  – Supports a wide range of data types beyond key/value pairs, and a wealth of operators
  – Example: core loop of the PageRank algorithm for graphs (sketched below)
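A sketch of the PageRank core loop with bulk iterations in the DataSet API, loosely following the example shipped with Flink; `JoinVertexWithEdgesMatch` and `Dampener` are assumed user-defined functions, and `initialRanks`, `adjacencyLists`, `maxIterations`, `DAMPING_FACTOR` and `numPages` are assumed to be defined:

```java
IterativeDataSet<Tuple2<Long, Double>> iteration = initialRanks.iterate(maxIterations);

DataSet<Tuple2<Long, Double>> newRanks = iteration
    // distribute each page's rank over its outgoing links
    .join(adjacencyLists).where(0).equalTo(0)
    .flatMap(new JoinVertexWithEdgesMatch())
    // sum the partial ranks received by each page
    .groupBy(0).aggregate(Aggregations.SUM, 1)
    // apply the dampening factor
    .map(new Dampener(DAMPING_FACTOR, numPages));

DataSet<Tuple2<Long, Double>> finalRanks = iteration.closeWith(newRanks);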

24. Anatomy of a Flink program
• Let's analyze the DataStream API
https://ci.apache.org/projects/flink/flink-docs-stable/dev/datastream_api.html
• Each Flink program consists of the same basic parts:
1. Obtain an execution environment
2. Load/create the initial data

25. Anatomy of a Flink program
3. Specify transformations on the data
4. Specify where to put the results of your computations
5. Trigger the program execution
A minimal skeleton covering the five parts is sketched below.
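A minimal skeleton walking through the five parts, assuming a hypothetical socket source on localhost:9999:

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class Skeleton {
  public static void main(String[] args) throws Exception {
    // 1. Obtain an execution environment
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // 2. Load/create the initial data (hypothetical socket source)
    DataStream<String> text = env.socketTextStream("localhost", 9999);
    // 3. Specify transformations on the data
    DataStream<Integer> lengths = text.map(new MapFunction<String, Integer>() {
      @Override
      public Integer map(String line) { return line.length(); }
    });
    // 4. Specify where to put the results
    lengths.print();
    // 5. Trigger the program execution
    env.execute("Skeleton job");
  }
}
```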

26. Flink: lazy evaluation
• All Flink programs are executed lazily
  – When the program's main method runs, the data loading and transformations do not happen directly
  – Rather, each operation is created and added to the program's plan
  – Operations are actually executed only when execution is explicitly triggered by the execute() call on the execution environment

27. Flink: data sources
• Several predefined stream sources are accessible from the StreamExecutionEnvironment (examples below):
1. File-based
  – E.g., readTextFile(path) to read text files
  – Flink splits the file reading process into two sub-tasks: directory monitoring and data reading
    • Monitoring is implemented by a single, non-parallel task, while reading is performed by multiple tasks running in parallel, with parallelism equal to the job parallelism
2. Socket-based
3. Collection-based
4. Custom
  – E.g., to read from Kafka: addSource(new FlinkKafkaConsumer08<>(...))
  – See Apache Bahir for streaming connectors and SQL data sources: https://bahir.apache.org/
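A sketch of the four kinds of sources; hosts, ports, paths and topic names are placeholders, and the Kafka source assumes the flink-connector-kafka-0.8 dependency on the classpath:

```java
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// 1. File-based
DataStream<String> lines = env.readTextFile("hdfs:///logs/input.txt");

// 2. Socket-based
DataStream<String> fromSocket = env.socketTextStream("localhost", 9999);

// 3. Collection-based
DataStream<Integer> fromCollection = env.fromElements(1, 2, 3, 4, 5);

// 4. Custom: Kafka consumer (0.8 connector also needs ZooKeeper)
Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");
props.setProperty("zookeeper.connect", "localhost:2181");
props.setProperty("group.id", "test");
DataStream<String> fromKafka =
    env.addSource(new FlinkKafkaConsumer08<>("topic", new SimpleStringSchema(), props));
```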

28. Flink: DataStream transformations
• Map: DataStream → DataStream
  – Example: double the values of the input stream (see sketch below)
• FlatMap: DataStream → DataStream
  – Example: split sentences into words (see sketch below)
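Sketches of both examples; `intStream` and `sentences` are assumed existing streams:

```java
// Map: double each value of an integer stream
DataStream<Integer> doubled = intStream.map(new MapFunction<Integer, Integer>() {
  @Override
  public Integer map(Integer value) { return 2 * value; }
});

// FlatMap: split each sentence into words
DataStream<String> words = sentences.flatMap(new FlatMapFunction<String, String>() {
  @Override
  public void flatMap(String sentence, Collector<String> out) {
    for (String word : sentence.split(" ")) {
      out.collect(word);
    }
  }
});
```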

29. Flink: DataStream transformations
• Filter: DataStream → DataStream
  – Example: filter out zero values
• KeyBy: DataStream → KeyedStream
  – Specifies a key that logically partitions a stream into disjoint partitions
  – Internally implemented with hash partitioning
  – Different ways to specify keys; the simplest is grouping tuples on one or more fields of the tuple (examples below)
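Sketches of both transformations, using the Flink 1.8-era field-position key specifiers; `intStream` and `tuples` are assumed existing streams:

```java
// Filter: forward only the non-zero values
DataStream<Integer> nonZero = intStream.filter(new FilterFunction<Integer>() {
  @Override
  public boolean filter(Integer value) { return value != 0; }
});

// KeyBy: logically partition the stream by tuple fields
KeyedStream<Tuple2<String, Integer>, Tuple> byField = tuples.keyBy(0);   // first field
KeyedStream<Tuple2<String, Integer>, Tuple> byTwo  = tuples.keyBy(0, 1); // composite key
```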

30. Flink: DataStream transformations
• Reduce: KeyedStream → DataStream
  – "Rolling" reduce on a keyed data stream
  – Combines the current element with the last reduced value and emits the new value
  – Example: create a stream of partial sums (sketched below)
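A sketch of the partial-sums example; `keyed` is an assumed `KeyedStream<Tuple2<String, Integer>, Tuple>`:

```java
DataStream<Tuple2<String, Integer>> partialSums = keyed
    .reduce(new ReduceFunction<Tuple2<String, Integer>>() {
      @Override
      public Tuple2<String, Integer> reduce(Tuple2<String, Integer> last,
                                            Tuple2<String, Integer> current) {
        // combine the last reduced value with the current element
        return new Tuple2<>(last.f0, last.f1 + current.f1);
      }
    });
```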

31. Flink: DataStream transformations
• Fold: KeyedStream → DataStream
  – "Rolling" fold on a keyed data stream, with an initial value
  – Combines the current element with the last folded value and emits the new value
  – Example: emit the sequence "start-1", "start-1-2", "start-1-2-3", … when applied to the sequence (1,2,3,4,5); sketched below
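A sketch of the fold example; `keyedInts` is an assumed keyed stream of integers (fold was still available, though deprecated, in Flink 1.8):

```java
DataStream<String> folded = keyedInts.fold("start", new FoldFunction<Integer, String>() {
  @Override
  public String fold(String current, Integer value) {
    // emits "start-1", "start-1-2", "start-1-2-3", ... on the sequence (1,2,3,4,5)
    return current + "-" + value;
  }
});
```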

32. Flink: DataStream transformations
• Aggregations: KeyedStream → DataStream
  – Aggregate on a keyed data stream
  – min returns the minimum value, whereas minBy returns the element that has the minimum value in the given field
• Window: KeyedStream → WindowedStream

33. Flink: DataStream transformations
• Other transformations available in Flink:
  – Join: joins two data streams on a given key
  – Union: union of two or more data streams, creating a new stream containing all the elements from all the streams
  – Split: splits the stream into two or more streams according to some criterion
  – Iterate: creates a "feedback" loop in the flow, by redirecting the output of one operator to some previous operator
    • Useful for algorithms that continuously update a model
See https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/operators/

34. Example: streaming window WordCount
• Count the words coming from a web socket in 5 sec windows, keying by the first element of a tuple (full sketch after the next slide)

35. Example: streaming window WordCount
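A minimal sketch of the socket WordCount, assuming a text source on localhost:9999 and 5 sec tumbling windows:

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class WindowWordCount {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    DataStream<Tuple2<String, Integer>> counts = env
        .socketTextStream("localhost", 9999)
        .flatMap(new Splitter())
        .keyBy(0)                       // key by the first element of the tuple
        .timeWindow(Time.seconds(5))
        .sum(1);

    counts.print();
    env.execute("Window WordCount");
  }

  public static class Splitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
    @Override
    public void flatMap(String sentence, Collector<Tuple2<String, Integer>> out) {
      for (String word : sentence.split(" ")) {
        out.collect(new Tuple2<>(word, 1));
      }
    }
  }
}
```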

36. Flink: windows support
• Windows can be applied either to keyed streams or to non-keyed ones
• General structure of a windowed Flink program (sketched below)
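A sketch of the general structure; `input` is an assumed tuple stream, the assigner and lateness values are placeholders (the window assigner is required, while trigger/lateness are optional and default to the assigner's settings):

```java
// Keyed windows: the computation can run in parallel per key
DataStream<Tuple2<String, Integer>> keyedResult = input
    .keyBy(0)
    .window(TumblingEventTimeWindows.of(Time.minutes(1))) // window assigner (required)
    .allowedLateness(Time.seconds(30))                    // optional
    .sum(1);                                              // window function (required)

// Non-keyed windows: windowAll() instead of keyBy().window()
DataStream<Tuple2<String, Integer>> globalResult = input
    .windowAll(TumblingEventTimeWindows.of(Time.minutes(1)))
    .sum(1);
```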

37. Flink: window lifecycle
• First, specify whether the stream is keyed or not, and define the window assigner
  – A keyed stream allows the windowed computation to be performed in parallel by multiple tasks
  – The window is completely removed when the time (event or processing time) passes its end timestamp plus the user-specified allowed lateness
• Then associate the trigger and the function with the window
  – The trigger determines when a window is ready to be processed by the window function
  – The function specifies the computation to be applied to the window contents

38. Flink: window assigners
• Define how elements are assigned to windows
• Support for different window assigners
  – Each WindowAssigner comes with a default Trigger
• Built-in assigners for the most common use cases:
  – Tumbling windows
  – Sliding windows
  – Session windows
  – Global windows
• Except for global windows, they assign elements to windows based on time, which can be either processing time or event time
• It is also possible to implement a custom window assigner

39. Flink: window assigners
• Session windows
  – Group elements by sessions of activity
  – Differently from tumbling and sliding windows, they do not overlap and do not have a fixed start and end time
  – A session window closes when a gap of inactivity occurs
• Global windows
  – Assign all elements with the same key to the same single global window
  – Only useful if you also specify a custom trigger

40. Flink: window functions
• Different window functions specify the computation on each window
• ReduceFunction
  – Incrementally aggregates the elements of a window
  – Example: sum up the second field of the tuples for all elements in a window (sketched below)
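A sketch of the windowed sum; `input` is an assumed `DataStream<Tuple2<String, Long>>` and the 5 sec window size is a placeholder:

```java
DataStream<Tuple2<String, Long>> sums = input
    .keyBy(0)
    .timeWindow(Time.seconds(5))
    .reduce(new ReduceFunction<Tuple2<String, Long>>() {
      @Override
      public Tuple2<String, Long> reduce(Tuple2<String, Long> a, Tuple2<String, Long> b) {
        // sum the second fields, keep the key
        return new Tuple2<>(a.f0, a.f1 + b.f1);
      }
    });
```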

41. Flink: window functions
• AggregateFunction: generalized version of ReduceFunction
  – Example: compute the average of the second field of the elements in the window (sketched below)
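A sketch of the average computation, following the pattern in the Flink documentation; the accumulator keeps a running (sum, count) pair:

```java
public class AverageAggregate
    implements AggregateFunction<Tuple2<String, Long>, Tuple2<Long, Long>, Double> {
  @Override
  public Tuple2<Long, Long> createAccumulator() {
    return new Tuple2<>(0L, 0L);                       // (sum, count)
  }
  @Override
  public Tuple2<Long, Long> add(Tuple2<String, Long> value, Tuple2<Long, Long> acc) {
    return new Tuple2<>(acc.f0 + value.f1, acc.f1 + 1L);
  }
  @Override
  public Double getResult(Tuple2<Long, Long> acc) {
    return ((double) acc.f0) / acc.f1;                 // average = sum / count
  }
  @Override
  public Tuple2<Long, Long> merge(Tuple2<Long, Long> a, Tuple2<Long, Long> b) {
    return new Tuple2<>(a.f0 + b.f0, a.f1 + b.f1);
  }
}

// usage: input.keyBy(0).timeWindow(Time.seconds(5)).aggregate(new AverageAggregate());
```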

42. Flink: window functions
• FoldFunction: specifies how an input element of the window is combined with an element of the output type
• ProcessWindowFunction: gets an Iterable containing all the elements of the window, and a Context object with access to time and state information
  – More flexibility than the other window functions, at the cost of performance and resource consumption: elements are buffered until the window is ready for processing
  – ReduceFunction and AggregateFunction can be executed more efficiently, since Flink can incrementally aggregate the elements for each window as they arrive

43. Flink: control events
• Control events: special events injected into the data stream by operators
• Two types of control events in Flink:
  – Watermarks
  – Checkpoint barriers

44. Flink: watermarks
• Watermarks signal the progress of event time within a data stream
  – Watermark(t) declares that event time has reached time t in that stream, meaning that there should be no more elements with timestamp t' <= t
  – Crucial for out-of-order streams, where events are not ordered by their timestamps
  – A sketch of assigning timestamps and watermarks is shown below
• Flink does not provide ordering guarantees after any form of stream partitioning or broadcasting
  – In that case, dealing with out-of-order tuples is left to the operator implementation
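A sketch of extracting event timestamps and generating watermarks with a fixed out-of-orderness bound, using Flink 1.8's built-in extractor; `MyEvent` and `getCreationTime()` are hypothetical:

```java
// Watermarks lag the largest observed timestamp by a fixed bound of 10 s,
// i.e., events up to 10 s late are still considered on time
DataStream<MyEvent> withTimestamps = events.assignTimestampsAndWatermarks(
    new BoundedOutOfOrdernessTimestampExtractor<MyEvent>(Time.seconds(10)) {
      @Override
      public long extractTimestamp(MyEvent event) {
        return event.getCreationTime();  // event-time timestamp carried by the record
      }
    });
```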

45. Flink: checkpoint barriers
• To provide fault tolerance (see the next slides), special barrier markers (called checkpoint barriers) are periodically injected at stream sources and then pushed downstream up to the sinks

46. Fault tolerance
• To provide consistent results, DSP systems need to be resilient to failures
• How? By periodically capturing a snapshot of the execution graph, which can be used later to restart in case of failures (checkpointing)
  – Snapshot: global state of the execution graph, capturing all the information necessary to restart the computation from that specific execution state
• The common approach is to rely on periodic global state snapshots, but it has drawbacks:
  – Stalls the overall computation
  – Eagerly persists all tuples in transit along with the operator states, which results in larger snapshots than required

47. Flink: fault tolerance
• Flink offers a lightweight snapshotting mechanism
  – Allows maintaining high throughput while providing strong consistency guarantees at the same time
• This mechanism:
  – Draws consistent snapshots of the stream flows and of the operators' state
  – Even in the presence of failures, the application state will reflect every record from the data stream exactly once
  – State is stored at a configurable place
  – Disabled by default
• Inspired by the Chandy-Lamport algorithm for distributed snapshots, and tailored to Flink's execution model

48. Chandy-Lamport algorithm
• The observer process (the process initiating the snapshot):
  – Saves its own local state
  – Sends a snapshot request message bearing a snapshot token to all other processes
• If a process receives the token for the first time:
  – Sends the observer process its own saved state
  – Attaches the snapshot token to all subsequent messages (to help propagate the snapshot token)
• When a process that has already received the token receives a message not bearing the token, it forwards that message to the observer process
  – This message was sent before the snapshot "cut off" (as it does not bear the snapshot token) and needs to be included in the snapshot
• The observer builds up a complete snapshot: a saved state for each process, plus all messages "in the ether"

49. Flink: fault tolerance
• Uses checkpoint barriers
  – When an operator has received a barrier for snapshot n from all of its input streams, it emits a barrier for snapshot n into all of its outgoing streams
  – Once a sink operator has received barrier n from all of its input streams, it acknowledges snapshot n to the checkpoint coordinator
  – After all sinks have acknowledged a snapshot, it is considered completed
https://ci.apache.org/projects/flink/flink-docs-stable/internals/stream_checkpointing.html

50. Flink: performance and memory management
• High throughput and low latency
• Memory management
  – Flink implements its own memory management inside the JVM

51. Flink: architecture
• The usual master-worker architecture

52. Flink: architecture
• Master (JobManager): schedules tasks, coordinates checkpoints, coordinates recovery from failures, etc.
• Workers (TaskManagers): JVM processes that execute the tasks of a dataflow, and buffer and exchange the data streams
  – Workers use task slots to control the number of tasks they accept (at least one)
  – Each task slot represents a fixed subset of the worker's resources

53. Flink: application execution
• The JobManager receives the JobGraph
  – A representation of the data flow, consisting of operators (JobVertex) and intermediate results (IntermediateDataSet)
  – Each operator has properties, like its parallelism and the code that it executes
• The JobManager transforms the JobGraph into an ExecutionGraph
  – The parallel version of the JobGraph

54. Flink: application execution
• Data parallelism
  – Different operators of the same program may have different levels of parallelism
  – The parallelism of an individual operator, data source, or data sink can be defined by calling its setParallelism() method (example below)
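A sketch of per-operator parallelism settings; the socket source is hypothetical and the parallelism values are placeholders:

```java
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(4);  // default parallelism for all operators of this job

DataStream<String> shouted = env
    .socketTextStream("localhost", 9999)          // sources can override it as well
    .map(s -> s.toUpperCase()).setParallelism(8); // this map runs with 8 parallel tasks

shouted.print().setParallelism(1);                // sink with its own parallelism
```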

55. Flink: application execution
• The execution plan can be visualized

56. Flink: application monitoring
• Flink has a built-in monitoring and metrics system
• Built-in metrics include:
  – Throughput: number of records per second (per operator/task)
  – Latency
    • Support for latency tracking: special markers are periodically inserted at all sources in order to obtain a distribution of the latency between the sources and each downstream operator
    • But they do not account for the time spent in operator processing
    • And they assume that all machine clocks are in sync
  – Used JVM heap/non-heap/direct memory
• Application-specific metrics can be added
  – E.g., counters for the number of invalid records (sketched below)
• All metrics can be either queried via Flink's REST API or sent to external systems (e.g., Graphite and InfluxDB)
See https://flink.apache.org/news/2019/02/25/monitoring-best-practices.html
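A sketch of an application-specific counter registered through Flink's metrics API; the emptiness test stands in for a real validity check:

```java
import org.apache.flink.api.common.functions.RichFilterFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

// Counts records that fail a validity check and drops them from the stream
public class InvalidRecordFilter extends RichFilterFunction<String> {
  private transient Counter invalidRecords;

  @Override
  public void open(Configuration parameters) {
    // register the counter with the operator's metric group
    invalidRecords = getRuntimeContext().getMetricGroup().counter("invalidRecords");
  }

  @Override
  public boolean filter(String record) {
    if (record.isEmpty()) {   // stand-in validity check
      invalidRecords.inc();
      return false;
    }
    return true;
  }
}
```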

57. Flink: deployment
• Designed to run on large-scale clusters with many thousands of nodes
• Can run in a fully distributed fashion on a static (but possibly heterogeneous) standalone cluster
• For a dynamically shared cluster, can be deployed on YARN or Mesos
• Docker images for Apache Flink are available on Docker Hub

58. A recent need
• A common need for many companies: run both batch and stream processing
• Alternative solutions:
1. Lambda architecture
2. Unified frameworks
3. Unified programming model

59. Lambda architecture
• Data-processing design pattern to integrate batch and real-time processing
• A streaming framework processes real-time events and, in parallel, a batch framework processes the entire dataset
• Results from the two parallel pipelines are then merged
Source: https://voltdb.com/products/alternatives/lambda-architecture

60. Lambda architecture: example
• Lambda architecture used at LinkedIn before the development of Samza

61. Lambda architecture: pros and cons
• Pros:
  – Flexibility in the choice of frameworks
• Cons:
  – Implementing and maintaining two separate frameworks for batch and stream processing can be hard and error-prone
  – Overhead of developing and managing multiple source codes
    • The logic in each fork evolves over time, and keeping the two in sync involves duplicated and complex manual effort, often with different languages

62. Unified frameworks
• Use a unified (Lambda-less) design for processing both real-time and batch data using the same data structures
• Spark, Flink, Samza and Apex follow this trend

63. Unified programming model: Apache Beam
• A new layer of abstraction
• Provides an advanced unified programming model
  – Allows defining batch and streaming data processing pipelines that run on any supported execution engine (for now: Apex, Flink, Spark, Google Cloud Dataflow)
  – Java, Python and Go as programming languages
• Translates the data processing pipeline defined by the user with the Beam program into the API compatible with the chosen distributed processing engine (see the sketch below)
• Developed by Google and released as an open-source top-level Apache project
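A minimal Beam pipeline in the Java SDK, along the lines of Beam's MinimalWordCount example; the input/output paths are placeholders, and the runner (Flink, Spark, Dataflow, …) is chosen via the pipeline options rather than in the code:

```java
import java.util.Arrays;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.FlatMapElements;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

public class MinimalWordCount {
  public static void main(String[] args) {
    // The execution engine is selected here, e.g. --runner=FlinkRunner
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline p = Pipeline.create(options);

    p.apply(TextIO.read().from("input.txt"))
     .apply(FlatMapElements.into(TypeDescriptors.strings())
         .via(line -> Arrays.asList(line.split("\\s+"))))
     .apply(Count.perElement())
     .apply(MapElements.into(TypeDescriptors.strings())
         .via(kv -> kv.getKey() + ": " + kv.getValue()))
     .apply(TextIO.write().to("wordcounts"));

    p.run().waitUntilFinish();
  }
}
```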

64. Apache Samza
• A distributed framework for stateful and fault-tolerant stream processing
  – Unified framework for batch and stream processing
• Similarly to Flink, streams are first-class citizens and batch is a special case of streaming
• Used in production at LinkedIn

65. Apache Samza
• Why stateful and fault-tolerant processing? User profiles, email digests, aggregate counts, …
• Example: Email Digestion System at LinkedIn
  – Production application that digests updates into one email

66. Samza: features
• Unified processing API for stream and batch
  – Supports both stateless and stateful stream processing
  – Supports both processing time and event time
• Configurable and heterogeneous data sources and sinks (e.g., Kafka, HDFS, AWS Kinesis)
• At-least-once processing
• Efficient state management
  – Local state (in-memory or on disk), partitioned among tasks (rather than a remote data store)
  – Incremental checkpointing: only the delta rather than the entire state
• Flexible deployment
  – As a lightweight embedded library that can be integrated with a larger application
  – Alternatively, as a managed framework using YARN

67. Samza: architecture
• Task: logical unit of parallelism
• Container: physical unit of parallelism
• The usual architecture
  – The coordinator manages the assignment of tasks across containers, monitors the liveness of containers, and redistributes the tasks upon a failure
  – One coordinator per application
  – Host affinity: during a new deployment, Samza tries to preserve the assignment of tasks to hosts, so that each task can re-use the snapshot of its local state

68. DSP state management
• How to manage state information, i.e., the "intermediate information" that needs to be maintained between tuples to correctly process streams of data?
• A common approach (e.g., in Storm) to deal with large amounts of state: use a remote data store (e.g., Redis)

69. Samza: state management
• Samza's approach: keep state local to each node (rather than in an external store) and make it robust to failures by replicating state changes across multiple machines

70. Samza: High Level Streams API
• Samza offers multiple APIs
  – High Level Streams API, Low Level Task API, Samza SQL
  – High Level Streams API: includes common stream processing operations such as filter, partition, join, and windowing
  – Example: a Wikipedia stream application that consumes events from Wikipedia and produces statistics to a Kafka topic (sketched below)
https://samza.apache.org/learn/tutorials/latest/hello-samza-high-level-code.html
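A minimal sketch of a High Level Streams API application, loosely following the hello-samza tutorial and assuming the Samza 1.x descriptor-based API; stream names, the system name, and the filter/map logic are hypothetical:

```java
import org.apache.samza.application.StreamApplication;
import org.apache.samza.application.descriptors.StreamApplicationDescriptor;
import org.apache.samza.serializers.StringSerde;
import org.apache.samza.system.kafka.descriptors.KafkaInputDescriptor;
import org.apache.samza.system.kafka.descriptors.KafkaOutputDescriptor;
import org.apache.samza.system.kafka.descriptors.KafkaSystemDescriptor;

public class WikipediaStatsApp implements StreamApplication {
  @Override
  public void describe(StreamApplicationDescriptor app) {
    KafkaSystemDescriptor kafka = new KafkaSystemDescriptor("kafka");
    KafkaInputDescriptor<String> input =
        kafka.getInputDescriptor("wikipedia-events", new StringSerde());
    KafkaOutputDescriptor<String> output =
        kafka.getOutputDescriptor("wikipedia-stats", new StringSerde());

    app.getInputStream(input)
        .filter(event -> event.contains("en.wikipedia"))  // keep English edits only
        .map(event -> "edit: " + event)                   // stand-in "stats" computation
        .sendTo(app.getOutputStream(output));             // produce to the Kafka topic
  }
}
```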

71. Towards strict delivery guarantees
• Most frameworks provide at-least-once delivery guarantees (e.g., Storm, Samza)
  – For stateful non-idempotent operators such as counting, at-least-once delivery can give incorrect results
• Flink, Storm with Trident, and Google's MillWheel offer stronger (i.e., exactly-once) delivery guarantees
  – Exactly-once low-latency stream processing in MillWheel works as follows:
    • The record is checked against de-duplication data from previous deliveries; duplicates are discarded
    • User code is run for the input record, possibly resulting in pending changes to timers, state, and productions
    • Pending changes are committed to the backing store
    • Senders are acked
    • Pending downstream productions are sent

72. Comparing DSP frameworks
• Let's compare open-source DSP frameworks according to some features:

Framework | API | Windows | Delivery semantics | Fault tolerance | State mgmt. | Flow control | Operator elasticity
Storm | Low-level, high-level, SQL; no batch | Yes | At least once; exactly once with Trident | Acking; checkpointing with Trident (similar to Flink) | Limited; yes with Trident | Backpressure | No
Heron | Low-level, high-level, SQL; no batch | Yes | At least once; effectively once | Limited | No | Backpressure | Yes, with Dhalion
Flink | High-level, SQL; also batch | Yes, also user-defined | At least once; exactly once | Checkpointing | Yes | Backpressure | No
Samza | Low-level, high-level, SQL; unified | Yes | At least once | Incremental checkpointing | Yes | No | No

73. DSP in the Cloud
• Data streaming systems are also offered as Cloud services:
  – Amazon Kinesis Data Streams
  – Google Cloud Dataflow
  – IBM Streaming Analytics
  – Microsoft Azure Stream Analytics
• They abstract the underlying infrastructure and support dynamic scaling of computing resources
• They appear to execute in a single data center (i.e., no geo-distribution)

74. Google Cloud Dataflow
• Fully-managed data processing service, supporting both stream and batch data processing
  – Automated resource management
  – Dynamic work rebalancing
  – Horizontal auto-scaling
• Provides a unified programming model based on Apache Beam
  – Apache Beam SDKs in Java and Python
  – Enables developers to implement custom extensions and to choose other execution engines
• Provides exactly-once processing
  – MillWheel is Google's internal version of Cloud Dataflow

75. Google Cloud Dataflow
• Can be seamlessly integrated with GCP services for streaming event ingestion (Cloud Pub/Sub), data warehousing (BigQuery), and machine learning (Cloud Machine Learning)

76. Amazon Kinesis Data Streams
• Allows collecting and ingesting streaming data at scale for real-time analytics

77. Kinesis Data Analytics
• Allows processing data streams in real time with SQL or Java
  – The Java open-source libraries are based on Apache Flink
• The usual operators to filter, aggregate, and transform streaming data
  – Per-hour pricing based on the number of Kinesis Processing Units (KPUs) used to run the application
  – Horizontal scaling of KPUs
