The Flink Big Data Analytics Platform
Marton Balassi, Gyula Fora {mbalassi, gyfora}@apache.org
What is Apache Flink?
Open Source
- Started in 2009 by the Berlin-based database research groups
- In the Apache Incubator since mid 2014
- 61 contributors as of the 0.7.0 release
Fast + Reliable
- Fast, general purpose distributed data processing system
- Combining batch and stream processing
- Up to 100x faster than Hadoop
Ready to use
- Programming APIs for Java and Scala
- Tested on large clusters
11/18/2014 2 Flink - M. Balassi & Gy. Fora
What is Apache Flink?
[Figure: Analytical Program, Flink Client & Optimizer, Flink Cluster (Master, Worker, Worker)]
This Talk
- Introduction to Flink
- API overview
- Distinguishing Flink
- Flink from a user perspective
- Performance
- Flink roadmap and closing
Open source Big Data Landscape
Applications: Hive, Mahout, Cascading, Pig, Crunch
Data processing engines: MapReduce, Flink, Spark, Storm, Tez
App and resource management: Yarn, Mesos
Storage, streams: HDFS, HBase, Kafka
…
Flink stack
[Figure: Flink stack, top to bottom]
APIs: Java API (batch), Scala API (batch), Java API (streaming), Python API, Graph API („Spargel“), Apache MRQL
Common API: Flink Optimizer (batch), Streams Builder (streaming)
Hybrid Batch/Streaming Runtime
Execution: Local Execution, Java Collections, Apache Tez
Cluster Manager: Native, YARN, EC2
Storage and streams: HDFS, Files, S3, JDBC, Redis, Rabbit MQ, Kafka, Azure, …
Flink APIs
Programming model
Data abstractions: Data Set, Data Stream
[Figure: Program (DataSet/Stream A, Operator X, DataSet/Stream B, Operator Y, DataSet/Stream C) next to its Parallel Execution over partitions A (1), A (2), B (1), B (2), C (1), C (2) with parallel instances of X and Y]
Flexible pipelines
[Figure: example pipeline built from Source, Map, Reduce, Join, Iterate, and Sink operators]
Map, FlatMap, MapPartition, Filter, Project, Reduce, ReduceGroup, Aggregate, Distinct, Join, CoGroup, Cross, Iterate, Iterate Delta, Iterate-Vertex-Centric, Windowing
WordCount, Java API
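DataSet<String> text = env.readTextFile(input);
DataSet<Tuple2<String, Integer>> result = text
    .flatMap((str, out) -> {
        // emit one (token, 1) pair per token of the line
        for (String token : str.split("\\W")) {
            out.collect(new Tuple2<>(token, 1));
        }
    })
    .groupBy(0)   // group by the token (tuple field 0)
    .sum(1);      // sum the counts (tuple field 1)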
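To see what the flatMap, groupBy(0), and sum(1) steps compute without a cluster, here is a minimal sketch in plain Java with no Flink dependency. The class name WordCountSketch, the "\\W+" separator, and the empty-token filter are assumptions made for illustration; this is not how Flink executes the program.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the WordCount pipeline: tokenize, group by token, sum the ones.
final class WordCountSketch {
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // flatMap step: every token contributes one (token, 1) pair
            for (String token : line.split("\\W+")) {
                if (token.isEmpty()) {
                    continue; // assumption: skip the empty token a leading separator produces
                }
                // groupBy(0) + sum(1): add 1 to the running total of this token
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(List.of("to be or not to be", "be quick")));
    }
}
```

The sorted map is only there to make the printed result stable; a distributed engine would partition the groups across workers instead.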
WordCount, Scala API
val text = env.readTextFile(input)
val words = text flatMap { line => line.split("\\W+") }
val counts = words groupBy { word => word } count()
WordCount, Streaming API
DataStream<String> text = env.readTextFile(input);
DataStream<Tuple2<String, Integer>> result = text
    .flatMap((str, out) -> {
        // emit one (token, 1) pair per token of the line
        for (String token : str.split("\\W")) {
            out.collect(new Tuple2<>(token, 1));
        }
    })
    .groupBy(0)   // group by the token (tuple field 0)
    .sum(1);      // sum the counts (tuple field 1)
Is there anything beyond WordCount?
Beyond Key/Value Pairs
DataSet<Page> pages = ...;
DataSet<Impression> impressions = ...;

DataSet<Impression> aggregated = impressions
    .groupBy("url")
    .sum("count");

pages.join(impressions).where("url").equalTo("url")
    .print(); // outputs pairs of matching pages and impressions

class Impression {
    public String url;
    public long count;
}

class Page {
    public String url;
    public String topic;
}
Preview: Logical Types
DataSet<Row> dates = env.readCsv(...).as("order_id", "date");
DataSet<Row> sessions = env.readCsv(...).as("id", "session");

DataSet<Row> joined = dates
    .join(sessions).where("order_id").equalTo("id");

joined.groupBy("date").reduceGroup(new SessionFilter());

class SessionFilter implements GroupReduceFunction<SessionType> {
    public void reduce(Iterable<SessionType> value, Collector out) {
        ...
    }
}

public class SessionType {
    public String order_id;
    public Date date;
    public String session;
}
Distinguishing Flink
Hybrid batch/streaming runtime
- Batch and stream processing in the same system
- No micro-batches, unified runtime
- Competitive performance
- Code reusable from batch processing to streaming, making development and testing a piece of cake
Flink Streaming
- Most Data Set operators are also available for Data Streams
- Temporal and streaming specific operators
– Window/mini-batch operators
– Window join, cross etc.
- Support for iterative stream processing
- Connectors for different data sources
– Kafka, Flume, RabbitMQ, Twitter etc.
Flink Streaming
// Build a new model on every second of new data
DataStream<Double[]> model = env
    .addSource(new TrainingDataSource())
    .window(1000)
    .reduceGroup(new ModelBuilder());

// Predict new data using the most up-to-date model
DataStream<Integer> prediction = env
    .addSource(new NewDataSource())
    .connect(model)
    .map(new Predictor());
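The window(1000) step groups incoming records into one-second batches before reducing each batch. As a rough illustration, the plain-Java sketch below assigns timestamped values to fixed one-second windows and reduces each window to its mean. The slide does not say how Flink assigns records to windows, so the timestamp-based assignment, the mean as a stand-in for ModelBuilder, and the class name TumblingWindowSketch are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of window-then-reduce; not the Flink Streaming API.
final class TumblingWindowSketch {
    // Assigns each timestamp (milliseconds) to the window starting at
    // floor(ts / sizeMs) * sizeMs and returns the mean value of every window.
    static Map<Long, Double> meanPerWindow(long[] timestampsMs, double[] values, long sizeMs) {
        Map<Long, List<Double>> windows = new TreeMap<>();
        for (int i = 0; i < timestampsMs.length; i++) {
            long windowStart = Math.floorDiv(timestampsMs[i], sizeMs) * sizeMs;
            windows.computeIfAbsent(windowStart, k -> new ArrayList<>()).add(values[i]);
        }
        // Group-reduce step: collapse the records of one window into a single result.
        Map<Long, Double> result = new TreeMap<>();
        for (Map.Entry<Long, List<Double>> window : windows.entrySet()) {
            double sum = 0;
            for (double v : window.getValue()) {
                sum += v;
            }
            result.put(window.getKey(), sum / window.getValue().size());
        }
        return result;
    }
}
```

Unlike this sketch, a streaming engine emits each window's result as soon as the window closes rather than after all input has been seen.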
Lambda architecture
Source: https://www.mapr.com/developercentral/lambda-architecture
Lambda architecture in Flink
Dependability
[Figure: JVM Heap layout]
– Unmanaged Heap: User Code
– Flink Managed Heap: Hashing/Sorting/Caching
– Network Buffers: Shuffles/Broadcasts
(next version unifies network buffers and managed heap)
- Flink manages its own memory
- Caching and data processing happen in a dedicated memory fraction
- System never breaks the JVM heap, gracefully spills
Operating on Serialized Data
- Hadoop MapReduce serializes data every time
– Highly robust, never gives up on you
- Spark works on objects, RDDs may be stored serialized
– Serialization considered slow, only when needed
- Flink makes serialization really cheap: partial deserialization, operates on serialized form
– Efficient and robust!
Operating on Serialized Data
Microbenchmark
- Sorting 1GB worth of (long, double) tuples
- 67,108,864 elements
- Simple quicksort
Memory Management
public class WC {
    public String word;
    public int count;
}
[Figure: WC objects mapped onto a Pool of Memory Pages, some of them marked "empty page"]
- Works on pages of bytes, maps objects transparently
- Full control over memory, out-of-core enabled
- Algorithms work on binary representation
- Address individual fields (without deserializing the whole object)
- Move memory between operations
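The idea of sorting on the binary representation can be sketched in plain Java with a ByteBuffer standing in for a memory page. The sketch stores (long, double) records back to back, as in the microbenchmark, and sorts them by reading only the 8 key bytes of each record; the double is moved as raw bytes and never decoded. At 16 bytes per record, 1 GB (2^30 bytes) holds 2^30 / 16 = 67,108,864 records, which matches the element count quoted for the microbenchmark. The class name SerializedSortSketch and the index-array approach are assumptions; Flink's actual sorter is not shown in the slides.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration of sorting serialized records; not Flink's sorter.
final class SerializedSortSketch {
    static final int RECORD_BYTES = Long.BYTES + Double.BYTES; // 16 bytes per (long, double)

    // Serializes the records back to back into one buffer, like a memory page.
    static ByteBuffer write(long[] keys, double[] values) {
        ByteBuffer page = ByteBuffer.allocate(keys.length * RECORD_BYTES);
        for (int i = 0; i < keys.length; i++) {
            page.putLong(keys[i]).putDouble(values[i]);
        }
        return page;
    }

    // Returns a new buffer holding the same records ordered by key.
    static ByteBuffer sortByKey(ByteBuffer page) {
        int n = page.capacity() / RECORD_BYTES;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Compare records by reading just the key field at its byte offset.
        Arrays.sort(order, Comparator.comparingLong((Integer i) -> page.getLong(i * RECORD_BYTES)));

        // Copy whole records in sorted order as raw bytes; the value is never deserialized.
        ByteBuffer view = page.duplicate();
        ByteBuffer sorted = ByteBuffer.allocate(page.capacity());
        byte[] record = new byte[RECORD_BYTES];
        for (int index : order) {
            view.position(index * RECORD_BYTES);
            view.get(record);
            sorted.put(record);
        }
        return sorted;
    }
}
```

Records in the result are read back with absolute offsets, for example sorted.getLong(i * RECORD_BYTES) for the key of record i.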
Flink from a user perspective
Flink programs run everywhere
Cluster (Batch), Cluster (Streaming), Local Debugging
Flink Runtime or Apache Tez, As Java Collection Programs
Embedded (e.g., Web Container)
Migrate Easily
Flink out-of-the-box supports
- Hadoop data types (writables)
- Hadoop Input/Output Formats
- Hadoop functions and object model
[Figure: a Hadoop job (Input, Map, Reduce, Output) shown next to a Flink program in which DataSets connect Input, Map, Reduce, Join, and Output]
Little tuning or configuration required
- Requires no memory thresholds to configure
– Flink manages its own memory
- Requires no complicated network configs
– Pipelining engine requires much less memory for data exchange
- Requires no serializers to be configured
– Flink handles its own type extraction and data representation
- Programs can be adjusted to data automatically
– Flink’s optimizer can choose execution strategies automatically
Understanding Programs
Visualizes the operations and the data movement of programs
Analyze after execution
Screenshot from Flink’s plan visualizer
Understanding Programs
Analyze after execution (times, stragglers, …)
Iterations in other systems
[Figure: a Client launches a sequence of Step operations, one per iteration]
Loop outside the system
Iterations in Flink
Streaming dataflow with feedback
[Figure: dataflow of map, join, reduce, and join operators with a feedback edge]
System is iteration-aware, performs automatic optimization
Automatic Optimization for Iterative Programs
- Caching Loop-invariant Data
- Pushing work „out of the loop“
- Maintain state as index
Performance
Distributed Grep
[Figure: HDFS feeding Filter Term 1, Filter Term 2, and Filter Term 3, each producing Matches]
- 1 TB of data (log files)
- 24 machines with
– 32 GB Memory
– Regular HDDs
- HDFS 2.4.0
- Flink 0.7-incubating-SNAPSHOT
- Spark 1.2.0-SNAPSHOT
Flink up to 2.5x faster
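The benchmark code itself is not shown in the slides. As a minimal sketch of the underlying idea, the plain-Java example below scans the input once and tests every line against all filter terms, so adding a filter does not add another pass over the data. The class name MultiFilterGrepSketch and the substring match are assumptions for illustration, not the benchmark's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-pass grep over several terms; not the benchmark code.
final class MultiFilterGrepSketch {
    // Returns one list of matching lines per term, in the order of the terms.
    static List<List<String>> grep(Iterable<String> lines, List<String> terms) {
        List<List<String>> matches = new ArrayList<>();
        for (int t = 0; t < terms.size(); t++) {
            matches.add(new ArrayList<>());
        }
        // The input is read exactly once; each line is offered to every filter.
        for (String line : lines) {
            for (int t = 0; t < terms.size(); t++) {
                if (line.contains(terms.get(t))) {
                    matches.get(t).add(line);
                }
            }
        }
        return matches;
    }
}
```

Running three separate filter jobs over the same input would instead read the input three times unless it is cached between jobs.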
Distributed Grep: Flink Execution
Distributed Grep: Spark Execution
Spark executes the job in 3 stages:
- Stage 1: HDFS → Filter Term 1 → Matches
- Stage 2: HDFS → Filter Term 2 → Matches
- Stage 3: HDFS → Filter Term 3 → Matches
Spark in-memory pinning
[Figure: HDFS → In-Memory RDD → Filter Term 1, Filter Term 2, Filter Term 3 → Matches]
Cache in-memory to avoid reading the data for each filter
JavaSparkContext sc = new JavaSparkContext(conf);
JavaRDD<String> file = sc.textFile(inFile).persist(StorageLevel.MEMORY_AND_DISK());
for (int p = 0; p < patterns.length; p++) {
    final String pattern = patterns[p];
    JavaRDD<String> res = file.filter(new Function<String, Boolean>() { … });
}
Spark in-memory pinning
[Figure annotations: 37 minutes; 9 minutes; RDD is 100% in-memory; Spark starts spilling RDD to disk]
PageRank
- Dataset:
– Twitter Follower Graph
– 41,652,230 vertices (users)
– 1,468,365,182 edges (followings)
– 12 GB input data
PageRank results
Why is there a difference?
- Let's have a look at the iteration times:
Flink average: 48 sec., Spark average: 99 sec.
PageRank on Flink with Delta Iterations
- The algorithm runs 60 iterations until convergence (runtime includes convergence check)
[Figure annotations: 100 minutes; 8.5 minutes]
Again, explaining the difference
On average, a (delta) iteration runs for 6.7 seconds
Flink (Bulk): 48 sec. Spark (Bulk): 99 sec.
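Why a delta iteration can be so much cheaper than a bulk iteration is easiest to see on a small example. The plain-Java sketch below is not PageRank and not Flink's delta iteration API; it swaps in connected components by label propagation, a simpler algorithm with the same structure. It keeps a solution set (the current label of every vertex) and a workset (the vertices whose label changed in the previous step), and each step only touches the workset. The class name DeltaIterationSketch and the workPerStep parameter are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical workset-based iteration; illustrates the idea, not Flink's implementation.
final class DeltaIterationSketch {
    // Labels every vertex with the smallest vertex id in its component.
    // workPerStep receives the workset size of each step.
    static int[] components(int n, int[][] edges, List<Integer> workPerStep) {
        List<List<Integer>> neighbors = new ArrayList<>();
        for (int v = 0; v < n; v++) {
            neighbors.add(new ArrayList<>());
        }
        for (int[] e : edges) {
            neighbors.get(e[0]).add(e[1]);
            neighbors.get(e[1]).add(e[0]);
        }

        int[] label = new int[n];                 // solution set
        Set<Integer> workset = new TreeSet<>();   // vertices changed in the last step
        for (int v = 0; v < n; v++) {
            label[v] = v;
            workset.add(v);
        }

        while (!workset.isEmpty()) {
            workPerStep.add(workset.size());
            int[] previous = label.clone();       // labels as of the start of this step
            Set<Integer> next = new TreeSet<>();
            for (int v : workset) {
                for (int u : neighbors.get(v)) {
                    if (previous[v] < label[u]) { // v offers a smaller label to u
                        label[u] = previous[v];
                        next.add(u);
                    }
                }
            }
            workset = next;                       // only changed vertices stay active
        }
        return label;
    }
}
```

On a path 0-1-2-3 plus an isolated vertex 4, the workset shrinks from 5 vertices to 3, 2, and 1, whereas a bulk iteration would recompute all 5 vertices in every step.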
Streaming performance
Flink Roadmap
- Flink has a major release every 3 months, with >=1 bug-fixing releases in between
- Finer-grained fault tolerance
- Logical (SQL-like) field addressing
- Python API
- Flink Streaming, Lambda architecture support
- Flink on Tez
- ML on Flink (e.g., Mahout DSL)
- Graph DSL on Flink
- … and more