Data At Rest … Data In Motion!
A Lambda Architecture Overview
When Things Go Wrong
http://xkcd.com/327/
Fault Tolerance
Developer, Software, Hardware
Data Collection
Three Types Of Data Streams
Structured (Databases ...)
Semi-Structured (JSON, XML, XAML ...)
Unstructured (Blogs, E-Mails, Log Files ...)
Fault-tolerant against both hardware failures and human errors.
Supports a variety of use cases, including low-latency querying as well as updates.
Linear scale-out capabilities.
Extensible, so that the system stays manageable and can easily accommodate new features.
[Architecture diagram: NEW DATA STREAM feeds IMMUTABLE MASTER DATA and PROCESS STREAM.
BATCH LAYER: IMMUTABLE MASTER DATA, PRECOMPUTE VIEWS (BATCH RECOMPUTE).
SPEED LAYER: PROCESS STREAM, INCREMENT VIEWS (REAL-TIME INCREMENT).
SERVING LAYER: View 1, View 2 ... View N from each path, MERGE, QUERY.]
Batch Layer: Manages the master data set, an immutable, append-only set of raw data. Precomputes arbitrary query functions, called batch views.
Serving Layer: Indexes batch views so that they can be queried ad hoc with low latency. Merges and reconciles batch and real-time views.
Speed Layer: Handles all requests that are subject to low-latency requirements. Uses fast, incremental algorithms and deals with recent data only.
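The three layers can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how any particular product implements them; the class `LambdaSketch` and its method names are hypothetical, and the "views" are plain event counts per key.

```python
from collections import Counter


class LambdaSketch:
    """Toy model of the three layers; all names here are illustrative."""

    def __init__(self):
        self.master = []                # batch layer: append-only raw data
        self.batch_view = Counter()     # precomputed by the batch layer
        self.realtime_view = Counter()  # speed layer: recent data only

    def ingest(self, key):
        # New data goes to both paths: appended to the master data set
        # (never updated or deleted) and applied incrementally to the
        # real-time view.
        self.master.append(key)
        self.realtime_view[key] += 1

    def run_batch(self):
        # Batch recompute: rebuild the view from scratch over all raw data.
        self.batch_view = Counter(self.master)
        # In this single-threaded sketch everything has now been absorbed,
        # so the real-time view can be discarded.
        self.realtime_view.clear()

    def query(self, key):
        # Serving layer: merge the batch view and the real-time view.
        return self.batch_view[key] + self.realtime_view[key]


system = LambdaSketch()
for key in ["AMS", "LHR", "AMS"]:
    system.ingest(key)
before = system.query("AMS")   # answered from the real-time view only
system.run_batch()
system.ingest("AMS")
after = system.query("AMS")    # batch view plus one recent event
```

Because the batch view is always rebuilt from the raw data, a bug in the view logic can be fixed and the view recomputed, which is one way the design tolerates human error.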
[Timeline diagram: along the Time axis, older data has been absorbed into batch views; the most recent data, just a few hours up to now, is not yet absorbed.]
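The split between absorbed and not-yet-absorbed data can be expressed as a cutoff on the time axis. The sketch below assumes integer timestamps and a hypothetical `batch_cutoff` marking the newest event included in the last batch run; the function name `merged_count` and the sample numbers are made up for illustration.

```python
from collections import Counter


def merged_count(batch_view, recent_events, batch_cutoff, key):
    """Combine a batch view with the events it has not absorbed yet.

    batch_view covers timestamps <= batch_cutoff; only events newer than
    the cutoff are counted from the speed layer, so nothing is counted twice.
    """
    realtime = Counter(k for ts, k in recent_events if ts > batch_cutoff)
    return batch_view.get(key, 0) + realtime[key]


# Hypothetical state: the last batch run covered everything up to t=100.
batch_view = {"AMS": 40}
recent_events = [(95, "AMS"), (101, "AMS"), (102, "LHR"), (103, "AMS")]

result = merged_count(batch_view, recent_events, batch_cutoff=100, key="AMS")
```

The event at t=95 is already part of the batch view, so only the two newer AMS events are added on top of it.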
Immutable Master Dataset

Timestamp            Airport  Flight  Action
2015-01-01T10:00:0   DUB      EL123   take-off
2015-01-01T10:05:0   HEL      SA45    take-off
2015-01-01T10:07:0   AMS      BA99    take-off
2015-01-01T10:09:0   LHR      LH17    landing
2015-01-01T10:10:0   CDG      AF03    landing
2015-01-01T10:11:0   FCO      AZ501   take-off
MapReduce
airborne: 2307

MapReduce
airport load:
Airport  Planes
AMS      44
LHR      69

MapReduce
airborne per airline:
Airline
SAS
BA
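A batch view of this kind can be sketched as a map step that emits key/value pairs and a reduce step that sums them per key. The code below is a single-process illustration, not Hadoop MapReduce; the helper `map_reduce` and both mappers are hypothetical, the timestamps are omitted, and the +1/-1 scoring is an assumption about how the views are derived. Since only the six sample rows are used, the results are net changes rather than the totals shown on the slides.

```python
from collections import defaultdict

# (airport, flight, action) rows from the sample master dataset.
RECORDS = [
    ("DUB", "EL123", "take-off"),
    ("HEL", "SA45", "take-off"),
    ("AMS", "BA99", "take-off"),
    ("LHR", "LH17", "landing"),
    ("CDG", "AF03", "landing"),
    ("FCO", "AZ501", "take-off"),
]


def map_reduce(records, mapper):
    """Apply mapper to every record, then sum the emitted values per key."""
    totals = defaultdict(int)
    for record in records:
        for key, value in mapper(record):
            totals[key] += value
    return dict(totals)


def airborne_mapper(record):
    # Assumption: a take-off adds one airborne plane, a landing removes one.
    _airport, _flight, action = record
    yield "airborne", 1 if action == "take-off" else -1


def airport_load_mapper(record):
    # Assumption: a landing adds a plane to the airport, a take-off removes one.
    airport, _flight, action = record
    yield airport, 1 if action == "landing" else -1


airborne_view = map_reduce(RECORDS, airborne_mapper)
airport_load_view = map_reduce(RECORDS, airport_load_mapper)
```

A per-airline view would follow the same pattern with a mapper keyed on the airline, which would first need a rule for deriving the airline from the flight number.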
[Architecture diagram with example technologies: NEW DATA STREAM, Hadoop HDFS, Apache Kafka, Apache Hive, Apache Spark, Storm bolts, and views stored in HBase; BATCH RECOMPUTE, REAL-TIME INCREMENT, MERGE.]