SLIDE 1 VISUALISING REAL TIME TRAFFIC DATA USING ELASTICSEARCH AND C3JS @jettroCoenradie
Trifork Amsterdam Case Study ANWB (Royal Dutch Automobile Association)
SLIDE 2 FACT SHEET
Jettro Coenradie
Software engineer @ Trifork specialised in search
Twitter: @jettroCoenradie @gridshore
GitHub: https://github.com/jettro
LinkedIn: https://www.linkedin.com/in/jettro
Blogs: http://www.gridshore.nl http://blog.trifork.com/author/jettro/
SLIDE 3
GOAL
- Ideas for combining (open) data
- Evaluate options and performance
SLIDE 4 WHAT IS ANWB?
- Dutch Automobile Driver Assistance
- Sister organisation of: FDM (Denmark), ADAC (Germany), AA (England)
SLIDE 5
Founded in 1883 as Algemene Nederlandse Wieler Bond (General Dutch Bicycle Association)
SLIDE 6 WHAT IS ELASTICSEARCH
- Distributed / Scalable search
- Structured and full-text
- Data analytics
- Log analysis
SLIDE 7
(OPEN) DATA
- Real time traffic data
- Weather data
- Automobile Assistance data
SLIDE 8
GOAL FOR THE PROJECT
- Number of cars on the roads
- Traffic intensity on the roads
- Wrong data
SLIDE 9
SLIDE 10 FLOW OF THE PROJECT
- Get to know the data: Logstash / Kibana
- Start improving data quality
- Present data using our own charts
SLIDE 11
SLIDE 12
TECHNICAL OVERVIEW
- Data view: Tomcat - Spring MVC - c3js
- Data integration: Spring Integration (xml / csv)
- Data store: Elasticsearch
SLIDE 13
DEMO
SLIDE 14
SLIDE 15
Diagram: indices (Index A, Index B, Index C) are divided into shards (Shard 1, Shard 2) with replicas (Shard R 1, Shard R 2); each shard is a Lucene index.
SLIDE 16
TIME BASED INDICES
NDW: Strings, Numbers, Dates, Geo points
SLIDE 17
TIME BASED INDICES
- Daily indices: NDW-2014-09-15, NDW-2014-09-16, NDW-2014-09-17
- mapping-template
- NDW Alias
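The daily-index setup can be sketched as plain request bodies. This is a minimal sketch, not the project's actual configuration: the bodies follow the Elasticsearch 1.x style that was current at the time of the talk, the shard and replica counts simply mirror the diagram on slide 15, and the mappings are passed in by the caller. Index names are lowercased here because Elasticsearch does not accept uppercase index names.

```python
from datetime import date

PREFIX = "ndw"  # lowercase form of the slide's "NDW"; also used as the alias name


def daily_index_name(day: date, prefix: str = PREFIX) -> str:
    """Name of the index that holds one day of data, e.g. ndw-2014-09-15."""
    return f"{prefix}-{day.isoformat()}"


def index_template(mappings: dict, prefix: str = PREFIX) -> dict:
    """Body for PUT /_template/<name> (1.x style).

    Every new index whose name matches the pattern picks up these
    settings and mappings automatically.
    """
    return {
        "template": f"{prefix}-*",
        "settings": {"number_of_shards": 2, "number_of_replicas": 1},  # assumed values
        "mappings": mappings,
    }


def alias_actions(days, prefix: str = PREFIX) -> dict:
    """Body for POST /_aliases: point one alias at all daily indices."""
    return {
        "actions": [
            {"add": {"index": daily_index_name(d, prefix), "alias": prefix}}
            for d in days
        ]
    }


if __name__ == "__main__":
    days = [date(2014, 9, 15), date(2014, 9, 16), date(2014, 9, 17)]
    print([daily_index_name(d) for d in days])
    print(alias_actions(days))
```

Searching through the alias keeps client code unaware of how many daily indices exist, and old days can be dropped by deleting a whole index.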
SLIDE 18 SCHEMA-LESS
- There is always a schema
- The schema can be dynamic
- Often you want to be specific: Dates / Numbers / Geo locations
Dynamic schema
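Being specific about dates, numbers and geo locations could look like the mapping below. It is a sketch in Elasticsearch 1.x syntax; the type name `measurement` and the fields `timestamp`, `intensity`, `avg_speed` and `road` are hypothetical, and only `location` appears in the slides. Fields that are not listed are still picked up dynamically.

```python
import json


def measurement_mapping() -> dict:
    """Explicit mapping for the fields where guessing the type is risky."""
    return {
        "measurement": {  # hypothetical type name
            "dynamic": True,  # unknown fields are still mapped dynamically
            "properties": {
                "timestamp": {"type": "date"},
                "intensity": {"type": "integer"},
                "avg_speed": {"type": "float"},
                # not_analyzed keeps the value as a single term for term filters
                "road": {"type": "string", "index": "not_analyzed"},
                # a geo_point is never detected dynamically, so it must be explicit
                "location": {"type": "geo_point"},
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(measurement_mapping(), indent=2))
```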
SLIDE 19
SEARCH
Full text search versus Structured search
SLIDE 20 STRUCTURED SEARCH
- Can be cached most of the time
- No scoring
- Fast
Filters
SLIDE 21 FILTERS WE USED
- Range filters
- Term filters
- Composite (bool) filters
SLIDE 22
- Range Filter
- Date Range Filter
- Term Filter
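The three filters, and the bool filter that combines them, can be sketched as request bodies. This assumes the Elasticsearch 1.x `filtered` query that was current at the time of the talk (later versions express the same thing with the `filter` clause of a `bool` query). The field names `intensity`, `timestamp` and `road` are hypothetical.

```python
def range_filter(field, gte=None, lt=None) -> dict:
    """Range filter; works for numbers and, with date strings, for dates."""
    bounds = {}
    if gte is not None:
        bounds["gte"] = gte
    if lt is not None:
        bounds["lt"] = lt
    return {"range": {field: bounds}}


def term_filter(field, value) -> dict:
    """Exact match on a single (not analyzed) term."""
    return {"term": {field: value}}


def bool_filter(*must) -> dict:
    """Composite filter: every clause has to match."""
    return {"bool": {"must": list(must)}}


def filtered_search(flt: dict) -> dict:
    """Wrap a filter in a 1.x 'filtered' query; no scoring is involved."""
    return {"query": {"filtered": {"query": {"match_all": {}}, "filter": flt}}}


if __name__ == "__main__":
    body = filtered_search(
        bool_filter(
            range_filter("intensity", gte=100),                          # range filter
            range_filter("timestamp", gte="2014-09-15", lt="2014-09-16"),  # date range filter
            term_filter("road", "A2"),                                   # term filter
        )
    )
    print(body)
```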
SLIDE 23 AGGREGATIONS
Two types of aggregations:
- Create buckets of data
- Compute Metrics
SLIDE 24
Diagram: a set of documents is split by a condition into buckets.
Term: red, blue, green, yellow
Range: 0-10, 10-20, 20-30, 30-40
SLIDE 25
Diagram: a set of documents (D)
SLIDE 26 AGGREGATIONS WE USED
- Date histogram aggregations
- Terms aggregations
- AVG aggregations
SLIDE 27
Date Histogram Aggregation + AVG metric Aggregation
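A date histogram with a nested average, and the step from its response to chart data, might look as follows. This is a sketch only: the request uses the 1.x `interval` parameter, the aggregation names `per_interval` and `avg_value` and the field names are made up, and the output is the list-of-columns layout used for `data.columns` in c3js. How the x values (epoch milliseconds from each bucket's `key`) are parsed on the client is left open.

```python
def avg_per_interval(date_field: str, value_field: str, interval: str = "hour") -> dict:
    """Bucket documents by time, then compute an average inside each bucket."""
    return {
        "size": 0,  # only the aggregation result is needed, no hits
        "aggs": {
            "per_interval": {
                "date_histogram": {"field": date_field, "interval": interval},
                "aggs": {"avg_value": {"avg": {"field": value_field}}},
            }
        },
    }


def to_c3_columns(response: dict, label: str) -> list:
    """Turn the histogram buckets into two columns: x values and averages."""
    buckets = response["aggregations"]["per_interval"]["buckets"]
    xs = ["x"] + [b["key"] for b in buckets]
    ys = [label] + [b["avg_value"]["value"] for b in buckets]
    return [xs, ys]


if __name__ == "__main__":
    print(avg_per_interval("timestamp", "intensity"))
    # Hand-written response fragment in the shape Elasticsearch returns.
    sample = {
        "aggregations": {
            "per_interval": {
                "buckets": [
                    {"key": 1410739200000, "doc_count": 3, "avg_value": {"value": 12.5}},
                    {"key": 1410742800000, "doc_count": 2, "avg_value": {"value": 8.0}},
                ]
            }
        }
    }
    print(to_c3_columns(sample, "intensity"))
```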
SLIDE 28
Terms Aggregation
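A terms aggregation creates one bucket per distinct value. The sketch below builds such a request and reads the counts back from the response; the aggregation name `top_terms` and the field `road` are hypothetical.

```python
def top_terms(field: str, size: int = 10) -> dict:
    """One bucket per distinct value of the field, largest buckets first."""
    return {
        "size": 0,  # skip the hits, return only the buckets
        "aggs": {"top_terms": {"terms": {"field": field, "size": size}}},
    }


def counts_by_term(response: dict) -> dict:
    """Map each bucket key to its document count."""
    buckets = response["aggregations"]["top_terms"]["buckets"]
    return {b["key"]: b["doc_count"] for b in buckets}


if __name__ == "__main__":
    print(top_terms("road", size=5))
    sample = {
        "aggregations": {
            "top_terms": {
                "buckets": [
                    {"key": "A2", "doc_count": 120},
                    {"key": "A12", "doc_count": 85},
                ]
            }
        }
    }
    print(counts_by_term(sample))
```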
SLIDE 29 GEO LOCATIONS
Two types of locations
- Using latitude and longitude
- Using geohash (creating a grid)
SLIDE 30 GEO LAT/LON
- Used for distance based queries
- Used for distance based aggregations
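Both uses of lat/lon can be sketched as request fragments: a `geo_distance` filter that keeps documents within a radius, and a `geo_distance` aggregation that buckets documents into rings around an origin. The field name `location` is taken from slide 35; the coordinates and ring edges are illustrative.

```python
def geo_distance_filter(field: str, lat: float, lon: float, distance: str) -> dict:
    """Keep documents within `distance` (e.g. "10km") of the given point."""
    return {"geo_distance": {"distance": distance, field: {"lat": lat, "lon": lon}}}


def geo_distance_agg(field: str, lat: float, lon: float, ring_edges_km) -> dict:
    """Bucket documents into concentric rings around the origin.

    ring_edges_km = [10, 50] gives the rings 0-10, 10-50 and 50+ km.
    """
    ranges = []
    lower = None
    for upper in ring_edges_km:
        ring = {"to": upper}
        if lower is not None:
            ring["from"] = lower
        ranges.append(ring)
        lower = upper
    if lower is not None:
        ranges.append({"from": lower})  # open-ended outer ring
    return {
        "geo_distance": {
            "field": field,
            "origin": {"lat": lat, "lon": lon},
            "unit": "km",
            "ranges": ranges,
        }
    }


if __name__ == "__main__":
    print(geo_distance_filter("location", 46.3412, 3.5123, "10km"))
    print(geo_distance_agg("location", 46.3412, 3.5123, [10, 50]))
```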
SLIDE 31 GEO HASH
- Uses a hash to represent a square
- More characters means more precision
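Why more characters means more precision is easiest to see in the encoding itself. The sketch below is a plain implementation of the standard geohash algorithm, not Elasticsearch code: each bit halves the longitude or latitude range in turn, and every five bits become one base-32 character, so each extra character selects a smaller cell inside the previous one.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    """Encode a point as a geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate: longitude first, then latitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


if __name__ == "__main__":
    for precision in (1, 3, 6):
        print(precision, geohash_encode(57.64911, 10.40744, precision))
```

A shorter hash is always a prefix of a longer hash for the same point, which is what makes geohashes usable as a grid: all points sharing a prefix fall in the same cell.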
SLIDE 32
GEOHASH
http://www.bigdatamodeling.org/2013/01/intuitive-geohash.html
SLIDE 33
PERCOLATOR
“The opposite of executing a query and finding results”
SLIDE 34
PERCOLATOR
“Match an (existing) document against stored queries.”
SLIDE 35
PERCOLATOR
Geo polygon filter
Stored queries (regions): Zuid-West, Noord-West, Noord-Oost, Zuid
Document: { location: [ 3.5123, 46.3412 ] }
Match: Zuid-West
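Registering one region as a stored query and percolating a document against it could be sketched like this. It assumes the Elasticsearch 1.x percolator, where a query is stored with `PUT /<index>/.percolator/<id>` and a document is matched with `GET /<index>/<type>/_percolate`. The polygon corners are invented for illustration, and the document's `location` array is read as `[lon, lat]`, the order Elasticsearch uses for array-style geo points.

```python
def region_query(points) -> dict:
    """Stored query for one region: a geo_polygon filter on `location`.

    `points` is a list of (lon, lat) corners of the region.
    """
    polygon = [{"lat": lat, "lon": lon} for lon, lat in points]
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"geo_polygon": {"location": {"points": polygon}}},
            }
        }
    }


def percolate_request(doc: dict) -> dict:
    """Body for the _percolate call: the document to match against stored queries."""
    return {"doc": doc}


def matched_regions(response: dict) -> list:
    """Ids of the stored queries that matched, i.e. the region names."""
    return [match["_id"] for match in response["matches"]]


if __name__ == "__main__":
    # Hypothetical rectangle standing in for the Zuid-West region.
    zuid_west = [(3.0, 46.0), (4.0, 46.0), (4.0, 47.0), (3.0, 47.0)]
    print(region_query(zuid_west))  # stored under the id "zuid-west"
    print(percolate_request({"location": [3.5123, 46.3412]}))
    # Shape of a 1.x percolate response, written by hand for the example.
    print(matched_regions({"total": 1, "matches": [{"_index": "ndw", "_id": "zuid-west"}]}))
```

The index that holds the stored queries needs `location` mapped as a geo point, otherwise the geo polygon filter cannot be evaluated against the percolated document.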
SLIDE 36
SLIDE 37 QUESTIONS
@jettroCoenradie