- Cassandra for Time Series Data -
Joris Gillis, June 28, 2017
Joris Gillis
I am a software engineer at TrendMiner and focus on the enterprise scalability of our industrial analytics platform. I studied at the University of Hasselt and the University of Antwerp in the field of database theory and data mining. My interests are:
Agenda
About TrendMiner
TrendMiner is the leading modelling-free industrial analytics platform to analyze, monitor and predict asset and process performance. It has a proven track record in the (petro-)chemical and oil & gas industries of increasing overall profitability by improving production yield, lowering costs, avoiding unplanned process downtime, increasing overall equipment efficiency and reducing safety risks.
About our company
Industry 4.0
Areas: Connectivity, Advanced Manufacturing, Big Data & Analytics
Technologies: Internet of Things, Wearables, Augmented Reality, Optimization & Prediction, Machine Learning, Cyber Security, Additive Manufacturing, Advanced Materials, Autonomous Robotics
Technologies that enable new ways of working and of doing business
About our software
Problem statement
Time series
minutes and 1 second
Complex analyses
Plants across the globe
What is a Time Series?
Why Cassandra
In-store analytics too limited for our needs
Only HTTP interface
Big Data => Big Index
Horizontal scaling
New technology
Why Cassandra
series data
underlying store
equidistant points
How to model time series in Cassandra?
Partitioning & Clustering
Map<byte[], SortedMap<Clustering, Row>>
byte[]: partition key
Clustering: clustering columns
Row: other columns
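The nested-map view above can be tried out with a small in-memory model. This is only a sketch of the idea, not how Cassandra stores data internally; the `Table` class, its method names and the sample values are hypothetical.

```python
import bisect
from collections import defaultdict


class Table:
    """Toy model of Map<partition key, SortedMap<clustering, row>>."""

    def __init__(self):
        # partition key -> (sorted clustering values, rows in the same order)
        self._partitions = defaultdict(lambda: ([], []))

    def insert(self, partition_key, clustering, row):
        keys, rows = self._partitions[partition_key]
        i = bisect.bisect_left(keys, clustering)
        if i < len(keys) and keys[i] == clustering:
            rows[i] = row  # same primary key: overwrite the existing row
        else:
            keys.insert(i, clustering)
            rows.insert(i, row)

    def slice(self, partition_key, start, end):
        """Rows with start < clustering < end, in clustering order."""
        if partition_key not in self._partitions:
            return []
        keys, rows = self._partitions[partition_key]
        lo = bisect.bisect_right(keys, start)
        hi = bisect.bisect_left(keys, end)
        return rows[lo:hi]


table = Table()
# Rows arrive out of order; the partition keeps them sorted by clustering value.
for minute, temp in [(3, 73.0), (1, 72.0), (2, 72.5), (4, 74.0)]:
    table.insert("station-a", minute, {"temperature": temp})
result = table.slice("station-a", 1, 4)
```

Because rows inside a partition are kept sorted by the clustering value, a range query becomes two binary searches plus a contiguous read.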
CREATE TABLE temperature (
    weatherstation_id uuid,
    event_time timestamp,
    temperature float,
    PRIMARY KEY (weatherstation_id, event_time)
);
Source: https://academy.datastax.com/resources/getting-started-time-series-data-modeling
SELECT temperature
FROM temperature
WHERE weatherstation_id = 123e4567-e89b-12d3-a456-426614174000  -- placeholder value; uuid literals are written unquoted
  AND event_time > '2013-04-03 07:01:00'
  AND event_time < '2013-04-03 07:04:00';
CREATE TABLE temperature_by_day (
    weatherstation_id uuid,
    day date,
    event_time timestamp,
    temperature float,
    PRIMARY KEY ((weatherstation_id, day), event_time)
);
Source: https://academy.datastax.com/resources/getting-started-time-series-data-modeling
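With a composite partition key of station and day, the client has to derive the day bucket for every write and list the buckets that a time-range read spans. A minimal sketch of that bookkeeping follows; the function names and the string station id are hypothetical and chosen only for illustration.

```python
from datetime import date, datetime, timedelta


def partition_key(station_id: str, event_time: datetime) -> tuple[str, date]:
    """Composite partition key (weatherstation_id, day) for one reading."""
    return (station_id, event_time.date())


def partitions_for_range(station_id: str, start: datetime,
                         end: datetime) -> list[tuple[str, date]]:
    """All (weatherstation_id, day) partitions a time-range query must visit."""
    days = (end.date() - start.date()).days
    return [(station_id, start.date() + timedelta(days=d))
            for d in range(days + 1)]


# A query from the evening of April 3 to the early morning of April 5
# touches three day partitions.
keys = partitions_for_range(
    "station-a", datetime(2013, 4, 3, 22, 0), datetime(2013, 4, 5, 1, 0)
)
```

The trade-off is that each partition stays bounded in size, while a query over a long time range has to be split into one query per day.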
How to model time series in Cassandra?
performance
How to configure Cassandra for Time Series Data?
How Cassandra Writes Data
[Figure: write path. Written data (row1, row2, row3) is appended to the commit log on disk and stored in the memtable in memory. The memtable is flushed to SSTables on disk, each holding rows plus an index. SSTables are later merged by compaction.]
Source: http://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlHowDataWritten.html
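The write path (commit log, memtable, flush to SSTables) can be mimicked with a few lines of Python. This is a toy model under simplifying assumptions: the `Node` class and the `memtable_limit` parameter are hypothetical, the "commit log" is just a list, and a flush is triggered by row count rather than by memory use.

```python
class Node:
    """Toy write path: commit log + memtable, flushed to immutable SSTables."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # append-only; stands in for the on-disk log
        self.memtable = {}     # in-memory; latest value per key
        self.sstables = []     # each flush produces one sorted, immutable table
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # record the write for durability
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Rows are written out sorted by key and never modified afterwards.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []   # flushed data no longer needs log replay


node = Node(memtable_limit=3)
for key, value in [("row2", "b"), ("row1", "a"), ("row3", "c"), ("row4", "d")]:
    node.write(key, value)
```

After the four writes, the first three rows sit in one sorted SSTable and the fourth is still in the memtable and the commit log.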
How Cassandra Reads Data (Simplified)
[Figure: read path (simplified). The partition key is checked against the Bloom filter of each SSTable (rows plus index) to decide which SSTables have to be read.]
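The role of the Bloom filter in the read path can be illustrated with a small, self-contained filter. This is a sketch, not Cassandra's implementation: the `BloomFilter` class, its sizes, the hashing scheme and the SSTable names are assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: str):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def sstables_to_read(partition_key, sstables):
    """Names of SSTables whose filter cannot rule out the partition key."""
    return [name for name, bloom in sstables if bloom.might_contain(partition_key)]


sstables = []
for name, keys in [("sstable-1", ["a", "b"]), ("sstable-2", ["c"]), ("sstable-3", ["a"])]:
    bloom = BloomFilter()
    for k in keys:
        bloom.add(k)
    sstables.append((name, bloom))
candidates = sstables_to_read("a", sstables)
```

Every SSTable that really contains the key is always among the candidates; an SSTable without the key is skipped unless the filter gives a false positive.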
Before compaction: T1 155MB, T2 165MB, T3 155MB, T4 150MB (total size: 625MB)
After compaction: T5 600MB
Size tiered Compaction Example
T5 600MB T6 155MB T7 165MB
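The selection step of size-tiered compaction can be sketched as grouping SSTables of similar size and compacting a group once it has enough members. This is a simplified model, not Cassandra's code; the parameter names mirror the strategy's `bucket_low`, `bucket_high` and `min_threshold` options, and the values 0.5, 1.5 and 4 are assumed here.

```python
def bucket_by_size(sizes_mb, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes into buckets of similar size."""
    buckets = []
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            # A table joins a bucket if its size is close to the bucket average.
            if avg * bucket_low <= size <= avg * bucket_high:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return buckets


def compaction_candidates(sizes_mb, min_threshold=4):
    """Buckets that contain enough SSTables to be compacted together."""
    return [b for b in bucket_by_size(sizes_mb) if len(b) >= min_threshold]


before = compaction_candidates([155, 165, 155, 150])  # T1..T4
after = compaction_candidates([600, 155, 165])        # T5, T6, T7
```

In this model T1 to T4 form one bucket of four and are compacted together, while T5, T6 and T7 are left alone: T5 is too large to share a bucket with the new tables, and two similar tables are below the threshold.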
Levelled Compaction Example
Levels: L(0), L(1), L(2). Parameters: 2 rows per SSTable, 2 files in L(1), 20 files in L(2).

Row   Partition
r1    #0
r2    #2
r3    #-1
r4    #1
r5    #-2
r6    #6
r7    #3
r8    #6
r9    #-1
r10   #-2

Step 1: L(0) holds SSTables (r1, r2) and (r3, r4). Compaction into L(1) orders the rows by partition: (r3, r1) and (r4, r2).
Step 2: L(0) receives (r5, r6) and (r7, r8). Compaction with L(1) yields (r5, r3), (r1, r4), (r2, r7) and (r6, r8).
Step 3: L(1) now holds 4 files, more than its limit of 2. Compaction moves (r2, r7) and (r6, r8) to L(2); (r5, r3) and (r1, r4) remain in L(1).
Step 4: L(0) receives (r9, r10), whose partitions (#-1 and #-2) fall in the range already held in L(1). After compaction the slide shows these rows reordered as (r10, r9).
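The first two compactions of the example can be reproduced with a short sketch: merge the rows of L(0) and L(1), order them by partition, and cut the result into SSTables of two rows. The row-to-partition mapping and the SSTable size come from the slides; the `compact` function itself is hypothetical and ignores everything else a real levelled compaction does (overlap selection, promotion to L(2), tombstones).

```python
PARTITION = {"r1": 0, "r2": 2, "r3": -1, "r4": 1, "r5": -2,
             "r6": 6, "r7": 3, "r8": 6, "r9": -1, "r10": -2}
ROWS_PER_SSTABLE = 2


def compact(level0, level1):
    """Merge L(0) and L(1) SSTables into sorted, non-overlapping L(1) SSTables."""
    rows = [row for sstable in level1 + level0 for row in sstable]
    rows.sort(key=PARTITION.__getitem__)  # stable sort: ties keep input order
    return [tuple(rows[i:i + ROWS_PER_SSTABLE])
            for i in range(0, len(rows), ROWS_PER_SSTABLE)]


# First compaction: two L(0) tables into an empty L(1).
step1 = compact([("r1", "r2"), ("r3", "r4")], [])
# Second compaction: two new L(0) tables merged with the existing L(1).
step2 = compact([("r5", "r6"), ("r7", "r8")], step1)
```

Because the SSTables within a level cover disjoint partition ranges, a read for one partition needs at most one SSTable per level above L(0).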
Levelled Compaction Configuration Trade-off
Level   Max data size
L(1)    1.6GB
L(2)    16GB
L(3)    160GB
L(4)    1.6TB
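The sizes in the table grow by a factor of 10 per level. They are consistent with an SSTable size of 160MB and 10 times more SSTables at each level; both values are assumptions here, since the slide does not state them. A small sketch of the arithmetic, with hypothetical function names:

```python
def level_capacity_mb(level, sstable_size_mb=160, fanout=10):
    """Maximum data size of a level: fanout**level SSTables of fixed size."""
    return sstable_size_mb * fanout ** level


def format_size(mb):
    """Render a size in MB using decimal GB/TB units."""
    if mb >= 1_000_000:
        return f"{mb / 1_000_000:g}TB"
    if mb >= 1_000:
        return f"{mb / 1_000:g}GB"
    return f"{mb:g}MB"


capacities = {f"L({n})": format_size(level_capacity_mb(n)) for n in range(1, 5)}
```

Changing the SSTable size in this model scales every level by the same factor, which is one side of the configuration trade-off: larger SSTables mean fewer levels for the same amount of data, but more data rewritten per compaction.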
Consistent read performance
Time Window Compaction
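The slide gives only the title. As a rough sketch of the general idea, time window compaction groups SSTables by the time window that their newest data falls into and compacts only within a window, so old windows are left untouched once they are closed. The function names, the one-day window and the sample timestamps below are assumptions for illustration.

```python
from collections import defaultdict

DAY = 86_400  # seconds


def window_start(timestamp_s, window_size_s=DAY):
    """Start of the time window that a timestamp falls into."""
    return timestamp_s - timestamp_s % window_size_s


def group_by_window(sstables, window_size_s=DAY):
    """Group (name, max_timestamp_s) pairs by the window of their newest data."""
    windows = defaultdict(list)
    for name, max_ts in sstables:
        windows[window_start(max_ts, window_size_s)].append(name)
    return dict(windows)


groups = group_by_window([("a", 10), ("b", DAY - 1), ("c", DAY), ("d", 2 * DAY + 5)])
```

For time series that are written in time order and rarely updated, this keeps all data of one period together, so a query over a time range only touches the SSTables of the matching windows.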
Compaction Comparison
cassandra-summit-2016
Thank you for attending! For more information, contact joris.gillis@trendminer.com or visit our website: www.trendminer.com