Processing 10M samples/second to drive smart maintenance in complex IIoT systems
Geir Engdahl (CTO, Cognite) and Daniel Berqvist (Developer Advocate, Google)
DEMO: the charting library you just saw is open-sourced
https://github.com/cognitedata/griff-react
▪ High-performance charting of large time series
▪ Dynamic data loading
▪ No tight coupling to the Cognite TSDB
▪ Uses React and d3

yarn add @cognite/griff-react
or
npm i @cognite/griff-react
50 billion devices will be connected to the internet by 2023, according to Statista (2018) [1]. Cognite currently covers 500,000 sensors, each producing one GB of data every two years.
[1] https://www.statista.com/statistics/471264/iot-number-of-connected-devices-worldwide/ (2018)
▪ Robustness
▪ High volume of reads and writes
▪ Low latency
▪ Arbitrary-granularity aggregates
▪ Efficient backfill
▪ Efficient sequential reads
Surely there must be an existing TSDB that satisfies this!
Single node*
* Often supports master-slave or other read-only replication, but not partitioning
Horizontally scaling
Disclaimer: OpenTSDB experiments from summer 2017 on version 2.3.0
Supercharge your applications
Stream, secure, analyze and drive ML/AI
From DevOps to NoOps
Reduce management effort from weeks to minutes
Achieve your performance goals
Single-digit millisecond write latency for performance-critical apps
Serve global audiences
99.99% availability across Google’s dedicated network
NoSQL (no-join) distributed key-value store, designed to scale out
▪ Has only one index (the row key)
▪ Supports atomic single-row transactions
▪ Sparse: unwritten cells do not take up any space

| Row Key | Column-Family-1: Column-Qualifier-1 | Column-Family-1: Column-Qualifier-2 | Column-Family-2: Column-Qualifier-1 | Column-Family-2: Column-Qualifier-2 |
|---------|-------------------------------------|-------------------------------------|-------------------------------------|-------------------------------------|
| r1      | r1, cf1:cq1                         | r1, cf1:cq2                         | r1, cf2:cq1                         | r1, cf2:cq2                         |
| r2      | r2, cf1:cq1                         | r2, cf1:cq2                         | r2, cf2:cq1                         | r2, cf2:cq2                         |
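To make the data model concrete, here is a minimal sketch using the Python client library (google-cloud-bigtable); the project, instance, table, and key names are hypothetical:

```python
from google.cloud import bigtable

# Hypothetical project, instance, and table names.
client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("my-table")

# Write two cells in different column families of the same row.
# Only written cells are stored: unwritten cells take no space.
row = table.direct_row(b"r1")
row.set_cell("cf1", b"cq1", b"value-1")
row.set_cell("cf2", b"cq2", b"value-2")
row.commit()  # mutations to a single row are applied atomically

# The row key is the only index; fetch the row directly by key.
result = table.read_row(b"r1")
print(result.cells["cf1"][b"cq1"][0].value)  # b'value-1'
```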
Every cell is versioned (the default version is the server-side timestamp)
Configurable garbage collection retains the latest N versions (or expires cells after a TTL)
Expiration can be set at the column-family level

Row key "r1", CF:CQ:
value @ time(latest)
value @ time(previous)
value @ time(earliest available)
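A short sketch of how versions behave through the same Python client (names again hypothetical): repeated writes to one cell stack up as versions, and a filter limits a read to the newest one.

```python
from google.cloud import bigtable
from google.cloud.bigtable import row_filters

client = bigtable.Client(project="my-project")   # hypothetical names
table = client.instance("my-instance").table("my-table")

# Three writes to the same (row, family, qualifier) create three
# versions, each stamped with the server timestamp by default.
for value in (b"earliest", b"previous", b"latest"):
    row = table.direct_row(b"r1")
    row.set_cell("cf", b"cq", value)
    row.commit()

# Without a filter, all retained versions come back, newest first;
# CellsColumnLimitFilter(1) keeps only the latest version per column.
newest = table.read_row(b"r1", filter_=row_filters.CellsColumnLimitFilter(1))
cell = newest.cells["cf"][b"cq"][0]
print(cell.value, cell.timestamp)  # b'latest' plus its server timestamp
```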
Cloud Bigtable separates processing from storage through the use of nodes, each of which provides access to a group of database rows. Rebalancing automatically reduces the load on highly active nodes (in this case there is a lot of activity for data group A). User-driven resizing can be applied as needed to match data throughput targets, with no downtime.
[Diagram: clients reach processing nodes through a routing layer; the nodes serve data groups A-D held in separate storage. Original setup: one node serves all groups. Resizing: nodes are added with no downtime. Rebalancing: data groups are redistributed across nodes.]
Regional replication: stay available through a zonal failure
Global replication: serve customers beyond one region
[Map: current and future Google Cloud regions and their zone counts (3 in most regions, 4 in one): Seoul, Salt Lake City, Mumbai, Singapore, Sydney, Tokyo, Osaka, Hong Kong, Taiwan, Oregon, Los Angeles, Iowa, Montréal, São Paulo, Finland, Zurich, Belgium, London, Netherlands]
Recommendations for row key design:
▪ Design your row key with your queries in mind
▪ Ensure that your row key avoids hotspotting
▪ Reverse timestamps only when necessary
▪ Keep related data in the same table; keep unrelated data in different tables

Recommendations for data column design:
▪ Use tall and narrow tables; rows can be big but are not infinite (1,000 timestamp/value pairs per row is a good rule of thumb)
▪ Prefer rows to column versions
▪ Store data you will access in a single query in a single column family
▪ Don't exploit atomicity of single rows
Row key examples, grouped by customer ID and sensor ID first, then chronologically:
"Customer1-Sensor1-2018-07-24-01"
"Customer1-Sensor1-2018-07-24-02"
"Customer1-Sensor2-2018-01-01-01"
"Customer1-Sensor2-2018-01-01-02"
The row key is the only thing you can look up, but you can also scan forward.
Row key: <hash of sensor id><customer id><sensor id><time bucket>
Group by sensor ID first, then chronologically.
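As a sketch of this scheme (the salting and bucket format are assumptions, not Cognite's exact implementation): hashing the sensor id into a short prefix spreads different sensors across the key space to avoid hotspotting, while one sensor's rows stay contiguous and forward-scannable.

```python
import hashlib
from google.cloud import bigtable

client = bigtable.Client(project="my-project")   # hypothetical names
table = client.instance("my-instance").table("tsdb")

def row_key(customer_id: str, sensor_id: str, time_bucket: str) -> bytes:
    # <hash of sensor id><customer id><sensor id><time bucket>:
    # the prefix is fixed per sensor, so all of a sensor's buckets
    # still sort chronologically and can be scanned in order.
    prefix = hashlib.md5(sensor_id.encode()).hexdigest()[:4]
    return f"{prefix}{customer_id}{sensor_id}{time_bucket}".encode()

# A time range query for one sensor is a contiguous forward scan.
for row in table.read_rows(
        start_key=row_key("Customer1", "Sensor1", "2018072400"),
        end_key=row_key("Customer1", "Sensor1", "2018072500")):
    print(row.row_key)
```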
| Row key              | Column family:qualifier       |
|----------------------|-------------------------------|
| "Sensor1-2018072412" | "ts:pressure", "val:pressure" |
| "Sensor2-2018072412" | "ts:flowrate", "val:flowrate" |
| "Sensor3-2018072412" | "ts:flowrate", "val:flowrate" |

For example, a "ts:flowrate" cell holds the sample timestamps (1000, 2000, 3000, ...) while the matching "val:flowrate" cell holds the readings (27.5, 27.8, 28.3, ...).
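A sketch of an hourly-bucketed write in this layout, with hypothetical names and a deliberately naive encoding: each row accumulates one bucket's timestamp/value pairs as parallel `ts:` and `val:` columns.

```python
from google.cloud import bigtable

client = bigtable.Client(project="my-project")   # hypothetical names
table = client.instance("my-instance").table("tsdb")

def write_hour(sensor_id: str, hour_bucket: str, metric: str,
               samples: list[tuple[int, float]]) -> None:
    """Store one hour of (timestamp, value) pairs for one metric.

    Row key = sensor id + hour bucket; the 'ts' and 'val' column
    families hold parallel packed arrays, e.g. ts:flowrate =
    "1000,2000,3000" and val:flowrate = "27.5,27.8,28.3". The
    comma-separated encoding is illustrative; a real implementation
    would likely use a compact binary format.
    """
    row = table.direct_row(f"{sensor_id}-{hour_bucket}".encode())
    row.set_cell("ts", metric.encode(),
                 ",".join(str(t) for t, _ in samples).encode())
    row.set_cell("val", metric.encode(),
                 ",".join(str(v) for _, v in samples).encode())
    row.commit()

write_hour("Sensor2", "2018072412", "flowrate",
           [(1000, 27.5), (2000, 27.8), (3000, 28.3)])
```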
Performance:
▪ Throughput: handles up to 10M data points per second
▪ Latency: data is queryable after 200 ms (99th percentile)
[Architecture, ingest path: Sensor source → Ambassador API gateway (Kubernetes Engine, multiple instances) → Raw queue (Cloud Pub/Sub) → TSDB writer (Kubernetes Engine, multiple instances) → TSDB (Cloud Bigtable). The writer also feeds an Aggregates queue (Cloud Pub/Sub) consumed by the TSDB aggregator (Kubernetes Engine, multiple instances). API nodes (Kubernetes Engine, multiple instances) behind Cloud Load Balancing serve reads.]
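A minimal sketch of the TSDB writer stage under these assumptions (the topic, subscription, table, and JSON message shape are all hypothetical): pull raw data points from Cloud Pub/Sub, persist them to Cloud Bigtable, and forward them to the aggregates queue.

```python
import json
from google.cloud import bigtable, pubsub_v1

PROJECT = "my-project"                                    # hypothetical names
table = bigtable.Client(project=PROJECT).instance("my-instance").table("tsdb")
publisher = pubsub_v1.PublisherClient()
aggregates_topic = publisher.topic_path(PROJECT, "aggregates")
subscriber = pubsub_v1.SubscriberClient()
raw_subscription = subscriber.subscription_path(PROJECT, "raw-queue-sub")

def handle(message):
    # Assume each raw-queue message is one JSON-encoded data point.
    point = json.loads(message.data)
    row = table.direct_row(f"{point['sensor']}-{point['bucket']}".encode())
    row.set_cell("val", point["metric"].encode(), str(point["value"]).encode())
    row.commit()
    # Notify the aggregator so it can recompute rollups for this bucket.
    publisher.publish(aggregates_topic, message.data)
    message.ack()

# Blocks forever, processing messages on a background thread pool.
subscriber.subscribe(raw_subscription, callback=handle).result()
```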
Forecasting, clustering, architecture search... to learn a parameterization of normality.
[Figure: Sensor 1 and Sensors 2-N feed the model, which outputs a distance to normal]
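One plausible reading of this, sketched with scikit-learn (the model choice, features, and toy data are illustrative assumptions, not the talk's implementation): fit a model that predicts Sensor 1 from Sensors 2-N, then score new windows by how far observations fall from the prediction.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Toy data: rows are time steps; column 0 is Sensor 1,
# columns 1..N-1 stand in for Sensors 2-N.
history = rng.normal(size=(5000, 8))
history[:, 0] = history[:, 1:].mean(axis=1) + 0.1 * rng.normal(size=5000)

# Learn a parameterization of "normal": Sensor 1 as a function of the rest.
model = RandomForestRegressor(n_estimators=50).fit(history[:, 1:], history[:, 0])

def distance_to_normal(window: np.ndarray) -> np.ndarray:
    """Anomaly score: absolute gap between observed and predicted Sensor 1."""
    return np.abs(window[:, 0] - model.predict(window[:, 1:]))

print(distance_to_normal(history[-100:]).mean())
```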
[Architecture, process and analyze path: the same pipeline (API gateway, Raw queue and Aggregates on Cloud Pub/Sub, TSDB writer and TSDB aggregator on Kubernetes Engine, TSDB on Cloud Bigtable), extended with an analysis loop: Cloud Scheduler triggers periodic runs, and ML Engine makes predictions.]
▪ Ability to query consistent snapshots back in time
▪ High-frequency time series
▪ Efficient latest-data-point query
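The first and last points map directly onto Bigtable's cell versions; a sketch with hypothetical names, using the Python client's filters: a timestamp upper bound reads a consistent snapshot as of time t, and a one-cell limit fetches just the latest data point.

```python
import datetime
from google.cloud import bigtable
from google.cloud.bigtable import row_filters

table = (bigtable.Client(project="my-project")   # hypothetical names
         .instance("my-instance").table("tsdb"))

# Snapshot back in time: ignore all cell versions written after t
# (works as long as garbage collection still retains those versions).
t = datetime.datetime(2018, 7, 24, 12, 0)
snapshot = table.read_row(
    b"Sensor1-2018072412",
    filter_=row_filters.TimestampRangeFilter(row_filters.TimestampRange(end=t)))

# Latest data point: only the newest version of each column.
latest = table.read_row(
    b"Sensor1-2018072412",
    filter_=row_filters.CellsColumnLimitFilter(1))
```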
Cloud Bigtable:
▪ cloud.google.com/bigtable
▪ cloud.google.com/bigtable/docs/schema-design-time-series
Machine learning:
▪ cloud.google.com/products/ai
Session page on the conference website / O'Reilly Events App