

SLIDE 1

Processing 10M samples/second to drive smart maintenance in complex IIoT systems

Geir Engdahl, CTO, Cognite
Daniel Bergqvist, Developer Advocate, Google

SLIDE 2

SLIDE 3

SLIDE 4

DEMO

SLIDE 5

The charting library you just saw is open source

https://github.com/cognitedata/griff-react

▪ High-performance charting of large time series
▪ Dynamic data loading
▪ No tight coupling to the Cognite TSDB
▪ Uses React and d3

yarn add @cognite/griff-react
or
npm i @cognite/griff-react

SLIDE 6

IoT & the data explosion

50 billion devices will be connected to the internet by 2023, according to Statista (2018) [1]. Cognite currently covers 500,000 sensors, each producing one GB every two years.

[1] https://www.statista.com/statistics/471264/iot-number-of-connected-devices-worldwide/ (2018)

SLIDE 7

Time series requirements

▪ Robustness
▪ High volume of reads and writes
▪ Low latency
▪ Arbitrary-granularity aggregates
▪ Efficient backfill
▪ Efficient sequential reads

Surely there must be an off-the-shelf solution that satisfies this!

SLIDE 8

Databases for IoT - two approaches

Single node*

Horizontally scaling

* Often does master-slave or other read-only replication, but not partitioning

SLIDE 9

OpenTSDB experiments

▪ No limit parameter on queries
▪ No batch inserts, so slow backfills
▪ Can lose incoming data points
▪ Aggregates not pre-computed on write

Disclaimer: OpenTSDB experiments from summer 2017 on version 2.3.0

SLIDE 10

The case for Cloud Bigtable

▪ Fully managed
▪ 10k writes/s per node (SSD)
▪ Scalable to hundreds of PBs
▪ Can scan forward efficiently
▪ Column families and versioning

SLIDE 11

A brief introduction to Google Cloud Bigtable

Supercharge your applications

Stream, secure, analyze and drive ML/AI

From DevOps to NoOps

Reduce management effort from weeks to minutes

Achieve your performance goals

Single digit ms write latency for performance-critical apps

Serve global audiences

99.99% availability across Google’s dedicated network

SLIDE 12

Wide-column data model

NoSQL (no-join) distributed key-value store, designed to scale out.
Has only one index (the row key).
Supports atomic single-row transactions.
Sparse: unwritten cells do not take up any space.

Row Key | Column-Family-1:CQ1 | Column-Family-1:CQ2 | Column-Family-2:CQ1 | Column-Family-2:CQ2
r1      | r1, cf1:cq1         | r1, cf1:cq2         | r1, cf2:cq1         | r1, cf2:cq2
r2      | r2, cf1:cq1         | r2, cf1:cq2         | r2, cf2:cq1         | r2, cf2:cq2
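To make the model concrete, here is a minimal sketch of writing and reading cells across two column families with the google-cloud-bigtable Python client; the project, instance, and table IDs are hypothetical placeholders, not names from the talk.

```python
# Minimal sketch of the wide-column model using the Python client
# (google-cloud-bigtable). IDs are hypothetical placeholders.
from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=True)
table = client.instance("my-instance").table("my-table")

# One row, with cells in two column families under different qualifiers.
row = table.direct_row(b"r1")
row.set_cell("cf1", b"cq1", b"value-1")
row.set_cell("cf1", b"cq2", b"value-2")
row.set_cell("cf2", b"cq1", b"value-3")
row.commit()

# Read the row back; only written cells exist (sparse storage).
result = table.read_row(b"r1")
for family, columns in result.cells.items():
    for qualifier, cells in columns.items():
        print(family, qualifier, cells[0].value)
```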

SLIDE 13

Three-dimensional data space

A cell is addressed by row key, column (CF:CQ), and timestamp.
Every cell is versioned (the default version is the server-side timestamp).
Configurable garbage collection retains the latest N versions (or expires cells after a TTL).
Expiration can be set at the column-family level.

value @ time(latest)
value @ time(previous)
value @ time(earliest available)
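A hedged sketch of configuring these version and TTL policies with the Python client's column-family GC rules; the project, instance, and table IDs are again placeholders.

```python
# Sketch: per-column-family garbage collection with the Python client.
import datetime

from google.cloud import bigtable
from google.cloud.bigtable import column_family

client = bigtable.Client(project="my-project", admin=True)
table = client.instance("my-instance").table("my-table")

# Keep only the 3 most recent versions of each cell in cf1.
table.column_family("cf1", gc_rule=column_family.MaxVersionsGCRule(3)).create()

# Expire cells in cf2 after a 30-day TTL.
ttl_rule = column_family.MaxAgeGCRule(datetime.timedelta(days=30))
table.column_family("cf2", gc_rule=ttl_rule).create()
```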

SLIDE 14

Cloud Bigtable - Optimizing throughput

Cloud Bigtable separates processing from storage through the use of nodes, each of which provides access to a group of database rows.
Rebalancing automatically reduces the load on highly active nodes (in this case there is a lot of activity for data group A).
User-driven resizing as needed to match data throughput targets, with no downtime.

[Diagram: clients reach nodes through a routing layer; rebalancing and resizing remap row groups A-D across nodes without moving the underlying storage.]

SLIDE 15

Cloud Bigtable replication

Regional replication
  • SLA increased to 99.99%
  • Isolate serving and analytics
  • Independently scale clusters
  • Automatic failover in case of a zonal failure

Global replication
  • Increases durability/availability beyond one region
  • Fastest region-specific access
  • Option for a DR replica for regulated customers

[Map: current and future regions with their zone counts, including Oregon, Los Angeles, Salt Lake City, Iowa, S. Carolina, N. Virginia, Montréal, São Paulo, Finland, London, Netherlands, Belgium, Zurich, Mumbai, Singapore, Hong Kong, Taiwan, Tokyo, Osaka, Seoul, Sydney.]

SLIDE 16

Cloud Bigtable for IoT - best practices

Recommendations for row key design
▪ Design your row key with your queries in mind
▪ Ensure that your row key avoids hotspotting
▪ Use tall and narrow tables
▪ Prefer rows to column versions
▪ Reverse timestamps only when necessary

Recommendations for data column design
▪ Rows can be big but are not infinite (1,000 timestamp/value pairs per row is a good rule of thumb)
▪ Keep related data in the same table; keep unrelated data in different tables
▪ Store data you will access in a single query in a single column family
▪ Don't exploit atomicity of single rows

SLIDE 17

How Cognite stores data in Cloud Bigtable

Row key

“Customer1-Sensor1-2018-07-24-01”
“Customer1-Sensor1-2018-07-24-02”
“Customer1-Sensor2-2018-01-01-01”
“Customer1-Sensor2-2018-01-01-02”

Group by customer ID and sensor ID first, then chronologically.
The row key is the only thing you can look up, but you can also scan forward.
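A sketch of what this layout enables: row keys sort lexicographically, so one sensor's buckets for a day form a contiguous range that a single forward scan covers. The key format follows the slide; everything else (IDs, helper name) is illustrative.

```python
# Sketch: building Cognite-style row keys and scanning one sensor's
# data for a day with a key range. Table IDs are placeholders.
from google.cloud import bigtable

table = bigtable.Client(project="my-project").instance("my-instance").table("tsdb")

def row_key(customer, sensor, bucket):
    # e.g. "Customer1-Sensor1-2018-07-24-01"
    return f"{customer}-{sensor}-{bucket}".encode()

# All hourly buckets for one day fall in a contiguous, scannable range
# (end_key is exclusive).
rows = table.read_rows(
    start_key=row_key("Customer1", "Sensor1", "2018-07-24-00"),
    end_key=row_key("Customer1", "Sensor1", "2018-07-25-00"),
)
for row in rows:
    print(row.row_key)
```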

SLIDE 18

Hotspotting

SLIDE 19

Improved key schema

Row key

<hash of sensor id><customer id><sensor id><time bucket>

Group by sensor ID first, then chronologically.
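A sketch of the hashed-prefix idea: a short hash of the sensor ID spreads sequential writes across the key space so new data no longer hammers one node. The MD5 prefix and field encoding are assumptions for illustration; the deck does not show Cognite's exact scheme.

```python
# Sketch of the improved key schema: a short hash prefix avoids
# hotspotting while keeping one sensor's buckets adjacent.
import hashlib

def row_key(customer, sensor, bucket):
    prefix = hashlib.md5(sensor.encode()).hexdigest()[:4]
    return f"{prefix}{customer}-{sensor}-{bucket}".encode()

# Consecutive time buckets for one sensor still sort together (same
# prefix), so per-sensor range scans remain efficient.
print(row_key("Customer1", "Sensor1", "2018072412"))
print(row_key("Customer1", "Sensor1", "2018072413"))
```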

SLIDE 20

SLIDE 21

How Cognite stores data in Cloud Bigtable

Row key                 Column family:qualifier

“Sensor1-2018072412”    “ts:pressure”  “val:pressure”
“Sensor2-2018072412”    “ts:flowrate”  “val:flowrate”
“Sensor3-2018072412”    “ts:flowrate”  “val:flowrate”

The “ts:flowrate” cell holds the in-bucket timestamps: 1000, 2000, 3000, ...

SLIDE 22

How Cognite stores data in Cloud Bigtable

Row key                 Column family:qualifier

“Sensor1-2018072412”    “ts:pressure”  “val:pressure”
“Sensor2-2018072412”    “ts:flowrate”  “val:flowrate”
“Sensor3-2018072412”    “ts:flowrate”  “val:flowrate”

The “val:flowrate” cell holds the corresponding values: 27.5, 27.8, 28.3, ...
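A sketch of writing one bucket row with parallel ts and val columns as on these two slides. The packed little-endian encoding is an assumption for illustration, not Cognite's confirmed wire format.

```python
# Sketch: one row per sensor per time bucket, with parallel "ts" and
# "val" columns per measurement. Encoding details are assumptions.
import struct

from google.cloud import bigtable

table = bigtable.Client(project="my-project").instance("my-instance").table("tsdb")

def write_bucket(sensor, bucket, points):
    """points: list of (epoch_ms, value) pairs within one time bucket."""
    row = table.direct_row(f"{sensor}-{bucket}".encode())
    timestamps = struct.pack(f"<{len(points)}q", *(t for t, _ in points))
    values = struct.pack(f"<{len(points)}d", *(v for _, v in points))
    row.set_cell("ts", b"flowrate", timestamps)   # 1000, 2000, 3000, ...
    row.set_cell("val", b"flowrate", values)      # 27.5, 27.8, 28.3, ...
    row.commit()

write_bucket("Sensor2", "2018072412", [(1000, 27.5), (2000, 27.8), (3000, 28.3)])
```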

SLIDE 23

System performance

Performance:

▪ Throughput: handles up to 10M data points per second
▪ Latency: data queryable after 200 ms (99th percentile)

[Architecture diagram: Sensor source → Ambassador API gateway (Kubernetes Engine, multiple instances) → Raw queue (Cloud Pub/Sub) → TSDB writer (Kubernetes Engine, multiple instances) → TSDB (Cloud Bigtable); an Aggregates queue (Cloud Pub/Sub) feeds a TSDB aggregator (Kubernetes Engine, multiple instances); API nodes (Kubernetes Engine, multiple instances) behind Cloud Load Balancing serve queries.]
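The ingest path begins with a publish to the raw queue. A minimal sketch using the Cloud Pub/Sub Python client; the project and topic names and the JSON payload shape are placeholders, not Cognite's actual message format.

```python
# Sketch: publishing a raw sensor reading to the ingestion queue.
import json

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "raw-queue")

# Fire-and-forget publish; the TSDB writer consumes from this topic.
reading = {"sensor": "Sensor1", "timestamp": 1532433600000, "value": 27.5}
future = publisher.publish(topic_path, json.dumps(reading).encode())
print(future.result())  # message ID once the publish is acknowledged
```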

SLIDE 24

Protobuf vs JSON
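The slide contrasts the two encodings. As a rough, hedged illustration of why a binary format wins for dense numeric data, the following compares JSON text against struct-packed binary (a stand-in for Protobuf's packed fixed64/double fields) for 1,000 timestamp/value pairs:

```python
# Rough illustration of the payload-size gap between text and binary
# encodings of timestamp/value pairs.
import json
import struct

points = [(1532433600000 + i * 1000, 27.5 + i * 0.01) for i in range(1000)]

as_json = json.dumps([{"ts": t, "val": v} for t, v in points]).encode()
as_binary = b"".join(struct.pack("<qd", t, v) for t, v in points)

print(len(as_json), "bytes as JSON")      # tens of kilobytes
print(len(as_binary), "bytes as binary")  # 16,000 bytes (16 per pair)
```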

SLIDE 25

Machine learning

SLIDE 26

▪ Unsupervised anomaly detection
▪ Forecasting
▪ Clustering

SLIDE 27

Unsupervised detection with AutoEncoders

Architecture search... to learn a parameterization of normality.

[Diagram: Sensor 1 and Sensors 2-N feed the autoencoder; its output is a distance to normal.]
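A minimal sketch of the technique, assuming a Keras autoencoder trained on windows of normal sensor readings and scoring new windows by reconstruction error ("distance to normal"). The layer sizes are illustrative, not the result of Cognite's architecture search.

```python
# Sketch: unsupervised anomaly detection with an autoencoder.
import numpy as np
from tensorflow import keras

n_sensors = 8  # one window of readings across N sensors

model = keras.Sequential([
    keras.Input(shape=(n_sensors,)),
    keras.layers.Dense(4, activation="relu"),  # encoder
    keras.layers.Dense(2, activation="relu"),  # bottleneck
    keras.layers.Dense(4, activation="relu"),  # decoder
    keras.layers.Dense(n_sensors),
])
model.compile(optimizer="adam", loss="mse")

# Train to reconstruct normal data only (placeholder data here).
normal = np.random.normal(size=(10_000, n_sensors)).astype("float32")
model.fit(normal, normal, epochs=5, batch_size=256, verbose=0)

def distance_to_normal(x):
    # Anomaly score: how far each sample is from its own reconstruction.
    return np.mean((x - model.predict(x, verbose=0)) ** 2, axis=1)

print(distance_to_normal(normal[:5]))
```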

SLIDE 28

Machine learning architecture

[Architecture diagram: the ingestion pipeline from slide 23 (API gateway, Raw queue on Cloud Pub/Sub, TSDB writer, Aggregates queue on Cloud Pub/Sub, TSDB aggregator, all on Kubernetes Engine with multiple instances, writing to the TSDB on Cloud Bigtable), extended with a Process/Analyze path: Cloud Scheduler triggers periodic runs, and ML Engine makes predictions against the TSDB.]

SLIDE 29

Future improvements

▪ Ability to query consistent snapshots back in time
▪ High-frequency time series
▪ Efficient latest-data-point query (see the sketch below)
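The last item is commonly addressed with the "reverse timestamps only when necessary" practice from slide 16: store a reversed time bucket so the newest row sorts first, and a limit-1 scan returns the latest point. A hypothetical sketch of that pattern, not Cognite's shipped design:

```python
# Hedged sketch: efficient latest-data-point lookup via reversed time
# buckets in the row key. Table IDs and key layout are hypothetical.
from google.cloud import bigtable

table = bigtable.Client(project="my-project").instance("my-instance").table("tsdb-latest")

MAX_TS = 10**13  # upper bound on epoch-millisecond bucket values

def reversed_key(sensor, bucket_ms):
    # Newer buckets get smaller suffixes, so they sort first.
    return f"{sensor}-{MAX_TS - bucket_ms:013d}".encode()

# The first row in the sensor's key range is now its newest bucket.
rows = table.read_rows(
    start_key=b"Sensor1-",
    end_key=b"Sensor1.",  # '.' is the byte right after '-' in ASCII
    limit=1,
)
latest = next(iter(rows), None)
```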

SLIDE 30

Next steps

Cloud Bigtable
▪ cloud.google.com/bigtable
▪ cloud.google.com/bigtable/docs/schema-design-time-series

Machine learning
▪ cloud.google.com/products/ai

SLIDE 31

Q&A

SLIDE 32

Rate today’s session

▪ Session page on conference website
▪ O’Reilly Events App