

SLIDE 1

Processing 10M samples/second to drive smart maintenance in complex IIoT systems

Geir Engdahl, CTO, Cognite
Daniel Bergqvist, Developer Advocate, Google

SLIDE 2

SLIDE 3

SLIDE 4

DEMO

SLIDE 5

The charting library you just saw is open source

https://github.com/cognitedata/griff-react

▪ High-performance charting of large time series
▪ Dynamic data loading
▪ No tight coupling to the Cognite TSDB
▪ Uses React and d3

yarn add @cognite/griff-react
or
npm i @cognite/griff-react

SLIDE 6

IoT & the data explosion

50 billion devices will be connected to the internet by 2023, according to Statista (2018) [1]. Cognite currently covers 500,000 sensors, each producing one GB every two years.

[1] https://www.statista.com/statistics/471264/iot-number-of-connected-devices-worldwide/ (2018)

SLIDE 7

Time series requirements

▪ Robustness
▪ High volume of reads and writes
▪ Low latency
▪ Arbitrary-granularity aggregates
▪ Efficient backfill
▪ Efficient sequential reads

Surely there must be an off-the-shelf solution that satisfies this!

SLIDE 8

Databases for IoT - two approaches

Single node*

Horizontally scaling

* Often does master-slave or other read-only replication, but not partitioning

SLIDE 9

OpenTSDB experiments

▪ No limit parameter on queries
▪ No batch inserts, so slow backfills
▪ Can lose incoming data points
▪ Aggregates not pre-computed on write

Disclaimer: OpenTSDB experiments from summer 2017 on version 2.3.0

SLIDE 10

The case for Cloud Bigtable

▪ Fully managed
▪ 10k writes/s per node (SSD)
▪ Scalable to hundreds of PBs
▪ Can scan forward efficiently
▪ Column families and versioning

SLIDE 11

A brief introduction to Google Cloud Bigtable

Supercharge your applications

Stream, secure, analyze and drive ML/AI

From DevOps to NoOps

Reduce management effort from weeks to minutes

Achieve your performance goals

Single digit ms write latency for performance-critical apps

Serve global audiences

99.99% availability across Google’s dedicated network

SLIDE 12

Wide-column data model

NoSQL (no-join) distributed key-value store, designed to scale out.
Has only one index (the row key).
Supports atomic single-row transactions.
Sparse: unwritten cells do not take up any space.

Row Key | Column-Family-1:CQ1 | Column-Family-1:CQ2 | Column-Family-2:CQ1 | Column-Family-2:CQ2
r1      | r1, cf1:cq1         | r1, cf1:cq2         | r1, cf2:cq1         | r1, cf2:cq2
r2      | r2, cf1:cq1         | r2, cf1:cq2         | r2, cf2:cq1         | r2, cf2:cq2
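To make the model concrete, here is a minimal sketch of writing and reading cells across two column families with the google-cloud-bigtable Python client; the project, instance, and table IDs are hypothetical placeholders, not names from the talk.

```python
# Minimal sketch of the wide-column model using the Python client
# (google-cloud-bigtable). IDs are hypothetical placeholders.
from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=True)
table = client.instance("my-instance").table("my-table")

# One row, with cells in two column families under different qualifiers.
row = table.direct_row(b"r1")
row.set_cell("cf1", b"cq1", b"value-1")
row.set_cell("cf1", b"cq2", b"value-2")
row.set_cell("cf2", b"cq1", b"value-3")
row.commit()

# Read the row back; only written cells exist (sparse storage).
result = table.read_row(b"r1")
for family, columns in result.cells.items():
    for qualifier, cells in columns.items():
        print(family, qualifier, cells[0].value)
```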

SLIDE 13

Three-dimensional data space

A cell is addressed by row key, column (CF:CQ), and timestamp.
Every cell is versioned (the default version is the server-side timestamp).
Configurable garbage collection retains the latest N versions (or expires cells after a TTL).
Expiration can be set at the column-family level.

value @ time(latest)
value @ time(previous)
value @ time(earliest available)
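A hedged sketch of configuring these version and TTL policies with the Python client's column-family GC rules; the project, instance, and table IDs are again placeholders.

```python
# Sketch: per-column-family garbage collection with the Python client.
import datetime

from google.cloud import bigtable
from google.cloud.bigtable import column_family

client = bigtable.Client(project="my-project", admin=True)
table = client.instance("my-instance").table("my-table")

# Keep only the 3 most recent versions of each cell in cf1.
table.column_family("cf1", gc_rule=column_family.MaxVersionsGCRule(3)).create()

# Expire cells in cf2 after a 30-day TTL.
ttl_rule = column_family.MaxAgeGCRule(datetime.timedelta(days=30))
table.column_family("cf2", gc_rule=ttl_rule).create()
```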

SLIDE 14

Cloud Bigtable - Optimizing throughput

Cloud Bigtable separates processing from storage through the use of nodes, each of which provides access to a group of database rows.
Rebalancing automatically reduces the load on highly active nodes (in this case there is a lot of activity for data group A).
User-driven resizing as needed to match data throughput targets, with no downtime.

[Diagram: clients reach nodes through a routing layer; rebalancing and resizing remap row groups A-D across nodes without moving the underlying storage.]

SLIDE 15

Cloud Bigtable replication

Regional replication
  • SLA increased to 99.99%
  • Isolate serving and analytics
  • Independently scale clusters
  • Automatic failover in case of a zonal failure

Global replication
  • Increases durability/availability beyond one region
  • Fastest region-specific access
  • Option for a DR replica for regulated customers

[Map: current and future regions with their zone counts, including Oregon, Los Angeles, Salt Lake City, Iowa, S. Carolina, N. Virginia, Montréal, São Paulo, Finland, London, Netherlands, Belgium, Zurich, Mumbai, Singapore, Hong Kong, Taiwan, Tokyo, Osaka, Seoul, Sydney.]

SLIDE 16

Cloud Bigtable for IoT - best practices

Recommendations for row key design
▪ Design your row key with your queries in mind
▪ Ensure that your row key avoids hotspotting
▪ Use tall and narrow tables
▪ Prefer rows to column versions
▪ Reverse timestamps only when necessary

Recommendations for data column design
▪ Rows can be big but are not infinite (1,000 timestamp/value pairs per row is a good rule of thumb)
▪ Keep related data in the same table; keep unrelated data in different tables
▪ Store data you will access in a single query in a single column family
▪ Don't exploit atomicity of single rows

SLIDE 17

How Cognite stores data in Cloud Bigtable

Row key

“Customer1-Sensor1-2018-07-24-01”
“Customer1-Sensor1-2018-07-24-02”
“Customer1-Sensor2-2018-01-01-01”
“Customer1-Sensor2-2018-01-01-02”

Group by customer ID and sensor ID first, then chronologically.
The row key is the only thing you can look up, but you can also scan forward.
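A sketch of what this layout enables: row keys sort lexicographically, so one sensor's buckets for a day form a contiguous range that a single forward scan covers. The key format follows the slide; everything else (IDs, helper name) is illustrative.

```python
# Sketch: building Cognite-style row keys and scanning one sensor's
# data for a day with a key range. Table IDs are placeholders.
from google.cloud import bigtable

table = bigtable.Client(project="my-project").instance("my-instance").table("tsdb")

def row_key(customer, sensor, bucket):
    # e.g. "Customer1-Sensor1-2018-07-24-01"
    return f"{customer}-{sensor}-{bucket}".encode()

# All hourly buckets for one day fall in a contiguous, scannable range
# (end_key is exclusive).
rows = table.read_rows(
    start_key=row_key("Customer1", "Sensor1", "2018-07-24-00"),
    end_key=row_key("Customer1", "Sensor1", "2018-07-25-00"),
)
for row in rows:
    print(row.row_key)
```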

SLIDE 18

Hotspotting

SLIDE 19

Improved key schema

Row key

<hash of sensor id><customer id><sensor id><time bucket>

Group by sensor ID first, then chronologically.
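A sketch of the hashed-prefix idea: a short hash of the sensor ID spreads sequential writes across the key space so new data no longer hammers one node. The MD5 prefix and field encoding are assumptions for illustration; the deck does not show Cognite's exact scheme.

```python
# Sketch of the improved key schema: a short hash prefix avoids
# hotspotting while keeping one sensor's buckets adjacent.
import hashlib

def row_key(customer, sensor, bucket):
    prefix = hashlib.md5(sensor.encode()).hexdigest()[:4]
    return f"{prefix}{customer}-{sensor}-{bucket}".encode()

# Consecutive time buckets for one sensor still sort together (same
# prefix), so per-sensor range scans remain efficient.
print(row_key("Customer1", "Sensor1", "2018072412"))
print(row_key("Customer1", "Sensor1", "2018072413"))
```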

SLIDE 20

SLIDE 21

How Cognite stores data in Cloud Bigtable

Row key                 Column family:qualifier

“Sensor1-2018072412”    “ts:pressure”  “val:pressure”
“Sensor2-2018072412”    “ts:flowrate”  “val:flowrate”
“Sensor3-2018072412”    “ts:flowrate”  “val:flowrate”

The “ts:flowrate” cell holds the in-bucket timestamps: 1000, 2000, 3000, ...

SLIDE 22

How Cognite stores data in Cloud Bigtable

Row key                 Column family:qualifier

“Sensor1-2018072412”    “ts:pressure”  “val:pressure”
“Sensor2-2018072412”    “ts:flowrate”  “val:flowrate”
“Sensor3-2018072412”    “ts:flowrate”  “val:flowrate”

The “val:flowrate” cell holds the corresponding values: 27.5, 27.8, 28.3, ...
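A sketch of writing one bucket row with parallel ts and val columns as on these two slides. The packed little-endian encoding is an assumption for illustration, not Cognite's confirmed wire format.

```python
# Sketch: one row per sensor per time bucket, with parallel "ts" and
# "val" columns per measurement. Encoding details are assumptions.
import struct

from google.cloud import bigtable

table = bigtable.Client(project="my-project").instance("my-instance").table("tsdb")

def write_bucket(sensor, bucket, points):
    """points: list of (epoch_ms, value) pairs within one time bucket."""
    row = table.direct_row(f"{sensor}-{bucket}".encode())
    timestamps = struct.pack(f"<{len(points)}q", *(t for t, _ in points))
    values = struct.pack(f"<{len(points)}d", *(v for _, v in points))
    row.set_cell("ts", b"flowrate", timestamps)   # 1000, 2000, 3000, ...
    row.set_cell("val", b"flowrate", values)      # 27.5, 27.8, 28.3, ...
    row.commit()

write_bucket("Sensor2", "2018072412", [(1000, 27.5), (2000, 27.8), (3000, 28.3)])
```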

SLIDE 23

System performance

Performance:

▪ Throughput: handles up to 10M data points per second
▪ Latency: data queryable after 200 ms (99th percentile)

[Architecture diagram: Sensor source → Ambassador API gateway (Kubernetes Engine, multiple instances) → Raw queue (Cloud Pub/Sub) → TSDB writer (Kubernetes Engine, multiple instances) → TSDB (Cloud Bigtable); an Aggregates queue (Cloud Pub/Sub) feeds a TSDB aggregator (Kubernetes Engine, multiple instances); API nodes (Kubernetes Engine, multiple instances) behind Cloud Load Balancing serve queries.]
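The ingest path begins with a publish to the raw queue. A minimal sketch using the Cloud Pub/Sub Python client; the project and topic names and the JSON payload shape are placeholders, not Cognite's actual message format.

```python
# Sketch: publishing a raw sensor reading to the ingestion queue.
import json

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "raw-queue")

# Fire-and-forget publish; the TSDB writer consumes from this topic.
reading = {"sensor": "Sensor1", "timestamp": 1532433600000, "value": 27.5}
future = publisher.publish(topic_path, json.dumps(reading).encode())
print(future.result())  # message ID once the publish is acknowledged
```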

SLIDE 24

Protobuf vs JSON
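The slide contrasts the two encodings. As a rough, hedged illustration of why a binary format wins for dense numeric data, the following compares JSON text against struct-packed binary (a stand-in for Protobuf's packed fixed64/double fields) for 1,000 timestamp/value pairs:

```python
# Rough illustration of the payload-size gap between text and binary
# encodings of timestamp/value pairs.
import json
import struct

points = [(1532433600000 + i * 1000, 27.5 + i * 0.01) for i in range(1000)]

as_json = json.dumps([{"ts": t, "val": v} for t, v in points]).encode()
as_binary = b"".join(struct.pack("<qd", t, v) for t, v in points)

print(len(as_json), "bytes as JSON")      # tens of kilobytes
print(len(as_binary), "bytes as binary")  # 16,000 bytes (16 per pair)
```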

SLIDE 25

Machine learning

SLIDE 26

▪ Unsupervised anomaly detection
▪ Forecasting
▪ Clustering

SLIDE 27

Unsupervised detection with AutoEncoders

Architecture search... to learn a parameterization of normality.

[Diagram: Sensor 1 and Sensors 2-N feed the autoencoder; its output is a distance to normal.]
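A minimal sketch of the technique, assuming a Keras autoencoder trained on windows of normal sensor readings and scoring new windows by reconstruction error ("distance to normal"). The layer sizes are illustrative, not the result of Cognite's architecture search.

```python
# Sketch: unsupervised anomaly detection with an autoencoder.
import numpy as np
from tensorflow import keras

n_sensors = 8  # one window of readings across N sensors

model = keras.Sequential([
    keras.Input(shape=(n_sensors,)),
    keras.layers.Dense(4, activation="relu"),  # encoder
    keras.layers.Dense(2, activation="relu"),  # bottleneck
    keras.layers.Dense(4, activation="relu"),  # decoder
    keras.layers.Dense(n_sensors),
])
model.compile(optimizer="adam", loss="mse")

# Train to reconstruct normal data only (placeholder data here).
normal = np.random.normal(size=(10_000, n_sensors)).astype("float32")
model.fit(normal, normal, epochs=5, batch_size=256, verbose=0)

def distance_to_normal(x):
    # Anomaly score: how far each sample is from its own reconstruction.
    return np.mean((x - model.predict(x, verbose=0)) ** 2, axis=1)

print(distance_to_normal(normal[:5]))
```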

SLIDE 28

Machine learning architecture

[Architecture diagram: the ingestion pipeline from slide 23 (API gateway, Raw queue on Cloud Pub/Sub, TSDB writer, Aggregates queue on Cloud Pub/Sub, TSDB aggregator, all on Kubernetes Engine with multiple instances, writing to the TSDB on Cloud Bigtable), extended with a Process/Analyze path: Cloud Scheduler triggers periodic runs, and ML Engine makes predictions against the TSDB.]

SLIDE 29

Future improvements

▪ Ability to query consistent snapshots back in time
▪ High-frequency time series
▪ Efficient latest-data-point query (see the sketch below)
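The last item is commonly addressed with the "reverse timestamps only when necessary" practice from slide 16: store a reversed time bucket so the newest row sorts first, and a limit-1 scan returns the latest point. A hypothetical sketch of that pattern, not Cognite's shipped design:

```python
# Hedged sketch: efficient latest-data-point lookup via reversed time
# buckets in the row key. Table IDs and key layout are hypothetical.
from google.cloud import bigtable

table = bigtable.Client(project="my-project").instance("my-instance").table("tsdb-latest")

MAX_TS = 10**13  # upper bound on epoch-millisecond bucket values

def reversed_key(sensor, bucket_ms):
    # Newer buckets get smaller suffixes, so they sort first.
    return f"{sensor}-{MAX_TS - bucket_ms:013d}".encode()

# The first row in the sensor's key range is now its newest bucket.
rows = table.read_rows(
    start_key=b"Sensor1-",
    end_key=b"Sensor1.",  # '.' is the byte right after '-' in ASCII
    limit=1,
)
latest = next(iter(rows), None)
```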

SLIDE 30

Next steps

Cloud Bigtable
▪ cloud.google.com/bigtable
▪ cloud.google.com/bigtable/docs/schema-design-time-series

Machine learning
▪ cloud.google.com/products/ai

SLIDE 31

Q&A

SLIDE 32

Rate today’s session

▪ Session page on conference website
▪ O’Reilly Events App