TimescaleDB: Re-engineering PostgreSQL as a time-series database


SLIDE 1

TimescaleDB:

Re-engineering PostgreSQL as a time-series database

Michael J. Freedman

Co-founder / CTO, Timescale 
 Professor of Computer Science, Princeton

mike@timescale.com · github.com/timescale · Apache 2 License

SLIDE 2

SLIDE 3

Time-series Data is Emerging Everywhere

Financial & Marketing · Industrial Machines · Datacenter & DevOps · Web / mobile Events · Transportation & Logistics

SLIDE 4

So, what is time-series data?

SLIDE 5

What you have been told

Name   Tags                    Data
CPU    Host=Name,Region=West   1990-01-01 01:02:00   70
                               1990-01-01 01:03:00   71
                               1990-01-01 01:04:00   72
                               1990-01-01 01:04:00   73
                               1990-01-01 01:04:00   100

SLIDE 6

What you have been told

Name      Tags                    Data
CPU       Host=Name,Region=West   1990-01-01 01:02:00   70
                                  1990-01-01 01:03:00   71
                                  1990-01-01 01:04:00   72
                                  1990-01-01 01:04:00   73
                                  1990-01-01 01:04:00   100
FreeMem   Host=Name,Region=West   1990-01-01 01:02:00   800M
                                  1990-01-01 01:03:00   600M
                                  1990-01-01 01:04:00   400M
                                  1990-01-01 01:04:00   200M
                                  1990-01-01 01:04:00   0

2 time-series?

SLIDE 7

This is wrong

SLIDE 8

Time-series data has a richer structure

Tags: Host=Name,Region=West

Time                  CPU   MemFree   Temp
1990-01-01 01:02:00   70    800M      80
1990-01-01 01:03:00   71    600M      81
1990-01-01 01:04:00   72    400M      82
1990-01-01 01:04:00   73    200M      83
1990-01-01 01:04:00   100   0         120

SLIDE 9

Fewer queries: select * where time = x

(same wide-row table as Slide 8)

SLIDE 10

Complex filters: select * where temp > 100

(same wide-row table as Slide 8)

SLIDE 11

Complex aggregates: avg(mem_free) group by (cpu/10)

(same wide-row table as Slide 8)

SLIDE 12

Correlations: How does temp correlate with memfree?

(same wide-row table as Slide 8)
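
The shorthand queries on Slides 9-12 map directly onto plain SQL once each timestamp is stored as one wide row. A minimal sketch, assuming a hypothetical measurements(time, cpu, mem_free, temp) table:

-- point-in-time lookup (Slide 9)
SELECT * FROM measurements WHERE time = '1990-01-01 01:04:00';

-- complex filter (Slide 10)
SELECT * FROM measurements WHERE temp > 100;

-- complex aggregate (Slide 11)
SELECT floor(cpu / 10.0) AS cpu_bucket, avg(mem_free)
FROM measurements
GROUP BY cpu_bucket;

-- correlation (Slide 12), using PostgreSQL's built-in corr() aggregate
SELECT corr(temp, mem_free) FROM measurements;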

SLIDE 13

Leverage relations

Time                  CPU   Host   Region
1990-01-01 01:02:00   70    1      91
1990-01-01 01:03:00   71    2      91
1990-01-01 01:04:00   72    3      93
1990-01-01 01:04:00   73    4      93
1990-01-01 01:04:00   100   5      95

Region stored in separate host metadata table
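
Because the data lives in a relational database, the region can sit in a separate hosts table and be pulled in with a JOIN at query time. A minimal sketch, with hypothetical measurements and hosts tables:

SELECT m.time, m.cpu, h.region
FROM measurements m
JOIN hosts h ON m.host_id = h.host_id
WHERE h.region = 'west'
ORDER BY m.time DESC;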

SLIDE 14

How is this different than having a time field?

SLIDE 15

Treat changes as inserts, not overwrites.

SLIDE 16

You probably have time-series data and don’t even realize it

SLIDE 17

What database for time-series data?

SLIDE 18

What database for time-series data?

Relational: 32% · NoSQL: 68%

https://www.percona.com/blog/2017/02/10/percona-blog-poll-database-engine-using-store-time-series-data/

SLIDE 19

Why so much NoSQL?

SLIDE 20

Why not use a simple relational database table?

SLIDE 21

Postgres 9.6.2 on Azure standard DS4 v2 (8 cores), SSD (premium LRS storage). Each row has 12 columns (1 indexed timestamp, 1 host ID, 10 metrics).

SLIDE 22

Postgres 9.6.2 on Azure standard DS4 v2 (8 cores), SSD (premium LRS storage). Each row has 12 columns (1 indexed timestamp, 1 host ID, 10 metrics).

SLIDE 23

B-tree Insert Pain

[Diagram: inserting a batch of keys (e.g. 17, 8, 5) into a B-tree index. With memory capacity for only 2 nodes, the inserts hit random parts of the tree, so nodes are repeatedly written to disk and read back into memory.]

SLIDE 24

B-tree Insert Pain (animation continued)

SLIDE 25

B-tree Insert Pain (animation continued)

SLIDE 26

B-tree Insert Pain (animation continued)

SLIDE 27

Challenge in scaling up

  • Indexes write to random parts of B-tree
    – Example: BTREE(device_id, time DESC)
  • As the table grows large
    – Indexes no longer fit in memory
    – Writes go to random locations in the B-tree
    – Causes swapping

[Diagram: an index ordered by (Device, Time DESC); entries for Device A at 01:01:01 and Device Z at 01:01:01 land in distant parts of the index.]

SLIDE 28

Why not just use a NoSQL DB?

SLIDE 29

NoSQL champion: Log-Structured Merge Trees

  • Key-value store with indexed key lookup at high write rates
  • Compressed data storage
  • Common approach for time series: use key <name, tags, field, time>

SLIDE 30

Log-Structured Merge Tree

[Diagram: an insert batch (e.g. 5, 8, 17) lands in an in-memory memtable; sorted sstables on disk (older toward the right) hold previously flushed data, e.g. 1 3 4 5 7 9 10 and 1 2 3 5 7 9 10 11.]

SLIDE 31

Log-Structured Merge Tree (animation continued)

SLIDE 32

Log-Structured Merge Tree (animation continued)

SLIDE 33

Problems with non-KV queries

  • If querying for key 5: stop after finding it in the first sstable
  • If querying for data in time range 5-10: have to scan all sstables
  • Requires (in-memory) data structures to efficiently map tags to keys ⇒ high-cardinality issue
  • No secondary index support ⇒ “tag lock-in”

SLIDE 34

NoSQL + LSMTs Come at a Cost

  • Less powerful queries
  • Weaker consistency (no ACID)
  • No JOINs
  • Loss of SQL ecosystem

SLIDE 35

Is there a better way?

SLIDE 36

TimescaleDB: Scalable time-series database, full SQL

Packaged as a PostgreSQL extension · Apache 2 licensed

SLIDE 37
Scalable
  • High write rates
  • Time-oriented optimizations
  • Fast complex queries
  • 100s of billions of rows / node

Reliable
  • Inherits 20+ years of PostgreSQL reliability
  • Streaming replication, backups, HA clustering

Easy to Use
  • Supports full SQL
  • Time-oriented features
  • Easy to manage: looks like a regular table
  • One DB for relational & time-series data

TimescaleDB: Scalable time-series database, full SQL
Packaged as a PostgreSQL extension

SLIDE 38

How?

SLIDE 39

Time-series workloads are different.

SLIDE 40

OLTP
  • Primarily UPDATEs
  • Writes randomly distributed
  • Transactions to multiple primary keys

Time-series
  • Primarily INSERTs
  • Writes to recent time interval
  • Writes associated with a timestamp and primary key

SLIDE 41

How it works

SLIDE 42

Time (older)

SLIDE 43

Time-space partitioning (for both scaling up & out)

Time (older)

Intervals: 1) manually specified  2) automatically adjusted

SLIDE 44

Time-space partitioning (for both scaling up & out)

Time (older) · Space (hash partitioning)

Intervals: 1) manually specified  2) automatically adjusted

SLIDE 45

Time-space partitioning (for both scaling up & out)

Time (older) · Space (hash partitioning) · Chunk (sub-table)

Intervals: 1) manually specified  2) automatically adjusted

SLIDE 46

Time-space partitioning (for both scaling up & out)

Time (older) · Space (hash partitioning) · Chunk (sub-table) · Hypertable

Intervals: 1) manually specified  2) automatically adjusted

SLIDE 47

The Hypertable Abstraction

A hypertable, made up of many chunks, still supports:
  • Indexes
  • Triggers
  • Constraints
  • Foreign keys
  • UPSERTs
  • Table mgmt
SLIDE 48

PostgreSQL 10 doesn’t solve the problem (more later)

SLIDE 49

Not well supported in PG10:
  • Indexes
  • Triggers
  • Constraints
  • Foreign keys
  • UPSERTs
  • Table mgmt
SLIDE 50

Automatic Space-time Partitioning

SLIDE 51

Automatic Space-time Partitioning

SLIDE 52

Chunking benefits

SLIDE 53

Chunks are “right-sized”

Recent (hot) chunks fit in memory

SLIDE 54

Single node: Scaling up via adding disks

How: chunks spread across many disks (elastically!), either RAIDed or via distinct tablespaces
Benefit:
  • Faster inserts
  • Parallelized queries
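
A minimal sketch of the tablespace route, assuming a hypothetical conditions hypertable; attach_tablespace is TimescaleDB's API for this, though it appeared in releases after the version benchmarked in this deck:

-- create a tablespace on the new disk, then tell TimescaleDB to place new chunks on it
CREATE TABLESPACE disk2 LOCATION '/mnt/disk2/postgresql/data';
SELECT attach_tablespace('disk2', 'conditions');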

SLIDE 55

Multi-node: Scaling out via existing mechanisms (under development)

  • Chunks spread across servers
  • Insert/query to any server
  • Distributed query optimizations (push-down LIMITs and aggregates, etc.)

SLIDE 56

Avoid querying chunks via constraint exclusion

SELECT time, temp FROM data
WHERE time > now() - interval '7 days'
  AND device_id = '12345'

SLIDE 57

Avoid querying chunks via constraint exclusion

SELECT time, device_id, temp FROM data
WHERE time > '2017-08-22 18:18:00+00'

SLIDE 58

Avoid querying chunks via constraint exclusion

SELECT time, device_id, temp FROM data
WHERE time > now() - interval '24 hours'

Plain Postgres won’t exclude chunks
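
One way to see chunk exclusion in action is EXPLAIN: under TimescaleDB the plan should list only the chunks covering the last 24 hours, while plain PostgreSQL partitioning cannot exclude partitions for a now()-relative predicate at plan time. A sketch, reusing the hypothetical data table from the queries above:

EXPLAIN
SELECT time, device_id, temp FROM data
WHERE time > now() - interval '24 hours';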

SLIDE 59

Efficient retention policies

SELECT time, device_id, temp FROM data
WHERE time > now() - interval '24 hours'

Drop chunks, don’t delete rows

⇒ avoids vacuuming
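
A minimal sketch of such a retention policy, assuming a hypothetical conditions hypertable; note that the drop_chunks signature has changed across TimescaleDB versions:

-- early releases: drop_chunks(older_than, hypertable_name)
SELECT drop_chunks(interval '1 week', 'conditions');

-- later releases take the hypertable first:
-- SELECT drop_chunks('conditions', older_than => now() - interval '1 week');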

SLIDE 60

Is it just about performance?

SLIDE 61

Simplify your stack

Application → TimescaleDB (with JOINs)
   vs.
Application → RDBMS + NoSQL

SLIDE 62

Rich Time Analytics
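
As an illustration of the kind of time-oriented query this refers to, a sketch using TimescaleDB's time_bucket function against the conditions table defined later in the deck (Slide 73):

SELECT time_bucket('5 minutes', time) AS bucket,
       device,
       avg(temp) AS avg_temp,
       max(temp) AS max_temp
FROM conditions
WHERE time > now() - interval '1 day'
GROUP BY bucket, device
ORDER BY bucket DESC;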

SLIDE 63

Geospatial Temporal Analysis (with PostGIS)
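
A hedged sketch of what this looks like in practice, assuming a hypothetical rides hypertable with a pickup_time column and a PostGIS geometry column pickup_geom (SRID 4326):

-- rides per hour that started near a point of interest
SELECT time_bucket('1 hour', pickup_time) AS hour,
       count(*) AS rides
FROM rides
WHERE ST_DWithin(
        pickup_geom,
        ST_SetSRID(ST_MakePoint(-73.9857, 40.7484), 4326),
        0.005)   -- distance in degrees for geometry; roughly 500 m at this latitude
GROUP BY hour
ORDER BY hour;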

SLIDE 64

Data Retention + Aggregations

Granularity:   raw      15 min    day
Retention:     1 week   1 month   forever
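
A minimal sketch of the aggregation half of this policy, with a hypothetical conditions_15min rollup table (continuous aggregates arrived in later TimescaleDB versions, so here the rollup is a plain INSERT ... SELECT):

CREATE TABLE conditions_15min (
  bucket   timestamptz,
  device   text,
  avg_temp float
);

-- periodically roll recent raw rows up into 15-minute buckets
INSERT INTO conditions_15min
SELECT time_bucket('15 minutes', time), device, avg(temp)
FROM conditions
WHERE time > now() - interval '1 day'
GROUP BY 1, 2;

The retention half is the drop_chunks call shown on Slide 59, applied with a shorter window to the raw table and a longer one to the rollup.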

SLIDE 65

Enjoy the entire PostgreSQL ecosystem

SLIDE 66

Unlock the richness of your data

Prometheus → Remote Storage Adapter → TimescaleDB + PostgreSQL (pg_prometheus) → Grafana

SLIDE 67

Performance benefits

SLIDE 68

TimescaleDB vs. PostgreSQL (single-row inserts)

144K metrics / s · 14.4K inserts / s

TimescaleDB 0.5, Postgres 9.6.2 on Azure standard DS4 v2 (8 cores), SSD (LRS storage). Each row has 12 columns (1 indexed timestamp, 1 host ID, 10 metrics).

SLIDE 69

TimescaleDB vs. PostgreSQL (batch inserts)

> 2x · 1.11M metrics / s

TimescaleDB 0.5, Postgres 9.6.2 on Azure standard DS4 v2 (8 cores), SSD (LRS storage). Each row has 12 columns (1 indexed timestamp, 1 host ID, 10 metrics).

SLIDE 70

TimescaleDB vs. PostgreSQL: query speedup

Table scans, simple column rollups    ~0-20%
GROUPBYs                              20-200%
Time-ordered GROUPBYs                 400-10,000x
DELETEs                               2000x

TimescaleDB 0.5, Postgres 9.6.2 on Azure standard DS4 v2 (8 cores), SSD (LRS storage). Each row has 12 columns (1 indexed timestamp, 1 host ID, 10 metrics).

SLIDE 71

Example Query Optimization

CREATE INDEX ON readings(time);

SELECT date_trunc('minute', time) AS bucket, avg(cpu)
FROM readings
GROUP BY bucket
ORDER BY bucket DESC
LIMIT 10;

Plain PostgreSQL doesn’t know to use the index for this; TimescaleDB understands time.

SLIDE 72

What about PostgreSQL 10?

SLIDE 73

TimescaleDB: Easy to Get Started

CREATE TABLE conditions (
  time      timestamptz,
  temp      float,
  humidity  float,
  device    text
);

SELECT create_hypertable('conditions', 'time', 'device', 4,
                         chunk_time_interval => interval '1 week');

INSERT INTO conditions
VALUES ('2017-10-03 10:23:54+01', 73.4, 40.7, 'sensor3');

SELECT * FROM conditions;

          time          | temp | humidity | device
------------------------+------+----------+---------
 2017-10-03 11:23:54+02 | 73.4 |     40.7 | sensor3

SLIDE 74

TimescaleDB: Repartitioning is Simple

-- Set new chunk time interval
SELECT set_chunk_time_interval('conditions', interval '24 hours');

-- Set new number of space partitions
SELECT set_number_partitions('conditions', 6);

SLIDE 75

PG10 requires a lot of manual work

CREATE TABLE conditions (
  time      timestamptz,
  temp      float,
  humidity  float,
  device    text
) PARTITION BY RANGE (device);

-- Create device partitions
CREATE TABLE conditions_p1 PARTITION OF conditions
  FOR VALUES FROM (MINVALUE) TO ('g') PARTITION BY RANGE (time);
CREATE TABLE conditions_p2 PARTITION OF conditions
  FOR VALUES FROM ('g') TO ('n') PARTITION BY RANGE (time);
CREATE TABLE conditions_p3 PARTITION OF conditions
  FOR VALUES FROM ('n') TO ('t') PARTITION BY RANGE (time);
CREATE TABLE conditions_p4 PARTITION OF conditions
  FOR VALUES FROM ('t') TO (MAXVALUE) PARTITION BY RANGE (time);

-- Create time partitions for the first week in each device partition
CREATE TABLE conditions_p1_y2017m10w01 PARTITION OF conditions_p1
  FOR VALUES FROM ('2017-10-01') TO ('2017-10-07');
CREATE TABLE conditions_p2_y2017m10w01 PARTITION OF conditions_p2
  FOR VALUES FROM ('2017-10-01') TO ('2017-10-07');
CREATE TABLE conditions_p3_y2017m10w01 PARTITION OF conditions_p3
  FOR VALUES FROM ('2017-10-01') TO ('2017-10-07');
CREATE TABLE conditions_p4_y2017m10w01 PARTITION OF conditions_p4
  FOR VALUES FROM ('2017-10-01') TO ('2017-10-07');

-- Create time index on each leaf partition
CREATE INDEX ON conditions_p1_y2017m10w01 (time);
CREATE INDEX ON conditions_p2_y2017m10w01 (time);
CREATE INDEX ON conditions_p3_y2017m10w01 (time);
CREATE INDEX ON conditions_p4_y2017m10w01 (time);

INSERT INTO conditions
VALUES ('2017-10-03 10:23:54+01', 73.4, 40.7, 'sensor3');

SLIDE 76

TimescaleDB vs. PostgreSQL 10

(single-row inserts)

SLIDE 77

“As an update, we’ve scaled to beyond 500 billion rows now and things are still working very smoothly. The performance we’re seeing is monstrous, almost 70% faster queries!”

  • Software Engineer, Large Enterprise Computer Company

500B rows · 400K rows / sec · 50K chunks · 5min intervals

SLIDE 78

500B rows · 400K rows / sec · 50K chunks · 5min intervals

Optimizing for scale

  • Faster chunk exclusion
    – Avoid opening / locking all chunks when performing constraint exclusion
  • Better LIMITs across chunks
    – SortedAppend to avoid requiring at least one tuple per chunk during MergeAppend / LIMIT

SLIDE 79

What about NoSQL?

SLIDE 80

TimescaleDB vs. Cassandra

> 1x

TimescaleDB 0.5, Cassandra 3.11.0, Azure standard DS4 v2 (8 cores), SSD (LRS storage). Each TimescaleDB row has 12 columns (1 indexed timestamp, 1 host ID, 10 metrics). Each Cassandra row has 2 columns (1 key, combo of tags + host + timestamp).

SLIDE 81

TimescaleDB vs. Cassandra: query speedup

Table scans, simple column rollups    2-44x
GROUPBYs                              1-3x
Time-ordered GROUPBYs                 1900x
Lastpoint                             1400x

TimescaleDB 0.5, Cassandra 3.11.0, Azure standard DS4 v2 (8 cores), SSD (LRS storage). Each TimescaleDB row has 12 columns (1 indexed timestamp, 1 host ID, 10 metrics). Each Cassandra row has 2 columns (1 key, combo of tags + host + timestamp).

SLIDE 82

Open Source (Apache 2.0)

  • github.com/timescale/timescaledb

Join the Community

  • slack.timescale.com
SLIDE 83

@timescale

SLIDE 84

Timescale is hiring!

  • Core Database Engineers
  • R&D Engineers
  • Solutions Engineers
  • Evangelists
  • Customer Success

careers.timescale.com

SLIDE 85