TimescaleDB:
Re-engineering PostgreSQL as a time-series database
Michael J. Freedman
Co-founder / CTO, Timescale Professor of Computer Science, Princeton
mike@timescale.com · github.com/timescale · Apache 2 License
TimescaleDB: Re-engineering PostgreSQL as a time-series database - - PowerPoint PPT Presentation
TimescaleDB: Re-engineering PostgreSQL as a time-series database Michael J. Freedman Co-founder / CTO, Timescale Professor of Computer Science, Princeton mike@timescale.com github.com/timescale Apache 2 License Financial
Michael J. Freedman
Co-founder / CTO, Timescale Professor of Computer Science, Princeton
mike@timescale.com · github.com/timescale · Apache 2 License
Financial & Marketing Industrial Machines Datacenter & DevOps
Web / mobile Events
Transportation & Logistics
Relational NoSQL
https://www.percona.com/blog/2017/02/10/percona-blog-poll-database-engine-using-store-time-series-data/
32% 68%
Postgres 9.6.2 on Azure standard DS4 v2 (8 cores), SSD (premium LRS storage) Each row has 12 columns (1 timestamp, indexed 1 host ID, 10 metrics)
Postgres 9.6.2 on Azure standard DS4 v2 (8 cores), SSD (premium LRS storage) Each row has 12 columns (1 timestamp, indexed 1 host ID, 10 metrics)
1 20 10 1 10 13 24 29 25 5
Insert batch:
17 8
Memory Capacity: 2 NODES
IN MEMORY WRITE TO DISK
1 20 10 1 10 13 24 29 25 5
Insert batch:
17 8
Memory Capacity: 2 NODES
IN MEMORY WRITE TO DISK
1 20 10 1 10 13 24 29 25
Insert batch:
8 5 17
Memory Capacity: 2 NODES
IN MEMORY WRITE TO DISK
10 13
1 20 10 1 24 29 25
Insert batch:
8 5 17
Memory Capacity: 2 NODES
IN MEMORY WRITE TO DISK
– Example: BTREE(device_id, time DESC)
– Indexes no longer fit in memory – Writes to random locations in B-tree – Causes swapping
Device: A Time: 01:01:01 Device: Z Time: 01:01:01
Device, Time DESC
Key-value store with indexed key lookup at high-write rates
+
5
Insert batch:
17 8
Memtable
1 3 4 5 7 9 10 1 2 3 5 7 9 10 11
sstables Older
Insert batch:
8 17 5
Memtable
1 3 4 5 7 9 10 1 2 3 5 7 9 10 11
sstables Older
Insert batch: Memtable
1 3 4 5 7 9 10 1 2 3 5 7 9 10 11
sstables Older
5 8 17
If querying for key 5: stop after finding it in first sstable
1 3 4 5 7 9 10 1 2 3 5 7 9 10 11
sstables Older
5 8 17
If querying for data in time range 5-10: have to scan all sstables Requires (in-memory) data structs to efficiently map tags to keys ⇒ high-cardinality issue No secondary index support ⇒ “tag lock-in”
+
Packaged as a PostgreSQL extension Apache 2 Licensed
Scalable
PostgreSQL reliability
backups, HA clustering
Reliable
like a regular table
& time-series data
Easy to Use
Packaged as a PostgreSQL extension
primary keys
timestamp and primary key
Time
(older)
(for both scaling up & out)
Time
(older)
Intervals
1) manually specified 2) automatically adjusted
(for both scaling up & out)
Space Time
(older) (hash partitioning)
Intervals
1) manually specified 2) automatically adjusted
(for both scaling up & out)
Chunk (sub-table)
Space Time
(older) (hash partitioning)
Intervals
1) manually specified 2) automatically adjusted
(for both scaling up & out)
Chunk (sub-table)
Hypertable
Space Time
(older) (hash partitioning)
Intervals
1) manually specified 2) automatically adjusted
Chunks
Hypertable
Chunks
Hypertable
Recent (hot) chunks fit in memory
How Benefit
Chunks spread across many disks (elastically!) either RAIDed or via distinct tablespaces
D e v e l
m e n t
Under development
(push-down LIMITs and aggregates, etc.)
SELECT time, temp FROM data WHERE time > now() - interval ‘7 days’ AND device_id = ‘12345’
SELECT time, device_id, temp FROM data WHERE time > ‘2017-08-22 18:18:00+00’
SELECT time, device_id, temp FROM data WHERE time > now() - interval ’24 hours’
Plain Postgres won’t exclude chunks
SELECT time, device_id, temp FROM data WHERE time > now() - interval ’24 hours’
(with JOINS)
Application Application
Granularity raw 15 min day Retention 1 week 1 month forever
TimescaleDB + PostgreSQL
Prometheus
Remote Storage Adapter
+ pg_prometheus Prometheus Grafana
METRICS / S
(single-row inserts)
I N S E R T S / S
TimescaleDB 0.5, Postgres 9.6.2 on Azure standard DS4 v2 (8 cores), SSD (LRS storage) Each row has 12 columns (1 timestamp, indexed 1 host ID, 10 metrics)
(batch inserts)
TimescaleDB 0.5, Postgres 9.6.2 on Azure standard DS4 v2 (8 cores), SSD (LRS storage) Each row has 12 columns (1 timestamp, indexed 1 host ID, 10 metrics)
METRICS / S
SPEEDUP
Table scans, simple column rollups
GROUPBYs
Time-ordered GROUPBYs
DELETEs
TimescaleDB 0.5, Postgres 9.6.2 on Azure standard DS4 v2 (8 cores), SSD (LRS storage) Each row has 12 columns (1 timestamp, indexed 1 host ID, 10 metrics)
PG doesn’t know to use the index
CREATE INDEX ON readings(time); SELECT date_trunc(‘minute’, time) as bucket, avg(cpu) FROM readings GROUP BY bucket ORDER BY bucket DESC LIMIT 10;
Timescale understands
time
CREATE TABLE conditions ( time timestamptz, temp float, humidity float, device text ); SELECT create_hypertable('conditions', 'time', ‘device', 4, chunk_time_interval => interval '1 week’); INSERT INTO conditions VALUES ('2017-10-03 10:23:54+01', 73.4, 40.7, 'sensor3'); SELECT * FROM conditions; time | temp | humidity | device
2017-10-03 11:23:54+02 | 73.4 | 40.7 | sensor3
— Set new chunk time interval SELECT set_chunk_time_interval('conditions', interval '24 hours’); — Set new number of space partitions SELECT set_number_partitions('conditions', 6);
CREATE TABLE conditions ( time timestamptz, temp float, humidity float, device text ); CREATE TABLE conditions_p1 PARTITION OF conditions FOR VALUES FROM (MINVALUE) TO ('g') PARTITION BY RANGE (time); CREATE TABLE conditions_p2 PARTITION OF conditions FOR VALUES FROM ('g') TO ('n') PARTITION BY RANGE (time); CREATE TABLE conditions_p3 PARTITION OF conditions FOR VALUES FROM ('n') TO ('t') PARTITION BY RANGE (time); CREATE TABLE conditions_p4 PARTITION OF conditions FOR VALUES FROM ('t') TO (MAXVALUE) PARTITION BY RANGE (time);
CREATE TABLE conditions_p1_y2017m10w01 PARTITION OF conditions_p1 FOR VALUES FROM ('2017-10-01') TO ('2017-10-07'); CREATE TABLE conditions_p2_y2017m10w01 PARTITION OF conditions_p2 FOR VALUES FROM ('2017-10-01') TO ('2017-10-07'); CREATE TABLE conditions_p3_y2017m10w01 PARTITION OF conditions_p3 FOR VALUES FROM ('2017-10-01') TO ('2017-10-07'); CREATE TABLE conditions_p4_y2017m10w01 PARTITION OF conditions_p4 FOR VALUES FROM ('2017-10-01') TO (‘2017-10-07');
CREATE INDEX ON conditions_p1_y2017m10w01 (time); CREATE INDEX ON conditions_p2_y2017m10w01 (time); CREATE INDEX ON conditions_p3_y2017m10w01 (time); CREATE INDEX ON conditions_p4_y2017m10w01 (time); INSERT INTO conditions VALUES ('2017-10-03 10:23:54+01', 73.4, 40.7, ‘sensor3');
(single-row inserts)
ROWS
ROWS / SEC
CHUNKS
INTERVALS
ROWS
ROWS / SEC
CHUNKS
INTERVALS
– Avoid opening / locking all chunks when
performing constraint exclusion
– SortedAppend to avoid requiring at least one
tuple per chunk during MergeAppend / LIMIT
TimescaleDB 0.5, Cassandra 3.11.0, Azure standard DS4 v2 (8 cores), SSD (LRS storage) Each TimescaleDB row has 12 columns (1 timestamp, indexed 1 host ID, 10 metrics) Each Cassandra row has 2 columns (1 key, combo of tags + host + timestamp)
SPEEDUP
Table scales simple column rollups
GROUPBYs
Time-ordered GROUPBYs
Lastpoint
TimescaleDB 0.5, Cassandra 3.11.0, Azure standard DS4 v2 (8 cores), SSD (LRS storage) Each TimescaleDB row has 12 columns (1 timestamp, indexed 1 host ID, 10 metrics) Each Cassandra row has 2 columns (1 key, combo of tags + host + timestamp)
@timescale