Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off (PowerPoint presentation)


SLIDE 1

Databases Have Forgotten About Single Node Performance, A Wrongheaded Trade Off

Matvey Arye, PhD

Core Database Engineer

mat@timescale.com · github.com/timescale

SLIDE 2

SLIDE 3

How computing used to look

SLIDE 4

How computing looks today

SLIDE 5

@timescale

What happened?

SLIDE 6

Databases changed from scaling up to scaling out

SLIDE 7

We entered the MapReduce Era

SLIDE 8

But… MapReduce specializes in long jobs that touch lots of data

  • Grep: 150 seconds
  • Sort: 890 seconds
  • Average job completion time of Google production workloads: 634 seconds

(Source: Google MapReduce paper)

SLIDE 9

Why the high latency?

SLIDE 10

Network Latencies

  • Connection setup/teardown
  • TCP ramp-up
  • Network transfer
SLIDE 11

Consensus is Expensive

  • Dreaded two-phase commit
  • Snapshot isolation is even harder
  • Coordinators are often bottlenecks
SLIDE 12

The “Straggler” Problem

SLIDE 13

The high latency of distributed databases is nothing new

Graph connected-component analysis:

              Cores   twitter_rv   uk_2007_05
  Spark       128     1784s        8000s+
  Giraph      128     200s         8000s+
  GraphLab    128     242s         714s
  GraphX      128     251s         800s
  Laptop      1       153s         417s
  Laptop*     1       15s          30s

PageRank (20 iterations):

              Cores   twitter_rv   uk_2007_05
  Spark       128     857s         1759s
  Giraph      128     596s         1235s
  GraphLab    128     249s         833s
  GraphX      128     419s         462s
  Laptop      1       300s         651s

Source: "Scalability! But at what COST?", McSherry et al.

SLIDE 14

But that was 10 years ago

The world has changed

Then: 160 GB spinning rust, 2-core CPU, 4 GB RAM
Now: 60 TB SSDs, 64-core CPUs, TBs of RAM, edge computing, GPUs, TPUs

SLIDE 15

And data needs have become real-time, across domains:

  • Financial & Marketing
  • Industrial Machines
  • Datacenter & DevOps
  • Web / Mobile Events
  • Transportation & Logistics

SLIDE 16

SLIDE 17

Shorter latency requirements mean we need to focus again on scaling up

SLIDE 18

So how do we improve single-node performance?

(Lessons from TimescaleDB)

SLIDE 19

Postgres 9.6.2 on Azure standard DS4 v2 (8 cores), SSD (premium LRS storage). Each row has 12 columns (1 timestamp, 1 indexed host ID, 10 metrics).

SLIDE 20

Insight #1: Partition, even on a single machine.

SLIDES 21–24

(animation build-up of the partitioning diagram; frames labeled "Older")
SLIDE 25

Partitioning on a single node: a hypertable is divided into chunks (sub-tables) along two dimensions, time (newer to older) and space.
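In TimescaleDB this partitioning is set up with a single call on an ordinary table; a minimal sketch (the schema and column names are illustrative):

```sql
-- Illustrative schema; table and column names are hypothetical.
CREATE TABLE data (
  time       TIMESTAMPTZ      NOT NULL,
  device_id  TEXT             NOT NULL,
  temp       DOUBLE PRECISION
);

-- Turn the plain table into a hypertable partitioned on time.
-- (An optional "space" partitioning column such as device_id
-- can also be given; see the version-specific docs.)
SELECT create_hypertable('data', 'time');
```

From here on, inserts and queries go through the hypertable as if it were one table; chunk creation and routing happen automatically.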

SLIDE 26

Chunks are “right-sized”

Recent (hot) chunks fit in memory
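Chunk size is governed by the chunk time interval; a hedged sketch using TimescaleDB's `set_chunk_time_interval` (the exact signature varies by version, and the interval shown is illustrative):

```sql
-- Aim for recent chunks whose data plus indexes fit in memory:
-- shrink the interval when ingest rates are high, grow it when low.
SELECT set_chunk_time_interval('data', INTERVAL '1 day');
```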

SLIDE 27

TimescaleDB vs. PostgreSQL (single-row inserts)

144K metrics/s (14.4K inserts/s)

TimescaleDB 0.5, Postgres 9.6.2 on Azure standard DS4 v2 (8 cores), SSD (LRS storage). Each row has 12 columns (1 timestamp, 1 indexed host ID, 10 metrics).

SLIDE 28

TimescaleDB vs. PostgreSQL (batch inserts)

1.11M metrics/s, >20x PostgreSQL

TimescaleDB 0.5, Postgres 9.6.2 on Azure standard DS4 v2 (8 cores), SSD (LRS storage). Each row has 12 columns (1 timestamp, 1 indexed host ID, 10 metrics).

SLIDE 29

Other benefits to partitioning

SLIDE 30

Avoid querying chunks via constraint exclusion

SELECT time, device_id, temp FROM data
WHERE time > '2017-08-22 18:18:00+00';
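The same pruning can be seen in plain PostgreSQL with inheritance partitions and CHECK constraints; a minimal sketch (table names and date ranges are illustrative):

```sql
-- Parent table plus one chunk-like child with a CHECK constraint on time.
CREATE TABLE data (time TIMESTAMPTZ, device_id TEXT, temp DOUBLE PRECISION);
CREATE TABLE data_2017_08_22 (
  CHECK (time >= '2017-08-22' AND time < '2017-08-23')
) INHERITS (data);

-- With constraint exclusion enabled, the planner skips any child whose
-- CHECK constraint contradicts the constant WHERE clause.
SET constraint_exclusion = partition;
EXPLAIN SELECT time, device_id, temp FROM data
WHERE time > '2017-08-22 18:18:00+00';
```

The EXPLAIN output should list only the children whose time ranges can satisfy the predicate.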

SLIDE 31

Efficient retention policies

Drop chunks, don’t delete rows ⇒ avoids vacuuming
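In TimescaleDB this is the `drop_chunks` call; a sketch matching the 0.x/1.x argument order (the signature has changed across versions, and the retention window is illustrative):

```sql
-- Drop whole chunks older than 3 months instead of DELETEing rows:
-- largely a metadata operation plus file removal, so there is
-- no dead-tuple debt for VACUUM to clean up afterwards.
SELECT drop_chunks(INTERVAL '3 months', 'data');
```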

SLIDE 32

Insight #2: Partition, even on a single machine, across many disks.

SLIDE 33

Single node: scaling up by adding disks

How: chunks spread across many disks (elastically!), either RAIDed or via distinct tablespaces

Benefit:
  • Faster inserts
  • Parallelized queries
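The tablespace route can be sketched as follows (mount paths and names are hypothetical; `attach_tablespace` is TimescaleDB's call for spreading chunks):

```sql
-- One tablespace per physical disk (paths are illustrative).
CREATE TABLESPACE disk1 LOCATION '/mnt/disk1/pgdata';
CREATE TABLESPACE disk2 LOCATION '/mnt/disk2/pgdata';

-- Newly created chunks of the hypertable are then rotated
-- across all attached tablespaces.
SELECT attach_tablespace('disk1', 'data');
SELECT attach_tablespace('disk2', 'data');
```

Because tablespaces can be attached (and detached) at any time, disks can be added as the dataset grows, which is the elasticity the next slide refers to.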

SLIDE 34

Elasticity on a single node

SLIDE 35

Insight #3: Query optimizations

SLIDE 36

Avoid querying chunks via constraint exclusion (continued)

SELECT time, device_id, temp FROM data
WHERE time > now() - interval '24 hours';

Won't exclude chunks in plain PostgreSQL
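The reason: `now()` is a STABLE function, so its value is not known at plan time and vanilla PostgreSQL's plan-time constraint exclusion cannot compare it against chunk CHECK constraints. One common workaround outside TimescaleDB (which instead handles such predicates itself) is to have the application inline a literal bound; the timestamp below is illustrative:

```sql
-- Plan-time exclusion works once the bound is a literal constant
-- computed by the client, rather than a volatile/stable expression.
SELECT time, device_id, temp FROM data
WHERE time > '2018-04-19 00:00:00+00';
```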

SLIDE 37

Example Query Optimization

CREATE INDEX ON readings(time);

SELECT date_trunc('minute', time) AS bucket, avg(cpu)
FROM readings
GROUP BY bucket
ORDER BY bucket DESC
LIMIT 10;

Will this use the index?

SLIDE 38

Example Query Optimization

Timescale understands time

CREATE INDEX ON readings(time);

SELECT date_trunc('minute', time) AS bucket, avg(cpu)
FROM readings
GROUP BY bucket
ORDER BY bucket DESC
LIMIT 10;
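Plain PostgreSQL orders by the expression `date_trunc('minute', time)` rather than `time` itself, so it cannot simply walk the time index backwards and stop after 10 buckets; TimescaleDB's planner recognizes that `date_trunc` is monotonic in `time` and pushes the sort and limit down to the index. One hand-rewrite that helps in stock PostgreSQL is to bound the scan first (the 10-minute window is illustrative and assumes only recent buckets matter):

```sql
-- Bounding the scan lets the time index cut the work down to a
-- small recent window before grouping and sorting.
SELECT date_trunc('minute', time) AS bucket, avg(cpu)
FROM readings
WHERE time > now() - INTERVAL '10 minutes'
GROUP BY bucket
ORDER BY bucket DESC
LIMIT 10;
```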

SLIDE 39

How to find bottlenecks?

Flame Graph

(profile from 4/20/2018, 10 concurrent clients: ~91% of samples fall under postgres`standard_planner, dominated by plan-time relation size estimation: estimate_rel_size → RelationGetNumberOfBlocks → mdnblocks → FileSeek)
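Before reaching for a low-level profiler, PostgreSQL's own instrumentation catches many bottlenecks; a standard starting point (the query is the example from the earlier slide):

```sql
-- Per-plan-node timings plus shared-buffer hit/read counts
-- for a suspect query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT date_trunc('minute', time) AS bucket, avg(cpu)
FROM readings
GROUP BY bucket
ORDER BY bucket DESC
LIMIT 10;
```

When EXPLAIN shows nothing unusual but wall-clock time is still high (as with the planner overhead in the flame graph above), a sampling profiler on the backend process is the next step.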

SLIDE 40

Insight #4: HA != horizontally distributed

SLIDE 41

Single Node + Replicas = High-availability

SLIDE 42

Replication is much cheaper than horizontal distribution

  • Avoids consensus penalty
  • Can more easily be asynchronous
  • Still provides HA
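TimescaleDB inherits PostgreSQL streaming replication, where the primary reports the state of each standby; a quick check, assuming a replica is attached (columns shown exist in PostgreSQL 9.6):

```sql
-- On the primary: one row per connected standby.
-- sync_state is 'async' unless synchronous_standby_names is set.
SELECT client_addr, state, sync_state
FROM pg_stat_replication;
```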
SLIDE 43

Results

SLIDE 44

Results vs Cassandra (5 nodes): Query

  Type                   Scanned (h)  Devices  Metrics  Timescale (ms)  Cassandra multiplier
  groupby-hour           24           all      1        27866.50        9.66
  groupby-hour           24           all      5        35827.60        37.52
  groupby-hour           24           all      10       44912.30        45.05
  high-cpu               24           all      1        49807           32.95
  cpu-max                12           8        10       126.90          11.84
  cpu-max                12           1        10       20.54           37.41
  high-cpu               24           1        1        56              18.65
  groupby-minute         12           1        1        33.80           0.91
  groupby-minute         1            8        1        23.50           1.97
  groupby-minute         12           1        5        27.60           10.25
  groupby-minute         1            8        5        20.40           17.90
  lastpoint              all          all      10       266.10          2285.30
  groupby-orderby-limit  all          all      1        75.20           3181.62

Higher multiplier indicates worse Cassandra performance.

SLIDE 45

TimescaleDB vs. Cassandra (5 nodes): Insert

  Cassandra:   150K metrics/s
  TimescaleDB: 745K metrics/s

SLIDE 46

Results vs Mongo: Query

  Type                   Scanned (h)  Devices  Metrics  Timescale (ms)  Mongo multiplier
  groupby-hour           24           all      1        29968           7.63
  groupby-hour           24           all      5        39157           5.51
  groupby-hour           24           all      10       49058           5.50
  high-cpu               24           all      1        51323           5.24
  cpu-max                12           8        10       260             1.14
  cpu-max                12           1        10       28              1.32
  high-cpu               24           1        1        95              1.17
  groupby-minute         12           1        1        62              0.77
  groupby-minute         1            8        1        28              1.95
  groupby-minute         12           1        5        69              0.94
  groupby-minute         1            8        5        26              2.17
  lastpoint              all          all      10       453             101.72
  groupby-orderby-limit  all          all      1        149             1667.25

Higher multiplier indicates worse Mongo performance.

SLIDE 47

TimescaleDB vs. Mongo: Insert

  Mongo:       807K metrics/s
  TimescaleDB: 994K metrics/s
SLIDE 48

Lessons for DB designers

  • Concentrate on scale-up
  • Consider (insights!):
      • Partitioning, even on a single node
      • Even across disks
  • Performance analysis:
      • High-level: query optimization
      • Low-level: profiling
  • High availability is possible on a single node
SLIDE 49

Lessons for DB users

  • Absolute performance is as important as scaling numbers.
  • Don't go horizontally distributed unless you have to.
  • HA is not the same as horizontal scalability.
  • Replication is cheaper than distribution.
  • SQL and ACID are extremely useful.
SLIDE 50

Open Source (Apache 2.0)

  • github.com/timescale/timescaledb

Join the Community

  • slack.timescale.com
SLIDE 51

Timescale is hiring!

  • Core Database Engineers
  • R&D Engineers
  • Solutions Engineers
  • Evangelists
  • Customer Success

careers.timescale.com