What the heck is time-series data (and why do I need a time-series database?)



slide-1
SLIDE 1

What the heck is time-series data

(and why do I need a time-series database?)

Ajay Kulkarni | Co-founder/CEO | ajay@timescale.com
slide-2
SLIDE 2

Fastest growing database category

Source: DB Engines
slide-3
SLIDE 3

In this talk

  • 1. What is time-series data? (hint: it’s not what you think)
  • 2. Why do I need a time-series database?

  • 3. Is this just a fad?
slide-4
SLIDE 4

What is time-series data?

slide-5
SLIDE 5

Q: Metrics and Logging?

CPU, free memory, gc pauses, error reports, application instrumentation, etc.

slide-6
SLIDE 6

Q: Financial data?

Stock tick stream, payment records, transaction records
slide-7
SLIDE 7

Q: Event data?

Clickstreams, application events, outages, errors, system status
slide-8
SLIDE 8

Q: IoT data?

Sensor data, machine data, industrial monitoring, smart home, wearables
slide-9
SLIDE 9

Q: Other data?

Logistics tracking, environmental monitoring
slide-10
SLIDE 10

A: All of the above

slide-11
SLIDE 11

So what is time-series data?

slide-12
SLIDE 12

Time-series data has 3 characteristics

  • 1. Time-centric data: capturing and analyzing measurements/events over time.
  • 2. Primarily INSERTs: workloads generally write new data and rarely update.
  • 3. Writes to recent interval: data is generally written to the most recent time interval (although delays are possible).

slide-13
SLIDE 13

How is this different than having a time field?

Treat changes as inserts, not overwrites.
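
As a rough illustration of the difference, the sketch below records the same three changes two ways. The account name, balances, and both in-memory stores are hypothetical and exist only for this example.

```python
from datetime import datetime

# Hypothetical in-memory stores, for illustration only.
current_balance = {}   # "time field" style: one row per account, overwritten in place
balance_history = []   # time-series style: one row per change, append-only


def record_overwrite(account, balance, ts):
    """Keep only the latest value; the time field just says when it last changed."""
    current_balance[account] = {"balance": balance, "updated_at": ts}


def record_insert(account, balance, ts):
    """Treat every change as a new row, so earlier values are never lost."""
    balance_history.append({"time": ts, "account": account, "balance": balance})


changes = [
    (datetime(1990, 1, 1, 1, 2), 100),
    (datetime(1990, 1, 1, 1, 3), 80),
    (datetime(1990, 1, 1, 1, 4), 120),
]
for ts, balance in changes:
    record_overwrite("acct-1", balance, ts)
    record_insert("acct-1", balance, ts)

print(len(current_balance))  # one row: only the latest balance survives
print(len(balance_history))  # three rows: the full sequence of changes
```

The overwrite store can answer "what is the balance now?", while the insert store can also answer "how did it get there?".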

slide-14
SLIDE 14

You can do more with time-series data

PAST

  • Analyze historical trends.
  • Look at the state of the system at any point in time.

PRESENT

  • Real-time monitoring.
  • Troubleshoot problems as they occur.

FUTURE

  • Identify and fix problems before they occur, reducing downtime.
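
One way to read "the state of the system at any point in time": if changes are kept as an append-only, time-ordered history, the state at time t is simply the last row written at or before t. The sketch below assumes a hypothetical history of (minute, status) pairs; the data and the `state_at` helper are illustrative, not from the talk.

```python
from bisect import bisect_right

# Hypothetical append-only history of (minute offset, status) rows, sorted by time.
history = [(0, "ok"), (5, "degraded"), (9, "down"), (12, "ok")]
times = [t for t, _ in history]


def state_at(t):
    """Return the most recent status recorded at or before time t, or None."""
    i = bisect_right(times, t)  # number of rows with time <= t
    return history[i - 1][1] if i else None


print(state_at(7))
print(state_at(12))
print(state_at(-1))
```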
slide-15
SLIDE 15

What does time-series data look like?

(hint: it’s not what you think)

slide-16
SLIDE 16

What you have been told

Name   Tags                    Data
CPU    Host=Name,Region=West   1990-01-01 01:02:00   70
                               1990-01-01 01:03:00   71
                               1990-01-01 01:04:00   72
                               1990-01-01 01:04:00   73
                               1990-01-01 01:04:00   100

slide-17
SLIDE 17

What you have been told

Name      Tags                    Data
CPU       Host=Name,Region=West   1990-01-01 01:02:00   70
                                  1990-01-01 01:03:00   71
                                  1990-01-01 01:04:00   72
                                  1990-01-01 01:04:00   73
                                  1990-01-01 01:04:00   100
FreeMem   Host=Name,Region=West   1990-01-01 01:02:00   800M
                                  1990-01-01 01:03:00   600M
                                  1990-01-01 01:04:00   400M
                                  1990-01-01 01:04:00   200M
                                  1990-01-01 01:04:00   0

2 time-series?

slide-18
SLIDE 18

This is wrong

slide-19
SLIDE 19

Time-series data has a richer structure

Tags                    Data
Host=Name,Region=West   Time                  CPU   MemFree   Temp
                        1990-01-01 01:02:00   70    800M      80
                        1990-01-01 01:03:00   71    600M      81
                        1990-01-01 01:04:00   72    400M      82
                        1990-01-01 01:04:00   73    200M      83
                        1990-01-01 01:04:00   100             120
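
To make the contrast with the "one series per metric" picture concrete, here is a minimal sketch that regroups narrow (name, time, value) rows into one wide row per timestamp. It uses the first three readings from the table above; the lowercase metric names and the choice to store memory as megabytes are assumptions for illustration.

```python
from collections import defaultdict

# Narrow model: one (metric name, time, value) row per measurement.
narrow = [
    ("cpu", "1990-01-01 01:02:00", 70),
    ("cpu", "1990-01-01 01:03:00", 71),
    ("cpu", "1990-01-01 01:04:00", 72),
    ("mem_free", "1990-01-01 01:02:00", 800),
    ("mem_free", "1990-01-01 01:03:00", 600),
    ("mem_free", "1990-01-01 01:04:00", 400),
    ("temp", "1990-01-01 01:02:00", 80),
    ("temp", "1990-01-01 01:03:00", 81),
    ("temp", "1990-01-01 01:04:00", 82),
]

# Wide model: one row per timestamp, with every metric as a column.
wide = defaultdict(dict)
for name, ts, value in narrow:
    wide[ts][name] = value

for ts in sorted(wide):
    print(ts, wide[ts])
```

Nine narrow rows collapse into three wide rows, each carrying all the metrics observed at that moment.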

slide-20
SLIDE 20

Fewer queries

Tags                    Data
Host=Name,Region=West   Time                  CPU   MemFree   Temp
                        1990-01-01 01:02:00   70    800M      80
                        1990-01-01 01:03:00   71    600M      81
                        1990-01-01 01:04:00   72    400M      82
                        1990-01-01 01:04:00   73    200M      83
                        1990-01-01 01:04:00   100             120

select * where time = x

slide-21
SLIDE 21

Complex filters

Tags                    Data
Host=Name,Region=West   Time                  CPU   MemFree   Temp
                        1990-01-01 01:02:00   70    800M      80
                        1990-01-01 01:03:00   71    600M      81
                        1990-01-01 01:04:00   72    400M      82
                        1990-01-01 01:04:00   73    200M      83
                        1990-01-01 01:04:00   100             120

where temp > 100

slide-22
SLIDE 22

Complex aggregates

Tags                    Data
Host=Name,Region=West   Time                  CPU   MemFree   Temp
                        1990-01-01 01:02:00   70    800M      80
                        1990-01-01 01:03:00   71    600M      81
                        1990-01-01 01:04:00   72    400M      82
                        1990-01-01 01:04:00   73    200M      83
                        1990-01-01 01:04:00   100             120

avg(mem_free) group by (cpu/10)
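
The query fragments on the last three slides can be tried end to end with Python's built-in `sqlite3` module. This is only a sketch: the table name `conditions` and the column names are hypothetical, the timestamps are spaced one minute apart (the slide repeats 01:04:00), and the last free-memory reading is assumed to be 0.

```python
import sqlite3

# Hypothetical table and column names; values loosely follow the slide's table.
# Assumptions: one-minute spacing between rows, memory in MB, last mem_free = 0.
rows = [
    ("1990-01-01 01:02:00", 70, 800, 80),
    ("1990-01-01 01:03:00", 71, 600, 81),
    ("1990-01-01 01:04:00", 72, 400, 82),
    ("1990-01-01 01:05:00", 73, 200, 83),
    ("1990-01-01 01:06:00", 100, 0, 120),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE conditions (time TEXT, cpu INTEGER, mem_free INTEGER, temp INTEGER)"
)
conn.executemany("INSERT INTO conditions VALUES (?, ?, ?, ?)", rows)

# Fewer queries: one lookup returns every metric recorded at that time.
at_time = conn.execute(
    "SELECT * FROM conditions WHERE time = ?", ("1990-01-01 01:03:00",)
).fetchall()

# Complex filters: select rows by one metric and read the others alongside it.
hot = conn.execute(
    "SELECT time, cpu, mem_free FROM conditions WHERE temp > 100"
).fetchall()

# Complex aggregates: average free memory per 10-point CPU bucket
# (integer division, because cpu is stored as an integer).
by_cpu_bucket = conn.execute(
    "SELECT cpu / 10 AS cpu_bucket, avg(mem_free) "
    "FROM conditions GROUP BY cpu_bucket ORDER BY cpu_bucket"
).fetchall()

print(at_time)
print(hot)
print(by_cpu_bucket)
```

With the narrow "one series per metric" layout, each of these would need a separate lookup per metric followed by a join on time.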

slide-23
SLIDE 23

Correlations

Tags                    Data
Host=Name,Region=West   Time                  CPU   MemFree   Temp
                        1990-01-01 01:02:00   70    800M      80
                        1990-01-01 01:03:00   71    600M      81
                        1990-01-01 01:04:00   72    400M      82
                        1990-01-01 01:04:00   73    200M      83
                        1990-01-01 01:04:00   100             120

How does temperature correlate with mem_free?
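
One simple way to put a number on that question is the Pearson correlation coefficient. The sketch below computes it by hand over the four rows of the table that have both a temperature and a free-memory reading; the `pearson` helper is written for this example and memory is assumed to be in megabytes.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Rows from the table where both metrics are present (memory assumed to be MB).
temp = [80, 81, 82, 83]
mem_free_mb = [800, 600, 400, 200]

r = pearson(temp, mem_free_mb)
print(round(r, 6))  # close to -1: free memory falls steadily as temperature rises
```

This only works because both metrics sit in the same row for each timestamp; correlation does not by itself say which one drives the other.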

slide-24
SLIDE 24

Leverage relations

Data
Time                  CPU   Host   Region
1990-01-01 01:02:00   70    1      91
1990-01-01 01:03:00   71    2      92
1990-01-01 01:04:00   72    3      93
1990-01-01 01:04:00   73    4      94
1990-01-01 01:04:00   100   5      95

Region stored in separate host metadata table
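
A minimal sketch of what "leverage relations" could look like in practice, again using `sqlite3`: readings carry only a host id, and the region lives in a separate host metadata table that is joined in at query time. The table names, column names, and the pairing of hosts 1 to 5 with regions 91 to 95 are assumptions based on the slide's layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema: metadata is stored once per host, not once per reading.
    CREATE TABLE hosts (host_id INTEGER PRIMARY KEY, region INTEGER);
    CREATE TABLE cpu_readings (
        time    TEXT,
        host_id INTEGER REFERENCES hosts (host_id),
        cpu     INTEGER
    );
    """
)
conn.executemany(
    "INSERT INTO hosts VALUES (?, ?)",
    [(1, 91), (2, 92), (3, 93), (4, 94), (5, 95)],
)
conn.executemany(
    "INSERT INTO cpu_readings VALUES (?, ?, ?)",
    [
        ("1990-01-01 01:02:00", 1, 70),
        ("1990-01-01 01:03:00", 2, 71),
        ("1990-01-01 01:04:00", 3, 72),
        ("1990-01-01 01:04:00", 4, 73),
        ("1990-01-01 01:04:00", 5, 100),
    ],
)

# Filter time-series rows by an attribute that only exists in the metadata table.
in_region_93 = conn.execute(
    "SELECT r.time, r.cpu, h.region "
    "FROM cpu_readings AS r JOIN hosts AS h ON h.host_id = r.host_id "
    "WHERE h.region = ?",
    (93,),
).fetchall()

print(in_region_93)
```

If a host moves to another region, only one row in `hosts` changes, rather than a tag repeated on every reading.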

slide-25
SLIDE 25

How to store time-series data

slide-26
SLIDE 26

Can’t I use a “normal” database?

You can, and some people do.

[Bar chart comparing “Non time-series” and “Purpose-built for time-series” databases; values shown: 58 % and 42 %]

Source: Percona
slide-27
SLIDE 27

Golden age of time-series databases

slide-28
SLIDE 28

Why do I need a specialized time-series database?

slide-29
SLIDE 29

Problem: Time-series data piles up very quickly

  • 25GB of data collected per hour by connected cars (McKinsey)
  • “Our Boeing 787s generate half a terabyte of data per flight” (Virgin Atlantic IT director)

slide-30
SLIDE 30

Time-series databases introduce efficiencies by treating time as a first-class citizen.

slide-31
SLIDE 31

Time Series

  ✓ Primarily INSERTs
  ✓ Writes to recent time interval
  ✓ Writes associated with a timestamp and primary key

OLTP

  ✗ Primarily UPDATEs
  ✗ Writes randomly distributed
  ✗ Transactions to multiple primary keys

slide-32
SLIDE 32

Time-series databases introduce efficiencies

  • 1. Better write rates to handle ingest scale.
  • 2. Query performance, even at scale.
  • 3. Ease of use via common functions (e.g., interpolation).
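
As an example of the kind of function meant in point 3, here is a small sketch of linear interpolation used to fill a gap between two readings. The `interpolate` helper and the sample readings are hypothetical; a time-series database would typically expose something similar as a built-in.

```python
def interpolate(t0, v0, t1, v1, t):
    """Linearly interpolate the value at time t between (t0, v0) and (t1, v1)."""
    if t1 == t0:
        raise ValueError("the two known points must have different times")
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Hypothetical CPU readings at minute 2 and minute 5; minutes 3 and 4 are missing.
known_start, known_end = (2, 70), (5, 76)

filled = {
    minute: interpolate(*known_start, *known_end, minute)
    for minute in (3, 4)
}
print(filled)
```

Without a helper like this, every application that charts or aggregates the data has to reimplement its own gap handling.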

slide-33
SLIDE 33

Is this just a fad? (No.)

slide-34
SLIDE 34

Why time-series databases will continue to be popular

Operational needs

  • Managing increasingly complex systems requires real-time monitoring, troubleshooting, and better prediction.

Business needs

  • Constant need to make better data-driven decisions faster.
  • More sources of data: new devices, old devices coming online, new systems.

Tech trends

  • Cheaper storage, faster processors, more bandwidth.
  • Better resources: cloud computing, data analysis tools.
slide-35
SLIDE 35

Crazy idea: Is all data time-series data?

slide-36
SLIDE 36

https://github.com/timescale/timescaledb