TSDB: 1 year in (Ganesh Vernekar, November 2019)


SLIDE 1

Ganesh Vernekar

November, 2019

TSDB: 1 year in

SLIDE 2

TSDB: 1 year in | 2

About me

Ganesh Vernekar

  • Software Engineer, Grafana Labs
  • Prometheus GSoC’18 student
  • Prometheus Member, TSDB Maintainer
  • Graduated this year from IIT Hyderabad
  • @_codesome
  • ganesh@grafana.com

SLIDE 3

What is TSDB (Time Series DataBase)?

  • Storage engine of Prometheus 2.x
  • Previously an independent repo: prometheus/tsdb
  • Now a part of the Prometheus repo, inside the tsdb directory
SLIDE 4

Statistics

  • 500+ commits since Prometheus 2.0 release
  • 60+ contributors
  • 771 stars before archiving
SLIDE 5

Some selected features/enhancements

SLIDE 6

Backfilling

  • Issue before: cannot have overlapping blocks
  • Vertical query merging and compaction in Feb. 2019
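
As a rough illustration of vertical query merging, the sketch below merges the samples of one series from two time-overlapping blocks into a single ordered stream. It assumes each block yields (timestamp, value) pairs sorted by timestamp and that, on a duplicate timestamp, the sample from the earlier-listed block wins. The function name and the tie-breaking rule are assumptions made for illustration, not the TSDB implementation.

```python
import heapq


def merge_overlapping(*blocks):
    """Merge sorted (timestamp, value) streams from overlapping blocks.

    Hypothetical helper: on duplicate timestamps the sample from the
    earliest-listed block is kept and later ones are dropped.
    """
    merged = []
    last_ts = None
    # heapq.merge does a lazy k-way merge of already-sorted inputs and
    # yields equal keys in the order the inputs were passed.
    for ts, value in heapq.merge(*blocks, key=lambda sample: sample[0]):
        if ts == last_ts:
            continue  # duplicate timestamp from an overlapping block
        merged.append((ts, value))
        last_ts = ts
    return merged


block_a = [(10, 1.0), (20, 2.0), (30, 3.0)]
block_b = [(20, 2.5), (25, 2.7), (40, 4.0)]
result = merge_overlapping(block_a, block_b)
```

Vertical compaction can be thought of as the same merge performed once and written back to disk, so that later queries no longer pay for it.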
SLIDE 7

Backfilling

  • No recommended way to backfill yet
  • Community is taking it forward
SLIDE 8

WAL compression

  • Optional snappy compression for WAL
  • Can save up to 50% of WAL size without compromising on performance
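
The idea of optional per-record compression can be sketched as follows. This is only an illustration: zlib from the Python standard library stands in for snappy, and the one-byte flag framing is a hypothetical layout, not the actual WAL record format.

```python
import zlib

# Hypothetical one-byte flags marking how the record body is stored.
FLAG_RAW = 0
FLAG_COMPRESSED = 1


def encode_record(payload: bytes, compress: bool) -> bytes:
    """Frame a record, compressing it only when enabled and worthwhile."""
    if compress:
        packed = zlib.compress(payload)  # stand-in for snappy
        if len(packed) < len(payload):
            return bytes([FLAG_COMPRESSED]) + packed
    return bytes([FLAG_RAW]) + payload


def decode_record(record: bytes) -> bytes:
    """Return the original payload regardless of how it was stored."""
    flag, body = record[0], record[1:]
    if flag == FLAG_COMPRESSED:
        return zlib.decompress(body)
    return body


sample = b'http_requests_total{job="api",instance="10.0.0.1:9090"} ' * 50
compressed = encode_record(sample, compress=True)
plain = encode_record(sample, compress=False)
```

Because the flag travels with each record, a reader can handle a log that mixes compressed and uncompressed records, which is what makes the feature safe to toggle.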

SLIDE 9

WAL read optimizations

SLIDE 10

WAL read optimizations

SLIDE 11

WAL read optimizations

SLIDE 12

Allocation/memory optimization for compaction

SLIDE 13

Allocation/memory optimization for compaction

35% allocations 19% allocations

SLIDE 14

Allocation/memory optimization for compaction

6.5% allocations (Don’t have the numbers)

SLIDE 15

Various optimizations for the queries

SLIDE 16

Various optimizations for the queries

{foo=~"bar|baz"} => {foo="bar"} or {foo="baz"}

Make Grafana dashboard queries faster
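
A minimal sketch of this rewrite, assuming series are plain dictionaries of labels: when a regex matcher is just a list of literal alternatives, it can be answered with set membership instead of running the regex engine against every value. The helper names are hypothetical and the literal check is deliberately conservative.

```python
import re

# Any of these characters means the pattern is a real regex, not a literal.
_META = re.compile(r"[.^$*+?()\[\]{}\\]")


def literal_alternatives(pattern):
    """Return the literals of 'a|b|c', or None if any part is not literal."""
    parts = pattern.split("|")
    if any(part == "" or _META.search(part) for part in parts):
        return None
    return parts


def select(series, label, pattern):
    """Select series whose label fully matches the pattern."""
    values = literal_alternatives(pattern)
    if values is not None:
        wanted = set(values)  # fast path: equality lookups only
        return [s for s in series if s.get(label) in wanted]
    rx = re.compile(pattern)  # slow path: anchored regex match
    return [s for s in series if rx.fullmatch(s.get(label, ""))]


series = [{"foo": "bar"}, {"foo": "baz"}, {"foo": "qux"}, {"job": "x"}]
fast = select(series, "foo", "bar|baz")
slow = select(series, "foo", "ba.")
```

Dashboard variables with multiple selected values typically expand to exactly this kind of `a|b|c` pattern, which is why the fast path matters there.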

SLIDE 17

Various optimizations for the queries

SLIDE 18

Reuse Chunk Iterators

SLIDE 19

Reuse Chunk Iterators

SLIDE 20

Reuse Chunk Iterators

SLIDE 21

Analyse churn

SLIDE 22

Read Only TSDB

  • Interface/implementation for Read Only Mode TSDB
  • Safe to use this interface with a live tsdb
  • TSDB cli already uses it
SLIDE 23

Upcoming features/enhancements

SLIDE 24

Lifting index size limit

SLIDE 25

Lifting index size limit

  • 32 bit postings currently (reference of series in index)

○ Index size is limited to 64 GiB

  • Not difficult to hit the limit

○ 8M series with a block spanning 9 days: 20 GiB index

https://grafana.com/blog/2019/10/31/lifting-the-index-size-limit-of-prometheus-with-postings-compression/
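
The 64 GiB figure can be reproduced with a short calculation. It assumes that series entries in the index are aligned to 16 bytes, so that a 32-bit reference counts 16-byte units rather than single bytes. That alignment is not stated on the slide and is treated here as an assumption.

```python
REF_BITS = 32         # width of a posting (series reference)
ALIGNMENT_BYTES = 16  # assumption: series entries are 16-byte aligned

# Largest addressable index: number of distinct references times alignment.
max_index_bytes = (2 ** REF_BITS) * ALIGNMENT_BYTES
max_index_gib = max_index_bytes // 2 ** 30
```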

SLIDE 26

64 bit postings

GSoC 2019 work by Alec Wang (GH: naivewong)

  • 64 bit postings - practically unlimited index size
  • Use prefix compression to store postings (48 bit prefix)
  • Same performance and no increase in index size

Postings: 011001 011010 101100 101101 101110 101111
Prefix 0110, suffixes: 01 10
Prefix 1011, suffixes: 00 01 10 11

https://grafana.com/blog/2019/10/31/lifting-the-index-size-limit-of-prometheus-with-postings-compression/
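
A small sketch of prefix compression, applied to the 6-bit values from the slide with a 4-bit prefix: postings that share a prefix are grouped, and only their short suffixes are stored under it. The function names and the dictionary layout are illustrative assumptions. The defaults mirror the 64-bit postings with a 48-bit prefix mentioned above, but the real on-disk encoding is not shown here.

```python
def prefix_compress(postings, total_bits=64, prefix_bits=48):
    """Group postings by shared prefix, keeping only the suffixes."""
    suffix_bits = total_bits - prefix_bits
    mask = (1 << suffix_bits) - 1
    groups = {}
    for posting in postings:  # postings are assumed sorted ascending
        groups.setdefault(posting >> suffix_bits, []).append(posting & mask)
    return groups


def prefix_decompress(groups, total_bits=64, prefix_bits=48):
    """Rebuild the full postings from the prefix -> suffixes mapping."""
    suffix_bits = total_bits - prefix_bits
    return [
        (prefix << suffix_bits) | suffix
        for prefix, suffixes in groups.items()
        for suffix in suffixes
    ]


slide_values = [0b011001, 0b011010, 0b101100, 0b101101, 0b101110, 0b101111]
groups = prefix_compress(slide_values, total_bits=6, prefix_bits=4)
restored = prefix_decompress(groups, total_bits=6, prefix_bits=4)
```

Here six 6-bit values become two 4-bit prefixes plus six 2-bit suffixes, which illustrates how wider postings need not increase the index size when neighbouring references share their high bits.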

SLIDE 27

Isolation

  • TSDB has A, C, and D from ACID, but not I (isolation)

○ Some progress in the past by Brian and Goutham

  • Temporarily abandoned after that
  • Plan is to add it this year

○ Rebase the work of Brian and Goutham :P

SLIDE 28

Improved checkpointing of WAL

Potential speedup of WAL replay by a big factor

SLIDE 29

Thanks! Questions?

@_codesome