Two Households, Both Alike in Dignity: Bartłomiej Płotka & Tom Wilkie, PromCon 2019

SLIDE 1

Thanos & Cortex

Two Households, Both Alike in Dignity

Bartłomiej Płotka & Tom Wilkie PromCon 2019

SLIDE 2

Cortex
  • Started by Tom Wilkie and Julius Volz in June 2016
  • Joined CNCF sandbox in Sept 2018
  • https://github.com/cortexproject/cortex

Thanos
  • Started by Fabian Reinartz and Bartłomiej Płotka in Dec 2017
  • Joined CNCF sandbox in Aug 2019
  • https://thanos.io

SLIDE 3

When monitoring a global fleet with Prometheus, I need...

  • 1. Global View
  • 2. Multi-Replica Prometheus (HA)
  • 3. Long Term Storage

SLIDE 4

#1 Global View

Queries over data from multiple Prometheus servers

SLIDE 5

Thanos: Fanout Queries

#1 Prometheus in each remote cluster runs with a Thanos sidecar.
#2 A stateless Querier, deployed anywhere, fans each query out to the relevant Prometheus servers.
#3 Queries see all data.
(Diagram: clusters us-west, us-east and eu-west; the Querier pulls data from each.)
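The fan-out idea can be illustrated with a minimal in-memory sketch. It is not the Thanos API: `fanout_query`, `make_store` and the `cluster` label are hypothetical names chosen for illustration, and each "store" is just a function standing in for a sidecar.

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[int, float]          # (timestamp, value)
Series = Dict[str, List[Sample]]    # series name -> samples

def fanout_query(stores: Dict[str, Callable[[str], Series]], metric: str) -> Series:
    """Send the same query to every store and merge the answers."""
    merged: Series = {}
    for cluster, select in stores.items():
        for name, samples in select(metric).items():
            # Tag each series with its cluster so the results stay distinguishable.
            merged[f'{name}{{cluster="{cluster}"}}'] = samples
    return merged

def make_store(data: Series) -> Callable[[str], Series]:
    """Hypothetical stand-in for one Prometheus plus sidecar."""
    return lambda metric: {k: v for k, v in data.items() if k == metric}

stores = {
    "us-west": make_store({"up": [(0, 1.0)]}),
    "us-east": make_store({"up": [(0, 1.0)]}),
    "eu-west": make_store({"up": [(0, 0.0)]}),
}
result = fanout_query(stores, "up")
```

The data never leaves the individual stores until a query asks for it, which is the "pull" direction shown on the slide.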

SLIDE 6

Cortex: Centralised Data

#1 Prometheus servers in separate clusters remote-write their metrics.
#2 A scalable Cortex cluster stores metrics from multiple Prometheus servers.
#3 Queries go to the central cluster and cover all data.
(Diagram: clusters us-west, us-east and eu-west push to Cortex.)
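For contrast with the fan-out model, the push model might be sketched as below. This is a toy in-memory store, not Cortex itself: the `CentralStore` class and its `remote_write` and `query` methods are hypothetical names, and real remote write is an HTTP protocol rather than a method call.

```python
from collections import defaultdict

class CentralStore:
    """Toy stand-in for a central cluster that receives pushed samples."""

    def __init__(self):
        # (metric, cluster) -> list of (timestamp, value)
        self.series = defaultdict(list)

    def remote_write(self, cluster, metric, timestamp, value):
        # Each Prometheus pushes its samples, identified by its cluster.
        self.series[(metric, cluster)].append((timestamp, value))

    def query(self, metric):
        # One query against one place covers every cluster.
        return {key: samples for key, samples in self.series.items() if key[0] == metric}

central = CentralStore()
for cluster in ("us-west", "us-east", "eu-west"):
    central.remote_write(cluster, "up", 0, 1.0)
all_up = central.query("up")
```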

SLIDE 7

#1 Global View
  • Thanos: data stays in Prometheus; fan-out query.
  • Cortex: centrally write data to a scalable Cortex cluster; query in one place.

SLIDE 8

#2 Multi-Replica Prometheus (HA)

No gaps in the graphs caused by Prometheus server restarts

SLIDE 9

Thanos: Query time deduplication

#1 Each Prometheus replica scraping the same targets has a Thanos sidecar.
#2 The Thanos Querier resolves gaps at query time.
#3 Queries only ever see a single version of each series.
(Diagram: replicas us-west-a and us-west-b.)
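One simple way to picture query-time deduplication is to merge the samples of both replicas and drop any sample that falls too close to the one already kept. The sketch below uses that simplified rule; it is an assumption made for illustration and not the algorithm Thanos actually implements. The `dedup` function and the half-interval threshold are hypothetical.

```python
def dedup(replica_a, replica_b, interval):
    """Merge two replicas' samples into one series at query time."""
    merged = sorted(replica_a + replica_b)   # order by timestamp
    out = []
    for ts, value in merged:
        # Skip a sample that duplicates the scrape we have already kept.
        if out and ts - out[-1][0] < interval / 2:
            continue
        out.append((ts, value))
    return out

# Replica a restarted and missed the scrapes around t=30 and t=45.
a = [(0, 1.0), (15, 2.0), (60, 5.0)]
b = [(1, 1.0), (16, 2.0), (31, 3.0), (46, 4.0), (61, 5.0)]
result = dedup(a, b, interval=15)
# result == [(0, 1.0), (15, 2.0), (31, 3.0), (46, 4.0), (60, 5.0)]
```

Both replicas keep their full data; the gap left by the restart is filled only when the query is evaluated.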

SLIDE 10

Cortex: Resolve Gaps at Write Time

#1 Both Prometheus instances in each cluster remote-write metrics to Cortex.
#2 Cortex dedupes samples on ingestion, only storing data from a single Prometheus.
#3 Queries only ever see a single version of each series.
(Diagram: replicas us-west-a and us-west-b.)
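Write-time deduplication could be sketched as electing one replica per cluster and dropping pushes from the other until the elected one goes quiet. Everything here is an illustrative assumption: the `HADeduper` class, the `failover_timeout` parameter and the use of sample timestamps in place of wall-clock time are hypothetical simplifications, not Cortex's implementation.

```python
class HADeduper:
    """Accept samples from one elected replica per cluster."""

    def __init__(self, failover_timeout):
        self.failover_timeout = failover_timeout
        self.elected = {}    # cluster -> (replica, timestamp of last accepted sample)
        self.stored = []     # accepted (cluster, timestamp, value) tuples

    def push(self, cluster, replica, ts, value):
        current = self.elected.get(cluster)
        timed_out = current is not None and ts - current[1] > self.failover_timeout
        if current is None or (current[0] != replica and timed_out):
            current = (replica, ts)          # elect this replica
        if current[0] != replica:
            return False                     # drop the non-elected replica's sample
        self.elected[cluster] = (replica, ts)
        self.stored.append((cluster, ts, value))
        return True

d = HADeduper(failover_timeout=30)
d.push("us-west", "a", 0, 1.0)     # accepted, a is elected
d.push("us-west", "b", 1, 1.0)     # dropped
d.push("us-west", "a", 15, 2.0)    # accepted
d.push("us-west", "b", 31, 3.0)    # dropped, a has not timed out yet
d.push("us-west", "b", 46, 4.0)    # accepted, b takes over
```

Unlike the query-time approach, only one copy of each series is ever stored, at the cost of a short gap while failover takes effect.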

SLIDE 11

#1 Global View
  • Thanos: data stays in Prometheus; fan-out query.
  • Cortex: centrally write data to a scalable Cortex cluster; query in one place.

#2 Multi-Replica Prometheus (HA)
  • Thanos: resolve gaps at query time; only render a single series.
  • Cortex: resolve gaps at write time; only store a single series.

SLIDE 12

#3 Long Term Storage

Store data for long term analysis

SLIDE 13

Thanos: TSDB blocks in object store

#1 The sidecar syncs TSDB blocks with object storage.
#2 Thanos allows browsing uploaded blocks, compacting the index, and downsampling.
#3 Queriers have access to both fresh and old data.
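A rough sketch of two of these steps, block upload and downsampling, is shown below. A dict stands in for the object store, and the function names, block IDs and the set of aggregates kept per window (count, sum, min, max) are assumptions made for illustration rather than the Thanos on-disk format.

```python
def sync_blocks(local_blocks, bucket):
    """Upload every local block that the bucket does not hold yet."""
    uploaded = []
    for block_id in sorted(local_blocks):
        if block_id not in bucket:
            bucket[block_id] = local_blocks[block_id]
            uploaded.append(block_id)
    return uploaded

def downsample(samples, resolution):
    """Aggregate raw samples into fixed windows of `resolution` seconds."""
    windows = {}
    for ts, value in samples:
        start = ts - ts % resolution         # window the sample falls into
        agg = windows.get(start)
        if agg is None:
            windows[start] = {"count": 1, "sum": value, "min": value, "max": value}
        else:
            agg["count"] += 1
            agg["sum"] += value
            agg["min"] = min(agg["min"], value)
            agg["max"] = max(agg["max"], value)
    return windows

bucket = {}                                  # stand-in for an object store
local = {"block-1": [(0, 1.0), (60, 3.0)], "block-2": [(300, 5.0)]}
first_sync = sync_blocks(local, bucket)      # uploads both blocks
second_sync = sync_blocks(local, bucket)     # nothing new to upload
all_samples = [s for block_id in sorted(bucket) for s in bucket[block_id]]
five_minute = downsample(all_samples, resolution=300)
```

Keeping aggregates per window rather than raw samples is what makes queries over long time ranges cheaper to evaluate.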

SLIDE 14

Cortex: NOSQL index & chunks

#1 Samples from Prometheus are batched up into XOR chunks in Cortex.
#2 Chunks are periodically flushed to an object store, and an inverted index over the chunks is written to a NOSQL database.
#3 Queries use the index in NOSQL to find relevant chunks, with heavy use of caches.
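The index-then-chunks lookup can be illustrated with a small in-memory inverted index. This is a sketch under stated assumptions: the `ChunkIndex` class is hypothetical, plain dicts stand in for both the NOSQL database and the object store, chunks are plain lists rather than XOR-encoded data, and caching is left out.

```python
from collections import defaultdict

class ChunkIndex:
    """Inverted index from label pairs to chunk IDs, plus a chunk store."""

    def __init__(self):
        self.postings = defaultdict(set)   # (label, value) -> chunk IDs
        self.chunks = {}                   # chunk ID -> samples

    def flush(self, chunk_id, labels, samples):
        # Write the chunk, then index it under every label pair it carries.
        self.chunks[chunk_id] = samples
        for pair in labels.items():
            self.postings[pair].add(chunk_id)

    def query(self, matchers):
        # Intersect the posting lists of all equality matchers.
        ids = None
        for pair in matchers.items():
            found = self.postings.get(pair, set())
            ids = found if ids is None else ids & found
        return {cid: self.chunks[cid] for cid in sorted(ids or [])}

index = ChunkIndex()
index.flush("c1", {"__name__": "up", "cluster": "us-west"}, [(0, 1.0)])
index.flush("c2", {"__name__": "up", "cluster": "eu-west"}, [(0, 0.0)])
index.flush("c3", {"__name__": "http_requests_total", "cluster": "us-west"}, [(0, 7.0)])
hits = index.query({"__name__": "up", "cluster": "us-west"})
# hits == {"c1": [(0, 1.0)]}
```

Only the chunks named by the index are fetched, so the query never has to scan the chunk store.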

SLIDE 15

#1 Global View
  • Thanos: data stays in Prometheus; fan-out query.
  • Cortex: centrally write data to a scalable Cortex cluster; query in one place.

#2 Multi-Replica Prometheus (HA)
  • Thanos: resolve gaps at query time; only render a single series.
  • Cortex: resolve gaps at write time; only store a single series.

#3 Long Term Storage
  • Thanos: TSDB blocks in object storage.
  • Cortex: NOSQL for the index; chunks in object storage.

SLIDE 16

Future

SLIDE 17

Increased Collaboration (I)

Cortex query-frontend can be put in front of Thanos to accelerate queries using parallelisation and caching.

https://grafana.com/blog/2019/09/19/how-to-get-blazin-fast-promql/
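The two techniques named on this slide, parallelisation and caching, might look roughly like the sketch below: a long range query is split into aligned sub-ranges, uncached sub-ranges are fetched concurrently, and results are reused across queries. The `QueryFrontend` class, the `split_range` helper and the `step` parameter are hypothetical and do not mirror the Cortex query-frontend's real interface.

```python
from concurrent.futures import ThreadPoolExecutor

def split_range(start, end, step):
    """Split [start, end) into sub-ranges aligned to multiples of `step`."""
    ranges, cur = [], start
    while cur < end:
        nxt = min(end, cur - cur % step + step)
        ranges.append((cur, nxt))
        cur = nxt
    return ranges

class QueryFrontend:
    def __init__(self, backend, step):
        self.backend = backend      # callable(query, start, end) -> samples
        self.step = step
        self.cache = {}             # (query, sub-range) -> samples
        self.backend_calls = 0

    def query_range(self, query, start, end):
        subs = split_range(start, end, self.step)
        missing = [s for s in subs if (query, s) not in self.cache]
        # Fetch all uncached sub-ranges in parallel.
        with ThreadPoolExecutor(max_workers=4) as pool:
            fetched = pool.map(lambda s: self.backend(query, *s), missing)
            for sub, data in zip(missing, fetched):
                self.cache[(query, sub)] = data
        self.backend_calls += len(missing)
        return [sample for s in subs for sample in self.cache[(query, s)]]

backend = lambda query, s, e: [(t, float(t)) for t in range(s, e, 30)]
frontend = QueryFrontend(backend, step=60)
first = frontend.query_range("up", 0, 180)     # three backend calls
second = frontend.query_range("up", 60, 240)   # only (180, 240) is fetched
```

Aligning sub-ranges to fixed boundaries is what lets overlapping queries share cache entries.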

SLIDE 18

Increased Collaboration (II)

Cortex now embeds Thanos's code to read and write blocks from object storage for long-term storage (LTS), reducing dependencies and TCO.

https://github.com/cortexproject/cortex/pull/1695

SLIDE 19

Thanks! Questions?

https://thanos.io
https://github.com/cortexproject/cortex