Implementing a Cooperative Multi-Tenant Capable Prometheus


Jonas Große Sundrup, 10.08.2018, PromCon 2018


SLIDE 1

Implementing a Cooperative Multi-Tenant Capable Prometheus

Jonas Große Sundrup 10.08.2018

PromCon 2018

SLIDE 2

Target users

Users:

  • run small-scale infrastructure
  • aren’t monitoring experts

Goal: Share monitoring infrastructure

SLIDE 3

Goal

Drop-dead-simple fire-and-forget low-resource monitoring/alerting solution that just works

SLIDE 4

Roadmap

  1. Prometheus
  2. Alertmanager
  3. System Architecture
  4. Additional Services

SLIDE 5

Requirements

  • One Prometheus per machine
  • Multi-tenancy
  • Ansible-compatibility
  • patch-free operation

SLIDE 6

Getting data into Prometheus

SLIDE 7

Deploying scrape targets

~/scrptarg
├── node1.yml
├── node2.yml
├── db.yml
└── mail.yml

SLIDE 8

Deploying scrape targets

- job_name: alice
  file_sd_configs:
    - files:
        - /home/alice/scrptarg/*.json
        - /home/alice/scrptarg/*.yml
        - /home/alice/scrptarg/*.yaml
      refresh_interval: 5m
  scheme: https
  basic_auth:
    username: prometheus
    password: <secret>
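
Because every tenant gets a job of the same shape, a block like the one above could be generated per user instead of being written by hand. The following is a minimal Python sketch under stated assumptions: `scrape_job` is a hypothetical helper (not part of Prometheus or of the tooling presented in the talk), the home-directory layout is the one shown on the slide, and the password stays a placeholder.

```python
def scrape_job(user: str, password: str = "<secret>") -> dict:
    """Build a per-user scrape job mirroring the slide's config (hypothetical helper)."""
    base = f"/home/{user}/scrptarg"
    return {
        "job_name": user,  # the job name doubles as the tenant identifier
        "file_sd_configs": [
            {
                # one glob per file extension, as on the slide
                "files": [f"{base}/*.{ext}" for ext in ("json", "yml", "yaml")],
                "refresh_interval": "5m",
            }
        ],
        "scheme": "https",
        "basic_auth": {"username": "prometheus", "password": password},
    }


# One job per tenant; the user names are examples only.
jobs = [scrape_job(user) for user in ("alice", "jonas")]
```

The resulting list could be serialized into the `scrape_configs` section of the Prometheus configuration by whatever templating step deploys it, for example an Ansible template.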

SLIDE 9

Deploying rules

~/rules
├── normal.yml
└── the-apocalypse.yml

SLIDE 10

Getting data out of Prometheus

SLIDE 11

up

SLIDE 12

up{job="jonas"}

SLIDE 14

Injecting a label

  • your_metric{job="hello",instance="mybox"}
  • your_metric{instance="mymachine"}
  • your_metric/my_metric{code="42"}
  • avg_over_time(your_metric[1h]) offset 1d
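
To make the four cases above concrete, here is a toy Python sketch of label injection. It is an illustration only, not the approach of any real tool: it uses a small tokenizer rather than a full PromQL parser, `inject_job` and its helpers are hypothetical names, the tenant is assumed to be a plain user name, and corner cases such as backtick-quoted strings or metrics named like keywords are not handled. It mainly shows why plain string replacement is not enough.

```python
import re

# Tokens: quoted strings, numbers/durations (1h, 0.5), identifiers, anything else.
TOKEN = re.compile(
    r'(?P<string>"(?:\\.|[^"\\])*"|\'(?:\\.|[^\'\\])*\')'
    r'|(?P<number>\d[\w.]*)'
    r'|(?P<ident>[A-Za-z_:][A-Za-z0-9_:]*)'
    r'|(?P<other>\s+|.)',
    re.DOTALL,
)
LABEL_LIST_KEYWORDS = {"by", "without", "on", "ignoring", "group_left", "group_right"}
OTHER_KEYWORDS = {"offset", "bool", "and", "or", "unless", "atan2", "inf", "nan"}


def _next_text(tokens, i):
    """Return the text of the next non-whitespace token after index i ('' if none)."""
    for _, text in tokens[i + 1:]:
        if not text.isspace():
            return text
    return ""


def inject_job(query: str, tenant: str) -> str:
    """Force job="<tenant>" onto every vector selector in the query (toy version)."""
    forced = f'job="{tenant}"'
    tokens = [(m.lastgroup, m.group()) for m in TOKEN.finditer(query)]
    out, i = [], 0
    while i < len(tokens):
        kind, text = tokens[i]
        low = text.lower()
        if text == "{":
            # Rewrite the matcher list: drop any user-supplied job matcher, force ours.
            parts, current, i = [], "", i + 1
            while i < len(tokens) and tokens[i][1] != "}":
                if tokens[i][1] == ",":
                    parts.append(current)
                    current = ""
                else:
                    current += tokens[i][1]
                i += 1
            parts.append(current)
            kept = [p.strip() for p in parts
                    if p.strip() and not re.match(r"job\s*[=!]", p.strip())]
            out.append("{" + ",".join([forced] + kept) + "}")
        elif kind == "ident" and low in LABEL_LIST_KEYWORDS:
            # Copy "by (a, b)"-style lists verbatim: they hold label names, not metrics.
            out.append(text)
            if _next_text(tokens, i) == "(":
                while tokens[i][1] != ")" and i + 1 < len(tokens):
                    i += 1
                    out.append(tokens[i][1])
        elif (kind == "ident" and low not in OTHER_KEYWORDS
              and _next_text(tokens, i).lower() not in ("(", "by", "without")):
            # A metric name: keep it, and add a selector if it has none.
            out.append(text)
            if _next_text(tokens, i) != "{":
                out.append("{" + forced + "}")
        else:
            # Functions, aggregators, operators, strings, durations: pass through.
            out.append(text)
        i += 1
    return "".join(out)


print(inject_job('your_metric/my_metric{code="42"}', "jonas"))
```

Each case needs different handling: an existing `job` matcher must be overridden, a selector without `job` gets one added, both sides of a binary expression need the label, and function names and `offset` must be left alone.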

SLIDE 18

A PromQL-parser

SLIDE 19

Retrieving Data in PromQL

  • sum(http_requests_total{code="200"})
  • avg_over_time(your_fancy_metric[1h])

SLIDE 20

Rules

rules:
  - alert: Endpoint down
    expr: up == 0
    for: 10m
    annotations:
      summary: "Must be short, slide isn't wide"

SLIDE 21

Rules

rules:
  - alert: "jonas: Endpoint down"
    expr: up{job="jonas"} == 0
    for: 10m
    labels:
      job: jonas
    annotations:
      summary: "Must be short, slide isn't wide"
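
The change from the user's rule to the tenant-scoped rule can be sketched as a small transformation. This is a sketch under assumptions, not the actual implementation: `rewrite_rule` is a hypothetical name, and `naive_inject` is a deliberately simplistic stand-in for a real PromQL rewriter that only handles the bare metric `up` from this slide.

```python
from typing import Callable


def rewrite_rule(rule: dict, tenant: str, inject: Callable[[str, str], str]) -> dict:
    """Return a tenant-scoped copy of an alerting rule (hypothetical helper)."""
    labels = dict(rule.get("labels", {}))
    labels["job"] = tenant  # lets alerts be attributed to the tenant later on
    return {
        **rule,
        "alert": f"{tenant}: {rule['alert']}",  # prefix keeps alert names unique per tenant
        "expr": inject(rule["expr"], tenant),   # restrict the expression to the tenant's series
        "labels": labels,
    }


def naive_inject(expr: str, tenant: str) -> str:
    # Stand-in for a real PromQL rewriter; only handles the bare metric `up`.
    return expr.replace("up", f'up{{job="{tenant}"}}', 1)


original = {
    "alert": "Endpoint down",
    "expr": "up == 0",
    "for": "10m",
    "annotations": {"summary": "Must be short, slide isn't wide"},
}
rewritten = rewrite_rule(original, "jonas", naive_inject)
```

The original rule is left untouched, so the user's file and the deployed, rewritten rule can be kept side by side.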

SLIDE 22

Alertmanager

SLIDE 23

Alertmanager

SLIDE 24

Alertmanager

SLIDE 25

Silences

SLIDE 26

Architecture overview

[Diagram: two PAP instances, Alertmanager, and Blackbox-Exporter]

SLIDE 27

Blackbox-Exporter

SLIDE 28

Blackbox-Targets

~/blackboxtargets/
├── http_2xx_ipv4
│   └── websites.yml
└── tcp_connect_ipv6
    └── tcp.yml

SLIDE 29

Blackbox-Exporter: Configuration

- job_name: jonas-blackbox-http_2xx-ipv4
  params:
    module:
      - http_2xx_ipv4
  metrics_path: /probe
  file_sd_configs:
    - files:
        - /home/jonas/blackbox/http_2xx_ipv4/*.json
        - /home/jonas/blackbox/http_2xx_ipv4/*.yml
        - /home/jonas/blackbox/http_2xx_ipv4/*.yaml
      refresh_interval: 5m
  relabel_configs: ...

SLIDE 30

Blackbox-Exporter: Relabelling

relabel_configs:
  ...
  - target_label: job
    replacement: jonas
    action: replace
  ...

SLIDE 31

Blackbox-Exporter: Relabelling

relabel_configs:
  ...
  - target_label: blackbox_module
    replacement: http_2xx
    action: replace
  - target_label: ip_version
    replacement: ipv4
    action: replace
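
The job name and the three relabelling rules shown on these slides can all be derived from the user name and the target directory name. The Python sketch below illustrates that derivation under assumptions: `blackbox_job` is a hypothetical helper, the directory name is assumed to end in `_ipv4` or `_ipv6`, and the relabel rules elided as `...` on the slides are deliberately left out rather than guessed.

```python
def blackbox_job(user: str, directory: str) -> dict:
    """Derive a Blackbox-Exporter scrape job from a target directory name (hypothetical helper)."""
    # "http_2xx_ipv4" -> module "http_2xx", IP version "ipv4"
    module, ip_version = directory.rsplit("_", 1)
    base = f"/home/{user}/blackbox/{directory}"
    return {
        "job_name": f"{user}-blackbox-{module}-{ip_version}",
        "params": {"module": [directory]},
        "metrics_path": "/probe",
        "file_sd_configs": [
            {
                "files": [f"{base}/*.{ext}" for ext in ("json", "yml", "yaml")],
                "refresh_interval": "5m",
            }
        ],
        # Only the rules shown on the slides; the elided ones are not reproduced here.
        "relabel_configs": [
            {"target_label": "job", "replacement": user, "action": "replace"},
            {"target_label": "blackbox_module", "replacement": module, "action": "replace"},
            {"target_label": "ip_version", "replacement": ip_version, "action": "replace"},
        ],
    }


job = blackbox_job("jonas", "http_2xx_ipv4")
```

Overwriting `job` with the user name keeps blackbox results under the same tenant label as the user's regular scrape targets, while `blackbox_module` and `ip_version` preserve the information that would otherwise only live in the job name.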

SLIDE 32

Conclusion

Features:

  • User separation: yes
  • Low memory profile: yes
  • Ansible-compatibility: yes
  • Alerting: yes
  • Ease of use: yes

Limitations:

  • No resource isolation
  • Only one set of target credentials

  • Preselected Featureset

SLIDE 33

https://github.com/cherti/promauthproxy

jonas@grosse-sundrup.com
