Monitoring Kubernetes with Prometheus, Henri Dubois-Ferriere

Monitoring Kubernetes with Prometheus Henri Dubois-Ferriere @henridf Percona Live, 2018-11-06 Hello. Henri Dubois-Ferriere Technical Director, Sysdig Doing observability for many many years, from network to web apps via many startups.


slide-1
SLIDE 1

Percona Live, 2018-11-06

Monitoring Kubernetes with Prometheus

Henri Dubois-Ferriere @henridf

slide-2
SLIDE 2

Hello.

Henri Dubois-Ferriere Technical Director, Sysdig

Doing “observability” for many many years, from network to web apps via many startups.

PhD in CS from EPFL

Repatriate from San Francisco to Switzerland

slide-3
SLIDE 3

Outline

  • Kubernetes
  • Prometheus
  • Kubernetes metrics & sources
  • Deployment
slide-4
SLIDE 4

Monitor why?

  • Know about outages before users tell me
  • Understand my production environment (or try…)
  • Plan/trend/forecast
slide-5
SLIDE 5

Kubernetes

slide-6
SLIDE 6

Kubernetes

  • Container orchestration system
  • aka “OS for your cluster”
  • Abstracts away the underlying infra
  • Declarative APIs with control loops
slide-7
SLIDE 7

https://commons.wikimedia.org/wiki/File:Kubernetes.png

slide-8
SLIDE 8

Prometheus

slide-9
SLIDE 9

Prometheus

  • Started at SoundCloud in 2012
  • Motivated by challenges with monitoring dynamic environments
  • Made public 2015, now second CNCF “graduate”

slide-10
SLIDE 10

More than a TSDB

https://prometheus.io/assets/architecture.png

slide-11
SLIDE 11

It’s all about the pull

  • Prom scrapes targets to get metrics
  • Nice side effect: know when target down
  • Needs to know what to scrape
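
The pull model on this slide can be sketched in a few lines. This is only an illustration, not how Prometheus is implemented: `scrape_once`, `http_fetch`, the `/metrics` path default and the target addresses are assumptions made for the example. The one idea taken from the slide is that a failed scrape is itself a signal, recorded here as `up` = 0.

```python
import urllib.request
from typing import Callable, Dict, Iterable


def http_fetch(target: str, timeout: float = 5.0) -> str:
    """Fetch the metrics page of one target over HTTP (assumes a /metrics path)."""
    with urllib.request.urlopen(f"http://{target}/metrics", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def scrape_once(
    targets: Iterable[str],
    fetch: Callable[[str], str] = http_fetch,
) -> Dict[str, int]:
    """Pull from every target once; record up=1 on success and up=0 on failure."""
    up = {}
    for target in targets:
        try:
            fetch(target)
            up[target] = 1
        except OSError:  # connection errors, timeouts and urllib's URLError
            up[target] = 0
    return up
```

Passing a different `fetch` function makes the loop easy to try without a network; the set of targets still has to come from somewhere, which is the point of the next slide.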
slide-12
SLIDE 12

What should Prometheus scrape?

  • Service discovery provides answer
  • Azure, Consul, GCE, K8S, EC2, ...
  • Can also watch a file containing target list
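
For the file-based option, the watched file is a list of target groups, each with a `targets` list and optional `labels`. The sketch below writes such a file as JSON. The helper names, the job names and the addresses are hypothetical, and it assumes Prometheus is configured separately to watch the resulting path.

```python
import json
from typing import Dict, List


def build_target_groups(targets_by_job: Dict[str, List[str]]) -> List[dict]:
    """Build one target group per job, labelled with the job name."""
    return [
        {"targets": sorted(targets), "labels": {"job": job}}
        for job, targets in sorted(targets_by_job.items())
    ]


def write_target_file(path: str, targets_by_job: Dict[str, List[str]]) -> None:
    """Write the target groups as JSON for Prometheus to watch."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(build_target_groups(targets_by_job), handle, indent=2)
```

For example, `write_target_file("targets.json", {"node": ["10.0.0.1:9100"]})` produces a one-group file; rewriting the file is how the target list changes over time.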
slide-13
SLIDE 13

Dimensional data model

Query: http_requests_total{code="200", method="get"}

Metric name: http_requests_total

Selector (aka filter): {code="200", method="get"}

slide-14
SLIDE 14

Dimensional data model

Query:

http_requests_total{code="200", method="get"}

Response:

http_requests_total{code="200", method="get", route="/api/users"} 1528706829.115 1741
http_requests_total{code="200", method="get", route="/api/objects"} 1528706829.115 1920

Label/value pairs (aka dimensions): code, method, route

slide-15
SLIDE 15

Dimensional data model

Query:

http_requests_total{code="200", method="get"}

Response:

http_requests_total{code="200", method="get", route="/api/users"} 1528706829.115 1741
http_requests_total{code="200", method="get", route="/api/objects"} 1528706829.115 1920

Timestamp and value: 1528706829.115 is the timestamp; 1741 and 1920 are the values
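
To make the data model concrete, here is a rough sketch of what a selector does: every series is a set of label/value pairs plus a value, and the query keeps the series whose labels match. The `select` function is a hypothetical stand-in for the query engine, and the third series is invented purely so that the filter has something to exclude.

```python
from typing import Dict, List, Tuple

Series = Tuple[Dict[str, str], float]

SERIES: List[Series] = [
    ({"__name__": "http_requests_total", "code": "200", "method": "get", "route": "/api/users"}, 1741.0),
    ({"__name__": "http_requests_total", "code": "200", "method": "get", "route": "/api/objects"}, 1920.0),
    # Hypothetical extra series, present only to show that the selector filters.
    ({"__name__": "http_requests_total", "code": "500", "method": "get", "route": "/api/users"}, 3.0),
]


def select(series: List[Series], name: str, matchers: Dict[str, str]) -> List[Series]:
    """Keep series with the given metric name whose labels equal every matcher."""
    result = []
    for labels, value in series:
        if labels.get("__name__") != name:
            continue
        # A missing label is treated as the empty string, as in equality matching.
        if all(labels.get(key, "") == wanted for key, wanted in matchers.items()):
            result.append((labels, value))
    return result
```

Calling `select(SERIES, "http_requests_total", {"code": "200", "method": "get"})` returns the two series from the slide, one per `route`.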

slide-16
SLIDE 16

Metadata discovery

  • SD also provides metadata
  • Metadata can be mixed in with metrics
  • Powerful relabelling feature for label manipulation at ingest
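
As a sketch of how discovery metadata ends up on metrics, the code below imitates a relabelling "replace" step: join the source label values, match a fully anchored regular expression, and write the expanded replacement to a target label. It is an approximation under stated assumptions: the function names are hypothetical, only `$1`-style references are handled, and Python's `re` differs in details from the regex engine Prometheus uses. The `__meta_kubernetes_*` names follow the Kubernetes service discovery naming.

```python
import re
from typing import Dict, Sequence


def relabel_replace(
    labels: Dict[str, str],
    source_labels: Sequence[str],
    regex: str,
    target_label: str,
    replacement: str = "$1",
    separator: str = ";",
) -> Dict[str, str]:
    """Apply one 'replace' step and return a new label set."""
    joined = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, joined)  # the whole joined value must match
    result = dict(labels)
    if match is None:
        return result  # no match: labels are left untouched
    # Translate $1-style references into Python's \g<1> template syntax.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    value = match.expand(template)
    if value:
        result[target_label] = value
    else:
        result.pop(target_label, None)  # an empty result removes the label
    return result


def drop_internal(labels: Dict[str, str]) -> Dict[str, str]:
    """Drop labels starting with '__', which do not survive relabelling."""
    return {key: value for key, value in labels.items() if not key.startswith("__")}
```

Starting from `{"__meta_kubernetes_namespace": "prod"}`, one step with regex `(.*)` and target label `namespace` followed by `drop_internal` leaves `{"namespace": "prod"}`, which is then attached to every metric scraped from that target.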

slide-17
SLIDE 17

Instrumentation

slide-18
SLIDE 18

Off-the-shelf or write your own

slide-19
SLIDE 19

Kubernetes metrics

slide-20
SLIDE 20

Monitoring resources and methods

  • For resources like memory, queues, CPUs, disks…
      • USE Method: Utilization, Saturation, Errors
      • http://www.brendangregg.com/usemethod.html
  • For services
      • “RED” Method: Request rate, Error rate, Duration
      • https://www.weave.works/blog/the-red-method-key-metrics-for-microservices-architecture/
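
As a worked example of the RED method, the three numbers can be derived from two readings of a service's counters. Everything here is a sketch under assumptions: the `Snapshot` fields and the `red` function are hypothetical names, the counters are assumed to only increase between the two readings (no resets), and the figures in the usage note are made up.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Snapshot:
    timestamp: float             # seconds
    requests_total: float        # requests served so far
    errors_total: float          # failed requests so far
    duration_seconds_sum: float  # total time spent serving requests


def red(prev: Snapshot, curr: Snapshot) -> Dict[str, float]:
    """Request rate, error rate and mean duration between two snapshots."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("snapshots must be in increasing time order")
    requests = curr.requests_total - prev.requests_total
    errors = curr.errors_total - prev.errors_total
    busy = curr.duration_seconds_sum - prev.duration_seconds_sum
    return {
        "request_rate": requests / elapsed,  # requests per second
        "error_rate": errors / elapsed,      # errors per second
        "mean_duration": busy / requests if requests else 0.0,  # seconds per request
    }
```

With 600 requests, 30 errors and 120 seconds of serving time over a 60 second window, this gives 10 requests per second, 0.5 errors per second and a mean duration of 0.2 seconds.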
slide-21
SLIDE 21

node_exporter: node metrics

  • Host metrics
      • CPU
      • Memory
      • Disk
      • Network
      • ...
  • Not K8S specific, but useful as a reference and for totals

slide-22
SLIDE 22

cAdvisor: container metrics

  • Runs in kubelet (usually, for now...)
  • Resource stats about running containers
  • Mostly container and node-level labels…
      • (k8s: plus namespace and pod_name)

slide-23
SLIDE 23

Sample cAdvisor metric queries

Percent of total cluster memory used:

100 * sum(container_memory_rss) / sum(machine_memory_bytes)

Memory used per Kubernetes namespace:

sum(container_memory_rss) by (namespace)

Top 5 pods by network I/O:

topk(5, sum by (pod_name) (rate(container_network_transmit_bytes_total[5m])))
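
For readers new to the query language, the aggregation in these queries can be pictured as a group-by over labels. The sketch below mimics `sum by (...)` and `topk` on an in-memory list of samples. `sum_by` and `topk` here are plain Python stand-ins, not the query engine, and the sample values are invented.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[Dict[str, str], float]


def sum_by(samples: List[Sample], label: str) -> Dict[str, float]:
    """Group samples by one label and add up their values."""
    totals: Dict[str, float] = defaultdict(float)
    for labels, value in samples:
        totals[labels.get(label, "")] += value
    return dict(totals)


def topk(k: int, totals: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return the k largest entries, biggest first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical per-pod memory samples, in bytes.
SAMPLES: List[Sample] = [
    ({"namespace": "prod", "pod_name": "api-0"}, 300.0),
    ({"namespace": "prod", "pod_name": "api-1"}, 200.0),
    ({"namespace": "dev", "pod_name": "web-0"}, 50.0),
]
```

Here `sum_by(SAMPLES, "namespace")` collapses the three pod-level samples into one total per namespace, and `topk(1, sum_by(SAMPLES, "pod_name"))` keeps only the largest pod.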

slide-24
SLIDE 24

Kube-state metrics

$ kubectl get deploy my-app -o yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: my-app
  ...
spec:
  replicas: 4
  ...
status:
  replicas: 4
  ...

slide-25
SLIDE 25

Kube-state metrics

$ kubectl get deploy my-app -o yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: my-app
  ...
spec:
  replicas: 4
  ...
status:
  replicas: 4
  ...

Metrics created by kube-state-metrics, with label set from this deployment:

kube_deployment_spec_replicas{deployment="my-app", ...}
kube_deployment_status_replicas{deployment="my-app", ...}
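
The mapping from a Kubernetes object to metrics can be illustrated with a toy translation. This is not how kube-state-metrics is written: `deployment_metrics` is a hypothetical helper, and it emits only the `deployment` label, whereas the real metrics carry more labels (elided as "..." on the slide).

```python
from typing import List


def deployment_metrics(obj: dict) -> List[str]:
    """Turn the spec and status replica counts of a Deployment into metric lines."""
    name = obj["metadata"]["name"]
    spec_replicas = obj["spec"]["replicas"]
    status_replicas = obj["status"]["replicas"]
    return [
        f'kube_deployment_spec_replicas{{deployment="{name}"}} {spec_replicas}',
        f'kube_deployment_status_replicas{{deployment="{name}"}} {status_replicas}',
    ]


# The Deployment from the slide, reduced to the fields used here.
MY_APP = {
    "kind": "Deployment",
    "metadata": {"name": "my-app"},
    "spec": {"replicas": 4},
    "status": {"replicas": 4},
}
```

`deployment_metrics(MY_APP)` yields one line for the desired replica count and one for the observed count, which is what makes the two comparable in a query.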

slide-26
SLIDE 26

Sample kube-state-metrics queries

Deployments with issues

kube_deployment_spec_replicas != kube_deployment_status_replicas_available

Top 10 longest-running pods (“reverse uptime”)

topk(10, sort_desc(time() - kube_pod_created))
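
The "deployments with issues" query compares two metrics series by series: samples with the same labels are paired, and only pairs whose values differ are kept. A rough Python equivalent, with a hypothetical function name, invented data, and `(namespace, deployment)` tuples standing in for the label set:

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (namespace, deployment)


def deployments_with_issues(
    spec: Dict[Key, int],
    available: Dict[Key, int],
) -> Dict[Key, int]:
    """Pair entries with identical keys; keep the desired count where the two differ."""
    return {
        key: spec[key]
        for key in sorted(spec.keys() & available.keys())
        if spec[key] != available[key]
    }


# Hypothetical cluster state: 'worker' wants 2 replicas but only 1 is available.
SPEC = {("default", "my-app"): 4, ("default", "worker"): 2}
AVAILABLE = {("default", "my-app"): 4, ("default", "worker"): 1}
```

`deployments_with_issues(SPEC, AVAILABLE)` returns only the `worker` entry; a healthy cluster gives an empty result, which makes the expression a natural basis for an alert.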

slide-27
SLIDE 27

Kube core service metrics

  • API Server
  • etcd3
  • kube-dns
  • scheduler, controller-manager

slide-28
SLIDE 28

Metrics recap

Component                                   Deployment mode   How many                Metrics about
node_exporter                               daemonset         1 per node              node resources
cAdvisor                                    inside kubelet    1 per node              container resources
kube-state-metrics                          deployment        singleton               k8s object state
etcd, API server, controller manager, ...   core service      singleton or HA group   itself

slide-29
SLIDE 29

Deploying

slide-30
SLIDE 30

Monitoring from the inside

  • Monitoring runs inside the thing being monitored?
  • Yes. It’s fine really. Really, it’s fine.
  • (And being outside has its own challenges)

slide-31
SLIDE 31

Deployment outline

  • Metrics services
      • node_exporter
      • kube-state-metrics
      • (cAdvisor usually enabled out of box)
  • Prometheus running
      • Storage
      • Read access to API server (for service discovery)
      • Service discovery config for above
      • Service discovery config for apps/services

slide-32
SLIDE 32

Helm-based install

helm fetch stable/prometheus
vi prometheus/values.yaml   # configure install
helm upgrade -i             # or manually deploy yaml

slide-33
SLIDE 33

Prometheus operator

  • Use Kubernetes API facilities to make Prometheus “native”
  • New Prometheus-related objects: `kubectl get prometheus`
      • PrometheusRule, ServiceMonitor, Alertmanager, AlertingSpec, ...
  • Prometheus configuration abstracted via all these objects
  • Young but promising
  • Consider more direct route first (hand-rolled or Helm), and Operator once more familiar with challenges of direct route

slide-34
SLIDE 34

Thank You.

Henri Dubois-Ferriere @henridf

slide-35
SLIDE 35

Pointers

  • Prometheus SD for Kubernetes: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config

  • KSM metrics: https://github.com/kubernetes/kube-state-metrics/tree/master/Documentation
  • Prometheus Helm chart: https://github.com/helm/charts/tree/master/stable/prometheus
  • Prometheus operator: https://github.com/coreos/prometheus-operator
  • “A deep dive into Kubernetes metrics” blog series: https://blog.freshtracks.io/a-deep-dive-into-kubernetes-metrics-66936addedae