
Monitoring Kubernetes with OMD Labs Edition and Prometheus

Michael Kraus, FOSDEM 2017. Senior Monitoring Consultant @ ConSol. Doing monitoring for 12 years, mainly with plain old Nagios, open-source only.


  1. Monitoring Kubernetes with OMD Labs Edition and Prometheus Michael Kraus - FOSDEM 2017

  2. About me

  3. About me: Michael Kraus, Senior Monitoring Consultant @ ConSol. Doing monitoring for 12 years, mainly with plain old Nagios, open-source only.

  4. Background

  5. Why: Kubernetes in a classical enterprise. Implementation of a Kubernetes PoC at $customer. We have …
     ● already some monitoring instances running there,
     ● but no idea about monitoring Kubernetes.

  6. With: Enter Prometheus. Natural choice for Kubernetes monitoring:
     ● Integrated service discovery
     ● Labels are retained between Kubernetes and Prometheus

  7. How: Where to start. There are excellent tutorials and blog posts available as a starting point, for example by
     ● coreos.com/blog/ (Fabian Reinartz)
     ● robustperception.io/blog/ (Brian Brazil)
     ● … many examples on GitHub

  8. Implementation

  9. Implementation: Prometheus kubernetes_sd. prometheus-kubernetes.yml from prometheus/examples.
     ● kubernetes_sd_configs - role: endpoints
     ● kubernetes_sd_configs - role: node
     ● kubernetes_sd_configs - role: pod
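
A minimal sketch of how the three discovery roles named on this slide might sit together in one Prometheus configuration when Prometheus runs inside the cluster. The job names are illustrative assumptions, the file paths are the standard in-cluster service account mount locations, and the `keep` rules assume the `prometheus.io/scrape` annotation convention. It is not the literal content of prometheus-kubernetes.yml.

```yaml
scrape_configs:
  # Kubelets: one target per cluster node (job name is an assumption).
  - job_name: 'kubernetes-nodes'
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node

  # Endpoints behind Services annotated with prometheus.io/scrape: 'true'.
  - job_name: 'kubernetes-service-endpoints'
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: 'true'
      # Copy Kubernetes service labels onto the scraped series.
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)

  # Pods annotated with prometheus.io/scrape: 'true'.
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: 'true'
      # Copy Kubernetes pod labels onto the scraped series.
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
```

The `labelmap` rules are one way to retain Kubernetes labels in Prometheus; whether the talk's configuration used exactly these rules is not stated in the slides.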

  10. Implementation: Prometheus kubernetes_sd. Metrics:
      ● apiserver_*
      ● container_cpu_*
      ● container_fs_*
      ● deployment_*
      ● etcd_*
      ● kubelet_*
      ● ...

  11. Implementation: node_exporter. Prometheus exporter for hardware and OS metrics exposed by the kernel.
      ● DaemonSet
      ● prometheus.io/scrape: 'true'
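
As an illustration of the two bullets on this slide, a DaemonSet manifest for node_exporter could look roughly like the following. The `app: node-exporter` label, the object name, and the use of the current `apps/v1` API group are assumptions (a 2017 cluster would have used an older API group); port 9100 is node_exporter's default listen port.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter          # hypothetical name
spec:
  selector:
    matchLabels:
      app: node-exporter       # hypothetical label
  template:
    metadata:
      labels:
        app: node-exporter
      annotations:
        # Picked up by a pod-role scrape job that keeps annotated pods.
        prometheus.io/scrape: 'true'
    spec:
      # Run in the host's network and PID namespaces so the exporter
      # reports on the node rather than on its own container.
      hostNetwork: true
      hostPID: true
      containers:
        - name: node-exporter
          image: prom/node-exporter
          ports:
            - name: metrics
              containerPort: 9100
```

A DaemonSet schedules one such pod on every node, which is why it suits a per-machine exporter.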

  12. Implementation: node_exporter. Metrics:
      ● node_cpu
      ● node_disk_*
      ● node_filesystem_*
      ● node_netstat_*
      ● node_vmstat_*
      ● ...

  13. Implementation: kube-state-metrics. “... focused … on the health of the various objects inside, such as deployments, nodes and pods.”
      ● prometheus.io/scrape: 'true'
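
One way the annotation on this slide might be attached is on the Service in front of kube-state-metrics, so that an endpoints-role scrape job that keeps annotated services discovers it. This is a sketch under assumptions: the object name, the `app: kube-state-metrics` selector label, and port 8080 are not taken from the slides.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kube-state-metrics       # hypothetical name
  annotations:
    # Marks the endpoints of this Service as scrape targets.
    prometheus.io/scrape: 'true'
spec:
  selector:
    app: kube-state-metrics      # hypothetical pod label
  ports:
    - name: http-metrics
      port: 8080                 # assumed metrics port
      targetPort: 8080
```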

  14. Implementation: kube-state-metrics. Metrics:
      ● kube_deployment_*
      ● kube_node_*
      ● kube_pod_*
      ● kube_resource_quota
      ● ...

  15. Implementation: Demo environment. Based on minikube: github.com/kubernetes/minikube. Sample config: github.com/m-kraus/kubernetes-monitoring

  16. Demo

  17. Implementation: What else? What we also need:
      ● persistent storage
      ● Alertmanager
      ● Grafana
      ● Pushgateway
      ● ...

  18. But we have that already

  19. Classical monitoring: OMD Labs Edition. Monitoring in one package:
      ● completely open-source
      ● based on Nagios / Icinga
      ● bundles “best practices” of many years of experience
      ● no root required
      ● "Musterlösung" (reference solution) at $customer for monitoring projects

  20. Classical monitoring: OMD Labs Edition. Bundled components: Nagios, Icinga1, Icinga2, Shinken, Naemon, Thruk, Mod-Gearman, PNP4Nagios, LMD, NagVis, Apache, MySQL, InfluxDB, Nagflux, Prometheus, Dokuwiki, Grafana, FreeTDS, JMX4Perl, check_webinject, check_logfiles, Jolokia, check_mysql_health, coshsh, check_mssql_health, rrdcache, check_nsc_web, check_curly, check_nwc_health, check_multi, check_oracle_health, Ansible

  21. Classical monitoring: OMD sites and commands.
      omd create <MYSITE>
      omd cp <PROD> <STAGE>
      omd update <STAGE>
      omd version

  22. omd create <MYSITE>
      omd cp <PROD> <STAGE>

  23. Classical monitoring: OMD Labs Edition. https://labs.consol.de/omd/

  24. Implementation: Connecting OMD. Why not scrape Kubernetes directly from OMD:
      ● hard to access pods inside Kubernetes
      ● hard to access API from outside Kubernetes
      ● API secured via TLS and token only (easily) available from a serviceaccount

  25. Implementation: Connecting OMD. Getting the metrics from Kubernetes to OMD:
      ● federation

      - job_name: 'kube_federation'
        metrics_path: '/federate'
        honor_labels: true
        params:
          'match[]':
            - '{job=~"^kubernetes.+"}'
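
For context, the federation job from this slide needs a target before the Prometheus in OMD can scrape it. A sketch of the complete job follows; the `static_configs` section and the address `prometheus.kubernetes.example:30900` are hypothetical placeholders for wherever the in-cluster Prometheus is exposed to the outside, and are not taken from the talk.

```yaml
scrape_configs:
  - job_name: 'kube_federation'
    # Scrape the federation endpoint instead of the default /metrics.
    metrics_path: '/federate'
    # Keep the job and instance labels of the original series.
    honor_labels: true
    params:
      # Pull every series whose job label starts with "kubernetes".
      'match[]':
        - '{job=~"^kubernetes.+"}'
    static_configs:
      - targets:
          # Hypothetical address of the Prometheus running in Kubernetes.
          - 'prometheus.kubernetes.example:30900'
```

With `honor_labels: true`, the federated series keep the labels they had in the source Prometheus rather than being overwritten with the labels of the federation job.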

  26. OMD

  27. Demo

  28. Issues

  29. Issues: Federation. “... Not quite the purpose of federation.” (Brian Brazil, www.robustperception.io/federation-what-is-it-good-for/)
      ● Let’s try it anyway ...

  30. Issues: Securing. "Accessing metrics without authentication is ok for a PoC, but not allowed in production..." (internal audit)
      ● How to secure (federated) Prometheus?

  31. Issues: Integration. "Should Nagios, Alertmanager or both notify?" “Do we need to define our checks and alerts in both Nagios and Prometheus?”
      ● How to route alerts
      ● How to ease or centralize configuration

  32. Issues: Long-term storage. “How can we store (some of) our graphs for a longer period of time?”
      ● InfluxDB?

  33. Issues: Coverage. “Our Kubernetes cluster died. We had no monitoring until it was up again...” (operations team)
      ● external monitoring of crucial components
        ○ machine health
        ○ important services
        ○ important API queries

  34. Thanks for watching
