hawkular metrics

Hawkular Metrics: Metric Storage & Alerting, Stefan Negrea - PowerPoint PPT Presentation



  1. Hawkular Metrics Metric Storage & Alerting Stefan Negrea

  2. About Me: Co-Creator of Hawkular Metrics

  3. Hawkular Metrics Hawkular Demo & Alerting Introduction to Hawkular Metrics

  4. Pre-History
     ● 2006 JBoss Operations Network 1.0
     ● 2008 Project RHQ
       ○ JBoss Operations Network 2.0
       ○ Metrics stored in Postgres

  5. Pre-History

  6. Pre-History
     ● 2012 - 2013 RHQ Storage Nodes
       ○ Cassandra based
       ○ Store metrics
     ● 2014 RHQ Metrics

  7. Hawkular
     It's a hawk with a monocular. Hawks are known for their very sharp vision and are very good hunters; they can catch prey at very high speed by anticipating its movements. The goal is to be able to monitor and catch anomalies in fast-paced environments.
     All* projects are licensed under the Apache License 2.0.

  8. History
     ● 2014 Hawkular organization formed
     ● 2014 Hawkular Alerting started
     ● 02/2015 RHQ Metrics joins Hawkular org
     ● 12/2015 Hawkular Metrics integrated in OpenShift Origin v3
     ● 10/2016 Hawkular Metrics includes Hawkular Alerting

  9. Hawkular Metrics
     Hawkular Metrics is a storage engine for metric data.
     ● metric data = a measurement taken at a specific time
     ● storage engine = store metrics efficiently for their useful lifetime

  10. Supported Metrics
      ● Gauge (example: memory usage)
        ○ number
        ○ varies (not monotonic)
        ○ rate of change
        ○ sample data points: (metric1, 4.5, 1493301898245), (metric1, 5.6, 1493301898246), (metric1, 1.2, 1493301898247)
      ● Counter (example: number of visitors)
        ○ integer
        ○ monotonic (increasing or decreasing)
        ○ rate of change
        ○ support for reset
        ○ sample data points: (metric2, 4, 1493301898248), (metric2, 5, 1493301898249), (metric2, 9, 1493301898250), (metric2, 0, 1493301898251)
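
The counter behaviour listed above (rate of change, support for reset) can be sketched in a few lines of Python. This is only an illustration, not how Hawkular Metrics computes rates: the function name `counter_rates` is hypothetical, the counter is assumed to be increasing, and a drop in value is assumed to mean a reset, so the pair of points spanning the reset is skipped.

```python
from typing import List, Tuple

# (metric id, value, timestamp in milliseconds), the shape shown on the slide
DataPoint = Tuple[str, float, int]


def counter_rates(points: List[DataPoint]) -> List[Tuple[int, float]]:
    """Return (timestamp, change per second) for each consecutive pair of points.

    Assumption: a value lower than the previous one marks a counter reset,
    so that pair is skipped instead of being reported as a negative rate.
    """
    ordered = sorted(points, key=lambda p: p[2])
    rates = []
    for (_, prev_value, prev_ts), (_, value, ts) in zip(ordered, ordered[1:]):
        if ts == prev_ts or value < prev_value:
            continue  # duplicate timestamp or reset: no rate for this pair
        # Timestamps are in milliseconds, so scale to a per-second rate.
        rates.append((ts, (value - prev_value) * 1000.0 / (ts - prev_ts)))
    return rates


# The counter samples from the slide; the last point (value 0) is a reset.
counter = [
    ("metric2", 4, 1493301898248),
    ("metric2", 5, 1493301898249),
    ("metric2", 9, 1493301898250),
    ("metric2", 0, 1493301898251),
]
rates = counter_rates(counter)
print(rates)
```

The sample points are one millisecond apart, so the per-second rates look large; real collection intervals would usually be seconds or longer.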

  11. Supported Metrics
      ● Availability (example: server status)
        ○ availability of a resource
        ○ up, down, or unknown
        ○ can compute interesting stats based on values
        ○ sample data points: (metric3, UP, 1493301898253), (metric3, DOWN, 1493301898254), (metric3, UP, 1493301898255)
      ● String (example: value of configuration key ‘k’)
        ○ just that
        ○ possible uses: logs, events, config
        ○ sample data points: (metric4, “k=v”, 1493301898256), (metric4, “k=t”, 1493301898257), (metric4, “k=1”, 1493301898258), (metric4, “k=4”, 1493301898259)
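
As an example of the kind of statistic that can be derived from availability values, here is a minimal Python sketch of an uptime ratio. The helper `uptime_ratio` and the `end_ts` parameter are hypothetical, and the sketch assumes each reported state holds until the next data point; the actual server-side computation may differ.

```python
from typing import List, Tuple

# (metric id, state, timestamp in milliseconds), the shape shown on the slide
AvailabilityPoint = Tuple[str, str, int]


def uptime_ratio(points: List[AvailabilityPoint], end_ts: int) -> float:
    """Fraction of the window [first timestamp, end_ts) spent in the UP state.

    Assumption: a state stays in effect until the next data point arrives.
    """
    ordered = sorted(points, key=lambda p: p[2])
    if not ordered or end_ts <= ordered[0][2]:
        raise ValueError("need at least one point before end_ts")
    # Pair each point with the timestamp at which its state stops applying.
    next_timestamps = [p[2] for p in ordered[1:]] + [end_ts]
    up_time = sum(
        nxt - ts
        for (_, state, ts), nxt in zip(ordered, next_timestamps)
        if state == "UP"
    )
    return up_time / (end_ts - ordered[0][2])


# The availability samples from the slide, evaluated over a 4 ms window.
availability = [
    ("metric3", "UP", 1493301898253),
    ("metric3", "DOWN", 1493301898254),
    ("metric3", "UP", 1493301898255),
]
ratio = uptime_ratio(availability, end_ts=1493301898257)
print(ratio)
```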


  13. Cassandra - Storage
      Management & Support
      ● Highly available, fault tolerant
      ● No specialized node roles
      ● Minimal configuration
      Performance & Scalability
      ● Optimized for writes
      ● Data compression
      ● Indexing

  14. Cassandra - Storage
      ● CQL based
      ● Partitioning & indexing of data based on usage
      ● Use built-in compression & TTL
      ● Use the Datastax driver fully async
      ● Support for latest C* 3.0.x release
      ● Keep updating to latest stable
      ● Use multiple tables for indexing

  15. App Layer
      ● REST API with JSON
      ● JAX-RS 2.0 (async spec)
      ● Fully async = JAX-RS 2.0 async + RxJava + async C* driver
      ● Stateless** server (Metrics, mostly)
      ● Minimal clustering via Infinispan
      ● Schema Management
      ● Easy to use
        ○ packaged distribution with WildFly
        ○ download and run, only JDK required

  16. Performance - Sample
      C* - 4 CPU, 4GB; Hawkular - 4 CPU, 4GB; message sizes:
        10 datapoints: 2592 req/sec => 25920 datapoints/sec
        100 datapoints: 365 req/sec => 36500 datapoints/sec
        5000 datapoints: 7.6 req/sec => 38000 datapoints/sec
      C* - 8 CPU, 8GB; Hawkular - 8 CPU, 4GB; message sizes:
        10 datapoints: 4655 req/sec => 46550 datapoints/sec
        100 datapoints: 604 req/sec => 60400 datapoints/sec
        5000 datapoints: 15 req/sec => 75000 datapoints/sec
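
The datapoints/sec figures above are simply message size multiplied by request rate. A short Python check of that arithmetic, using only the numbers from the slide (the dictionary keys and the `throughput` helper are illustrative names):

```python
# (datapoints per message, requests per second) for each hardware setup
runs = {
    "C* 4 CPU/4GB, Hawkular 4 CPU/4GB": [(10, 2592), (100, 365), (5000, 7.6)],
    "C* 8 CPU/8GB, Hawkular 8 CPU/4GB": [(10, 4655), (100, 604), (5000, 15)],
}


def throughput(batch_size: int, requests_per_sec: float) -> int:
    """Datapoints stored per second for a given message size and request rate."""
    return round(batch_size * requests_per_sec)


table = {
    setup: [throughput(batch, rps) for batch, rps in rows]
    for setup, rows in runs.items()
}
for setup, values in table.items():
    print(setup, values)
```

In both setups the largest messages give the highest datapoint throughput even though the request rate drops sharply, which is consistent with the deck's later emphasis on the bulk insertion endpoint.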

  17. Features
      ● Multi-tenant
        ○ tenant id required on each request (HAWKULAR-TENANT header)
        ○ no way to get data from multiple tenants at once
      ● Can insert data without pre-creating metrics
      ● Data is compressed using Gorilla compression
        ○ 2 hour time window
        ○ further reduces disk footprint
        ○ LZ4 enabled in Cassandra
        ○ Load testing:
          ■ 5000 data points/sec for 5 days = 26GB
          ■ 83M data points ~ 1GB of disk space
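
The two load-testing figures are consistent with each other, as this short Python calculation shows. The bytes-per-point value is an extra derived number and assumes 1 GB = 10^9 bytes, which the slide does not specify.

```python
POINTS_PER_SEC = 5000
DAYS = 5
DISK_GB = 26
SECONDS_PER_DAY = 86_400

# Total data points written during the load test.
total_points = POINTS_PER_SEC * SECONDS_PER_DAY * DAYS

# Data points stored per GB of disk (the slide rounds this to 83M).
points_per_gb = total_points / DISK_GB

# Assumption: 1 GB = 10**9 bytes.
bytes_per_point = DISK_GB * 10**9 / total_points

print(total_points)
print(round(points_per_gb / 1e6, 1), "million points per GB")
print(round(bytes_per_point, 1), "bytes per point")
```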

  18. Features
      ● Bulk insertion endpoint for metrics and data
      ● Tagging support for metrics and single data points
        ○ key, value; multi-tag support
          ■ tag1 = d
        ○ metrics queryable via TQL (tag query language)
          ■ AND, OR, NOT
          ■ grouping
          ■ wildcard matching
          ■ a1 = 'd' OR ( a1 != 'ab' AND c1 )
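
To make the sample tag query concrete, the sketch below expresses `a1 = 'd' OR ( a1 != 'ab' AND c1 )` as a plain Python predicate over a metric's tags. It is not the TQL parser, and two points of semantics are assumptions made for illustration: a bare tag name such as `c1` is read as "the tag exists", and `a1 != 'ab'` is read as "the tag exists with a different value". The metric names are made up.

```python
from typing import Dict


def matches(tags: Dict[str, str]) -> bool:
    """Python equivalent of: a1 = 'd' OR ( a1 != 'ab' AND c1 ).

    Assumptions (not confirmed by the slides):
      - a bare tag name (c1) means the tag is present, whatever its value;
      - != only matches metrics that actually carry the tag.
    """
    a1 = tags.get("a1")
    a1_is_d = a1 == "d"
    a1_is_not_ab = a1 is not None and a1 != "ab"
    has_c1 = "c1" in tags
    return a1_is_d or (a1_is_not_ab and has_c1)


# Hypothetical metrics with their tags.
metrics = {
    "m1": {"a1": "d"},
    "m2": {"a1": "x", "c1": "1"},
    "m3": {"a1": "ab", "c1": "1"},
    "m4": {"a1": "x"},
    "m5": {"c1": "1"},
}
selected = sorted(name for name, tags in metrics.items() if matches(tags))
print(selected)
```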

  19. Features - Simple REST API
      ● Endpoint for each metric type
        ○ /gauges, /availability, /counters, /strings
        ○ Each metric type has almost identical endpoints
      ● Raw data - /gauges/raw
      ● Raw data for single metric - /strings/{metric_id}/raw
      ● Query time aggregation
        ○ multiple metrics - /availability/stats
        ○ single metric - /counters/{metric_id}/stats
      ● Bulk operations - /metrics
      ** String metrics do not have stats (yet?)
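
A minimal Python sketch of what a raw-data insert against these endpoints might look like. It only builds the request and does not send it. The host, port, and `/hawkular/metrics` base path, as well as the JSON payload shape, are assumptions not shown in the slides and should be checked against the REST API documentation; the tenant header comes from the multi-tenancy slide.

```python
import json
from typing import Dict, List, Tuple

# Assumption: default local install; adjust host, port, and base path as needed.
BASE_URL = "http://localhost:8080/hawkular/metrics"


def build_raw_gauge_request(
    tenant: str, metric_id: str, datapoints: List[Tuple[int, float]]
) -> Tuple[str, Dict[str, str], str]:
    """Build the URL, headers, and JSON body for a POST to /gauges/raw.

    Assumption: the body is a list of metrics, each with an id and a list of
    {timestamp, value} objects. Verify this against the API documentation.
    """
    url = f"{BASE_URL}/gauges/raw"
    headers = {
        "Hawkular-Tenant": tenant,  # tenant id is required on each request
        "Content-Type": "application/json",
    }
    body = json.dumps(
        [
            {
                "id": metric_id,
                "data": [{"timestamp": ts, "value": v} for ts, v in datapoints],
            }
        ]
    )
    return url, headers, body


url, headers, body = build_raw_gauge_request(
    "test", "metric1", [(1493301898245, 4.5), (1493301898246, 5.6)]
)
print(url)
print(headers)
print(body)
```

The returned triple can be handed to any HTTP client; the deck's own demo uses the Python client library instead of raw HTTP calls.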

  20. Features - Aggregation & Rate
      ● Query Time Aggregation
        ○ Combine multiple metrics and get statistical data
        ○ Gauge and counter: average, median, percentile, sum
        ○ Availability: ratios for uptime and downtime, downtime duration
        ○ Time Slicing: first group data, then compute stats
        ○ Single or multiple metrics
      ● Rate
        ○ available for gauges and counters
        ○ rate of change of the values for the timespan
        ○ ex: how fast is the number of total requests increasing
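
Time slicing ("first group data, then compute stats") can be illustrated with a small Python sketch. This is a client-side approximation under simple assumptions (equal-width buckets over a half-open time range); the function name `bucket_stats` and the result keys are hypothetical and do not mirror the server's response format.

```python
from statistics import mean, median
from typing import Dict, List, Tuple


def bucket_stats(
    points: List[Tuple[int, float]], start: int, end: int, buckets: int
) -> List[Dict[str, float]]:
    """Group (timestamp, value) points into equal-width buckets over
    [start, end), then compute basic statistics per bucket."""
    width = (end - start) / buckets
    slices: List[List[float]] = [[] for _ in range(buckets)]
    for ts, value in points:
        if start <= ts < end:
            # min() guards against a floating point edge case at the upper bound.
            index = min(int((ts - start) // width), buckets - 1)
            slices[index].append(value)
    stats = []
    for i, values in enumerate(slices):
        entry: Dict[str, float] = {"start": start + i * width, "samples": len(values)}
        if values:  # empty buckets carry only the start and sample count
            entry.update(avg=mean(values), median=median(values), sum=sum(values))
        stats.append(entry)
    return stats


# Four gauge samples sliced into three 20-unit buckets.
samples = [(0, 1.0), (10, 3.0), (25, 4.0), (59, 10.0)]
result = bucket_stats(samples, start=0, end=60, buckets=3)
for bucket in result:
    print(bucket)
```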

  21. Metrics + Alerting
      ● Natural fit: collect data and then alert on anomalies
      ● Two ways to alert on metric data
        ○ Dedicated API for setting up alerts; incoming data is filtered and processed by the alerting engine
        ○ Metrics Alerter that queries single or multiple metrics; no need to define alert triggers ahead of time

  22. Alerting Features
      ● Single and group triggers
      ● Template triggers
      ● Complex conditions
      ● Dampening
      ● Auto-resolve/auto-disable triggers
      ● Pluggable notifiers


  24. Roadmap - 2017
      ● Automatic & persisted aggregation
      ● Management capabilities for the Cassandra cluster
      ● Query language
      ● Performance improvements
        ○ already have a good baseline, but can do better
        ○ read/write

  25. Demo

  26. Demo
      ● Install ccm
        ○ https://github.com/pcmanus/ccm
      ● Start a single node C* cluster
        ○ ccm create -v 3.0.12 -n 1 -s hawkular
      ● Download, extract and start Hawkular Metrics
        ○ https://origin-repository.jboss.org/nexus/content/groups/public/org/hawkular/metrics/hawkular-metrics-wildfly-standalone/0.26.1.Final/
        ○ bin/standalone -b 0.0.0.0
      ● Download, extract and start Grafana
      ● Download, install, and configure the Hawkular plugin for Grafana
        ○ https://grafana.com/plugins/hawkular-datasource/installation
        ○ https://github.com/hawkular/hawkular-grafana-datasource
          i. pick a tenant id of your choice

  27. Demo
      ● Install the Hawkular Metrics Python client via pip
        ○ pip install hawkular-client
      ● Install psutil to collect CPU stats
        ○ pip install psutil
      ● Create a custom agent (using the Python client)
        ○ make sure you use the same tenant id configured with Grafana
        ○ pre-create and tag a metric for each CPU
        ○ collect CPU usage every 10 seconds
        ○ send the data to Hawkular Metrics

  28. Demo

          #!/usr/bin/env python3
          import time

          import psutil
          from hawkular.metrics import HawkularMetricsClient, MetricType

          client = HawkularMetricsClient(tenant_id='test')

          # Pre-create one gauge per CPU, tagged with the CPU name.
          cpu_percent = psutil.cpu_percent(interval=1, percpu=True)
          for index, cpu in enumerate(cpu_percent):
              client.create_metric_definition(MetricType.Gauge, 'cpu%s' % index, cpu='cpu%s' % index)

          # Sample per-CPU usage (1 second window), push it, then wait 10 seconds.
          while True:
              cpu_percent = psutil.cpu_percent(interval=1, percpu=True)
              for index, cpu in enumerate(cpu_percent):
                  client.push(MetricType.Gauge, 'cpu%s' % index, float(cpu))
              time.sleep(10)

  29. Resources
      ● Web - http://www.hawkular.org/
      ● Github - https://github.com/hawkular
      ● Metrics Documentation - http://www.hawkular.org/tags/metrics.html
      ● Alerting Documentation - http://www.hawkular.org/tags/alerts.html
      ● Twitter - https://twitter.com/hawkular_org

  30. Thank you! hawkular.org #hawkular (on freenode) snegrea@redhat.com
