Hawkular Metrics: Metric Storage & Alerting
Stefan Negrea
2
About Me
Co-Creator of Hawkular Metrics
3
Introduction to Hawkular Metrics Hawkular Demo Hawkular Metrics & Alerting
4
Pre-History
- 2006 JBoss Operations Network 1.0
- 2008 Project RHQ
  ○ JBoss Operations Network 2.0
  ○ Metrics stored in Postgres
6
Pre-History
- 2012 - 2013 RHQ Storage Nodes
  ○ Cassandra based
  ○ Store metrics
- 2014 RHQ Metrics
7
Hawkular
It’s a hawk with a monocular (Hawkular = hawk + monocular). Hawks are known for their very sharp vision and are very good hunters: they catch prey by anticipating its movements at very high speed. The goal is to be able to monitor and catch anomalies in fast-paced environments. All* projects are licensed under the Apache License 2.0.
8
History
- 2014 Hawkular organization formed
- 2014 Hawkular Alerting started
- 02/2015 RHQ Metrics joins Hawkular org
- 12/2015 Hawkular Metrics integrated in OpenShift Origin v3
- 10/2016 Hawkular Metrics includes Hawkular Alerting
Hawkular Metrics is a storage engine for metric data
- metric data = a measurement taken at a specific time
- storage engine = store metrics efficiently for their useful lifetime
Hawkular Metrics
9
- Gauge
  ○ number
  ○ varies (not monotonic)
  ○ rate of change
- Counter
  ○ integer
  ○ monotonic (increasing or decreasing)
  ○ rate of change
  ○ support for reset
Supported Metrics
10
Memory usage
(metric1, 4.5, 1493301898245)
(metric1, 5.6, 1493301898246)
(metric1, 1.2, 1493301898247)
Number of visitors
(metric2, 4, 1493301898248)
(metric2, 5, 1493301898249)
(metric2, 9, 1493301898250)
(metric2, 0, 1493301898251)
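Each data point above is a (metric id, value, timestamp) triple, with the timestamp in milliseconds since the epoch. As a minimal sketch of how a rate of change could be derived from consecutive counter points, consider the following. The function name `rates_per_second` and the reset rule (a drop in value is treated as a restart from zero) are assumptions made for illustration, not the algorithm Hawkular Metrics uses.

```python
from typing import List, Tuple

Point = Tuple[str, float, int]  # (metric id, value, timestamp in ms)


def rates_per_second(points: List[Point]) -> List[Tuple[int, float]]:
    """Return (timestamp, rate per second) for consecutive points of one metric."""
    ordered = sorted(points, key=lambda p: p[2])
    rates = []
    for (_, prev_value, prev_ts), (_, value, ts) in zip(ordered, ordered[1:]):
        elapsed_ms = ts - prev_ts
        if elapsed_ms <= 0:
            continue  # skip duplicate timestamps to avoid dividing by zero
        delta = value - prev_value
        if delta < 0:
            # Assumption: a drop means the counter was reset, so count from zero.
            delta = value
        rates.append((ts, delta * 1000 / elapsed_ms))
    return rates


# The "number of visitors" counter from the examples above.
visitors = [
    ("metric2", 4, 1493301898248),
    ("metric2", 5, 1493301898249),
    ("metric2", 9, 1493301898250),
    ("metric2", 0, 1493301898251),
]
print(rates_per_second(visitors))
```

The last point (value 0) is handled as a reset, so it contributes a rate of zero rather than a negative rate.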
- Availability
  ○ Availability of a resource
  ○ up, down, or unknown
  ○ can compute interesting stats based on values
- String
  ○ just that
  ○ possible uses: logs, events, config
Supported Metrics
11
Server status
(metric3, UP, 1493301898253)
(metric3, DOWN, 1493301898254)
(metric3, UP, 1493301898255)
Value of configuration key ‘k’
(metric4, “k=v”, 1493301898256)
(metric4, “k=t”, 1493301898257)
(metric4, “k=1”, 1493301898258)
(metric4, “k=4”, 1493301898259)
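Availability points like the server status series above lend themselves to simple statistics such as an uptime ratio. The sketch below weights each state by how long it lasted until the next data point. The function name `uptime_ratio` and the decision to ignore the time after the last point are assumptions for illustration, not the statistics Hawkular Metrics itself computes.

```python
def uptime_ratio(points):
    """Fraction of the observed time span during which the state was 'UP'.

    points: iterable of (metric id, state, timestamp in ms).
    Returns None when fewer than two distinct timestamps are available.
    """
    ordered = sorted(points, key=lambda p: p[2])
    up_ms = 0
    total_ms = 0
    for (_, state, ts), (_, _, next_ts) in zip(ordered, ordered[1:]):
        duration = next_ts - ts  # how long this state lasted
        total_ms += duration
        if state == "UP":
            up_ms += duration
    return up_ms / total_ms if total_ms else None


status = [
    ("metric3", "UP", 1493301898253),
    ("metric3", "DOWN", 1493301898254),
    ("metric3", "UP", 1493301898255),
]
print(uptime_ratio(status))
```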
12
Management & Support
- Highly available, fault tolerant
- No specialized node roles
- Minimal configuration
Performance & Scalability
- Optimized for writes
- Data compression
- Indexing
Cassandra - Storage
13
- CQL based
- Partitioning & indexing of data based on usage
- Use built-in compression & TTL
- Use the DataStax driver, fully async
- Support for latest C* 3.0.x release
- Keep updating to latest stable
- Use multiple tables for indexing
Cassandra - Storage
14
- REST API with JSON
- JAX-RS 2.0 (async spec)
- Fully async = JAX-RS 2.0 async + RxJava + async C* driver
- Stateless** server (Metrics, mostly)
- Minimal clustering via Infinispan
- Schema Management
- Easy to use
  ○ packaged distribution with WildFly
  ○ download and run, only JDK required
App Layer
15
C* - 4 CPU, 4GB; Hawkular - 4 CPU, 4GB
  Message size (datapoints)    Requests/sec    Datapoints/sec
  10                           2592            25920
  100                          365             36500
  5000                         7.6             38000

C* - 8 CPU, 8GB; Hawkular - 8 CPU, 4GB
  Message size (datapoints)    Requests/sec    Datapoints/sec
  10                           4655            46550
  100                          604             60400
  5000                         15              75000
Performance - Sample
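The data point rates in the sample follow from multiplying requests per second by the number of data points in each request. A short check of that arithmetic, using only the figures from the sample (the variable and function names are illustrative):

```python
# (data points per request, requests/sec), taken from the sample above
small_setup = [(10, 2592), (100, 365), (5000, 7.6)]  # C* 4 CPU/4GB, Hawkular 4 CPU/4GB
large_setup = [(10, 4655), (100, 604), (5000, 15)]   # C* 8 CPU/8GB, Hawkular 8 CPU/4GB


def datapoints_per_sec(rows):
    """Throughput in data points/sec for each message size."""
    return [round(size * req_per_sec) for size, req_per_sec in rows]


print(datapoints_per_sec(small_setup))
print(datapoints_per_sec(large_setup))
```

In both setups the request rate drops as messages grow, but the overall data point throughput rises, which suggests that batching more data points per request is the more efficient way to write.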
16
- Multi-tenant
  ○ tenant id required on each request (HAWKULAR-TENANT header)
  ○ no way to get data from multiple tenants at once
- Can insert data without pre-creating metrics
- Data is compressed using Gorilla compression
  ○ 2 hour time window
  ○ further reduces disk footprint
  ○ LZ4 enabled in Cassandra
  ○ Load testing:
    ■ 5000 data points/sec for 5 days = 26GB
    ■ 83M data points ~ 1GB of disk space
Features
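The two load testing figures above describe the same measurement, which a few lines of arithmetic confirm. The bytes-per-point estimate assumes 1 GB = 10^9 bytes, which the slides do not state.

```python
points_per_sec = 5000
days = 5
disk_gb = 26

# Total data points written during the load test.
total_points = points_per_sec * 60 * 60 * 24 * days

# Data points stored per GB of disk.
points_per_gb = total_points / disk_gb

# Assumption: 1 GB = 10**9 bytes.
bytes_per_point = disk_gb * 10**9 / total_points

print(total_points)
print(round(points_per_gb / 1e6), "million data points per GB")
print(round(bytes_per_point, 1), "bytes per data point")
```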
17
- Bulk insertion endpoint for metrics and data
- Tagging support for metrics and single data points
  ○ key, value; multi-tag support
    ■ tag1 = d
  ○ metrics queryable via TQL (tag query language)
    ■ AND, OR, NOT
    ■ grouping
    ■ wildcard matching
    ■ a1 = 'd' OR ( a1 != 'ab' AND c1 )
Features
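To make the sample tag query concrete, here is a hand-written Python predicate for the expression `a1 = 'd' OR ( a1 != 'ab' AND c1 )`, applied to a few hypothetical metrics. This is not a TQL parser, and two readings of the syntax are assumptions: that `a1 != 'ab'` requires the tag `a1` to exist, and that a bare tag name such as `c1` means "the tag is present". The metric names and tag values are made up.

```python
def matches(tags: dict) -> bool:
    """Hand-written equivalent of: a1 = 'd' OR ( a1 != 'ab' AND c1 )"""
    a1 = tags.get("a1")
    a1_is_d = a1 == "d"
    # Assumption: "a1 != 'ab'" only matches when the tag a1 exists.
    a1_not_ab = a1 is not None and a1 != "ab"
    # Assumption: a bare tag name means "this tag is present".
    has_c1 = "c1" in tags
    return a1_is_d or (a1_not_ab and has_c1)


# Hypothetical metrics with their tags.
metrics = {
    "cpu0": {"a1": "d"},
    "cpu1": {"a1": "ab", "c1": "x"},
    "cpu2": {"a1": "zz", "c1": "x"},
    "cpu3": {"a1": "zz"},
}
selected = [name for name, tags in metrics.items() if matches(tags)]
print(selected)
```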
18
- Endpoint for each metric type
  ○ /gauges, /availability, /counters, /strings
  ○ Each metric type has almost identical endpoints
- Raw data - /gauges/raw
- Raw data for single metric - /strings/{metric_id}/raw
- Query time aggregation
  ○ multiple metrics - /availability/stats
  ○ single metric - /counters/{metric_id}/stats
- Bulk operations - /metrics
** String metrics do not have stats (yet?)
Features - Simple REST API
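Because the endpoints are almost identical across metric types, a client can build them from a small lookup table. The sketch below assembles the paths listed above together with the tenant header. The base path `/hawkular/metrics`, the helper names, and the header spelling `Hawkular-Tenant` (HTTP header names are case-insensitive) are assumptions for illustration. No request is sent.

```python
from urllib.parse import quote

BASE_PATH = "/hawkular/metrics"  # assumption: base path is not given in the slides

# Endpoint prefix per metric type, as listed above.
PREFIXES = {
    "gauge": "gauges",
    "counter": "counters",
    "availability": "availability",
    "string": "strings",
}


def endpoint(metric_type, suffix, metric_id=None):
    """Build '<base>/<prefix>[/<metric id>]/<suffix>' where suffix is 'raw' or 'stats'."""
    if metric_type == "string" and suffix == "stats":
        raise ValueError("string metrics do not have stats")
    parts = [BASE_PATH, PREFIXES[metric_type]]
    if metric_id is not None:
        parts.append(quote(metric_id, safe=""))  # escape the id for use in a path
    parts.append(suffix)
    return "/".join(parts)


def tenant_headers(tenant_id):
    """Headers every request needs; the tenant id is mandatory."""
    return {"Hawkular-Tenant": tenant_id}


print(endpoint("gauge", "raw"))
print(endpoint("string", "raw", "my metric"))
print(endpoint("counter", "stats", "cpu0"))
print(tenant_headers("test"))
```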
19
- Query Time Aggregation
  ○ Combine multiple metrics and get statistical data
  ○ Gauge and counter: average, median, percentile, sum
  ○ Availability: ratios for uptime and downtime, downtime duration
  ○ Time Slicing: first group data, then compute stats
  ○ Single or multiple metrics
- Rate
  ○ available for gauges and counters
  ○ rate of change of the values for the timespan
  ○ ex: how fast is the number of total requests increasing
Features - Aggregation & Rate
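Time slicing, as described above, means grouping data points into fixed time buckets first and computing statistics per bucket afterwards. The following is a minimal client-side sketch of that idea, not the server's implementation. The function name `bucket_stats`, its parameters, and the sample values are hypothetical, and percentiles are left out for brevity.

```python
import statistics
from collections import defaultdict


def bucket_stats(points, start_ms, bucket_ms):
    """Group (value, timestamp in ms) pairs into fixed buckets, then compute stats.

    Points from several metrics can be pooled into one list to aggregate
    across metrics. Points before start_ms are ignored.
    """
    buckets = defaultdict(list)
    for value, ts in points:
        if ts < start_ms:
            continue
        buckets[(ts - start_ms) // bucket_ms].append(value)

    result = []
    for index in sorted(buckets):
        values = buckets[index]
        result.append({
            "start": start_ms + index * bucket_ms,
            "avg": statistics.mean(values),
            "median": statistics.median(values),
            "sum": sum(values),
            "samples": len(values),
        })
    return result


# Hypothetical gauge values with timestamps relative to a start of 0 ms.
points = [(4.5, 0), (5.5, 400), (1.0, 1000), (3.0, 1500)]
for bucket in bucket_stats(points, start_ms=0, bucket_ms=1000):
    print(bucket)
```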
20
- Natural fit: collect data and then alert on anomalies
- Two ways to alert on metric data
  ○ Dedicated API for setting up alerts; incoming data is filtered and processed by the alerting engine
  ○ Metrics Alerter that queries single or multiple metrics; no need to define alert triggers ahead of time
Metrics + Alerting
21
- Single and group Triggers
- Template triggers
- Complex conditions
- Dampening
- Auto-resolve/auto-disable triggers
- Pluggable notifiers
Alerting Features
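Dampening, listed among the features above, limits how often a trigger fires when a condition flaps. As a generic illustration of the idea (not Hawkular Alerting's implementation or API), the sketch below fires only once a condition has held for a number of consecutive evaluations. The class name `StrictDampening`, its parameter, and the sample CPU values are hypothetical.

```python
class StrictDampening:
    """Fire once when a condition has been true N evaluations in a row."""

    def __init__(self, required_consecutive):
        self.required = required_consecutive
        self.streak = 0

    def evaluate(self, condition_met):
        """Return True exactly when the streak reaches the required length."""
        if not condition_met:
            self.streak = 0  # any false evaluation restarts the count
            return False
        self.streak += 1
        return self.streak == self.required


# Hypothetical CPU usage samples; the condition is "usage above 90%".
cpu_samples = [95, 97, 40, 91, 92, 93, 94]
dampening = StrictDampening(required_consecutive=3)
fired = [dampening.evaluate(value > 90) for value in cpu_samples]
print(fired)
```

The first two high samples do not fire because the third sample breaks the streak. The alert fires only on the third consecutive high sample and stays quiet afterwards until the condition clears and builds up again.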
22
23
- Automatic & persisted aggregation
- Management capabilities for the Cassandra cluster
- Query language
- Performance improvements
  ○ already have a good baseline, but can do better
  ○ read/write
Roadmap - 2017
24
Demo
- Install ccm
○ https://github.com/pcmanus/ccm
- Start a single node C* cluster
○ ccm create -v 3.0.12 -n 1 -s hawkular
- Download, extract and start Hawkular Metrics
  ○ https://origin-repository.jboss.org/nexus/content/groups/public/org/hawkular/metrics/hawkular-metrics-wildfly-standalone/0.26.1.Final/
  ○ bin/standalone -b 0.0.0.0
- Download, extract and start Grafana
- Download, install, and configure the Hawkular plugin for Grafana
  ○ https://grafana.com/plugins/hawkular-datasource/installation
  ○ https://github.com/hawkular/hawkular-grafana-datasource
    i. pick a tenant id of your choice
Demo
26
- Install the Hawkular Metrics python client via pip
○ pip install hawkular-client
- Install psutil to collect CPU stats
○ pip install psutil
- Create a custom agent (using the python client)
  ○ make sure you use the same tenant id configured with Grafana
  ○ pre-create and tag a metric for each CPU
  ○ collect CPU usage every 10 seconds
  ○ send the data to Hawkular Metrics
Demo
27
Demo
28
#!/usr/bin/env python3
import time

import psutil
from hawkular.metrics import HawkularMetricsClient, MetricType

# Use the same tenant id that is configured in Grafana
client = HawkularMetricsClient(tenant_id='test')

# Pre-create and tag one gauge metric per CPU
cpu_percent = psutil.cpu_percent(interval=1, percpu=True)
for index, cpu in enumerate(cpu_percent):
    client.create_metric_definition(MetricType.Gauge, 'cpu%s' % index, cpu='cpu%s' % index)

# Collect per-CPU usage and send it to Hawkular Metrics about every 10 seconds
while True:
    cpu_percent = psutil.cpu_percent(interval=1, percpu=True)
    for index, cpu in enumerate(cpu_percent):
        client.push(MetricType.Gauge, 'cpu%s' % index, float(cpu))
    time.sleep(10)
- Web - http://www.hawkular.org/
- Github - https://github.com/hawkular
- Metrics Documentation - http://www.hawkular.org/tags/metrics.html
- Alerting Documentation - http://www.hawkular.org/tags/alerts.html
- Twitter - https://twitter.com/hawkular_org
Resources
29