Monitoring Swift
OpenStack Summit, Austin 2016
Adam Takvam, Sr. Systems Engineer
Martin Lanner, Engagement Manager
April 28, 2016
SwiftStack Confidential
Overview
Problems
Swift key monitoring concepts
- Usage intelligence
- What to …
Prometheus + Grafana
Properties of Swift
Anatomy of a Monitoring Solution
… pushed or advertises them
… agents and provides an API with access to aggregated metric values
… format for easy comprehension of system state
… alerts when metrics fall out of an acceptable range
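The last component, alerting on values outside an acceptable range, can be sketched in a few lines. This is a minimal illustration, not any particular product's implementation: the metric names, the `Range` type and the thresholds are hypothetical, and the input dict stands in for whatever the aggregator's API returns.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Range:
    """Acceptable range for a metric; None means unbounded on that side."""
    low: Optional[float] = None
    high: Optional[float] = None


def out_of_range(value: float, rng: Range) -> bool:
    """Return True when the value falls outside the acceptable range."""
    if rng.low is not None and value < rng.low:
        return True
    if rng.high is not None and value > rng.high:
        return True
    return False


def evaluate(metrics: dict[str, float], ranges: dict[str, Range]) -> list[str]:
    """Return one alert message per metric that is out of its range.

    Metrics without a configured range are ignored.
    """
    alerts = []
    for name, value in sorted(metrics.items()):
        rng = ranges.get(name)
        if rng is not None and out_of_range(value, rng):
            alerts.append(f"ALERT {name}={value} outside [{rng.low}, {rng.high}]")
    return alerts


if __name__ == "__main__":
    # Hypothetical aggregated values and limits, for illustration only.
    metrics = {"disk_used_pct": 91.5, "proxy_5xx_per_min": 0.0}
    ranges = {"disk_used_pct": Range(high=85.0), "proxy_5xx_per_min": Range(high=5.0)}
    for line in evaluate(metrics, ranges):
        print(line)
```

Keeping the range check separate from the evaluation loop makes it easy to swap in a different notification path (email, pager, chat) without touching the comparison logic.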
Forms of Monitoring
… I/O, network, auditing cycles, replicator timing
Monitoring Lifecycle
Developing a Monitoring Strategy
Examples of monitoring methods
Key concepts for monitoring Swift
/healthcheck endpoint
Load balancer health checks against Swift proxy servers
Example:
demo@demo:~$ curl http://swift.swiftstack.oss/healthcheck
OK
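A load balancer's health check against /healthcheck boils down to the same request as the curl example, plus a decision about what counts as healthy. The sketch below assumes that only HTTP 200 with the body "OK" is healthy and that any error, non-200 status or timeout removes the proxy from the pool. The function names and the 2-second timeout are hypothetical choices, not part of Swift or of any specific load balancer.

```python
import urllib.request


def is_healthy(status: int, body: bytes) -> bool:
    """Treat a proxy as healthy only on HTTP 200 with the body 'OK'."""
    return status == 200 and body.strip() == b"OK"


def check(url: str, timeout: float = 2.0) -> bool:
    """Probe one /healthcheck URL; any error or timeout counts as unhealthy.

    urllib raises HTTPError (a URLError, itself an OSError) for non-2xx
    responses, and socket timeouts are also OSErrors, so one except clause
    covers connection failures, timeouts and error statuses.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_healthy(resp.status, resp.read())
    except OSError:
        return False


def healthy_pool(proxies: list[str]) -> list[str]:
    """Return the proxy base URLs that currently pass the health check."""
    return [p for p in proxies if check(p.rstrip("/") + "/healthcheck")]
```

For example, `healthy_pool(["http://swift.swiftstack.oss"])` would probe the demo host from the slide; a real load balancer repeats this on a fixed interval and usually requires several consecutive failures before pulling a node.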
Audit trails with ELK
Object size distribution
Distribution of CRUD operations over time
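Both charts are aggregations over request records from the audit trail. The sketch below assumes the proxy log lines have already been parsed (for example by the ELK pipeline) into records with a timestamp, an HTTP method and a byte count; the `Request` fields are hypothetical and do not reproduce Swift's actual log schema. The method-to-CRUD mapping (PUT as create, GET/HEAD as read, POST as update, DELETE as delete) is also an assumption made for illustration.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Request(NamedTuple):
    """One parsed request record (hypothetical fields, not Swift's log format)."""
    timestamp: int  # Unix time in seconds
    method: str     # HTTP verb, e.g. "PUT"
    size: int       # object bytes transferred


# Assumed mapping of HTTP verbs to CRUD operations; other verbs are ignored.
CRUD = {"PUT": "create", "GET": "read", "HEAD": "read", "POST": "update", "DELETE": "delete"}


def size_bucket(size: int) -> int:
    """Smallest power of two >= size (0 for empty objects)."""
    return 0 if size <= 0 else 1 << (size - 1).bit_length()


def size_distribution(reqs: Iterable[Request]) -> Counter:
    """Count uploads (PUTs) per power-of-two size bucket."""
    return Counter(size_bucket(r.size) for r in reqs if r.method == "PUT")


def crud_over_time(reqs: Iterable[Request], bucket_s: int = 3600) -> Counter:
    """Count CRUD operations per (bucket start time, operation)."""
    counts: Counter = Counter()
    for r in reqs:
        op = CRUD.get(r.method)
        if op is not None:
            counts[(r.timestamp - r.timestamp % bucket_s, op)] += 1
    return counts
```

Power-of-two buckets keep the histogram readable when object sizes span many orders of magnitude, which is typical for object storage workloads.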
Zabbix triggers for Swift
Zabbix node memory usage
Zabbix drive utilization events
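A drive utilization trigger is driven by a per-drive percent-used metric. As a rough sketch of how an agent could collect it, the code below assumes Swift drives are mounted under /srv/node/* (the same mount pattern as the alert rule later in this deck) and uses POSIX `os.statvfs`, so it is Unix-only. The 80% limit is a hypothetical threshold, not one taken from the Zabbix setup shown.

```python
import glob
import os


def used_pct(total_blocks: int, free_blocks: int) -> float:
    """Percent of a filesystem in use; 0.0 if the size is unknown."""
    if total_blocks <= 0:
        return 0.0
    return 100.0 * (total_blocks - free_blocks) / total_blocks


def drive_usage(pattern: str = "/srv/node/*") -> dict[str, float]:
    """Map each drive mount point matching the pattern to its percent used."""
    usage = {}
    for path in sorted(glob.glob(pattern)):
        st = os.statvfs(path)  # POSIX only
        usage[path] = used_pct(st.f_blocks, st.f_bfree)
    return usage


def over_threshold(usage: dict[str, float], limit: float = 80.0) -> list[str]:
    """Return the drives whose utilization exceeds the limit (percent)."""
    return [path for path, pct in usage.items() if pct > limit]
```

Reporting each drive separately, rather than only a node total, matters for Swift because a single full drive can cause write failures on its partitions even when the node as a whole has plenty of space.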
Disk I/O
Object Replicator Operations
Prometheus + Grafana trending and forecasting
Alerting
Example:
ALERT StorageCritical24Hours
  IF sum(predict_linear(node_filesystem_free{job="swiftstack", mountpoint=~"/srv/node/.*"}[1d], 24*3600))
    < sum(node_filesystem_size{job="swiftstack", mountpoint=~"/srv/node/.*"}) * 0.2
  FOR 1h
  LABELS { group="storage_admin", severity="critical" }
Translation: Send a critical alert to the storage_admin group if total free capacity on the /srv/node/* drives, extrapolated linearly from the last day of data, is projected to fall below 20% of total capacity within the next 24 hours, and that forecast has held for at least 1 hour. The rule is re-evaluated every 5 minutes (set by the server's evaluation interval, not shown).
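The forecast in this rule comes from predict_linear, which fits a least-squares line to each series' samples over the range window and extrapolates it the given number of seconds past evaluation time. The sketch below reproduces that arithmetic for illustration only; the function names are hypothetical, the sample lists stand in for the node_filesystem_free series, and the FOR 1h hold period is not modeled.

```python
def predict_linear(samples: list[tuple[float, float]], now: float, horizon_s: float) -> float:
    """Least-squares fit of (time, value) samples, evaluated at now + horizon_s."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples need distinct timestamps")
    slope = cov / var
    # The fitted line passes through (mean_t, mean_v).
    return mean_v + slope * (now + horizon_s - mean_t)


def storage_critical(
    free_series: dict[str, list[tuple[float, float]]],
    total_size: float,
    now: float,
    horizon_s: float = 24 * 3600,
    fraction: float = 0.2,
) -> bool:
    """Mirror the rule's condition: summed forecast free space < fraction of total size."""
    predicted = sum(predict_linear(s, now, horizon_s) for s in free_series.values())
    return predicted < total_size * fraction
```

As in the PromQL expression, the forecast is computed per drive and then summed, so a single fast-filling drive can pull the cluster-wide projection below the threshold.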
Thank you!