Alarming Alarming Julien Danjou Nick Barcet Eoghan Glynn - - PowerPoint PPT Presentation



SLIDE 1

Ceilometer Heat

Alarming

Julien Danjou
jd__@Freenode // @juldanjou
julien@danjou.info

Nick Barcet
nijaba@Freenode // @nijaba
nick@enovance.com

Eoghan Glynn
eglynn@Freenode
eglynn@redhat.com

SLIDE 2

Speakers

  • Nick Barcet co-founded the Ceilometer project at the Folsom summit and led the project through incubation
  • Julien Danjou has been a core Ceilometer contributor from the outset, taking over the PTL reins for Havana
  • Eoghan Glynn drove the addition of the Alarming feature to Ceilometer over the Havana cycle

SLIDE 3

Two seemingly disjoint projects intersect

  • Heat is a template-driven orchestration engine

○ automates complex deployments via declarative configuration

  • Ceilometer is a metering infrastructure

○ collects data measuring resource usage and performance

  • Appear on the surface to have minimal commonality ...
SLIDE 4

Ceilometer Workflow

  • Collect from OpenStack components
  • Transform metering data if necessary
  • Publish meters to any destination (including Ceilometer itself)
  • Store received meters
  • Aggregate samples via a REST API

Collect → Transform → Publish → Store → Aggregate
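The five stages above can be sketched as a toy in-memory pipeline. This is only an illustration of the data flow, not Ceilometer's implementation: the `Sample` fields, the `cpu_frac` meter, the `MeterStore` class and the function names are all hypothetical, and the aggregate step is a plain method call rather than a REST API.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Sample:
    resource_id: str
    meter: str
    volume: float


def collect():
    # Collect: hypothetical raw readings, expressed as a 0..1 CPU fraction.
    return [
        Sample("instance-1", "cpu_frac", 0.5),
        Sample("instance-1", "cpu_frac", 0.75),
        Sample("instance-2", "cpu_frac", 0.25),
    ]


def transform(samples):
    # Transform: derive a percentage meter from the raw fraction.
    return [Sample(s.resource_id, "cpu_util", s.volume * 100.0) for s in samples]


class MeterStore:
    """Store: keeps received samples grouped by meter name."""

    def __init__(self):
        self._by_meter = defaultdict(list)

    def record(self, sample):
        self._by_meter[sample.meter].append(sample)

    def average(self, meter):
        # Aggregate: a stand-in for a statistics query over stored samples.
        return mean(s.volume for s in self._by_meter[meter])


def publish(samples, destinations):
    # Publish: hand every sample to every destination callable.
    for sample in samples:
        for destination in destinations:
            destination(sample)


store = MeterStore()
publish(transform(collect()), [store.record])
average_cpu = store.average("cpu_util")
```

Modelling each destination as a callable keeps the publish step independent of where the samples end up, which mirrors the "any destination" wording of the slide.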

SLIDE 5

Heat Workflow

my_stack.template:

{
  "AWSTemplateFormatVersion" : "2010-09-09",
  "Parameters" : { "VolumeSize" : { … } },
  "Mappings" : { "Flavor2Arch" : { "tiny" : { "Arch" : "64" }, ... } },
  "Resources" : {
    "MyInstance" : {
      "Type" : "AWS::EC2::Instance",
      "Properties" : { "Volumes" : […] }
    }
  },
  "Outputs" : { "DNS" : { "Value" : { … } } }
}

SLIDE 6

Heat Workflow

[Diagram: my_stack.template → consumed by → Heat Engine]

SLIDE 7

Heat Workflow

[Diagram: my_stack.template → consumed by → Heat Engine → interacts with → ...]
SLIDE 8

Heat Workflow

[Diagram: my_stack.template → consumed by → Heat Engine → interacts with → ... → spins up my_stack (Instance, Volume)]

SLIDE 9

Heat Autoscaling v1.0

[Diagram: my_stack → reports load (push-stats) → CW-lite]

SLIDE 10

Heat Autoscaling v1.0

[Diagram: my_stack (three Instances) reports load; Heat Engine scales out stack]

SLIDE 11

Heat Autoscaling v1.0

[Diagram: my_stack (four Instances) reports load; Heat Engine scales out stack]

SLIDE 12

Ceilometer to the rescue!

  • compute agent already collects most relevant stats from outside the instance
  • API service exposes aggregation over the evaluation window
  • define new API exposing alarm lifecycle
  • provide new service to evaluate alarms against their defined rules
  • additional service driving asynchronous notifications when alarms fire
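The split between evaluating alarms and notifying asynchronously can be sketched as a state change that only enqueues work, with delivery happening later in a separate step. This is a minimal sketch under assumptions: the state names, the `Alarm` fields, the helper functions and the `example.invalid` URL are illustrative and are not taken from the slides.

```python
import queue
from dataclasses import dataclass, field

OK, ALARM, UNKNOWN = "ok", "alarm", "insufficient data"


@dataclass
class Alarm:
    name: str
    state: str = UNKNOWN
    # Hypothetical mapping of state -> action URLs to call on entering it.
    actions: dict = field(default_factory=dict)


def set_state(alarm, new_state, pending):
    """Record a state change; enqueue notifications only on a transition."""
    if new_state == alarm.state:
        return
    previous, alarm.state = alarm.state, new_state
    for url in alarm.actions.get(new_state, []):
        body = {"alarm": alarm.name, "previous": previous, "current": new_state}
        pending.put((url, body))


def drain(pending, deliver):
    """Notifier side: deliver everything queued so far, return the count."""
    delivered = 0
    while True:
        try:
            url, body = pending.get_nowait()
        except queue.Empty:
            return delivered
        deliver(url, body)
        delivered += 1


pending = queue.Queue()
sent = []
alarm = Alarm("cpu_high", actions={ALARM: ["http://example.invalid/scale-up"]})
set_state(alarm, ALARM, pending)
set_state(alarm, ALARM, pending)  # same state again: nothing is queued
delivered = drain(pending, lambda url, body: sent.append((url, body)))
```

Queuing on transitions rather than on every evaluation keeps a persistently breached alarm from firing its action on each cycle; whether the real service behaves this way is not stated on the slide.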

SLIDE 13

How it all hangs together

Added to template:

  • alarms bounding busy/idleness of instances
  • membership of autoscale group represented via user metadata
  • alarm actions refer to scale up/down policies
  • action URLs are pre-signed
  • policies define adjustment step size & cooldown period
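How a policy's adjustment step size and cooldown period interact can be sketched as follows. This is an illustration only, not Heat's implementation: the class names, the `min_size`/`max_size` bounds and the explicit `now` argument (seconds, passed in to keep the example deterministic) are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    adjustment: int   # step size: +1 scales up by one, -1 scales down by one
    cooldown: float   # seconds to wait after an adjustment before the next


class AutoScalingGroup:
    def __init__(self, size, min_size, max_size):
        self.size = size
        self.min_size = min_size
        self.max_size = max_size
        self._last_adjusted = None

    def apply(self, policy, now):
        """Apply the policy unless cooling down; return True if size changed."""
        if (self._last_adjusted is not None
                and now - self._last_adjusted < policy.cooldown):
            return False
        # Clamp the adjusted size to the group's bounds.
        new_size = max(self.min_size,
                       min(self.max_size, self.size + policy.adjustment))
        if new_size == self.size:
            return False
        self.size = new_size
        self._last_adjusted = now
        return True


group = AutoScalingGroup(size=1, min_size=1, max_size=3)
scale_up = ScalingPolicy(adjustment=1, cooldown=60)

results = [
    group.apply(scale_up, now=0),    # first alarm: scales to 2
    group.apply(scale_up, now=30),   # still inside the cooldown: ignored
    group.apply(scale_up, now=90),   # cooldown elapsed: scales to 3
    group.apply(scale_up, now=200),  # already at max_size: no change
]
```

The cooldown gives newly started instances time to absorb load before the same alarm can trigger another adjustment.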

SLIDE 14

How it all hangs together

"CPUAlarmHigh": { "Type": "OS::Metering::Alarm", "Properties": { "meter_name": "cpu_util", "description": "Scale-up if CPU > 50%", "evaluation_periods": "1", "period": "60", "statistic": "avg", "comparison_operator": "gt", "alarm_actions":[…"ScaleUpPolicy", "AlarmUrl"…], "matching_metadata": { "metadata.user_metadata.server_group": "WebServerGroup" }}}

SLIDE 15

How it all hangs together

[Diagram: Heat Engine injects user metadata into the Instance in my_stack]

SLIDE 16

How it all hangs together

[Diagram: Heat Engine injects user metadata into the Instance in my_stack and creates alarms via the Ceilometer API service]

SLIDE 17

How it all hangs together

[Diagram: Heat Engine injects user metadata into the Instance in my_stack and creates alarms via the Ceilometer API service; the Ceilometer Compute Agent monitors instances]

SLIDE 18

How it all hangs together

[Diagram: Heat Engine injects user metadata into the Instance in my_stack and creates alarms via the Ceilometer API service; the Ceilometer Compute Agent monitors instances; the Ceilometer Alarm evaluator triggers alarm]

SLIDE 19

How it all hangs together

[Diagram: Ceilometer (API, Compute, Alarms) alarming; Heat Engine injects user metadata and scales out my_stack (three Instances)]

SLIDE 20

How it all hangs together

[Diagram: Ceilometer (API, Compute, Alarms) alarming; Heat Engine injects user metadata and scales out my_stack (five Instances)]

SLIDE 21

How it all hangs together

[Diagram: Heat Engine and the my_stack Instance alongside the Ceilometer components (API service, Compute Agent, Alarm evaluator, Meter store); labelled flows: reports samples, provides alarm rules, queries stats]

SLIDE 22

Lessons learned

Keys to successful inter-project interactions:

  • buy-in from stakeholders on both sides
  • early validation and proof-points
  • protect consuming project from churn during the development cycle
  • split deliverables into bite-sized, separately consumable chunks

SLIDE 23

Future directions

  • wider metering coverage (RAM utilization)
  • constraints based on time-of-day/day-of-week
  • exclusion of low-quality datapoints
  • IPMI/SNMP-based monitoring of baremetal
  • Keystone trusts for credential delegation
SLIDE 24

Further questions?

  • Chat on Freenode:

○ #openstack-metering
○ #heat

  • Mail the dev list:

○ openstack-dev@lists.openstack.org

  • Harangue us via Launchpad:

○ https://launchpad.net/ceilometer/+filebug