Demonstrating At Scale Monitoring Of OpenStack Cloud Using Prometheus


SLIDE 1

Open Infrastructure Summit 2019

Demonstrating At Scale Monitoring Of OpenStack Cloud Using Prometheus

Anandeep Pannu (apannu@redhat.com), Pradeep Kilambi (prad@redhat.com)


SLIDE 2

SLIDE 3

Definitions

SLIDE 4

SLIDE 5

SLIDE 6

SLIDE 7

Implications for Open Infrastructure

SLIDE 8

SLIDE 9

SLIDE 10

Critical Monitoring Features

SLIDE 11
  • Portability across different footprints
  • HA, scaling, and persistence available for free
  • Re-use of platform capabilities, e.g. Prometheus
  • Users integrate for the capabilities they want
  • Stringent SLAs can be met
  • Plug in different OSS components with the same API
  • For each API, the SLAs achieved can be optimized
    ○ e.g. fault management uses the message bus directly
  • Metrics metadata and declarative metrics for every component, so metrics can be incorporated automatically
  • Data sensing, collection, and processing
    ○ Either some or all processed at the Edge
  • Centralized access to reports and alerts
  • Integration with analytics
SLIDE 12

Service Assurance Framework Architecture

SLIDE 13

Architecture Overview

On-site infrastructure platform

SLIDE 14

SLIDE 15

[Architecture diagram: application components (VMs, containers) and all infrastructure nodes (Controller, Compute, Ceph, RHEV, OpenShift) emit metrics and events collected from kernel, network, CPU, memory, hardware, syslog, /proc, and per-process sources. A dispatch-routing message distribution bus (AMQP 1.0) carries them to a Prometheus Operator-managed MGMT cluster, which exposes APIs for 3rd-party integrations and Prometheus-based K8S monitoring.]

SLIDE 16
  • Collectd container -- host / VM metrics collection framework
    ○ Collectd 5.8 with additional OPNFV Barometer-specific plugins not yet in the collectd project:
      • Intel RDT, Intel PMU, IPMI
      • AMQP 1.0 client plugin
      • Procevent -- process state changes
      • Sysevent -- match syslog for critical errors
      • Connectivity -- fast detection of interface link status changes
    ○ Integrated as part of TripleO (OSP Director)
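Because the collectd profile is driven by TripleO, extra plugins can be switched on from a heat environment file. A minimal sketch, assuming the CollectdExtraPlugins list parameter exposed by tripleo-heat-templates; the plugin names below mirror the Barometer plugins listed above and are purely illustrative:

parameter_defaults:
  CollectdExtraPlugins:
    - procevent      # process state change events
    - sysevent       # match syslog entries for critical errors
    - connectivity   # fast interface link-status detection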

SLIDE 17

Collectd write plugins: write_syslog, write_kafka, write_prometheus, amqp_09, amqp1

SLIDE 18

AMQ 7 Interconnect - Native AMQP 1.0 Message Router

  • Large-scale message networks
    ○ Offers shortest-path (least-cost) message routing
    ○ Used without a broker
    ○ High availability through redundant path topology and re-routing (not clustering)
    ○ Automatic recovery from network partitioning failures
    ○ Reliable delivery without requiring storage
  • QDR router functionality
    ○ Apache Qpid Dispatch Router (QDR)
    ○ Dynamically learns addresses of messaging endpoints
    ○ Stateless - no message queuing, end-to-end transfer

[Diagram: clients and Servers A, B, and C connected through a mesh of routers]

High throughput, low latency; low operational costs
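The redundant-path topology above translates directly into deployment configuration: each node-local edge router can be pointed at more than one upstream router, so traffic re-routes if a connection drops. A minimal sketch reusing the MetricsQdrConnectors parameter shown later in this deck; the hostnames are hypothetical:

parameter_defaults:
  MetricsQdrConnectors:
    # Two upstream routers give each edge QDR a redundant path;
    # if one link fails, messages re-route over the other.
    - host: qdr-a.example.com
      port: 443
      role: edge
      sslProfile: tlsProfile
    - host: qdr-b.example.com
      port: 443
      role: edge
      sslProfile: tlsProfile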

SLIDE 19
  • Prometheus Operator
SLIDE 20

SLIDE 21

Evolution

SLIDE 22

[Diagram: evolution to distributed monitoring. A central site runs three controllers, Ceph nodes, and compute nodes on its own AMQP and OS networks, plus a Prometheus Operator++ cluster (Prometheus, Grafana) fed by QDR / Smart Gateway (SG) pairs. Remote sites (Site 1, Site 2, ... Site 10), each with the same controller / Ceph / compute layout on their own AMQP and OS networks, reach the central site over a Layer 3 network.]

SLIDE 23

DCN Use Case

[Diagram: DCN deployment stack, L3 routed. The primary site hosts the undercloud with container registry, the controller nodes, AZ0 compute nodes (local ephemeral), and an optional Ceph Cluster 0. DCN Sites 1 through n each host AZ1 ... AZn compute nodes (local ephemeral), connected to the primary site over L3 routed networks.]

SLIDE 24

SLIDE 25

Configuration & Deployment

SLIDE 26
  • Collectd and QDR profiles are integrated as part of TripleO
  • Collectd and QDRs run as containers on the OpenStack nodes
  • Configured via a heat environment file
  • Each node runs a Qpid Dispatch Router alongside the collectd agent
  • Collectd is configured to talk to the Qpid Dispatch Router and send metrics and events
  • Relevant collectd plugins can be configured via the heat template file

TripleO Integration of Client-Side Components

SLIDE 27

## This environment template enables the Service Assurance client-side bits
resource_registry:
  OS::TripleO::Services::MetricsQdr: ../docker/services/metrics/qdr.yaml
  OS::TripleO::Services::Collectd: ../docker/services/metrics/collectd.yaml

parameter_defaults:
  CollectdConnectionType: amqp1
  CollectdAmqpInstances:
    notify:
      notify: true
      format: JSON
      presettle: true
    telemetry:
      format: JSON
      presettle: false

TripleO Client-Side Configuration

environments/metrics-collectd-qdr.yaml

SLIDE 28

cat > params.yaml <<EOF
parameter_defaults:
  CollectdConnectionType: amqp1
  CollectdAmqpInstances:
    telemetry:
      format: JSON
      presettle: true
  MetricsQdrConnectors:
    - host: qdr-white-normal-sa-telemetry.apps.dev7.nfvpe.site
      port: 443
      role: edge
      sslProfile: tlsProfile
      verifyHostname: false
EOF

TripleO Client-Side Configuration

params.yaml

SLIDE 29

cd ~/tripleo-heat-templates
git checkout master
cd ~
cp overcloud-deploy.sh overcloud-deploy-overcloud.sh
sed -i 's/usr\/share\/openstack-/home\/stack\//g' overcloud-deploy-overcloud.sh
./overcloud-deploy-overcloud.sh \
  -e /usr/share/openstack-tripleo-heat-templates/environments/metrics-collectd-qdr.yaml \
  -e /home/stack/params.yaml

Client-Side Deployment

Using overcloud deploy with the collectd & QDR configuration and environment templates

SLIDE 30

SLIDE 31

There are three core components to the telemetry framework:

  • Prometheus (and the Alertmanager)
  • Smart Gateway
  • Qpid Dispatch Router

Each of these components has a corresponding Operator that we'll use to spin up the various application components and objects.
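As an illustration of that pattern, a Prometheus instance is declared as a custom resource and the Prometheus Operator reconciles it into running pods. A minimal sketch using the upstream monitoring.coreos.com/v1 API; the names, namespace, and label selector below are hypothetical, not taken from the telemetry-framework manifests:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: sa-telemetry               # hypothetical name
  namespace: sa-telemetry          # hypothetical namespace
spec:
  replicas: 2                      # HA pair managed by the Operator
  serviceMonitorSelector:          # scrape targets discovered via ServiceMonitors
    matchLabels:
      app: smart-gateway           # hypothetical label on the Smart Gateway service
  alerting:
    alertmanagers:
      - namespace: sa-telemetry
        name: alertmanager-operated
        port: web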

SLIDE 32

To deploy the telemetry framework from the script, simply run the following commands after cloning the telemetry-framework repo[1] into the directory shown:

cd ~/src/github.com/redhat-service-assurance/telemetry-framework/deploy/
./deploy.sh CREATE

[1] https://github.com/redhat-service-assurance/telemetry-framework

SLIDE 33

Deploying Service Assurance Framework: From Operator to Application

[Diagram: Operators reconcile Custom Resources into the running Service Assurance Framework components]

SLIDE 34

SLIDE 35

Demo

SLIDE 36

Warning CPU usage alert:

avg_over_time(sa_collectd_cpu_percent{type=~"system|user"}[1m]) > 75
  and
avg_over_time(sa_collectd_cpu_percent{type=~"system|user"}[1m]) < 90

Critical CPU usage alert:

avg_over_time(sa_collectd_cpu_percent{type=~"system|user"}[1m]) > 90
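Packaged as a Prometheus rule file, these expressions might look as follows. A minimal sketch in the standard alerting-rule format; the alert names, for: durations, and severity labels are illustrative, not from the demo:

groups:
  - name: cpu-usage
    rules:
      - alert: WarningCpuUsage            # illustrative name
        expr: >
          avg_over_time(sa_collectd_cpu_percent{type=~"system|user"}[1m]) > 75
          and
          avg_over_time(sa_collectd_cpu_percent{type=~"system|user"}[1m]) < 90
        for: 1m                           # illustrative hold period
        labels:
          severity: warning
      - alert: CriticalCpuUsage           # illustrative name
        expr: avg_over_time(sa_collectd_cpu_percent{type=~"system|user"}[1m]) > 90
        for: 1m
        labels:
          severity: critical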

SLIDE 37

Architecture Demo

Service Assurance Framework

SLIDE 38
  • https://telemetry-framework.readthedocs.io/en/master/
  • https://quay.io/repository/redhat-service-assurance/smart-gateway-operator?tab=info
  • https://github.com/redhat-service-assurance
SLIDE 39

SLIDE 40

SLIDE 41

SLIDE 42

SLIDE 43

SLIDE 44

SLIDE 45

SLIDE 46
[Diagram: the Prometheus pull model. The Prometheus server scrapes the /metrics endpoint of each target over HTTP; collected data is queried via PromQL over HTTP and visualized.]
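The scrape side of the diagram corresponds to a plain Prometheus configuration. A minimal sketch of a prometheus.yml; the job name and target addresses are hypothetical, with 9103 being the default port of collectd's write_prometheus plugin mentioned earlier:

global:
  scrape_interval: 15s               # how often each target's /metrics is pulled

scrape_configs:
  - job_name: openstack-nodes        # hypothetical job name
    metrics_path: /metrics           # default path exposed by targets
    static_configs:
      - targets:
          - compute-0.example.com:9103   # hypothetical collectd write_prometheus endpoints
          - compute-1.example.com:9103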

SLIDE 47