Automation + Machine Learning = Hands Free NFV A Word On Automation - - PowerPoint PPT Presentation

01.11.2017 Automation + Machine Learning = Hands Free NFV A Word On Automation through ML for Openstack NFV PRAKASH RAMCHANDRAN MICHAEL TIEN JAYANTHI A GOKHALE Agenda NFV Automation Challenges? Practical Viewpoint from Dell Labs?


slide-1
SLIDE 1

Automation + Machine Learning = Hands Free NFV

A Word On “Automation through ML for OpenStack NFV”

PRAKASH RAMCHANDRAN MICHAEL TIEN JAYANTHI A GOKHALE

01.11.2017

slide-2
SLIDE 2

Agenda

NFV Automation Challenges?

Automation of NFV Service & NFV Infrastructure

What’s new in Standards?

ZSM: the new, evolving NFV zero-touch standards

What can ML bring to automation?

From manual to ML-driven intelligent automation

Practical Viewpoint from Dell Labs?

Redfish and Swordfish

What’s next?

The industry is moving towards E2E orchestration and management

slide-3
SLIDE 3

@OpenStack

NFV Automation Challenges

Dell

OpenStackFoundation

slide-4
SLIDE 4

NFV Adoption Challenge

  • Lack of end-to-end automation
  • Lack of interoperability between NFVI/VNF and VNF/VNF
  • Unpredictable datacenter planning
  • Lack of service awareness
  • APIs that are not easy to consume
  • Limited service-to-service programmability
  • Not zero-touch
  • Requires various resources to maintain (IT, DevOps, Operations)
  • Requires multiple POCs
slide-5
SLIDE 5

Today’s Datacenter Challenge

Automation - the key to unlocking future efficiencies

slide-6
SLIDE 6

What’s new in Standards

Dell

slide-7
SLIDE 7

ETSI ISG ZSM

slide-8
SLIDE 8

Zero Touch Networking Service and Management

  • M2M communications, provisioning, and management
  • Dynamic service chain data mapping
  • Dynamic policy enhancement and enforcement
  • Continuous data collection and analytics
  • Automatic reactive and proactive self-healing
  • Real-time datacenter capacity scheduler
  • Autonomous end-to-end orchestration lifecycle management
  • Intelligent service-state awareness optimization
  • Smart API

Zero touch NFV provides true next generation NFVaaS or VNFaaS

slide-9
SLIDE 9

What is Machine Learning?

Machine learning is a way for infrastructure or platforms to progressively learn from input data and to validate models of system behavior in order to attain desirable outcomes (e.g. overcoming FCAPS issues, in telco terms: fault, configuration, accounting, performance, and security management).

Why does automation need ML?

In our case the goal is FCAPS-based anomaly detection across systems, networking, and network functions. This can be done with supervised, unsupervised, or dynamic learning. The basic requirement is a closed-loop control mechanism.
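A minimal sketch of closed-loop anomaly detection, assuming a simple z-score test stands in for whatever model the deck's authors actually used. The `remediate` hook, the window size, and the threshold are hypothetical choices for illustration.

```python
from statistics import mean, stdev
from typing import Callable, List


def is_anomaly(history: List[float], sample: float, threshold: float = 3.0) -> bool:
    """Flag a sample whose z-score against recent history exceeds the threshold."""
    if len(history) < 2:
        return False  # not enough data to estimate the spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return sample != mu
    return abs(sample - mu) / sigma > threshold


def closed_loop(samples: List[float], remediate: Callable[[float], None],
                window: int = 20) -> List[float]:
    """Monitor, analyse, act: call remediate() for each anomalous sample."""
    history: List[float] = []
    flagged: List[float] = []
    for sample in samples:
        if is_anomaly(history[-window:], sample):
            flagged.append(sample)
            remediate(sample)       # hypothetical hook, e.g. restart a VNF
        else:
            history.append(sample)  # only normal samples update the baseline
    return flagged
```

Keeping anomalous samples out of the baseline is one possible design choice; it stops a single spike from widening the band that later samples are judged against.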

slide-10
SLIDE 10

Self healing within Layers (Local Policy) & ML for Cross-layer (Global Policy)

What’s new in Standards / Opensource for NFV Stack

Connected Vehicle Application

Service / Application OSS/BSS

TR 188 004 Open Policy Agent Network Slicing

NS NFVO/SDNO

ONAP, SDNC Container workload

VNF VNFM

EMS, VNF SDK Kata, CNI, NVMe

VM, VN, VS VIM

Containerized OS

slide-11
SLIDE 11

What can ML Bring to Automation

Gokhale Jayanthi

slide-12
SLIDE 12

Traditional Manual Deploy Cycle

The changing landscape of Infrastructure

  • Bare metal
  • Hypervisor
  • VM – booting, secure booting, booting from volume
  • Container – CRI & CNI
  • Lightweight VM – Kata (Intel)
  • VM in container – not yet named (Red Hat)
slide-13
SLIDE 13

Why does NFV automation need innovation?

  • NFV and SDN integrated clouds are growing from centralized to geographically scattered, massively distributed clouds.
  • Orchestration, management, and maintenance have therefore become more challenging: distributed and hybrid clouds require more attention, and the need of the hour is accelerated service velocity.
  • Automation is a prime solution for provisioning and maintaining the complete environment.
  • With a mix of intelligent infrastructure and machine learning, it is possible to target dynamic cloud management.
  • We focus here on service and infrastructure management automation.
  • We share our experience with compute (Redfish), storage (Swordfish), and networking (SDN-WAN), how we add closed loops, and the benefit derived from data collection and analysis with ML.
  • This leads to hands-free, or zero-touch, NFV.
slide-14
SLIDE 14

Some Statistics

  • 80% of outages impacting mission-critical applications are caused by people and process issues
  • 50% of these are caused by change, configuration, handoff, release integration, redeployable application services, etc.
  • Though the number of downtime hours is reduced, the cost of downtime is now 50X
  • Automate the deployment process, intelligently
slide-15
SLIDE 15

The Learning Input Points

slide-16
SLIDE 16

Intelligent Infrastructure Deployment

  • Identify smart ways to create, manage and orchestrate federation environments.
  • ML can be utilized to train AI systems to recognize demand and deployment patterns in the context of various Service Level Objective metrics, called dimensions, such as:
      • Number of VM instances
      • Network demand
      • Migration metrics
      • Latency measures
      • SLA parameters of throughput
      • Number and type of SLA violations
      • Cluster sizes
      • Availability zones (AZs)
  • ML can be used to devise optimized containers, container sizing, and microservice planning.
  • The result is truly agile infrastructure provisioning.
slide-17
SLIDE 17

Intelligent Infrastructure

  • Service providers can easily and efficiently accommodate the demands of mixed workloads from a single platform.
  • Leveraging QoS capabilities, policies can be provisioned and enforced to isolate each workload while workloads run simultaneously within a shared infrastructure.
  • ML needs vast amounts of real-time performance data, generated by a QoS monitor and network telemetry, to provide early recognition of developing performance issues before they negatively impact the human experience.
  • ML provides the information needed to fine-tune or redeploy the infrastructure to optimize the QoS metrics.

slide-18
SLIDE 18

Automated Deployment Process

  • Static deployment
      • Templated. Flavours can be selected based on requirements.
  • Dynamic deployment
      • Dynamically determine the deployment context and deployment parameters, then define the deployment plan. Once defined, it remains static.
  • Smart / intelligent deployment
      • ML- and AI-driven deployment to optimise the Service Level Objectives. The deployment plan is predicted, evaluated, customized and optimized.
  • A TOSCA document serves as the deployment description for the services and applications to be deployed on the cloud.
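The three deployment styles above can be contrasted in a few lines. This is only a sketch: the flavour names, the per-instance capacity figure, and the `predict_violations` scoring function are assumptions made for illustration, not part of the talk.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Plan:
    flavor: str
    instances: int


# Static: a fixed template table keyed by requirement (hypothetical flavours).
STATIC_TEMPLATES: Dict[str, Plan] = {
    "small": Plan("m1.small", 1),
    "large": Plan("m1.large", 4),
}


def static_plan(requirement: str) -> Plan:
    """Static: pick a pre-built template."""
    return STATIC_TEMPLATES[requirement]


def dynamic_plan(expected_rps: float, capacity_per_instance: float = 100.0) -> Plan:
    """Dynamic: derive parameters from the context once; the plan then stays fixed."""
    instances = max(1, math.ceil(expected_rps / capacity_per_instance))
    return Plan("m1.medium", instances)


def smart_plan(candidates: List[Plan],
               predict_violations: Callable[[Plan], float]) -> Plan:
    """Smart: score every candidate with a learned model and keep the best."""
    return min(candidates, key=predict_violations)
```

In the smart case the scoring function would be a trained model predicting SLO violations; here any callable that returns a number will do.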

slide-19
SLIDE 19

Advantages

  • Eliminates manual intervention from the deployment process (application and infrastructure)
  • Reduces complexity: both major and minor driving factors can now be considered when strategising the deployment plan
  • Global and local optimization is possible
slide-20
SLIDE 20

Automating the process

slide-21
SLIDE 21

TOSCA

  • Topology and Orchestration Specification for Cloud Applications
  • Standardised language to describe:
      • The application and infrastructure in detail, in a portable manner
      • The structure and composition of applications and their infrastructure
      • The relationships
      • State and behaviour (deploy, shutdown, restart, etc.)
      • How these relate to the cloud infrastructure management policies (and associated SLAs)
  • A model that specifies applications and virtual and physical infrastructure.
  • Stores the information in a ‘service template’ in YAML, which is processed at deploy time to perform virtual and physical deployment
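To make the ‘service template’ idea concrete, here is a sketch that builds a two-node topology as a Python dictionary mirroring the TOSCA Simple Profile in YAML layout. The node names `app_server` and `web_app` and the sizing values are hypothetical; the standard library's `json` is used for output only to avoid a YAML dependency.

```python
import json
from typing import Any, Dict, List, Tuple


def build_service_template(num_cpus: int = 2, mem_size: str = "4096 MB") -> Dict[str, Any]:
    """Return a minimal service template: one Compute node hosting one WebServer."""
    return {
        "tosca_definitions_version": "tosca_simple_yaml_1_0",
        "description": "Illustrative two-node topology (hypothetical names)",
        "topology_template": {
            "node_templates": {
                "app_server": {
                    "type": "tosca.nodes.Compute",
                    "capabilities": {"host": {"properties": {
                        "num_cpus": num_cpus,
                        "mem_size": mem_size,
                        "disk_size": "10 GB",
                    }}},
                },
                "web_app": {
                    "type": "tosca.nodes.WebServer",
                    "requirements": [{"host": "app_server"}],
                },
            }
        },
    }


def relationships(template: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """List (source node, requirement name, target node) triples."""
    nodes = template["topology_template"]["node_templates"]
    return [(name, req, target)
            for name, node in nodes.items()
            for entry in node.get("requirements", [])
            for req, target in entry.items()]


if __name__ == "__main__":
    print(json.dumps(build_service_template(), indent=2))
```

The `relationships` helper shows how an orchestrator could derive deployment order from the template: `web_app` cannot start until `app_server` exists.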

slide-22
SLIDE 22

Application Topology

  • Defined at 3 levels
  • Infrastructure (cloud and DC objects)
  • Platform / Middleware (App Containers)
  • Application modules and their configuration
slide-23
SLIDE 23

Service Orchestration

  • Should address:
  • Cloud Infra Orchestration
  • Container Orchestration
  • Network Orchestration
  • Application Orchestration (including Legacy Applications)
slide-24
SLIDE 24

TOSCA supported ML

[Diagram labels: Models, ML, Meteos, Candidate Model Params, Gather Metrics, Conductor, Modify Template, Redeploy, Revise & Select, Build & Update Models, Ceilometer, Logs]
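The labels on this slide suggest a loop of gathering metrics, scoring candidate parameters, modifying the template, and redeploying. The sketch below is one possible reading of that loop, not the authors' implementation; every callable passed in (`gather_metrics`, `candidates`, `score`, `redeploy`) is a hypothetical stand-in.

```python
from typing import Callable, Dict, List

Template = Dict[str, int]
Metrics = Dict[str, float]


def optimise_template(template: Template,
                      gather_metrics: Callable[[Template], Metrics],
                      candidates: Callable[[Template], List[Template]],
                      score: Callable[[Template, Metrics], float],
                      redeploy: Callable[[Template], None],
                      rounds: int = 5) -> Template:
    """Repeat: gather metrics, select the best candidate, modify, redeploy."""
    for _ in range(rounds):
        metrics = gather_metrics(template)             # e.g. telemetry and logs
        best = min(candidates(template), key=lambda c: score(c, metrics))
        if best == template:
            break                                      # converged, nothing to change
        template = best                                # modify the template
        redeploy(template)                             # push the revised plan
    return template
```

A lower score means a better candidate here; stopping when the best candidate equals the current template keeps the loop from redeploying needlessly.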

slide-25
SLIDE 25

Training System

  • Pruned Decision Tree
  • Neural Network
  • Hyperparameter optimization using cross-validation (Random Forests)
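Hyperparameter selection by cross-validation can be shown without any ML library. Note the substitution: the slide names Random Forests, whereas this sketch tunes `k` for a tiny k-nearest-neighbour regressor purely to stay dependency-free. The data layout and fold scheme are assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (feature, target)


def knn_predict(train: Sequence[Point], x: float, k: int) -> float:
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def cv_error(data: Sequence[Point], k: int, folds: int = 4) -> float:
    """Mean squared error of k-NN under interleaved k-fold cross-validation."""
    total = 0.0
    for f in range(folds):
        test = [p for i, p in enumerate(data) if i % folds == f]
        train = [p for i, p in enumerate(data) if i % folds != f]
        total += sum((knn_predict(train, x, k) - y) ** 2 for x, y in test)
    return total / len(data)


def select_k(data: Sequence[Point], candidates: List[int], folds: int = 4) -> int:
    """Return the candidate k with the lowest cross-validated error."""
    return min(candidates, key=lambda k: cv_error(data, k, folds))
```

The same pattern applies to any model: swap `knn_predict` for the model's fit-and-predict step and `k` for whichever hyperparameter is being tuned.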
slide-26
SLIDE 26

Metrics, a few examples

  • Number of instances
  • Instance size
  • Load demand
  • Request inter-arrival time
  • Delay / latency to service a request
  • Workload latency
  • Service throughput time
  • Telemetry data
  • Network demand
  • Number of SLA violations
  • Number of containers
  • Cost of number of replication sets
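Several of these metrics can be derived from a plain request log. The sketch below assumes each record carries an arrival and a completion timestamp in seconds; the `Request` type and the SLA latency threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    arrival_s: float  # time the request arrived
    done_s: float     # time the response was sent


def summarise(requests: List[Request], sla_latency_s: float) -> Dict[str, float]:
    """Compute inter-arrival time, latency, throughput, and SLA violations."""
    if not requests:
        raise ValueError("need at least one request")
    ordered = sorted(requests, key=lambda r: r.arrival_s)
    latencies = [r.done_s - r.arrival_s for r in ordered]
    gaps = [b.arrival_s - a.arrival_s for a, b in zip(ordered, ordered[1:])]
    span = max(r.done_s for r in ordered) - ordered[0].arrival_s
    return {
        "mean_inter_arrival_s": sum(gaps) / len(gaps) if gaps else 0.0,
        "mean_latency_s": sum(latencies) / len(latencies),
        "throughput_rps": len(ordered) / span if span > 0 else 0.0,
        "sla_violations": float(sum(1 for lat in latencies if lat > sla_latency_s)),
    }
```

A dictionary like this is the kind of feature vector a training system could consume, one row per observation window.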
slide-27
SLIDE 27

Technology Stack

  • Apache Kafka
  • WEKA
  • Scala
  • Python & Java languages
  • Docker
  • Kubernetes
  • Kata
slide-28
SLIDE 28

Practical Viewpoint from Dell Labs

Michael Tien

slide-29
SLIDE 29

Redfish – the next-generation systems management standard for an evolving IT environment

  • The DMTF Scalable Platform Management Forum has created an open industry standard specification and schema for simple, modern, and secure management of scalable platform hardware
  • A secure, multi-node, RESTful management interface built upon HTTPS, using JSON based on OData v4
  • Schema-based but human-readable; usable by client applications and browser-based GUIs
  • Covers key use cases and customer requirements
slide-30
SLIDE 30

What can Redfish do today?

Provides a common interface across platforms and vendors, supporting:

  • Reset, reboot, and power control of servers
  • Inventory of server hardware and firmware versions
  • Monitoring of server health status
  • Access to system logs
  • Alerts on server health status changes
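As an illustration of the power-control use case, the sketch below only builds (and does not send) a Redfish reset request using the standard `ComputerSystem.Reset` action. The host name is a placeholder and authentication is deliberately left out; in practice a client would read the action target from the system resource rather than hard-code the path.

```python
import json
import urllib.request

# ResetType values defined by the Redfish ComputerSystem schema.
RESET_TYPES = {"On", "ForceOff", "GracefulShutdown", "GracefulRestart",
               "ForceRestart", "Nmi", "ForceOn", "PushPowerButton"}


def build_reset_request(host: str, system_id: str,
                        reset_type: str = "GracefulRestart") -> urllib.request.Request:
    """Build a POST request for the ComputerSystem.Reset action (not sent here)."""
    if reset_type not in RESET_TYPES:
        raise ValueError(f"unsupported ResetType: {reset_type}")
    url = (f"https://{host}/redfish/v1/Systems/{system_id}"
           "/Actions/ComputerSystem.Reset")
    body = json.dumps({"ResetType": reset_type}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Sending it would be a call to `urllib.request.urlopen(request)` with credentials and TLS settings appropriate to the environment. Which reset types a given server accepts varies by platform.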

slide-31
SLIDE 31

Delivering the benefits of Redfish - 14G iDRAC9 with Lifecycle Controller

slide-32
SLIDE 32

New for 14G iDRAC9 RESTful API with Redfish

  • iDRAC RESTful API enables modern,

secure, scalable management automation

  • Conformant with Redfish 1.2
  • BIOS configuration
  • Secure boot configuration
  • Firmware inventory and update
  • Enhanced iDRAC RESTful API extensions
  • Profile-driven server configuration

and update

  • iDRAC configuration
slide-33
SLIDE 33

Modern tools for Redfish management automation

Server inventory with Python scripting:

import requests

IDRAC = 'https://<iDRAC IP>'   # replace with the iDRAC address
AUTH = ('root', '<password>')  # replace with real credentials

# verify=False skips TLS certificate checks; acceptable only in a lab.
system = requests.get(IDRAC + '/redfish/v1/Systems/System.Embedded.1', verify=False, auth=AUTH)
storage = requests.get(IDRAC + '/redfish/v1/Systems/System.Embedded.1/Storage/Controllers/RAID.Integrated.1-1', verify=False, auth=AUTH)
systemData = system.json()
storageData = storage.json()

# Print the inventory fields returned by the system resource.
print("Model: {}".format(systemData['Model']))
print("Manufacturer: {}".format(systemData['Manufacturer']))
print("Service tag: {}".format(systemData['SKU']))
print("Serial number: {}".format(systemData['SerialNumber']))
print("Hostname: {}".format(systemData['HostName']))
print("Power state: {}".format(systemData['PowerState']))
print("Asset tag: {}".format(systemData['AssetTag']))
print("Memory size: {}".format(systemData['MemorySummary']['TotalSystemMemoryGiB']))
print("CPU type: {}".format(systemData['ProcessorSummary']['Model']))
print("Number of CPUs: {}".format(systemData['ProcessorSummary']['Count']))
print("System status: {}".format(systemData['Status']['Health']))
print("RAID health: {}".format(storageData['Status']['Health']))

Server storage health status via Postman plug-in

slide-34
SLIDE 34
New for 14G iDRAC9 RESTful API with Redfish

  • IT developers are seeking
      • Fast, reliable, and repeatable outcomes
      • On-demand runtime environment creation
      • Consistent staging and production environments
  • Emerging solutions utilize orchestration tools and RESTful programming
      • “Infrastructure as Code”
      • Complete version control covering code, configuration, and data
      • Aligns development and operations
  • Overriding goal
      • “Desired state” management for deployment, update, and configuration drift control
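Configuration drift control reduces to comparing a desired state with what the server reports. A minimal sketch, assuming both states are flat dictionaries; the attribute names used in the usage note are hypothetical.

```python
from typing import Any, Dict, Tuple


def config_drift(desired: Dict[str, Any],
                 actual: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {setting: (desired, actual)} for every desired setting that differs.

    Settings present only in `actual` are ignored; a setting missing from
    `actual` is reported with None as its actual value.
    """
    return {key: (want, actual.get(key))
            for key, want in desired.items()
            if actual.get(key) != want}
```

For example, comparing `{"BootMode": "Uefi"}` against a server reporting `{"BootMode": "Bios"}` yields `{"BootMode": ("Uefi", "Bios")}`, which an automation tool could turn into a corrective update.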

slide-35
SLIDE 35
New for 14G iDRAC9 RESTful API with Redfish

  • Server Configuration Profiles (SCP) enable RESTful configuration of PowerEdge BIOS, iDRAC/LC, PERC controllers, NICs, and HBAs
  • API provides export, preview, and import operations to replicate existing server configurations and create custom ones
  • SCP files can be stored on CIFS, NFS, or HTTP/S network shares or streamed within the API
  • SCP XML and JSON file formats
  • Firmware update from a network-based repository
  • Zero-touch Auto Configuration via CIFS, NFS, or HTTP/S network share
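A sketch of what an SCP export call might look like. The action path and payload keys below are recalled from Dell's published iDRAC Redfish scripting examples and should be treated as assumptions to verify against the iDRAC API guide for the firmware in use. The request is only built, not sent, and authentication is omitted.

```python
import json
import urllib.request

# Assumed Dell OEM action path for SCP export; verify against iDRAC documentation.
EXPORT_ACTION = ("/redfish/v1/Managers/iDRAC.Embedded.1/Actions/Oem/"
                 "EID_674_Manager.ExportSystemConfiguration")


def build_scp_export_request(host: str, export_format: str = "JSON",
                             target: str = "ALL") -> urllib.request.Request:
    """Build a POST request asking iDRAC to export a Server Configuration Profile."""
    if export_format not in ("XML", "JSON"):
        raise ValueError("SCP files are XML or JSON")
    payload = {
        "ExportFormat": export_format,
        "ShareParameters": {"Target": target},  # assumed: "ALL" exports every component
    }
    return urllib.request.Request(
        f"https://{host}{EXPORT_ACTION}",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Exporting to a CIFS, NFS, or HTTP/S share would need additional share parameters, which are not shown here because their exact names are not given in the slides.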

slide-36
SLIDE 36

What’s next for Redfish?

Dell EMC and the DMTF are driving development of Redfish, with significant additions planned:

  • “Swordfish” external storage standards
  • Network switch API standards
  • Environmental APIs for power and HVAC
  • Interoperability with Open Compute Project, OpenStack, and orchestration solutions
  • Expanded automation developer tooling

slide-37
SLIDE 37

SNIA Adding to Redfish Resource Map

  • Block storage
      • Provisioning with class of service control
      • Volume mapping and masking
      • Replication
      • Capacity and health metrics
  • File system storage
      • Adds File System and File Share
      • Leverages all other concepts – provisioning with class of service, replication, …
  • Additional content
      • Object drive storage

slide-38
SLIDE 38

Profiles define sets of required functionality to support:

Basic Swordfish support:

  • Hosted service configuration
  • Integrated service configuration

Add-on functionality:

  • Local replication
  • Remote replication

Certification conformance requirements (in plans)

EnergyStar requirements (orthogonal to functionality profiles):

  • Energy and power metrics
  • Controls for on-demand instrumentation

slide-39
SLIDE 39

SNIA Adding Storage to Redfish : Swordfish

(Hosted Service Configuration)

  • Block storage
      • Provisioning with class of service control
      • Volume mapping and masking
      • Replication
      • Capacity and health metrics
  • File system storage
      • Adds File System and File Share
      • Leverages all other concepts – provisioning with class of service, replication, …
  • Additional content
      • Object drive storage

slide-40
SLIDE 40

SNIA Adding Storage to Redfish : Swordfish

(Integrated Service Configuration)

  • Block storage
      • Provisioning with class of service control
      • Volume mapping and masking
      • Replication
      • Capacity and health metrics
  • File system storage
      • Adds File System and File Share
      • Leverages all other concepts – provisioning with class of service, replication, …
  • Additional content
      • Object drive storage

slide-41
SLIDE 41

THANKS.

Questions?
