Achieving Five Nines of VNF Reliability in Telco-grade OpenStack

Achieving Five Nines of VNF Reliability in Telco-grade OpenStack Cloud. Panel discussion, 1:50 PM – 2:30 PM on Wednesday, April 27, 2016, with Kandan Kathirvel, AT&T; Eoin Walsh, Intel; Rimma Iontel, Red Hat Inc.; and Fausto Marzi, Ericsson.


SLIDE 1

Achieving Five Nines of VNF Reliability in Telco-grade OpenStack Cloud

Panel Discussion

Kandan Kathirvel, AT&T; Eoin Walsh, Intel; Rimma Iontel, Red Hat Inc.; and Fausto Marzi, Ericsson. Moderated by Haseeb Akhtar, Ericsson. 1:50 PM – 2:30 PM on Wednesday, April 27, 2016

SLIDE 2

Converting PNF to VNF without cloud awareness is not optimal

Layers compared: Datacenter / Central Office, Compute, N/w Fabric, Host OS & Virtualization, Orchestration Software, Application

Physical Network Function:

  • Purpose built and dedicated for field of use
  • Purpose built & dedicated (most cases)
  • Physical connection – no SDN
  • Not virtualized (most cases)
  • Vendor provided OS
  • Mostly manual or contained
  • Purpose built software
  • Operationally perfected over decades

Virtual Network Function:

  • Multi-tenant (common framework)
  • Commodity hardware (any vendor)
  • Software defined network
  • Cloud provided (common)
  • Common automation
  • Rapidly evolving; new & evolving
  • Early adoption and rapid evolution
  • Requires significant innovation

Some layers are the same for PNF and VNF (marked "Same").

SLIDE 3

VNF availability depends on Cloud & VNF resiliency

Figure: from high risk of application outages (most VNFs) to low risk of application outages with cloud-aware applications (few VNFs).

Deployment stages:

  1. VNF - Single VM (current state of some VNFs)
  2. VNF HA in a region (one OpenStack region, single DC)
  3. VNF HA in 2 regions at the same DC (Region1 and Region2)
  4. VNF HA across 2 DCs (Geo Location 1 / DC1 and Geo Location 2 / DC2)
  5. VNF HA across 4 DCs (optimal)

VNF availability scale: 99.9% (8.76 hrs down/year), 99.99% (52.56 mins down/year), 99.999% (5.26 mins down/year)

A single instance of an OpenStack region is about 99.9% available (8.76 hours of unplanned downtime per year).
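The downtime figures on this slide follow from downtime per year = (1 - A) × 525,600 minutes. The sketch below reproduces them and adds the textbook formula for N redundant instances, 1 - (1 - A)^N. That formula assumes independent failures and instantaneous failover; two regions in the same DC share power, network and often operations, so their failures are correlated, which is one plausible reason the slide places five nines only at the multi-DC stages. Function names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(availability: float) -> float:
    """Expected unavailable minutes per year for an availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def parallel_availability(availability: float, replicas: int) -> float:
    """Availability when any one of N identical replicas can serve traffic.

    Idealized: assumes independent failures and instantaneous, lossless
    failover, which real deployments only approximate.
    """
    return 1.0 - (1.0 - availability) ** replicas


if __name__ == "__main__":
    for a in (0.999, 0.9999, 0.99999):
        print(f"{a:.3%}: {downtime_minutes_per_year(a):.2f} min/year")
    # Idealized case: two fully independent 99.9% OpenStack regions
    print(f"2 x 99.9% regions: {parallel_availability(0.999, 2):.4%}")
```

Treat the parallel result as an upper bound: shared failure domains pull the real figure down.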

SLIDE 4

Proposed OpenStack Enhancements

Few

  • Hitless upgrades – reduce overall platform downtime
  • Policy driven live/offline migration inclusive of SR-IOV, CPU pinning and Huge pages support
  • Multi-location awareness & workload placement
  • Resiliency/Stability testing framework in OpenStack Rally – measure and report
  • Auto healing framework for OpenStack Controllers

VNF evolution

  • Support HA both locally and globally
  • Leverage OpenStack/cloud platform resiliency features, e.g. anti-affinity to place VMs on different servers
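In OpenStack, anti-affinity is expressed by booting a VNF's VMs into a Nova server group with the `anti-affinity` policy, passed at boot through the `group` scheduler hint. The sketch below does not call Nova; it is a minimal pure-Python model of the placement decision, assuming a hypothetical `host_capacity` map of free VM slots per host. Like a strict anti-affinity policy, it rejects the request rather than co-locating two members on one server.

```python
from typing import Dict, List


def place_anti_affinity(vms: List[str], host_capacity: Dict[str, int]) -> Dict[str, str]:
    """Assign every VM of one anti-affinity group to a different host.

    host_capacity maps host name -> free VM slots (hypothetical model).
    """
    # Each host takes at most one group member, so one free slot is enough
    eligible = [h for h, free in host_capacity.items() if free > 0]
    if len(eligible) < len(vms):
        # Strict policy: fail instead of placing two members on one server
        raise ValueError(f"need {len(vms)} distinct hosts, only {len(eligible)} eligible")
    # Prefer the hosts with the most free capacity
    eligible.sort(key=lambda h: host_capacity[h], reverse=True)
    return dict(zip(vms, eligible))


if __name__ == "__main__":
    print(place_anti_affinity(["vnfc-a", "vnfc-b"], {"host1": 2, "host2": 1, "host3": 0}))
```

Losing one server then takes down at most one VNFC of the group, which the VNF's own HA logic can absorb.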

SLIDE 5

Figure: NFV Ready Architecture. Components shown: Open APIs; VIM; Compute Platform Resource Monitoring & Reporting; Virtualization; Enhanced Platform Awareness; Virtual Resource Monitoring & Reporting; Service Catalog; Service Orchestration; Service Assurance; VNF Manager; SDN Controller; Network Orchestration; Security; Services; Descriptor Repositories; OSS/BSS; Analytics; physical Storage and Network; vCompute, vStorage and vNetwork; VNFs composed of VNFCs.

SLIDE 6

Proposed OpenStack Enhancements

Few

  • Hitless upgrades – reduce overall platform downtime
  • Policy driven live/offline migration inclusive of SR-IOV, CPU pinning and Huge pages support
  • Multi-location awareness & workload placement
  • Resiliency/Stability testing framework in OpenStack Rally – measure and report
  • Auto healing framework for OpenStack Controllers
  • Automated provisioning and monitoring (Ceilometer, Heat and Ironic)
  • Intelligent workload placement (Nova scheduler)
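Nova's filter scheduler works in two phases: filters discard hosts that cannot satisfy a request, then weighers rank the survivors. The sketch below models that flow for constraints named in this list (SR-IOV, CPU pinning, huge pages, location awareness). The `Host` and `Request` fields, the feature labels and the weights are hypothetical simplifications, not Nova's actual filter or weigher classes.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class Host:
    name: str
    free_ram_mb: int
    free_pinned_cpus: int
    features: Set[str] = field(default_factory=set)  # e.g. {"sriov", "hugepages"}
    zone: str = "dc1"


@dataclass
class Request:
    ram_mb: int
    pinned_cpus: int
    required_features: Set[str]
    preferred_zone: Optional[str] = None


# Phase 1: each filter returns True if the host can satisfy the request
FILTERS: List[Callable[[Host, Request], bool]] = [
    lambda h, r: h.free_ram_mb >= r.ram_mb,
    lambda h, r: h.free_pinned_cpus >= r.pinned_cpus,
    lambda h, r: r.required_features <= h.features,
]


def weigh(h: Host, r: Request) -> float:
    """Phase 2 (hypothetical weights): location first, then most free RAM."""
    zone_bonus = 1_000_000 if r.preferred_zone and h.zone == r.preferred_zone else 0
    return zone_bonus + h.free_ram_mb


def select_host(hosts: List[Host], req: Request) -> Optional[Host]:
    candidates = [h for h in hosts if all(f(h, req) for f in FILTERS)]
    return max(candidates, key=lambda h: weigh(h, req), default=None)
```

Returning `None` stands in for Nova's "no valid host" outcome when every host is filtered out.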

SLIDE 8

Proposed OpenStack Enhancements

Few

  • Hitless upgrades – reduce overall platform downtime
  • Policy driven live/offline migration inclusive of SR-IOV, CPU pinning and Huge pages support
  • Multi-location awareness & workload placement
  • Resiliency/Stability testing framework in OpenStack Rally – measure and report
  • Auto healing framework for OpenStack Controllers
  • Automated provisioning and monitoring (Ceilometer, Heat and Ironic)
  • Intelligent workload placement (Nova scheduler)
  • Tools to measure, monitor and report end-to-end platform SLA
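Reporting an end-to-end platform SLA ultimately reduces to measured availability over a reporting window. A minimal sketch, assuming outages arrive as (start, end) timestamps from several monitoring probes: overlapping records are merged so one outage is not counted twice, outages are clipped to the window, and the result is compared with a five-nines target. The function names and input format are illustrative assumptions.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Outage = Tuple[datetime, datetime]


def measured_availability(outages: List[Outage], start: datetime, end: datetime) -> float:
    """Fraction of [start, end) during which the service was up."""
    window_s = (end - start).total_seconds()
    # Clip each outage to the window; drop those entirely outside it
    clipped = sorted((max(s, start), min(e, end)) for s, e in outages if e > start and s < end)
    down_s = 0.0
    cur_start: Optional[datetime] = None
    cur_end: Optional[datetime] = None
    for s, e in clipped:
        if cur_end is not None and s <= cur_end:
            cur_end = max(cur_end, e)  # overlaps the current interval: extend it
            continue
        if cur_start is not None and cur_end is not None:
            down_s += (cur_end - cur_start).total_seconds()
        cur_start, cur_end = s, e
    if cur_start is not None and cur_end is not None:
        down_s += (cur_end - cur_start).total_seconds()
    return 1.0 - down_s / window_s


def meets_sla(availability: float, target: float = 0.99999) -> bool:
    return availability >= target


if __name__ == "__main__":
    t0 = datetime(2016, 1, 1)
    probes = [(t0 + timedelta(days=10), t0 + timedelta(days=10, minutes=3)),
              (t0 + timedelta(days=10, minutes=2), t0 + timedelta(days=10, minutes=5))]
    a = measured_availability(probes, t0, t0 + timedelta(days=365))
    print(f"{a:.6%}", meets_sla(a))
```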

SLIDE 9

Compute Node HA – Local

Prerequisite:

Shared storage for the compute nodes

Disaster workflow:

  • Disaster is detected
  • Compute node evacuation
  • Users connect to the new service

Risks:

Node fencing

Figure: an Active HA Controller; Compute Nodes 1 to n, each running an HA Agent alongside its VMs; Control Nodes 1 to 3 with a shared Database, clustered for HA with Corosync + Pacemaker.
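The local disaster workflow can be sketched as a control loop: a compute node that stops sending heartbeats is fenced first and only then evacuated. The order matters because, with shared storage, restarting a VM elsewhere while the original node may still be running (and writing to the same disk) risks corruption, which is why node fencing is the listed risk. In OpenStack the rebuild step would correspond to Nova's evacuate operation; here it is modelled in memory. The 30-second timeout, the `fence` callback and the least-loaded target choice are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

HEARTBEAT_TIMEOUT_S = 30.0  # hypothetical detection threshold


@dataclass
class ComputeNode:
    name: str
    last_heartbeat: float
    vms: List[str] = field(default_factory=list)
    fenced: bool = False


def handle_failures(nodes: Dict[str, ComputeNode], now: float,
                    fence: Callable[[ComputeNode], bool]) -> Dict[str, str]:
    """Detect silent nodes, fence them, then evacuate their VMs (vm -> new host)."""
    alive = [n for n in nodes.values() if not n.fenced]
    failed = [n for n in alive if now - n.last_heartbeat > HEARTBEAT_TIMEOUT_S]
    healthy = [n for n in alive if now - n.last_heartbeat <= HEARTBEAT_TIMEOUT_S]
    moves: Dict[str, str] = {}
    for node in failed:
        # 1. Fence first (e.g. power off out of band); if that fails, do not evacuate
        if not fence(node):
            continue
        node.fenced = True
        # 2. Evacuate: rebuild each VM on the least-loaded healthy node;
        #    shared storage means the VM's disk is still reachable there
        for vm in node.vms:
            if not healthy:
                break  # no capacity left: remaining VMs stay down
            target = min(healthy, key=lambda n: len(n.vms))
            target.vms.append(vm)
            moves[vm] = target.name
        node.vms = [vm for vm in node.vms if vm not in moves]
    return moves
```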

SLIDE 10

Compute Node HA – Global

Figure: DC1 (1.1.1.0/24) and DC2 (1.1.2.0/24), each running Freezer-dr-api, reach the Internet over BGP (AS XXXXX). VM1 is evacuated from DC1 to DC2 and keeps its floating IP (FIP) 100.100.1.15.

Prerequisite:

Data replication

Operational workflow:

  • Floating IPs are retrieved from Nova
  • The IPs are announced with BGP or OSPF

Disaster workflow:

  • Disaster is detected
  • Compute node evacuation
  • On the other compute nodes, the floating IPs are retrieved
  • The floating IPs are announced with BGP or OSPF
  • Users connect to the new service

Risks:

Node and DC fencing
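The routing side of the global workflow can be modelled with a small route table. A minimal sketch, assuming each floating IP is announced as a /32 host route with the DC edge as next hop; `1.1.1.1` and `1.1.2.1` are hypothetical edge addresses taken from the slide's DC subnets, and the `100.100.1.0/24` aggregate in the usage is an illustrative assumption. Because routers prefer the longest matching prefix, the /32 announced from DC2 attracts traffic for 100.100.1.15 even while a covering aggregate remains. No BGP or OSPF speaker is involved; the code only shows which next hop wins. If DC1 were not fenced and kept announcing the same /32, two equally specific routes would compete, which is the DC fencing risk.

```python
import ipaddress
from typing import List, Optional, Tuple

Route = Tuple[ipaddress.IPv4Network, str]  # (prefix, next hop)


def host_routes(floating_ips: List[str], next_hop: str) -> List[Route]:
    """Turn floating IPs into /32 host routes pointing at one DC's edge."""
    return [(ipaddress.IPv4Network(f"{ip}/32"), next_hop) for ip in floating_ips]


def fail_over(table: List[Route], fips: List[str], failed_hop: str, new_hop: str) -> List[Route]:
    """Withdraw the FIPs' host routes from the failed DC and announce them from the new one."""
    moved = {ipaddress.IPv4Network(f"{ip}/32") for ip in fips}
    kept = [(net, hop) for net, hop in table if not (net in moved and hop == failed_hop)]
    return kept + host_routes(fips, new_hop)


def lookup(table: List[Route], dst: str) -> Optional[str]:
    """Longest-prefix-match lookup, as a router would perform it."""
    addr = ipaddress.IPv4Address(dst)
    matches = [(net, hop) for net, hop in table if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda r: r[0].prefixlen)[1]


if __name__ == "__main__":
    table = [(ipaddress.IPv4Network("100.100.1.0/24"), "1.1.1.1")]
    table += host_routes(["100.100.1.15"], "1.1.1.1")
    table = fail_over(table, ["100.100.1.15"], "1.1.1.1", "1.1.2.1")
    print(lookup(table, "100.100.1.15"))
```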

SLIDE 11

Proposed OpenStack Enhancements

Few

  • Hitless upgrades – reduce overall platform downtime
  • Policy driven live/offline migration inclusive of SR-IOV, CPU pinning and Huge pages support
  • Multi-location awareness & workload placement
  • Resiliency/Stability testing framework in OpenStack Rally – measure and report
  • Auto healing framework for OpenStack Controllers
  • Automated provisioning and monitoring (Ceilometer, Heat and Ironic)
  • Intelligent workload placement (Nova scheduler)
  • Global and Local Compute HA management

SLIDE 12

Thanks! We need to build together.