Achieving Five Nines of VNF Reliability in Telco-grade OpenStack

Achieving Five Nines of VNF Reliability in Telco-grade OpenStack Cloud. Panel Discussion, 1:50 PM – 2:30 PM on Wednesday, April 27, 2016. Kandan Kathirvel, AT&T; Eoin Walsh, Intel; Rimma Iontel, Red Hat Inc.; and Fausto Marzi, Ericsson.

SLIDE 1

Achieving Five Nines of VNF Reliability in Telco-grade OpenStack Cloud

Panel Discussion

Kandan Kathirvel, AT&T; Eoin Walsh, Intel; Rimma Iontel, Red Hat Inc.; and Fausto Marzi, Ericsson

Moderated by Haseeb Akhtar, Ericsson

1:50 PM – 2:30 PM on Wednesday, April 27, 2016

SLIDE 2

Converting PNF to VNF without cloud awareness is not optimal

Layers compared: Datacenter / Central Office, Compute, N/w Fabric, Host OS & Virtualization, Application, Orchestration Software, Software Defined Network

Physical Network Function (operationally perfected over decades):

Purpose built and dedicated for field of use
Purpose built & dedicated (most cases)
Physical connection – no SDN
Not virtualized (most cases)
Vendor-provided OS
Mostly manual or contained
Purpose-built software

Virtual Network Function (early adoption and rapid evolution):

Multi-tenant (common framework)
Commodity hardware (any vendor)
Cloud provided (common)
Common automation
Rapidly evolving
New & evolving

Some layers are marked "Same" for both columns.

Requires significant innovation

SLIDE 3

VNF availability depends on Cloud & VNF resiliency

[Figure: VNF availability by deployment option, ranging from high risk of application outages to low risk of application outages (cloud-aware applications). Options are numbered 1 to 5 on the slide.]

Deployment options shown:

VNF - Single VM
VNF HA in a region
VNF HA in 2 regions at same DC
VNF HA across 2 DCs
VNF HA across 4 DCs

VNF availability levels shown:

99.9% (8.76 hrs down/year)
99.99% (52.56 mins down/year)
99.999% (5.26 mins down/year)

Annotations: "Some VNFs current state", "Most of VNFs", "Few", "Optimal"

Diagram labels: OpenStack Region, OpenStack Region1, OpenStack Region2, Single DC, 2 Regions at a DC, Geo Location 1 (DC1), Geo Location 2 (DC2), Geo Location 4 (DC4)

A single instance of an OpenStack region is about 99.9% available (8.76 hours of unplanned downtime per year).
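
The downtime figures on this slide follow directly from the availability percentages. The sketch below reproduces that arithmetic and adds a rough estimate of how redundancy improves availability. The function names are hypothetical, and the combined figure assumes fully independent failures and instantaneous failover, which real regions sharing a data center, network, or control plane will not achieve, so treat it as an optimistic upper bound.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime per year, in minutes, for a given availability (0..1)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def combined_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant deployments.

    ASSUMPTION: failures are independent and failover is instantaneous.
    The service is down only when every copy is down at the same time.
    """
    return 1.0 - (1.0 - single) ** copies


def copies_needed(single: float, target: float) -> int:
    """Smallest number of independent copies that meets the target availability."""
    copies = 1
    while combined_availability(single, copies) < target:
        copies += 1
    return copies


if __name__ == "__main__":
    for availability in (0.999, 0.9999, 0.99999):
        minutes = downtime_minutes_per_year(availability)
        print(f"{availability:.3%} available: {minutes:.2f} min down/year")
    print("copies of a 99.9% region for five nines:", copies_needed(0.999, 0.99999))
```

Under the independence assumption, two 99.9% regions would already exceed five nines on paper. Shared failure domains are what erode that figure in practice, which is why the slide separates regions in the same DC from regions spread across DCs.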

SLIDE 4

Proposed OpenStack Enhancements

Hitless upgrades – reduce overall platform downtime
Policy driven live/offline migration inclusive of SR-IOV, CPU pinning and Huge pages support
Multi-location awareness & workload placement
Resiliency/Stability testing framework in OpenStack Rally – measure and report
Auto healing framework for OpenStack Controllers

VNF evolution

Support HA both locally and globally
Leverage OpenStack/cloud platform resiliency features, e.g. anti-affinity to place VMs on different servers
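
To illustrate why anti-affinity matters for VNF resiliency, here is a minimal sketch of strict anti-affinity placement. It is not Nova scheduler code. The function names, VM names, and host names are hypothetical. In OpenStack itself the equivalent behavior is requested through a server group with an anti-affinity policy.

```python
from typing import Dict, List


def place_anti_affinity(vms: List[str], hosts: List[str]) -> Dict[str, str]:
    """Assign each VM in one group to a distinct host.

    Raises ValueError when there are fewer hosts than VMs, mirroring a
    strict policy that refuses to co-locate members of the same group.
    """
    if len(vms) > len(hosts):
        raise ValueError("not enough hosts for strict anti-affinity")
    # zip pairs the i-th VM with the i-th host, so no host is used twice.
    return dict(zip(vms, hosts))


def survivors(placement: Dict[str, str], failed_host: str) -> List[str]:
    """Return the VMs that keep running when one host fails."""
    return [vm for vm, host in placement.items() if host != failed_host]


if __name__ == "__main__":
    placement = place_anti_affinity(
        ["vnfc-a", "vnfc-b"], ["compute-1", "compute-2", "compute-3"]
    )
    print(placement)
    print("still running after compute-1 fails:", survivors(placement, "compute-1"))
```

Because no two members of the group share a server, a single host failure removes at most one of them, and the remaining instance can keep serving traffic.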

SLIDE 5

NFV Ready Architecture

[Architecture diagram. Labeled blocks: Open APIs; OSS/BSS; Service Catalog; Service Orchestration; Service Assurance; Analytics; Descriptor Repositories; VNF Manager; VNF; VNFC; SDN Controller; Network Orchestration; Security; Services; VIM; Virtual Resource Monitoring & Reporting; vCompute; vStorage; vNetwork; Virtualization; Enhanced Platform Awareness; Compute Platform; Storage; Network; Resource Monitoring & Reporting]

SLIDE 6

Proposed OpenStack Enhancements

Hitless upgrades – reduce overall platform downtime
Policy driven live/offline migration inclusive of SR-IOV, CPU pinning and Huge pages support
Multi-location awareness & workload placement
Resiliency/Stability testing framework in OpenStack Rally – measure and report
Auto healing framework for OpenStack Controllers
Automated provisioning and monitoring (Ceilometer, Heat and Ironic)
Intelligent workload placement (Nova scheduler)

SLIDE 7

NFV Ready Architecture

SLIDE 8

Proposed OpenStack Enhancements

Hitless upgrades – reduce overall platform downtime
Policy driven live/offline migration inclusive of SR-IOV, CPU pinning and Huge pages support
Multi-location awareness & workload placement
Resiliency/Stability testing framework in OpenStack Rally – measure and report
Auto healing framework for OpenStack Controllers
Automated provisioning and monitoring (Ceilometer, Heat and Ironic)
Intelligent workload placement (Nova scheduler)
Tools to measure, monitor and report end-to-end platform SLA

SLIDE 9

Compute Node HA – Local

Prerequisite:

Shared storage for the compute nodes

Disaster Workflow:

Disaster is detected
Compute node evacuation
Users connect to the new service

Risks:

Node Fencing

[Diagram: Active HA Controller; Compute Node 1, Compute Node 2 ... Compute Node n, each with an HA Agent and VMs; Control Node 1, Control Node 2, Control Node 3 with HA, a Database, and Corosync + Pacemaker]
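
The local disaster workflow can be sketched as a small simulation: detect a failed compute node, fence it, then evacuate its VMs to healthy nodes. All class and function names here are hypothetical and no OpenStack API is called. The fence-before-evacuate ordering reflects the fencing risk named on the slide: with shared storage, a node that is only presumed dead could still be writing to the same disks as the restarted VMs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ComputeNode:
    name: str
    alive: bool = True    # what the HA agent reports
    fenced: bool = False  # True once the node is isolated (e.g. powered off)
    vms: List[str] = field(default_factory=list)


def fence(node: ComputeNode) -> None:
    """Isolate the node so it can no longer touch shared storage."""
    node.fenced = True


def evacuate(failed: ComputeNode, targets: List[ComputeNode]) -> Dict[str, str]:
    """Move VMs off a fenced node, spreading them round-robin over healthy targets."""
    if not failed.fenced:
        raise RuntimeError("refusing to evacuate a node that is not fenced")
    healthy = [t for t in targets if t.alive and not t.fenced]
    if not healthy:
        raise RuntimeError("no healthy compute node available")
    moved: Dict[str, str] = {}
    for index, vm in enumerate(failed.vms):
        target = healthy[index % len(healthy)]
        target.vms.append(vm)
        moved[vm] = target.name
    failed.vms.clear()
    return moved


def handle_failures(nodes: List[ComputeNode]) -> Dict[str, str]:
    """Detect dead nodes, fence them, and evacuate their VMs."""
    moved: Dict[str, str] = {}
    for node in nodes:
        if not node.alive and not node.fenced:
            fence(node)
            others = [n for n in nodes if n is not node]
            moved.update(evacuate(node, others))
    return moved


if __name__ == "__main__":
    cluster = [
        ComputeNode("compute-1", vms=["vm1", "vm2", "vm3"]),
        ComputeNode("compute-2"),
        ComputeNode("compute-3"),
    ]
    cluster[0].alive = False  # simulated failure
    print(handle_failures(cluster))
```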

SLIDE 10

Compute Node HA – Global

[Diagram: DC1 (1.1.1.0/24) and DC2 (1.1.2.0/24) reach the Internet over BGP (AS XXXXX). VM1 is evacuated from DC1 to DC2. Both sites show FIP: 100.100.1.15 and a Freezer-dr-api instance.]

Prerequisite:

Data replication

Operational Workflow:

Floating IPs retrieved from Nova
Announce IPs with BGP or OSPF

Disaster Workflow:

Disaster is detected
Compute node evacuation
On the other Compute Nodes the floating IPs are retrieved
Floating IP announced with BGP or OSPF
Users connect to the new service

Risks:

Node and DC Fencing
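
The global workflow keeps the floating IP stable and changes where it is announced from. The sketch below models that bookkeeping only: it builds host-route prefixes from floating IPs and moves them from a failed DC to a standby DC. Function names are hypothetical, no routing daemon or OpenStack API is involved, and announcing /32 host routes is an assumption made for illustration, not something the slide specifies.

```python
import ipaddress
from typing import Dict, List


def host_routes(floating_ips: List[str]) -> List[str]:
    """Turn floating IPs into /32 host-route prefixes (illustrative assumption)."""
    return [str(ipaddress.ip_network(f"{ip}/32")) for ip in floating_ips]


def fail_over(
    announcements: Dict[str, List[str]], failed_dc: str, standby_dc: str
) -> Dict[str, List[str]]:
    """Withdraw the failed DC's prefixes and announce them from the standby DC.

    Returns a new mapping; the input is left unchanged.
    """
    updated = {dc: list(prefixes) for dc, prefixes in announcements.items()}
    moved = updated.pop(failed_dc, [])
    standby = updated.setdefault(standby_dc, [])
    for prefix in moved:
        if prefix not in standby:
            standby.append(prefix)
    updated[failed_dc] = []  # the failed (and fenced) DC announces nothing
    return updated


if __name__ == "__main__":
    before = {"DC1": host_routes(["100.100.1.15"]), "DC2": []}
    after = fail_over(before, failed_dc="DC1", standby_dc="DC2")
    print("before:", before)
    print("after: ", after)
```

The failed site must actually stop announcing the prefix before the standby site takes over. Otherwise both data centers attract traffic for the same address, which is the DC fencing risk listed above.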

SLIDE 11

Proposed OpenStack Enhancements

Hitless upgrades – reduce overall platform downtime
Policy driven live/offline migration inclusive of SR-IOV, CPU pinning and Huge pages support
Multi-location awareness & workload placement
Resiliency/Stability testing framework in OpenStack Rally – measure and report
Auto healing framework for OpenStack Controllers
Automated provisioning and monitoring (Ceilometer, Heat and Ironic)
Intelligent workload placement (Nova scheduler)
Global and Local Compute HA management

SLIDE 12

Thanks! We need to build together.