Policy-Driven Fault Management for NFV Eco System Akhil Jain (NEC) - PowerPoint PPT Presentation

April 2019 Policy-Driven Fault Management for NFV Eco System Akhil Jain (NEC) akhil.jain@india.nec.com Eric Kao (VMware) ekcs.openstack@gmail.com

Definitions ● Network Function (NF): A functional building block in a network ○ packet inspection, CDNs, virus scanners, ... ● Network Function Virtualization (NFV): Realizing NFs as virtual appliances ● Virtual Network Function (VNF): A network function realized as a virtual appliance

Fault Management ● Basic fault recovery is standard ● Complexities beyond the standard cases: ○ Diversity of fault scenarios ○ Diversity of VNFs ○ Each combination may call for a different fault management response

Fault Scenarios ● Sequence of fault signals over time ● Isolated vs widespread ● Existing or predicted ● Fault types ○ Hard failure ○ Stability ○ Degraded performance ● Fault domains ○ Networking, Host, Storage, Application, etc

Context ● Current & anticipated loads ● VNF capacity ● Physical infra capacity ● Example considerations: ○ If load << VNF capacity, ignore certain fault prediction signals ○ If load ~= VNF capacity, preemptively scale-out ■ When physical infra limited, may need to scale-in a less loaded or less critical VNF to make room
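The example considerations above can be sketched as a small decision function. This is only an illustration, not the presenters' implementation: the `VnfContext` fields, the single `near_capacity` threshold standing in for "load ~= VNF capacity", the criticality ranking, and the `escalate` fallback are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class VnfContext:
    name: str
    load: float        # current or anticipated load, same units as capacity
    capacity: float    # load the VNF can serve at its current scale
    criticality: int   # hypothetical ranking: higher means more critical


def respond_to_prediction(vnf, free_infra_slots, other_vnfs, near_capacity=0.9):
    """Return (action, vnf_name) pairs for a fault prediction signal on `vnf`.

    `near_capacity` is an assumed cut-off between "load << capacity"
    and "load ~= capacity"; the slides do not give a number.
    """
    utilization = vnf.load / vnf.capacity
    if utilization < near_capacity:
        # Plenty of headroom: the prediction signal can be ignored.
        return [("ignore", vnf.name)]

    actions = []
    if free_infra_slots == 0:
        # Physical infra is limited: make room by scaling in the least
        # critical (then least loaded) other VNF.
        candidates = [o for o in other_vnfs if o.name != vnf.name]
        if not candidates:
            # Hypothetical fallback, not taken from the slides.
            return [("escalate", vnf.name)]
        donor = min(candidates, key=lambda o: (o.criticality, o.load / o.capacity))
        actions.append(("scale_in", donor.name))
    actions.append(("scale_out", vnf.name))
    return actions
```

For a VNF at 95% utilization with no free infrastructure, the function returns a scale-in of the least critical neighbour followed by the scale-out. In the approach the deck proposes, this kind of logic would be written as declarative policy rather than procedural code.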

VNF characteristics ● Stateful vs stateless ● Monolithic vs microservices ● Interactions, topology, service function chaining ● SLAs ● Business/user impact

Solution: Policy-driven fault management ● Fine-grained monitoring & alarming ○ Monasca, Prometheus, ... ● Rich context ○ Infra managers: Nova, Kubernetes, ... ○ NFV orchestrators: Tacker, ONAP, ... ○ Application-level statistics: load, latency, throughput ○ Arbitrary data sources ● Expressive policy framework ○ Congress

[Diagram] ● Alarm services -> webhook -> Congress policy service ● Contextual data -> data -> Congress policy service ● Congress policy service, driven by fault management policies -> action -> Infra managers, Orchestrators

Congress Architecture ● Data ○ Get data from webhooks and APIs ○ Store data as tables and JSON ● Policy ○ Datalog/SQL rules transform data into decisions ● Action ○ Decisions can trigger API calls
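The data / policy / action split can be illustrated with a toy engine. This is a minimal sketch and not the Congress API: `MiniPolicyEngine`, its method names, the `alarms` table, the webhook payload fields, and the `evacuate_host` action are all hypothetical names chosen for the example.

```python
import json


class MiniPolicyEngine:
    """Toy stand-in for the data -> policy -> action flow (not Congress itself)."""

    def __init__(self):
        self.tables = {}    # table name -> set of row tuples
        self.rules = []     # callables: tables -> set of (action_name, argument)
        self.actions = {}   # action name -> callable taking one argument

    def ingest_webhook(self, table, payload, columns):
        # Data: turn a JSON webhook body into one row of a named table.
        record = json.loads(payload)
        row = tuple(record[c] for c in columns)
        self.tables.setdefault(table, set()).add(row)

    def add_rule(self, rule):
        self.rules.append(rule)

    def register_action(self, name, fn):
        self.actions[name] = fn

    def evaluate(self):
        # Policy: every rule derives decisions from the current tables.
        decisions = set()
        for rule in self.rules:
            decisions |= rule(self.tables)
        # Action: each decision triggers its registered call.
        for name, arg in sorted(decisions):
            self.actions[name](arg)
        return decisions


# Usage with made-up data: an alarm webhook leads to one action call.
engine = MiniPolicyEngine()
calls = []
engine.register_action("evacuate_host", calls.append)
engine.add_rule(
    lambda t: {("evacuate_host", host)
               for (host, state) in t.get("alarms", set())
               if state == "ALARM"}
)
engine.ingest_webhook("alarms", '{"host": "compute-1", "state": "ALARM"}',
                      ["host", "state"])
decisions = engine.evaluate()
```

In Congress the rule would be expressed in Datalog and the action would be an API call on another service; the lambda and the `calls` list here only stand in for those two pieces.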

Advantages ● Extensible ○ Arbitrary sources of data as needed by use case ● Expressive ○ Not limited by fixed vocabulary or set of properties ● Declarative ○ Well understood declarative language for expressing clear and manageable policies ○ Avoid procedural code

Example: preemptive scale out policy ● Predictive fault signal ● Possible response: ○ Ignore ■ Failure occurs ■ Instances go down ■ Load increases ■ Autoscaling policy adjusts ● Drawback: ○ Degraded service for a time

Example: preemptive scale out policy ● Estimate service disruption/degradation ● Preemptively scale out as appropriate ● Minimize risk of degraded service

Example: preemptive scale out policy (diagram, built up over several slides) ● Alarms on hosts + Instances data -> Instances affected ● Instances affected + VNFs data -> Affected VNFs ● Affected VNFs + VNFs load data -> Predicted load ● Predicted load -> Scale out decisions

Example: preemptive scale out policy ● Alarms on hosts + Instances data -> Instances affected ○ instances_affected(instance_id) :- hosts_alarmed(alarmed_host), nova:servers(server_id=instance_id, host_name=alarmed_host)
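Read procedurally, the rule above is a join of the alarmed-hosts table with the servers table on the host name. A minimal Python sketch of that join follows, assuming each table is a set of tuples and that the servers table has been reduced to (server_id, host_name) pairs; the sample rows are made up.

```python
def instances_affected(hosts_alarmed, servers):
    """Join alarmed hosts with servers on host name.

    hosts_alarmed: set of 1-tuples (alarmed_host,)
    servers:       set of (server_id, host_name) tuples
    Returns the set of server ids running on an alarmed host.
    """
    alarmed = {host for (host,) in hosts_alarmed}
    return {server_id for (server_id, host_name) in servers
            if host_name in alarmed}


# Made-up sample data for illustration.
hosts_alarmed = {("compute-1",)}
servers = {
    ("vm-a", "compute-1"),
    ("vm-b", "compute-2"),
    ("vm-c", "compute-1"),
}
affected = instances_affected(hosts_alarmed, servers)
```

The Datalog form states the same relationship declaratively, with no explicit loop or lookup order.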

Example: preemptive scale out policy ● Predicted load -> Scale out decisions ○ scale_out(vnf_id) :- predicted_VNF_load(vnf_id, predicted_load), predicted_load > 0.9
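The threshold rule can be sketched in Python together with one possible way of filling the predicted-load table. The slides do not say how predicted load is computed, so the estimate below is purely an assumption: instances of a VNF are taken to have equal capacity, and the current load is assumed to be redistributed over the instances that are not affected. All function names and sample data are hypothetical.

```python
def predicted_vnf_load(vnf_instances, instances_affected, vnf_load):
    """Estimate per-VNF load if the affected instances were lost (assumed model).

    vnf_instances:      set of (vnf_id, instance_id) tuples
    instances_affected: set of instance ids on alarmed hosts
    vnf_load:           dict vnf_id -> current load as a fraction of capacity
    Returns a set of (vnf_id, predicted_load) rows.
    """
    rows = set()
    for vnf_id, load in vnf_load.items():
        members = {i for (v, i) in vnf_instances if v == vnf_id}
        if not members:
            continue
        healthy = members - instances_affected
        if not healthy:
            # Every instance is affected: treat the load as unbounded.
            rows.add((vnf_id, float("inf")))
        else:
            # Same total load spread over fewer equal-capacity instances.
            rows.add((vnf_id, load * len(members) / len(healthy)))
    return rows


def scale_out(predicted_rows, threshold=0.9):
    """Mirror of the rule: select VNFs whose predicted load exceeds 0.9."""
    return {vnf_id for (vnf_id, load) in predicted_rows if load > threshold}


# Made-up sample data for illustration.
vnf_instances = {("vnf-a", "vm-1"), ("vnf-a", "vm-2"),
                 ("vnf-a", "vm-3"), ("vnf-a", "vm-4"),
                 ("vnf-b", "vm-5"), ("vnf-b", "vm-6")}
affected = {"vm-1", "vm-2"}
vnf_load = {"vnf-a": 0.6, "vnf-b": 0.4}
to_scale = scale_out(predicted_vnf_load(vnf_instances, affected, vnf_load))
```

Here `vnf-a` would lose half of its instances, which pushes its estimated load above the threshold, while `vnf-b` is untouched and stays below it.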

Demo background ● Demonstrate the interaction between services ○ Setup VNFs with Tacker ○ Configure Congress to receive Monasca webhook ○ Configure Monasca to send webhook ○ Raise Monasca Alarm ○ See result of actions triggered by Congress policy

Summary ● Fault management is complex ○ Diversity of scenarios -> Diversity of response ● Solution ○ Fine-grained monitoring ○ Contextual data ○ Expressive policy ● Congress ○ Pluggable data sources ○ Expressive policy language ○ Triggers API calls

General purpose policy triggers ● Trigger API calls based on policy+data ○ Adv. fault management policies ○ Adv. autoscaling policies ○ Generic integration glue

Feedback welcome! ● Mailing list: openstack-discuss@lists.openstack.org (use the [congress] prefix) ● Eric Kao <ekcs.openstack@gmail.com>

Q&A ● Akhil Jain <akhil.jain@india.nec.com> ● Eric Kao <ekcs.openstack@gmail.com> Thank you!

Conceptual policy dataflow [Diagram; recoverable labels: Alarms, Topology Data, VNFs Tech Data, VNFs Biz Data, Technical Impact, Business Impact, Fault Mgmt Feasibility & Risks, Fault Mgmt Decisions]
