
Policy-Driven Fault Management for NFV Eco System, Akhil Jain (NEC)

  1. Policy-Driven Fault Management for NFV Eco System (April 2019)
     Akhil Jain (NEC) <akhil.jain@india.nec.com>
     Eric Kao (VMware) <ekcs.openstack@gmail.com>

  2. Definitions
     ● Network Function (NF): a functional building block in a network
       ○ packet inspection, CDNs, virus scanners, ...
     ● Network Function Virtualization (NFV): realizing NFs as virtual appliances
     ● Virtual Network Function (VNF): a network function realized as a virtual appliance

  3. Fault Management
     ● Basic fault recovery is standard
     ● Complexities beyond the standard cases:
       ○ Diversity of fault scenarios
       ○ Diversity of VNFs
       ○ Each combination may call for a different fault management response

  4. Fault Scenarios
     ● Sequence of fault signals over time
     ● Isolated vs widespread
     ● Existing or predicted
     ● Fault types
       ○ Hard failure
       ○ Stability
       ○ Degraded performance
     ● Fault domains
       ○ Networking, Host, Storage, Application, etc.

  5. Context
     ● Current & anticipated loads
     ● VNF capacity
     ● Physical infra capacity
     ● Example considerations:
       ○ If load << VNF capacity, ignore certain fault prediction signals
       ○ If load ~= VNF capacity, preemptively scale out
         ■ When physical infra is limited, may need to scale in a less loaded or less critical VNF to make room
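
The considerations on the Context slide can be sketched as a small decision function. This is an illustration only, not code from the talk: the `Vnf` fields, the `decide` function, the action names, and the single `near_capacity` threshold (standing in for the slide's "<<" and "~=") are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Vnf:
    name: str
    load: float        # current or anticipated load, same unit as capacity
    capacity: float    # load the VNF can serve at its current scale
    criticality: int   # hypothetical scale: higher means more critical


def decide(vnf, others, free_infra_slots, near_capacity=0.9):
    """Return (action, vnf_name) tuples for a fault prediction signal on `vnf`."""
    utilization = vnf.load / vnf.capacity
    if utilization < near_capacity:
        # Load is well below capacity: the prediction signal can be ignored.
        return [("ignore", vnf.name)]
    if free_infra_slots > 0:
        # Load is close to capacity and there is room: scale out preemptively.
        return [("scale_out", vnf.name)]
    # Infra is full: look for a less critical or less loaded VNF to scale in.
    donors = [o for o in others
              if o.criticality < vnf.criticality
              or o.load / o.capacity < utilization]
    if not donors:
        return [("no_capacity", vnf.name)]
    donor = min(donors, key=lambda o: (o.criticality, o.load / o.capacity))
    return [("scale_in", donor.name), ("scale_out", vnf.name)]
```

For example, a firewall VNF at 95% utilization on a full infrastructure would first trigger a scale-in of a lightly loaded, less critical VNF, then its own scale-out.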

  6. VNF characteristics
     ● Stateful vs stateless
     ● Monolithic vs microservices
     ● Interactions, topology, service function chaining
     ● SLAs
     ● Business/user impact

  7. Solution: Policy-driven fault management
     ● Fine-grained monitoring & alarming
       ○ Monasca, Prometheus, ...
     ● Rich context
       ○ Infra managers: Nova, Kubernetes, ...
       ○ NFV orchestrators: Tacker, ONAP, ...
       ○ Application-level statistics: load, latency, throughput
       ○ Arbitrary data sources
     ● Expressive policy framework
       ○ Congress

  8. [Diagram. Labels: Alarm Services (webhook), Contextual Data (data), Congress Policy Service, Fault Management Policies, Infra Managers (action), Orchestrators (action)]

  9. Congress Architecture
     ● Data
       ○ Get data from webhooks and APIs
       ○ Store data as tables and JSON
     ● Policy
       ○ Datalog/SQL rules transform data into decisions
     ● Action
       ○ Decisions can trigger API calls
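
The data, policy, action split can be illustrated with a toy pipeline. This is a conceptual sketch, not Congress's implementation or API: `run_policy`, the table layout, and the request string returned by the action are hypothetical names chosen for the example.

```python
def run_policy(tables, rules, actions):
    """Evaluate each rule over the tables, then fire its action per decision."""
    fired = []
    for name, rule in rules.items():
        for decision in rule(tables):
            fired.append(actions[name](decision))
    return fired


# Data: rows as they might arrive from a webhook or an API poll (made-up values).
tables = {
    "alarms": [
        {"host": "host-1", "state": "ALARM"},
        {"host": "host-2", "state": "OK"},
    ],
}

# Policy: a rule turns table rows into decisions.
rules = {
    "hosts_alarmed": lambda t: [r["host"] for r in t["alarms"] if r["state"] == "ALARM"],
}

# Action: each decision triggers an API call; here we only describe the call.
actions = {
    "hosts_alarmed": lambda host: f"POST /hypothetical/evacuate/{host}",
}

result = run_policy(tables, rules, actions)
```

In Congress itself the rule would be written declaratively in Datalog rather than as a Python function. The sketch only shows how the three stages hand data to one another.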

  10. Advantages
      ● Extensible
        ○ Arbitrary sources of data as needed by use case
      ● Expressive
        ○ Not limited by a fixed vocabulary or set of properties
      ● Declarative
        ○ Well-understood declarative language for expressing clear and manageable policies
        ○ Avoids procedural code

  11. Example: preemptive scale out policy
      ● Predictive fault signal
      ● Possible response:
        ○ Ignore
          ■ failure occurs
          ■ instances go down
          ■ load increases
          ■ autoscaling policy adjusts
      ● Drawback:
        ○ Degraded service for a time

  12. Example: preemptive scale out policy
      ● Estimate service disruption/degradation
      ● Preemptively scale out as appropriate
      ● Minimize risk of degraded service

  13. Example: preemptive scale out policy
      [Diagram: Alarms on hosts, Instances data]

  14. Example: preemptive scale out policy
      [Diagram: Alarms on hosts + Instances data -> Instances affected]

  15. Example: preemptive scale out policy
      [Diagram: Instances affected + VNFs data -> VNFs affected]

  16. Example: preemptive scale out policy
      [Diagram: VNFs affected + VNFs load data -> predicted load]

  17. Example: preemptive scale out policy
      [Diagram: predicted load -> scale out decisions]
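
The slides do not say how predicted load is derived from the affected VNFs and the load data. A minimal sketch under a loudly stated assumption: every instance of a VNF contributes equal capacity, and predicted load is current load divided by the capacity of the instances not on alarmed hosts. The function name and all input shapes are hypothetical.

```python
import math


def predicted_vnf_load(vnf_instances, vnf_load, instance_capacity, instances_affected):
    """Predict the load ratio of each affected VNF if its affected instances fail.

    ASSUMPTION (not from the slides): equal capacity per instance, and
    predicted load = current load / capacity of the unaffected instances.
    """
    predictions = {}
    for vnf_id, instances in vnf_instances.items():
        healthy = [i for i in instances if i not in instances_affected]
        if len(healthy) == len(instances):
            continue  # VNF is not affected by any alarmed host
        remaining = len(healthy) * instance_capacity[vnf_id]
        predictions[vnf_id] = math.inf if remaining == 0 else vnf_load[vnf_id] / remaining
    return predictions
```

With four instances of 100 units each, a load of 240, and two instances on alarmed hosts, the predicted ratio is 240 / 200 = 1.2, which would exceed the 0.9 threshold used on slide 19.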

  18. Example: preemptive scale out policy
      Alarms on hosts + Instances data -> Instances affected

          instances_affected(instance_id) :-
              hosts_alarmed(alarmed_host),
              nova:servers(server_id=instance_id, host_name=alarmed_host)
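
Read as a join, the rule on slide 18 selects every server whose host appears in the alarmed-hosts table. A rough Python equivalent, using the column names `server_id` and `host_name` from the rule and made-up rows in place of real Nova data:

```python
def instances_affected(hosts_alarmed, servers):
    """Join alarmed hosts with server rows on host name; return affected server IDs."""
    alarmed = set(hosts_alarmed)
    return {s["server_id"] for s in servers if s["host_name"] in alarmed}


# Hypothetical sample rows, shaped like the columns named in the rule.
servers = [
    {"server_id": "i-1", "host_name": "host-1"},
    {"server_id": "i-2", "host_name": "host-1"},
    {"server_id": "i-3", "host_name": "host-2"},
]
affected = instances_affected(["host-1"], servers)
```

The Datalog form states only the relationship between the tables. The iteration that the Python version spells out is left to the policy engine.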

  19. Example: preemptive scale out policy
      predicted load -> scale out decisions

          scale_out(vnf_id) :-
              predicted_VNF_load(vnf_id, predicted_load),
              predicted_load > 0.9
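
The rule on slide 19 is a filter over the predicted-load table. As a sketch, with a dictionary standing in for the `predicted_VNF_load` table and hypothetical VNF IDs:

```python
def scale_out(predicted_vnf_load, threshold=0.9):
    """Return the VNF IDs whose predicted load is strictly above the threshold."""
    return sorted(v for v, load in predicted_vnf_load.items() if load > threshold)


decisions = scale_out({"vnf-a": 1.2, "vnf-b": 0.25, "vnf-c": 0.9})
```

The comparison is strict, as in the rule, so a VNF predicted at exactly 0.9 is not selected.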

  20. Demo background
      ● Demonstrate the interaction between services
        ○ Set up VNFs with Tacker
        ○ Configure Congress to receive the Monasca webhook
        ○ Configure Monasca to send the webhook
        ○ Raise a Monasca alarm
        ○ See the result of actions triggered by Congress policy

  21. Summary
      ● Fault management is complex
        ○ Diversity of scenarios -> diversity of responses
      ● Solution
        ○ Fine-grained monitoring
        ○ Contextual data
        ○ Expressive policy
      ● Congress
        ○ Pluggable data sources
        ○ Expressive policy language
        ○ Triggers API calls

  22. General purpose policy triggers
      ● Trigger API calls based on policy + data
        ○ Advanced fault management policies
        ○ Advanced autoscaling policies
        ○ Generic integration glue

  23. Feedback welcome!
      ● Mailing list (use the [congress] prefix): openstack-discuss@lists.openstack.org
      ● Eric Kao <ekcs.openstack@gmail.com>

  24. Q&A. Thank you!
      ● Akhil Jain <akhil.jain@india.nec.com>
      ● Eric Kao <ekcs.openstack@gmail.com>

  25. Conceptual policy dataflow
      [Diagram. Labels: Alarms, Topology Data, Technical Impact, VNFs Biz Data, Business Impact, VNFs Tech Data, Fault Mgmt Feasibility & Risks, Fault Mgmt Decisions]
