SLIDE 1

Service Quality Management for multi-domain network services

Pavle Vuletić, AMRES eduPERT videoconference, 20 July 2015

SLIDE 2

Connect | Communicate | Collaborate

What is Service Quality Management?

Resource Performance Management (RPM): provides insight into the performance and behavior of the network and its elements (e.g. the status of an interface, the amount of traffic passing through an interface, CPU load or similar).
Service Quality Management (SQM):
  • Correlates network measurement data with service information and gives it a specific meaning.
  • Covers the processes related to SLA verification and assurance.
  • Is used to check and verify key SLA parameters and the customer's experience.
  • Is increasingly important with the development of virtualized environments, where multiple service instances share the same physical infrastructure.

SLIDE 3

SQM supporting tool – main goals

  • Support multi-domain, multi-instance (and multi-point) network services. MDVPN is the primary target, but multi-domain circuits and other multi-domain services should also be able to use this system.
  • Aim: monitor the service end-to-end, capture the user's experience of the service and verify contractual obligations (SLA): Service Quality Management (SQM).
  • Tool users: service operators, to continuously monitor the service KPIs, and knowledgeable service users, to gain insight into SLA verification.
  • Allow dynamic service paths: do not depend on the service path or on access rights to the network elements along the path.
  • Scalability. Monitoring m instances of a service end-to-end, each with n end points in multiple domains, requires m·n measurement agents in a straightforward solution. Reduce this number!
  • Simple to use (simple configuration).
  • Accuracy and reliability.
  • Whenever possible, prefer reusing/integrating existing tools over developing new components.

SLIDE 4

MDVPN, an example of multi-domain, multi-instance, multi-point service

SLIDE 5

What is specified by the SLA?

  • The service is offered to the end user, so the SLA should capture the user's demands and the expected service experience.
  • ITU-T Y.1540 and Y.1541 specify in detail the definitions of specific performance metrics and KPIs.
  • SLA parameters are packet latency, latency variation (jitter) and packet loss rate (PLR).
  • Different services and applications require specific performance metrics to be guaranteed in order to satisfy service perception. Real-time applications need guarantees on all three metrics, whereas applications such as file transfer and web browsing only need guarantees on PLR.
  • Bandwidth measurements (capacity, available bandwidth, TCP throughput and similar) are not part of the SLA. They are typically used before the service is put into production.
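
As an illustration of how the three SLA parameters could be derived from active probe results and checked per application class, here is a minimal Python sketch. The function names, the two application classes and the jitter estimate (mean absolute delay difference between consecutive packets, a simplification of the IPDV metric) are assumptions for illustration, not the Y.1540 definitions.

```python
from statistics import mean

# Metrics that must be guaranteed for each (hypothetical) application class,
# following the slide: real-time needs all three, bulk transfer only PLR.
REQUIRED_METRICS = {
    "real-time": ("delay", "jitter", "plr"),
    "bulk": ("plr",),  # file transfer, web browsing
}


def sla_metrics(sent, delays_ms):
    """Delay, jitter and PLR for one measurement interval.

    sent      -- number of probe packets sent in the interval
    delays_ms -- one-way delays of the probes that arrived, in sending order
    """
    received = len(delays_ms)
    plr = (sent - received) / sent if sent else 0.0
    avg_delay = mean(delays_ms) if delays_ms else None
    # Simple jitter estimate: mean absolute delay difference between
    # consecutive received packets (an IPDV-style approximation).
    diffs = [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]
    jitter = mean(diffs) if diffs else 0.0
    return {"delay": avg_delay, "jitter": jitter, "plr": plr}


def sla_ok(metrics, thresholds, app_class):
    """True if every metric required for app_class is within its threshold."""
    return all(
        metrics[name] is not None and metrics[name] <= thresholds[name]
        for name in REQUIRED_METRICS[app_class]
    )


m = sla_metrics(sent=5, delays_ms=[10.0, 12.0, 10.0, 12.0])
limits = {"delay": 15.0, "jitter": 1.0, "plr": 0.25}
print(m)                                # {'delay': 11.0, 'jitter': 2.0, 'plr': 0.2}
print(sla_ok(m, limits, "real-time"))   # False: jitter 2.0 ms exceeds 1.0 ms
print(sla_ok(m, limits, "bulk"))        # True: only PLR is checked
```

The same measurement interval can thus satisfy the SLA of a file-transfer service while violating that of a real-time service.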

SLIDE 6

SLA models for multi-point services

  • The SLA model depends on the type of service:
    • Point to point
    • Hub and spoke (point to multiple points)
    • Multipoint (mesh SLA)
  • MDVPN offers a very general set of services to end users, so all SLA models are applicable to MDVPN services.
  • The problem of multipoint services is SLA scalability.
  • MEF 10.2 and Y.1563 propose the use of aggregated SLA metrics for multipoint services (e.g. the average delay or jitter over all combinations of paths between the end points of a service instance).
  • However, even for the aggregated metrics to be measured properly in the multipoint case, the number of measurement agents must equal the number of end points, and the number of measurements is of the order of O(n²).
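
The O(n²) growth follows from counting one-way paths in a full mesh. A minimal Python sketch, assuming the aggregated metric is a plain (unweighted) mean over all ordered end-point pairs; MEF 10.2 and Y.1563 define their aggregations in more detail, and the function names here are hypothetical:

```python
from itertools import permutations
from statistics import mean


def measurement_paths(endpoints):
    """All ordered (source, destination) pairs of a full-mesh service instance.

    Delay and loss are one-way metrics, so A->B and B->A are separate paths:
    n end points give n * (n - 1) paths, i.e. O(n^2).
    """
    return list(permutations(endpoints, 2))


def aggregated_metric(per_path_values):
    """Aggregate SLA metric as the plain mean over all measured paths
    (one possible aggregation; the equal weighting is an assumption)."""
    return mean(per_path_values.values())


for n in (5, 10, 20):
    print(n, "end points ->", len(measurement_paths(range(n))), "paths")
# 5 end points -> 20 paths
# 10 end points -> 90 paths
# 20 end points -> 380 paths

delays = {("A", "B"): 10.0, ("B", "A"): 14.0, ("A", "C"): 9.0,
          ("C", "A"): 11.0, ("B", "C"): 12.0, ("C", "B"): 16.0}
print(aggregated_metric(delays))  # 12.0
```

Aggregation shrinks the SLA to a single number, but every one of the n·(n−1) paths still has to be measured to compute it.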

SLIDE 7

SLA verification in multi-domain environments - strategies

  • End-to-end measurements: passive and active methods are used to measure delay, jitter and packet loss between service end points. Problem: if a service has m service instances, where service instance x has $n_x$ end points, the total number of measurement agents is $N_{ma} = \sum_{x=1}^{m} n_x$.
  • Metric composition: described in several standard documents (RFC 5644, RFC 5835, RFC 6049, ITU-T Y.1541). Key measurement parameters are measured in each domain, and the end-to-end metrics are then estimated from the per-domain measurements. The total number of measurement agents in this case can be $N_{ma} = \sum_{x=1}^{d} b_x$, where $b_x$ is the number of cross-border connections of domain x and d is the number of domains.
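
The two agent counts can be compared with a short Python sketch. The topology, the instance names and the rule that every inter-domain link adds one cross-border connection to each of the two domains it joins are assumptions for illustration:

```python
from collections import Counter


def agents_end_to_end(instances):
    """N_ma = sum over instances x of n_x: one agent per service end point."""
    return sum(len(endpoints) for endpoints in instances.values())


def border_connections(inter_domain_links):
    """b_x for each domain x: every link (dom_a, dom_b) is a
    cross-border connection of both domains it joins."""
    b = Counter()
    for dom_a, dom_b in inter_domain_links:
        b[dom_a] += 1
        b[dom_b] += 1
    return b


def agents_metric_composition(inter_domain_links):
    """N_ma = sum over domains x of b_x: one agent per cross-border connection."""
    return sum(border_connections(inter_domain_links).values())


# Hypothetical topology: three domains connected in a triangle,
# two service instances spanning them.
instances = {
    "vpn1": ["A1", "B1", "C1"],
    "vpn2": ["A2", "A3", "B2", "C2"],
}
links = [("A", "B"), ("B", "C"), ("A", "C")]
print(agents_end_to_end(instances))       # 7, grows with every new end point
print(agents_metric_composition(links))   # 6, independent of the instances
```

Adding service instances or end points increases only the first count, which is why metric composition scales better.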

SLIDE 8

Metric composition - issues

This methodology is more scalable, but inherently less accurate, especially for jitter measurements [Douardo et al.], because of several issues: the measurement of the border link, double measurements on MA links, time synchronization of the measurements, etc. There are also issues with exposing per-domain data to the central device that gathers the measurements and calculates the end-to-end metrics, with in-instance measurements, and with changes in the service instance topology caused by routing table changes.

[Figure: four measurement agents (MA) in Domain A and Domain B, each domain measuring delay, jitter and loss. Callouts: "Who measures this link?" (the inter-domain link); "This link can be measured twice if the MA is not on the network element"; "What if this is measured 1 min after Domain A?"]
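
A minimal Python sketch of metric composition in the spirit of RFC 6049: the mean delays of the domains along the path are added, and loss is composed from per-domain survival probabilities under an independence assumption. The last lines illustrate, with made-up numbers, why jitter is the hardest metric to compose. Function names are hypothetical.

```python
from math import prod
from statistics import mean


def compose_mean_delay(domain_mean_delays_ms):
    """End-to-end mean delay estimated as the sum of per-domain mean delays."""
    return sum(domain_mean_delays_ms)


def compose_loss(domain_loss_ratios):
    """A packet arrives only if it survives every domain; assumes losses in
    different domains are independent."""
    return 1 - prod(1 - p for p in domain_loss_ratios)


def ipdv_jitter(delays_ms):
    """Mean absolute difference between consecutive packet delays."""
    return mean(abs(b - a) for a, b in zip(delays_ms, delays_ms[1:]))


print(compose_mean_delay([4.0, 6.5]))        # 10.5
print(round(compose_loss([0.1, 0.2]), 6))    # 0.28

# Jitter does not compose by addition: per-domain delay variations of the
# same packets can cancel out (or reinforce each other) end to end.
domain_a = [1.0, 3.0, 1.0, 3.0]
domain_b = [3.0, 1.0, 3.0, 1.0]
end_to_end = [a + b for a, b in zip(domain_a, domain_b)]
print(ipdv_jitter(domain_a) + ipdv_jitter(domain_b))  # 4.0 (composed estimate)
print(ipdv_jitter(end_to_end))                        # 0.0 (actual)
```

Composing these per-domain values also requires that both domains measured the same packets at the same time, which is exactly the synchronization issue shown in the figure.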

SLIDE 9

Measurement methodologies

  • Active and passive monitoring: delay and jitter have to be measured actively, while packet loss can be inferred from passive measurements, although this requires a complex methodology and large resources.
  • What can serve as active probes for SLA verification?
    • Separate measurement points (perfSONAR MP, RIPE Atlas probes, ...)
    • Network element features (Cisco IP SLA, Juniper RPM), which are not compatible with each other
  • MPLS VPNs do not have a standardized method for performance measurement (features such as MPLS BFD are not compatible between different vendors because of incomplete implementations).
  • The recently concluded IETF l3vpn WG aimed to propose standards for MPLS and MPLS VPN performance monitoring, as an extension of RFC 6374:
    • draft-zheng-l3vpn-pm-analysis-03 (expired), July 2014
    • draft-dong-l3vpn-pm-framework-02 (expired), January 2014
    • draft-ni-l3vpn-pm-bgp-ext-01 (expired), February 13, 2014
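
Passive loss inference typically compares packet counters at the two ends of a path, which is also the approach RFC 6374 takes for MPLS loss measurement. A minimal Python sketch, assuming 64-bit wrapping counters and counter snapshots that refer to the same set of packets. Obtaining counters scoped to a single service instance is exactly what is hard in MPLS VPNs. The function and constant names are hypothetical.

```python
COUNTER_MODULUS = 2 ** 64  # assumption: 64-bit packet counters that wrap around


def interval_loss(tx_prev, tx_now, rx_prev, rx_now, modulus=COUNTER_MODULUS):
    """Packets lost between two pairs of counter snapshots.

    tx_* are transmit counters at the ingress point and rx_* receive counters
    at the egress point of the same traffic. Both snapshots in a pair must
    refer to the same packets, otherwise the result is meaningless (it can
    even become negative).
    """
    sent = (tx_now - tx_prev) % modulus       # modulo handles counter wrap
    received = (rx_now - rx_prev) % modulus
    lost = sent - received
    ratio = lost / sent if sent else 0.0
    return lost, ratio


print(interval_loss(100, 1100, 50, 1040))               # (10, 0.01)
print(interval_loss(COUNTER_MODULUS - 10, 90, 0, 95))   # (5, 0.05)
```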

SLIDE 10

MPLS and MPLS VPN monitoring challenges

The problem is LSP aggregation, especially when PHP (penultimate hop popping) is used. The drafts propose a new concept, the "VRF-to-VRF Tunnel" (VT): each PE router allocates MPLS labels, called VT labels, that identify the VRF-to-VRF tunnel between the local VRF and each remote VRF. This functionality is likely to become a PE router feature, but it does not exist yet.
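
To make the VT idea concrete, here is a hypothetical Python sketch of the label table a PE could keep: one label per (local VRF, remote PE, remote VRF). The keying, the starting label value and the class name are assumptions. How the labels are advertised and used in the data plane is defined in the drafts and is not modelled here.

```python
from itertools import count

MAX_MPLS_LABEL = 2 ** 20 - 1  # MPLS labels are 20-bit values


class VtLabelTable:
    """Hypothetical per-PE table of VRF-to-VRF tunnel (VT) labels.

    One label per (local VRF, remote PE, remote VRF), so that traffic between
    each pair of VRFs can be told apart even when transport LSPs are
    aggregated or the transport label has been popped by PHP.
    """

    def __init__(self, first_label=100000):  # label range is an assumption
        self._next = count(first_label)
        self._labels = {}

    def label_for(self, local_vrf, remote_pe, remote_vrf):
        key = (local_vrf, remote_pe, remote_vrf)
        if key not in self._labels:
            label = next(self._next)
            if label > MAX_MPLS_LABEL:
                raise RuntimeError("VT label space exhausted")
            self._labels[key] = label
        return self._labels[key]

    def __len__(self):
        return len(self._labels)


pe1 = VtLabelTable()
print(pe1.label_for("red", "PE2", "red"))   # 100000
print(pe1.label_for("red", "PE3", "red"))   # 100001
print(pe1.label_for("red", "PE2", "red"))   # 100000 (already allocated)
print(len(pe1))                             # 2
```

The table size grows with the number of local VRFs times the number of remote VRFs, one reason why this requires new PE router functionality rather than an add-on.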

SLIDE 11

SQM - High level design decisions - 1

  • Uses a standard active measurement architecture, like perfSONAR or IETF LMAP (measurement agent + measurement collector + measurement controller).
  • Measures and monitors only the SLA parameters: delay, jitter and loss.
    • No heavyweight, intrusive and unreliable capacity/available bandwidth estimations.
  • Relies on reliable and standardized active measurements (OWAMP):
    • No dependence on the service path or on access rights to the network elements along the service instance path.
  • Accuracy: uses end-to-end measurements instead of metric composition strategies.
  • External devices are needed, as there are no interoperable solutions on network elements.
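
In an LMAP-style architecture the controller decides which agent runs which test. A minimal Python sketch of such a controller step, assuming a full mesh of one-way (OWAMP-style) tests between all agents of each service instance. The class and function names are hypothetical and are not the perfSONAR or LMAP data model.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class MeasurementTask:
    instance: str      # service instance the probes run inside
    source_agent: str
    dest_agent: str
    metrics: tuple = ("delay", "jitter", "loss")


def build_schedule(instances):
    """instances: service instance id -> ids of the measurement agents attached to it.

    Emits one one-way test per ordered agent pair inside each instance
    (end-to-end measurements, no metric composition).
    """
    tasks = []
    for inst, agents in sorted(instances.items()):
        for src, dst in permutations(agents, 2):
            tasks.append(MeasurementTask(inst, src, dst))
    return tasks


def tasks_per_agent(tasks):
    """Group tasks by the agent that has to send the probes."""
    per_agent = {}
    for task in tasks:
        per_agent.setdefault(task.source_agent, []).append(task)
    return per_agent


schedule = build_schedule({"vpn1": ["ma1", "ma2", "ma3"], "vpn2": ["ma1", "ma4"]})
print(len(schedule))                           # 8
print(len(tasks_per_agent(schedule)["ma1"]))   # 3: ma1 serves both instances
```

Because the tasks are derived only from inventory data (which agents belong to which instance), the schedule does not depend on the service path or on access to network elements along it.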

SLIDE 12

High level design decisions - 2

Scalability and simplicity:
  • Small/zero-footprint measurement agents (SBCs)
  • Measurement results are not collected on the agents
  • "One-click" configuration of measurement agents
  • Multi-homed measurement agents (one measurement agent can serve multiple service instances with overlapping address spaces)
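
Multi-homed agents mean that the same IP addresses can appear in several service instances, so results cannot be keyed by address alone. A hypothetical Python sketch of how a collector could keep them apart, assuming each agent interface (e.g. a VLAN sub-interface) belongs to exactly one service instance; the class and interface names are illustrative only.

```python
import ipaddress
from collections import defaultdict


class ResultStore:
    """Hypothetical collector-side store for results from multi-homed agents.

    Service instances may reuse the same (e.g. private) address space, so an
    IP address pair alone does not identify a path. Results are therefore
    keyed by (service instance, source address, destination address).
    """

    def __init__(self, interface_to_instance):
        # agent interface (e.g. one VLAN sub-interface per VPN) -> service instance
        self._iface_map = dict(interface_to_instance)
        self._results = defaultdict(list)

    def _key(self, instance, src, dst):
        return (instance, ipaddress.ip_address(src), ipaddress.ip_address(dst))

    def add(self, interface, src, dst, delay_ms):
        instance = self._iface_map[interface]
        self._results[self._key(instance, src, dst)].append(delay_ms)

    def series(self, instance, src, dst):
        return list(self._results.get(self._key(instance, src, dst), []))


store = ResultStore({"eth0.101": "vpn-red", "eth0.102": "vpn-blue"})
store.add("eth0.101", "10.0.0.1", "10.0.0.2", 5.0)
store.add("eth0.102", "10.0.0.1", "10.0.0.2", 9.0)  # same addresses, other VPN
print(store.series("vpn-red", "10.0.0.1", "10.0.0.2"))   # [5.0]
print(store.series("vpn-blue", "10.0.0.1", "10.0.0.2"))  # [9.0]
```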

SLIDE 13

SQM architecture and the prototype

  • Based on the IETF LMAP architecture
  • Main new components:
    • Service/SLA inventory
    • SQM processing tool
    • OWAMP-based measurement agents (not zero footprint at the moment)
  • Reporting and alarming are not part of the short-term goals

[Figure: SQM architecture. Resource Performance Management: controller, collector and measurement agents (MA). Service Quality Management: Service/SLA Inventory, User Interface, service quality reports, trouble ticket system, alarm management system.]

SLIDE 14

Service/SLA inventory

  • Stores the relevant data about service instances (both the transport service and MDVPN services)
  • Stores SLA parameters and thresholds
  • The data model was inspired by the TMF SID and is based on the MDVPN data model for service requests
  • The first implementation was an extension of perfSONAR SLS; the current version is built from scratch
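
A minimal, hypothetical Python sketch of the kind of records such an inventory could hold, based only on the entities visible in the prototype (NRENs, PE routers, measurement agents, SLA thresholds). Field and class names are assumptions and do not reproduce the TMF SID or the MDVPN data model.

```python
from dataclasses import dataclass, field


@dataclass
class SlaThresholds:
    max_delay_ms: float
    max_jitter_ms: float
    max_loss_ratio: float


@dataclass
class Endpoint:
    nren: str                 # NREN subscribed to the service
    pe_router: str            # PE router the end point is attached to
    measurement_agent: str    # agent measuring from this end point


@dataclass
class ServiceInstance:
    instance_id: str
    service_type: str         # e.g. "L2VPN" or "L3VPN"
    sla: SlaThresholds
    endpoints: list = field(default_factory=list)

    def nrens(self):
        """NRENs participating in this instance, without duplicates."""
        return sorted({e.nren for e in self.endpoints})

    def agents(self):
        """Measurement agents the controller has to configure for this instance."""
        return [e.measurement_agent for e in self.endpoints]


vpn = ServiceInstance("vpn-001", "L3VPN", SlaThresholds(30.0, 5.0, 0.001))
vpn.endpoints += [
    Endpoint("NREN-A", "pe1.nren-a.example", "ma-a1"),
    Endpoint("NREN-B", "pe1.nren-b.example", "ma-b1"),
    Endpoint("NREN-A", "pe2.nren-a.example", "ma-a2"),
]
print(vpn.nrens())   # ['NREN-A', 'NREN-B']
print(vpn.agents())  # ['ma-a1', 'ma-b1', 'ma-a2']
```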

SLIDE 15

SQM component

  • Gathers SLA data from the inventory
  • Gathers measurement data
  • Distinguishes between measurement data belonging to different service instances
  • Displays SLA data
  • Displays temporal graphs of the main SLA parameters
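
The core of the SQM component is joining measurement data with inventory data per service instance. A minimal Python sketch, assuming the collector delivers (instance, metric, value) tuples and that an SLA is violated when the mean of a metric over the evaluation interval exceeds its limit (real SLAs may use percentiles or time windows instead). The function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def evaluate(measurements, sla_limits):
    """Separate measurements by service instance and compare with SLA limits.

    measurements -- iterable of (instance_id, metric, value) tuples from the collector
    sla_limits   -- {instance_id: {metric: upper limit}} from the SLA inventory
    Returns {instance_id: sorted list of violated metrics}.
    """
    per_instance = defaultdict(lambda: defaultdict(list))
    for instance, metric, value in measurements:
        per_instance[instance][metric].append(value)

    report = {}
    for instance, limits in sla_limits.items():
        samples = per_instance.get(instance, {})
        report[instance] = sorted(
            metric for metric, limit in limits.items()
            if samples.get(metric) and mean(samples[metric]) > limit
        )
    return report


measurements = [
    ("vpn1", "delay", 10.0), ("vpn1", "delay", 30.0), ("vpn1", "loss", 0.0),
    ("vpn2", "delay", 5.0),
    ("vpn3", "delay", 99.0),  # not in the inventory: ignored
]
limits = {"vpn1": {"delay": 15.0, "loss": 0.01}, "vpn2": {"delay": 15.0, "jitter": 2.0}}
print(evaluate(measurements, limits))  # {'vpn1': ['delay'], 'vpn2': []}
```

Metrics without samples (here the jitter of vpn2) are skipped rather than reported as violations; a production tool would more likely flag them as missing data.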

SLIDE 16

SQM prototype

SLIDE 17

Prototype setup

SLIDE 18

Service inventory / configuration

SLIDE 19

SLA/Service inventory front page

SLIDE 20

Creating new service instances

SLIDE 21

List of NRENs subscribed to the service

SLIDE 22

Adding new Measurement Agent

SLIDE 23

Adding new PE Router

SLIDE 24

Configuring SLA, config files for service instances

SLIDE 25

SLIDE 26

Adding NREN contacts

SLIDE 27

SQM monitoring

SLIDE 28

SQM monitoring – front page

SLIDE 29

L2VPN

SLIDE 30

L3VPN

SLIDE 31

loss

SLIDE 32

delay

SLIDE 33

jitter

SLIDE 34

Conclusions

  • The SQM system can monitor any network service from within the service instance.
  • The system does not depend on the underlying network technology or network topology.
  • It is more scalable than the existing platforms for monitoring network service instances.
  • The architecture of the system is similar (or identical) to perfSONAR, RIPE Atlas or LMAP; it makes little sense to run two or more such similar systems in parallel.
  • Potential solution: add SQM capabilities to perfSONAR. The changes to the Esmond archive should not be large, but larger changes are required on the measurement agent/controller side.

SLIDE 35

www.geant.net

www.twitter.com/GEANTnews | www.facebook.com/GEANTnetwork | www.youtube.com/GEANTtv

Thank you!