Service Quality Management for multi-domain network services
Pavle Vuletić, AMRES
eduPERT videoconference, 20 July 2015
Connect | Communicate | Collaborate
What is Service Quality Management?
- Resource Performance Management (RPM) provides insight into the performance and behavior of the network and its elements (e.g. the status of an interface, the amount of traffic passing through an interface, CPU load or similar).
- Service Quality Management (SQM):
  - Correlates network measurement data with the service information and gives it a specific meaning.
  - Covers the processes related to SLA verification and assurance.
  - Is used to check and verify key SLA parameters and the customer's experience.
  - Is increasingly important with the development of virtualized environments, where multiple service instances share the same physical infrastructure.
SQM supporting tool – main goals
- Support multi-domain, multi-instance (and multi-point) network services. MDVPN is the primary target, but multi-domain circuits and other multi-domain services should also be able to use the system.
- Aim: monitor the service end-to-end, capture the user's experience of the service and verify contractual obligations (SLA). This is Service Quality Management (SQM).
- Tool users: service operators, who continuously monitor the KPIs of the service, and knowledgeable service users, who want insight into SLA verification.
- Allow dynamic service paths: do not depend on the service path or on access rights to network elements along the path.
- Scalability: monitoring m instances of a service end-to-end, each with n end points in multiple domains, requires m × n measurement agents in a straightforward solution. Reduce this number!
- Simple to use (simple configuration).
- Accuracy and reliability.
- Always prefer reusing or integrating existing tools over developing new components.
MDVPN, an example of multi-domain, multi-instance, multi-point service
What is specified by the SLA?
- The service is offered to the end user, so the SLA should capture the user's demands and expected service experience.
- ITU-T Y.1540 and Y.1541 define the specific performance metrics and KPIs in detail.
- SLA parameters are packet latency, latency variation (jitter) and packet loss rate (PLR).
- Different services and applications need guarantees for different performance metrics in order to satisfy service perception. Real-time applications demand guarantees for all three metrics, whereas applications like file transfer and web browsing only need guarantees for PLR.
- Bandwidth measurements (capacity, available bandwidth, TCP throughput and similar) are not part of the SLA. They are typically used before the service is put into production.
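As a rough illustration of how the three SLA parameters could be derived from active probe results, the sketch below computes mean one-way delay, jitter and PLR from a list of send/receive timestamps. The function name and the input layout are hypothetical, and jitter is taken here as the mean absolute difference between consecutive one-way delays, which is only one of several definitions in use.

```python
from statistics import mean

def sla_metrics(probes):
    """probes: list of (sent_ts, received_ts) in seconds; received_ts is None for a lost packet."""
    delays = [rx - tx for tx, rx in probes if rx is not None]
    lost = sum(1 for _, rx in probes if rx is None)
    plr = lost / len(probes) if probes else 0.0
    mean_delay = mean(delays) if delays else None
    # Simplification: differences are taken between consecutive *received* probes,
    # so a lost packet is simply skipped over.
    diffs = [abs(b - a) for a, b in zip(delays, delays[1:])]
    jitter = mean(diffs) if diffs else 0.0
    return {"delay": mean_delay, "jitter": jitter, "plr": plr}

# Made-up sample: four probes, the third one lost.
example = [(0.0, 0.010), (1.0, 1.012), (2.0, None), (3.0, 3.011)]
metrics = sla_metrics(example)
```

The resulting values would then be compared with the thresholds agreed in the SLA.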
SLA models for multi-point services
- The SLA model depends on the type of the service:
  - Point to point
  - Hub and spoke (point to multiple points)
  - Multipoint (mesh SLA)
- MDVPN offers a very general set of services for end users, so all SLA models are applicable to MDVPN services.
- Problem with multipoint services: SLA scalability.
- MEF 10.2 and Y.1563 propose the use of aggregated SLA metrics for multipoint services (e.g. the average delay or jitter over all combinations of paths between service instance end points).
- However, even to measure the aggregated metrics properly in the multipoint case, the number of measurement agents equals the number of end points n, and the number of measurements is on the order of O(n²).
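The quadratic growth can be seen in a small sketch that enumerates every ordered pair of end points for one-way measurements and then forms an aggregated average delay, in the spirit of the aggregated metrics mentioned above. The helper names and the delay values are invented for illustration.

```python
from itertools import permutations
from statistics import mean

def measurement_pairs(endpoints):
    """All ordered (source, destination) pairs needed for one-way measurements."""
    return list(permutations(endpoints, 2))

def aggregated_delay(delay_by_pair):
    """Aggregated metric: average delay over all measured paths."""
    return mean(delay_by_pair.values())

endpoints = ["A", "B", "C", "D"]
pairs = measurement_pairs(endpoints)  # n * (n - 1) pairs for n end points
delays_ms = {pair: 10.0 + i for i, pair in enumerate(pairs)}  # made-up values
avg = aggregated_delay(delays_ms)
```

With four end points there are already twelve one-way paths to measure, while only four measurement agents are needed.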
SLA verification in multi-domain environments - strategies
- End-to-end measurements: use passive and active methods to measure delay, jitter and packet loss between service end points.
  - Problem: if the service has m service instances, where service instance x has $n_x$ end points, the total number of measurement agents is $N_{ma} = \sum_{x=1}^{m} n_x$.
- Metric composition: described in several standards documents (RFC 5644, RFC 5835, RFC 6049, ITU-T Y.1541).
  - Key measurement parameters are measured in each domain, and the end-to-end metrics are then estimated from the per-domain measurements.
  - The total number of measurement agents in this case can be $N_{ma} = \sum_{x=1}^{d} b_x$, where $b_x$ is the number of cross-border connections of domain x and d is the number of domains.
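The two agent counts can be compared with a few lines of arithmetic: one agent per end point of every service instance for end-to-end measurements, versus one agent per cross-border connection of every domain for metric composition. The numbers below are invented, and the function names are hypothetical.

```python
def agents_end_to_end(endpoints_per_instance):
    """Sum of end points over all service instances."""
    return sum(endpoints_per_instance)

def agents_metric_composition(border_links_per_domain):
    """Sum of cross-border connections over all domains."""
    return sum(border_links_per_domain)

n_x = [4, 6, 3, 5, 2]  # m = 5 service instances (made-up end point counts)
b_x = [2, 3, 2]        # d = 3 domains (made-up cross-border connection counts)

e2e_agents = agents_end_to_end(n_x)
composition_agents = agents_metric_composition(b_x)
```

The end-to-end count grows with every new service instance, while the composition count depends only on the domain topology.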
Metric composition - issues
- This methodology is more scalable, but inherently less accurate, especially for jitter measurements [Douardo et al.], because of issues such as:
  - the measurement of the border link,
  - double measurements on MA links,
  - time synchronization of the measurements.
- There are also issues with exposing per-domain data to the central device that gathers the measurements and performs the calculation.
- Further issues: in-instance measurements and changes in service instance topology caused by routing table changes.
[Figure: measurement agents (MA) in Domain A and Domain B, each domain measuring delay, jitter and loss. Annotations: "Who measures this link?"; "This link can be measured twice if MA is not on the network element"; "What if this is measured 1 min after Domain A?"]
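A minimal sketch of metric composition, assuming that per-segment mean delays simply add up and that losses on different segments are independent. Both are simplifying assumptions, and the segment values are invented. Jitter is deliberately left out, because it has no equally simple composition rule.

```python
from math import prod

def compose_delay(per_segment_delays_ms):
    """End-to-end mean delay estimated as the sum of per-segment mean delays."""
    return sum(per_segment_delays_ms)

def compose_loss(per_segment_loss):
    """End-to-end loss probability, assuming independent loss on each segment."""
    return 1.0 - prod(1.0 - p for p in per_segment_loss)

# Made-up per-segment results; the border link must be measured by someone.
segments = [
    {"name": "Domain A", "delay_ms": 4.0, "loss": 0.01},
    {"name": "border link", "delay_ms": 1.0, "loss": 0.0},
    {"name": "Domain B", "delay_ms": 6.0, "loss": 0.02},
]
e2e_delay = compose_delay(s["delay_ms"] for s in segments)
e2e_loss = compose_loss(s["loss"] for s in segments)
```

If the border link is missed or counted twice, or the per-domain values come from different time windows, the composed estimate drifts away from what an end-to-end probe would report.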
Measurement methodologies
- Active and passive monitoring: delay and jitter have to be measured actively, while packet loss can be inferred from passive measurements, although this requires a complex methodology and large resources.
- What can serve as active probes for SLA verification?
  - Separate measurement points (perfSONAR MP, RIPE Atlas probes, ...)
  - Network element features (Cisco IP SLA, Juniper RPM), which are not compatible with each other
- MPLS VPNs do not have a standardized method for performance measurement (features like MPLS BFD are not compatible between different vendors because of incomplete implementations).
- The recently concluded IETF l3vpn WG aimed to propose standards for MPLS and MPLS VPN performance monitoring as an extension of RFC 6374:
  - draft-zheng-l3vpn-pm-analysis-03 (expired), July 2014
  - draft-dong-l3vpn-pm-framework-02 (expired), January 2014
  - draft-ni-l3vpn-pm-bgp-ext-01 (expired), February 13, 2014
MPLS and MPLS VPN monitoring challenges
- The problem is LSP aggregation, especially when penultimate hop popping (PHP) is used.
- The drafts propose a new concept, the "VRF-to-VRF Tunnel" (VT). In this concept, each PE router needs to allocate MPLS labels (called VT labels) to identify the VRF-to-VRF tunnel between the local VRF and the remote VRFs.
- The functionality being developed is likely to become a feature of PE routers, but it does not exist yet.
SQM - High level design decisions - 1
- Uses a standard active measurement architecture, like perfSONAR or IETF LMAP (measurement agent + measurement collector + measurement controller).
- Measures and monitors only the SLA parameters: delay, jitter, loss.
  - No heavyweight, intrusive and unreliable capacity or available bandwidth estimations.
- Relies on reliable and standardized active measurements (OWAMP):
  - No dependence on the service path or on network element access rights along the service instance path.
- Accuracy: uses end-to-end measurements instead of metric composition strategies.
- External devices are needed, as there are no interoperable solutions on network elements.
High level design decisions - 2
- Scalability and simplicity:
  - Small/zero footprint measurement agents (SBCs)
  - Measurement results are not collected on agents
  - "One-click" configuration of measurement agents
  - Multi-homing measurement agents (one measurement agent can serve multiple service instances with overlapping address spaces)
SQM architecture and the prototype
- Based on the IETF LMAP architecture.
- Main new components:
  - Service/SLA inventory
  - SQM processing tool
  - OWAMP-based measurement agents (not zero footprint at the moment)
- Reporting and alarming are not part of the short-term goals.
[Figure: SQM architecture. Labeled components: controller, collector, measurement agents (MA), Resource Performance Management, Service Quality Management, Service/SLA Inventory, trouble ticket system, service quality reports, user interface, alarm management system.]
Service/SLA inventory
- Stores the relevant data about service instances (both transport service and MDVPN services).
- Stores SLA parameters and thresholds.
- The data model uses TMF SID as inspiration.
- Based on the MDVPN data model for service requests.
- The first implementation was an extension of perfSONAR SLS; the current version is built from scratch.
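To make the role of the inventory more concrete, here is a minimal sketch of what such a data model might look like. It is not the actual SQM schema: every class and field name below is hypothetical, chosen only to show service instances stored together with their SLA thresholds and measurement agents.

```python
from dataclasses import dataclass, field

@dataclass
class SlaThresholds:
    # Hypothetical threshold fields for the three SLA parameters.
    max_delay_ms: float
    max_jitter_ms: float
    max_loss_rate: float

@dataclass
class ServiceInstance:
    instance_id: str
    service_type: str  # e.g. "L2VPN" or "L3VPN"
    sla: SlaThresholds
    agents: list = field(default_factory=list)  # measurement agent identifiers

@dataclass
class Inventory:
    instances: dict = field(default_factory=dict)

    def add(self, instance):
        self.instances[instance.instance_id] = instance

    def sla_for(self, instance_id):
        return self.instances[instance_id].sla

inventory = Inventory()
inventory.add(
    ServiceInstance("vpn-1", "L3VPN", SlaThresholds(50.0, 5.0, 0.001), ["ma-a", "ma-b"])
)
```

A processing component could then look up the thresholds for a service instance with `inventory.sla_for("vpn-1")`.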
SQM component
- Gathers SLA data from the inventory.
- Gathers measurement data.
- Distinguishes between measurement data belonging to different service instances.
- Displays SLA data.
- Displays temporal graphs of the main SLA parameters.
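The processing steps listed here could be sketched roughly as follows: measurement records are grouped by service instance, averaged, and compared with the SLA thresholds of that instance. The record layout, names and values are assumptions made for illustration, not the format used by the prototype.

```python
from collections import defaultdict
from statistics import mean

METRICS = ("delay_ms", "jitter_ms", "loss")

def group_by_instance(measurements):
    """Separate measurement records by the service instance they belong to."""
    grouped = defaultdict(list)
    for record in measurements:
        grouped[record["instance"]].append(record)
    return grouped

def check_sla(measurements, thresholds):
    """Per instance and metric: True if the average stays within the threshold."""
    report = {}
    for instance, rows in group_by_instance(measurements).items():
        limits = thresholds[instance]
        averages = {k: mean(r[k] for r in rows) for k in METRICS}
        report[instance] = {k: averages[k] <= limits[k] for k in METRICS}
    return report

# Made-up thresholds and measurement records.
thresholds = {
    "vpn-1": {"delay_ms": 50.0, "jitter_ms": 5.0, "loss": 0.001},
    "vpn-2": {"delay_ms": 20.0, "jitter_ms": 2.0, "loss": 0.001},
}
measurements = [
    {"instance": "vpn-1", "delay_ms": 30.0, "jitter_ms": 1.0, "loss": 0.0},
    {"instance": "vpn-1", "delay_ms": 40.0, "jitter_ms": 2.0, "loss": 0.0},
    {"instance": "vpn-2", "delay_ms": 25.0, "jitter_ms": 1.0, "loss": 0.0},
]
report = check_sla(measurements, thresholds)
```

In this made-up data set, "vpn-2" exceeds its delay threshold while "vpn-1" stays within all three limits.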
SQM prototype
Prototype setup
Service inventory / configuration
SLA/Service inventory front page
Creating new service instances
List of NRENs subscribed to the service
Adding new Measurement Agent
Adding new PE Router
Configuring SLA, config files for service instances
Adding NREN contacts
SQM monitoring
SQM monitoring – front page
L2VPN
L3VPN
loss
delay
jitter
Conclusions
- The SQM system can monitor any network service from within the service instance.
- The system does not depend on the underlying network technology or network topology.
- It is more scalable than the existing platforms for monitoring network service instances.
- The architecture of the system is similar to (or the same as) perfSONAR, RIPE Atlas or LMAP, so there is not much sense in having two or more such similar systems in parallel.
- Potential solution: add SQM capabilities to perfSONAR. Changes to the Esmond archive should not be large, but larger changes are required on the measurement agent/controller side.
www.geant.net
www.twitter.com/GEANTnews | www.facebook.com/GEANTnetwork | www.youtube.com/GEANTtv