Ethernet OAM. Victor Olifer (JANET/GEANT JRA1 Task 1). JRA1/TERENA workshop, Copenhagen, 20 November 2012


SLIDE 1

connect • communicate • collaborate

Ethernet OAM

Victor Olifer (JANET/GEANT JRA1 Task 1)
JRA1/TERENA workshop, Copenhagen, 20 November 2012

SLIDE 2

Agenda

Ethernet Service Assurance & Monitoring overview

  • Monitoring standards
  • Service assurance standards

Service assurance lab trials

CFM/Y.1731 trial

  • Multi-domain testbed
  • OAM agent boxes
  • CyPortal

JRA1 & JRA2 trial (Year 4 extension)

  • Multi-segment connections
  • Diverse equipment
  • perfSONAR extensions

SLIDE 3

Wide-area point-to-point Ethernet connections

Where can we find such connections?

  • GEANT Plus, JANET Lightpath: demand comes from big projects and large scientific centres
  • Inter-router connections
  • Offers from commercial providers: they had 20% revenue growth in 2010 over 2009. Mobile backhaul and multi-site corporates are the major users; the reasons are price and flexibility
  • New demand for academic providers might arise from areas such as cloud services, data centres, HD videoconferencing and multi-site university connections

Multi-segment multi-domain connection with:

  • Ethernet UNI (a must);
  • segments of pure Ethernet (optional);
  • segments where Ethernet is tunnelled over some other technology, e.g. TDM (SDH, OTN) or MPLS (optional)

[Diagram labels: Ethernet; Ethernet over MPLS; Ethernet over Transport]

SLIDE 4

Problems with managing Ethernet connections

  • Until recently Ethernet had no OAM tools (hence the cheapest equipment), so there was no way to check, monitor and troubleshoot connectivity and performance end-to-end (a customer view) or within a domain (a provider view).
  • Compared with the IP experience: no ping, traceroute or ICMP diagnostic messages are available.
  • Partial solution: we can use MPLS or SDH/OTN OAM to manage tunnels.
  • Good news: Ethernet OAM functions have been developed and implemented in equipment since 2007-8.
  • Bad news: we (JANET) don't have much experience in using Ethernet OAM. The situation is the same in other NRENs (as far as I know from GEANT3 participants).

SLIDE 5

Three areas of emerging Ethernet OAM standards

Service assurance

  • Checks whether a connection performs to its specs, e.g. up to CIR and EIR, after service configuration and activation

Service monitoring

  • Periodic checks of connectivity (continuity) and performance (delay, loss, throughput, availability)

Service troubleshooting

  • When monitoring shows a fault, one needs to locate the faulty point along the path and the possible reason(s) for the failure

SLIDE 6

Service Assurance (1)

  • 1. Service definitions (topology, e.g. point-to-point; bandwidth profile: CIR, EIR for several CoS):
      • MEF 10.2
      • ITU-T G.8011

Very important, as it is often a cause of confusion: e.g. CIR might be measured for UDP payload or for Ethernet frames, which gives very different figures for the same data flow

  • 2. Service performance parameters (delay, loss, throughput, availability):
      • MEF 10.2.1
      • Y.1563
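The CIR ambiguity noted on this slide can be made concrete with a little arithmetic. The sketch below assumes UDP over IPv4 without options, untagged Ethernet II frames and the 64-byte minimum frame size; the function names are illustrative only.

```python
# Per-packet overheads in bytes (assumptions: IPv4 without options, untagged Ethernet II).
UDP_HEADER = 8
IPV4_HEADER = 20
ETH_HEADER_AND_FCS = 14 + 4      # MAC header + frame check sequence
PREAMBLE_AND_IFG = 8 + 12        # preamble/SFD + inter-frame gap (line overhead)
MIN_FRAME = 64                   # minimum Ethernet frame size, FCS included


def frame_bytes(udp_payload: int) -> int:
    """Ethernet frame size (header to FCS) carrying the given UDP payload."""
    return max(MIN_FRAME, udp_payload + UDP_HEADER + IPV4_HEADER + ETH_HEADER_AND_FCS)


def rates_mbps(udp_payload: int, payload_rate_mbps: float) -> dict:
    """Rate of the same flow expressed at three measurement points."""
    frame = frame_bytes(udp_payload)
    return {
        "udp_payload": payload_rate_mbps,
        "ethernet_frames": payload_rate_mbps * frame / udp_payload,
        "line_rate": payload_rate_mbps * (frame + PREAMBLE_AND_IFG) / udp_payload,
    }


if __name__ == "__main__":
    for size in (64, 512, 1472):
        print(size, rates_mbps(size, 100.0))
```

Under these assumptions, 100 Mbit/s of 64-byte UDP payloads corresponds to roughly 172 Mbit/s of Ethernet frames, while for 1472-byte payloads the difference is only about 3%.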

SLIDE 7

Service Assurance (2)

  • 3. Service verification

Relatively new (summer 2011) ITU-T spec Y.1564, "Ethernet service activation test methodology"

  • Defines a simple, disruptive, on-demand procedure that tests connectivity and throughput up to CIR, EIR and the policing limit by injecting traffic into a connection
  • More suitable for Ethernet than the complex and IP-centric RFC 2544; implemented in many traffic generators and boxes

SLIDE 8

Service Assurance trials

JANET lab trial of the SunRise RxT tester

Positive impression: it works according to the standard and looks worth trying in wide-area tests

[Diagram labels: CIR Box PIR=CIR+EIR Tester PIR]

Just one problem: Y.1564 doesn't give an opportunity to detect the situation when the real PIR value is set lower than expected (not a box bug, just the intention of the standard)
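One possible reading of the limitation noted above, as a hedged sketch: assume the CIR+EIR test step only requires the received rate to lie between CIR and CIR+EIR, because excess traffic is not guaranteed. This pass rule is a simplification adopted for illustration, not a quotation from Y.1564; the idealised policer and all names are hypothetical.

```python
def policer_output(offered_mbps: float, pir_mbps: float) -> float:
    """Idealised policer: forwards at most pir_mbps and drops the rest."""
    return min(offered_mbps, pir_mbps)


def eir_step_passes(rx_mbps: float, cir_mbps: float, eir_mbps: float) -> bool:
    """Assumed pass rule: CIR <= received rate <= CIR + EIR."""
    return cir_mbps <= rx_mbps <= cir_mbps + eir_mbps


if __name__ == "__main__":
    cir, eir = 100.0, 50.0               # contracted profile, so expected PIR = 150
    offered = cir + eir                  # tester injects CIR + EIR
    for actual_pir in (150.0, 120.0):    # correct vs. too-low policer setting
        rx = policer_output(offered, actual_pir)
        print(actual_pir, rx, eir_step_passes(rx, cir, eir))
```

Under this assumed rule both policer settings pass the step, which is consistent with the behaviour described on the slide.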

SLIDE 9

Service Monitoring

Maintenance session hierarchy (from the diagram):

  • Customer maintenance session: level 7
  • Service provider maintenance session: level 5
  • Operator maintenance sessions: level 3

  • IEEE 802.1ag Connectivity Fault Management (CFM) (ratified in 2007):
      • Hierarchical sessions of heartbeat messages (Continuity Check Messages, CCM) -> up/down status check
      • VLAN-aware
      • MEP (End) and MIP (Intermediate) maintenance points
  • ITU-T Y.1731 (ratified in 2008):
      • Same as CFM + performance monitoring (delay, loss, throughput)
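The up/down check based on CCM heartbeats can be sketched as a simple timeout. IEEE 802.1ag declares loss of continuity when no CCM has arrived from a remote MEP for 3.5 times the CCM interval; the class, method names and MEP ID below are illustrative and not taken from any implementation.

```python
from dataclasses import dataclass, field

CCM_LOSS_MULTIPLIER = 3.5  # 802.1ag: loss of continuity after 3.5 CCM intervals


@dataclass
class RemoteMepTracker:
    """Tracks the last CCM arrival time per remote MEP ID (times in seconds)."""
    ccm_interval: float
    last_seen: dict = field(default_factory=dict)

    def on_ccm(self, mep_id: int, now: float) -> None:
        """Record the arrival of a CCM from the given remote MEP."""
        self.last_seen[mep_id] = now

    def is_up(self, mep_id: int, now: float) -> bool:
        """True while CCMs keep arriving within the loss threshold."""
        seen = self.last_seen.get(mep_id)
        if seen is None:
            return False  # never heard from this MEP
        return (now - seen) < CCM_LOSS_MULTIPLIER * self.ccm_interval


if __name__ == "__main__":
    tracker = RemoteMepTracker(ccm_interval=1.0)
    tracker.on_ccm(mep_id=134, now=0.0)
    print(tracker.is_up(134, now=3.0))   # within 3.5 s of the last CCM
    print(tracker.is_up(134, now=4.0))   # more than 3.5 s of silence
```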

SLIDE 10

Service Troubleshooting

  • CFM:
      • Linktrace (analogous to IP traceroute)
      • Loopback (analogous to IP ping)
      • RDI (Remote Defect Indication)
  • Y.1731:
      • same as CFM + a richer set of diagnostic messages + performance monitoring (loss, delay, throughput):
      • Alarm Indication Signal (AIS)
      • Lock Signal

SLIDE 11

Service monitoring trials

JRA1 Task 1 Ethernet OAM trial (2011):

  • 5 NRENs, 5 connections monitored for 6 months
  • Small Y.1731 agent boxes from Overture
  • CyPortal from Cyan Optics for storing and visualising monitoring data

Positive results, but only for single-segment connections

Combined JRA1 Task 1 & JRA2 Task 3 Service Assurance & Monitoring trial, GN3 Year 4 (2012-2013): ongoing

SLIDE 12

JRA 1 Ethernet OAM trial (2011)

Objectives

  • Test CFM/Y.1731 functions in a multi-domain and multi-vendor environment (5 connections)
  • Evaluate Y.1731 agent boxes
  • Evaluate the OAM data visualisation system (CyPortal)

[Diagram labels: NORDUnet; Essex Uni; JANET LH; SURFnet; CESNET; PIONIER (PSNC); Cloud service; Cyan OAM portal; Collector; OAM data from Collector. Legend: equipment under test; OAM agent (Overture ISG24); monitored VLAN connections]

SLIDE 13

OAM agent options

Dedicated extra network switch with advanced OAM capabilities

  • Pros: uniform, rich OAM functionality and a consistent source of monitoring data
  • Cons: overheads of extra boxes (added complexity, cost, especially for high-speed links, maintenance etc.)

OAM capabilities of existing network boxes: routers, switches, muxes

  • Pros: no extra equipment, ability to test internal segments
  • Cons: some vendor-specific features, e.g. in CFM MIBs; a diverse environment with possible incompatibilities

Software OAM agent on a dedicated server (e.g. 'dot1ag-utils', developed by SARA and presented by Ronald van der Pol at NORDUnet 2011)

  • Pros: end users can ping and trace network elements; no switches needed
  • Cons: currently limited to Down MEP functionality; performance depends on the server's performance; time precision might be an issue

SLIDE 14

ISG24 OAM agent box trial

  • Compact 4-port GE demarcation box, low cost (~$1000)
  • 2 copper GE and 2 SFP ports (there is a 10GE version)
  • Web GUI
  • OAM functions:
      • CFM
      • Y.1731 D(elay)MM and L(oss)M
      • RFC 2544
      • PAA: proprietary analogue of Y.1731
      • Ethernet in the First Mile (IEEE 802.3ah)

SLIDE 15

ISG24 CCM (continuity) tests

  • Positive results: the Up/Down state of all 5 connections was properly detected by permanent monitoring over 6 months
  • Compact web form [screenshot]
  • Detailed web form [screenshot]

SLIDE 16

ISG24 DMM (performance) tests

  • Mostly positive results: CFM and PAA delay measurement sessions showed stable one-way and two-way delay and jitter results, close to those expected from other sources
  • We experienced some problems with CFM one-way delay measurements on two connections (discussed later, after the CyPortal slides)

[Screenshots: Janet – NORDUnet PAA results; PSNC – CESNET CFM DMM results]

SLIDE 17

CyPortal: monitoring data storage and visualisation

  • Detailed monitoring data are collected from ISG24 agent boxes and stored in a cloud-based database
  • Web GUI provides a map of all services; those whose current parameters violate the SLD are shown in red

SLIDE 18

CyPortal: per-service data

  • Historical graphical presentation of all monitored parameters
  • Zooming into a selected time period
  • Setting of SLA limits
  • Flexible reports
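Flagging services whose current parameters exceed configured limits could look like the sketch below. It does not reflect CyPortal's actual data model or API; the metric names, limits and sample values are invented for illustration.

```python
def violations(sample: dict, limits: dict) -> list:
    """Return the sorted names of metrics whose value exceeds its configured limit."""
    return sorted(name for name, limit in limits.items()
                  if name in sample and sample[name] > limit)


if __name__ == "__main__":
    # Hypothetical per-service limits and one hypothetical measurement sample.
    limits = {"two_way_delay_ms": 30.0, "jitter_ms": 2.0, "frame_loss_pct": 0.1}
    sample = {"two_way_delay_ms": 23.0, "jitter_ms": 3.5, "frame_loss_pct": 0.0}
    bad = violations(sample, limits)
    print("red" if bad else "green", bad)
```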

SLIDE 19

Problems encountered

  • 1. Saw-tooth shape of delay between JANET LH and Essex Uni

[Graphs: Level 3 DMM session; Level 5 DMM session]

  • There was no reason for the saw-tooth shape of the two-way delay, with peaks of about 1 sec, shown by the Level 5 MEP (ISG24 box)
  • Capturing and analyzing traffic before and after the Level 3 MEP (Ciena 311v box) revealed the 'guilty' box: the Level 3 MEP time-stamped the packets of the Level 5 MEP instead of forwarding them transparently, which is definitely a bug in the box software

SLIDE 20

Problems encountered (cont.)

  • 2. Inability of ISG boxes to measure CFM one-way delay on some connections (LH-Copenhagen, LH-Essex)

PAA: OAD = 10.903, TWD = 23,004
CFM DMM: OAD = ----, TWD = 23,004

  • ISG vendor's version: synchronization is too poor to calculate the CFM OWD
  • This seems not to be true: why is it good enough for the proprietary PAA?
  • Needs further investigation!
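Why clock synchronization matters for one-way but not for two-way delay can be seen from the four timestamps of a Y.1731 DMM/DMR exchange. The sketch below does not settle the question raised above about the ISG boxes; it only shows the general dependence. The numeric values and the clock offset are invented for illustration.

```python
def two_way_delay(tx_f: float, rx_f: float, tx_b: float, rx_b: float) -> float:
    """Two-way frame delay, excluding the responder's processing time.

    tx_f: DMM sent (initiator clock)   rx_f: DMM received (responder clock)
    tx_b: DMR sent (responder clock)   rx_b: DMR received (initiator clock)
    """
    return (rx_b - tx_f) - (tx_b - rx_f)


def one_way_delay(tx_f: float, rx_f: float) -> float:
    """Forward one-way delay; meaningful only if both clocks are synchronized."""
    return rx_f - tx_f


if __name__ == "__main__":
    true_owd, processing = 11.0, 1.0           # ms, hypothetical values
    for offset in (0.0, 5.0):                   # responder clock error in ms
        tx_f = 100.0
        rx_f = tx_f + true_owd + offset         # stamped by the offset responder clock
        tx_b = rx_f + processing                # same responder clock
        rx_b = tx_f + true_owd + processing + true_owd
        print(offset, two_way_delay(tx_f, rx_f, tx_b, rx_b), one_way_delay(tx_f, rx_f))
```

The responder's clock error cancels out of the two-way figure because both responder timestamps carry the same offset, whereas the one-way figure absorbs the offset in full.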

SLIDE 21

JRA1 Ethernet OAM trial conclusions

  • Ethernet OAM functions embedded in carrier-grade Ethernet equipment are mature enough to be used for effective monitoring of the health and performance of wide-area Ethernet services from both customer and provider perspectives
  • The use of dedicated Ethernet demarcation boxes with a rich set of OAM functions (Overture ISG and Accedian MetroNID) proved to be an effective way of monitoring Ethernet services end-to-end
  • Visualization and data storage software like CyPortal is a very useful element for providing managed Ethernet services
  • We managed to monitor only single-segment connections end-to-end; there is still more to try

SLIDE 22

Year 4 JRA1/JRA2 Service Assurance & Monitoring trial

Trial objectives:

  • To carry on the previous trial, extending the investigation to:
      • multi-segment connections with hierarchical monitoring
      • troubleshooting
      • use of embedded CFM/Y.1731 functions in carrier-class equipment (such as Cisco, Juniper, Extreme, Brocade, Alcatel etc.)
  • To support new Ethernet OAM functionality in perfSONAR software:
      • perfSONAR protocol and topology extensions to support Eth OAM data (data storing, searching and fetching)
      • use of the existing GN3 perfSONAR implementation (perfSONAR MDM) with the needed changes
      • standardization under the OGF NMC/NM umbrella

Trial term: 1 year, ending in March 2013

SLIDE 23

GN3 Year 4 testbed

[Diagram labels: GEYSERS; NORDUnet testbed; Bristol Uni; JANET LH; SURFnet; CESNET; PSNC; Collector; TNO (NL); NORDUnet core; SARA; 1000]

Legend:

  • Y.1731 agent box (ISG24 from Overture)
  • Y.1731-enabled equipment of the trial participants
  • non-Y.1731-enabled equipment of the trial participants

SLIDE 24

Multi-segment Janet – NORDUnet service

Equipment:

  • Janet ISG24: 193.63.63.133
  • NORDUnet ISG24: 109.105.113.183
  • Janet testbed: Ciena 5305
  • NORDUnet testbed: ALU 1850 TSS

Maintenance sessions (numbers are diagram labels, listed in the order they appear):

  • Customer, level 5, MA/MEG="jan-nor-400"
  • Provider, level 4, MA/MEG="jan-nor-400-4" (doesn't exist yet); 134, 314, 144, 344
  • Operator, level 2, MA/MEG="nor-400-2" (doesn't exist yet)
  • Operator, level 2, MA/MEG="janet-400-2"; 1152, 1153, 1101, 1102
  • Inter-node, level 0, MA/MEG="isg-ciena-400-0"
  • Inter-operator, level 0, MA/MEG="jan-nor-400-0" (doesn't exist yet); 1, 3
  • Inter-node, level 0, MA/MEG="tss-isg"; 1, 2

SLIDE 25

Multi-segment tests

  • Evaluation of different hierarchical schemes:
      • Shared levels (same VLAN ID for domains)
      • Independent levels (C-VID, S-VID)
  • Testing different ways of visualizing the hierarchical monitoring information for different types of users: NOC engineers, end users
  • Location of a failure by:
      • using a hierarchy of CCM sessions;
      • using the Linktrace protocol and MIPs

Different types of faults should be emulated:

  • Link failure
  • Port failure
  • Route loops
  • VLAN mismatch
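Locating a failure with a hierarchy of CCM sessions can be sketched as picking the failed sessions at the lowest maintenance level, since those cover the shortest spans. This is only one plausible way to read the status hierarchy; the session names, levels and spans below are hypothetical.

```python
from typing import NamedTuple


class Session(NamedTuple):
    name: str     # MA/MEG name (hypothetical)
    level: int    # maintenance domain level, 0 (link) to 7 (customer)
    span: str     # human-readable description of what the session covers
    up: bool      # current CCM status


def suspect_sessions(sessions: list) -> list:
    """Return the failed sessions at the lowest level that has any failure."""
    down = [s for s in sessions if not s.up]
    if not down:
        return []
    lowest = min(s.level for s in down)
    return [s for s in down if s.level == lowest]


if __name__ == "__main__":
    sessions = [
        Session("customer-e2e", 5, "agent A to agent B", up=False),
        Session("operator-a", 2, "domain A edge to edge", up=False),
        Session("operator-b", 2, "domain B edge to edge", up=True),
        Session("link-a1", 0, "agent A to first switch", up=True),
    ]
    for s in suspect_sessions(sessions):
        print(s.level, s.name, s.span)
```

In this made-up example the end-to-end session and one operator session are down while the link-level session is up, so the search narrows to the span covered by the failed operator session.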

SLIDE 26

Year 4 trial team

  • JRA2 Task 3:
      • Roman Lapacz, PSNC
      • Jakub Gutkowski, PSNC
      • Freek Dijkstra, SARA
      • Ronald van der Pol, SARA
      • Richa Malhotra, SURFnet
      • Borgert van der Kluit, TNO
      • Rob Smets, TNO
      • Piotr Zuraniewski, TNO
      • Otto Baijer, TNO
  • JRA1 Task 1:
      • Alberto Colmenero, NORDUnet
      • Victor Olifer, Janet
      • Marcin Garstka, PSNC
      • Jan Radil, CESNET
      • Michal Hazlinsky, CESNET
      • Mayur Channegowda, Essex Uni

SLIDE 27

Questions?

SLIDE 28

Year 3 Partner testbed example - PSNC testbed