Failure Localization in All-Optical Networks, by János Tapolcai (PowerPoint presentation)



slide-1
SLIDE 1

Failure Localization in All-Optical Networks

János Tapolcai Budapest University of Technology and Economics

1

slide-2
SLIDE 2
  • The goal is to provide fast link failure (cable cut) localization in All-Optical Networks
  • Link monitoring
  • a naive solution by having an active alarm for each link
  • the number of monitors is |E|
  • Alarm storm due to multi-hop lightpaths and multi-layer networks

Motivation

2

[Figure: example backbone topology with nodes STTL, SNFC, CHCG, NYCM, LSAN, LSVG, SLKC, DNVR, KSCY, TULS, CLEV, STLS, WASH, BSTN, CHRL, DTRT, TRNT, ATLN, IPLS, HSTN, DLLS, ELPS, NSVL, MIAM, MPLS, NWOR]

slide-3
SLIDE 3
  • Out-of-band monitoring
  • Using dedicated supervisory lightpaths
  • Monitoring trail/cycle

✓ Simpler and more reliable implementation
✓ Fast failure localization
✗ Bandwidth requirements

  • In-band monitoring

✓ Minimal bandwidth requirements

  • Tapping operating connections only

✗ Less precision in failure localization

  • Combining with out-of-band monitoring
  • Dealing with imprecision of failure localization

How to localize failures?

3

slide-4
SLIDE 4
  • The network topology is known
  • At least 2-connected
  • The goal is to localize a single cable cut
  • With a minimal number of monitors
  • Objective: a linear combination of the cover length and the number of monitors

γ · (#monitors) + (total cover length)

Localize Single Link Failure with Monitoring Cycles

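[Figure: complete graph on nodes 0, 1, 2, 3 with monitoring cycles c0, c1, c2]

Alarm code table

Link  c2 c1 c0
0-1    0  0  1
0-2    0  1  0
0-3    0  1  1
1-2    1  0  0
1-3    1  0  1
2-3    1  1  0

Row highlighted in the figure: 0-3 → 0 1 1

#monitors = 3
Cover length = 9
#monitors ≥ ⌈log2(#links+1)⌉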
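As a minimal sketch of how the alarm code table on this slide is used, the snippet below looks up a failed link from an observed alarm vector and recomputes the cover length and the lower bound on the number of monitors. The function names (`localize`, `monitor_links`, `min_monitors`, `total_cost`) are illustrative choices and do not come from the slides.

```python
from math import ceil, log2

# Alarm code table from the slide: link -> (c2, c1, c0)
ALARM_CODES = {
    (0, 1): (0, 0, 1),
    (0, 2): (0, 1, 0),
    (0, 3): (0, 1, 1),
    (1, 2): (1, 0, 0),
    (1, 3): (1, 0, 1),
    (2, 3): (1, 1, 0),
}

def localize(alarm):
    """Return the link whose code equals the observed alarm vector, or None."""
    for link, code in ALARM_CODES.items():
        if code == tuple(alarm):
            return link
    return None

def monitor_links(position):
    """Links covered by the monitor at tuple position 0 (c2), 1 (c1) or 2 (c0)."""
    return [link for link, code in ALARM_CODES.items() if code[position] == 1]

def min_monitors(num_links):
    """Lower bound ceil(log2(#links + 1)) on the number of monitors."""
    return ceil(log2(num_links + 1))

def total_cost(gamma, num_monitors, cover_length):
    """Objective from the slide: gamma * (#monitors) + (total cover length)."""
    return gamma * num_monitors + cover_length

# Cover length is the total number of links traversed by all monitors.
cover_length = sum(len(monitor_links(p)) for p in range(3))

print(localize((0, 1, 1)))                        # alarms on c1 and c0
print(cover_length, min_monitors(len(ALARM_CODES)))
```

Each of the three monitors covers three links here, which is where the cover length of 9 comes from, and six links need at least three monitors.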

4

slide-5
SLIDE 5
  • If a node has degree 2, its two neighboring links cannot be distinguished with cycles
  • Using monitoring trails instead of cycles

Optical Link Failure Monitoring with Trails (M-Trail)

5

[Figure: (a) an m-trail, with labels R, T, a, b, c, d, e; (b) an m-trail solution with trails t0, t1, t2]

(c) Alarm code table

Link   t2 t1 t0  Decimal
(0,1)   1  0  1  5
(0,2)   1  1  1  7
(0,3)   1  0  0  4
(1,2)   0  1  1  3
(1,3)   1  1  0  6
(2,4)   0  0  1  1
(3,4)   0  1  0  2

No optical loopback switching
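The m-trail condition can be checked mechanically. The sketch below reads the decimal alarm codes of this slide's table as binary numbers t2 t1 t0 and tests, for each bit position, whether the marked links form a single trail. It uses the standard Euler condition (connected, with zero or two odd-degree nodes); the helper names are assumptions for illustration.

```python
from collections import defaultdict

# Decimal alarm codes from the table; bit 0 is t0, bit 1 is t1, bit 2 is t2.
CODES = {(0, 1): 5, (0, 2): 7, (0, 3): 4, (1, 2): 3,
         (1, 3): 6, (2, 4): 1, (3, 4): 2}

def links_of_trail(bit):
    """Links whose alarm code has a 1 at the given bit position."""
    return [link for link, code in CODES.items() if (code >> bit) & 1]

def is_trail(links):
    """True if the links can be traversed as one open or closed trail."""
    if not links:
        return False
    adj = defaultdict(list)
    for u, v in links:
        adj[u].append(v)
        adj[v].append(u)
    # Euler condition: exactly 0 or 2 nodes of odd degree.
    odd = sum(1 for n in adj if len(adj[n]) % 2 == 1)
    # Connectivity check by depth-first search over the touched nodes.
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        n = stack.pop()
        for m in adj[n]:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return len(seen) == len(adj) and odd in (0, 2)

for bit in range(3):
    print(f"t{bit}", links_of_trail(bit), is_trail(links_of_trail(bit)))
```

All three bit positions pass the check, and the seven codes are pairwise different, so the table satisfies both requirements of an m-trail solution for single link failures.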

slide-6
SLIDE 6
  • N. Harvey, M. Patrascu, Y. Wen, S. Yekhanin, and V. Chan, “Non-Adaptive Fault Diagnosis for All-Optical Networks via Combinatorial Group Testing on Graphs,” in IEEE INFOCOM, 2007, pp. 697–705.
  • A bm-trail is a connected subgraph
  • The Euler constraint is relaxed

Optical Link Failure Monitoring with Bi-directional M-Trails (BM-Trails)

6

Optical loopback switching

slide-7
SLIDE 7

Architecture - Summary

  • A supervisory path (SP) is used to probe the status of a group of fibre segments and components
  • Each SP corresponds to a monitor, which raises an alarm when any irregularity is identified
  • By collecting all the flooded alarms in a failure event, the network controller can identify the failed SRLG instantly
  • Objective: achieve fast unambiguous failure localization (UFL) under any shared risk link group (SRLG) failure event

7

| 2011 | RNDM

slide-8
SLIDE 8

UNAMBIGUOUS FAILURE LOCALIZATION

8

slide-9
SLIDE 9
  • Given: an undirected 2-connected graph
  • 1. bm-trail: a connected subgraph
  • 2. m-trail: a trail (Euler subgraph)
  • 3. m-cycle: a closed trail
  • SRLG:
  • A. Single link
  • B. Dense SRLG: dual, triple link failures
  • C. Sparse SRLG: some multi-link failures
  • Goal: find a minimum number of m-trails/m-cycles/bm-trails in the graph such that no two SRLGs are traversed by exactly the same set of m-trails/m-cycles/bm-trails.
  • Equivalently: assign non-zero alarm codes to the links such that each SRLG has a unique alarm code and, in each bit position, the links with bit 1 form a valid m-trail/m-cycle/bm-trail.

Unambiguous failure localization (UFL) under any shared risk link group (SRLG) failure event

9

#monitors ≥ ⌈log2(#SRLGs + 1)⌉

[Figure: example graph with link alarm codes 001, 010, 011, 100, 101, 110]

slide-10
SLIDE 10
  • The number of bm-trails is ⌈#links/2⌉
  • To distinguish the failure of links e and f, we need a bm-trail terminating in node n.
  • Each bm-trail can terminate in at most two nodes, thus 2·(#bm-trails) ≥ #nodes

Ring networks, single failure

10

[Figure: ring node n with its two adjacent links e and f]
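The counting argument for rings can be written out in one step. The derivation below only adds the observation that a ring has as many nodes as links; everything else is taken from the bullets of this slide.

```latex
\begin{align*}
2 \cdot \#\text{bm-trails} &\ge \#\text{nodes} = \#\text{links}
  && \text{(a ring has as many nodes as links)} \\
\#\text{bm-trails} &\ge \left\lceil \frac{\#\text{links}}{2} \right\rceil
  && \text{(the number of bm-trails is an integer)}
\end{align*}
```

For example, a ring with 7 links needs at least 4 bm-trails.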

slide-11
SLIDE 11
  • J. Tapolcai, Bin Wu, and Pin-Han Ho, “On Monitoring and Failure Localization in Mesh All-Optical Networks,” in IEEE INFOCOM ’09
  • J. Tapolcai, B. Wu, Pin-Han Ho, and L. Rónyai, “A Novel Approach for Failure Localization in All-Optical Mesh Networks,” in IEEE/ACM Transactions on Networking, Feb 2011.
  • Ring topology: #m-trails = ⌈#links/2⌉
  • Well-connected topologies (e.g. complete graph):
  • Decompose the graph into disjoint spanning trees and code them separately
  • Nash-Williams and Tutte: a 2k-edge-connected graph has k disjoint spanning trees
  • #bm-trails = ⌈log2(#links + 1)⌉ = k

More bounds for single link failure

11

slide-12
SLIDE 12
  • 2⌈log2(#links+1)⌉-connected graph (m-tree)
  • Number of monitors = ⌈log2(#links+1)⌉
  • Theorem of Nash-Williams and Tutte: every 2k-edge-connected graph contains k disjoint spanning trees
  • 2⌈log2(#links+1)⌉-connected

Highly connected graph

12

slide-13
SLIDE 13
  • b = ⌈log2(#links+1)⌉ independent spanning trees
  • In the codes assigned to the i-th spanning tree, the i-th bit is 1
  • Then the links belonging to the i-th bit are guaranteed to be connected (in fact, they span the whole graph)
  • We group the binary codes of length b into b buckets, with a lower and an upper limit on the number of codes per bucket
  • Induction: recursive construction
  • b = 1, 2 works
  • Assume we have a solution for b

Highly connected graph

13

[Figure: buckets 1, 2, …, b]

slide-14
SLIDE 14
  • Append a 0 bit to the codes to obtain (b+1)-bit codes
  • Append a 1 bit to obtain the remaining (b+1)-bit codes
  • Move a suitable number of codes from the second group into the last bucket
  • If b ≥ 3, this holds because of the independent spanning trees
  • Holds for complete graphs if |V| ≥ 18

Highly connected graph

14

[Figure: buckets 1, 2, …, b, b+1]

slide-15
SLIDE 15
  • 5320 randomly generated network topologies
  • with 20, 30, 40, 50, 60 nodes
  • Ring networks with randomly added chords
  • 30 random graph series
  • in order to obtain 95% confidence intervals

Analyzing Different Network Topologies

15

slide-16
SLIDE 16
  • The m-trails are calculated with heuristics

Simulation Results

16

1770

slide-17
SLIDE 17
  • Bin Wu, P.-H. Ho, and K. Yeung, “Monitoring trail: a new paradigm for fast link failure localization in WDM mesh networks,” in IEEE GLOBECOM ’08
  • The problem has been formulated as an Integer Linear Program (ILP)

The Concept of Monitoring Trails

17

ILP running time = 9573.47 sec (about 2 h 40 min)
Gap to optimality = 20.41%
#monitors = 11
Total cost = 98, where γ = 5

[Figure: the 11 m-trails t0 to t10, each drawn on the same 20-node topology]

slide-18
SLIDE 18

Constraint 1: Every link must have a unique alarm code

  • Unambiguous Failure Localization (UFL)

Constraint 2: The “1” bits at each bit position must form a trail

  • S. Ahuja, S. Ramasubramanian, and M. Krunz, “SRLG Failure Localization in All-Optical Networks Using Monitoring Cycles and Paths,” in IEEE INFOCOM ’08
  • The heuristic proposed there builds a structure where constraint 2 is ensured, and the goal is to fulfill constraint 1.
  • Cycle Accumulation (CA)
  • Our concept is to provide a structure where constraint 1 is ensured, and our goal is to fulfill constraint 2.
  • Much faster for minimizing the number of m-trails

The Heuristic Algorithm I.

18

slide-19
SLIDE 19
  • Randomly generate a unique alarm code for each link
  • For each bit position we treat the m-trail shaping problem separately
  • We start with the lowest bit position and mark the links that have bit “1” at that position
  • The goal is to shape the marked links into a trail

The Heuristic Algorithm II.

19

[Figure: example graph with links labeled by alarm codes such as 0001, 0010, 0111, 1110; one link is annotated “This link has no pair.”]

  • Each link has a pair for bit position i
  • The binary alarm codes of the pair are the same except at bit position i
  • One of the links is marked, the other is not
  • If we swap the alarm codes assigned to these two links, nothing changes at the other bit positions
  • Some links might not have a pair
  • Because the 0000 alarm code cannot be chosen (valid for 1 link only)
  • Or because its code pair was not assigned to any link (don’t care links)
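The notion of a code pair can be illustrated with a few lines of code. This is a sketch under assumptions: alarm codes are stored as integers, links are arbitrary hashable labels, and the function names are invented for the example. It is not the authors' implementation.

```python
import random

def random_unique_codes(links, bits, seed=0):
    """Assign a distinct non-zero alarm code of `bits` bits to every link."""
    rng = random.Random(seed)
    # Sampling without replacement from 1 .. 2^bits - 1 excludes the all-zero code.
    codes = rng.sample(range(1, 2 ** bits), len(links))
    return dict(zip(links, codes))

def pair_of(link, bit, codes):
    """Return the link whose code differs from `link`'s code only at `bit`, or None."""
    partner_code = codes[link] ^ (1 << bit)
    if partner_code == 0:
        return None  # the all-zero code is never assigned
    for other, code in codes.items():
        if code == partner_code:
            return other
    return None  # the partner code is unassigned (a "don't care" code)

def marked(bit, codes):
    """Links that have bit 1 at the given position."""
    return {link for link, code in codes.items() if (code >> bit) & 1}

def swap_codes(a, b, codes):
    """Exchange the alarm codes of two links in place."""
    codes[a], codes[b] = codes[b], codes[a]

# Hand-picked example: "a" and "b" differ only at bit 1.
codes = {"a": 0b0001, "b": 0b0011, "c": 0b0110}
before = marked(0, codes)
partner = pair_of("a", 1, codes)
swap_codes("a", partner, codes)
print(partner, before == marked(0, codes), marked(1, codes))
```

Swapping a pair changes which link is marked at bit position i and leaves the marked sets of all other positions untouched, which is why each bit position can be shaped separately.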
slide-20
SLIDE 20
  • Greedy code swapping
  • Based on Euler’s theorem, we try to shape the links into a trail
  • The number of nodes with odd degree must be reduced to at most 2
  • The edge set must be connected
  • We repeat it for every bit position
  • Until trails are shaped for every bit
  • If we get stuck, we generate new random codes

The Heuristic Algorithm III.

20

[Figure: the example graph with alarm codes such as 0001, 0010, 0111, 1110 after code swapping; one link is annotated “This link has no pair.”]

slide-21
SLIDE 21

The performance of the heuristic compared to ILP

21

slide-22
SLIDE 22
  • The theoretical minimum ⌈log2(#links+1)⌉ was almost always achieved if the network had no nodes with degree 2
  • The number of nodes with degree 2 strongly influences the number of m-trails

A Rule of Thumb of Topology Analysis

22

[Figure: 20-node example topology, nodes numbered 1 to 20]

slide-23
SLIDE 23
  • Counterexample
  • Every node has degree 3
  • The number of bm-trails is linear in the number of links

Lack of degree-2 nodes is no guarantee

23

slide-24
SLIDE 24
  • Previous results on the square lattice
  • N. Harvey, M. Patrascu, Y. Wen, S. Yekhanin, and V. Chan, “Non-Adaptive Fault Diagnosis for All-Optical Networks via Combinatorial Group Testing on Graphs,” in IEEE INFOCOM, 2007
  • Essentially optimal solution
  • Close to the information-theoretic lower bound

Two Dimensional Lattice

24

4 + ⌈log2(#links+1)⌉ ≥ #m-trails ≥ ⌈log2(#links+1)⌉

3x5

  • Planar
  • Nodal degree is 4
  • Large diameter

Moebius ladder

slide-25
SLIDE 25
  • We add two m-trails

Construction of Chocolate Bar Graph for M-trail

25

[11] [10] [01] [00] Unique codes Unique codes Unique codes

slide-26
SLIDE 26
  • We generate n bitvectors r_1, r_2, …, r_n of length b

Construction on Chocolate Bar Graph

26

slide-27
SLIDE 27
  • 1. The bitvectors r_i are all non-zero and pairwise different
  • 2. The bitvectors r_i ⊕ r_{i+1} are all non-zero and pairwise different
  • 3. The first coordinates of r_1 and r_n are the same
  • Abstract algebra
  • Galois field
  • Two operators: addition and multiplication
  • It has q = 2^b elements

Generating Bitvectors r_1, r_2, …, r_n

27

r_i:            100 010 001 110 011 111 101
r_i ⊕ r_{i+1}:  110 011 111 101 100 010 001   (the last entry is r_n ⊕ r_1)

slide-28
SLIDE 28
  • We represent the elements as polynomials of degree strictly less than b over F2
  • E.g. [1 0 1] is represented by 1 + x^2
  • We select an irreducible polynomial R of degree b over F2.
  • For F8 it can be
  • Operator ⊕ is performed by summing two polynomials modulo R over F2
  • This is the bitwise XOR

[1 1 1] ⊕ [1 0 1] = [0 1 0]

  • Operator * is performed by multiplying two polynomials modulo R over F2
  • The field has a primitive element α whose powers are pairwise different
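The field construction can be tried out for b = 3. The sketch below assumes the irreducible polynomial R = x^3 + x + 1 and the primitive element α = x; the slide text does not state which polynomial was used, so this is an assumption, chosen because it reproduces the bitvector sequence 100 010 001 110 011 111 101 shown in the deck. Integers are used as coefficient vectors, with bit k holding the coefficient of x^k.

```python
def gf_mul_by_x(v, b, r_poly):
    """Multiply the field element v by x modulo r_poly in GF(2^b)."""
    v <<= 1
    if (v >> b) & 1:      # degree reached b: reduce modulo R
        v ^= r_poly
    return v

def powers_of_alpha(b, r_poly):
    """Return alpha^0, alpha^1, ..., alpha^(2^b - 2) for alpha = x."""
    out, v = [], 1
    for _ in range(2 ** b - 1):
        out.append(v)
        v = gf_mul_by_x(v, b, r_poly)
    return out

def as_bits(v, b):
    """Coefficients printed lowest degree first, so [1 0 1] is 1 + x^2."""
    return "".join(str((v >> k) & 1) for k in range(b))

R = 0b1011                # x^3 + x + 1 (assumed, see the note above)
r = powers_of_alpha(3, R)
# Addition in GF(2^b) is bitwise XOR; indices wrap around at the end.
sums = [r[i] ^ r[(i + 1) % len(r)] for i in range(len(r))]

print(" ".join(as_bits(v, 3) for v in r))
print(" ".join(as_bits(v, 3) for v in sums))
```

With this choice, the powers of α are non-zero and pairwise different, and so are the sums of consecutive powers, which are the first two properties required of the bitvectors.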

Constructing Galois Field

28

slide-29
SLIDE 29
  • Suppose α^i ⊕ α^(i+1) = α^j ⊕ α^(j+1) for some i ≠ j, that is, (1 ⊕ α)·α^i = (1 ⊕ α)·α^j
  • However, 1 ⊕ α ≠ 0, thus α^i = α^j, which is a contradiction.

Verification of the required properties

29

slide-30
SLIDE 30
  • We generalize the chocolate bar construction
  • First part of the code for horizontal axis
  • Second part of the code for vertical axis

Construction for 2-Dimensional Lattice for M-tree

30

slide-31
SLIDE 31

Benchmarks

31

  • Relatively sparse graphs with low nodal degree
  • Large diameter
  • Benchmarking
  • It is much stronger than any general method we have examined
  • J. Tapolcai, L. Rónyai, and Pin-Han Ho, “Optimal Solutions for Single Fault Localization in Mesh Topologies”, in IEEE INFOCOM 2010 Mini-conf, and extended for IEEE/ACM Transaction on Communications

slide-32
SLIDE 32

MULTIPLE FAILURES

32

slide-33
SLIDE 33
  • Dedicated path protection is the only widely accepted solution
  • 1+1+1
  • The chance of having a dual failure is very small
  • Shared protection is too complex for multiple failures
  • The network connectivity is often limited
  • Failure Dependent Protection could be a strong alternative
  • Great capacity efficiency, stub reuse
  • Great flexibility to sparse topologies

Multiple failures

33

slide-34
SLIDE 34
  • With multi-link SRLGs, each SRLG should be uniquely coded
  • The code of an SRLG is the bitwise OR of the codes of all the links contained in the SRLG
  • to reflect that “an SRLG failure is a failure event in which all the links in the SRLG fail”
  • Note: routing of m-trails is performed on links in the topology, but code uniqueness should be ensured for every SRLG

Multiple Failures

34

Alarm code table

Link  c2 c1 c0
0-1    0  0  1
0-2    0  1  0
0-3    0  1  1
1-2    1  0  0
1-3    1  0  1
2-3    1  1  0

[Figure: alarm code 0 1 1 observed when links 0-2 and 0-1 fail together]
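The OR rule makes it easy to see why codes designed for single link failures are not enough for multi-link SRLGs. The sketch below applies the rule to the six-link alarm code table of this slide and groups SRLGs that end up with the same code; the function names are illustrative assumptions.

```python
from functools import reduce
from itertools import combinations
from operator import or_

# Link alarm codes from the table, written as binary numbers c2 c1 c0.
LINK_CODE = {"0-1": 0b001, "0-2": 0b010, "0-3": 0b011,
             "1-2": 0b100, "1-3": 0b101, "2-3": 0b110}

def srlg_code(srlg):
    """Alarm code of an SRLG: bitwise OR of the codes of its links."""
    return reduce(or_, (LINK_CODE[link] for link in srlg), 0)

def ambiguous_groups(srlgs):
    """Groups of SRLGs that share the same alarm code."""
    by_code = {}
    for srlg in srlgs:
        by_code.setdefault(srlg_code(srlg), []).append(srlg)
    return [group for group in by_code.values() if len(group) > 1]

singles = [(link,) for link in LINK_CODE]
duals = list(combinations(LINK_CODE, 2))

print(ambiguous_groups(singles))                 # no clash among single links
print(len(ambiguous_groups(singles + duals)))    # clashes once dual failures count
```

The joint failure of 0-1 and 0-2 produces the code 011, which is also the code of the single link 0-3, so three monitors cannot localize dual failures in this example.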

slide-35
SLIDE 35
  • Non-adaptive
  • 1942 Washington, DC
  • Searching for syphilitic antigen in blood samples with chemical analysis
  • It might be economical to pool the blood samples, since only a few blood samples contain syphilitic antigen
  • Annals of Mathematical Statistics
  • Statistical “group test”
  • 1973 Gyula O. H. Katona emphasized the combinatorial aspect of group testing
  • Strongly union-free sets
  • The alarm code is the characteristic vector of a set
  • We need codes where the bitwise OR of any two codes is unique
  • Péter Frankl, Zoltán Füredi, Pál Erdős, Miklós Ruszinkó
  • 1987 F. K. Hwang, V. T. Sós: number of tests O(d^2 log |E|), where |E| is the number of elements
  • 2007 Eppstein, Goodrich, Hirschberg: more efficient codes for real-size problems

  • Superimposed Codes

Combinatorial Group Testing (up to d failures)

35

slide-36
SLIDE 36

Dense SRLG

  • The code swapping mechanism was improved
  • Generate CGT codes
  • Code swapping is implemented among codes with Hamming distance > 1
  • At least Binomial(|E|,2) possible code swaps
  • Special data structure for fast decision on the cost of a code swap
  • For each bit position we have
  • 1. A link is added and another is removed
  • 2. A link is added
  • 3. A link is removed
  • Evaluate the change in the number of m-trails in constant time
  • Finally, select the best code swap
  • J. Tapolcai, Pin-Han Ho, et al., “Failure Localization for Shared Risk Link Groups in All-Optical Mesh Networks using Monitoring Trails”, accepted in IEEE/OSA Journal of Lightwave Technology in Feb. 2011.

slide-37
SLIDE 37

Dense SRLG - Greedy Code Swapping (GCS)


37

slide-38
SLIDE 38
  • Random maximal planar graphs
  • CA – Cycle Accumulation
  • DSTC – Disjoint Spanning Tree Coding
  • Theoretical, often not feasible
  • Found that when d = 3, link monitoring (a special and trivial solution for the problem) becomes the best solution most of the time

Dense SRLG - Simulation

38

slide-39
SLIDE 39

Sparse SRLG

  • Failure localization with bi-directional m-trails for a small set of SRLGs
  • Identified necessary and sufficient conditions for the existence of a solution
  • Use a graph partition method and re-invoke the random code swapping mechanism to solve the problem
  • Proved that the solution also yields the least cover length
  • P. Babarczi, J. Tapolcai, and Pin-Han Ho, “Adjacent Link Failure Localization with Monitoring Trails in All-Optical Mesh Networks”, IEEE/ACM Transactions on Networking, vol. 19, no. 3, pp. 907-920, 2011
  • P. Babarczi, J. Tapolcai, and Pin-Han Ho, “SRLG Failure Localization with Monitoring Trails in All-Optical Mesh Networks”, in Proc. International Workshop on Design Of Reliable Communication Networks (DRCN), 2011

39

slide-40
SLIDE 40

NETWORK-WIDE LOCAL UFL (NWL-UFL)

Signaling free failure localization

40

slide-41
SLIDE 41
  • Drawbacks of the alarm dissemination process
  • Electronic signaling is required
  • Increases the failure localization latency
  • Achieving UFL all-optically at each node (Local Unambiguous Failure Localization, L-UFL)
  • Any node along an m-trail can obtain the on-off status of the m-trail via optical signal tapping
  • Achieve UFL according to this status information
  • Alarm dissemination is no longer needed
  • For every node: Network Wide L-UFL (NWL-UFL)
  • Simplification in our current work
  • It is very complex to consider the general case (open trails)
  • We consider bidirectional m-trails

Signaling free failure localization

41

slide-42
SLIDE 42
  • The number of alarms is no longer a concern
  • Minimize the cover length

NWL-UFL solution

42

t3 t1 t4 t2

slide-43
SLIDE 43
  • Stars
  • The cover length ≥ |E|⌈log2(|E|+1)⌉
  • in each leaf node at least ⌈log2(|E|+1)⌉ m-trails must terminate
  • The cover length ≤ |E|⌈1 + log2(|E|+1)⌉
  • Random alarm codes of ⌈1 + log2(|E|+1)⌉ bits, and the complement of each m-trail is added
  • A lower bound that holds in general:

cover length ≥ (|V|/2)·⌈log2(|E|+1)⌉

  • Line graph
  • The cover length is m^2

Bounds on cover length

43

  • J. Tapolcai, Pin-Han Ho, et al., “Network-Wide Local Unambiguous Failure Localization (NWL-UFL) via Monitoring Trails”, under review in IEEE/ACM Transactions on Networking

slide-44
SLIDE 44
  • Complete graphs
  • The cover length is (|V|−1)^2
  • Lower bound: cover length ≥ 2|E|(1 − 1/|V|)
  • Singular link: a link traversed by a single m-trail
  • The number of singular links is σ
  • An m-trail may have at most one singular link
  • 1. Cover length ≥ 2|E| − σ
  • An m-trail with a singular link must span the graph
  • 2. Cover length ≥ σ(|V| − 1)
  • If σ ≤ 2|E|/|V|, bound 1 gives the lower bound; otherwise bound 2 does
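The case analysis behind the lower bound can be written out explicitly. The block below writes |E| for the number of links, |V| for the number of nodes, and σ for the number of singular links. Which of the two bounds is decisive in each case is derived here from the two inequalities themselves, so treat it as a reconstruction, not a quotation of the slide.

```latex
\begin{align*}
\text{Case } \sigma \le \tfrac{2|E|}{|V|}: \quad
  & \text{cover length} \;\ge\; 2|E| - \sigma
    \;\ge\; 2|E| - \tfrac{2|E|}{|V|}
    \;=\; 2|E|\left(1 - \tfrac{1}{|V|}\right) \\
\text{Case } \sigma > \tfrac{2|E|}{|V|}: \quad
  & \text{cover length} \;\ge\; \sigma\,(|V| - 1)
    \;>\; \tfrac{2|E|}{|V|}\,(|V| - 1)
    \;=\; 2|E|\left(1 - \tfrac{1}{|V|}\right) \\
\text{Complete graph, } |E| = \tfrac{|V|(|V|-1)}{2}: \quad
  & 2|E|\left(1 - \tfrac{1}{|V|}\right)
    \;=\; |V|(|V|-1)\cdot\tfrac{|V|-1}{|V|}
    \;=\; (|V| - 1)^2
\end{align*}
```

The last line shows that, for complete graphs, the general lower bound coincides with the stated cover length of (|V|−1)^2.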

Bounds on cover length

44

slide-45
SLIDE 45
  • Idea: we focus on solutions where the m-trails are spanning trees
  • We generate b random spanning trees
  • b is sufficiently large
  • We need to make the alarm codes unique
  • If there is a code collision, perform Greedy Link Swapping
  • Until UFL is achieved
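The first half of this procedure, generating spanning trees and detecting code collisions, can be sketched as follows. Several things here are assumptions made for illustration: trees are drawn by running Kruskal's algorithm on a shuffled link order (which is random but not uniform over all spanning trees), the example graph is the complete graph on 5 nodes, and b = 6. The Greedy Link Swapping step is not specified in the slides and is left out.

```python
import random
from itertools import combinations

def random_spanning_tree(nodes, links, rng):
    """Kruskal on a shuffled link order with union-find; returns a set of links."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]   # path halving
            n = parent[n]
        return n

    order = list(links)
    rng.shuffle(order)
    tree = set()
    for u, v in order:
        ru, rv = find(u), find(v)
        if ru != rv:                        # link joins two components
            parent[ru] = rv
            tree.add((u, v))
    return tree

def alarm_codes(links, trees):
    """Bit i of a link's code is 1 if the link belongs to tree i."""
    return {l: sum(1 << i for i, t in enumerate(trees) if l in t) for l in links}

def collisions(codes):
    """Groups of links that share a code, plus links left with the all-zero code."""
    groups = {}
    for link, code in codes.items():
        groups.setdefault(code, []).append(link)
    return [g for c, g in groups.items() if len(g) > 1 or c == 0]

nodes = list(range(5))
links = list(combinations(nodes, 2))        # complete graph on 5 nodes
rng = random.Random(1)
trees = [random_spanning_tree(nodes, links, rng) for _ in range(6)]
codes = alarm_codes(links, trees)
print(collisions(codes))                    # groups that would need link swapping
```

Any group reported by `collisions` marks links that are not yet distinguishable, which is where a swapping step would take over until all codes are unique and non-zero.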

Random Spanning Tree Assignment (RSTA) and Greedy Link Swapping (GLS)

45

slide-46
SLIDE 46

The GAP to the optimality is ~20%

46

slide-47
SLIDE 47

Performance

47

slide-48
SLIDE 48
  • The idea is to use the spare capacity resources for monitoring until there is a failure in the network
  • Cover length is no longer a concern
  • Shared protection
  • Failure Dependent Protection
  • Each node performs L-UFL in parallel
  • We have a longer failure localization phase but a shorter connection setup
  • Still less than 100 ms
  • P-cycle based mechanism
  • Much better capacity efficiency
  • Much better flexibility on topology diversity

Integration with Protection mechanism

48

slide-49
SLIDE 49

Example

49

t3 t1 t4 t2 w1

slide-50
SLIDE 50
  • Depends on traffic
  • Depends on the number of WL on each link
  • With 50 WL the cost of m-trail is relatively small

Simulation Results

50

working protection m-trail

slide-51
SLIDE 51

Comparison with other schemes

51

slide-52
SLIDE 52
  • Optical layer rerouting is a strong candidate for achieving high end-to-end connection availability

  • Capacity efficient
  • Flexible
  • Easy to calculate
  • It should provide fast restoration
  • Failure localization is the main difficulty
  • With alarm dissemination
  • Without alarm dissemination

Conclusions

52