

slide-1
SLIDE 1

Redesign Belnet Network Explained

Grégory Degueldre Stefan Gulinck

slide-2
SLIDE 2

Agenda

  • History of the Belnet network topology
  • Situation as-is
  • Driving factors (issues and incidents)
  • Actions taken
  • Redesign

08/11/2018 Redesign Belnet Network Explained

slide-3
SLIDE 3

History of the topology


Belnet < 2016

slide-4
SLIDE 4

History of the topology


slide-5
SLIDE 5

Situation AS-IS


slide-6
SLIDE 6

Issues

  • Root causes
    • G.8032 bug
    • Ineffective MPLS Fast-Reroute
    • Big increase in traffic in September 2017
    • Poor distribution of bandwidth among the members of a LAG
  • Incidents
    • 20/11: Fiber cut between DC Evere and Zaventem
    • 09-13/12: Card flapping on r1.brueve

slide-7
SLIDE 7

Issue 1: G8032

  • Broadcast storm on our network, taking down our Juniper routers.
  • Redesign of the network: making it linear.
  • Huge change in the design => FRR issue!

Made it linear, but introduced collateral damage.

slide-8
SLIDE 8

Issue 2: Fast-ReRoute (MPLS Redundancy)

  • What is FRR?
    • Sub-50 ms redirection at the MPLS layer
    • Dispensable with G.8032, but still implemented
  • What's the problem?
    • Too many VLANs
    • Convergence
    • Path recalculation
    • BGP sessions go down when convergence takes too long
  • Workaround:
    • BFD timers changed to make the recalculation faster

Config changed to keep BGP from flapping, but the reroute is still not sub-50 ms.
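As a rough illustration of why the BFD timers matter for the sub-50 ms target: BFD declares a neighbour down after a number of consecutive missed packets, so the detection time is approximately the packet interval multiplied by the detect multiplier. The sketch below uses hypothetical timer values (they are not Belnet's configuration) and only covers failure detection; path recalculation comes on top of it.

```python
# Minimal sketch: BFD failure-detection time versus a 50 ms reroute budget.
# All timer values below are hypothetical, not Belnet's configuration.

REROUTE_BUDGET_MS = 50  # the "sub 50ms" target mentioned on the slide


def bfd_detection_time_ms(interval_ms: int, multiplier: int) -> int:
    """Time to declare a neighbour down: packet interval times detect multiplier."""
    return interval_ms * multiplier


def fits_budget(interval_ms: int, multiplier: int) -> bool:
    """True if detection alone fits the budget (rerouting itself adds more time)."""
    return bfd_detection_time_ms(interval_ms, multiplier) < REROUTE_BUDGET_MS


# Compare a few hypothetical (interval, multiplier) settings.
for interval_ms, multiplier in [(300, 3), (100, 3), (15, 3)]:
    detect = bfd_detection_time_ms(interval_ms, multiplier)
    print(f"{interval_ms} ms x {multiplier} = {detect} ms, "
          f"within budget: {fits_budget(interval_ms, multiplier)}")
```

Under these assumptions, only the most aggressive setting leaves any room inside the 50 ms budget, which is consistent with the slide's remark that the workaround stabilised BGP without reaching sub-50 ms reroute.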

slide-9
SLIDE 9

Issue 3: Poor hashing algorithm

  • Yearly traffic increase on the backbone
  • Use of cloud services (Office365, etc.)
  • Capacity management: issue with the order of the 100GE cards
  • Extra ports added to the LAG

No big deal…

slide-10
SLIDE 10

Issue 3: Poor hashing algorithm

Traffic distribution across the LAG members is done by hashing algorithms.
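To make the mechanism concrete, here is a minimal sketch of hash-based member selection. It assumes a CRC32 hash over an address pair and a 4-member LAG; real routers use vendor-specific hash functions and keys, and the flows and rates are invented for illustration. Because every packet of a flow maps to the same member, one heavy flow loads one link regardless of how idle the others are.

```python
import zlib

# Minimal sketch of hash-based LAG member selection.
# The hash function (CRC32), the key fields and the traffic figures are
# assumptions for illustration, not the behaviour of any specific router.


def pick_member(flow_key: tuple, n_members: int) -> int:
    """Map a flow key to a LAG member index with a deterministic hash."""
    digest = zlib.crc32(repr(flow_key).encode("utf-8"))
    return digest % n_members


def load_per_member(flows, n_members: int) -> list:
    """Sum the offered load (Mbps) that lands on each LAG member."""
    load = [0] * n_members
    for flow_key, mbps in flows:
        load[pick_member(flow_key, n_members)] += mbps
    return load


# Hypothetical flows: (source, destination) -> offered load in Mbps.
flows = [
    (("192.0.2.1", "198.51.100.1"), 8000),  # one heavy flow
    (("192.0.2.2", "198.51.100.2"), 500),
    (("192.0.2.3", "198.51.100.3"), 500),
    (("192.0.2.4", "198.51.100.4"), 500),
    (("192.0.2.5", "198.51.100.5"), 500),
]

load = load_per_member(flows, n_members=4)
print("Load per member (Mbps):", load)
```

Whatever the hash values turn out to be, the member that receives the heavy flow carries at least 8000 Mbps of the 10000 Mbps total, so adding ports to the LAG does not by itself even out the load.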

slide-11
SLIDE 11

Issue 3: Poor hashing algorithm

100GE cards in production (EVE, ZAV and DIE), but still NOK for the other POPs.

slide-12
SLIDE 12

Incident 1: Fiber cut Evere - Zaventem

  • 20/11/2017: Fiber cut
  • Impact: Saturation on bruzav, impacting nearly all Belnet customers.
  • Reactions:
    • New direct optical links between the brueve and bruzav routers to offload the LAG.
    • Duplicated VLAN and MPLS paths to increase the chance of a better traffic distribution.

Bought some time while waiting for the 100GE cards.

slide-13
SLIDE 13

Incident 2: Card flapping at brueve

  • 9/12 – 13/12
  • Flapping FPC (Juniper card)
  • Impact:
    • Backbone instability for all customers
    • Instability for customers connected to that specific FPC
  • Reactions:
    • Removal of the interface from the LAG => stable again, but it worsened the LAG distribution issue
    • All components have been replaced (FPC/MIC/XFP/SFP)

slide-14
SLIDE 14

Conclusion

  • The situation is complex and is the result of many design choices and workarounds for the bugs and issues encountered.
  • Belnet has done a lot to improve the network and to reduce the impact of incidents, but there is still work to be done.
  • Murphy hasn't helped us much: everything that could go wrong has gone wrong.

slide-15
SLIDE 15

Actions taken

  • Redesign of the network as a project
    • Project brief is approved as P1
  • CoS (class of service): guarantee access to network management when things go awry
  • Further 100GE card upgrades
    • On r1.brudie (central ring)
    • Redundancy on all three routers of the central ring
  • Distribute transit routers more widely across the network
  • We've abandoned G.8032

slide-16
SLIDE 16

Still To do...

  • Redesign the network and make it more robust and resilient.
    • Simplified network
    • Fast recovery and fast convergence
    • Better managed network for capacity management
  • Solve the hashing issue
    • Testing, and chasing the third party for a better hashing algorithm, e.g. 5-tuple hashing
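The benefit of 5-tuple hashing can be sketched as follows. The example compares a hash key built from the address pair only with one built from the full 5-tuple (source IP, destination IP, protocol, source port, destination port). The CRC32 hash, the 4-member LAG and the flows are assumptions for illustration; the actual algorithm depends on the vendor.

```python
import zlib

# Minimal sketch: address-pair hashing versus 5-tuple hashing on a LAG.
# Hash function, LAG size and flows are hypothetical.

N_MEMBERS = 4


def member_for(key: tuple, n_members: int) -> int:
    """Deterministically map a hash key to a LAG member index."""
    text = "|".join(str(field) for field in key)
    return zlib.crc32(text.encode("utf-8")) % n_members


def address_pair_key(flow: tuple) -> tuple:
    """Coarse key: source and destination IP only."""
    return (flow[0], flow[1])


def five_tuple_key(flow: tuple) -> tuple:
    """Fine key: src IP, dst IP, protocol, src port, dst port."""
    return flow


def members_used(flows, key_fn, n_members: int) -> set:
    """Set of LAG members that receive at least one of the flows."""
    return {member_for(key_fn(flow), n_members) for flow in flows}


# 64 TCP sessions between the same two hosts, differing only in source port.
flows = [("192.0.2.10", "198.51.100.20", 6, 40000 + i, 443) for i in range(64)]

print("address-pair key:", sorted(members_used(flows, address_pair_key, N_MEMBERS)))
print("5-tuple key:     ", sorted(members_used(flows, five_tuple_key, N_MEMBERS)))
```

With the address-pair key, all 64 sessions necessarily land on a single member, because the key is identical for each of them. With the 5-tuple key the source port varies, so the sessions can spread over several members.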


slide-17
SLIDE 17

Redesign

  • Issues:
    • Hashing
    • Fast Reroute
    • Fast route convergence
    • QoS matching
  • Manageability:
    • Readability of the network
    • Capacity plan
    • Monitoring
  • Cost
  • IP topology
    • Full-meshed
    • Ring
    • Star
  • Transport technology
    • Layer 1 (OTN)
    • Layer 2 (ELINE)
    • Layer 2 (ELAN)
  • Onion vs flat
    • Flexibility vs convergence

slide-18
SLIDE 18

L2 Logical Topology (TO-BE)


slide-19
SLIDE 19

L2 Topology backbone (TO-BE)


slide-20
SLIDE 20

L2 Topology MX104 (TO-BE)


slide-21
SLIDE 21

Onion Approach

  • Full routing table no longer on the MX104
    • (+) Better convergence time for BGP updates
    • (+) Lower memory usage on the MX104
  • MX104 will receive a default route from two MX480/MX960
    • (-) Less optimal traffic-routing decisions
    • (-) May require migration of customers with a full routing table
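The trade-off in the onion approach can be illustrated with a longest-prefix-match lookup. In this sketch the router names (core-a, core-b), the prefixes and the choice of core-a as the preferred default are hypothetical; the point is only that a router holding a single default route sends everything to the same upstream, while a router with more specific routes can pick the better exit.

```python
import ipaddress

# Minimal sketch: routing decision with a full table versus a default route only.
# Router names and prefixes are hypothetical.


def lookup(table, destination: str):
    """Longest-prefix match: next hop of the most specific matching route."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, next_hop) for net, next_hop in table if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]


# A (tiny) stand-in for a full routing table.
full_table = [
    (ipaddress.ip_network("0.0.0.0/0"), "core-a"),
    (ipaddress.ip_network("198.51.100.0/24"), "core-b"),
    (ipaddress.ip_network("203.0.113.0/24"), "core-a"),
]

# Onion approach: the edge router only holds a default route.
default_only = [
    (ipaddress.ip_network("0.0.0.0/0"), "core-a"),
]

print(lookup(full_table, "198.51.100.7"))    # core-b (most specific route wins)
print(lookup(default_only, "198.51.100.7"))  # core-a (only the default matches)
```

The smaller table is what gives the faster BGP convergence and lower memory usage; the cost is that traffic for 198.51.100.0/24 in this example first goes to core-a even though core-b is the better exit.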


slide-22
SLIDE 22

Capacity study

  • BRUSSELS (BRUDIE, BRUEVE, BRUZAV): 200 Gbps
  • 40 Gbps:
    • ANTCEN
    • ANTWIL
    • BRUCAM
    • HASDIE
    • LEUHEV
    • LEUGAS
    • LLN
  • 20 Gbps: all others

slide-23
SLIDE 23

Thank you for your attention

slide-24
SLIDE 24