Robotron: Top-down Network Management at Scale


  1. Robotron: Top-down Network Management at Scale. Yu-Wei Eric Sung, Xiaozheng Tie, Starsky H.Y. Wong, Hongyi Zeng. ACM SIGCOMM 2016, August 25, 2016

  2. Scale of Facebook Community • 1.7 Billion on Facebook monthly • 1 Billion on WhatsApp monthly • 500 Million on Instagram monthly • 1 Billion on Messenger monthly

  3. Network Management at Facebook: What’s involved? • Goals: Build and evolve FB network • Example tasks: circuit/device turnup, network monitoring • Human interactions -> outages [Figure: network topology diagram]

  4. Network Management at Facebook Why is it hard? • Distributed Configurations • Multiple Domains • Versioning • Dependency • Vendor Differences

  5. Network Management at Facebook: Early days… Manual configuration and monitoring with ad-hoc scripts [Figure: timeline, 2004-2007 through 2015]

  6. Contribution • Our paper sheds light on: • Network management tasks • Robotron’s usage • Evolution of Robotron • Our experiences using Robotron [Figure: timeline, 2004-2007 through 2015, from manual configuration and monitoring with ad-hoc scripts to the point where Robotron started]

  7. Overview of Facebook’s Network: Lifecycle of user requests: Users → Internet → POPs → Backbone → Data Centers

  8. Point of Presence (POP) • Standardized topology • Services: LB, Cache • Common tasks: • Build/upgrade a cluster • Provision new peering circuits

  9. Backbone • Irregular, demand-driven topology • Common tasks: • Add/migrate circuits • Add/remove routers

  10. Datacenter • Standardized topology • Services: Web, Cache, Database • Common tasks: • Build/decomm a cluster • Cluster capacity upgrade

  11. Overview of Facebook’s Network: Multiple versions of FB cluster architectures co-exist [Figure: two plots of # of clusters (normalized, 0 to 1) over time. POP: Gen1, Gen2. DC: 8 generations (Gen1, Gen2-A, Gen2-B, Gen2-C, Gen2-D, Gen2V6, Gen3, Gen3V6).]

  12. Robotron: “Top-Down” Network Management System@FB: Overview • Network Design • Config Generation • Deployment • Monitoring • FBNet DB

  13. FBNet: Modeling the Network: Example 4-post POP cluster [Figure: PR 1 and PR 2 connect to the Internet and to BB 1 and BB 2; below them, a 4-post POP cluster of PSW a, PSW b, PSW c, PSW d connects to Top-of-Rack switches & servers; links are labeled 20G]

  14. FBNet: Modeling the Network: Object [Figure: PSW a and PR 1 joined by two 10G circuits (et1/1 to et2/1, et1/2 to et3/1), aggregated as ae0 and ae1, with addresses 2001::1 and 2001::2 and an eBGP session] • Object types: Networkswitch, Linecard, PhysicalInterface, AggregatedInterface, Circuit, V6Prefix, BgpV6Session

  15. FBNet: Modeling the Network: Value [Same figure as slide 14, annotated with values] • Networkswitch: name=PSW a, model=X • Linecard: slot=1 • PhysicalInterface: name=et1/1, name=et1/2 • AggregatedInterface: name=ae0 • Circuit: speed=10G • V6Prefix: prefix=2001::1

  16. FBNet: Modeling the Network: Relationship (“It’s complicated”) [Same figure as slide 14, annotated with relationships] • Circuit: a_endpoint=, z_endpoint= • PhysicalInterface: linecard=, agg_interface= • Linecard: device= • V6Prefix: interface= • BgpV6Session: a_prefix=, z_prefix=

  17. FBNet Model Snippet

      class PhysicalInterface(Interface):
          linecard = models.ForeignKey(Linecard)
          agg_interface = models.ForeignKey(AggregatedInterface)

  18. FBNet Model Snippet: Related models (same snippet; the ForeignKey fields linecard and agg_interface are highlighted)

  19. FBNet Model Snippet: Model inheritance (same snippet; PhysicalInterface(Interface) is highlighted)
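
The object, value, and relationship layers from slides 14-19 can be sketched without a database. The following is a minimal illustration using plain Python dataclasses instead of the ORM models on the slide; the helper names `device_of` and `members` are hypothetical, and reading both `et1/1` and `et1/2` as members of `ae0` is an assumption taken from the figure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NetworkSwitch:
    name: str   # value, e.g. "PSW a"
    model: str  # value, e.g. "X"


@dataclass
class Linecard:
    slot: int
    device: NetworkSwitch  # relationship: Linecard -> device


@dataclass
class AggregatedInterface:
    name: str


@dataclass
class Interface:
    name: str


@dataclass
class PhysicalInterface(Interface):  # model inheritance, as on slide 19
    linecard: Linecard                                   # related model
    agg_interface: Optional[AggregatedInterface] = None  # related model


def device_of(pif: PhysicalInterface) -> NetworkSwitch:
    """Follow PhysicalInterface -> Linecard -> device (hypothetical helper)."""
    return pif.linecard.device


def members(agg: AggregatedInterface,
            pifs: List[PhysicalInterface]) -> List[PhysicalInterface]:
    """Reverse lookup: physical interfaces bundled into `agg` (hypothetical helper)."""
    return [p for p in pifs if p.agg_interface is agg]


# Values taken from slide 15; the grouping into ae0 is assumed.
psw_a = NetworkSwitch(name="PSW a", model="X")
lc1 = Linecard(slot=1, device=psw_a)
ae0 = AggregatedInterface(name="ae0")
et1_1 = PhysicalInterface(name="et1/1", linecard=lc1, agg_interface=ae0)
et1_2 = PhysicalInterface(name="et1/2", linecard=lc1, agg_interface=ae0)
```

Here `device_of(et1_1)` walks two relationships to reach the switch, which is the kind of traversal the foreign keys in the slide's snippet make possible.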

  20. FBNet: Architecture: API Layer • RPC services • Read: fine-grained per-model query • Write: task-based • High Availability: Multiple replicas per DC [Figure: Read APIs in front of the Read Service, alongside the Write Service, on top of FBNet]

  21. FBNet: Architecture: DBs • 1 primary, multiple secondary DBs • Scalability: 1 slave per DC [Figure: Write Service and Read Service on top of FBNet; the primary feeds the secondaries (slaves) through a replication stream]
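
As a rough illustration of the read/write split on slides 20-21, the sketch below routes writes to a single primary and serves reads from a per-DC replica. Everything here is hypothetical (the class `FBNetSketch`, its methods, and the in-memory tables); "task-based write" is interpreted as applying all rows of one task together, and replication is modeled as an explicit step rather than a real stream.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


class Database:
    """In-memory stand-in for one database: model name -> list of rows."""

    def __init__(self) -> None:
        self.tables: Dict[str, List[Row]] = {}


class FBNetSketch:
    def __init__(self, datacenters: List[str]) -> None:
        self.primary = Database()
        self.replicas = {dc: Database() for dc in datacenters}  # one per DC
        self.stream: List[Tuple[str, List[Row]]] = []           # pending replication

    def write(self, model: str, rows: List[Row]) -> None:
        """Apply all rows of one task to the primary and queue them for replication."""
        self.primary.tables.setdefault(model, []).extend(rows)
        self.stream.append((model, rows))

    def replicate(self) -> None:
        """Drain the replication stream into every replica."""
        for model, rows in self.stream:
            for db in self.replicas.values():
                db.tables.setdefault(model, []).extend(dict(r) for r in rows)
        self.stream.clear()

    def read(self, dc: str, model: str, fields: List[str], **filters: Any) -> List[Row]:
        """Per-model query against the local replica, returning only `fields`."""
        rows = self.replicas[dc].tables.get(model, [])
        return [{f: r[f] for f in fields}
                for r in rows
                if all(r.get(k) == v for k, v in filters.items())]


fbnet = FBNetSketch(["dc1", "dc2"])
fbnet.write("Circuit", [{"name": "c1", "speed": "10G"},
                        {"name": "c2", "speed": "100G"}])
before = fbnet.read("dc1", "Circuit", ["name"])  # replica has not caught up yet
fbnet.replicate()
after = fbnet.read("dc2", "Circuit", ["name"], speed="10G")
```

The gap between `before` and `after` shows the trade-off of this layout: reads scale with the number of replicas, but a replica can briefly lag behind the primary.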

  22. Robotron’s management life cycle • Network Design • Config Generation • Deployment • Monitoring • FBNet DB

  23. Network Design: Design intent → FBNet objects

      Template for a POP cluster:

      Cluster(
        devices={
          PR: DeviceSpec(
            hardware="Router_Vendor1",
            num_devices=2),
          PSW: DeviceSpec(
            hardware="Switch_Vendor2",
            num_devices=4)
        },
        Link_groups=[
          LinkGroup(
            a_device=PR,
            z_device=PSW,
            pifs_per_agg=2,
            ip=V6)
        ]
      )

      Resulting FBNet objects (PR 1, PR 2, PSW a, PSW b, PSW c, PSW d): • BackboneRouters: 2 • NetworkSwitches: 4 • Circuits: 16 • PhysicalInterfaces: 32 • AggregatedInterfaces: 16 • V6Prefixes: 16 • BgpV6Sessions: 8 • Total: 94 objects across 7 models
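
The object counts on slide 23 can be reproduced with a short calculation. The sketch below is only one plausible reading of how the template expands, not Robotron's implementation: it assumes every PR is linked to every PSW, each link has one aggregated interface, one prefix per end, one BGP session per device pair, and `pifs_per_agg` circuits. The function `count_objects` and the counter key names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import product
from typing import Dict, List


@dataclass
class DeviceSpec:
    hardware: str
    num_devices: int


@dataclass
class LinkGroup:
    a_device: str
    z_device: str
    pifs_per_agg: int
    ip: str = "V6"


@dataclass
class Cluster:
    devices: Dict[str, DeviceSpec]
    link_groups: List[LinkGroup]


def count_objects(cluster: Cluster) -> Counter:
    """Count the objects a template would create under the assumptions above."""
    counts: Counter = Counter()
    for role, spec in cluster.devices.items():
        counts[f"Device[{role}]"] += spec.num_devices
    for group in cluster.link_groups:
        a_n = cluster.devices[group.a_device].num_devices
        z_n = cluster.devices[group.z_device].num_devices
        for _pair in product(range(a_n), range(z_n)):   # full mesh a x z
            counts["Circuit"] += group.pifs_per_agg
            counts["PhysicalInterface"] += 2 * group.pifs_per_agg  # two ends per circuit
            counts["AggregatedInterface"] += 2                     # one per end
            counts[f"{group.ip}Prefix"] += 2                       # one per end
            counts[f"Bgp{group.ip}Session"] += 1                   # one per pair
    return counts


pop = Cluster(
    devices={
        "PR": DeviceSpec(hardware="Router_Vendor1", num_devices=2),
        "PSW": DeviceSpec(hardware="Switch_Vendor2", num_devices=4),
    },
    link_groups=[LinkGroup(a_device="PR", z_device="PSW", pifs_per_agg=2, ip="V6")],
)
counts = count_objects(pop)
total = sum(counts.values())
```

Under these assumptions the per-model counts and the total agree with the slide (2 + 4 + 16 + 32 + 16 + 16 + 8 = 94), which illustrates how a template of about fifteen lines stands in for dozens of hand-created objects.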

  24. Config Generation: FBNet objects → Device configs [Figure: FBNet objects (PR 1, PR 2, PSW a, PSW b, PSW c, PSW d) are converted through the Config Schema into vendor-agnostic per-device objects]

      Config Schema:

      struct Device {
        1: list<AggregatedInterface> aggs,
      }
      struct AggregatedInterface {
        1: string name,
        2: i32 number,
        3: string v4_prefix,
        4: string v6_prefix,
        5: list<PhysicalInterface> pifs,
      }
      struct PhysicalInterface {
        1: string name,
      }

  25-28. Config Generation: FBNet objects → Device configs [Figure: FBNet objects (PR 1, PR 2, PSW a, PSW b, PSW c, PSW d) → Config Schema → vendor-agnostic per-device objects → vendor-specific templates for Vendor 1 and Vendor 2 (interface template, BGP template, MPLS template, …) → vendor-specific device configs (PR 1 config, PR 2 config, PSW a config, PSW b config, PSW c config, PSW d config)]

      Example interface template:

      {% for agg in device.aggs %}
      interface {{agg.name}}
        mtu 9192
        no switchport
        load-interval 30
        {% if agg.v4_prefix %}
        ip addr {{agg.v4_prefix}}
        {% endif %}
        {% if agg.v6_prefix %}
        ipv6 addr {{agg.v6_prefix}}
        {% endif %}
        no shutdown
      !
      {% endfor %}
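
To make the two-stage flow concrete, the sketch below mirrors the Config Schema structs as Python dataclasses and reproduces the logic of the interface template in a plain function, so it runs without a template engine. The function name `render_interfaces` and the sample device are hypothetical; the emitted lines and the `2001::1` value follow the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PhysicalInterface:
    name: str


@dataclass
class AggregatedInterface:
    name: str
    number: int
    v4_prefix: str = ""
    v6_prefix: str = ""
    pifs: List[PhysicalInterface] = field(default_factory=list)


@dataclass
class Device:
    aggs: List[AggregatedInterface]


def render_interfaces(device: Device) -> str:
    """Emit the same lines as the slide's template for each aggregated interface."""
    lines: List[str] = []
    for agg in device.aggs:
        lines.append(f"interface {agg.name}")
        lines += ["  mtu 9192", "  no switchport", "  load-interval 30"]
        if agg.v4_prefix:                       # {% if agg.v4_prefix %}
            lines.append(f"  ip addr {agg.v4_prefix}")
        if agg.v6_prefix:                       # {% if agg.v6_prefix %}
            lines.append(f"  ipv6 addr {agg.v6_prefix}")
        lines += ["  no shutdown", "!"]
    return "\n".join(lines)


# Vendor-agnostic per-device object for a hypothetical PSW a.
psw_a = Device(aggs=[
    AggregatedInterface(
        name="ae0",
        number=0,
        v6_prefix="2001::1",
        pifs=[PhysicalInterface("et1/1"), PhysicalInterface("et1/2")],
    )
])
config = render_interfaces(psw_a)
print(config)
```

Keeping the per-device object free of vendor syntax means a second vendor only needs a second rendering function (or template), while the data feeding it stays unchanged.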

  29. Usage Statistics • Number of FBNet model changes? • Number of changed FBNet objects per design change? • Frequency and size of config changes?

  30. FBNet Model Changes: How much do FBNet models change over time? • Still many changes over time • Reasons: new models, values, relationships

  31. Design Changes: How many FBNet objects are changed per design change? [Figure: two CDFs across design changes, one for POP/DC and one for Backbone; x-axis: # of FBNet objects (log scale, 1 to 10,000); y-axis: CDF (0 to 1); series: All, Interface, Circuit, v6 Prefix, v4 Prefix, Device]
