A New Software Architecture for Core Internet Routers
Robert Broberg
September 16, 2011

Disclaimers and Credits
• This is research and no product plans are implied by any of this work.
• r3.cis.upenn.edu
• Early and continued support from www.vu.nl
• A large team has generated this work and I am just one of many spokespersons for them.
  – Any mistakes in this talk are mine.

Agenda
• Overview of the evolution of Core router design
• A sampling of SW problems encountered during evolution
• An approach to resolving SW problems and continued evolution

Core Router Evolution
• WAN interconnects of mainframes over telecommunication infrastructure
• LAN/WAN interconnects
  – CORE routers (1+1 architectures)
  – Leased telco lines for customers
  – Dialup aggregation
• As CORE routers evolved, the old migrated to support edge connects
• Telco becomes a client of the IP network

Growth driven by increased user demand
[Figure: growth trends, 2004 to 2014. Internet traffic “2x/year”; router capacity x2.9/18m; Moore’s law x2/18m; silicon speed x1.5/18m; DRAM access rate x1.1/18m]
The demand for increased network system performance/scale is relentless...
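As a rough check on the figure's labels, the quoted factors can be compounded over the ten years the axis spans. This is a minimal sketch: the rates are the ones quoted on the slide, while the function names and the ten-year horizon are illustrative assumptions.

```python
from typing import Dict

# Growth factors quoted on the slide, as (factor, period in months).
RATES = {
    "Internet traffic": (2.0, 12),
    "Router capacity": (2.9, 18),
    "Moore's law": (2.0, 18),
    "Silicon speed": (1.5, 18),
    "DRAM access rate": (1.1, 18),
}


def growth_over(years: float, factor: float, period_months: float) -> float:
    """Total growth after `years` when a quantity multiplies by `factor` every period."""
    return factor ** (years * 12 / period_months)


def growth_table(years: float = 10) -> Dict[str, float]:
    """Compound every quoted rate over the same horizon (10 years is an assumption)."""
    return {name: growth_over(years, f, p) for name, (f, p) in RATES.items()}


if __name__ == "__main__":
    for name, total in growth_table().items():
        print(f"{name:18s} x{total:,.1f} over 10 years")
```

Under these rates traffic grows roughly a thousandfold in a decade, while the DRAM access rate does not even double, which is the gap the slide is pointing at.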

System Scaling Problems

Some of the reasons SW problems were encountered
• Routers started as tightly coupled embedded systems
  – speeds and feeds were the game with features
• CPUs + NPUs + very aware programmers led the game
• Evolution was very fast
  – Business customers
    • leased lines and frame relay
  – Mid 1990s 64kbit dialup starts
  – Core bandwidth doubling every year
• As IP customer populations grew, feature demands increased
• Model of SW delivery not conducive to resilience of rapid feature deployment

Intent/Goals
– build an application-unaware, fault-tolerant distributed system for routers
– always on (200 msec failover of apps)
– allow for insertion of new features with no impact to existing operations
– support +/-1 versioning of key applications with zero packet loss
– versioning to allow for live feature testing

Fault Tolerant Routing

Motivations
• We must be able to do better than 1+1
  – Low confidence in 1+1 as only tested when actually upgrading/downgrading/crashing
• Want 100% confidence in new code
  – Despite lab time, rollout often uncovers showstoppers
  – Rollback can be very disruptive
• Aiming for sub-200ms ‘outages’
  – Want to be able to recover before VOIP calls notice

Core Routers are built as Clusters but act as a single virtual machine
• Multiple line cards with potentially various types of interfaces use NPUs to route/switch amongst themselves via a data-plane (switch fabric)
• A separate control plane controls all NPUs, programming switching tables and managing interface state and routing protocols, along with environmental conditions
  – Control plane CPUs are typically generic and ride the commodity curve
• The systems are heterogeneous and large
  – Current Cisco CRS3 deployments switch 128tb, have ~150 x86 CPUs for the control plane along with ~1 terabyte of memory and scale higher

Virtualization/Voting/BGP
• BGP state is tied to TCP connection state
  – loopback interfaces
• Process Placement
• Versioning
• Leader election
• HW virtualization
  – e.g. NPU virtualization???

Approach taken
• Abstraction layers chosen to isolate applications
  – applications (e.g. protocols) isolated with wrappers
    • application transparent checkpointing!!!!
    • FTSS used to store state
    • SHIM used as wrapper
  – model to allow for voting
• Optimize, optimize, optimize
  – experiment and prototype
• ORCM used for process placement
• Protocols isolated by a shim layer
  – multiple versions called siblings
• 2 levels of operation chosen
  – no use seen for hypervisor
  – user mode for apps; kernel; abstraction layer via SHIM + FTSS

Protocol Virtualization
• Existing protocol code largely untouched
• Can run N siblings
  – Can be different versions of the protocol being virtualized
  – Allows full testing of new code, with seamless switchover and switch back
• Currently we run one virtualization wrapper
  – Protected by storing state into FTSS
  – Can be restarted, thus upgradeable
  – Designed to know as little about the protocol as possible
    • Treats most of it as a ‘bag of bits’
• ‘Run anywhere’
  – no RP/LC assumptions
  – We don’t care what you call the compute resources
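The sibling idea can be illustrated with a small sketch. Everything here is hypothetical: the `Shim` class, its method names, and the notion of a sibling as a plain callable are assumptions for illustration, not the project's code. It compares sibling outputs against the lead rather than implementing any voting scheme.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical type: a sibling is any callable mapping an opaque message
# to a list of route strings. The shim never inspects the message itself.
Sibling = Callable[[bytes], List[str]]


class Shim:
    def __init__(self, siblings: Dict[str, Sibling], lead: str) -> None:
        if lead not in siblings:
            raise KeyError(lead)
        self.siblings = dict(siblings)
        self.lead = lead
        # Names of siblings whose last output differed from the lead's.
        self.disagreements: List[str] = []

    def deliver(self, message: bytes) -> List[str]:
        """Hand the same message to every sibling; return only the lead's routes."""
        outputs: Dict[str, Optional[List[str]]] = {}
        for name, sibling in self.siblings.items():
            try:
                outputs[name] = sibling(message)
            except Exception:
                outputs[name] = None  # a crashing sibling must not disturb the others
        lead_routes = outputs[self.lead]
        if lead_routes is None:
            raise RuntimeError("lead sibling failed; switch lead before retrying")
        self.disagreements = [n for n, out in outputs.items() if out != lead_routes]
        return lead_routes

    def switch_lead(self, name: str) -> None:
        """Switch over (or back) to another running sibling."""
        if name not in self.siblings:
            raise KeyError(name)
        self.lead = name
```

Because a new version runs on live input while an older version remains the lead, its output can be compared before `switch_lead` is called, and switching back is the same call with the old name.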

CRS utilisation
[Diagram: two RPs and four LCs]
• The CRS contains many CPUs which we treat as compute nodes in a cluster
• If a node fails the others take up its workload
• No data is lost on a failure, and the software adapts to re-establish redundancy

CRS utilisation
[Diagram: two RPs, four LCs and an external blade server]
• External resources can be added to the system to add redundancy or compute power

Placement of Components
[Diagram: RP and LC nodes, each running FTSS and ORCM]
• Each compute node runs FTSS and ORCM
  – both are started by ‘qn’ (system process monitor)
• FTSS stores routing data redundantly across all the systems in the router
• ORCM manages routing processes and distributes them around the router
  – constraints can be applied via configuration
• FTSS can run on other nodes to make use of memory if desired.
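Constraint-driven placement of the kind described for ORCM might look like the sketch below. The slides do not describe ORCM's actual policy, so the greedy least-loaded rule, the `place` function, and the shape of the constraint (a set of permitted node names, or `None` for 'run anywhere') are all assumptions.

```python
from typing import Dict, List, Optional, Set


def place(processes: Dict[str, Optional[Set[str]]],
          nodes: List[str]) -> Dict[str, str]:
    """Assign each process to the least-loaded live node its constraint allows.

    `processes` maps a process name to the set of nodes it may run on,
    or None when it may run anywhere.
    """
    load = {n: 0 for n in nodes}
    placement: Dict[str, str] = {}
    for name in sorted(processes):
        allowed = processes[name]
        candidates = [n for n in nodes if allowed is None or n in allowed]
        if not candidates:
            raise ValueError(f"no live node satisfies the constraint for {name}")
        # Ties are broken by node name so the result is deterministic.
        target = min(candidates, key=lambda n: (load[n], n))
        placement[name] = target
        load[target] += 1
    return placement
```

Calling `place` again with the failed node removed from `nodes` redistributes its processes over the survivors, which mirrors the "others take up its workload" behaviour in a very simplified form.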

BGP Virtualisation
[Diagram labels: reliable TCP endpoint; ORCM; BGP virtualisation service (shim); FTSS; BGP siblings, including a new BGP; RIB; distributed dataplane]

Virtualisation Layer recovery
[Diagram labels: reliable TCP endpoint; ORCM; BGP virtualisation service (shim); new shim; FTSS; BGP; RIB; distributed dataplane]

IS-IS Virtualisation
[Diagram labels: IS-IS L2 receiver; ORCM; IS-IS virtualisation service (shim); IS-IS siblings; RIB; distributed dataplane]

Fault Tolerant State Storage
• Distributed Hash Table with intelligent placement of data
• You can decide how much replication: 2, 3, 4, N copies.
• More copies: more memory and slower write times.
• Fewer copies: fewer simultaneous failures tolerated.
• Virtual Nodes: able to balance memory usage to space on compute node
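One common way to get configurable replication and virtual nodes in a DHT is a consistent-hash ring. The sketch below assumes that design purely for illustration; the slides do not say how FTSS places data, and the `Ring` class, the SHA-256 hash, and the node names are all hypothetical.

```python
import bisect
import hashlib
from typing import Dict, List, Tuple


def _h(data: bytes) -> int:
    """Map bytes to a position on the ring (first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class Ring:
    """Consistent-hash ring; each compute node owns several virtual nodes."""

    def __init__(self, vnodes: Dict[str, int]) -> None:
        # vnodes maps node name -> number of virtual nodes; giving a node
        # more virtual nodes steers proportionally more data to it.
        self.points: List[Tuple[int, str]] = sorted(
            (_h(f"{node}#{i}".encode()), node)
            for node, count in vnodes.items()
            for i in range(count)
        )
        self.node_count = len({node for _, node in self.points})

    def replicas(self, key: bytes, copies: int) -> List[str]:
        """Return `copies` distinct nodes, walking clockwise from the key's hash."""
        if copies > self.node_count:
            raise ValueError("cannot place more copies than there are nodes")
        start = bisect.bisect_right(self.points, (_h(key), ""))
        chosen: List[str] = []
        for step in range(len(self.points)):
            if len(chosen) == copies:
                break
            node = self.points[(start + step) % len(self.points)][1]
            if node not in chosen:
                chosen.append(node)
        return chosen
```

For example, `Ring({"RP0": 8, "RP1": 8, "LC0": 4}).replicas(b"some-key", 2)` returns two distinct node names. Raising `copies` costs memory and write time on more nodes, in line with the trade-off listed on the slide.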

FTSS distributed storage
[Diagram: FTSS instances on RP0, RP1, LC0, LC1 and LC2; some data is stored redundantly in 2 places]

FTSS: losing a node
[Diagram: FTSS instances after the loss of a node]
Data missing is replicated from predecessor
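Restoring the replica count after a node loss can be sketched as follows. This is a simplification under stated assumptions: the slide says missing data is copied from the predecessor, whereas this sketch simply copies from any surviving holder to the least-loaded node that lacks a copy. The function name and the `holders` mapping are hypothetical.

```python
from typing import Dict, List, Set


def re_replicate(holders: Dict[str, Set[str]], live: List[str],
                 failed: str, copies: int) -> Dict[str, Set[str]]:
    """Drop `failed` from every key's holder set and top each set back up to `copies`."""
    survivors = [n for n in live if n != failed]
    # Number of keys each surviving node already holds.
    load = {n: sum(n in hs for hs in holders.values()) for n in survivors}
    wanted = min(copies, len(survivors))
    repaired: Dict[str, Set[str]] = {}
    for key in sorted(holders):
        current = holders[key] & set(survivors)
        if not current:
            raise RuntimeError(f"all copies of {key!r} were lost")
        while len(current) < wanted:
            # Copy from a surviving holder to the least-loaded node without a copy.
            target = min((n for n in survivors if n not in current),
                         key=lambda n: (load[n], n))
            current = current | {target}
            load[target] += 1
        repaired[key] = current
    return repaired
```

Data is only lost when every holder of a key fails before re-replication completes, which is why the replication factor bounds how many simultaneous failures can be tolerated.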

DHT tuples
Key
• Binary data
• Unique in DHT
Value
• Binary data
Link
• Unique set of binary data items
• Optimizations for use as a list of keys
DHT provides optimised routines for:
• fast parallel store and deletion of multiple tuples
• fast update of multiple links within a tuple
• operations directly using the link list for storing related data
• fast parallel recovery of multiple, possibly inter-linked, KVL tuples
Copies of the tuples are stored on multiple nodes for redundancy
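The key/value/link shape can be made concrete with a small in-memory sketch. It is single-node and sequential, so it shows only the data model and the operations listed, not the parallelism or replication; `KVL`, `KVLTable`, and the method names are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class KVL:
    key: bytes                             # binary data, unique within the table
    value: bytes                           # opaque binary data
    links: FrozenSet[bytes] = frozenset()  # unique set of keys of related tuples


class KVLTable:
    def __init__(self) -> None:
        self._items: Dict[bytes, KVL] = {}

    def store_many(self, tuples: Iterable[KVL]) -> None:
        for t in tuples:
            self._items[t.key] = t

    def delete_many(self, keys: Iterable[bytes]) -> None:
        for k in keys:
            self._items.pop(k, None)

    def update_links(self, key: bytes, add: Iterable[bytes] = (),
                     remove: Iterable[bytes] = ()) -> None:
        """Add and remove several links on one tuple in a single call."""
        old = self._items[key]
        new_links = (old.links | frozenset(add)) - frozenset(remove)
        self._items[key] = replace(old, links=new_links)

    def recover(self, roots: Iterable[bytes]) -> List[KVL]:
        """Return the root tuples plus everything reachable through their links."""
        seen, found, stack = set(), [], list(roots)
        while stack:
            k = stack.pop()
            if k in seen or k not in self._items:
                continue  # already visited, or a dangling link
            seen.add(k)
            found.append(self._items[k])
            stack.extend(sorted(self._items[k].links))
        return found
```

Using the link set as a list of keys lets one tuple (say, a peer) point at the tuples that belong to it, so a restart can pull back a whole inter-linked group from a single starting key.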

DHT use in BGP processing
BGP shim operations:
• Receive incoming BGP messages
• Acknowledge TCP
• Create minimal message set
• Hand to BGP siblings; routes produced
• Pass routes from lead sibling to RIB
Data held in the DHT:
• Unprocessed messages: early redundant store to permit fast acknowledgement of incoming BGP messages
• Attributes + NLRI: minimal set of incoming BGP data
• RIB prefixes: data store for re-syncing with RIBs on restart
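The ordering that matters in this flow is that a message is stored redundantly before it is acknowledged. The sketch below illustrates that ordering only. A plain dict stands in for the DHT, and `handle_message`, `send_ack`, `minimise`, and the key naming are hypothetical; whether siblings receive the raw or the minimal form is also an assumption.

```python
from typing import Callable, Dict, List

# Hypothetical type: a sibling turns a message into the routes it produces.
Sibling = Callable[[bytes], List[str]]


def handle_message(seq: int, raw: bytes, dht: Dict[str, object],
                   send_ack: Callable[[int], None],
                   minimise: Callable[[bytes], bytes],
                   siblings: Dict[str, Sibling], lead: str) -> List[str]:
    """Process one incoming message: store, acknowledge, reduce, fan out."""
    # Early redundant store: the message is safe before the peer sees an ACK.
    dht[f"unprocessed/{seq}"] = raw
    send_ack(seq)

    # Minimal message set (attributes + NLRI on the slide). `minimise` is a
    # caller-supplied stand-in because the real reduction is not described.
    minimal = minimise(raw)
    dht[f"minimal/{seq}"] = minimal

    # Every sibling sees the same input; only the lead's routes reach the RIB.
    produced = {name: sibling(minimal) for name, sibling in siblings.items()}
    routes = produced[lead]

    # Kept so the RIB can be re-synced after a restart.
    for prefix in routes:
        dht[f"rib/{prefix}"] = seq
    return routes
```

If the shim dies after `send_ack`, a replacement can read the unprocessed entries back and continue, so the peer never has to retransmit something it was told had arrived.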

BGP data in DHT (I)
Unprocessed incoming messages   Announcements from peers, minimal set   Source peers
126                             ASPATH 1 + attrs, NLRI + peer 1          10.0.0.4
127                             ASPATH 2 + attrs, NLRI + peer 2          192.168.22.5
128                             ASPATH 3 + attrs, NLRI + peer 3          4.1.0.77
..                              ..                                       ..
[Legend: tuples, data within tuples, links]

BGP data in DHT (II)
Siblings   RIB prefixes
1          10.0.0.4
2          19.1.22.5
3          4.1.0.77
..         ..
[Legend: tuples, links]

DHT use in IS-IS processing
IS-IS shim operations:
• Receive incoming IS-IS frames
• Create minimal message set
• Hand to IS-IS siblings; routes produced
• Pass routes from lead sibling to RIB
Data held in the DHT:
• LSPs: minimal set of incoming IS-IS frames
• RIB prefixes: data store for resyncing with RIBs on restart

Multipath IGP/EGP demo
