  1. Software Routers ECE/CS598HPN Radhika Mittal

  2. Dataplane programmability is useful • New ISP services • intrusion detection, application acceleration • Flexible network monitoring • measure link latency, track down traffic • New protocols • IP traceback, Trajectory Sampling, … Enable flexible, extensible networks

  3. But routers must be able to keep up with traffic rates!

  4. Can we achieve both high speed and programmability for network routers? • Programmable hardware: limited flexibility; higher performance per unit power or per unit $; more on it in the next class! • Software routers (RouteBricks' approach): can SW routers match the required performance? Possible through careful design that exploits parallelism within and across servers; higher power, more expensive.

  5. RouteBricks: Exploiting Parallelism to Scale Software Routers (SOSP'09). Mihai Dobrescu and Norbert Egi, Katerina Argyraki, Byung-Gon Chun, Kevin Fall, Gianluca Iannaccone, Allan Knies, Maziar Manesh, Sylvia Ratnasamy. Acknowledgements: Slides from Sylvia Ratnasamy, UC Berkeley

  6. Router definitions [Figure: a router with external ports 1, 2, 3, …, N-1, N, each running at R bits per second (bps)] • N = number of external router `ports' • R = line rate of a port • Router capacity = N x R
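
A quick worked instance of the capacity formula, using the 32-port, 10 Gbps target that appears later in the deck (slide 39):

\[
NR = 32 \times 10\ \text{Gbps} = 320\ \text{Gbps}
\]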

  7. Networks and routers [Figure: network hierarchy — the core and edge of an ISP (e.g., AT&T) connecting enterprise edges (e.g., UCB, MIT, UIUC, HP) and home / small-business networks]

  8. Examples of routers (core) • Juniper T640: R = 2.5/10 Gbps, NR = 320 Gbps • Cisco CRS-1: R = 10/40 Gbps, NR = 46 Tbps (72 racks, 1 MW)

  9. Examples of routers (edge) • Cisco ASR 1006: R = 1/10 Gbps, NR = 40 Gbps • Juniper M120: R = 2.5/10 Gbps, NR = 120 Gbps

  10. Examples of routers (small business) • Cisco 3945E: R = 10/100/1000 Mbps, NR < 10 Gbps

  11. Building routers • edge, core: ASICs; network processors; commodity servers ← RouteBricks • home, small business: ASICs; network and embedded processors; commodity PCs and servers ← Click Modular Router: 1-2 Gbps

  12. Detour: Click Modular Router • Monolithic routing module in Linux • Difficult to reason about or extend. • Click: modular software router

  13. Detour: Click Modular Router • Element: the basic packet-processing module (e.g., a queue) • Connections between elements: push or pull • Rules about which connections are permitted.
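
To make the push/pull idea concrete, here is a small C sketch (not Click's actual C++ element API, which is what the paper uses; element names such as FromDevice and ToDevice are mentioned only as examples). In a push connection the upstream element hands the packet downstream; in a pull connection the downstream element asks upstream for one. A queue has a push input and a pull output, so it can sit between the two kinds of path.

```c
#include <stddef.h>

#define QCAP 16

struct packet { int id; };

/* A queue element: push input, pull output. */
struct queue {
    struct packet *buf[QCAP];
    int head, tail, count;
};

/* Push input: called by the upstream element (e.g., a FromDevice-like source). */
static void queue_push(struct queue *q, struct packet *p)
{
    if (q->count == QCAP)
        return;                       /* full: drop the packet */
    q->buf[q->tail] = p;
    q->tail = (q->tail + 1) % QCAP;
    q->count++;
}

/* Pull output: called by the downstream element (e.g., a ToDevice-like sink). */
static struct packet *queue_pull(struct queue *q)
{
    if (q->count == 0)
        return NULL;                  /* empty: nothing to transmit */
    struct packet *p = q->buf[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return p;
}
```

The design point: explicit queues are the only places packets are stored, which is one of the "rules about permitted connections" — push paths must end at a push input (such as a queue), and pull paths must start at a pull output.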

  14. Detour: Click Modular Router • Examples:

  15. Detour: Click Modular Router Example: IP Router (stare at it on your own)

  16. Building routers • edge, core: ASICs; network processors; commodity servers ← RouteBricks • home, small business: ASICs; network and embedded processors; commodity PCs and servers ← Click Modular Router: 1-2 Gbps

  17. A single-server router [Figure: a server with multiple sockets of cores, memory attached to integrated memory controllers, point-to-point links (e.g., QPI) to an I/O hub, and Network Interface Cards (NICs) whose ports provide the N router links]

  18. Packet processing in a server — per packet: 1. core polls input port; 2. NIC writes packet to memory; 3. core reads packet; 4. core processes packet (address lookup, checksum, etc.); 5. core writes packet to output port
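
A minimal C sketch of this per-packet loop. The driver and forwarding calls (nic_rx_poll, lookup_route, update_checksum, nic_tx_send) are hypothetical placeholders, not the actual Click/RouteBricks driver API; the point is only to show the five steps above in code form.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical descriptor for a packet the NIC has DMA'd into memory. */
struct pkt {
    uint8_t *data;   /* packet bytes, written to memory by the NIC (step 2) */
    size_t   len;
};

/* Hypothetical driver/forwarding hooks -- placeholders for illustration. */
struct pkt *nic_rx_poll(int in_port);                  /* step 1: poll input port */
int         lookup_route(const struct pkt *p);         /* step 4: address lookup  */
void        update_checksum(struct pkt *p);            /* step 4: header updates  */
void        nic_tx_send(int out_port, struct pkt *p);  /* step 5: write to port   */

void forward_loop(int in_port)
{
    for (;;) {
        struct pkt *p = nic_rx_poll(in_port);    /* steps 1-2 */
        if (p == NULL)
            continue;                            /* nothing arrived yet */
        int out_port = lookup_route(p);          /* steps 3-4: read + process */
        update_checksum(p);
        nic_tx_send(out_port, p);                /* step 5 */
    }
}
```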

  19. Packet processing in a server (8x 2.8 GHz cores): assuming 10 Gbps with all 64B packets → 19.5 million packets per second → one packet every 0.05 µs → ~1000 cycles to process a packet. Today: 200 Gbps memory, 144 Gbps I/O. Suggests efficient use of CPU cycles is key! Teaser: 10 Gbps?
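
The arithmetic behind these numbers, using the slide's own figures (8 cores at 2.8 GHz, 64 B packets at 10 Gbps):

\[
\frac{10\times 10^{9}\ \text{bits/s}}{64\ \text{B}\times 8\ \text{bits/B}} \approx 19.5\ \text{Mpps},\qquad
\frac{1}{19.5\times 10^{6}\ \text{pkts/s}} \approx 0.05\ \mu\text{s},\qquad
\frac{8 \times 2.8\times 10^{9}\ \text{cycles/s}}{19.5\times 10^{6}\ \text{pkts/s}} \approx 1150 \approx 1000\ \text{cycles/pkt}
\]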

  20. Lesson#1: multi-core alone isn't enough [Figure: `older' (2008) architecture with a shared front-side bus and the memory controller in the `chipset', vs. current (2009) architecture with per-socket memory and integrated memory controllers] Hardware need: avoid shared-bus servers

  21. Lesson#2: on cores and ports [Figure: cores poll input ports and transmit to output ports] How do we assign cores to input and output ports?

  22. Lesson#2: on cores and ports Problem: locking Hence, rule: one core per port

  23. Lesson#2: on cores and ports Problem: inter-core communication, cache misses [Figure: pipelined approach — the packet is transferred between cores and (may be) transferred across L3 caches; parallel approach — the packet stays at one core and always in one cache] Hence, rule: one core per packet

  24. Lesson#2: on cores and ports • two rules: one core per port; one core per packet • problem: often, can't simultaneously satisfy both • solution: use multi-Q NICs

  25. Multi-Q NICs • feature on modern NICs (for virtualization) • port associated with multiple queues on NIC • NIC demuxes (muxes) incoming (outgoing) traffic • demux based on hashing packet fields (e.g., source + destination address) [Figures: a multi-Q NIC handling incoming and outgoing traffic]
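
A rough C sketch of the demux step: hash the source and destination addresses and map the result to one of the per-port queues, so every packet of a flow lands in the same queue (and hence on one core). The mix function below is an arbitrary illustrative one, not the hash real multi-Q NICs implement in hardware (RSS, for instance, uses a Toeplitz hash).

```c
#include <stdint.h>

#define NUM_QUEUES 8   /* e.g., one RX queue per core */

/* Toy 32-bit mix function, for illustration only. */
static uint32_t mix32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x7feb352dU;
    x ^= x >> 15;
    x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}

/* Pick the RX queue for a packet based on its IP source and destination. */
static unsigned pick_queue(uint32_t src_ip, uint32_t dst_ip)
{
    return mix32(src_ip ^ mix32(dst_ip)) % NUM_QUEUES;
}
```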

  26. Multi-Q NICs • feature on modern NICs (for virtualization), repurposed for routing • rule: one core per port • rule: one core per packet • if #queues per port == #cores, can always enforce both rules

  27. Lesson#2: on cores and ports — recap: • use multi-Q NICs • with a modified NIC driver for lock-free polling of queues • with one core per queue (avoid locking) and one core per packet (avoid cache misses, inter-core communication)

  28. Lesson#3: book-keeping — per packet: 1. core polls input port; 2. NIC writes packet to memory; 3. core reads packet; 4. core processes packet; 5. core writes packet to output port (memory holds both packets and their descriptors). Problem: excessive per-packet book-keeping overhead. Solution: batch packet operations — the NIC transfers packets in batches of `k'
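
A sketch of what batching changes in the loop from slide 18: instead of one descriptor interaction per packet, the (hypothetical) driver calls move up to k packets at a time, amortizing the book-keeping. The function names are again placeholders, not the real modified-driver API.

```c
#include <stddef.h>

#define BATCH_K 32   /* `k' packets transferred per NIC interaction */

struct pkt;  /* as in the earlier sketch */

/* Hypothetical batched driver hooks: one descriptor-ring interaction
 * moves up to BATCH_K packets instead of one. */
size_t nic_rx_poll_batch(int in_port, struct pkt *batch[], size_t max);
void   nic_tx_send_batch(int out_port, struct pkt *batch[], size_t n);
int    process_packet(struct pkt *p);   /* lookup, checksum, etc. */

void forward_loop_batched(int in_port, int out_port)
{
    struct pkt *batch[BATCH_K];
    for (;;) {
        size_t n = nic_rx_poll_batch(in_port, batch, BATCH_K);
        for (size_t i = 0; i < n; i++)
            process_packet(batch[i]);              /* per-packet work remains */
        if (n > 0)
            nic_tx_send_batch(out_port, batch, n); /* book-keeping amortized over n */
        /* A real router would sort packets into per-output batches here;
         * a single output port keeps the sketch short. */
    }
}
```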

  29. Recap: routing on a server Design lessons: 1. parallel hardware • at cores and memory and NICs 2. careful queue-to-core allocation • one core per queue, per packet 3. reduced book-keeping per packet • modified NIC driver w/ batching

  30. Single-Server Measurements • test server: Intel Nehalem (X5560) • dual socket, 8x 2.80 GHz cores • 2x NICs; 2x 10 Gbps ports/NIC (max 40 Gbps) • additional servers generate/sink test traffic over the 10 Gbps ports

  31. Single-Server Measurements • test server: Intel Nehalem (X5560), dual socket, 8x 2.80 GHz cores, 2x NICs with 2x 10 Gbps ports/NIC • software: kernel-mode Click [TOCS'00] with modified NIC driver (batching, multi-Q); packet processing runs on the Click runtime over the modified driver • additional servers generate/sink test traffic

  32. Single-Server Measurements • test server: Intel Nehalem (X5560) • software: kernel-mode Click [TOCS'00] with modified NIC driver • packet processing: • static forwarding (no header processing) • IP routing: trie-based longest-prefix address lookup over ~300,000 table entries [RouteViews], checksum calculation, header updates, etc. • additional servers generate/sink test traffic
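
For readers unfamiliar with the lookup named above, here is a generic, textbook-style binary-trie longest-prefix-match sketch in C. It is not the lookup element Click/RouteBricks actually uses; a production table with ~300,000 prefixes would use a more compressed, cache-friendly structure. Error handling is omitted.

```c
#include <stdint.h>
#include <stdlib.h>

/* One node per prefix bit; `port' >= 0 marks a stored prefix. */
struct trie_node {
    struct trie_node *child[2];
    int port;                    /* -1 = no prefix ends here */
};

static struct trie_node *node_new(void)
{
    struct trie_node *n = calloc(1, sizeof(*n));
    n->port = -1;
    return n;
}

/* Insert prefix/len -> port, e.g. trie_insert(root, 0x0A000000, 8, 3) for 10.0.0.0/8. */
static void trie_insert(struct trie_node *root, uint32_t prefix, int len, int port)
{
    struct trie_node *n = root;
    for (int i = 0; i < len; i++) {
        int bit = (prefix >> (31 - i)) & 1;
        if (!n->child[bit])
            n->child[bit] = node_new();
        n = n->child[bit];
    }
    n->port = port;
}

/* Walk the destination address bit by bit, remembering the last prefix seen:
 * that is the longest matching prefix. Returns its port, or -1 if none. */
static int trie_lookup(const struct trie_node *root, uint32_t dst)
{
    int best = root->port;
    const struct trie_node *n = root;
    for (int i = 0; i < 32 && n; i++) {
        int bit = (dst >> (31 - i)) & 1;
        n = n->child[bit];
        if (n && n->port >= 0)
            best = n->port;
    }
    return best;
}
```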

  33. Single-Server Measurements • test server: Intel Nehalem (X5560) • software: kernel-mode Click [TOCS'00] with modified NIC driver • packet processing: static forwarding (no header processing), IP routing • input traffic: all min-size (64B) packets (maximizes packet rate given port speed R), and a realistic mix of packet sizes [Abilene] • additional servers generate/sink test traffic

  34. Factor analysis: design lessons (static forwarding of min-sized packets, in million pkts/sec): • older shared-bus server: 1.2 • current Nehalem server: 2.8 • Nehalem + `batching' NIC driver: 5.9 • Nehalem w/ multi-Q + `batching' driver: 19

  35. Single-server performance (Gbps, out of a 40 Gbps maximum): • static forwarding: 36.5 with realistic packet sizes, 9.7 with min-size packets • IP routing: 36.5 with realistic packet sizes, 6.35 with min-size packets • Bottleneck?

  36. Recap: single-server performance • current servers (realistic packet sizes): R = 1/10 Gbps, NR = 36.5 Gbps • current servers (min-sized packets): R = 1 Gbps, NR = 6.35 Gbps (CPUs bottleneck)

  37. Recap: single-server performance With newer servers? (2010) 4x cores, 2x memory, 2x I/O

  38. Recap: single-server performance • current servers (realistic packet sizes): R = 1/10 Gbps, NR = 36.5 Gbps • current servers (min-sized packets): R = 1 Gbps, NR = 6.35 Gbps (CPUs bottleneck) • upcoming servers, estimated (realistic packet sizes): R = 1/10/40 Gbps, NR = 146 Gbps • upcoming servers, estimated (min-sized packets): R = 1/10 Gbps, NR = 25.4 Gbps

  39. Practical Architecture: Goal • scale software routers to multiple 10Gbps ports • example: 320Gbps (32x 10Gbps ports) • higher-end of edge routers; lower-end core routers

  40. A cluster-based router today [Figure: multiple servers, each with external 10 Gbps ports — interconnect?]

  41. Interconnecting servers Challenges • any input can send up to R bps to any output

  42. A naïve solution: connect every input directly to every output — N² internal links of capacity R (10 Gbps). Problem: commodity servers cannot accommodate N x R traffic
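
To put numbers on this, using the 320 Gbps target from slide 39 and the single-server results from slide 36: with N = 32 ports at R = 10 Gbps, each server in this full mesh would have to terminate

\[
N \times R = 32 \times 10\ \text{Gbps} = 320\ \text{Gbps},
\]

roughly an order of magnitude more than the ~36.5 Gbps a current server forwards even with realistic packet sizes.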

  43. Interconnecting servers Challenges • any input can send up to R bps to any output • but need a lower-capacity interconnect • i.e., fewer (<N), lower-capacity (<R) links per server • must cope with overload

  44. Overload [Figure: several 10 Gbps inputs all sending to the same 10 Gbps output] • drop at input servers? need to drop 20 Gbps, fairly across input ports — problem: requires global state • drop at output server? problem: output might receive up to N x R traffic
