Linux Kongress 2008, Hamburg/Germany
Robert Olsson, Uppsala University
2008-10-10

Over 10 years in production

Three major installations:
- UU core routers towards SUNET
- UU Student Network, 30,000 students
- ftp.sunet.se
UU facts
- Over 25,000 registered hosts
- Dual ISP BGP connect, GigE
- Local DMZ BGP peering, GigE
- IPv4/IPv6, OSPFv2/OSPFv3
- 600 netfilter rules
- 10 Cisco 6500 OSPF routers
- Redundant power
- 10G planned
The SUNET FTP ARCHIVE

[Diagram: four ftp servers in a DMZ between AS15980 and AS1653, behind paired Juniper and Bifrost Linux routers]

- Router discovery, IRDP
- Full Internet routing, IPv4 and IPv6
- 10 TB
Student Network facts
- Dual ISP BGP connect, GigE
- Local DMZ BGP peering, GigE
- IPv4, IRDP (ICMP)
- About 30 netfilter rules
- 19 netlogin-service boxes for premises
- Very "innovative" users, well connected
- 10G planned
Student Network Core Router
IP-login installation at Uppsala University, approx. 1000 outlets.
Testing, Verification, Development & Research
- Started out as simple testing.
- Curiosity, Open Source, Collaboration.
- Relative freedom; the idea was to use it in our own infrastructure. No need for external funding.
- The OS was intended for desktops.
Building Blocks
- Hardware: PC motherboard/CPU/memory; network interfaces, GigE/10G, WiFi etc.
- Software: operating system (Linux/BSD/Microsoft); applications; routing daemons (Quagga/XORP); IP-login/netlogon.
- Network: cable, fiber, copper; equipment, switches.
Testing, Verification, Development & Research
- No need for a test network: we could test in our own infrastructure (or SLU's).
- We could work on complicated issues:
- NAPI: 3 years
- Pktgen: 2 years
- fib_trie: 1 year
- TRASH: 1 year
- Hardware testing: many years
Flexible netlab at Uppsala University
- Raw packet performance
- TCP
- Timing
- Variants

[Diagram: Linux test generator, Ethernet link to tested device, Ethernet link to Linux sink device]

El cheapo, highly customizable; we write the code :-)
PIII for many years (ftp.sunet.se); dual power supply, Intel NICs.
Latest & Greatest Hardware
Intel 10G board, chipset 82598. Open chip specs, thanks Intel! But why fixed XFPs? Better classifier needed.
2U Hi-End Opteron box TYAN S2927/Barcelona
Not all were blessed...
Memory Latency
lat_mem_rd from LMbench
Quad vs Dual Core Opteron
[Plot: memory read latency, Dual-Core Opteron 2222 3.0 GHz vs Quad-Core Opteron 2365 2.3 GHz]
Surprising! One CPU core at 2.3 GHz is faster than the 3.0 GHz Dual-Core. L3 cache? Microcode?
Bifrost concept
- Linux kernel collaboration
- Performance testing, development of tools and
testing techniques
- Hardware validation, support from big vendors
- Detect and cure problems in the lab, not in the network
infrastructure.
- Test deploy (Often in own network)
The Linux Ashram
The guru is ANK
Kernel footprints
- HW_FLOWCONTROL
- Tulip FASTROUTE path
- Whitehole device, in the middle of dev.c
- Hardwired IP addresses (Russian?)
Overall Effect
- Inelegant handling of heavy net loads
– System collapse
- Scalability affected
– System and number of NICS
- A single hogger netdev can bring the system to
its knees and deny service to others
[Plot: summary of 2.4 vs feedback behavior under load]
March 15 report on lkml
Thread: "How to optimize routing perfomance" reported by Marten.Wikstron@framsfab.se
- Linux 2.4 peaks at 27Kpps
- Pentium Pro 200, 64MB RAM
A high level view of the new system

[Diagram: P packets on the RX ring over time; interrupt area vs polling area]
➔ P: packets to deliver to the stack (on the RX ring)
➔ Horizontal lines show different netdevs with different input rates
➔ Area under the curve shows how many packets before the next poll
➔ Quota enforces fair share
NAPI observations & issue: fairness
[Plot: ping latency in microseconds, ping latency/fairness under extreme load, SMP; ping through an idle router vs through a router under a DoS attack @ 890 kpps]

Very well behaved: just an increase of a couple of hundred microseconds!
NAPI Kernel support
NAPI kernel part was included in 2.5.7 and backported to 2.4.20. Current driver support:
- e1000, Intel GigE NICs (UFO driver); first driver where RX & TX are done in softirq
- tg3, Broadcom GigE NICs
- dl2k, D-Link GigE NICs
- tulip (pending), 100 Mbps
Forwarding performance (old)
[Plot: Linux forwarding rate at different packet sizes, 64 to 1518 bytes; input vs throughput in kpps; Linux 2.5.58 UP with skb recycling, 1.8 GHz XEON]

Fills a GigE pipe starting from 256-byte packets.
IPv6 performance (old)

[Plot: forwarding throughput in kpps, 76-byte packets; Linux 2.5.12, 1 CPU (SMP), Opteron 1.6 GHz, e1000; single flow vs 543-route rDoS]
How does rDoS perform on a sparse routing table?
fib_trie performance comparison
[Plot: forwarding kpps, fib_hash vs fib_trie; Linux 2.6.16, 1 CPU used (SMP), Opteron 1.6 GHz, e1000; 5-route single flow, 5-route rDoS, 123k-route rDoS]
Preroute patches to disable route hash
32/64 bit || sizeof(sk_buff)
[Plot: sizeof(struct sk_buff) in bytes on 32-bit vs 64-bit, and relative forwarding throughput of 64-bit vs 32-bit]
Gcc 3.4 x86_64 vs i686 on same HW
Trash data-structure
- Interesting novel approach: Trie-Hash, "Trash", extending the LC-trie.
- Paper with Stefan Nilsson/KTH.
- Exploits that key length does not affect tree depth.
- We lengthen the key so it can be better compressed.
- Implemented as a Linux forwarding patch, replacing the route hash.
- Can do full-key lookup: src/dst/sport/dport/proto/if etc, and later socket.
- Even for IPv6, with little performance degradation.
- Could be a candidate for the grand unified lookup.
- Full flow lookup can understand connections: free flow logging etc.
- New garbage collection (GC) possible: active GC (AGC in the paper) listens to TCP SYN, FIN and RST.
- Shown to be a performance winner.
Uppsala University core router
Very flat (fast) trees
Fully parallel router
multi-queue breakthrough
- Load from one incoming 10G interface can be split among several CPU cores using RSS (Receive Side Scaling), a new NIC HW classifier.
- MSI-X interrupt affinity for RX and TX, so a packet (skb) is handled by one CPU core.
- A breakthrough for forwarding and for networking in general.
Fully parallel router concept
- In the experiment we used Intel 82598 adapters. Intel follows MS NDIS 6.0 for virtualization.
- SUN's 10G board has a more potent HW classifier (a TCAM).
- Potent classifiers can be yet another breakthrough for both functions and performance: control-plane separation (routing daemons), QoS, filters etc.
Fully parallel router
- Flow load: 31,000 fib_lookups/sec, BGP table with 271,064 routes.
- Three packet sizes: 64 bytes 45%, 576 bytes 25%, 1500 bytes 30%.
- RSS and multi-queue (RX and TX) in use.
- Linux 2.6.27-rc2, ixgbe-1.3.31.5 + patches.
- Using 2/4 CPU cores of an AMD Barcelona 2.3 GHz.
- Forwarding: 6.2 Gbit/s (960 kpps).
10g boards
SUN's seems to use XFPs. Is anyone using it? Other boards with SFP/SFP+/XFP?
A new network symbol has been seen...
The Penguin Has Landed
[Network map: UU and SLU core with ultrouter6-9, ultgw-1/2, ultGC-gw, ultKC-gw, DMZ UU/ITS, GigaSUNET uplinks and skara-gw]
SLU's network (not all of it)
BGP policy routing: ISPs (SUNET) and Knutpunkt
Redundant inner core
Redundant connection of heavy server networks via router discovery