Linux Kongress 2008 Hamburg/Germany. Robert Olsson, Uppsala University.


SLIDE 1

Linux Kongress 2008 Hamburg/Germany. Robert Olsson, Uppsala University. 2008-10-10

SLIDE 2

Over 10 years in production

  • Three major installations
  • UU core routers towards SUNET
  • UU Student Network: 30,000 students
  • ftp.sunet.se

SLIDE 3

Over 10 years in production

UU facts:
  • Over 25,000 registered hosts
  • Dual ISP BGP connect, GigE
  • Local DMZ BGP peering, GigE
  • IPv4/IPv6, OSPFv2/OSPFv3
  • 600 netfilter rules
  • 10 Cisco 6500 OSPF routers
  • Redundant power
  • 10G planned

SLIDE 4

Over 10 years in production

SLIDE 5

Over 10 years in production: The SUNET FTP ARCHIVE

[Diagram: four ftp servers (10 TB) in a DMZ between AS15980 and AS1653; dual Juniper and dual Bifrost Linux routers; router discovery via IRDP; full Internet routing, IPv4 and IPv6]

SLIDE 6

Over 10 years in production

Student Network facts:
  • Dual ISP BGP connect, GigE
  • Local DMZ BGP peering, GigE
  • IPv4, IRDP (ICMP)
  • About 30 netfilter rules
  • 19 netlogin-service boxes for premises
  • Very “innovative” users
  • Well connected
  • 10G planned

SLIDE 7

Over 10 years in production

SLIDE 8

Over 10 years in production: Student Network Core Router

SLIDE 9

IP-login installation at Uppsala University

Approx. 1000 outlets

SLIDE 10

Testing, Verification Development & Research

  • Started out as simple testing.
  • Curiosity, Open Source, Collaboration.
  • Relative freedom; the idea to use in own infrastructure. No need for external funding.
  • OS was intended for desktops.

SLIDE 11

Building Blocks

Hardware: PC motherboard/CPU/memory; network interfaces, GigE/10G, WiFi etc.
Software: operating system (Linux/BSD/Microsoft); applications; routing daemons (Quagga/XORP); IP-login/netlogin
Network: cable (fiber, copper); equipment, switches

SLIDE 12

Testing, Verification Development & Research

No need for a test network: we could test in our own infrastructure (or SLU). We could work on complicated issues:

  • NAPI: 3 years
  • Pktgen: 2 years
  • fib_trie: 1 year
  • TRASH: 1 year
  • Hardware testing: many years

SLIDE 13

Tested device

Flexible netlab at Uppsala University

* Raw packet performance
* TCP
* Timing
* Variants

[Diagram: test generator (Linux) connected via Ethernet to the tested device, which feeds a sink device (Linux). El cheapo, highly customizable: we write code :-)]

SLIDE 14

netlab at UU

SLIDE 15

PIII for many years: ftp.sunet.se, dual power supply

SLIDE 16

Intel NIC's

SLIDE 17

Latest & Greatest Hardware

Intel 10G board, chipset 82598. Open chip specs, thanks Intel! But why fixed XFPs? A better classifier is needed.

SLIDE 18

Latest & Greatest Hardware

2U Hi-End Opteron box TYAN S2927/Barcelona

SLIDE 19

Not all were blessed...

SLIDE 20

Memory Latency

lat_mem_rd from LMbench

SLIDE 21

Quad vs Dual Core Opteron

2U Hi-End Opteron box TYAN S2927/Barcelona

[Chart: forwarding rate, Dual-Core 2222 @ 3.0 GHz vs Quad-Core 2365 @ 2.3 GHz]

Surprising! One CPU core at 2.3 GHz is faster than the 3.0 GHz Dual-Core. L3 cache? Microcode?

SLIDE 22

Bifrost concept

  • Linux kernel collaboration
  • Performance testing, development of tools and testing techniques
  • Hardware validation, support from big vendors
  • Detect and cure problems in the lab, not in the network infrastructure
  • Test deploy (often in own network)
SLIDE 23

The Linux Ashram

The guru is ANK

SLIDE 24

Kernel footprints

  • HW_FLOWCONTROL
  • Tulip FASTROUTE path
  • Whitehole device, in the middle of dev.c
  • Hardwired IP addresses (Russian?)

SLIDE 25

Overall Effect

  • Inelegant handling of heavy net loads: system collapse
  • Scalability affected: system and number of NICs
  • A single hogger netdev can bring the system to its knees and deny service to others

[Chart: summary, Linux 2.4 vs. the feedback system]

March 15 report on lkml. Thread: "How to optimize routing perfomance", reported by Marten.Wikstron@framsfab.se

  • Linux 2.4 peaks at 27 Kpps
  • Pentium Pro 200, 64 MB RAM
SLIDE 26

A high level view of new system

[Diagram: P packets on the RX ring, plotted over an interrupt area and a polling area, with a quota line]

➔ P packets to deliver to the stack (on the RX ring)
➔ Horizontal line shows different netdevs with different in…
➔ Area under curve shows how many packets before next…
➔ Quota enforces fair share

SLIDE 27

NAPI observations & issue: fairness

Ping latency/fairness under extreme load/SMP (ping latency in microseconds):

[Chart: Idle vs. DoS. Samples: 122 123 99 380 95 408 95 541 93 254 101 323 105 540 96 202 96 190]

Ping through an idle router vs. ping through a router under a DoS attack @ 890 kpps.

Very well behaved: just an increase of a couple of hundred microseconds!

SLIDE 28

NAPI Kernel support

NAPI kernel part was included in 2.5.7 and backported to 2.4.20. Current driver support:
  • e1000, Intel GIGE NIC's (UFO driver): first driver where RX & TX done in softirq
  • tg3, BroadCom GIGE NIC's
  • dl2k, D-Link GIGE NIC's
  • tulip (pending), 100 Mbs

SLIDE 29

Forwarding performance (old)

[Chart: Linux forwarding rate at different packet sizes (64 to 1518 bytes). Linux 2.5.58 UP / skb recycling, 1.8 GHz XEON. Axes: packet size vs. input/throughput in kpps]

Fills a GIGE pipe, starting from 256-byte packets.

SLIDE 30

IPv6 performance (old)

[Chart: forwarding throughput in kpps, 76-byte packets. Linux 2.5.12, 1 CPU (SMP), Opteron 1.6 GHz, e1000. Series: single flow small, single flow 543 routes, rDoS 543 routes]

How does rDoS work on a sparse routing table?

SLIDE 31

fib_trie performance comparison

[Chart: forwarding kpps, fib_hash vs. fib_trie. Linux 2.6.16, 1 CPU used (SMP), Opteron 1.6 GHz, e1000. Series: dst hash, 5 routes single flow, 5 routes rDoS, 123k routes rDoS]

Preroute patches to disable the route hash.

SLIDE 32

32/64 bit || sizeof(sk_buff)

[Chart: sizeof(struct sk_buff) in bytes, 32-bit vs. 64-bit]

[Chart: relative forwarding throughput, 64-bit vs. 32-bit]

Gcc 3.4, x86_64 vs i686 on the same hardware.

SLIDE 33

Trash data-structure

An interesting novel approach: Trie-Hash --> Trash, developed when extending the LC-trie. Paper with Stefan Nilsson/KTH. Exploits that key length does not affect tree depth: we lengthen the key so it can be better compressed. Implemented as a Linux forwarding patch, as a replacement for the route hash.

SLIDE 34

Trash data-structure

  • Can do full key lookup: src/dst/sport/dport/proto/if etc., and later socket. Works even for IPv6 with little performance degradation.
  • Could be a candidate for the grand unified lookup.
  • Full flow lookup can understand connections: free flow logging etc.
  • New garbage collection (GC) possible: active GC, called AGC in the paper, listens to TCP SYN, FIN and RST.
  • Shown to be a performance winner.

SLIDE 35

Trash data-structure

Uppsala University core router

SLIDE 36

Trash data-structure

Very flat(fast) trees

SLIDE 37

Fully parallel router

multi-queue breakthrough

Load from one incoming 10G interface can be split among several CPU cores using RSS (Receive Side Scaling) and the new NIC HW classifier. MSI-X interrupt affinity for RX and TX means a packet (skb) is handled by one CPU core. A breakthrough for forwarding and for networking in general.

SLIDE 38

Fully parallel router concept

multi-queue breakthrough

In the experiments we used Intel 82598 adapters. Intel follows MS NDIS 6.0 for virtualization. SUN's 10G board has a more potent HW classifier, aka TCAM. Potent classifiers can be yet another breakthrough for both functions and performance: control plane separation (routing daemons), QoS, filters etc.

SLIDE 39

Fully parallel router

multi-queue breakthrough

  • Flow load: 31,000 fib_lookups/sec
  • BGP table with 271,064 routes
  • Three different packet sizes: 64 bytes 45%, 576 bytes 25%, 1500 bytes 30%
  • RSS and multi-queue (RX and TX) in use
  • Linux 2.6.27-rc2, ixgbe-1.3.31.5 + patches
  • Using 2/4 CPU cores from AMD Barcelona 2.3 GHz
  • Forwarding: 6.2 Gbit/s (960 kpps)

SLIDE 40

10g boards

multi-queue breakthrough

SUN's board seems to use XFPs. Anyone using it...? Other boards with SFP/SFP+/XFP?

SLIDE 41

A new network symbol has been seen...

The Penguin Has Landed

SLIDE 42

[Network map: UU/SLU core with expgw.data, ultrouter6-9, ultGC-gw, ultKC-gw, ultgw-1/2, skara-gw, SLU1/SLU2, GigaSUNET, DMZ UU/ITS; 193.10.131.0/24 and 130.242.x addressing; HVC/KC/GC/DC sites]

SLU's network (not all of it)

SLIDE 43

[Same network map as above]

BGP policy routing: ISPs (SUNET) and Knutpunkt (exchange point)
SLIDE 44

[Same network map as above]

Redundant inner core

SLIDE 45

[Same network map as above]

Redundant connection of heavy server networks via router discovery

SLIDE 46

[Same network map as above; caption missing]