Guarantee IP Lookup Performance with FIB Explosion
Tong Yang (ICT), Gaogang Xie (ICT), Yanbiao Li (HNU), Qiaobin Fu (ICT), Alex X. Liu (MSU), Qi Li (ICT), Laurent Mathy (ULG)
Performance Issue in IP Lookup
FIB sizes are increasing by about 15% per year and have reached 512,000 entries. The "512K bug": in August 2014, Cisco warned that web browsing speeds could slow over the following week as old hardware was upgraded to handle FIBs larger than 512K entries.
[Figure: FIB size growth over time, now crossing the 512k-entry mark]
On-chip vs. off-chip memory: on-chip memory is about 10 times faster, but limited in size. As FIBs keep growing they no longer fit on-chip, so almost all packets pay the off-chip penalty.

Ideal IP Lookup Algorithm
– Constant yet small memory footprint for the FIB: fits in on-chip memory
– Constant yet fast lookup speed: low time complexity
Achieving constant IP lookup time:
– TCAM-based
– Trie pipeline using FPGA
– Full expansion
– DIR-24-8

Achieving small memory:
– Bloom filter based
– Level compression, path compression
– LC-trie
Observation: almost all packets hit prefixes of length 0~24.

Two-dimensional splitting:
– Splitting the lookup process: finding the prefix length vs. finding the next hop
– Splitting by prefix length: 0~24 vs. 25~32

[Figure: finding the prefix length for levels 0~24 stays on-chip; finding the next hop, and all of levels 25~32, goes off-chip]
[Figure: on-chip bitmap arrays paired with off-chip next-hop arrays, indexed identically]
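The bitmap/next-hop pairing can be sketched for a single trie level; this is a toy illustration (the level, array sizes, and function names are my own), not the paper's implementation:

```python
# Toy sketch of one level's bitmap array (on-chip) and next-hop array
# (off-chip). Bit i of the bitmap says whether the i-th node at this
# level holds a next hop; the next-hop array is indexed identically.

LEVEL = 4                      # illustrative level; SAIL keeps levels 0~24 on-chip

bitmap = [0] * (1 << LEVEL)    # on-chip: 2^LEVEL one-bit entries
next_hop = [0] * (1 << LEVEL)  # off-chip: one next-hop slot per node

def insert(prefix_bits, hop):
    """Install a LEVEL-bit prefix (given as an int) with its next hop."""
    bitmap[prefix_bits] = 1
    next_hop[prefix_bits] = hop

def lookup(addr32):
    """Test the bitmap for the top LEVEL bits of a 32-bit address."""
    idx = addr32 >> (32 - LEVEL)   # top LEVEL bits index both arrays
    if bitmap[idx]:                # cheap on-chip membership test
        return next_hop[idx]       # exactly one off-chip read on a hit
    return None                    # no entry at this level

insert(0b0010, 7)                  # prefix 0010* -> next hop 7
print(lookup(0b0010 << 28))        # prints 7
```

Because the bitmap is consulted first, a miss at this level never touches off-chip memory.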
Short prefixes (levels 0~24) vs. long prefixes (levels 25~32) in the original trie. Keeping one bitmap per level 0~24 on-chip costs

∑_{j=0}^{24} 2^j = 2^25 − 1 bits ≈ 4 MB

How to avoid searching both short and long prefixes?
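The on-chip budget can be checked directly: one bitmap per level j = 0…24 needs 2^j bits, and the total is just under 2^25 bits, i.e. 4 MB:

```python
# Sum of bitmap sizes for levels 0..24: 2^j bits per level.
total_bits = sum(2 ** j for j in range(25))
print(total_bits)                    # prints 33554431, i.e. 2^25 - 1
print((total_bits + 1) / 8 / 2**20)  # prints 4.0 (MB)
```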
FIB example:

prefix      next hop
*/0         6
1*/1        4
01*/2       3
001*/3      3
111*/3      7
0011*/4     1
1110*/4     8
11100*/5    2
001011*/6   9

[Figure: (a) the trie built from this FIB, (b) bit maps B0–B4, one per level, (c) next-hop arrays N3, N4, …]
Lookup example for 001010, pivot level 4: B4[001010 >> 2] = B4[0010] = 1, and N4[0010] = 0, so the match is a long prefix and lookup continues off-chip.

Pivot push: prefixes are pushed down onto the pivot level, so a single bitmap test at that level decides between short and long prefixes.
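The pivot-level test can be sketched as follows; the arrays B4 and N4 and the example values follow the slide, everything else is illustrative:

```python
# Toy pivot-level test (pivot level = 4). After pivot pushing, B4[i] == 1
# means 4-bit index i is covered by some prefix; N4[i] > 0 is its next hop
# (a short prefix), while N4[i] == 0 flags a long (> 4-bit) prefix whose
# next hop lives off-chip.

B4 = [0] * 16
N4 = [0] * 16

B4[0b0010] = 1; N4[0b0010] = 0   # covered only by the longer prefix 001011*
B4[0b1110] = 1; N4[0b1110] = 8   # short prefix 1110* -> next hop 8

def classify(addr6):
    """Classify a 6-bit toy address at pivot level 4."""
    idx = addr6 >> 2                   # top 4 bits of the toy address
    if not B4[idx]:
        return "no match"
    return "short prefix" if N4[idx] else "long prefix"

print(classify(0b001010))   # prints: long prefix  (as in the slide)
print(classify(0b111000))   # prints: short prefix (via 1110*)
```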
Update examples: inserting 10* with next hop 1 sets B2[10] = 1; deleting 111* sets B3[111] = 0; changing 001*, or inserting 0010*, likewise touches only the affected bitmap and next-hop entries.
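These constant-cost updates can be sketched as single writes into the per-level bitmaps (B2 and B3 follow the slide; the matching next-hop writes are omitted for brevity):

```python
# Toy sketch: each SAIL_B update is one write to an on-chip bitmap entry.
B2 = [0] * (1 << 2)     # level-2 bitmap
B3 = [0] * (1 << 3)     # level-3 bitmap
B3[0b111] = 1           # assume 111* is already installed

B2[0b10] = 1            # insert 10* (next hop 1): one on-chip write
B3[0b111] = 0           # delete 111*: one on-chip write
print(B2[0b10], B3[0b111])   # prints: 1 0
```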
SAIL_B (basic)
– Lookup: 25 on-chip memory accesses in the worst case
– Update: 1 on-chip memory access

Lookup-Oriented Optimization (SAIL_L)
– Lookup: 2 on-chip memory accesses in the worst case
– Update: unbounded, but low average update complexity

Update-Oriented Optimization (SAIL_U)
– Lookup: 4 on-chip memory accesses in the worst case
– Update: 1 on-chip memory access

Extension: SAIL for Multiple FIBs (SAIL_M)
SAIL_L pushes the trie to levels 16, 24, and 32.

[Figure: SAIL_L lookup flow — if B16 == 1, go on to level 24, else read the next hop from N16; if B24 == 1, go on to level 32 (N32), else read it from N24]
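The flow in the figure can be sketched as below; the array names follow the slide, while the dicts standing in for the real arrays (and the sample values) are illustrative:

```python
# Sketch of the SAIL_L decision flow after pushing the trie to levels
# 16, 24, and 32: B16[i] == 1 means "descend to level 24", otherwise the
# next hop is already in N16; the same test repeats with B24/N24, and
# only the final N32 read is off-chip.

def sail_l_lookup(addr, B16, N16, B24, N24, N32):
    i16 = addr >> 16                  # top 16 bits of the 32-bit address
    if not B16.get(i16):
        return N16.get(i16)           # 1 on-chip access
    i24 = addr >> 8                   # top 24 bits
    if not B24.get(i24):
        return N24.get(i24)           # 2 on-chip accesses
    return N32.get(addr)              # long prefix: 1 off-chip access

# Illustrative state: a /16 under 11.0.0.0 -> hop 9; a /24 under
# 10.0.255.0 -> hop 5 (nothing pushed below level 24 here).
B16 = {0x0A00: 1}
N16 = {0x0B00: 9}
B24 = {0x0A00FF: 0}
N24 = {0x0A00FF: 5}
N32 = {}

print(sail_l_lookup(0x0A00FF01, B16, N16, B24, N24, N32))  # prints 5
print(sail_l_lookup(0x0B000001, B16, N16, B24, N24, N32))  # prints 9
```

This is where the worst-case bound of 2 on-chip accesses comes from: at most one test at level 16 and one at level 24.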
SAIL_U pushes the trie to levels 6, 12, 18, and 24. With a 6-bit stride, each node's children at the next level cover 2^6 = 64 bits in the bitmap array.
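With that 6-bit stride, the 64 children of a node occupy one 64-bit chunk of the next level's bitmap; a sketch of the chunk addressing (the helper is my own, not from the paper):

```python
# With 6-bit strides, bit i of a level's bitmap lives in chunk i >> 6
# at offset i & 63, so each node's 64 children share one chunk.
def chunk_of(index):
    """Map a bitmap index to its (chunk id, bit offset) pair."""
    return index >> 6, index & 63

print(chunk_of(0))     # prints (0, 0)
print(chunk_of(130))   # prints (2, 2)
```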
SAIL_M merges the tries of multiple (virtual) FIBs into one overlay trie.

[Figure: (a) Trie 1 (A: 00*, C: 10*, G: 110*, …), (b) Trie 2 (A: 00*, C: 10*, E: 100*, …), (c) the resulting overlay trie]
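The merge into an overlay trie can be sketched by keying nodes on their prefix and keeping one next hop per FIB; this is a toy illustration (the real SAIL_M shares bitmap arrays across virtual FIBs rather than using dicts):

```python
# Toy overlay-trie merge: one shared prefix set with a next hop per
# (prefix, FIB id) pair, so all FIBs are looked up through one structure.
def build_overlay(fibs):
    """fibs: list of {prefix_string: next_hop} dicts, one per virtual FIB."""
    overlay = {}
    for fib_id, fib in enumerate(fibs):
        for prefix, hop in fib.items():
            overlay.setdefault(prefix, {})[fib_id] = hop
    return overlay

# Prefixes from the slide: Trie 1 holds A/C/G, Trie 2 holds A/C/E.
trie1 = {"00": "A", "10": "C", "110": "G"}
trie2 = {"00": "A", "10": "C", "100": "E"}
overlay = build_overlay([trie1, trie2])
print(sorted(overlay))   # prints ['00', '10', '100', '110']
print(overlay["00"])     # prints {0: 'A', 1: 'A'}
```

Shared prefixes (00*, 10*) are stored once, which is why the overlay stays within the same on-chip bound as a single FIB.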
Algorithm   On-chip memory   Lookup (on-chip accesses)   Update (on-chip accesses)
SAIL_B      = 4 MB           25                          1
SAIL_L      ≤ 2.13 MB        2                           Unbounded
SAIL_U      ≤ 2.03 MB        4                           1
SAIL_M      ≤ 2.13 MB        2                           Unbounded

Worst case: 2 off-chip memory accesses per lookup.
FPGA: Xilinx ISE 13.2 IDE; Xilinx Virtex-7 device; on-chip memory is 8.26 MB
– SAIL_B, SAIL_U, and SAIL_L

Intel CPU: Core(TM) i7-3520M, 2.9 GHz; 64 KB L1, 512 KB L2, 4 MB L3; 8 GB DRAM
– SAIL_L and SAIL_M

GPU: NVIDIA Tesla C2075 (1147 MHz, 5376 MB device memory, 448 CUDA cores), with an Intel Xeon E5-2630 CPU (2.30 GHz, 6 cores)
– SAIL_L

Many-core: TLR4-03680, 36 cores, each with a 256 KB L2 cache
– SAIL_L
FIBs
– A real FIB from a tier-1 router in China
– 18 real FIBs from www.ripe.net

Traces
– Real packet traces from the same tier-1 router
– Randomly generated packet traces
– Packet traces generated according to the FIBs

Compared against
– PBF [SIGCOMM 03]
– LC-trie [used in the Linux kernel]
– Tree Bitmap
– Lulea [SIGCOMM 97 best paper]
[Figure: on-chip memory usage (0 B–1.2 MB) across FIBs rrc00–rrc15, SAIL_L vs. PBF]
SAIL algorithm   Lookup speed   Throughput
SAIL_B           351 Mpps       112 Gbps
SAIL_U           405 Mpps       130 Gbps
SAIL_L           479 Mpps       153 Gbps
[Figure: lookup speed (Mpps, 100–800) across 12 FIBs for LC-trie, Tree Bitmap, Lulea, and SAIL_L]
[Figure: lookup speed (Mpps, 100–500) vs. number of FIBs (2–12), for prefix-based traffic and random traces]
[Figure: number of memory accesses per update across update batches (×500) for rrc00, rrc01, and rrc03, with per-FIB averages]
[Figure: GPU lookup speed (Mpps, 50–650) across FIBs rrc00–rrc15; legend values 30, 60, 90]
[Figure: GPU latency (microseconds, 20–240) across FIBs rrc00–rrc15; legend values 30, 60, 90]
[Figure: many-core lookup speed (pps, up to ~700 M) vs. number of cores (2–34)]
Conclusion: a two-dimensional splitting framework, SAIL, with three optimization algorithms
– SAIL_U, SAIL_L, SAIL_M
– At most 2.13 MB of on-chip memory
– At most 2 off-chip memory accesses per lookup

Suitable for different platforms
– FPGA, CPU, GPU, many-core
– Up to 673.22~708.71 Mpps

Future work: extending SAIL to IPv6 lookup
Source code of SAIL, LC-trie, Tree Bitmap, and Lulea:
http://fi.ict.ac.cn/firg.php?n=PublicationsAmpTalks.OpenSource
http://fi.ict.ac.cn