SLIDE 1

Guarantee IP Lookup Performance with FIB Explosion

Tong Yang (ICT), Gaogang Xie (ICT), Yanbiao Li (HNU), Qiaobin Fu (ICT), Alex X. Liu (MSU), Qi Li (ICT), Laurent Mathy (ULG)

SLIDE 2

Performance Issue in IP Lookup

FIBs keep increasing, by about 15% per year; FIB size has reached 512,000 entries.

The "512K bug": in August 2014, Cisco warned that web browsing speeds could slow over the following week as old hardware was upgraded to handle FIBs beyond 512K entries.

[Figure: FIB growth curve, now past 512k entries.]

SLIDE 3

Motivation

On-chip vs. off-chip memory: on-chip memory is roughly 10 times faster, but limited in size. As FIBs keep growing, the FIB no longer fits on chip, so almost all packets need off-chip accesses.

Ideal IP lookup algorithm = constant, small footprint for the FIB (fits in on-chip memory) + constant, fast lookup speed (low time complexity).

SLIDE 4

State-of-the-art

Achieving constant IP lookup time:
– TCAM-based
– Trie pipeline using FPGA
– Full expansion
– DIR-24-8

Achieving small memory:
– Bloom-filter-based
– Level compression and path compression (LC-trie)

How to satisfy both constant lookup time and small on-chip memory usage?

SLIDE 5

SAIL Framework

Observation: almost all packets hit prefixes of length 0~24.

Two splittings:
– Splitting the lookup process: finding the prefix length (on-chip) vs. finding the next hop (off-chip)
– Splitting the prefix length: levels 0~24 vs. levels 25~32

SLIDE 6

Splitting

[Figure: the original trie is cut into two parts: levels 0~24 (short prefixes), encoded as on-chip bitmap arrays with off-chip next-hop arrays, and levels 25~32 (long prefixes), stored off-chip.]

The on-chip bitmaps for levels 0~24 need at most sum_{j=0}^{24} 2^j = 2^25 - 1 bits, i.e., about 4 MB.

How to avoid searching both short and long prefixes?
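The on-chip memory bound for the level 0~24 bitmaps can be checked with a quick calculation (a sketch; one bitmap bit per possible trie node at each level):

```python
# Each level j of a binary trie has at most 2^j nodes, and SAIL keeps
# one bitmap bit per possible node for levels 0..24.
bits = sum(2 ** j for j in range(25))   # = 2^25 - 1
megabytes = bits / 8 / 2 ** 20          # bits -> bytes -> MiB

print(bits)                   # -> 33554431
print(round(megabytes, 2))    # -> 4.0 (just under 4 MB)
```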

SLIDE 7

Pivot Pushing & Lookup

Example FIB (prefix/length → next hop):
*/0 → 6, 1*/1 → 4, 01*/2 → 3, 001*/3 → 3, 111*/3 → 7, 0011*/4 → 1, 1110*/4 → 8, 11100*/5 → 2, 001011*/6 → 9

[Figure: (a) the trie for this FIB with nodes A~H; (b) bitmaps B0~B4 with next-hop arrays N3 and N4; (c) the arrays after pivot pushing.]

Pivot push: prefixes are pushed to the pivot level (level 4 in this example; level 24 for IPv4), and a pivot-level entry whose next hop is 0 marks a longer prefix below.

Lookup example for address 001010 (pivot level 4):
B4[001010 >> 2] = 1, N4[2] = 0 → long prefix, continue off-chip.
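The lookup example above can be sketched as follows (a toy model with 6-bit addresses and pivot level 4; only the entries this example exercises are filled in, so the arrays are illustrative, not the full example trie):

```python
# Pivot-level bitmap and next-hop array for level 4 (16 entries each).
# B4[i] == 1 means level-4 node i is covered by some prefix;
# N4[i] == 0 is the marker meaning "a longer prefix exists below node i".
B4 = [0] * 16
N4 = [0] * 16
B4[0b0010] = 1   # node 0010 exists (prefix 001*/3 pushed to the pivot level)
N4[0b0010] = 0   # ...but 001011*/6 lies below it, so the marker stays 0

def lookup_pivot(addr6):
    """Classify a 6-bit address at the pivot level."""
    idx = addr6 >> 2                  # top 4 bits index level 4
    if not B4[idx]:
        return ('shorter', None)      # handled by bitmaps of lower levels
    if N4[idx] == 0:
        return ('long', None)         # continue in the off-chip long-prefix table
    return ('short', N4[idx])         # next hop found at the pivot level

print(lookup_pivot(0b001010))  # -> ('long', None): real match is 001011*/6
```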

SLIDE 8

Update of SAIL_B

[Figure: the same example FIB, trie, and bitmaps as on the previous slide.]

Update examples:
– Insert 10* (next hop 1): set B2[10] = 1
– Delete 111*: set B3[111] = 0
– Changing the next hop of 001*, or inserting 0010*: only need to update off-chip tables
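The update examples above can be replayed on tiny stand-in bitmaps (a minimal sketch; B2 and B3 stand for the level-2 and level-3 bitmap arrays):

```python
# A level-j bitmap has 2^j entries.
B2 = [0] * 4
B3 = [0] * 8
B3[0b111] = 1        # prefix 111*/3 is currently in the FIB

# Insert 10*: flip a single on-chip bit at level 2.
B2[0b10] = 1

# Delete 111*: clear its single bit at level 3.
B3[0b111] = 0

# Changing the next hop of 001*, or inserting 0010*, would leave the
# bitmaps untouched: only the off-chip next-hop arrays change.
print(B2, B3)  # -> [0, 0, 1, 0] [0, 0, 0, 0, 0, 0, 0, 0]
```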
SLIDE 9

Optimization

SAIL_B
– Lookup: 25 on-chip memory accesses in the worst case
– Update: 1 on-chip memory access

Lookup-Oriented Optimization (SAIL_L)
– Lookup: 2 on-chip memory accesses in the worst case
– Update: unbounded, but low average update complexity

Update-Oriented Optimization (SAIL_U)
– Lookup: 4 on-chip memory accesses in the worst case
– Update: 1 on-chip memory access

Extension: SAIL for Multiple FIBs (SAIL_M)

SLIDE 10

SAIL_L

Prefixes are pushed to levels 16, 24, and 32.

Lookup: if B16 == 1, the next hop is in N16; else if B24 == 1, it is in N24; otherwise it is in N32.
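The decision chain above can be sketched in a few lines (a simplified model: dicts stand in for the bit and next-hop arrays, and N32 is indexed directly rather than through chunk IDs):

```python
def sail_l_lookup(addr, B16, N16, B24, N24, N32):
    """SAIL_L-style lookup chain for a 32-bit IPv4 address."""
    i16 = addr >> 16
    if B16.get(i16):          # on-chip access #1
        return N16[i16]
    i24 = addr >> 8
    if B24.get(i24):          # on-chip access #2 (worst case)
        return N24[i24]
    return N32[addr]          # longest prefixes live at level 32

# Tiny example: a /16 route covering 10.0.0.0 with next hop 7.
B16 = {10 << 8: 1}
N16 = {10 << 8: 7}
addr = (10 << 24) | 1        # 10.0.0.1
print(sail_l_lookup(addr, B16, N16, {}, {}, {}))  # -> 7
```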

SLIDE 11

SAIL_U

Prefixes are pushed to levels 6, 12, 18, and 24.

One update affects at most 2^6 = 64 consecutive bits in the bitmap array, so at most one on-chip memory access is still enough for each update.
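The 64-bit bound can be verified with a small helper (hypothetical function names; it assumes a prefix of length 0~24 is pushed to the next level in {6, 12, 18, 24}):

```python
import math

def pushed_level(prefix_len):
    """Level in {6, 12, 18, 24} a prefix of this length is pushed to."""
    return 6 * max(1, math.ceil(prefix_len / 6))

def bits_touched(prefix_len):
    """Bitmap bits affected when updating a prefix of this length."""
    return 2 ** (pushed_level(prefix_len) - prefix_len)

print(bits_touched(0))    # -> 64: worst case, 2^6 bits (one memory word)
print(bits_touched(7))    # -> 32: pushed from length 7 to level 12
print(max(bits_touched(l) for l in range(25)))  # -> 64
```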

SLIDE 12

SAIL_M

[Figure: (a) Trie 1 with prefixes A: 00*, C: 10*, G: 110*; (b) Trie 2 with prefixes A: 00*, C: 10*, E: 100*; (c) the overlay trie merging both, whose nodes carry one next hop per FIB.]

Multiple FIBs are merged into a single overlay trie, so one set of on-chip data structures serves all FIBs.
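The overlay idea can be sketched at the prefix level (a minimal sketch: real SAIL_M merges the tries node by node and stores a next-hop vector per node; here prefixes are plain strings and absent entries are None):

```python
def merge_fibs(fibs):
    """Merge several FIBs (prefix -> next hop) into one overlay table
    mapping each prefix to a per-FIB vector of next hops."""
    prefixes = sorted(set().union(*fibs))
    return {p: tuple(fib.get(p) for fib in fibs) for p in prefixes}

# The two example tries, written as prefix tables.
fib1 = {'00': 'A', '10': 'C', '110': 'G'}
fib2 = {'00': 'A', '10': 'C', '100': 'E'}
overlay = merge_fibs([fib1, fib2])

print(overlay['00'])   # -> ('A', 'A'): shared prefix, one next hop per FIB
print(overlay['110'])  # -> ('G', None): present only in the first FIB
```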

SLIDE 13

SAILs in worst case

Algorithm | On-chip memory | Lookup (on-chip accesses) | Update (on-chip accesses)
SAIL_B    | = 4 MB         | 25                        | 1
SAIL_L    | ≤ 2.13 MB      | 2                         | unbounded
SAIL_U    | ≤ 2.03 MB      | 4                         | 1
SAIL_M    | ≤ 2.13 MB      | 2                         | unbounded

Worst case: 2 off-chip memory accesses for lookup.

SLIDE 14

Implementations

FPGA: Xilinx ISE 13.2 IDE; Xilinx Virtex-7 device; 8.26 MB on-chip memory
– SAIL_B, SAIL_U, and SAIL_L

Intel CPU: Core(TM) i7-3520M, 2.9 GHz; 64 KB L1, 512 KB L2, 4 MB L3; 8 GB DRAM
– SAIL_L and SAIL_M

GPU: NVIDIA Tesla C2075 (1147 MHz, 5376 MB device memory, 448 CUDA cores), with an Intel Xeon E5-2630 CPU (2.30 GHz, 6 cores)
– SAIL_L

Many-core: Tilera TLR4-03680, 36 cores, 256 KB L2 cache per core
– SAIL_L

SLIDE 15

Evaluation

FIBs
– A real FIB from a tier-1 router in China
– 18 real FIBs from www.ripe.net

Traces
– Real packet traces from the same tier-1 router
– Randomly generated packet traces
– Packet traces generated according to the FIBs

Compared with
– PBF [SIGCOMM 03]
– LC-trie [used in the Linux kernel]
– Tree Bitmap
– Lulea [SIGCOMM 97 best paper]

SLIDE 16

FPGA Simulation

[Figure: on-chip memory usage of SAIL_L vs. PBF across FIBs rrc00~rrc15; y-axis 0~1.2 MB.]

Algorithm | Lookup speed | Throughput
SAIL_B    | 351 Mpps     | 112 Gbps
SAIL_U    | 405 Mpps     | 130 Gbps
SAIL_L    | 479 Mpps     | 153 Gbps

SLIDE 17

Intel CPU: real FIB and traces

[Figure: lookup speed (Mpps) of LC-trie, Tree Bitmap, Lulea, and SAIL_L on 12 FIBs; y-axis 100~800 Mpps.]

SLIDE 18

Intel CPU: 12 FIBs using prefix-based and random traces

[Figure: lookup speed (Mpps) over the FIBs for prefix-based and random traces; y-axis 100~500 Mpps.]

SLIDE 19

Intel CPU: Update

[Figure: number of memory accesses per update over the update sequence (×500 updates) for rrc00, rrc01, and rrc03, with per-FIB averages; y-axis 2~14.]

SLIDE 20

GPU: Lookup speed VS. batch size

[Figure: lookup speed (Mpps) per FIB (rrc00~rrc15) for batch sizes 30, 60, and 90; y-axis 50~650 Mpps.]

SLIDE 21

GPU: Lookup latency VS. batch size

[Figure: lookup latency (microseconds) per FIB (rrc00~rrc15) for batch sizes 30, 60, and 90; y-axis 20~240 µs.]

SLIDE 22

Tilera GX-36: Lookup VS. # of cores

[Figure: lookup speed (pps) vs. number of cores (2~34); y-axis 100M~700M pps.]

SLIDE 23

Conclusion

Two-dimensional splitting framework: SAIL

Three optimization algorithms
– SAIL_U, SAIL_L, SAIL_M
– At most 2.13 MB of on-chip memory usage
– At most 2 off-chip memory accesses per lookup

Suitable for different platforms
– FPGA, CPU, GPU, many-core
– Up to 673.22~708.71 Mpps

Future work: extending SAIL to IPv6 lookup

SLIDE 24


Source codes of SAIL, LC-trie, Tree Bitmap, and Lulea http://fi.ict.ac.cn/firg.php?n=PublicationsAmpTalks.OpenSource

SLIDE 25

Thanks

http://fi.ict.ac.cn