SLIDE 1

Fast Software Cache Design for Network Appliances

Dong Zhou, Huacheng Yu, Michael Kaminsky, David G. Andersen

SLIDE 2

Flow Caching in Open vSwitch

2

Microflow Cache: exact match, single hash table

SLIDE 3

Flow Caching in Open vSwitch

3

srcAddr=10.1.2.3, dstAddr=12.4.5.6, srcPort=15213, dstPort=80 → output: 1
srcAddr=12.4.5.6, dstAddr=10.1.2.3, srcPort=80, dstPort=15213 → output: 2
srcAddr=12.4.5.6, dstAddr=13.1.2.3, srcPort=80, dstPort=15213 → drop

Microflow Cache: exact match, single hash table

SLIDE 4

Flow Caching in Open vSwitch

4

Megaflow Cache: wildcard match without priority, multiple masked tables
↑ miss
Microflow Cache: exact match, single hash table

SLIDE 5

Flow Caching in Open vSwitch

5

srcAddr=10.0.0.0/8, dstAddr=12.0.0.0/8, srcPort=*, dstPort=* → output: 1
srcAddr=12.0.0.0/8, dstAddr=10.0.0.0/8, srcPort=*, dstPort=* → output: 2
srcAddr=*, dstAddr=13.0.0.0/8, srcPort=*, dstPort=* → drop

Megaflow Cache: wildcard match without priority, multiple masked tables
↑ miss
Microflow Cache: exact match, single hash table

SLIDE 6

Flow Caching in Open vSwitch

6

Packet Classifier: multiple OpenFlow tables
↑ miss
Megaflow Cache: wildcard match without priority, multiple masked tables
↑ miss
Microflow Cache: exact match, single hash table

SLIDE 7

Flow Caching in Open vSwitch

7

Packet Classifier: multiple OpenFlow tables

Match                                       Action
srcAddr==10.0.0.0/8, dstAddr==12.0.0.0/8    output: 1
srcAddr==12.0.0.0/8, dstAddr==10.0.0.0/8    output: 2

↑ miss
Megaflow Cache: wildcard match without priority, multiple masked tables
↑ miss
Microflow Cache: exact match, single hash table

SLIDE 8

Flow Caching in Open vSwitch

8

Packet Classifier: multiple OpenFlow tables
↑ miss
Megaflow Cache: wildcard match without priority, multiple masked tables
↑ miss
Microflow Cache: exact match, single hash table

8x!

  • Cache Hit Rate
  • Lookup Latency
SLIDE 9

Basic Cache Design

k → h(k) → 4-way set-associative bucket

  • Oversubscription factor α = # keys / # entries
  • Assumptions: uniform workload, random eviction
  • α = 0.95 → 81% cache hit rate

9
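The basic design can be sketched as follows (a minimal Python sketch with hypothetical names; a real implementation would use fixed-size arrays, compact key fingerprints, and a fast hash function):

```python
import random

class SetAssocCache:
    """Minimal sketch: each key hashes to one w-way bucket;
    a full bucket evicts a random resident (random eviction)."""

    def __init__(self, num_buckets, ways=4):
        self.ways = ways
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # One hash -> one bucket: a lookup touches a single cache line.
        return self.buckets[hash(key) % len(self.buckets)]

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None  # cache miss

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # update in place
                return
        if len(bucket) == self.ways:
            bucket.pop(random.randrange(self.ways))  # random eviction
        bucket.append((key, value))
```

With oversubscription α = 0.95 (keys ≈ 0.95 × slots), the slides report roughly an 81% hit rate for this 4-way design under a uniform workload.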

SLIDE 10

Cache Design: Increase Set-Associativity

k → h(k) → 8-way set-associative bucket (previously 4-way)

81% → 87% cache hit rate

10

SLIDE 11

Cache Design: More Candidate Buckets

k → h1(k), h2(k): two candidate 4-way set-associative buckets (cuckoo hashing)

81% → ~99% cache hit rate

11
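A cuckoo-style cache can be sketched like this (hypothetical names; the deck's 2-4 cuckoo-lite variant would differ in the details). Each key gets two candidate 4-way buckets, so a lookup may cost two reads at unrelated memory locations:

```python
import random

class CuckooCache:
    """Minimal sketch of a 2-4 cuckoo-style cache: two candidate
    4-way buckets per key; inserts kick residents between their
    candidates, dropping the last victim after a bounded number
    of kicks (acceptable, since this is a cache)."""

    def __init__(self, num_buckets, ways=4, max_kicks=8):
        self.n, self.ways, self.max_kicks = num_buckets, ways, max_kicks
        self.buckets = [[] for _ in range(num_buckets)]

    def _h1(self, key):
        return hash(("h1", key)) % self.n

    def _h2(self, key):
        return hash(("h2", key)) % self.n

    def get(self, key):
        # Two bucket reads at unrelated (random) memory locations.
        for b in (self._h1(key), self._h2(key)):
            for k, v in self.buckets[b]:
                if k == key:
                    return v
        return None

    def put(self, key, value):
        for _ in range(self.max_kicks):
            for b in (self._h1(key), self._h2(key)):
                if len(self.buckets[b]) < self.ways:
                    self.buckets[b].append((key, value))
                    return
            # Both candidates full: displace a random resident and
            # retry the insert with the kicked-out entry.
            b = random.choice((self._h1(key), self._h2(key)))
            i = random.randrange(self.ways)
            (key, value), self.buckets[b][i] = self.buckets[b][i], (key, value)
        # Kick budget exhausted: the displaced entry is simply dropped.
```

The two-choice placement is what pushes occupancy (and thus hit rate) toward ~99%; the cost is that both candidate buckets sit at random, non-consecutive addresses.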

SLIDE 12

Our Solution: Bounded Linear Probing (BLP)

k → h(k), k' → h(k'): each key can reside in its home 4-way set-associative bucket or in the next, consecutive bucket, so adjacent keys' 2-bucket windows overlap ("2-4 BLP")

12

81% → ~94% cache hit rate
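BLP can be sketched as follows (a minimal Python sketch with hypothetical names): every key's candidate window is its home 4-way bucket plus the next bucket, so the (at most) two reads are adjacent in memory rather than random:

```python
import random

class BLPCache:
    """Minimal sketch of 2-4 bounded linear probing: a key may live in
    bucket h(k) or bucket h(k)+1, so neighbouring keys' windows overlap
    and a lookup touches at most two consecutive cache lines."""

    def __init__(self, num_buckets, ways=4):
        self.n, self.ways = num_buckets, ways
        self.buckets = [[] for _ in range(num_buckets)]

    def _window(self, key):
        home = hash(key) % self.n
        return (home, (home + 1) % self.n)  # two consecutive buckets

    def get(self, key):
        for b in self._window(key):
            for k, v in self.buckets[b]:
                if k == key:
                    return v
        return None

    def put(self, key, value):
        window = self._window(key)
        for b in window:
            if len(self.buckets[b]) < self.ways:
                self.buckets[b].append((key, value))
                return
        # Both buckets in the window full: evict a random entry.
        b = random.choice(window)
        self.buckets[b][random.randrange(self.ways)] = (key, value)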

SLIDE 13

Qualitative Comparison

13

Design             Lookup Speed (cache line reads)   Hit Rate
4-way set-assoc.   1                                 ~81%
8-way set-assoc.   1                                 ~87%
2-4 cuckoo         2, random                         ~99%
2-4 BLP            1.5, consecutive                  ~94%


SLIDE 15

Why Is BLP Better Than Set-Assoc.?

15

[Figure: the same sequence of keys placed under plain set-associativity vs. 2-4 BLP]

  • Occupancy = 0.71875
  • Occupancy = 0.75


SLIDE 18

Better Cache Replacement

  • Traditional LRU
    – high space overhead
    – CLOCK: 1 bit / key
  • Our Solution: Probabilistic Bubble LRU (PBLRU)

18

SLIDE 19

PBLRU: Bubbling

19

Lookup of D at bucket h(D): [A B C D] → [A B D C]

Promotion: on a hit, the entry swaps one slot toward the front.

SLIDE 20

PBLRU: Bubbling

20

Insert of X at bucket h(X): [A B D C] → [A B D X]

Eviction: on a miss, the new entry replaces the last slot.

SLIDE 21

PBLRU

  • Basic bubbling
    – combines both recency and frequency information
  • Probabilistic bubbling
    – we only promote every n-th cache hit, to reduce the number of memory writes
  • Applying to 2-4 BLP
    – we choose a random bucket to apply bubbling

21
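The bubbling and eviction steps above can be sketched on a single bucket (hypothetical names; a deterministic 1-in-n counter stands in here for the probabilistic choice):

```python
class PBLRUBucket:
    """Minimal sketch of Probabilistic Bubble LRU on one 4-way bucket:
    a hit "bubbles" the entry one slot toward the front, a full insert
    evicts the back slot, and only about every n-th hit actually
    promotes, to reduce memory writes."""

    def __init__(self, ways=4, promote_every=4):
        self.ways = ways
        self.promote_every = promote_every
        self.slots = []  # front = most protected, back = eviction victim
        self.hits = 0

    def get(self, key):
        for i, (k, v) in enumerate(self.slots):
            if k == key:
                self.hits += 1
                if i > 0 and self.hits % self.promote_every == 0:
                    # Promotion: swap one slot toward the front.
                    self.slots[i - 1], self.slots[i] = self.slots[i], self.slots[i - 1]
                return v
        return None

    def put(self, key, value):
        if len(self.slots) == self.ways:
            self.slots[-1] = (key, value)  # eviction: replace the back slot
        else:
            self.slots.append((key, value))
```

With promote_every=1 this reproduces slides 19 and 20: a hit on D in [A B C D] yields [A B D C], and inserting X then yields [A B D X]. Applied to 2-4 BLP, a random bucket of the key's two-bucket window would be chosen for bubbling.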

SLIDE 22

Evaluation

22

[Testbed: a traffic generator (TX cores) connected over Ethernet, via Port 0 and Port 1, to the virtual switch (RX cores)]

SLIDE 23

[Figure: throughput (Mpps) under a uniform workload; series: 4-way, 4-way w/ SIMD, 8-way w/ SIMD, 2-4 cuckoo-lite, 2-4 BLP w/ PBLRU]

Throughput (Uniform)

23

15% higher throughput

SLIDE 24

[Figures: lookup latency (cycles; lower is better) and cache hit rate (%; higher is better); series: 4-way, 4-way w/ SIMD, 8-way w/ SIMD, 2-4 cuckoo-lite, 2-4 BLP, 2-4 BLP w/ PBLRU]

Lookup Latency and Hit Rate

24

Cuckoo's cache hit rate improvement is not enough to compensate for its higher lookup latency.

SLIDE 25

[Figure: throughput (Mpps) under a skewed workload; series: 4-way, 4-way w/ SIMD, 8-way w/ SIMD, 2-4 Cuckoo, 2-4 BLP, 2-4 BLP w/ PBLRU]

Throughput (Skewed)

25

7.5% higher throughput

SLIDE 26

Lookup Latency and Hit Rate

26

SLIDE 27

Summary

  • Bounded Linear Probing
  • Probabilistic Bubble LRU
  • Balance between cache hit rate and lookup latency

27

SLIDE 28

Thank You!

28