Hash Table Design and Optimization for Software Virtual Switches
PRESENTER: REN WANG
YIPENG WANG, SAMEH GOBRIEL, REN WANG, CHARLIE TAI, CRISTIAN DUMITRESCU
INTEL LABS
OUTLINE
Background and motivation
Survey and understanding
We found that the most common data structure used in virtual switches is the hash table:
Wildcard match (tuple space search): routing table, ACL
Exact match: connection tracking (conntrack) table, flow cache, etc.
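To make the wildcard case concrete, here is a minimal sketch of tuple space search: one exact-match hash table (subtable) per distinct mask, probed in turn with the masked key. The two-field key, the table sizes, the hash mix, and all function names are illustrative assumptions, not taken from any particular switch.

```c
#include <stdint.h>
#include <string.h>

#define NUM_SLOTS 64        /* slots per subtable (illustrative size) */
#define MAX_SUBTABLES 4

/* Hypothetical key: two 32-bit fields, e.g. source and destination IP. */
struct flow_key { uint32_t src, dst; };

struct rule { struct flow_key masked; int action; int used; };

/* One subtable per distinct wildcard mask. */
struct subtable {
    struct flow_key mask;
    struct rule slots[NUM_SLOTS];
};

struct classifier { struct subtable sub[MAX_SUBTABLES]; int n; };

static uint32_t hash_key(const struct flow_key *k)
{
    /* Simple multiplicative mix, chosen only for brevity. */
    uint32_t h = k->src * 2654435761u;
    h ^= k->dst * 2246822519u;
    return h ^ (h >> 15);
}

static struct flow_key apply_mask(const struct flow_key *k,
                                  const struct flow_key *m)
{
    struct flow_key r = { k->src & m->src, k->dst & m->dst };
    return r;
}

/* Exact-match insert of the masked key, linear probing. Returns 0 on success. */
static int subtable_insert(struct subtable *st, const struct flow_key *k,
                           int action)
{
    struct flow_key mk = apply_mask(k, &st->mask);
    uint32_t h = hash_key(&mk);
    for (int i = 0; i < NUM_SLOTS; i++) {
        struct rule *r = &st->slots[(h + i) % NUM_SLOTS];
        if (!r->used) {
            r->masked = mk; r->action = action; r->used = 1;
            return 0;
        }
    }
    return -1;
}

/* Exact-match lookup of the masked key. Returns the action or -1. */
static int subtable_lookup(const struct subtable *st, const struct flow_key *k)
{
    struct flow_key mk = apply_mask(k, &st->mask);
    uint32_t h = hash_key(&mk);
    for (int i = 0; i < NUM_SLOTS; i++) {
        const struct rule *r = &st->slots[(h + i) % NUM_SLOTS];
        if (!r->used)
            return -1;
        if (r->masked.src == mk.src && r->masked.dst == mk.dst)
            return r->action;
    }
    return -1;
}

/* Tuple space search: probe each subtable in turn, first match wins. */
int classifier_lookup(const struct classifier *c, const struct flow_key *k)
{
    for (int i = 0; i < c->n; i++) {
        int a = subtable_lookup(&c->sub[i], k);
        if (a >= 0)
            return a;
    }
    return -1;
}
```

Every subtable probe costs one hash computation and at least one memory access, so lookup cost in this sketch grows with the number of subtables searched.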
Compared to tree-based data structures, hash-table-based data structures have certain advantages:
More parallelism: no pointer chasing
Faster rule updates
Hash table lookup is also one of the most time-consuming stages of packet processing:
E.g. Open vSwitch (100k rules, 20 subtables)
A major source of hash table lookup overhead is memory access latency.
Stage:                 IO    Preprocessing    Rule lookup    (unlabeled)
Execution percentage:  ~8%   ~5%              ~78%           ~9%
The hash table is a simple data structure, but there are many different designs and implementations. Understanding hash table performance, and how to design an efficient hash table structure, is key to a good software switch. A general guideline for hash table design will benefit future vswitch development.
[Figure: keys (Key A, Key B) mapped through a hash function to table entries]
implementations.
multiple ways per bucket also help a lot.
[Figure: load factors vs. associativity; x-axis: 1, 2, 4, 8, 16; series: 1 hash, 2 hash, cuckoo]
[Figure: Cuckoo hash insertion cycles vs. load factor]
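The relationship between associativity, load factor, and cuckoo insertion cost can be explored with a small model. The following is a minimal sketch of a two-choice cuckoo hash with multi-way buckets. The table size, the hash mixers, the round-robin victim choice, and the kick budget are all illustrative assumptions rather than the design measured in the slides.

```c
#include <stdint.h>

#define NUM_BUCKETS 256     /* power of two, illustrative */
#define WAYS 4              /* entries per bucket (associativity) */
#define MAX_KICKS 64        /* displacement budget before giving up */

struct entry { uint32_t key; uint32_t val; uint8_t used; };
struct cuckoo { struct entry b[NUM_BUCKETS][WAYS]; uint32_t count; };

/* Two independent-looking bucket hashes (arbitrary integer mixers). */
static uint32_t h1(uint32_t k)
{
    k ^= k >> 16; k *= 0x85ebca6bu; k ^= k >> 13;
    return k & (NUM_BUCKETS - 1);
}

static uint32_t h2(uint32_t k)
{
    k *= 0xc2b2ae35u; k ^= k >> 16; k *= 0x27d4eb2fu; k ^= k >> 15;
    return k & (NUM_BUCKETS - 1);
}

/* Place the pair in the first free way of bucket bkt. Returns 1 if placed. */
static int try_place(struct cuckoo *t, uint32_t bkt, uint32_t key, uint32_t val)
{
    for (int w = 0; w < WAYS; w++) {
        if (!t->b[bkt][w].used) {
            t->b[bkt][w] = (struct entry){ key, val, 1 };
            t->count++;
            return 1;
        }
    }
    return 0;
}

/* A lookup touches at most two buckets. Returns 1 on a hit. */
int cuckoo_lookup(const struct cuckoo *t, uint32_t key, uint32_t *val)
{
    uint32_t cand[2] = { h1(key), h2(key) };
    for (int i = 0; i < 2; i++) {
        for (int w = 0; w < WAYS; w++) {
            const struct entry *e = &t->b[cand[i]][w];
            if (e->used && e->key == key) { *val = e->val; return 1; }
        }
    }
    return 0;
}

/* Insert with cuckoo displacement. Returns 1 on success, 0 if the kick
 * budget runs out (in which case the last displaced entry is dropped). */
int cuckoo_insert(struct cuckoo *t, uint32_t key, uint32_t val)
{
    uint32_t bkt = h1(key);
    if (try_place(t, bkt, key, val) || try_place(t, h2(key), key, val))
        return 1;
    for (int kick = 0; kick < MAX_KICKS; kick++) {
        /* Evict a victim (way chosen round-robin) and take its slot. */
        struct entry *victim = &t->b[bkt][kick % WAYS];
        struct entry moved = *victim;
        *victim = (struct entry){ key, val, 1 };
        key = moved.key;
        val = moved.val;
        /* The evicted entry moves to its other candidate bucket. */
        bkt = (h1(key) == bkt) ? h2(key) : h1(key);
        if (try_place(t, bkt, key, val))
            return 1;
    }
    return 0;
}
```

In this model, more ways per bucket mean fewer inserts fall into the displacement loop, while a nearly full table makes each insert more likely to spend many kicks before it finds a free slot.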
index in the table.
show major difference with 16 or 32-byte key size.
performance test.
[Figure labels: SIG, Key+data]
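One common way to combine a signature (SIG) with separately stored key+data is sketched below: each bucket holds short signatures plus an index into a key-data array, and the full key is compared only when a signature matches. The field widths, the 16-byte key, the FNV-1a hash, and all names are assumptions for illustration. The layout in the original figure may differ.

```c
#include <stdint.h>
#include <string.h>

#define KEY_LEN 16          /* bytes per key (illustrative) */
#define WAYS 8              /* entries per bucket */
#define NUM_BUCKETS 64
#define NUM_KEYS (NUM_BUCKETS * WAYS)

struct key_data { uint8_t key[KEY_LEN]; uint32_t data; };

struct sig_bucket {
    uint16_t sig[WAYS];     /* short signature of each stored key */
    uint32_t idx[WAYS];     /* 1-based index into the key-data array, 0 = empty */
};

struct sig_table {
    struct sig_bucket buckets[NUM_BUCKETS];
    struct key_data store[NUM_KEYS + 1];    /* slot 0 is never used */
    uint32_t next_free;                     /* next unused store slot */
};

/* FNV-1a over the key bytes. */
static uint32_t hash_bytes(const uint8_t *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) { h ^= p[i]; h *= 16777619u; }
    return h;
}

void sig_table_init(struct sig_table *t)
{
    memset(t, 0, sizeof *t);
    t->next_free = 1;
}

/* Returns 0 on success, -1 if the bucket or the key store is full. */
int sig_table_insert(struct sig_table *t, const uint8_t *key, uint32_t data)
{
    uint32_t h = hash_bytes(key, KEY_LEN);
    struct sig_bucket *b = &t->buckets[h % NUM_BUCKETS];
    if (t->next_free > NUM_KEYS)
        return -1;
    for (int w = 0; w < WAYS; w++) {
        if (b->idx[w] == 0) {
            struct key_data *kd = &t->store[t->next_free];
            memcpy(kd->key, key, KEY_LEN);
            kd->data = data;
            b->sig[w] = (uint16_t)(h >> 16);   /* high bits as signature */
            b->idx[w] = t->next_free++;
            return 0;
        }
    }
    return -1;
}

/* Returns 0 and fills *data on a hit, -1 on a miss. */
int sig_table_lookup(const struct sig_table *t, const uint8_t *key,
                     uint32_t *data)
{
    uint32_t h = hash_bytes(key, KEY_LEN);
    const struct sig_bucket *b = &t->buckets[h % NUM_BUCKETS];
    uint16_t sig = (uint16_t)(h >> 16);
    for (int w = 0; w < WAYS; w++) {
        /* Cheap signature check first; full key compare only on a match. */
        if (b->idx[w] != 0 && b->sig[w] == sig) {
            const struct key_data *kd = &t->store[b->idx[w]];
            if (memcmp(kd->key, key, KEY_LEN) == 0) {
                *data = kd->data;
                return 0;
            }
        }
    }
    return -1;
}
```

With this layout a miss usually touches only the compact bucket, and the larger key-data record is read only for entries whose signature matches.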
[Figure: miss ratio vs. ways/bucket; x-axis: 1, 2, 4, 8, 16]
[Figure: insert and lookup speed comparison (lower is better); series: avx, age, ll, bplru, tplru]
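#include <immintrin.h>  /* AVX2 intrinsics */

/* Reorder the entries of a bucket with a single AVX2 permute.
 * Assumes a 32-byte bucket of eight 32-bit entries. permute_index[location]
 * holds the precomputed lane pattern for `location`; the bucket type and
 * permute_index are defined elsewhere. */
void adjust_location(int location, bucket *bucket)
{
    __m256i array = _mm256_loadu_si256((const __m256i *)bucket);
    __m256i permute_pattern =
        _mm256_loadu_si256((const __m256i *)permute_index[location]);
    __m256i permuted_array =
        _mm256_permutevar8x32_epi32(array, permute_pattern);
    _mm256_storeu_si256((__m256i *)bucket, permuted_array);
}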
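For comparison, a scalar version of the same reordering step might look like the sketch below. It assumes that the permute pattern moves the entry at `location` to the front of the bucket, so that position in the bucket encodes recency. That interpretation, the `lru_bucket` layout, and the function names are assumptions, not confirmed by the slide.

```c
#include <stdint.h>

#define WAYS 8

/* Hypothetical bucket: eight 32-bit entries ordered from most recently
 * used (slot 0) to least recently used (slot WAYS - 1). */
struct lru_bucket { uint32_t entry[WAYS]; };

/* Move the entry at `location` to slot 0 and shift the entries that were
 * in front of it back by one slot. */
void adjust_location_scalar(int location, struct lru_bucket *b)
{
    uint32_t hit = b->entry[location];
    for (int i = location; i > 0; i--)
        b->entry[i] = b->entry[i - 1];
    b->entry[0] = hit;
}

/* Insert a new entry by evicting the least recently used one. */
void lru_insert(struct lru_bucket *b, uint32_t value)
{
    adjust_location_scalar(WAYS - 1, b);  /* rotate the LRU slot to the front */
    b->entry[0] = value;                  /* overwrite it with the new entry */
}
```

The scalar loop does up to seven dependent moves per hit, whereas the permute form performs the whole reorder with one load, one shuffle, and one store.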
[Figure: cycles/pkt with prefetch or pipelining; series: no opt., prefetch, pipelined]
perform signature comparison.
buckets.
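A minimal sketch of how prefetching can be applied to a batch of lookups: hash all keys and issue prefetches for their buckets first, then compare. It assumes GCC or Clang for `__builtin_prefetch`. The table layout, batch size, and function names are hypothetical.

```c
#include <stdint.h>

#define NUM_BUCKETS 1024
#define WAYS 4
#define BATCH 8

struct slot { uint32_t key; uint32_t val; uint8_t used; };
struct bucket { struct slot s[WAYS]; };
struct table { struct bucket b[NUM_BUCKETS]; };

static uint32_t hash32(uint32_t k)
{
    k ^= k >> 16; k *= 0x45d9f3bu; k ^= k >> 16;
    return k;
}

/* Single-bucket insert, just enough to populate the table for the example. */
int table_insert(struct table *t, uint32_t key, uint32_t val)
{
    struct bucket *b = &t->b[hash32(key) % NUM_BUCKETS];
    for (int w = 0; w < WAYS; w++) {
        if (!b->s[w].used) {
            b->s[w] = (struct slot){ key, val, 1 };
            return 0;
        }
    }
    return -1;
}

/* Look up n <= BATCH keys. hit[i] is set to 1 and out[i] to the value
 * when keys[i] is found; otherwise hit[i] is 0. */
void table_lookup_batch(const struct table *t, const uint32_t *keys, int n,
                        uint32_t *out, uint8_t *hit)
{
    const struct bucket *bkt[BATCH];

    /* Stage 1: hash every key and start fetching its bucket. */
    for (int i = 0; i < n; i++) {
        bkt[i] = &t->b[hash32(keys[i]) % NUM_BUCKETS];
        __builtin_prefetch(bkt[i], 0, 3);   /* read access, keep in cache */
    }

    /* Stage 2: compare keys; earlier prefetches overlap with this work. */
    for (int i = 0; i < n; i++) {
        hit[i] = 0;
        for (int w = 0; w < WAYS; w++) {
            const struct slot *s = &bkt[i]->s[w];
            if (s->used && s->key == keys[i]) {
                out[i] = s->val;
                hit[i] = 1;
                break;
            }
        }
    }
}
```

Splitting the work into stages lets the memory accesses for several packets overlap instead of being paid one after another.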
[Figure: lookup cycles vs. avg. entry location (4.2, 3.8, 2.4, 1.05, 1); series: scalar, vert, hori]
popular virtual switches.
use cases.
signature comparison.