Perfect Hashing for Network Applications
Yi Lu, Balaji Prabhakar
Dept. of Electrical Engineering, Stanford University, Stanford, CA 94305
yi.lu, balaji@stanford.edu

Flavio Bonomi
Cisco Systems, 175 Tasman Dr, San Jose, CA 95134
flavio@cisco.com
Abstract— Hash tables are a fundamental data structure in many network applications, including route lookups, packet classification and monitoring. Often a part of the data path, they need to operate at wire-speed. However, several associative memory accesses are needed to resolve collisions, making them slower than required. This motivates us to consider minimal perfect hashing schemes, which reduce the number of memory accesses to just 1 and are also space-efficient. Existing perfect hashing algorithms are not tailored for network applications because they take too long to construct and are hard to implement in hardware. This paper introduces a hardware-friendly scheme for minimal perfect hashing, with a space requirement approaching 3.7 times the information-theoretic lower bound. Our construction is several orders of magnitude faster than existing perfect hashing schemes. Instead of using the traditional mapping-partitioning-searching methodology, our scheme employs a Bloom filter, which is known for its simplicity and speed. We extend our scheme to the dynamic setting, thus handling insertions and deletions.
I. INTRODUCTION
Hash tables constitute an integral part of many network
applications. For instance, when performing IP address lookup
at a router, one or more hash tables are queried to determine the egress port for an arriving packet. Hash tables are also used in packet classification, per-flow state maintenance, and network monitoring. Given the high operating speeds of today's network links, hash tables need to respond to queries within a few tens of nanoseconds. Despite advances in embedded memory technology, it is still not possible to accommodate a hash table, often with hundreds of thousands of entries, in an on-chip memory [1]. Therefore, hash tables are stored in larger but slower off-chip memories. It is very important to minimize the number of off-chip memory accesses, and there has been much work on
this recently. For example, Song et al. [1] proposed a fast hash table based on Bloom filters [2] and the d-left scheme [3], while Kirsch and Mitzenmacher [4] proposed an on-chip summary that speeds up accesses to an off-chip, multi-level hash table originally proposed by Broder and Karlin [5].
Our approach differs from the above in the construction phase: we construct a perfect hash function on-chip without consulting the off-chip memory. Moreover, the off-chip memory is a simple list storing each key and its corresponding item; there is no additional structure to the list. Finally, the space we use, both on-chip and off-chip, is smaller, and our scheme adapts well to the dynamic setting, allowing us to perform insertions and deletions in constant time. A drawback of our scheme (and, indeed, of any perfect hashing scheme) in the dynamic setting is that it requires a complete rebuild if the set of keys changes drastically. We propose various heuristics for minimizing the probability of a rebuild.
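Since a Bloom filter is the core building block of the scheme, a minimal sketch of a standard Bloom filter may be helpful. The hashing choice below (slicing one SHA-256 digest into k indices) is an illustrative assumption, not the paper's construction:

```python
import hashlib

class BloomFilter:
    """Standard Bloom filter: k hash functions set/test k bits in an m-bit
    array. Queries may yield false positives but never false negatives,
    which is what makes it usable as a compact on-chip summary."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = [0] * m

    def _positions(self, key):
        # Derive k bit positions from a single digest (illustrative choice).
        digest = hashlib.sha256(key.encode()).digest()
        for i in range(self.k):
            chunk = int.from_bytes(digest[4 * i:4 * i + 4], "big")
            yield chunk % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def query(self, key):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[p] for p in self._positions(key))

bf = BloomFilter(m=1024, k=4)
for flow in ["10.0.0.1:80", "10.0.0.2:443"]:
    bf.add(flow)
```

Because no bit is ever cleared, every inserted key is always found; the price is a tunable false-positive probability governed by m, k, and the number of inserted keys.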
A. Perfect Hashing
1) Definitions:
- Perfect Hash Function: Suppose that S is a subset of size
n of the universe U. A function h mapping U into the integers is said to be perfect for S if, when restricted to S, it is injective [6].
- Minimal Perfect Hash Function: Let |S| = n and |U| = u. A perfect hash function h is minimal if h(S) equals {0, ..., n − 1} [6].
2) Performance Parameters:
- Encoding size: The number of bits needed to store the
representation of h.
- Evaluation time: The time needed to compute h(x) for
x ∈ U.
- Construction time: The time needed to compute h.
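The two definitions above can be checked directly in a few lines. The following toy example (not from the paper) tests injectivity on S and, for minimality, that the image is exactly {0, ..., n − 1}:

```python
def is_perfect(h, S):
    # h is perfect for S iff h restricted to S is injective:
    # no two distinct keys in S collide.
    images = [h(x) for x in S]
    return len(set(images)) == len(images)

def is_minimal_perfect(h, S):
    # Additionally require h(S) = {0, ..., n - 1}.
    return set(h(x) for x in S) == set(range(len(S)))

S = [3, 7, 12, 20]
h = lambda x: x % 11              # injective on S -> perfect
g = lambda x: x % 4               # 3 % 4 == 7 % 4 -> not perfect
h_min = lambda x: {3: 0, 7: 1, 12: 2, 20: 3}[x]  # trivially minimal perfect
```

Note that h is perfect but not minimal: its image {3, 7, 1, 9} is injective on S but is not {0, 1, 2, 3}.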
Previous Work. Fredman and Komlós used a counting argument to prove a worst-case lower bound of n log e + log log u − O(log n) bits for the encoding size of a minimal perfect hash function, provided that u ≥ n^{2+ε} [7]. The bound is almost tight, as the upper bound given by Mehlhorn is n log e + log log u + O(log n) bits [8]. However, Mehlhorn's algorithm has a construction time of order n^{Θ(ne^n u log u)}.
One often-used approach to searching for a minimal perfect hash function involves three stages: mapping, partitioning and searching. Mapping finds an injective function on S with a smaller range. Partitioning separates the keys into subgroups. Searching finds a hash value for each subgroup so that the resulting function is perfect. More details can be found in [9], [7].
Fredman, Komlós and Szemerédi constructed a data structure that uses space n + o(n) and accommodates membership queries in constant time [10]. Fox et al. [9] constructed an algorithm for large data sets whose encoding size is very close to the theoretical lower bound, i.e., around 2.5 bits per key. They also carried out experiments on 3.8 million keys
and the construction time was 6 hours on a NeXT station. Separately, Hagerup and Tholey achieved n log e + log log u + o(n + log log u) bits of encoding space, constant lookup time and O(n + log log u) expected construction time using a similar approach [6]. The dynamic perfect hashing problem was considered by Dietzfelbinger et al. [11]. Their scheme takes O(1) worst-case time for lookups and O(1) amortized expected time for insertions and deletions; it uses O(n) space.
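The cost of the searching stage can be seen in a toy brute-force construction. The sketch below searches a small hypothetical hash family h_a(x) = ((a·x) mod p) mod n for a parameter a that makes the function minimal perfect on S; it is purely illustrative and is not any of the cited schemes:

```python
def find_mph(S, p=101):
    """Brute-force search over the family h_a(x) = ((a*x) % p) % n
    for an 'a' making h_a minimal perfect on S. Construction time is
    dominated by this search, while the encoding size is only the
    bits needed to store the winning parameter a."""
    n = len(S)
    for a in range(1, p):
        values = {(a * x) % p % n for x in S}
        if values == set(range(n)):
            return a
    return None  # family may contain no minimal perfect function for S

S = [3, 7, 12, 20]
a = find_mph(S)
```

Even on four keys, many candidate parameters are examined before one succeeds; real construction algorithms differ precisely in how they shrink or structure this search space.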