Universal Hashing
by Peter Bro Miltersen

This lecture note was written for the course "Pearls of Theory" at University of Aarhus. Most recent revision, March 5, 1998.

1. Introduction


Universal hashing is theory at its best! Hashing started out as a purely heuristic method for implementing symbol tables. It moved into the hardcore theory of algorithms with Carter and Wegman's analysis of the concept of universality. It went on to play an important role in several of the most important constructions in abstract complexity theory and cryptography. And now, these constructions start to creep back into practice. Thus, having matured inside theory, hashing gets applied in ways the original symbol table implementors could not have dreamed of! In this note, we track the exciting career of the hash function.

2. The prehistory of universal hashing

The heuristic concept of hashing, as is nowadays known to most (all?) programmers, was introduced by Dumey in 1956 [4]. It was introduced as a solution to the symbol table problem (nowadays called the dictionary problem). In the dictionary problem, we are given a sequence of $\mathrm{Insert}(k, x)$, $\mathrm{Delete}(k)$, and $\mathrm{Lookup}(k)$ operations which must be performed on-line (i.e. one operation must be completely performed before the next is considered) on an initially empty set $S$. $\mathrm{Insert}(k, x)$ inserts the key $k$ with associated information $x$ into the set, $\mathrm{Delete}(k)$ deletes the key $k$ and its associated information from the set, and $\mathrm{Lookup}(k)$ returns the information associated with $k$, if $k$ is indeed in the set.

For simplicity in the analysis which is to come, we assume that single keys and single pieces of associated information fit into single machine words, but that two keys or two pieces of information do not fit into a machine word.
This is often called, believe it or not, the transdichotomous model of computation.

Exercise 1 (for language freaks). Explain the term transdichotomous.

The goal is to perform the operations while minimizing the time and space used. The space used is measured in terms of memory registers. In general, we aim for linear space, i.e. space comparable to the size of the set being stored. Of course, the size of the set varies as the operations are performed, and this causes some complications in the solutions we'll look at. For simplicity, we will assume that we know a single upper bound $N$ on the size of the set at all times, and we will allow ourselves to use $O(N)$ registers, even when the set is much smaller (but see Problem 19).

Exercise 2. Recall some solutions to the dictionary problem. Do they use linear space? How fast are they?

Dumey's solution to the dictionary problem was the following. Assume the keys and pieces of information are both taken from the universe $U$. Pick some "crazy", "chaotic", "random" function $h$ (the hash function) mapping $U$ to $\{1, \ldots, N\}$. Initialize an array $A[1..N]$. At any given time, in $A[i]$ we keep a linked list containing the keys $k$ currently in the set for which $h(k) = i$. For each key we attach the associated information. This is called chained hashing. There are other kinds of hashing which we'll happily ignore.

Exercise 3 (for language freaks). Why hash-function?

Exercise 4. Convince yourself that it is fairly simple to program this data structure, not much worse than implementing a single linked list.

Intuitively, it is fairly clear why this solution should work well. If the function $h$ is indeed "crazy", "chaotic" and "random", mapping our set $S$ to $\{1, \ldots, N\}$ using $h$ should behave as if we were just distributing elements of $S$ at random in $N$ buckets. Since the size of $S$ is at most $N$, we should expect the buckets to be quite small in general. As the crazy function, Dumey suggested $h(x) = x \bmod p$ for $p$ a prime.

Exercise 5. Why a prime??

Hashing is widely used in practice and experience shows that it does indeed work very well! But what about a rigorous analysis?
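For concreteness, the chained hashing scheme described above might be programmed along the following lines. This is a minimal sketch, not code from the original note: the class name `ChainedHashTable` and its method names are hypothetical, keys are assumed to be non-negative integers, Python lists stand in for the linked lists, and the buckets are indexed $0, \ldots, p-1$ (the range of $x \bmod p$) rather than $1, \ldots, N$.

```python
class ChainedHashTable:
    """Chained hashing with the hash function h(x) = x mod p."""

    def __init__(self, p):
        # Assumption of this sketch: p is both the modulus and the
        # number of buckets, so buckets are indexed 0..p-1.
        self.p = p
        # One chain per bucket; each chain holds (key, info) pairs.
        self.buckets = [[] for _ in range(p)]

    def _h(self, key):
        return key % self.p

    def insert(self, key, info):
        chain = self.buckets[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, info)  # key already present: overwrite info
                return
        chain.append((key, info))

    def delete(self, key):
        idx = self._h(key)
        self.buckets[idx] = [(k, x) for (k, x) in self.buckets[idx] if k != key]

    def lookup(self, key):
        # Scan only the chain that the key hashes to.
        for k, x in self.buckets[self._h(key)]:
            if k == key:
                return x
        return None  # key is not in the set


table = ChainedHashTable(p=7)
table.insert(10, "a")
table.insert(17, "b")  # 17 mod 7 == 10 mod 7 == 3: same chain as key 10
table.insert(4, "c")
table.delete(10)
```

Every operation touches a single chain, so its cost is governed by the length of that chain; this is the quantity any analysis of the scheme has to control.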
It is easy to see that the above intuition cannot be formalized so that the argument above will be true for all sets $S$.

Exercise 6. Why not?

Even given the answer to Exercise 6, hashing was intensely analyzed in the two decades following Dumey's invention. The problem exposed in Exercise 6 was dealt with in two different ways.

1. In some papers, it is assumed that the set to be stored is not a worst case set. Instead, we assume that it is chosen according to some probability distribution or has some structural property we can explore.

2. In some papers, we do not assume anything about the set $S$, but we assume that $h$ really is a random function, i.e. chosen uniformly at random from the set of all functions mapping $U$ to $\{1, \ldots, N\}$.

There are papers of both kinds with deep and beautiful mathematics. However, both kinds do leave you a bit nervous about the relevance or the meaningfulness of the results. The first kind is based on assumptions on the input set which may be hard or impossible to guarantee in practice, and the second is simply based on a false assumption! No matter how long you stare at the function $h(x) = x \bmod p$, it will not morph into a random function.

3. An analysis of the second kind

In spite of the above, it turns out that the first really satisfactory analysis of hashing is based on an analysis of the second kind, so we shall proceed along those lines.

Theorem 7. Assume that $h$ really is chosen uniformly at random from the set of all functions mapping $U$ to $\{1, \ldots, N\}$. Furthermore assume that $h$ can be evaluated in constant time. Then the expected time required to perform any sequence of $m$ operations (satisfying the upper bound $N$ on the maximum size of the set) by chained hashing is $O(m)$.

In other words, we can perform the operations in constant expected amortized time per operation!

Exercise 8. The constant amortized time bound in the above theorem may seem so attractive that the reader may consider actually ensuring that the premise is true, i.e. actually choosing $h$ uniformly at random from the set of all functions mapping $U$ to $\{1, \ldots, N\}$. This is, as we shall see later, in a way a good idea, but explain the big problem.

Let's prove the theorem.
Assume that the sequence of operations is

$$\mathrm{op}_1(k_1), \mathrm{op}_2(k_2), \ldots, \mathrm{op}_m(k_m)$$

with $\mathrm{op}_i \in \{\mathrm{Insert}, \mathrm{Delete}, \mathrm{Lookup}\}$. We are only mentioning the key-parameters $k_i$, since the information-parameters $x_i$ are unimportant for the analysis.
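Alongside the proof, the statement of Theorem 7 can be observed experimentally. The sketch below is not part of the original note and proves nothing; it only illustrates the claim under stated assumptions. The function name `simulate`, the parameter values, the randomly generated operation sequence, and the cost model (one step for evaluating $h$ plus one step per element of the chain of $k_i$) are all choices made for this illustration. The random function $h$ is imitated by sampling each value $h(k)$ the first time $k$ is seen, which is a simulation device only, not a practical way of obtaining $h$.

```python
import random


def simulate(N, m, universe_size, seed=0):
    """Run m random operations with a lazily sampled random h.

    Returns the total number of steps under the assumed cost model.
    """
    rng = random.Random(seed)
    h = {}                             # lazily sampled random function U -> {0..N-1}
    buckets = [[] for _ in range(N)]   # chains of keys (information omitted)
    size = 0                           # current size of the stored set
    steps = 0
    for _ in range(m):
        key = rng.randrange(universe_size)
        if key not in h:
            h[key] = rng.randrange(N)  # h(key) is uniform and independent
        chain = buckets[h[key]]
        steps += 1 + len(chain)        # evaluate h, then scan the whole chain
        op = rng.choice(("insert", "delete", "lookup"))
        if op == "insert" and key not in chain and size < N:
            chain.append(key)          # respect the upper bound N on the set size
            size += 1
        elif op == "delete" and key in chain:
            chain.remove(key)
            size -= 1
    return steps


N, m = 1000, 20000
average_cost = simulate(N, m, universe_size=5000) / m
```

With these parameters the average cost per operation stays a small constant, in line with the $O(m)$ total promised by the theorem for sets of size at most $N$.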
