Universal Hashing

by Peter Bro Miltersen

This lecture note was written for the course "Pearls of Theory" at University of Aarhus. Most recent revision, March 5, 1998.

1. Introduction
Universal hashing is theory at its best! Hashing started out as a purely heuristic method for implementing symbol tables. It moved into the hardcore theory of algorithms with Carter and Wegman's analysis of the concept of universality. It went on to play an important role in several of the most important constructions in abstract complexity theory and cryptography. And now, these constructions start to creep back into practice. Thus, having matured inside theory, hashing gets applied in ways the original symbol table implementors could not have dreamed of! In this note, we track the exciting career of the hash function.

2. The prehistory of universal hashing

The heuristic concept of hashing, as it is nowadays known to most (all?) programmers, was introduced by Dumey in 1956 [4]. It was introduced as a solution to the symbol table problem (nowadays called the dictionary problem). In the dictionary problem, we are given a sequence of Insert(k,x), Delete(k), and Lookup(k) operations which must be performed on-line (i.e. one operation must be completely performed before the next is considered) on an initially empty set S. Insert(k,x) inserts the key k with associated information x into the set, Delete(k) deletes the key k and its associated information from the set, and Lookup(k) returns the information associated with k, if k is indeed in the set. For simplicity in the analysis which is to come, we assume that single keys and single pieces of associated information fit into single machine words, but that two keys or two pieces of information do not fit into a machine word. This is often called, believe it or not, the transdichotomous model of computation.

Exercise 1 (for language freaks) Explain the term transdichotomous.

The goal is to perform the operations while minimizing the time and space used. The space used is measured in terms of memory registers. In general, we aim for linear space, i.e. space comparable to the size of the set being stored. Of course, the size of the set varies as the operations are performed, and this causes some complications in the solutions we'll look at. For simplicity, we will assume that we know a single upper bound N on the size of the set at all times, and we will allow ourselves to use O(N) registers, even when the set is much smaller (but see Problem 19).

Exercise 2 Recall some solutions to the dictionary problem. Do they use linear space? How fast are they?

Dumey's solution to the dictionary problem was the following. Assume the keys and pieces of information are both taken from the universe U. Pick some "crazy", "chaotic", "random" function h (the hash function) mapping U to {1,...,N}. Initialize an array A[1..N]. At any given time, in A[i] we keep a linked list containing the keys k currently in the set for which h(k) = i. For each key we attach the associated information. This is called chained hashing. There are other kinds of hashing which we'll happily ignore.

Exercise 3 (for language freaks) Why hash-function?

Exercise 4 Convince yourself that it is fairly simple to program this data structure, not much worse than implementing a single linked list.

Intuitively, it is fairly clear why this solution should work well. If the function h is indeed "crazy", "chaotic" and "random", mapping our set S to {1,...,N} using h should behave as if we were just distributing the elements of S at random into N buckets. Since the size of S is at most N, we should expect the buckets to be quite small in general. As the crazy function, Dumey suggested h(x) = x mod p for p a prime.

Exercise 5 Why a prime??

Hashing is widely used in practice and experience shows that it does indeed work very well! But what about a rigorous analysis? It is easy to see that the above intuition cannot be formalized so that the argument above will be true for all sets S.

Exercise 6 Why not?

Even given the answer to exercise 6, hashing was intensely analyzed in the two decades following Dumey's invention. The problem exposed in exercise 6 was dealt with in two different ways.

1. In some papers, it is assumed that the set to be stored is not a worst case
set. Instead, we assume that it is chosen according to some probability distribution or has some structural property we can exploit.

2. In some papers, we do not assume anything about the set S, but we assume that h really is a random function, i.e. chosen uniformly at random from the set of all functions mapping U to {1,...,N}.

There are papers of both kinds with deep and beautiful mathematics. However, both kinds do leave you a bit nervous about the relevance or the meaningfulness of the results. The first kind is based on assumptions on the input set which may be hard or impossible to guarantee in practice, and the second is simply based on a false assumption! No matter how long you stare at the function h(x) = x mod p, it will not morph into a random function.

3. An analysis of the second kind

In spite of the above, it turns out that the first really satisfactory analysis of hashing is based on an analysis of the second kind, so we shall proceed along those lines.

Theorem 7 Assume that h really is chosen uniformly at random from the set of all functions between U and {1,...,N}. Furthermore assume that h can be evaluated in constant time. Then the expected time required to perform any sequence of m operations (satisfying the upper bound N on the maximum size of the set) by chained hashing is O(m).

In other words, we can perform the operations in constant expected amortized time per operation!

Exercise 8 The constant amortized time bound in the above theorem may seem so attractive that the reader may consider actually ensuring that the premise is true, i.e. actually choosing h uniformly at random from the set of all functions between U and {1,...,N}. This is, as we shall see later, in a way a good idea, but explain the big problem.
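Exercise 4 claimed that chained hashing is easy to program. Here is a minimal Python sketch of the data structure (the class and method names are our own invention; the hash function h is passed in as a parameter, so any of the choices discussed in this note can be plugged in):

```python
class ChainedHashTable:
    """Chained hashing: an array of buckets, each holding a list of (key, info) pairs."""

    def __init__(self, n_buckets, h):
        self.h = h                                   # hash function: keys -> 0..n_buckets-1
        self.table = [[] for _ in range(n_buckets)]  # one chain per cell

    def insert(self, k, x):
        bucket = self.table[self.h(k)]
        for i, (key, _) in enumerate(bucket):
            if key == k:                 # key already present: overwrite its information
                bucket[i] = (k, x)
                return
        bucket.append((k, x))

    def delete(self, k):
        i = self.h(k)
        self.table[i] = [(key, x) for (key, x) in self.table[i] if key != k]

    def lookup(self, k):
        for key, x in self.table[self.h(k)]:
            if key == k:
                return x
        return None                      # k is not in the set
```

For instance, ChainedHashTable(997, lambda x: x % 997) implements Dumey's suggestion with p = 997.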
Let's prove the theorem. Assume that the sequence of operations is op_1(k_1), op_2(k_2), ..., op_m(k_m) with op_i ∈ {Insert, Delete, Lookup}. We are only mentioning the key-parameters k_i, since the information-parameters x_i are unimportant for the analysis.
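Before carrying out the computation, a quick experiment illustrates the constant expected cost per operation that the calculation below establishes. This is only a sketch: Python's random module stands in for a truly random h, memoized so that each key gets one consistent random value.

```python
import random

random.seed(42)                # deterministic run

N = 1000                       # number of cells = upper bound on the set size
memo = {}                      # lazily materialized "truly random" function h

def h(k):
    if k not in memo:
        memo[k] = random.randrange(N)   # each fresh key hashes uniformly at random
    return memo[k]

table = [[] for _ in range(N)]
total_cost = 0
for k in range(N):             # m = N insertions of distinct keys
    bucket = table[h(k)]
    total_cost += 1 + len(bucket)       # one step plus the chain traversal
    bucket.append(k)

avg = total_cost / N
assert avg <= 3                # matches the bound of at most 3 derived below
```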
We choose h at random, and we want to compute the expectation of the random variable

T(op_1(k_1), op_2(k_2), ..., op_m(k_m)) = Σ_i T(op_i(k_i)).

By linearity of expectation (a pearl of probability theory!) we have

E[Σ_i T(op_i(k_i))] = Σ_i E[T(op_i(k_i))].

So, we only have to show that for any i, E[T(op_i(k_i))] is O(1), and we are done. Let's fix i and look at the term E[T(op_i(k_i))]. Let's call the appearance of the set when this operation is to be performed (i.e. the set after i−1 operations) S_i. Then

E[T(op_i(k_i))]
  ≤ 1 + E[length of the linked list at entry h(k_i) after instruction i−1]
  = 1 + E[#{y ∈ S_i | h(y) = h(k_i)}]
  = 1 + E[Σ_{y ∈ S_i} [h(y) = h(k_i)]]
  = 1 + Σ_{y ∈ S_i} E[[h(y) = h(k_i)]]
  = 1 + Σ_{y ∈ S_i} Pr[h(y) = h(k_i)]
  ≤ 1 + 1 + Σ_{y ∈ S_i \ {k_i}} Pr[h(y) = h(k_i)]
  ≤ 1 + 1 + N·(1/N) = 3,

where [·] denotes the 0/1 indicator of an event. (The extra "1 +" in the second-to-last step accounts for the possible term y = k_i, whose collision probability is 1.)

4. Universal hashing

Of course, the answer to exercise 8 leads you to the conclusion that the result in the last section is nice but irrelevant. However, Carter and Wegman in their seminal paper on universal hashing [2] saw the way out: Look at the analysis of the last section. Where did we actually use anything about the probability space associated with h? We didn't use much, only the following fact, which we'll call property (U):

(U) For all x ≠ y: Pr[h(x) = h(y)] ≤ 1/N.

Now, Carter and Wegman's simple but brilliant idea was this: We will actually choose h at random when we initialize our data structure, but not from the space of all functions. We will choose h from a much smaller space, but make sure that the property (U) holds. This leads to the following definition:

Definition Let H be a class of functions mapping U to {1,...,N}. We say that H is universal if, for any x ≠ y in U and an h chosen uniformly at random in H, we have Pr[h(x) = h(y)] ≤ 1/N. Also, we say that H is nearly universal if we only have Pr[h(x) = h(y)] ≤ 2/N.

Exercise 9 (for advertising agents to be) Why universal?

The definition of nearly universal is not standard, and is added here mainly for convenience. The theorem above now generalizes into:

Theorem 10 Choose h uniformly at random from a (nearly) universal family H mapping U to {1,...,N}. Assume that members of H can be evaluated in constant time. Then the expected time required to perform any sequence of m operations (satisfying the upper bound N on the maximum size of the set) by chained hashing is O(m).

We now only have to exhibit a small, efficient, (nearly) universal family.

Theorem 11 Let p be a prime greater than N. Let H be the family mapping {0,1,...,p−1} to {0,...,N−1}, containing, for each a ∈ {0,...,p−1}, the function h_a(x) = (ax mod p) mod N. Then H is nearly universal.

Before we show the theorem, let us note that this does indeed solve our problem! If our universe U is, say, {0,1,2,...,2^w − 1} (i.e. the set of w-bit words), we can choose p to be a prime between 2^w and 2^(w+1) (such a prime exists). When the computation begins, we can select a random hash function from H and store all information about it (i.e. p and a) in less than 3 machine words. Compare this to the answer to exercise 8. We can also evaluate the hash function in constant time, using standard arithmetic operations.

The proof of near universality is clever, but simple:

Pr[h_a(x) = h_a(y)]
  = Pr[(ax mod p) mod N = (ay mod p) mod N]
  = Pr[(ax mod p) − (ay mod p) ∈ {−⌊(p−1)/N⌋·N, ..., −2N, −N, 0, N, 2N, ..., ⌊(p−1)/N⌋·N}]
  = Pr[a(x−y) mod p ∈ R],

where R = {0, N, 2N, ..., ⌊(p−1)/N⌋·N, p−N, p−2N, ..., p−⌊(p−1)/N⌋·N}. Since Z/pZ is a field, the last probability is equal to

Pr[a ∈ R·(x−y)^(−1)] = |R·(x−y)^(−1)| / p = |R| / p ≤ (2p/N) / p = 2/N.

If we want a truly universal family (i.e. with property (U) satisfied), we can achieve this by taking as members of H all functions of the form h_{a,b}(x) = ((ax + b) mod p) mod N with a ≠ 0. We shall not show this (near universality is sufficient for the dictionary application). We have now virtually shown the following theorem:

Theorem 12 The dynamic dictionary problem can be implemented using O(N) space and expected constant amortized time per operation.

One slight problem with our solution is the prime p, which must be found somehow. However, since it only depends on the size of the universe, it is reasonable to assume that it is given for free. An alternative is to use universal families which are not based on primes. A particularly nice one, which also avoids integer division and uses only one multiplication, is the following: The universe is again U = {0,1,...,2^w − 1}. The name of a hash function is just an odd number a in U. To hash a key x, we multiply x by a. This gives a number in {0,1,...,2^(2w) − 1}, i.e. two consecutive words. Now, if we want the range of the family to be, say, {0,1}^l, we just pick the l most significant bits of the least significant word of ax. The proof that this does indeed have the near universality property can be found in [3]. The proof is only slightly more complicated than the above.

5. The further adventures of the hash function

A few years passed before people started noticing how generally useful a tool universal hashing is, but by the late eighties, dictionaries were only one example in a long list of (first theoretical and later practical) applications. Why is hashing useful in general? A good rule of thumb is that whenever you have a nice pattern or some useful information and want to see it completely and utterly destroyed (the Beavis and Butthead objective), hashing might come in useful. Now, why would we want to destroy nice patterns or information? Well, we already saw the dictionary example; in that example a "nice" pattern might be all the keys ending up in one list! In the rest of the note, we show three other examples, covering algorithms, cryptography, and complexity theory. They are just the tip of an iceberg. For further information, we recommend the survey by Luby and Wigderson [5].
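Before turning to the applications, here is the nearly universal family of Theorem 11 in executable form, together with a brute-force check of the collision bound. This is a Python sketch; the parameter choices p = 8191 and N = 100 are ours, picked small enough to enumerate the whole family.

```python
p, N = 8191, 100          # p = 2^13 - 1 is a (Mersenne) prime greater than N

def h(a, x):
    """The function h_a(x) = (a*x mod p) mod N from Theorem 11."""
    return ((a * x) % p) % N

# Near universality: for any fixed pair x != y, at most a 2/N fraction
# of the p functions in the family make x and y collide.
x, y = 1234, 5678
collisions = sum(1 for a in range(p) if h(a, x) == h(a, y))
assert collisions / p <= 2 / N          # the bound of Theorem 11
```

Choosing a with random.randrange(p) at initialization time then gives the scheme of Theorem 12.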
6. Derandomization

Consider the MAX CUT problem: Given a graph G = (V, E), find a two-colouring of the vertices χ : V → {red, blue} so as to maximize

c(χ) = #{(x,y) ∈ E | χ(x) ≠ χ(y)}.

Here is a simple randomized algorithm which outputs a coloring χ so that E[c(χ)] = |E|/2: Just colour each vertex randomly (red or blue, each with probability 1/2). Then,

E[c(χ)] = Σ_{{x,y} ∈ E} Pr[χ(x) ≠ χ(y)] = |E|/2.

Now, what about a deterministic, polynomial time algorithm with the same performance guarantee, i.e. outputting χ so that c(χ) ≥ |E|/2? Simple: Let H be a universal family mapping V to {0,1}. The analysis still holds if we choose h ∈ H at random and let χ(v) = h(v). But we know we can choose H so that it only contains |V|^O(1) members, so we can try them all in polynomial time and output the coloring with the maximum c-value. Reference: Luby and Wigderson [5].

7. A movie script

Two secret agents, Alice and Bob, communicate using the Internet. This is not very secure, and indeed, Alice and Bob know that evil Claire regularly eavesdrops on their conversation. Tomorrow, Alice is going to transmit to Bob a particularly sensitive piece of information containing 1000 bits, so they are going to encrypt the information. Claire is an employee at BRICS and has therefore unlimited computational resources, so Alice and Bob do not want to employ a scheme based on computational assumptions (such as RSA). Instead, they are going to use an information theoretically secure scheme. A month ago, Alice and Bob met in person, flipped a coin 2000 times, and both wrote down the resulting bit sequence. They agreed to use the bits as one time pads in their next two sensitive messages (sensitive messages always contain 1000 bits). So far, no sensitive messages have been sent, so the secret bits are all unused, but tomorrow, Alice is going to take her sensitive message, compute a bitwise XOR with the first 1000 secret bits, and send the result to Bob, who will decrypt it by a similar operation. Claire will not be able to get any information from the message, even using her network of PowerPocketMultiIndys.
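The plan is the classical one-time pad. A minimal sketch (Python; toy 16-bit quantities instead of 1000-bit messages, and the variable names are ours):

```python
# One-time pad over bit strings, represented here as Python integers.
message = 0b1011001110001111          # Alice's sensitive message (toy size: 16 bits)
pad     = 0b0110101001011100          # the first block of the shared secret coin flips

ciphertext = message ^ pad            # Alice encrypts: bitwise XOR with the pad
decrypted  = ciphertext ^ pad         # Bob decrypts: XOR with the same pad

assert decrypted == message           # XOR with the same pad twice is the identity
```

Exercise 13 below asks why this gives Claire no information.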
Exercise 13 Why not?

However, even the best plan can fail. During the night Bob contacts Alice (using the insecure channel). A few minutes ago, Bob surprised Doug (an agent of Claire) in his office. He shot and killed him immediately, but in Doug's hand was the secret 2000 bit sequence (Bob admits that he probably shouldn't have left it on his desk) and on the office terminal Bob saw:

talk eclaire@gorm.daimi.aau.dk
[connection established]
10001111101101110100001110000100100010000000110101
10000010110010011101111111000001000101111101010110
11101011110010000001000010010011001011001111100100
00101111110110000101111010010000110101000111011111

Aaaaargh... so now Claire knows something about the secret sequence; she's received exactly 200 bits. Now, if these bits were 200 consecutive bits of the secret sequence, Alice and Bob could just use a different portion of the sequence, but that does not seem to be the case; Alice and Bob do not recognize the transmitted sequence at all. It must be 200 bits of information about the sequence, the nature of which only Claire now knows.

Exercise 14 Give examples of information, other than specific bits of the secret key, which could be useful to Claire (if she has some idea about the nature of the sensitive message to be sent tomorrow).

Now, how does Alice transmit the message tomorrow without compromising the unconditional security demand? Alice and Bob decide to sleep on it.

And now, the exciting climax of the story: The next day, Alice and Bob agree, over the insecure channel, on a family of universal hash functions H mapping {0,1}^2000 to {0,1}^1000. If they decide on the family whose members are h_{a,b}(x) = ((ax + b) mod p) mod N, we have that |H| ≤ 2^4002. Then, Alice flips a coin a few thousand times and thereby determines a random member h ∈ H. The description of h is sent to Bob, again over the insecure channel (Claire hears all this). Then they both hash their secret key, reducing it to 1000 bits, and Alice sends her sensitive message, using the hashed key as a one-time pad. Whatever information Claire received about the secret key, it is completely and utterly destroyed by the hashing, and Alice's last message looks completely random to Claire.

Of course, Alice and Bob will now only be able to send this single message using their key, so they'll have to meet in person again before they send the next sensitive message. Perhaps that'll teach Bob to stop leaving secret stuff on his desk.

THE END. Any similarity to real persons (living or dead), events, or offices, is purely coincidental.
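The climax is privacy amplification: hash the partially compromised key down with a randomly chosen universal function and use the result as the pad. A toy-sized Python sketch, with our own parameter choices: a 60-bit shared key instead of 2000 bits, hashed to 16 bits, with the Mersenne prime p = 2^61 − 1:

```python
import random

random.seed(7)                     # deterministic run

p = 2**61 - 1                      # a prime larger than the key universe (Mersenne)
N = 2**16                          # we hash the long key down to 16 bits

def h(a, b, x):
    """A member h_{a,b}(x) = ((a*x + b) mod p) mod N of the universal family."""
    return ((a * x + b) % p) % N

shared_key = random.getrandbits(60)        # the long, partially compromised secret
# Alice picks a random member of the family and announces (a, b) publicly.
a = random.randrange(1, p)
b = random.randrange(p)

pad = h(a, b, shared_key)                  # both sides derive the same short pad
message = 0b1010011100001111               # Alice's 16-bit sensitive message
ciphertext = message ^ pad                 # one-time pad with the hashed key
assert ciphertext ^ h(a, b, shared_key) == message   # Bob recovers the message
```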
Since this was meant as an exciting movie script for a general audience, we simplified the last part a bit. In a precise information theoretic sense which we won't go into here, we can expect Claire to be able to learn about 2^(1000−2000+200) = 2^(−800) bits about the sensitive message. But for all practical purposes, that's 0. Reference: Bennett et al. [1].

8. Complexity classes

Recall that the circuit satisfiability problem CIRCUIT SAT is NP-complete. Thus, if P ≠ NP, there is no way to decide in polynomial time if the input variables of a Boolean circuit can be assigned truth values so that the circuit evaluates to True. It follows that there is no way of finding satisfying assignments to satisfiable circuits in polynomial time.

Exercise 15 Why does the last statement follow from the first?

Of course, we are interested in tracking the source of the difficulty. Here is one possible hypothesis: The source of difficulty is the fact that a typical satisfiable circuit has quite a few different satisfying assignments, and a search algorithm trying to track one of them by sophisticated means (such as genetic search) will be confused by the multitude of solutions leaving inconsistent hints in the search space. A more precise way to phrase that hypothesis is

(H) NP ≠ P, so CIRCUIT SAT is not in P, but whenever there is just one satisfying assignment to a Boolean circuit, we can find it in polynomial time.

Well, the reasons for suggesting (H) are arguably pretty lame, so it is probably reasonable to assume "not (H)", like we usually assume P ≠ NP, and get on with our lives. But one of the points of complexity theory is to minimize our ignorance by making as few unproven assumptions as possible. Assuming P ≠ NP is usually regarded as pretty safe. Is "not (H)" equally (or almost as) safe? Universal hashing gives us the answer.

Theorem 16 (H) implies that every problem in NP can be solved by a Monte Carlo algorithm in polynomial time.

By a Monte Carlo algorithm we mean a randomized algorithm which may answer incorrectly, but on any input x, the probability of an incorrect answer is (e.g.) 2^(−1000). Thus, you are very unlikely to ever see an incorrect answer, and, if you once suspect an answer to be incorrect, you can ask again! So "not (H)" seems almost as safe an assumption as P ≠ NP.

A sketch of the proof of the theorem is as follows: Assume (H) is true, and let A be the algorithm which finds unique satisfying assignments in polynomial time. We now show that CIRCUIT SAT can be solved by a Monte Carlo algorithm in polynomial time. Since CIRCUIT SAT is NP-complete, the theorem follows. The Monte Carlo algorithm does the following: Given a circuit C, it constructs a random sequence of circuits C_1, C_2, ..., C_2r (with r = the number of variables of C), so that

1. If C is unsatisfiable, the C_i's are also unsatisfiable.

2. If C is satisfiable, then, with non-negligible probability, at least one of the C_i's is uniquely satisfiable.

Furthermore, the size of the C_i's should be polynomial in C. If we can do this, we have solved our problem: We just give the C_i's to A, and see if it finds satisfying assignments to any of them. If it does, we know that C is satisfiable. If not, we iterate, with new C_i's. After we have asked A a number of times and no satisfying assignments have ever been found, we know that C is extremely unlikely to be satisfiable, because if it were, then with overwhelming probability one of the C_i's we've tried would have been uniquely satisfiable, and A would have found a satisfying assignment to such a C_i.

So how are the C_i's defined? We take a universal family H mapping {0,1}^r to {0,1} and pick random members h_1, h_2, ..., h_2r. Now, we construct C_i so that

C_i(x) ⇔ C(x) ∧ h_1(x) = 1 ∧ h_2(x) = 1 ∧ ... ∧ h_i(x) = 1.

This works! The intuitive reason is as follows: Each time another clause h_i(x) = 1 is added, we can expect the number of satisfying assignments to be approximately halved. When we get to C_2r, they are almost certainly all gone. But then, it is likely that the last circuit that had a satisfying assignment had exactly one, because a jump from, say, 2 satisfying assignments to 0 is less likely than going from 2 to 1 to 0. Furthermore, if the family H is a small, efficient family (like the ones we saw earlier), the size of C_i will be polynomial in C. Reference: Valiant and Vazirani [6].
9. Problems

Problem 17 Arnold Dummy decides to implement hashing using Dumey's function h(x) = x mod p for some prime p. Of course, the range of this function is {0,...,p−1}, and Dummy wants the hash table to contain 1000 cells, and 1000 is not a prime. Dummy decides to hack his way out of this by choosing p = 1327 and defining h(x) = (x mod p) mod 1000. Is this a good idea?

Problem 18 Let A be a 0-1 matrix of dimensions r × s. By linear algebra, we can view A as a linear map over the field Z_2 = {0,1}, mapping (Z_2)^s to (Z_2)^r (just do "usual" matrix multiplication, except that everything is done modulo 2). Show that the set of all r × s matrices forms a universal family of hash functions. Discuss advantages and disadvantages of using this in practice instead of the ones based on integer arithmetic.

Problem 19 In Theorem 12, we assume that we know the upper bound N on the size of the set in advance and that we are allowed to use space O(N) at all times. Show that this assumption can be removed, i.e. that there is a solution to the dictionary problem with expected constant amortized time per operation, using space which is, at any given time, proportional to the current size of the set.

Problem 20 [Oyster of the week] The performance guarantee on the dictionary problem is expected constant amortized time. However, individual operations are not guaranteed any good worst case performance. In the static dictionary problem, we do not have Insert's or Delete's; instead we have an Init operation which takes a set of keys and associated information and produces a data structure representing the set. The data structure is never changed; we only perform Lookup's on it. Devise a scheme for static dictionaries, so that any set is converted into a data structure of linear space, and so that any Lookup operation can be performed in worst case constant time (!!!!!)

References

[1] C.H. Bennett, G. Brassard, J.-M. Robert, Privacy amplification by public discussion, SIAM Journal on Computing 17 (1988) 210-229.

[2] J.L. Carter, M.N. Wegman, Universal classes of hash functions, J. Comp. Sys. Sci. 18 (1979) 143-154.

[3] M. Dietzfelbinger, T. Hagerup, J. Katajainen, M. Penttonen, A reliable randomized algorithm for the closest-pair problem, Technical Report 513, Fachbereich Informatik, Universität Dortmund, 1993.

[4] A.I. Dumey, Computers and Automation 5 (1956) 6-9.

[5] M. Luby, A. Wigderson, Pairwise Independence and Derandomization, Technical Report TR-95-035, ICSI, 1995.

[6] L. Valiant, V. Vazirani, NP is as easy as detecting unique solutions, Theoretical Computer Science 47 (1986) 85-93.