ADVANCED ALGORITHMS
Lecture 16: hashing (fin), sampling
1
ANNOUNCEMENTS
➤ HW 3 is due tomorrow!
➤ Send project topics
➤ Send email to utah-algo-ta@googlegroups.com, with subject “Project topic”; one email per group; names and UIDs
2
LAST CLASS
3
➤ Hashing
➤ place n balls into n bins, independently and uniformly at random
➤ expected size of a bin = 1
➤ number of bins with exactly k balls ≈ n/(e · k!)
➤ max size of a bin = O(log n / log log n)
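A quick simulation bears these counts out (a Python sketch; the function name and parameters are illustrative, not from the lecture):

```python
import random
from collections import Counter

def balls_into_bins(n, seed=0):
    """Throw n balls into n bins independently and uniformly at random;
    return a map from bin index to its load."""
    rng = random.Random(seed)
    return Counter(rng.randrange(n) for _ in range(n))

n = 100_000
loads = balls_into_bins(n)
max_load = max(loads.values())
empty_frac = (n - len(loads)) / n   # fraction of bins with no ball
# Average load is exactly 1, yet the max load grows like log n / log log n,
# and roughly a 1/e fraction of the bins stay empty.
print(max_load, round(empty_frac, 3))
```

The empty fraction lands near 1/e ≈ 0.368, matching the n/(e · k!) count with k = 0.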
MAIN IDEAS
➤ Random variables as sums of “simple” random variables
➤ Linearity of expectation
➤ Markov’s inequality is usually not tight
➤ Union bound
4
Pr[X > t ⋅ 𝔼[X]] ≤ 1/t
Random variables do not deviate too much from their expectations.
Pr[E1 ∪ E2 ∪ … ∪ En] ≤ Pr[E1] + Pr[E2] + … + Pr[En]
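To see how loose Markov usually is, compare the 1/t bound against the empirical tail of a sum of coin flips (a Python sketch; the constants are illustrative):

```python
import random

rng = random.Random(1)
n, trials, t = 100, 20_000, 1.2
mean = n / 2                       # E[X] for X = #heads among n fair flips
samples = [sum(rng.random() < 0.5 for _ in range(n)) for _ in range(trials)]
empirical = sum(x > t * mean for x in samples) / trials
markov = 1 / t                     # Markov: Pr[X > t * E[X]] <= 1/t
print(empirical, markov)           # the empirical tail is far below 1/t
```

Here Markov only guarantees a tail probability below 0.833, while the true probability of exceeding 1.2 · 𝔼[X] is under a few percent.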
THOUGHTS
➤ When hashing n balls to n bins, outcomes are not “as uniform” as one would like
➤ Many empty bins (HW)
➤ What happens if there are more balls? hash m balls, where m ≫ n
➤ “Power of two choices” (Broder et al. 91)

6
Max load of a bin: ≈ log n / log log n with one choice; with two choices, max load = O(log log n), i.e., much better load balancing.
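The two-choice effect is easy to observe in simulation (a Python sketch; function and parameter names are illustrative):

```python
import random

def max_load(n, d, seed=0):
    """Place n balls into n bins; each ball picks d bins uniformly at
    random and goes to the least-loaded of them."""
    rng = random.Random(seed)
    loads = [0] * n
    for _ in range(n):
        candidates = [rng.randrange(n) for _ in range(d)]
        loads[min(candidates, key=loads.__getitem__)] += 1
    return max(loads)

n = 100_000
one, two = max_load(n, 1), max_load(n, 2)
print(one, two)   # d = 2 gives a dramatically smaller max load than d = 1
```

With d = 1 the max load grows like log n / log log n; with d = 2 it drops to O(log log n), typically a load of about 4 at this scale.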
ESTIMATION
7
Question: suppose each person votes R or B. Can we predict the winner without counting all votes?
Answer: sample n of the people, ask who they will vote for, and output the winner in the sample.

Want: winner in sample = winner in the full population.

Things that matter:
➤ sampling being truly uniform
➤ everyone answering truthfully
➤ the number n of samples
➤ the margin: how close the true vote counts are
➤ the confidence in our prediction
ANALYZING SAMPLING
8
Natural formalism:
➤ Choose n people uniformly at random.
➤ Let Xi (0/1) be the outcome of the i’th person
Each person has a choice in {0, 1}. Let N = size of the entire population, N0 = number that vote 0, and N1 = number that vote 1. In the sample, let m0 = number voting 0 and m1 = number voting 1.

Predicted winner: 0 if m0 > m1, and 1 otherwise.

Note that m1 = X1 + X2 + … + Xn, and m0 = n − m1.

What is 𝔼[Xi]? 𝔼[Xi] = N1/N. By linearity of expectation, 𝔼[m1] = n · N1/N and 𝔼[m0] = n · N0/N.

Estimation error = | fraction of votes i received in the sample − fraction of votes i received in the population |.

We just argued, from the expressions above: if m0 ≈ 𝔼[m0] and m1 ≈ 𝔼[m1], then sample winner = true winner.

Example 1: what if N0 = 0.4N and N1 = 0.6N? Our prediction is right iff m0 < n/2. Claim: 𝔼[m0] = 0.4n. So as long as |m0 − 𝔼[m0]| < 0.1n, the prediction is right.

Example 2: what if N0 = 0.49N and N1 = 0.51N? Then 𝔼[m0] = 0.49n, and we need |m0 − 𝔼[m0]| < 0.01n.

Goal: if we take n samples, what is the estimation error?
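The close-margin case can be simulated directly (a Python sketch; population size, sample size, and function names are illustrative, not from the lecture):

```python
import random

def poll(votes, n, seed):
    """Sample n voters uniformly at random (with replacement) and return
    the sample winner: 1 if m1 > n/2, else 0."""
    rng = random.Random(seed)
    m1 = sum(rng.choice(votes) for _ in range(n))
    return 1 if m1 > n / 2 else 0

# Example 2's close margin: N1 = 0.51N votes for 1, N0 = 0.49N for 0.
votes = [1] * 51_000 + [0] * 49_000
n = 2_000
correct = sum(poll(votes, n, seed=s) == 1 for s in range(100)) / 100
print(correct)   # fraction of polls whose sample winner = true winner
```

Even with n = 2000 samples, a 1% margin leaves a noticeable chance that the sample winner differs from the true winner, which is exactly why we need quantitative tail bounds.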
ANALYZING SAMPLING
9
➤ Error in estimation: || empirical mean − true expectation ||?
➤ “Confidence”
Natural formalism:
➤ Choose n people uniformly at random.
➤ Let Xi (0/1) be the outcome of the i’th person
Ideal guarantee: || empirical mean - true expectation || < 0.001 w.p. 0.999
MARKOV?
10
Want: Pr[ |m0 − 𝔼[m0]| ≥ 0.1n ] to be small.

Markov only bounds the upper tail: Pr[ m0 ≥ n/2 ] ≤ 𝔼[m0] / (n/2) = 0.4n / (0.5n) = 0.8.

So the bound on the failure probability is 0.8 no matter how large n is (even n = 10^6): Markov does not improve as we take more samples.
CAN WE USE THE “NUMBER OF SAMPLES”?
11
Variance of random variable X
var(X) = 𝔼[(X − μ)²], where μ = 𝔼[X].

Linearity: if X = X1 + X2 + … + Xn, then var(X) = var(X1) + var(X2) + … + var(Xn), if the Xi are independent.
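A numerical check of the additivity (a Python sketch with illustrative constants):

```python
import random
import statistics

rng = random.Random(2)
n, trials = 30, 50_000
# Each Xi is a fair 0/1 coin, so var(Xi) = 1/4 and, by independence,
# var(X1 + ... + Xn) = n/4 = 7.5 for n = 30.
sums = [sum(rng.randrange(2) for _ in range(n)) for _ in range(trials)]
print(statistics.variance(sums), n / 4)
```

The empirical variance of the sum comes out very close to n/4, as the additivity formula predicts.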
CHEBYCHEV’S INEQUALITY
12
If a random variable has low variance, then Markov can be improved.

Theorem: let X be a random variable whose variance is σ². Then

Pr[ |X − 𝔼[X]| ≥ t · σ ] ≤ 1/t²

Back to sampling: we wanted Pr[ |m0 − 𝔼[m0]| ≥ 0.1n ] to be small. Idea: compute var(m0).
VARIANCE OF AVERAGE
13
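The board derivation for this slide was not captured; a standard computation, assuming the n samples are independent 0/1 draws with p = N0/N (so that var(m0) = var(m1)), is:

```latex
\operatorname{var}\!\Big(\frac{m_0}{n}\Big)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{var}(X_i)
  = \frac{n\,p(1-p)}{n^2}
  = \frac{p(1-p)}{n}
  \;\le\; \frac{1}{4n},
  \qquad p = \frac{N_0}{N}.
```

In particular, var(m0) = n · p(1 − p) ≤ n/4, so the average concentrates as n grows.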
BOUND VIA CHEBYCHEV
14
WHAT IF WE TAKE HIGHER POWERS?
15
“Moment methods”
➤ Usually get improved bounds
𝔼[(X − 𝔼X)⁴] ≤ …
CHERNOFF BOUND
16
INTERPRETING THE CHERNOFF BOUND
17
INTERPRETING THE CHERNOFF BOUND
18
Useful heuristic:
➤ Sums of independent random variables don’t deviate much more than their standard deviation
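The heuristic is easy to check empirically: the worst deviation of a coin-flip sum from its mean, over many runs, stays within a few standard deviations (a Python sketch with illustrative constants):

```python
import math
import random

rng = random.Random(3)
n, trials = 10_000, 200
sd = math.sqrt(n) / 2    # standard deviation of a sum of n fair 0/1 coins
# Largest deviation from the mean n/2 over all trials, in units of sd.
worst = max(abs(sum(rng.randrange(2) for _ in range(n)) - n / 2) / sd
            for _ in range(trials))
print(worst)   # a small constant number of sds, nowhere near sqrt(n)
```

Chernoff makes this quantitative: the probability of deviating by t standard deviations decays exponentially in t², so even the worst of 200 trials is only a few sds out.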
MCDIARMID’S INEQUALITY
19
ESTIMATING THE SUM OF NUMBERS
20
ESTIMATING THE SUM OF NUMBERS
21