ADVANCED ALGORITHMS Lecture 16: hashing (fin), sampling

SLIDE 1

ADVANCED ALGORITHMS

Lecture 16: hashing (fin), sampling

SLIDE 2

ANNOUNCEMENTS

➤ HW 3 is due tomorrow!
➤ Send project topics
➤ Send email to utah-algo-ta@googlegroups.com, with subject “Project topic”; one email per group; names and UIDs

SLIDE 3

LAST CLASS


➤ Hashing
➤ place n balls into n bins, independently and uniformly at random
➤ expected size of a bin = 1
➤ number of bins with exactly k balls ≈ n/k!
➤ max size of a bin = O(log n / log log n)
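The recap above is easy to check by simulation; a short sketch (not from the slides):

```python
import random
from collections import Counter

# Throw n balls into n bins, independently and uniformly at random.
n = 100_000
rng = random.Random(0)
bins = Counter(rng.randrange(n) for _ in range(n))   # bin index -> load

sizes = Counter(bins.values())      # sizes[k] = number of bins with exactly k balls
max_load = max(bins.values())
empty = n - len(bins)               # bins that received no ball at all

print("bins with k balls:", dict(sorted(sizes.items())))
print("empty bins:", empty, "(about n/e, the 'many empty bins' HW fact)")
print("max load:", max_load, "(grows like log n / log log n)")
```

For n = 100,000 the max load is typically a single-digit number, matching the O(log n / log log n) bound.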

SLIDE 4

MAIN IDEAS

➤ Random variables as sums of “simple” random variables
➤ Linearity of expectation
➤ Markov’s inequality is usually not tight
➤ Union bound

• Theorem. Let X be a non-negative random variable. For any t > 0,

Pr[X > t ⋅ 𝔼[X]] ≤ 1/t

Random variables do not deviate too much from their expectations.

SLIDE 5

MAIN IDEAS

➤ Random variables as sums of “simple” random variables
➤ Linearity of expectation
➤ Markov’s inequality is usually not tight
➤ Union bound

• Theorem. Let X be a non-negative random variable. For any t > 0,

Pr[X > t ⋅ 𝔼[X]] ≤ 1/t

• Theorem. Suppose E1, E2, …, En are n events in a probability space. Then

Pr[E1 ∪ E2 ∪ … ∪ En] ≤ Pr[E1] + Pr[E2] + … + Pr[En]
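Markov’s bound is easy to see empirically (and to see why it is “usually not tight”); a small sketch, not from the slides, using the number of heads in 10 fair coin flips as the non-negative variable X:

```python
import random

# Empirical check of Markov's inequality: for a non-negative X and any t > 1,
# Pr[X > t * E[X]] <= 1/t.  Here X = number of heads in 10 fair coin flips,
# so E[X] = 5.
rng = random.Random(1)
trials = 100_000
samples = [sum(rng.randint(0, 1) for _ in range(10)) for _ in range(trials)]
mean = sum(samples) / trials            # should be close to E[X] = 5

for t in (1.2, 1.5, 2.0):
    frac = sum(x > t * mean for x in samples) / trials
    print(f"t={t}: Pr[X > t*E[X]] ~ {frac:.4f}, Markov bound 1/t = {1/t:.4f}")
```

The empirical tail probabilities come out far below 1/t, which is exactly the “usually not tight” point.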

SLIDE 6

THOUGHTS

➤ When hashing n balls to n bins, outcomes are not “as uniform” as one would like
➤ Many empty bins (HW)
➤ What happens if there are more balls? hash m balls, where m ≫ n
➤ “Power of two choices” (Broder et al. 91)

With one random choice per ball, the max load of a bin is ≈ log n / log log n. With two choices (each ball picks two random bins and goes to the less loaded one), the max load drops to O(log log n): much better load balancing.
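The two-choice effect is dramatic even at modest scale; a quick simulation sketch (my addition, not from the slides):

```python
import random

def max_load(n, choices, seed=0):
    """Throw n balls into n bins; each ball picks `choices` bins uniformly
    at random and goes into the least loaded of them.  Return the max load."""
    rng = random.Random(seed)
    load = [0] * n
    for _ in range(n):
        picks = [rng.randrange(n) for _ in range(choices)]
        best = min(picks, key=lambda b: load[b])
        load[best] += 1
    return max(load)

n = 100_000
one = max_load(n, 1)   # ~ log n / log log n
two = max_load(n, 2)   # ~ log log n: dramatically smaller
print("one choice:", one, " two choices:", two)
```

With n = 100,000 the one-choice max load is usually around 6–8, while two choices gives around 4.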

SLIDE 7

ESTIMATION

Question: suppose each person votes R or B. Can we predict the winner without counting all votes?

Want: winner in the sample = winner in the full population.

Answer: Sample m of the people, ask who they will vote for, and output the winner in the sample.

SLIDE 8

Things that ought to matter:

➤ sampling to be truly uniform
➤ everyone answering truthfully
➤ the number n of samples
➤ the margin: how close the true votes are
➤ confidence in our prediction

SLIDE 9

ANALYZING SAMPLING

Natural formalism:

➤ Choose n people uniformly at random.
➤ Let Xi (0/1) be the outcome of the i’th person.

Each person has a choice of 0 or 1. Let N be the size of the entire population and n the size of the sample. Write N0 for the number that vote 0, N1 for the number that vote 1, and m0, m1 for the number in the sample voting 0 and 1 respectively.

Predicted winner: 0 if m0 > n/2, 1 otherwise.

Pr[Xi = 0] = N0/N and Pr[Xi = 1] = N1/N
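The formalism above is easy to simulate; a sketch (my addition, not from the slides), assuming sampling with replacement so the Xi are i.i.d.:

```python
import random

# Simulated poll: population of N voters, N1 vote 1 and N0 = N - N1 vote 0.
# Sample n people uniformly with replacement and predict from the sample.
rng = random.Random(0)
N = 1_000_000
N1 = 510_000                 # true split: 49% vote 0, 51% vote 1
n = 10_000

sample = [1 if rng.randrange(N) < N1 else 0 for _ in range(n)]
m1 = sum(sample)
m0 = n - m1
prediction = 1 if m1 > n / 2 else 0
print(f"m0={m0}, m1={m1}, predicted winner: {prediction}")
```

The sample fraction m1/n lands close to the true 0.51, which is what the next slides quantify.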

SLIDE 10

m1 = X1 + X2 + … + Xn and m0 = n − m1

What is 𝔼[Xi]? Since Pr[Xi = 1] = N1/N, linearity of expectation gives

𝔼[m1] = n ⋅ N1/N and 𝔼[m0] = n ⋅ N0/N

We intend to argue that m0 ≈ 𝔼[m0] and m1 ≈ 𝔼[m1].

Estimation error = | fraction of votes i received in the sample − fraction of votes i received in the population |
SLIDE 11

We just argued from the corresponding expressions: if m0 ≈ 𝔼[m0] and m1 ≈ 𝔼[m1], then sample winner = true winner.

(1) What if N0 = 0.4N and N1 = 0.6N? Our prediction is right iff m0 < n/2. Claim: 𝔼[m0] = 0.4n, so our prediction is right whenever m0 − 𝔼[m0] < 0.1n.

(2) What if N0 = 0.49N and N1 = 0.51N? Our prediction is right whenever m0 − 𝔼[m0] < 0.01n.

Goal: if we take n samples, what is the probability that |m0 − 𝔼[m0]| ≥ 0.01n?

SLIDE 12

ANALYZING SAMPLING

➤ Error in estimation: | empirical mean − true expectation |
➤ “Confidence”

Natural formalism:

➤ Choose n people uniformly at random.
➤ Let Xi (0/1) be the outcome of the i’th person.

Ideal guarantee: | empirical mean − true expectation | < 0.001 w.p. 0.999

SLIDE 13

MARKOV?

We want Pr[|m0 − 𝔼[m0]| ≥ 0.01n] to be small. Markov’s inequality only gives

Pr[m0 ≥ 𝔼[m0] + 0.01n] ≤ 𝔼[m0] / (𝔼[m0] + 0.01n)

which is close to 1 (e.g. 0.49n / 0.50n = 0.98 in the close-election example): the failure probability bound is useless, no matter how large n is (even with N = 10^6).

SLIDE 14

CAN WE USE THE “NUMBER OF SAMPLES”?

Variance of a random variable X:

Var(X) = 𝔼[(X − μ)²], where μ = 𝔼[X]

Additivity: if X = X1 + X2 + … + Xn, then

Var(X) = Var(X1) + Var(X2) + … + Var(Xn), if the Xi are independent
SLIDE 15

CHEBYSHEV’S INEQUALITY

If a random variable has low variance, then Markov can be improved.

Theorem. Let X be a random variable whose variance is σ². Then for any t > 0,

Pr[|X − 𝔼[X]| ≥ t ⋅ σ] ≤ 1/t²

(Proof idea: apply Markov to the non-negative variable Y = (X − 𝔼X)².)
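An empirical look at Chebyshev’s bound (my addition, not from the slides), using the number of heads in 100 fair coin flips, so 𝔼[X] = 50 and σ = 5:

```python
import random

# Empirical check of Chebyshev's inequality on X = number of heads in
# 100 fair coin flips: E[X] = 50 and sigma = sqrt(100 * 1/4) = 5.
rng = random.Random(3)
trials = 20_000
samples = [sum(rng.randint(0, 1) for _ in range(100)) for _ in range(trials)]
sigma = 5.0
for t in (1.5, 2.0, 3.0):
    frac = sum(abs(x - 50) >= t * sigma for x in samples) / trials
    print(f"t={t}: Pr[|X - E[X]| >= t*sigma] ~ {frac:.4f}, bound 1/t^2 = {1/t**2:.4f}")
```

The observed tails sit well under 1/t², but far less dramatically than with Markov: Chebyshev is the right tool once the variance is known.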

SLIDE 16

Back to sampling: we wanted |m0 − 𝔼[m0]| ≤ 0.01n, i.e. we want to bound

Pr[|m0 − 𝔼[m0]| ≥ 0.01n]

Idea: compute the variance of m0 and apply Chebyshev.
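Carrying this idea through (a sketch of the calculation; the function name is mine, and it assumes m0 is a sum of n independent 0/1 samples, i.e. sampling with replacement):

```python
# Chebyshev bound for the poll: Var(m0) = n * p * (1 - p) <= n / 4, so
# Pr[|m0 - E[m0]| >= eps * n] <= (n/4) / (eps * n)^2 = 1 / (4 * eps^2 * n).
def chebyshev_failure_prob(n, deviation_frac=0.01):
    """Upper bound on Pr[|m0 - E[m0]| >= deviation_frac * n] via Chebyshev."""
    variance = n / 4                                # worst case p = 1/2
    return variance / (deviation_frac * n) ** 2     # = 2500 / n for 0.01

for n in (10_000, 100_000, 2_500_000):
    print(n, chebyshev_failure_prob(n))
```

For the 0.01n deviation the bound is 2500/n, so about 2.5 million samples suffice for failure probability 0.001 by this argument.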

SLIDE 17

VARIANCE OF AVERAGE

SLIDE 18

BOUND VIA CHEBYSHEV

SLIDE 19

WHAT IF WE TAKE HIGHER POWERS?

“Moment methods”

➤ Usually get improved bounds, e.g. via the fourth moment:

𝔼[(X − 𝔼X)⁴] ≤ …

SLIDE 20

CHERNOFF BOUND

SLIDE 21

INTERPRETING THE CHERNOFF BOUND

SLIDE 22

INTERPRETING THE CHERNOFF BOUND

Useful heuristic:

➤ Sums of independent random variables don’t deviate much more than a few standard deviations from their expectation
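The statement of the Chernoff bound is not in the extracted text; one standard form, Hoeffding’s inequality for a sum S of n independent 0/1 variables, is Pr[|S − 𝔼[S]| ≥ εn] ≤ 2·exp(−2ε²n). A quick numeric comparison against the Chebyshev bound 1/(4ε²n) at the same deviation shows why the exponential tail matters:

```python
import math

# Hoeffding form of the Chernoff bound for a sum S of n independent 0/1
# variables: Pr[|S - E[S]| >= eps*n] <= 2*exp(-2*eps^2*n).
# Chebyshev at the same deviation gives 1/(4*eps^2*n).
n, eps = 1000, 0.05
chernoff = 2 * math.exp(-2 * eps ** 2 * n)
chebyshev = 1 / (4 * eps ** 2 * n)
print(f"Chernoff: {chernoff:.4f}, Chebyshev: {chebyshev:.4f}")
```

Already at n = 1000 the Chernoff bound (about 0.013) beats Chebyshev (0.1), and the gap widens exponentially as n grows.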

SLIDE 23

MCDIARMID’S INEQUALITY

SLIDE 24

ESTIMATING THE SUM OF NUMBERS

SLIDE 25

ESTIMATING THE SUM OF NUMBERS