

SLIDE 1

CS 498ABD: Algorithms for Big Data

Probabilistic Counting and Morris Counter

Lecture 04

September 3, 2020

Chandra (UIUC) CS498ABD 1 Fall 2020 1 / 18
SLIDE 2

Streaming model

The input consists of m objects/items/tokens e_1, e_2, . . . , e_m that are seen one by one by the algorithm. The algorithm has "limited" memory, say for B tokens where B < m (often B ≪ m), and hence cannot store the entire input. We want to compute interesting functions over the input.
SLIDE 7

Counting problem

Simplest streaming question: how many events are in the stream?

Obvious solution: a counter that increments on seeing each new item. This requires ⌈log n⌉ = Θ(log n) bits to be able to count up to n events. (We will use n for the length of the stream in this lecture.)

Question: can we do better? Not deterministically. Yes, with randomization: "Counting large numbers of events in small registers" by Robert Morris (Bell Labs), Communications of the ACM (CACM), 1978.

SLIDE 10

Probabilistic Counting Algorithm

ProbabilisticCounting:
  X ← 0
  While (a new event arrives):
    Toss a biased coin that is heads with probability 1/2^X
    If (the coin turns up heads): X ← X + 1
  endWhile
  Output 2^X − 1 as the estimate for the length of the stream.

Intuition: X keeps track of log n in a probabilistic sense, and hence requires only O(log log n) bits.

Theorem. Let Y = 2^X. Then E[Y] − 1 = n, the number of events seen.
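The pseudocode above translates directly into a short simulation. A minimal Python sketch (the class name and interface are illustrative, not from the lecture) that stores only X:

```python
import random

class MorrisCounter:
    """Probabilistic counter: stores X, which tracks log2 of the count,
    so the state needs only O(log log n) bits."""

    def __init__(self, rng=None):
        self.x = 0
        self.rng = rng if rng is not None else random.Random()

    def increment(self):
        # Toss a coin that is heads with probability 1/2^X; on heads, X += 1.
        if self.rng.random() < 1.0 / (1 << self.x):
            self.x += 1

    def estimate(self):
        # Output 2^X - 1 as the estimate of the stream length.
        return (1 << self.x) - 1
```

Averaging many independent runs should give a value near the true count n, per the theorem on this slide.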
SLIDE 11

log n vs log log n

Morris's motivation: he had 8-bit registers, which can count only up to 2^8 = 256 events with a deterministic counter. He had many counters keeping track of different events, and using 16 bits (2 registers) per counter was infeasible. With only log log n bits one can count up to 2^(2^8) = 2^256 events! In practice there is overhead due to error control etc.; Morris reports counting up to 130,000 events using 8 bits while controlling the error. See the 2-page paper for more details.

SLIDE 13

Analysis of Expectation

Induction on n. For i ≥ 0, let X_i be the counter value after i events, and let Y_i = 2^{X_i}. Both are random variables.

Base cases: n = 0, 1 are easy to check: X_0 = 0 and X_1 = 1 deterministically, so Y_0 − 1 = 0 and Y_1 − 1 = 1.


SLIDE 14

[Handwritten derivation of E[Y_n]; the cleaned-up calculation appears on Slide 15.]

SLIDE 15

Analysis of Expectation

E[Y_n] = E[2^{X_n}]
  = Σ_{j≥0} 2^j · Pr[X_n = j]
  = Σ_{j≥0} 2^j · ( Pr[X_{n−1} = j] · (1 − 1/2^j) + Pr[X_{n−1} = j−1] · 1/2^{j−1} )
  = Σ_{j≥0} 2^j · Pr[X_{n−1} = j] + Σ_{j≥0} ( 2 · Pr[X_{n−1} = j−1] − Pr[X_{n−1} = j] )
  = E[Y_{n−1}] + 1     (the second sum is 2 − 1 = 1; then apply induction)
  = n + 1
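The recurrence Pr[X_n = j] = Pr[X_{n−1} = j](1 − 1/2^j) + Pr[X_{n−1} = j−1]/2^{j−1} used above can be checked mechanically. A sketch in Python using exact rational arithmetic (function names are illustrative):

```python
from fractions import Fraction

def morris_distribution(n):
    """Exact distribution of X_n, computed from the update rule:
    from state j the counter moves to j+1 with probability 1/2^j."""
    dist = {0: Fraction(1)}  # X_0 = 0 deterministically
    for _ in range(n):
        new = {}
        for j, p in dist.items():
            up = p * Fraction(1, 2 ** j)            # heads: X goes to j+1
            new[j] = new.get(j, Fraction(0)) + p - up
            new[j + 1] = new.get(j + 1, Fraction(0)) + up
        dist = new
    return dist

def expected_Y(n):
    """E[Y_n] = E[2^{X_n}], which the induction shows equals n + 1."""
    return sum(Fraction(2 ** j) * p for j, p in morris_distribution(n).items())
```

Since the arithmetic is exact, the identity E[Y_n] = n + 1 holds with no floating-point slack.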
SLIDE 17

Jensen's Inequality

Definition. A real-valued function f : R → R is convex if f((a + b)/2) ≤ (f(a) + f(b))/2 for all a, b. Equivalently, f(λa + (1 − λ)b) ≤ λf(a) + (1 − λ)f(b) for all λ ∈ [0, 1].

Theorem (Jensen's inequality). Let Z be a random variable with E[Z] < ∞. If f is convex, then f(E[Z]) ≤ E[f(Z)].

SLIDE 18

Implication for counter size

We have Y_n = 2^{X_n}. The function f(z) = 2^z is convex. Hence 2^{E[X_n]} ≤ E[Y_n] = n + 1, which implies E[X_n] ≤ log(n + 1). Hence the expected number of bits in the counter is ⌈log log(n + 1)⌉.
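As a concrete sanity check of the log n vs log log n gap, the two counter sizes can be computed side by side (a sketch; the helper function is mine, not from the slides):

```python
import math

def counter_bits(n):
    """Bits for a deterministic counter, ceil(log2(n+1)), vs. the expected
    bits for the Morris counter's X, ceil(log2(log2(n+1)))."""
    direct = math.ceil(math.log2(n + 1))
    morris = math.ceil(math.log2(math.log2(n + 1)))
    return direct, morris
```

For n = 2^256 − 1 this gives 256 bits versus 8 bits, matching the motivation on the earlier slide.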


SLIDE 20

Variance calculation

Question: Is the random variable Y_n well behaved even though its expectation is right? What is its variance? Is it concentrated around its expectation?

Lemma. E[Y_n²] = (3/2)n² + (3/2)n + 1, and hence Var[Y_n] = n(n − 1)/2.

SLIDE 21

Variance analysis

Analyze E[Y_n²] via induction. Base cases: n = 0, 1 are easy to verify since Y_n is deterministic.

E[Y_n²] = E[2^{2X_n}]
  = Σ_{j≥0} 2^{2j} · Pr[X_n = j]
  = Σ_{j≥0} 2^{2j} · ( Pr[X_{n−1} = j] · (1 − 1/2^j) + Pr[X_{n−1} = j−1] · 1/2^{j−1} )
  = Σ_{j≥0} 2^{2j} · Pr[X_{n−1} = j] + Σ_{j≥0} ( −2^j · Pr[X_{n−1} = j] + 4 · 2^{j−1} · Pr[X_{n−1} = j−1] )
  = E[Y²_{n−1}] − E[Y_{n−1}] + 4 E[Y_{n−1}] = E[Y²_{n−1}] + 3 E[Y_{n−1}]
  = (3/2)(n − 1)² + (3/2)(n − 1) + 1 + 3n
  = (3/2)n² + (3/2)n + 1.
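The same distribution recurrence for Pr[X_n = j] gives the second moment exactly, so both the lemma and the variance formula can be verified numerically. A self-contained Python sketch (function name is illustrative):

```python
from fractions import Fraction

def morris_moments(n):
    """Exact E[Y_n] and E[Y_n^2] for Y_n = 2^{X_n}, via the update rule
    'from state j, move to j+1 with probability 1/2^j'."""
    dist = {0: Fraction(1)}  # X_0 = 0 deterministically
    for _ in range(n):
        new = {}
        for j, p in dist.items():
            up = p * Fraction(1, 2 ** j)
            new[j] = new.get(j, Fraction(0)) + p - up
            new[j + 1] = new.get(j + 1, Fraction(0)) + up
        dist = new
    ey = sum(Fraction(2 ** j) * p for j, p in dist.items())   # E[2^X]
    ey2 = sum(Fraction(4 ** j) * p for j, p in dist.items())  # E[2^{2X}]
    return ey, ey2
```

With exact rationals, E[Y_n²] = (3/2)n² + (3/2)n + 1 and Var[Y_n] = E[Y_n²] − E[Y_n]² = n(n − 1)/2 hold with no rounding.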


SLIDE 24

Error analysis via Chebyshev inequality

We have E[Y_n] − 1 = n and Var(Y_n) = n(n − 1)/2, which implies σ_{Y_n} = √(n(n − 1)/2) ≤ n. Applying Chebyshev's inequality: Pr[|Y_n − E[Y_n]| ≥ tn] ≤ 1/(2t²). Hence we get a constant-factor approximation with constant probability (for instance, t = 2 gives failure probability at most 1/8).

Question: Want the estimate to be tighter. For any given ε > 0, want the estimate to have error at most εn, with (say) constant probability, or with probability at least (1 − δ) for a given δ > 0.

SLIDE 25

Part I: Improving Estimators

SLIDE 27

Probabilistic Estimation

Setting: want to compute some real-valued function f of a given input I.

Probabilistic estimator: a randomized algorithm that, given I, outputs a random answer X such that E[X] ≈ f(I). The estimator is exact if E[X] = f(I) for all inputs I.

Additive approximation: |E[X] − f(I)| ≤ ε
Multiplicative approximation: (1 − ε)f(I) ≤ E[X] ≤ (1 + ε)f(I)

Question: The estimator only controls the expectation. A bound on Var[X] allows Chebyshev; sometimes Chernoff applies. How do we improve an estimator?
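The two approximation notions can be written as simple predicates (a trivial sketch; the function names are mine, not from the slides):

```python
def is_additive_approx(estimate_mean, true_value, eps):
    """Additive approximation: |E[X] - f(I)| <= eps."""
    return abs(estimate_mean - true_value) <= eps

def is_multiplicative_approx(estimate_mean, true_value, eps):
    """Multiplicative approximation, assuming f(I) >= 0:
    (1 - eps) f(I) <= E[X] <= (1 + eps) f(I)."""
    return (1 - eps) * true_value <= estimate_mean <= (1 + eps) * true_value
```

Note the two notions are incomparable in general: an additive guarantee with eps = 1 is weak when f(I) is small but strong when f(I) is huge, and vice versa for the multiplicative one.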

SLIDE 31

Variance reduction via averaging

Run h parallel copies of the algorithm with independent randomness. Let Y^(1), Y^(2), . . . , Y^(h) be the estimators from the h parallel copies. Output Z = (1/h) · Σ_{i=1}^{h} Y^(i).

Claim: E[Z_n] = n and Var(Z_n) = n(n − 1)/(2h).

Choose h = 2/ε². Then applying Chebyshev's inequality, Pr[|Z_n − E[Z_n]| ≥ εn] ≤ 1/4.

To run h copies we need O((1/ε²) · log log n) bits for the counters.
SLIDE 36

Error reduction via median trick

We have: Pr[|Z_n − E[Z_n]| ≥ εn] ≤ 1/4.
Want: Pr[|Z_n − E[Z_n]| ≥ εn] ≤ δ for some given parameter δ.

Can set h = 1/(2δε²) and apply Chebyshev. Better dependence on δ?

Idea: Repeat independently ℓ = c log(1/δ) times for some constant c. We know that with probability (1 − δ), one of the counters will be εn-close to n. Why? Which one should we pick?

Algorithm: Output the median of Z^(1), Z^(2), . . . , Z^(ℓ).
SLIDE 40

Error reduction via median trick

Let Z′ be the median of the ℓ = c log(1/δ) independent estimators.

Lemma. Pr[|Z′ − n| ≥ εn] ≤ δ.

Let A_i be the event that estimate Z^(i) is bad, that is, |Z^(i) − n| > εn. Then Pr[A_i] ≤ 1/4, and hence the expected number of bad estimates is at most ℓ/4. For the median estimate to be bad, more than half of the A_i's have to occur. Using Chernoff bounds: the probability of a bad median is at most 2^{−c′ℓ} for some constant c′, which is at most δ for a suitable choice of c.
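Putting averaging and the median trick together gives the full scheme. A Python sketch (the constant c = 2 and all names are illustrative choices, not from the lecture):

```python
import math
import random

def morris_estimate(n, rng):
    """One run of ProbabilisticCounting over a stream of n events."""
    x = 0
    for _ in range(n):
        if rng.random() < 1.0 / (1 << x):
            x += 1
    return (1 << x) - 1

def median_of_means(n, eps, delta, rng, c=2):
    """Median of l = ceil(c*log(1/delta)) averaged estimators; each average
    uses h = ceil(2/eps^2) copies, so each is eps*n-accurate with
    probability >= 3/4, and the median is eps*n-accurate with
    probability >= 1 - delta (via Chernoff)."""
    h = math.ceil(2 / eps ** 2)
    l = math.ceil(c * math.log(1 / delta))
    averages = sorted(
        sum(morris_estimate(n, rng) for _ in range(h)) / h for _ in range(l)
    )
    return averages[len(averages) // 2]
```

The median is used instead of a grand average because a single wildly-off copy can ruin an average but cannot move the median past the good estimates.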

SLIDE 41

Summarizing

Using variance reduction and the median trick: with O((1/ε²) · log(1/δ) · log log n) bits one can maintain a (1 ± ε)-factor estimate of the number of events with probability (1 − δ). This is a generic scheme that we will use repeatedly. For the counter one can do (much) better by changing the algorithm and with a better analysis; see the homework and the references in the notes.
