CS 498ABD: Algorithms for Big Data
Probabilistic Counting and Morris Counter
Lecture 04
September 3, 2020
Streaming model
The input consists of $m$ objects/items/tokens $e_1, e_2, \ldots, e_m$ that are seen one by one by the algorithm. The algorithm has "limited" memory, say for $B$ tokens where $B < m$ (often $B \ll m$), and hence cannot store all of the input. We want to compute interesting functions over the input.
Counting problem
Simplest streaming question: how many events are in the stream?

Obvious: a counter that increments on seeing each new item. This requires $\lceil \log n \rceil = \Theta(\log n)$ bits to be able to count up to $n$ events. (We will use $n$ for the length of the stream in this lecture.)

Question: can we do better? Not deterministically. Yes, with randomization. See "Counting large numbers of events in small registers" by Robert Morris (Bell Labs), Communications of the ACM (CACM), 1978.
Probabilistic Counting Algorithm
ProbabilisticCounting:
    X ← 0
    While (a new event arrives)
        Toss a biased coin that is heads with probability 1/2^X
        If (coin turns up heads)
            X ← X + 1
    endWhile
    Output 2^X − 1 as the estimate for the length of the stream.

Intuition: X keeps track of $\log n$ in a probabilistic sense, and hence requires only $O(\log \log n)$ bits.

Theorem: Let $Y = 2^X$ at the end of the stream. Then $E[Y] - 1 = n$, the number of events seen.
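The pseudocode translates directly into a few lines of Python. Below is a minimal sketch (the class and method names are my own, not from the lecture): the only state kept is the small register X, and each event costs one biased coin flip.

    import random

    class MorrisCounter:
        """Sketch of Morris's probabilistic counter: stores X ~ log n, not n."""

        def __init__(self):
            self.x = 0  # the only state; O(log log n) bits in principle

        def increment(self):
            # On each event, increment X with probability 1/2^X.
            if random.random() < 2.0 ** -self.x:
                self.x += 1

        def estimate(self):
            # 2^X - 1 is an unbiased estimate of the number of events seen.
            return 2 ** self.x - 1

For example, after feeding it $n = 100{,}000$ events via increment(), estimate() returns a value whose expectation is exactly $n$, though (as the variance calculation below shows) any single run can be far off.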
log n vs log log n
Morris's motivation: he had 8-bit registers, and a deterministic counter can count only up to $2^8 = 256$ events. He had many counters keeping track of different events, and using 16 bits (2 registers) per counter was infeasible. With $\log \log n$ bits one can count up to $2^{2^8} = 2^{256}$ events! In practice there is overhead due to error control etc.; Morris reports counting up to 130,000 events using 8 bits while controlling the error. See the 2-page paper for more details.
Analysis of Expectation
The proof is by induction on $n$. For $i \ge 0$, let $X_i$ be the counter value after $i$ events, and let $Y_i = 2^{X_i}$. Both are random variables.

Base cases: $n = 0, 1$ are easy to check: $X_n$, and hence $Y_n - 1$, is deterministic, equal to $0$ and $1$ respectively (the first coin is heads with probability $1/2^0 = 1$).
Chandra (UIUC) CS498ABD 6 Fall 2020 6 / 18X -0
4=20--1
X= I
4=21--2
E[Yi]
tic n
and pure pre
E-[Yn]=E[2Xn]=jEFzPe[Xn=j3
=¥od( Paan .
4-⇒
+ Paley.is:B
zit) .
= E.in#iTEeoeafxnni&?...s,in
= ELXn-D-ifogkthfxn.io:B
= (n -1+1 , + *Ithaca
,D
f.odhlxn.es]!
Analysis of Expectation
\begin{align*}
E[Y_n] = E\bigl[2^{X_n}\bigr]
&= \sum_{j \ge 0} 2^j \,\Pr[X_n = j] \\
&= \sum_{j \ge 0} 2^j \left( \Pr[X_{n-1} = j]\Bigl(1 - \frac{1}{2^j}\Bigr) + \Pr[X_{n-1} = j-1] \cdot \frac{1}{2^{j-1}} \right) \\
&= \sum_{j \ge 0} 2^j \,\Pr[X_{n-1} = j] + \sum_{j \ge 0} \bigl( 2\Pr[X_{n-1} = j-1] - \Pr[X_{n-1} = j] \bigr) \\
&= E[Y_{n-1}] + 2 - 1 \qquad \text{(each probability sum totals 1)} \\
&= E[Y_{n-1}] + 1 = n + 1 \qquad \text{(by applying induction).}
\end{align*}
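As a quick sanity check of the identity $E[Y_n] = n + 1$, one can average $2^{X_n}$ over many independent runs. A small simulation sketch (the function name and the parameters are illustrative):

    import random

    def morris_x(n):
        """Counter value X_n after processing n events."""
        x = 0
        for _ in range(n):
            if random.random() < 2.0 ** -x:  # heads with probability 1/2^x
                x += 1
        return x

    n, trials = 500, 10_000
    avg = sum(2 ** morris_x(n) for _ in range(trials)) / trials
    print(avg)  # concentrates around n + 1 = 501 as trials grows

Since $\mathrm{Var}[Y_n] \approx n^2/2$ (computed below), the empirical average over 10,000 trials still fluctuates by a few units around 501.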
Jensen's Inequality
Definition: A real-valued function $f : \mathbb{R} \to \mathbb{R}$ is convex if $f((a+b)/2) \le (f(a)+f(b))/2$ for all $a, b$. Equivalently, $f(\lambda a + (1-\lambda) b) \le \lambda f(a) + (1-\lambda) f(b)$ for all $\lambda \in [0, 1]$.

Theorem (Jensen's inequality): Let $Z$ be a random variable with $E[Z] < \infty$. If $f$ is convex then $f(E[Z]) \le E[f(Z)]$.
Implication for counter size
We have $Y_n = 2^{X_n}$. The function $f(z) = 2^z$ is convex. Hence
$$2^{E[X_n]} \le E[Y_n] = n + 1,$$
which implies $E[X_n] \le \log(n + 1)$. Hence the expected number of bits in the counter is $\lceil \log \log(n+1) \rceil$.
Variance calculation
Question: Is the random variable $Y_n$ well behaved even though its expectation is right? What is its variance? Is it concentrated around its expectation?

Lemma: $E[Y_n^2] = \frac{3}{2}n^2 + \frac{3}{2}n + 1$, and hence $\mathrm{Var}[Y_n] = n(n-1)/2$.
Variance analysis
Analyze $E[Y_n^2]$ via induction. Base cases: $n = 0, 1$ are easy to verify since $Y_n$ is deterministic.

\begin{align*}
E[Y_n^2] = E\bigl[2^{2X_n}\bigr]
&= \sum_{j \ge 0} 2^{2j} \cdot \Pr[X_n = j] \\
&= \sum_{j \ge 0} 2^{2j} \cdot \left( \Pr[X_{n-1} = j]\Bigl(1 - \frac{1}{2^j}\Bigr) + \Pr[X_{n-1} = j-1] \cdot \frac{1}{2^{j-1}} \right) \\
&= \sum_{j \ge 0} 2^{2j} \cdot \Pr[X_{n-1} = j] + \sum_{j \ge 0} \Bigl( -2^j \Pr[X_{n-1} = j] + 4 \cdot 2^{j-1} \Pr[X_{n-1} = j-1] \Bigr) \\
&= E[Y_{n-1}^2] + 3E[Y_{n-1}]
= \frac{3}{2}(n-1)^2 + \frac{3}{2}(n-1) + 1 + 3n
= \frac{3}{2}n^2 + \frac{3}{2}n + 1,
\end{align*}

using induction for $E[Y_{n-1}^2]$ and $E[Y_{n-1}] = n$ in the last line.
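The same recurrence on $\Pr[X_n = j]$ that drives both induction proofs can be run numerically, which verifies the two closed forms exactly. A sketch using exact rational arithmetic (the function name is my own):

    from fractions import Fraction

    def x_distribution(n):
        """dist[j] = Pr[X_n = j], via the recurrence
        Pr[X_n = j] = Pr[X_{n-1} = j](1 - 2^-j) + Pr[X_{n-1} = j-1] 2^-(j-1)."""
        dist = {0: Fraction(1)}
        for _ in range(n):
            nxt = {}
            for j, p in dist.items():
                heads = Fraction(1, 2 ** j)
                nxt[j] = nxt.get(j, Fraction(0)) + p * (1 - heads)
                nxt[j + 1] = nxt.get(j + 1, Fraction(0)) + p * heads
            dist = nxt
        return dist

    for n in range(13):
        dist = x_distribution(n)
        ey = sum(2 ** j * p for j, p in dist.items())
        ey2 = sum(4 ** j * p for j, p in dist.items())
        assert ey == n + 1                                # E[Y_n] = n + 1
        assert ey2 - ey ** 2 == Fraction(n * (n - 1), 2)  # Var[Y_n] = n(n-1)/2
    print("E[Y_n] and Var[Y_n] match the closed forms for n <= 12")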
Error analysis via Chebyshev inequality
We have $E[Y_n] = n$ and $\mathrm{Var}(Y_n) = n(n-1)/2$, which implies $\sigma_{Y_n} = \sqrt{n(n-1)/2} \le n$. Applying Chebyshev's inequality:
$$\Pr\bigl[\,|Y_n - E[Y_n]| \ge tn\,\bigr] \le \frac{1}{2t^2}.$$
Hence we get a constant-factor approximation with constant probability (for instance, $t = 2$ gives error at most $2n$ with probability at least $7/8$).

Question: We want the estimate to be tighter. For any given $\epsilon > 0$ we want the estimate to have error at most $\epsilon n$, with say constant probability, or with probability at least $(1 - \delta)$ for a given $\delta > 0$.
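To see how loose or tight Chebyshev is here, one can measure the tail probability empirically. A small sketch (parameters are illustrative):

    import random

    def morris_estimate(n):
        """Run the counter on n events and return the estimate 2^X - 1."""
        x = 0
        for _ in range(n):
            if random.random() < 2.0 ** -x:
                x += 1
        return 2 ** x - 1

    n, t, trials = 1000, 2.0, 2000
    bad = sum(abs(morris_estimate(n) - n) >= t * n for _ in range(trials))
    print(f"empirical tail: {bad / trials:.3f}  "
          f"Chebyshev bound: {1 / (2 * t * t):.3f}")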
Part I: Improving Estimators
Probabilistic Estimation
Setting: we want to compute some real-valued function $f$ of a given input $I$.

A probabilistic estimator is a randomized algorithm that, given $I$, outputs a random variable $X$. The estimator is:
- exact if $E[X] = f(I)$ for all inputs $I$
- an additive approximation if $|E[X] - f(I)| \le \epsilon$
- a multiplicative approximation if $(1 - \epsilon) f(I) \le E[X] \le (1 + \epsilon) f(I)$

Question: The estimator only controls the expectation. A bound on $\mathrm{Var}[X]$ allows Chebyshev; sometimes Chernoff applies. How do we improve the estimator?
Variance reduction via averaging
Run $h$ parallel copies of the algorithm with independent randomness. Let $Y^{(1)}, Y^{(2)}, \ldots, Y^{(h)}$ be the estimators from the $h$ parallel copies. Output
$$Z = \frac{1}{h} \sum_{i=1}^{h} Y^{(i)}.$$

Claim: $E[Z_n] = n$ and $\mathrm{Var}(Z_n) = \frac{1}{h} \cdot \frac{n(n-1)}{2}$.

Choose $h = 2/\epsilon^2$. Then applying Chebyshev's inequality,
$$\Pr\bigl[\,|Z_n - E[Z_n]| \ge \epsilon n\,\bigr] \le 1/4.$$

To run $h$ copies we need $O(\frac{1}{\epsilon^2} \log \log n)$ bits for the counters.
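A sketch of the averaged estimator in Python (helper names are my own; here the $h$ copies are simulated sequentially on a synthetic stream of $n$ events, whereas in the streaming setting they would be maintained in parallel as events arrive):

    import random

    def morris_estimate(n):
        """One independent copy of the counter on n events."""
        x = 0
        for _ in range(n):
            if random.random() < 2.0 ** -x:
                x += 1
        return 2 ** x - 1

    def averaged_estimate(n, eps):
        h = int(2 / eps ** 2)  # h = 2/eps^2 copies, as in the claim
        return sum(morris_estimate(n) for _ in range(h)) / h

    print(averaged_estimate(1000, 0.5))  # within 500 of 1000 w.p. >= 3/4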
Error reduction via median trick
We have: $\Pr[\,|Z_n - E[Z_n]| \ge \epsilon n\,] \le 1/4$.
Want: $\Pr[\,|Z_n - E[Z_n]| \ge \epsilon n\,] \le \delta$ for some given parameter $\delta$.

Can set $h = \frac{1}{2\delta\epsilon^2}$ and apply Chebyshev. Better dependence on $\delta$?

Idea: Repeat independently $\ell = c \log(1/\delta)$ times for some constant $c$. We know that with probability $(1 - \delta)$ one of the counters will be $\epsilon n$-close to $n$. Why? Which one should we pick?

Algorithm: Output the median of $Z^{(1)}, Z^{(2)}, \ldots, Z^{(\ell)}$.
Error reduction via median trick
Let $Z'$ be the median of the $\ell = c \log(1/\delta)$ independent estimators.

Lemma: $\Pr[\,|Z' - n| \ge \epsilon n\,] \le \delta$.

Let $A_i$ be the event that estimate $Z^{(i)}$ is bad, that is, $|Z^{(i)} - n| > \epsilon n$. Then $\Pr[A_i] < 1/4$, and hence the expected number of bad estimates is $\ell/4$. For the median estimate to be bad, more than half of the $A_i$'s have to occur, i.e., the number of bad estimates must exceed twice its expectation. Using Chernoff bounds: the probability of a bad median is at most $2^{-c'\ell}$ for some constant $c' > 0$, which is at most $\delta$ for a suitable choice of $c$.
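Putting the averaging and median steps together gives the standard median-of-means estimator. A self-contained sketch (the constant c = 8 and the helper names are my own choices, not fixed by the lecture):

    import math
    import random
    from statistics import median

    def morris_estimate(n):
        """One independent copy of the counter on n events."""
        x = 0
        for _ in range(n):
            if random.random() < 2.0 ** -x:
                x += 1
        return 2 ** x - 1

    def median_of_means(n, eps, delta, c=8):
        h = int(2 / eps ** 2)                          # copies per group
        ell = max(1, round(c * math.log(1 / delta)))   # ell = c log(1/delta)
        means = [sum(morris_estimate(n) for _ in range(h)) / h
                 for _ in range(ell)]
        return median(means)

    # Should land within eps*n of n with probability >= 1 - delta.
    print(median_of_means(1000, 0.5, 0.01))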
Summarizing
Using variance reduction and the median trick: with $O(\frac{1}{\epsilon^2} \log(1/\delta) \log \log n)$ bits one can maintain a $(1 \pm \epsilon)$-factor estimate of the number of events with probability $(1 - \delta)$.

This is a generic scheme that we will use repeatedly.

For the counter one can do (much) better by changing the algorithm and via a better analysis. See homework and references in the notes.