

SLIDE 1

CS 498ABD: Algorithms for Big Data

Probabilistic Counting and Morris Counter

Lecture 04

September 3, 2020

Chandra (UIUC) CS498ABD 1 Fall 2020 1 / 18
SLIDE 2

Streaming model

The input consists of m objects/items/tokens e_1, e_2, . . . , e_m that are seen one by one by the algorithm. The algorithm has "limited" memory, say for B tokens where B < m (often B ≪ m), and hence cannot store the entire input. We want to compute interesting functions over the input.
SLIDE 7

Counting problem

Simplest streaming question: how many events are in the stream?

Obvious solution: a counter that increments on seeing each new item. This requires ⌈log n⌉ = Θ(log n) bits to be able to count up to n events. (We will use n for the length of the stream in this lecture.)

Question: can we do better? Not deterministically. Yes, with randomization: "Counting large numbers of events in small registers" by Robert Morris (Bell Labs), Communications of the ACM (CACM), 1978.

SLIDE 10

Probabilistic Counting Algorithm

ProbabilisticCounting:
  X ← 0
  While (a new event arrives):
    Toss a biased coin that is heads with probability 1/2^X
    If (the coin turns up heads): X ← X + 1
  endWhile
  Output 2^X − 1 as the estimate for the length of the stream.

Intuition: X keeps track of log n in a probabilistic sense, and hence requires only O(log log n) bits.

Theorem. Let Y = 2^X. Then E[Y] − 1 = n, the number of events seen.
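The pseudocode above translates directly into a short simulation. A minimal Python sketch (the class name and interface are illustrative, not from the lecture) that stores only X:

```python
import random

class MorrisCounter:
    """Probabilistic counter: stores X, which tracks log2 of the count,
    so the state needs only O(log log n) bits."""

    def __init__(self, rng=None):
        self.x = 0
        self.rng = rng if rng is not None else random.Random()

    def increment(self):
        # Toss a coin that is heads with probability 1/2^X; on heads, X += 1.
        if self.rng.random() < 1.0 / (1 << self.x):
            self.x += 1

    def estimate(self):
        # Output 2^X - 1 as the estimate of the stream length.
        return (1 << self.x) - 1
```

Averaging many independent runs should give a value near the true count n, per the theorem on this slide.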
SLIDE 11

log n vs log log n

Morris's motivation: he had 8-bit registers, which can count only up to 2^8 = 256 events with a deterministic counter. He had many counters keeping track of different events, and using 16 bits (2 registers) per counter was infeasible. With only log log n bits one can count up to 2^(2^8) = 2^256 events! In practice there is overhead due to error control etc.; Morris reports counting up to 130,000 events using 8 bits while controlling the error. See the 2-page paper for more details.

SLIDE 13

Analysis of Expectation

Induction on n. For i ≥ 0, let X_i be the counter value after i events, and let Y_i = 2^{X_i}. Both are random variables.

Base cases: n = 0, 1 are easy to check: X_0 = 0 and X_1 = 1 deterministically, so Y_0 − 1 = 0 and Y_1 − 1 = 1.


SLIDE 14

[Handwritten derivation of E[Y_n]; the cleaned-up calculation appears on Slide 15.]

SLIDE 15

Analysis of Expectation

E[Y_n] = E[2^{X_n}]
  = Σ_{j≥0} 2^j · Pr[X_n = j]
  = Σ_{j≥0} 2^j · ( Pr[X_{n−1} = j] · (1 − 1/2^j) + Pr[X_{n−1} = j−1] · 1/2^{j−1} )
  = Σ_{j≥0} 2^j · Pr[X_{n−1} = j] + Σ_{j≥0} ( 2 · Pr[X_{n−1} = j−1] − Pr[X_{n−1} = j] )
  = E[Y_{n−1}] + 1     (the second sum is 2 − 1 = 1; then apply induction)
  = n + 1
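The recurrence Pr[X_n = j] = Pr[X_{n−1} = j](1 − 1/2^j) + Pr[X_{n−1} = j−1]/2^{j−1} used above can be checked mechanically. A sketch in Python using exact rational arithmetic (function names are illustrative):

```python
from fractions import Fraction

def morris_distribution(n):
    """Exact distribution of X_n, computed from the update rule:
    from state j the counter moves to j+1 with probability 1/2^j."""
    dist = {0: Fraction(1)}  # X_0 = 0 deterministically
    for _ in range(n):
        new = {}
        for j, p in dist.items():
            up = p * Fraction(1, 2 ** j)            # heads: X goes to j+1
            new[j] = new.get(j, Fraction(0)) + p - up
            new[j + 1] = new.get(j + 1, Fraction(0)) + up
        dist = new
    return dist

def expected_Y(n):
    """E[Y_n] = E[2^{X_n}], which the induction shows equals n + 1."""
    return sum(Fraction(2 ** j) * p for j, p in morris_distribution(n).items())
```

Since the arithmetic is exact, the identity E[Y_n] = n + 1 holds with no floating-point slack.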
SLIDE 17

Jensen's Inequality

Definition. A real-valued function f : R → R is convex if f((a + b)/2) ≤ (f(a) + f(b))/2 for all a, b. Equivalently, f(λa + (1 − λ)b) ≤ λf(a) + (1 − λ)f(b) for all λ ∈ [0, 1].

Theorem (Jensen's inequality). Let Z be a random variable with E[Z] < ∞. If f is convex, then f(E[Z]) ≤ E[f(Z)].

SLIDE 18

Implication for counter size

We have Y_n = 2^{X_n}. The function f(z) = 2^z is convex. Hence 2^{E[X_n]} ≤ E[Y_n] = n + 1, which implies E[X_n] ≤ log(n + 1). Hence the expected number of bits in the counter is ⌈log log(n + 1)⌉.
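As a concrete sanity check of the log n vs log log n gap, the two counter sizes can be computed side by side (a sketch; the helper function is mine, not from the slides):

```python
import math

def counter_bits(n):
    """Bits for a deterministic counter, ceil(log2(n+1)), vs. the expected
    bits for the Morris counter's X, ceil(log2(log2(n+1)))."""
    direct = math.ceil(math.log2(n + 1))
    morris = math.ceil(math.log2(math.log2(n + 1)))
    return direct, morris
```

For n = 2^256 − 1 this gives 256 bits versus 8 bits, matching the motivation on the earlier slide.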


SLIDE 20

Variance calculation

Question: Is the random variable Y_n well behaved even though its expectation is right? What is its variance? Is it concentrated around its expectation?

Lemma. E[Y_n²] = (3/2)n² + (3/2)n + 1, and hence Var[Y_n] = n(n − 1)/2.

SLIDE 21

Variance analysis

Analyze E[Y_n²] via induction. Base cases: n = 0, 1 are easy to verify since Y_n is deterministic.

E[Y_n²] = E[2^{2X_n}]
  = Σ_{j≥0} 2^{2j} · Pr[X_n = j]
  = Σ_{j≥0} 2^{2j} · ( Pr[X_{n−1} = j] · (1 − 1/2^j) + Pr[X_{n−1} = j−1] · 1/2^{j−1} )
  = Σ_{j≥0} 2^{2j} · Pr[X_{n−1} = j] + Σ_{j≥0} ( −2^j · Pr[X_{n−1} = j] + 4 · 2^{j−1} · Pr[X_{n−1} = j−1] )
  = E[Y²_{n−1}] − E[Y_{n−1}] + 4 E[Y_{n−1}] = E[Y²_{n−1}] + 3 E[Y_{n−1}]
  = (3/2)(n − 1)² + (3/2)(n − 1) + 1 + 3n
  = (3/2)n² + (3/2)n + 1.
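The same distribution recurrence for Pr[X_n = j] gives the second moment exactly, so both the lemma and the variance formula can be verified numerically. A self-contained Python sketch (function name is illustrative):

```python
from fractions import Fraction

def morris_moments(n):
    """Exact E[Y_n] and E[Y_n^2] for Y_n = 2^{X_n}, via the update rule
    'from state j, move to j+1 with probability 1/2^j'."""
    dist = {0: Fraction(1)}  # X_0 = 0 deterministically
    for _ in range(n):
        new = {}
        for j, p in dist.items():
            up = p * Fraction(1, 2 ** j)
            new[j] = new.get(j, Fraction(0)) + p - up
            new[j + 1] = new.get(j + 1, Fraction(0)) + up
        dist = new
    ey = sum(Fraction(2 ** j) * p for j, p in dist.items())   # E[2^X]
    ey2 = sum(Fraction(4 ** j) * p for j, p in dist.items())  # E[2^{2X}]
    return ey, ey2
```

With exact rationals, E[Y_n²] = (3/2)n² + (3/2)n + 1 and Var[Y_n] = E[Y_n²] − E[Y_n]² = n(n − 1)/2 hold with no rounding.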


SLIDE 24

Error analysis via Chebyshev inequality

We have E[Y_n] − 1 = n and Var(Y_n) = n(n − 1)/2, which implies σ_{Y_n} = √(n(n − 1)/2) ≤ n. Applying Chebyshev's inequality: Pr[|Y_n − E[Y_n]| ≥ tn] ≤ 1/(2t²). Hence we get a constant-factor approximation with constant probability (for instance, t = 2 gives failure probability at most 1/8).

Question: Want the estimate to be tighter. For any given ε > 0, want the estimate to have error at most εn, with (say) constant probability, or with probability at least (1 − δ) for a given δ > 0.

SLIDE 25

Part I: Improving Estimators

SLIDE 27

Probabilistic Estimation

Setting: want to compute some real-valued function f of a given input I.

Probabilistic estimator: a randomized algorithm that, given I, outputs a random answer X such that E[X] ≈ f(I). The estimator is exact if E[X] = f(I) for all inputs I.

Additive approximation: |E[X] − f(I)| ≤ ε
Multiplicative approximation: (1 − ε)f(I) ≤ E[X] ≤ (1 + ε)f(I)

Question: The estimator only controls the expectation. A bound on Var[X] allows Chebyshev; sometimes Chernoff applies. How do we improve an estimator?
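The two approximation notions can be written as simple predicates (a trivial sketch; the function names are mine, not from the slides):

```python
def is_additive_approx(estimate_mean, true_value, eps):
    """Additive approximation: |E[X] - f(I)| <= eps."""
    return abs(estimate_mean - true_value) <= eps

def is_multiplicative_approx(estimate_mean, true_value, eps):
    """Multiplicative approximation, assuming f(I) >= 0:
    (1 - eps) f(I) <= E[X] <= (1 + eps) f(I)."""
    return (1 - eps) * true_value <= estimate_mean <= (1 + eps) * true_value
```

Note the two notions are incomparable in general: an additive guarantee with eps = 1 is weak when f(I) is small but strong when f(I) is huge, and vice versa for the multiplicative one.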

SLIDE 31

Variance reduction via averaging

Run h parallel copies of the algorithm with independent randomness. Let Y^(1), Y^(2), . . . , Y^(h) be the estimators from the h parallel copies. Output Z = (1/h) · Σ_{i=1}^{h} Y^(i).

Claim: E[Z_n] = n and Var(Z_n) = n(n − 1)/(2h).

Choose h = 2/ε². Then applying Chebyshev's inequality, Pr[|Z_n − E[Z_n]| ≥ εn] ≤ 1/4.

To run h copies we need O((1/ε²) · log log n) bits for the counters.
SLIDE 36

Error reduction via median trick

We have: Pr[|Z_n − E[Z_n]| ≥ εn] ≤ 1/4.
Want: Pr[|Z_n − E[Z_n]| ≥ εn] ≤ δ for some given parameter δ.

Can set h = 1/(2δε²) and apply Chebyshev. Better dependence on δ?

Idea: Repeat independently ℓ = c log(1/δ) times for some constant c. We know that with probability (1 − δ), one of the counters will be εn-close to n. Why? Which one should we pick?

Algorithm: Output the median of Z^(1), Z^(2), . . . , Z^(ℓ).
SLIDE 40

Error reduction via median trick

Let Z′ be the median of the ℓ = c log(1/δ) independent estimators.

Lemma. Pr[|Z′ − n| ≥ εn] ≤ δ.

Let A_i be the event that estimate Z^(i) is bad, that is, |Z^(i) − n| > εn. Then Pr[A_i] ≤ 1/4, and hence the expected number of bad estimates is at most ℓ/4. For the median estimate to be bad, more than half of the A_i's have to occur. Using Chernoff bounds: the probability of a bad median is at most 2^{−c′ℓ} for some constant c′, which is at most δ for a suitable choice of c.
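Putting averaging and the median trick together gives the full scheme. A Python sketch (the constant c = 2 and all names are illustrative choices, not from the lecture):

```python
import math
import random

def morris_estimate(n, rng):
    """One run of ProbabilisticCounting over a stream of n events."""
    x = 0
    for _ in range(n):
        if rng.random() < 1.0 / (1 << x):
            x += 1
    return (1 << x) - 1

def median_of_means(n, eps, delta, rng, c=2):
    """Median of l = ceil(c*log(1/delta)) averaged estimators; each average
    uses h = ceil(2/eps^2) copies, so each is eps*n-accurate with
    probability >= 3/4, and the median is eps*n-accurate with
    probability >= 1 - delta (via Chernoff)."""
    h = math.ceil(2 / eps ** 2)
    l = math.ceil(c * math.log(1 / delta))
    averages = sorted(
        sum(morris_estimate(n, rng) for _ in range(h)) / h for _ in range(l)
    )
    return averages[len(averages) // 2]
```

The median is used instead of a grand average because a single wildly-off copy can ruin an average but cannot move the median past the good estimates.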

SLIDE 41

Summarizing

Using variance reduction and the median trick: with O((1/ε²) · log(1/δ) · log log n) bits one can maintain a (1 ± ε)-factor estimate of the number of events with probability (1 − δ). This is a generic scheme that we will use repeatedly. For the counter one can do (much) better by changing the algorithm and with a better analysis; see the homework and the references in the notes.
