Compressed Counting Ping Li Department of Statistical Science - PowerPoint PPT Presentation

Ping Li Compressed Counting, Data Streams March 2009 DIMACS 2009 1 Compressed Counting Ping Li Department of Statistical Science Faculty of Computing and Information Science Cornell University Ithaca, NY 14850 March, 2009

What is Counting in This Talk?

Assume a very long vector of D items: $x_1, x_2, \ldots, x_D$. This talk is about counting $\sum_{i=1}^{D} x_i^{\alpha}$, where $0 < \alpha \le 2$.

[Figure: the entries of x plotted against the index, from 1 to D.]

The case $\alpha \to 1$ is particularly interesting and important.

Related Summary Statistics

• The sum, $\sum_{i=1}^{D} x_i$. The number of non-zeros, $\sum_{i=1}^{D} 1\{x_i \neq 0\}$.
• The αth moment, $F_{(\alpha)} = \sum_{i=1}^{D} x_i^{\alpha}$. $F_{(1)}$ = the sum, $F_{(2)}$ = the power/energy, $F_{(0)}$ = the number of non-zeros.
• The future fortune, $\sum_{i=1}^{D} x_i^{1 \pm \Delta}$, where Δ = interest/decay rate (usually small).
• The entropy moment, $\sum_{i=1}^{D} x_i \log x_i$, and the entropy, $-\sum_{i=1}^{D} \frac{x_i}{F_{(1)}} \log \frac{x_i}{F_{(1)}}$.
• The Tsallis entropy, $\frac{1 - F_{(\alpha)} / F_{(1)}^{\alpha}}{\alpha - 1}$. The Rényi entropy, $\frac{1}{1 - \alpha} \log \frac{F_{(\alpha)}}{F_{(1)}^{\alpha}}$.

Isn't Counting a Simple (Trivial) Task?

Partially true, if the data are static. However, real-world data are in general massive and dynamic: data streams.

• Databases at Amazon, eBay, Walmart, and search engines
• Internet/telephone traffic, highway traffic
• Finance (stock) data
• ...
• Answers may be needed in real time, e.g., anomaly detection (using entropy).

For example, the Turnstile data stream model for an online bookstore

Time  Arriving stream  Event                        IP 1  IP 2  IP 3  IP 4  ...  IP D
t=0   (none)           (initial state)              0     0     0     0     ...  0
t=1   (3, 10)          user 3 ordered 10 books      0     0     10    0     ...  0
t=2   (1, 5)           user 1 ordered 5 books       5     0     10    0     ...  0
t=3   (3, −8)          user 3 cancelled 8 books     5     0     2     0     ...  0

Turnstile Data Stream Model

At time t, an incoming element: $a_t = (i_t, I_t)$, where $i_t \in [1, D]$ is the index and $I_t$ is the increment/decrement.

Updating rule: $A_t[i_t] = A_{t-1}[i_t] + I_t$

Goal: Count $F_{(\alpha)} = \sum_{i=1}^{D} A_t[i]^{\alpha}$
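The updating rule can be sketched in a few lines of Python. This is only an illustration of the model, not part of the talk: the function names `apply_stream` and `moment` are hypothetical, and the stream is the bookstore example. Note that computing the moment exactly this way keeps one counter per active index, which is the cost the talk sets out to avoid.

```python
from collections import defaultdict

def apply_stream(stream):
    """Replay turnstile updates (i_t, I_t) and return the final counts A_t."""
    counts = defaultdict(int)
    for index, increment in stream:
        counts[index] += increment  # A_t[i_t] = A_{t-1}[i_t] + I_t
    return counts

def moment(counts, alpha):
    """Exact F_(alpha): sum of A_t[i]^alpha over the nonzero entries.

    Assumes the strict Turnstile model (all entries are non-negative).
    """
    return sum(value ** alpha for value in counts.values() if value != 0)

# The bookstore stream: (user index, number of books ordered or cancelled).
bookstore = [(3, 10), (1, 5), (3, -8)]
final_counts = apply_stream(bookstore)

print(dict(final_counts))       # state after t = 3
print(moment(final_counts, 1))  # the sum
print(moment(final_counts, 2))  # the power/energy
```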

Counting: Trivial if α = 1, but Non-trivial in General

Goal: Count $F_{(\alpha)} = \sum_{i=1}^{D} A_t[i]^{\alpha}$, where $A_t[i_t] = A_{t-1}[i_t] + I_t$.

When $\alpha \neq 1$, counting $F_{(\alpha)}$ exactly requires D counters (but D can be $2^{64}$).

When $\alpha = 1$, however, counting the sum is trivial, using a simple counter:

$F_{(1)} = \sum_{i=1}^{D} A_t[i] = \sum_{s=1}^{t} I_s$
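A small check of the identity for α = 1, again using the bookstore stream as an assumed example. The single running counter never looks at the index, yet it agrees with the sum over the per-index counters.

```python
def running_sum(stream):
    """One counter for F_(1): add every increment, ignoring the index."""
    total = 0
    for _index, increment in stream:
        total += increment
    return total

def per_index_sum(stream):
    """The same quantity computed the expensive way, with one counter per index."""
    counts = {}
    for index, increment in stream:
        counts[index] = counts.get(index, 0) + increment
    return sum(counts.values())

stream = [(3, 10), (1, 5), (3, -8)]
single_counter = running_sum(stream)
many_counters = per_index_sum(stream)
print(single_counter, many_counters)
```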

The Intuition for α ≈ 1

There might exist an intelligent counting system which works like a simple counter when α is close to 1, and whose complexity is a function of how close α is to 1.

Our answer: Yes!

Two caveats:
(1) What if data are negative? Shouldn't we define $F_{(\alpha)} = \sum_{i=1}^{D} |A_t[i]|^{\alpha}$?
(2) Why is the case α ≈ 1 important?

The Non-Negativity Constraint

"God created the natural numbers; all the rest is the work of man." (German mathematician Leopold Kronecker, 1823-1891)

Turnstile model: $a_t = (i_t, I_t)$, $A_t[i_t] = A_{t-1}[i_t] + I_t$
$I_t > 0$: increment, insertion, e.g., placing orders
$I_t < 0$: decrement, deletion, e.g., cancelling orders

This talk: Strict Turnstile model, $A_t[i] \ge 0$ always. One can only cancel an order that one actually placed. This suffices for almost all applications.

Sample Applications of αth Moments (Especially α ≈ 1)

1. $F_{(\alpha)} = \sum_{i=1}^{D} A_t[i]^{\alpha}$ itself is a useful summary statistic; e.g., Rényi entropy and Tsallis entropy are functions of $F_{(\alpha)}$.
2. Statistical modeling and inference of parameters using the method of moments. Some moments may be much easier to compute than others.
3. $F_{(\alpha)} = \sum_{i=1}^{D} A_t[i]^{\alpha}$ is a fundamental building element for other algorithms, e.g., estimating Shannon entropy of data streams.

Shannon Entropy of Data Streams

Definition of Shannon entropy:

$H = -\sum_{i=1}^{D} \frac{A_t[i]}{F_{(1)}} \log \frac{A_t[i]}{F_{(1)}}, \quad F_{(1)} = \sum_{i=1}^{D} A_t[i]$

Shannon entropy can be approximated by Rényi entropy or Tsallis entropy.

Rényi entropy: $H_{\alpha} = \frac{1}{1 - \alpha} \log \frac{F_{(\alpha)}}{F_{(1)}^{\alpha}} \to H$, as $\alpha \to 1$

Tsallis entropy: $T_{\alpha} = \frac{1}{\alpha - 1} \left( 1 - \frac{F_{(\alpha)}}{F_{(1)}^{\alpha}} \right) \to H$, as $\alpha \to 1$
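The convergence as α → 1 can be checked numerically. The sketch below evaluates the three definitions exactly on a small made-up count vector; the vector and the choice α = 1.001 are illustrative assumptions, and no streaming or estimation is involved.

```python
import math

def shannon(counts):
    """H = -sum p_i log p_i with p_i = A[i] / F_(1)."""
    f1 = sum(counts)
    return -sum((c / f1) * math.log(c / f1) for c in counts if c > 0)

def normalized_moment(counts, alpha):
    """F_(alpha) / F_(1)^alpha, the quantity both entropies depend on."""
    f1 = sum(counts)
    f_alpha = sum(c ** alpha for c in counts if c > 0)
    return f_alpha / f1 ** alpha

def renyi(counts, alpha):
    """H_alpha = log(F_(alpha) / F_(1)^alpha) / (1 - alpha)."""
    return math.log(normalized_moment(counts, alpha)) / (1 - alpha)

def tsallis(counts, alpha):
    """T_alpha = (1 - F_(alpha) / F_(1)^alpha) / (alpha - 1)."""
    return (1 - normalized_moment(counts, alpha)) / (alpha - 1)

counts = [5, 2, 10, 1]  # hypothetical non-negative counts
alpha = 1.001
print(shannon(counts), renyi(counts, alpha), tsallis(counts, alpha))
```

Both approximations depend on the data only through $F_{(\alpha)}$ and $F_{(1)}$, which is why an accurate estimate of the moment near α = 1 matters.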

Algorithms on Estimating Shannon Entropy

• Many algorithms in theoretical CS and databases address estimating entropy.
• A recent trend: using αth moments to approximate Shannon entropy.
  – Zhao et al. (IMC07) used symmetric stable random projections (Indyk JACM06, Li SODA08) to approximate moments and Shannon entropy.
  – Harvey et al. (ITW08), a theoretical paper, proposed a criterion on how close α is to 1, using symmetric stable random projections as the underlying algorithm.
  – Harvey et al. (FOCS08) proposed refined criteria on how to choose α and cited both symmetric stable random projections and Compressed Counting as underlying algorithms.

Anomaly Detection in Large Networks Using Entropy of Traffic

Example: Laura Feinstein, Dan Schnackenberg, Ravindra Balupari, and Darrell Kindred. Statistical approaches to DDoS attack detection and response. In DARPA Information Survivability Conference and Exposition, 2003.

General idea: Anomalous events (such as failure of service or distributed denial of service (DDoS) attacks) change the distribution of the traffic data. The change of distribution can be characterized by the change of entropy.

Previous Methods for Estimating $F_{(\alpha)}$

• The pioneering work [AMS STOC'96]
• A popular algorithm, symmetric stable random projections [Indyk JACM'06], [Li SODA'08]
  – Basic idea: Let $X = A_t \times R$, where entries of $R \in \mathbb{R}^{D \times k}$ are sampled from a symmetric α-stable distribution. Entries of $X \in \mathbb{R}^{k}$ are also samples from a symmetric α-stable distribution with scale $F_{(\alpha)}$.
  – $k = O(1/\epsilon^{2})$, the large-deviation bound. k may be too large for real applications [GC RANDOM'07].
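As a concrete instance of the basic idea, the sketch below takes the symmetric case α = 1, where the stable distribution is the standard Cauchy. Each projected entry is then Cauchy with scale $F_{(1)} = \sum_i |A_t[i]|$, and the sample median of $|x_j|$ is one simple estimator of that scale. The data vector, k, the seed, and the function names are illustrative assumptions; this is not the estimator analysed in the talk.

```python
import math
import random
import statistics

def standard_cauchy(rng):
    """Draw from the symmetric 1-stable (standard Cauchy) law by inverse CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))

def estimate_f1(data, k, seed=0):
    """Project `data` onto k Cauchy directions and estimate sum_i |data[i]|.

    The median of |x_j| equals the Cauchy scale, since tan(pi/4) = 1.
    """
    rng = random.Random(seed)
    projected = [0.0] * k
    for value in data:
        if value == 0:
            continue  # zero entries contribute nothing to the projection
        for j in range(k):
            projected[j] += value * standard_cauchy(rng)
    return statistics.median(abs(x) for x in projected)

data = [5, 0, 2, 0, 7, 1, 0, 3]  # hypothetical vector, true F_(1) = 18
estimate = estimate_f1(data, k=2001)
print(estimate)
```

The relative error of the median shrinks like $1/\sqrt{k}$, which is the behaviour behind the $k = O(1/\epsilon^{2})$ bound on the slide.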

Compressed Counting: Skewed Stable Random Projections

Original data stream signal: $A_t[i]$, i = 1 to D, e.g., $D = 2^{64}$

Projected signal: $X_t = A_t \times R \in \mathbb{R}^{k}$, where k is small (e.g., k = 20 ∼ 100)

Projection matrix: $R \in \mathbb{R}^{D \times k}$. Sample entries of R i.i.d. from a skewed α-stable distribution.

The Standard Data Stream Technique: Incremental Projection

Linear projection: $X_t = A_t \times R$
+
Linear data model: $A_t[i_t] = A_{t-1}[i_t] + I_t$
⟹ Conduct $X_t = A_t \times R$ incrementally.

Generate entries of R on demand.

Our method differs from previous algorithms in the choice of the distribution of R.
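A minimal sketch of incremental projection with on-demand generation of R, under stated assumptions: row $i_t$ of R is regenerated from a per-index seed each time it is needed, so neither $A_t$ nor R is stored. The seeding scheme and the names `row_of_r` and `update` are hypothetical, and the Gaussian entries are only a stand-in; Compressed Counting would sample skewed α-stable entries instead. The mechanism is the same for any distribution of R.

```python
import random

def row_of_r(index, k, master_seed=2009):
    """Regenerate row `index` of R from a per-index seed instead of storing R."""
    rng = random.Random(master_seed * 1_000_003 + index)
    # Placeholder distribution: standard normal, NOT the skewed stable law.
    return [rng.gauss(0.0, 1.0) for _ in range(k)]

def update(projected, index, increment, k):
    """Apply one stream element: X_t = X_{t-1} + I_t * R[i_t, :]."""
    row = row_of_r(index, k)
    for j in range(k):
        projected[j] += increment * row[j]

k = 4
sketch = [0.0] * k
for index, increment in [(3, 10), (1, 5), (3, -8)]:
    update(sketch, index, increment, k)

# Direct projection of the final state, A_t[1] = 5 and A_t[3] = 2.
direct = [5 * a + 2 * b for a, b in zip(row_of_r(1, k), row_of_r(3, k))]
print(sketch)
print(direct)
```

The incrementally maintained vector matches the direct projection up to floating-point rounding, because the projection is linear in $A_t$.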

Recover $F_{(\alpha)}$ from Projected Data

$X_t = (x_1, x_2, \ldots, x_k) = A_t \times R$

$R = \{r_{ij}\} \in \mathbb{R}^{D \times k}$, $r_{ij} \sim S(\alpha, \beta, 1)$

$S(\alpha, \beta, \gamma)$: α-stable, β-skewed distribution with scale γ

Then, by stability, at any t, the $x_j$'s are i.i.d. stable samples:

$x_j \sim S\left(\alpha, \beta, F_{(\alpha)} = \sum_{i=1}^{D} A_t[i]^{\alpha}\right)$

⟹ A statistical estimation problem.

Review of Skewed Stable Distributions

Z follows a β-skewed α-stable distribution if the Fourier transform of its density is

$\mathcal{F}_Z(t) = E \exp\left(\sqrt{-1}\, Z t\right) = \exp\left(-F |t|^{\alpha} \left(1 - \sqrt{-1}\, \beta\, \mathrm{sign}(t) \tan\left(\frac{\pi \alpha}{2}\right)\right)\right), \quad \alpha \neq 1,$

where $0 < \alpha \le 2$ and $-1 \le \beta \le 1$. The scale $F > 0$. We write $Z \sim S(\alpha, \beta, F)$.

If $Z_1, Z_2 \sim S(\alpha, \beta, 1)$ are independent, then for any $C_1 \ge 0$, $C_2 \ge 0$,

$Z = C_1 Z_1 + C_2 Z_2 \sim S(\alpha, \beta, F = C_1^{\alpha} + C_2^{\alpha})$.
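The stability property follows in two lines from the Fourier transform. The derivation below is a worked step added for illustration; it uses only the definition on this slide, with the shorthand $\theta = \tan(\pi\alpha/2)$ introduced here for brevity.

```latex
% For C > 0 and Z_1 ~ S(alpha, beta, 1): scaling Z_1 scales the argument t,
% and sign(C t) = sign(t).
\mathcal{F}_{C Z_1}(t) = \mathcal{F}_{Z_1}(C t)
  = \exp\left(-C^{\alpha} |t|^{\alpha}
      \left(1 - \sqrt{-1}\,\beta\,\mathrm{sign}(t)\,\theta\right)\right),
  \qquad \theta = \tan\left(\frac{\pi\alpha}{2}\right).

% Independence turns the sum into a product of transforms, so the scales add.
\mathcal{F}_{C_1 Z_1 + C_2 Z_2}(t)
  = \mathcal{F}_{C_1 Z_1}(t)\,\mathcal{F}_{C_2 Z_2}(t)
  = \exp\left(-\left(C_1^{\alpha} + C_2^{\alpha}\right) |t|^{\alpha}
      \left(1 - \sqrt{-1}\,\beta\,\mathrm{sign}(t)\,\theta\right)\right).
```

A coefficient equal to zero contributes a factor of 1 and so leaves the scale unchanged. Repeating the step over D terms with non-negative coefficients $A_t[i]$ gives the scale $\sum_{i=1}^{D} A_t[i]^{\alpha} = F_{(\alpha)}$, which is where the non-negativity of the data is used.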

If $C_1$ and $C_2$ do not have the same signs, the "stability" does not hold.

Let $Z = C_1 Z_1 - C_2 Z_2$, with $C_1 \ge 0$ and $C_2 \ge 0$. Because $\mathcal{F}_{-Z_2}(t) = \mathcal{F}_{Z_2}(-t)$,

$\mathcal{F}_Z(t) = \exp\left(-|C_1 t|^{\alpha} \left(1 - \sqrt{-1}\, \beta\, \mathrm{sign}(t) \tan\left(\frac{\pi \alpha}{2}\right)\right)\right) \times \exp\left(-|C_2 t|^{\alpha} \left(1 + \sqrt{-1}\, \beta\, \mathrm{sign}(t) \tan\left(\frac{\pi \alpha}{2}\right)\right)\right),$

which does NOT represent a stable law, unless β = 0 or α = 2, 0+.

Symmetric (β = 0) projections work for any data, but if data are non-negative, the benefits of skewed projection are enormous.
