Joshua Brody and Amit Chakrabarti, Dartmouth College. 24th CCC, 2009 (PowerPoint presentation)

Gap-Hamming Lower Bound, July 18, 2009

Lower Bounds for Gap-Hamming-Distance and Consequences for Data Stream Algorithms
Joshua Brody and Amit Chakrabarti, Dartmouth College
24th CCC, 2009, Paris

Counting Distinct Elements in a Data Stream
Example stream: 3 14 1 3 9 9 4 2 1 5 2 3 6
Input: Stream of integers $\sigma = \langle a_1, \ldots, a_m \rangle$
Output: $F_0 :=$ number of distinct elements in $\sigma$
Goal: Minimize the space used to compute $F_0$

Previous Streaming Results
Frequency moments: $F_k = \sum_{i=1}^{n} \mathrm{freq}(i)^k$ [Alon-Matias-Szegedy ’96]
• $\Omega(n)$ space unless randomization and approximation are used
• Upper and lower bounds for randomized algorithms that approximate $F_k$
• Spawned a great deal of research; won the 2005 Gödel Prize
One-pass, randomized, $\varepsilon$-approximate: $\left| \frac{\text{output}}{\text{answer}} - 1 \right| \le \varepsilon$
Status as of Jan 2009:
• Space upper bound: $\tilde{O}(\varepsilon^{-2})$
• Space lower bound: $\tilde{\Omega}(\varepsilon^{-2})$
• These bounds also hold for other problems, e.g., empirical entropy
Do multiple passes help? If not, why not?
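To make the definitions concrete, here is a minimal exact (non-streaming) sketch, assuming the whole stream fits in memory; the helper names `frequency_moment` and `is_eps_approximate` are hypothetical. It computes $F_k$ from a frequency table (summing only over items that occur, so $k = 0$ gives the number of distinct elements) and checks the $\varepsilon$-approximation criterion. The sample stream is the one from the distinct-elements slide.

```python
from collections import Counter
from typing import Iterable

def frequency_moment(stream: Iterable[int], k: int) -> int:
    """Exact F_k = sum of freq(i)**k over the items i that occur in the stream.

    Summing only over occurring items makes k = 0 count distinct elements (F_0).
    The Counter keeps one entry per distinct item, i.e. linear space in the
    worst case: this is the exact baseline, not a small-space streaming algorithm.
    """
    counts = Counter(stream)
    return sum(f ** k for f in counts.values())

def is_eps_approximate(output: float, answer: float, eps: float) -> bool:
    """Accuracy criterion from the slide: |output / answer - 1| <= eps."""
    return abs(output / answer - 1) <= eps

stream = [3, 14, 1, 3, 9, 9, 4, 2, 1, 5, 2, 3, 6]
print(frequency_moment(stream, 0))      # 8 distinct elements
print(frequency_moment(stream, 1))      # 13, the stream length
print(frequency_moment(stream, 2))      # 25
print(is_eps_approximate(8.5, 8, 0.1))  # True
```

The frequency table is exactly the kind of input-sized state that the randomized, approximate algorithms above are designed to avoid.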

The Gap-Hamming-Distance Problem
Input: Alice gets $x \in \{0,1\}^n$, Bob gets $y \in \{0,1\}^n$; $\Delta(x,y)$ denotes their Hamming distance.
Output:
• $\mathrm{ghd}(x,y) = 1$ if $\Delta(x,y) > \frac{n}{2} + \sqrt{n}$
• $\mathrm{ghd}(x,y) = 0$ if $\Delta(x,y) < \frac{n}{2} - \sqrt{n}$
Problem: Design a randomized, constant-error protocol to solve this.
Cost: Worst-case number of bits communicated.
Example: x = 010010110001, y = 000000111001, so $n = 12$ and $\Delta(x,y) = 3 \in [6 - \sqrt{12},\, 6 + \sqrt{12}]$; this pair lies in the gap, so either output is acceptable.
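A small sketch of the function being computed, with the slide's strict thresholds; the names `hamming` and `ghd` are hypothetical, and returning `None` for inputs in the gap is a convention chosen here to mean "any answer is acceptable".

```python
import math
from typing import Optional, Sequence

def hamming(x: Sequence[int], y: Sequence[int]) -> int:
    """Delta(x, y): the number of positions where x and y differ."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length n")
    return sum(a != b for a, b in zip(x, y))

def ghd(x: Sequence[int], y: Sequence[int]) -> Optional[int]:
    """Gap-Hamming-Distance with gap sqrt(n) and strict thresholds.

    Returns 1 or 0 when the promise holds, None when Delta lies in the gap.
    """
    n = len(x)
    d = hamming(x, y)
    if d > n / 2 + math.sqrt(n):
        return 1
    if d < n / 2 - math.sqrt(n):
        return 0
    return None

x = [0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1]
print(hamming(x, y))  # 3
print(ghd(x, y))      # None: 3 lies in [6 - sqrt(12), 6 + sqrt(12)]
```

This evaluates ghd with both inputs in one place; the communication question is how many bits Alice and Bob must exchange when each holds only half of the input.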

The Reductions
E.g., Distinct Elements (other problems: similar)
Alice: $x \mapsto \sigma = \langle (1, x_1), (2, x_2), \ldots, (n, x_n) \rangle$
Bob: $y \mapsto \tau = \langle (1, y_1), (2, y_2), \ldots, (n, y_n) \rangle$
Example: for the x and y above, $\sigma = \langle (1,0), (2,1), (3,0), \ldots, (12,1) \rangle$ and $\tau = \langle (1,0), (2,0), (3,0), \ldots, (12,1) \rangle$.
Notice: in $\sigma \circ \tau$ ($\sigma$ followed by $\tau$), index $i$ contributes one distinct pair if $x_i = y_i$ and two if $x_i \ne y_i$, so $F_0(\sigma \circ \tau) = n + \Delta(x,y)$ (for the example, $12 + 3 = 15$). Under the ghd promise this is either $< \frac{3n}{2} - \sqrt{n}$ or $> \frac{3n}{2} + \sqrt{n}$.
Set $\varepsilon = 1/\sqrt{n}$ (up to a constant factor), so that an $\varepsilon$-approximation of $F_0$ separates the two cases.
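The reduction can be checked on the slide's example with a short sketch. Here an exact distinct count (`len(set(...))`) stands in for the streaming algorithm, and the names `to_stream`, `distinct_elements`, and `decide_ghd_from_f0` are hypothetical; the decision rule simply thresholds an $F_0$ value at $3n/2$, the midpoint of the two promised ranges.

```python
from typing import List, Sequence, Tuple

def to_stream(bits: Sequence[int]) -> List[Tuple[int, int]]:
    """Map a bit string to the stream <(1, b_1), ..., (n, b_n)>, 1-indexed."""
    return [(i, b) for i, b in enumerate(bits, start=1)]

def distinct_elements(stream: Sequence[Tuple[int, int]]) -> int:
    """Exact F_0; a stand-in for the streaming algorithm being lower-bounded."""
    return len(set(stream))

def decide_ghd_from_f0(f0_value: float, n: int) -> int:
    """Hypothetical decision rule: answer 1 iff F_0 exceeds 3n/2."""
    return 1 if f0_value > 1.5 * n else 0

x = [0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1]
sigma, tau = to_stream(x), to_stream(y)
n = len(x)
delta = sum(a != b for a, b in zip(x, y))
f0 = distinct_elements(sigma + tau)  # the concatenated stream sigma, tau
print(f0, n + delta)                 # 15 15
```

With an approximate $F_0$ in place of the exact count, the threshold still gives the right answer whenever the estimate stays within the promised ranges, which is what the choice of $\varepsilon$ guarantees.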

Communication to Streaming
A $p$-pass streaming algorithm $\Rightarrow$ a $(2p - 1)$-round communication protocol; messages = memory contents of the streaming algorithm. (Alice runs the algorithm on $\sigma$ and sends its memory to Bob, who continues on $\tau$; each further pass adds one message back to Alice and one forward to Bob.)
And thus, previous results [Indyk-Woodruff’03], [Woodruff’04], [C.-Cormode-McGregor’07]:
• For one-round protocols, $R^{\to}(\mathrm{ghd}) = \Omega(n)$
• With $n = \Theta(\varepsilon^{-2})$, this implies the $\tilde{\Omega}(\varepsilon^{-2})$ streaming lower bounds
Key open questions:
• What is the unrestricted randomized complexity $R(\mathrm{ghd})$?
• Is there a better algorithm for Distinct Elements (or $F_k$, or empirical entropy $H$) using two passes?
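The simulation argument can be sketched directly: the streaming algorithm is modeled as an `update(state, item)` function, and every time control moves between Alice (who holds $\sigma$) and Bob (who holds $\tau$), the current memory is recorded as a message. The names `simulate_protocol` and `add_item` are hypothetical, and the toy algorithm (remember every item) is chosen only to make the message count easy to check, not because it is space-efficient.

```python
import copy
from typing import Any, Callable, List, Sequence, Tuple

def simulate_protocol(
    init_state: Any,
    update: Callable[[Any, Any], Any],
    sigma: Sequence[Any],
    tau: Sequence[Any],
    passes: int,
) -> Tuple[Any, List[Any]]:
    """Run a `passes`-pass streaming algorithm over sigma followed by tau,
    with Alice feeding sigma and Bob feeding tau.

    Returns Bob's final memory and the transcript of messages, each message
    being a copy of the memory at a hand-off (2 * passes - 1 in total).
    """
    transcript: List[Any] = []
    state = init_state
    for p in range(passes):
        if p > 0:
            # Bob sends the memory back so Alice can start the next pass.
            transcript.append(copy.deepcopy(state))
        for item in sigma:  # Alice processes her half of the stream
            state = update(state, item)
        # Alice sends the memory to Bob.
        transcript.append(copy.deepcopy(state))
        for item in tau:  # Bob processes his half of the stream
            state = update(state, item)
    return state, transcript

def add_item(state: frozenset, item: Any) -> frozenset:
    """Toy streaming algorithm: store every item seen (exact F_0)."""
    return state | {item}

x = [0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1]
sigma = [(i, b) for i, b in enumerate(x, start=1)]
tau = [(i, b) for i, b in enumerate(y, start=1)]

for p in (1, 2, 3):
    final, messages = simulate_protocol(frozenset(), add_item, sigma, tau, p)
    print(p, len(messages), len(final))  # p, 2p - 1, F_0 = 15
```

Each message is at most the algorithm's memory size $S$, so the protocol communicates at most $(2p - 1) S$ bits; a communication lower bound for ghd divided by $2p - 1$ is therefore a space lower bound for $p$-pass algorithms.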

Our Results
Previous results (communication):
• One-round (one-way) lower bound: $R^{\to}(\mathrm{ghd}) = \Omega(n)$ [Woodruff’04]
• Simplification via a clever reduction from index [Jayram-Kumar-Sivakumar]
  Hard distribution “contrived,” non-uniform
• Multi-round case: $R(\mathrm{ghd}) = \Omega(\sqrt{n})$ [Folklore]
  Reduction from disjointness using a “repetition code”
  Hard distribution again far from uniform
What we show:
• Theorem 1: $\Omega(n)$ lower bound for any $O(1)$-round protocol
  Holds under the uniform distribution
• Theorem 2: one-round, deterministic: $D^{\to}(\mathrm{ghd}) = n - \Theta(\sqrt{n}\log n)$
• Theorem 3: $R^{\to}(\mathrm{ghd}) = \Omega(n)$ (simpler proof, uniform distribution; independently proved by [Woodruff’09])
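Theorems 1 and 3 hold under the uniform distribution. Under that distribution $\Delta(x,y)$ is Binomial$(n, 1/2)$ with standard deviation $\sqrt{n}/2$, so a gap of $\pm c\sqrt{n}$ around $n/2$ spans a constant number of standard deviations, and a roughly constant fraction of uniformly random pairs satisfies the promise for every $n$. The sketch below (hypothetical helper `mass_outside_gap`, computed as an exact binomial sum with non-strict thresholds) checks this numerically; it illustrates the distribution only and is not part of the lower-bound proof.

```python
import math

def mass_outside_gap(n: int, c: float) -> float:
    """Pr[|Delta(x, y) - n/2| >= c*sqrt(n)] for uniform x, y in {0,1}^n.

    Under the uniform distribution Delta(x, y) ~ Binomial(n, 1/2), so this is
    the binomial mass at or beyond the two gap thresholds.
    """
    lo = n / 2 - c * math.sqrt(n)
    hi = n / 2 + c * math.sqrt(n)
    favourable = sum(math.comb(n, d) for d in range(n + 1) if d <= lo or d >= hi)
    return favourable / 2 ** n  # exact integer counts, divided once at the end

for n in (100, 400, 1600):
    # each value is close to 0.05, roughly Pr[|Z| >= 2] for a standard normal Z
    print(n, round(mass_outside_gap(n, 1.0), 4))
```

Shrinking $c$ narrows the gap and raises this fraction, which is one reason the gap constant appears as an explicit parameter in $\mathrm{ghd}_{c,n}$.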

Technique: Round Elimination
Base Case Lemma: There is no 0-round ghd protocol with error $\varepsilon < \frac{1}{2}$.
Round Elimination Lemma: If there is a “nice” $k$-round ghd protocol, then there is a “nice” $(k-1)$-round $\mathrm{ghd}'$ protocol.
• The $(k-1)$-round protocol will be solving a “simpler” problem
• Parameters degrade with each round elimination step
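One way to see the Base Case Lemma, assuming “error” means the worst-case error of a public-coin randomized protocol (the slide does not spell this out): with zero rounds no bits are exchanged, so the output depends only on the shared randomness. The notation $q$ and $\bar{x}$ (the bitwise complement of $x$) is introduced here for the sketch.

```latex
% Assumption: q := Pr[output = 1] is the same for every input pair,
% because a 0-round protocol never sees x or y.
\begin{aligned}
&\text{For } n > 4:\quad
  \Delta(x, \bar{x}) = n > \tfrac{n}{2} + \sqrt{n} \;\Rightarrow\; \mathrm{ghd}(x, \bar{x}) = 1,
  \qquad
  \Delta(x, x) = 0 < \tfrac{n}{2} - \sqrt{n} \;\Rightarrow\; \mathrm{ghd}(x, x) = 0,\\
&\Pr[\text{error on } (x, \bar{x})] = 1 - q,
  \qquad
  \Pr[\text{error on } (x, x)] = q,\\
&\max\{q,\; 1 - q\} \;\ge\; \tfrac{1}{2}.
\end{aligned}
```

So every 0-round protocol errs with probability at least $\frac{1}{2}$ on some input, which is the base case that round elimination eventually reaches.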

Parametrized Gap-Hamming-Distance Problem
The problem:
$$\mathrm{ghd}_{c,n}(x,y) = \begin{cases} 1, & \text{if } \Delta(x,y) \ge n/2 + c\sqrt{n},\\ 0, & \text{if } \Delta(x,y) \le n/2 - c\sqrt{n},\\ \star, & \text{otherwise.} \end{cases}$$
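A direct transcription of $\mathrm{ghd}_{c,n}$, with `None` standing for $\star$ (“don't care”); the name `ghd_cn` is hypothetical. Running it on the example pair from the problem slide shows how the gap parameter $c$ decides whether an input is promised: the same pair is in the gap for $c = 1$ but is a 0-instance for $c = 1/2$.

```python
import math
from typing import Optional, Sequence

def ghd_cn(x: Sequence[int], y: Sequence[int], c: float) -> Optional[int]:
    """ghd_{c,n}: 1 if Delta >= n/2 + c*sqrt(n), 0 if Delta <= n/2 - c*sqrt(n),
    and None (the star, any answer allowed) otherwise."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length n")
    n = len(x)
    delta = sum(a != b for a, b in zip(x, y))
    if delta >= n / 2 + c * math.sqrt(n):
        return 1
    if delta <= n / 2 - c * math.sqrt(n):
        return 0
    return None

x = [0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1]  # Delta(x, y) = 3, n = 12
print(ghd_cn(x, y, c=1.0))  # None: 3 > 6 - sqrt(12), about 2.54
print(ghd_cn(x, y, c=0.5))  # 0: 3 <= 6 - 0.5*sqrt(12), about 4.27
```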
