slide-1
SLIDE 1

10/6/2020

Sublinear Algorithms

LECTURE 10

Last time

  • Multipurpose sketches
  • Count-min and count-sketch
  • Range queries, heavy hitters, quantiles

Today

  • Limitations of streaming algorithms
  • Communication complexity

Sofya Raskhodnikova; Boston University

slide-2
SLIDE 2

Recall: Frequency Moments Estimation

Input: a stream $a_1, a_2, \ldots, a_m \in [n]^m$

  • The frequency vector of the stream is $f = (f_1, \ldots, f_n)$, where $f_i$ is the number of times $i$ appears in the stream
  • The $p$-th frequency moment is $F_p = \|f\|_p^p = \sum_{i=1}^{n} f_i^p$

  • $F_0$ is the number of nonzero entries of $f$ (# of distinct elements)
  • $F_1 = m$ (# of elements in the stream)
  • $F_2 = \|f\|_2^2$ is a measure of non-uniformity, used e.g. for anomaly detection in network analysis
  • $F_\infty = \max_i f_i$ is the frequency of the most frequent element

We obtained streaming algorithms for $F_0$, $F_1$, $F_2$.

What about $F_3$ to $F_\infty$?
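To make the definitions concrete, here is a minimal offline sketch that computes $F_p$ exactly from the full frequency vector. It is not a streaming algorithm (it stores every frequency); the function name `frequency_moment` and the sample stream are illustrative assumptions, not part of the lecture.

```python
from collections import Counter

def frequency_moment(stream, p):
    """Exact F_p, computed offline from the full frequency vector."""
    freqs = Counter(stream).values()  # the nonzero entries of f
    if p == 0:
        return len(freqs)             # number of distinct elements
    if p == float("inf"):
        return max(freqs, default=0)  # largest frequency
    return sum(f ** p for f in freqs)

# Illustrative stream over [n] with n = 3 and m = 6; f = (2, 1, 3).
stream = [3, 1, 3, 2, 3, 1]
for p in (0, 1, 2, float("inf")):
    print(p, frequency_moment(stream, p))
```

For this stream the moments are $F_0 = 3$, $F_1 = 6$, $F_2 = 14$ and $F_\infty = 3$.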

slide-3
SLIDE 3

Communication Complexity

A Method for Proving Lower Bounds

slide-4
SLIDE 4

(Randomized) Communication Complexity

Alice: input $x$. Bob: input $y$. Shared random string: 1101000101110101110101010110…

Messages exchanged: 0100, 11, 001, ⋯, 0011

Compute $C(x, y)$

Goal: minimize the number of bits exchanged.

  • Communication complexity of a protocol is the maximum number of bits exchanged by the protocol.
  • Communication complexity of a function $C$, denoted $R(C)$, is the communication complexity of the best protocol for computing $C$.

Partially based on slides by Eric Blais

slide-5
SLIDE 5

Example: Set Disjointness $\mathrm{DISJ}_k$

Alice: input $S \subseteq [n]$, $|S| = k$. Bob: input $T \subseteq [n]$, $|T| = k$. Shared random string: 1101000101110101110101010110…

Compute $\mathrm{DISJ}_k(S, T) = \begin{cases} \text{accept} & \text{if } S \cap T = \emptyset \\ \text{reject} & \text{otherwise} \end{cases}$

Theorem [Kalyanasundaram Schnitger 92, Razborov 92]: $R(\mathrm{DISJ}_k) \geq \Omega(k)$ for all $k \leq n/2$.

slide-6
SLIDE 6

One-Way Communication Complexity

Alice: input $x$. Bob: input $y$. Shared random string: 1101000101110101110101010110…

Alice sends a message $m_1$ to Bob. Compute $C(x, y)$

Goal: minimize the number of bits Alice sends to Bob.

One-way communication complexity of a function $C$, denoted $R^{\rightarrow}(C)$, is the communication complexity of the best one-way protocol for computing $C$.

slide-7
SLIDE 7

3-Player One-Way Communication Complexity

Alice: input $x$. Bob: input $y$. Carol: input $z$. Shared random string: 1101000101110101110101010110…

Alice sends $m_1$ to Bob, Bob sends $m_2$ to Carol. Compute $C(x, y, z)$

Goal: minimize $|m_1| + |m_2|$.

  • Require correct output w.p. at least 2/3 over the random string

slide-8
SLIDE 8

Converting Streaming Algorithm to CC Protocol

Let $\mathcal{P}$ be a streaming problem.

  • Suppose there is a transformation $x \to s_1$, $y \to s_2$, $z \to s_3$ such that $\mathcal{P}(s_1 \circ s_2 \circ s_3)$ suffices to compute $C(x, y, z)$

An $s$-bit algorithm $A$ for $\mathcal{P}$ gives a $2s$-bit protocol for $C$

  • Alice runs $A$ on $s_1$ and sends memory state, $m_1$, to Bob
  • Bob instantiates $A$ with $m_1$, runs $A$ on $s_2$, sends memory state, $m_2$, to Carol
  • Carol instantiates $A$ with $m_2$, runs $A$ on $s_3$ to get $\mathcal{P}(s_1 \circ s_2 \circ s_3)$ and computes $C(x, y, z)$

Based on Andrew McGregor's slides: https://people.cs.umass.edu/~mcgregor/711S18/lowerbounds-1.pdf

slide-9
SLIDE 9

Converting Streaming Algorithm to CC Protocol

An $s$-bit algorithm $A$ for $\mathcal{P}$ gives a $2s$-bit protocol for $C$

  • If there are $p$ players then the protocol uses $(p - 1)s$ bits
  • A lower bound $L$ for computing $C$ implies $s = \Omega(L/p)$
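The simulation argument can be sketched in code. The example below is a toy under stated assumptions: `DistinctCounter` is a hypothetical stand-in for the streaming algorithm $A$ (it is not small-space), its memory state is serialized with `pickle` to play the role of the messages $m_i$, and `run_protocol` is an illustrative name. The point it shows is only that $p$ players need $p - 1$ messages, each as large as the algorithm's memory.

```python
import pickle

class DistinctCounter:
    """Toy streaming algorithm (hypothetical): remembers the set of items seen."""

    def __init__(self, state=None):
        # Resume from a serialized memory state if one is supplied.
        self.seen = set() if state is None else pickle.loads(state)

    def process(self, chunk):
        for item in chunk:
            self.seen.add(item)

    def state(self):
        return pickle.dumps(self.seen)  # the message handed to the next player

    def output(self):
        return len(self.seen)

def run_protocol(algorithm_cls, chunks):
    """Player i resumes from m_{i-1}, runs the algorithm on s_i, forwards m_i.

    Assumes `chunks` is non-empty; the last player outputs instead of sending.
    """
    message, messages = None, []
    for i, chunk in enumerate(chunks):
        alg = algorithm_cls(message)
        alg.process(chunk)
        if i < len(chunks) - 1:
            message = alg.state()
            messages.append(message)
    return alg.output(), messages

answer, messages = run_protocol(DistinctCounter, [[1, 2], [2, 3], [3, 4]])
print(answer, len(messages))  # three players, so two messages are sent
```

If every message fits in $s$ bits, the total communication is $(p - 1)s$, which is where the bound $s = \Omega(L/p)$ comes from.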

slide-10
SLIDE 10

A lower bound using CC method

Approximating $F_\infty$

slide-11
SLIDE 11

Application: Approximating $F_\infty$

Theorem: Every algorithm that computes a 4/3-approximation of $F_\infty$ (w.p. ≥ 2/3) needs $\Omega(n)$ space.

Proof: Reduction from Set Disjointness

On input $x, y \in \{0,1\}^n$, players generate $s_1 = \{j : x_j = 1\}$ and $s_2 = \{j : y_j = 1\}$

Example: (0 0 1 1 0 0), (1 0 1 0 1 0) → ⟨3,4; 1,3,5⟩

  • Then $F_\infty = 1$ if $x, y$ represent disjoint sets (output ≤ 4/3), and $F_\infty = 2$ otherwise (output ≥ 3/2).
  • An $s$-space algorithm implies an $s$-bit protocol: $s = \Omega(n)$ by communication complexity of Set Disjointness
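The reduction can be checked on the slide's example with a short sketch. The function name `disjoint_via_f_infinity` is an assumption for illustration, and $F_\infty$ is computed exactly here as a stand-in for the streaming algorithm.

```python
from collections import Counter

def disjoint_via_f_infinity(x, y):
    """Decide whether x and y (0/1 lists) encode disjoint sets via F_infinity."""
    s1 = [j for j, bit in enumerate(x, start=1) if bit == 1]  # Alice's stream
    s2 = [j for j, bit in enumerate(y, start=1) if bit == 1]  # Bob's stream
    f_inf = max(Counter(s1 + s2).values(), default=0)
    # F_infinity is at most 1 exactly when no element occurs in both streams.
    return f_inf <= 1, s1, s2

disjoint, s1, s2 = disjoint_via_f_infinity([0, 0, 1, 1, 0, 0], [1, 0, 1, 0, 1, 0])
print(s1, s2, disjoint)  # element 3 appears in both streams
```

A 4/3-approximation is enough for the same decision, since its output is at most 4/3 in the disjoint case and at least 3/2 otherwise.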

slide-12
SLIDE 12

A lower bound using CC method

Computing the median of a stream

slide-13
SLIDE 13

Index

  • Alice gets an $n$-bit string $x$, and Bob gets an index $j \in [n]$.
  • Define $\mathrm{Index}(x, j) = x_j$.
  • One-way communication complexity of $\mathrm{Index}(x, j)$ is $\Omega(n)$

slide-14
SLIDE 14

Application: Finding the Median of a Stream

Theorem: Every algorithm that computes the median of a $(2n - 1)$-element stream exactly (w.p. ≥ 2/3) needs $\Omega(n)$ space.

Proof: Reduction from Index.

  • On input $x \in \{0,1\}^n$, Alice generates $s_1 = \{2i + x_i : i \in [n]\}$
    Example: 0 0 1 1 0 1 1 → ⟨2,4,7,9,10,13,15⟩
  • On input $j \in [n]$, Bob generates $s_2$ = $n - j$ copies of 0 and $j - 1$ copies of $2n + 2$
    Example: $j = 2$ → ⟨0,0,0,0,0,16⟩
  • Then $\mathrm{median}(s_1 \circ s_2) = 2j + x_j$ and $\mathrm{Index}(x, j) = (2j + x_j) \bmod 2$
  • An $s$-space algorithm implies an $s$-bit protocol: $s = \Omega(n)$ by 1-way communication complexity of Index
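The reduction from Index can be verified directly on the slide's example. This is a sketch in which the median is found by sorting, standing in for an exact streaming median algorithm; the name `index_via_median` is an assumption.

```python
def index_via_median(x, j):
    """Recover x_j (1-based j) from the median of Alice's and Bob's streams."""
    n = len(x)
    s1 = [2 * i + x[i - 1] for i in range(1, n + 1)]  # Alice: 2i + x_i
    s2 = [0] * (n - j) + [2 * n + 2] * (j - 1)        # Bob: padding on both sides
    stream = s1 + s2                                  # 2n - 1 elements in total
    median = sorted(stream)[n - 1]                    # the n-th smallest element
    return median % 2, median, s1, s2

x = [0, 0, 1, 1, 0, 1, 1]
bit, median, s1, s2 = index_via_median(x, 2)
print(s1, s2, median, bit)
```

Bob's $n - j$ zeros sit below every element of $s_1$ and his $j - 1$ copies of $2n + 2$ sit above, so the $n$-th smallest element of the combined stream is the $j$-th element of $s_1$.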

slide-15
SLIDE 15

A lower bound using CC method

Approximating Frequency Moments

[Bar-Yossef, Jayram, Kumar, Sivakumar 04]

slide-16
SLIDE 16

Multi-party Set Disjointness

  • Consider a $p \times n$ binary matrix $M$ where each column has weight 0, 1 or $p$
  • The input of player $i$ is row $i$ of $M$
  • $\mathrm{DISJ}^{(p)}(M) = \begin{cases} 0 & \text{if there is a column of 1s} \\ 1 & \text{otherwise} \end{cases}$
  • Communication complexity of $\mathrm{DISJ}^{(p)}(M)$ is $\Omega(n/p)$

Example: [figure: binary matrix with its 1 entries marked; columns labeled 1, 3, 4, 5, 6]

slide-17
SLIDE 17

Application: Frequency Moments for $k > 2$

Thm. Every algorithm that 2-approximates $F_k$ (w.p. ≥ 2/3) needs $\Omega(n^{1 - 2/k})$ space.

Proof: Reduction from multi-party Set Disjointness

  • On input $M \in \{0,1\}^{p \times n}$, player $i$ generates $s_i = \{j : M_{ij} = 1\}$
    Example: rows with 1s in columns {3,4}, {1,3,5}, {3}, {3,6} → ⟨3,4; 1,3,5; 3; 3,6⟩
  • If all columns have weight 0 or 1 then $F_k = \sum_{i=1}^{n} f_i^k \leq n$
  • If there is a column of weight $p$ then $F_k \geq p^k$
  • A 2-approximation of $F_k$ distinguishes the cases if $p^k > 4n \iff p > (4n)^{1/k}$
  • An $s$-space algorithm implies an $s(p - 1)$-bit protocol:
    $s = \Omega\left(\frac{n}{p^2}\right) = \Omega\left(\frac{n}{(4n)^{2/k}}\right) = \Omega\left(n^{1 - 2/k}\right)$ by communication complexity of $\mathrm{DISJ}^{(p)}$, for constant $k$
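The case analysis can be tried on the slide's example (here written as a $4 \times 6$ matrix, with $k = 3$ chosen for illustration). In this sketch the exact `f_k` stands in for a 2-approximation algorithm, and the names `player_streams`, `f_k` and `disj_p` are assumptions. The threshold $2n$ follows from the two cases: an estimate within a factor 2 is at most $2n$ when $F_k \leq n$, and more than $2n$ when $F_k \geq p^k > 4n$.

```python
from collections import Counter

def player_streams(M):
    """Player i's stream s_i lists the 1-based columns j with M[i][j] = 1."""
    return [[j for j, bit in enumerate(row, start=1) if bit] for row in M]

def f_k(stream, k):
    """Exact k-th frequency moment (stand-in for a 2-approximation)."""
    return sum(f ** k for f in Counter(stream).values())

def disj_p(M, k, estimate=f_k):
    """Return 0 if M has an all-ones column and 1 otherwise, via F_k."""
    p, n = len(M), len(M[0])
    assert p ** k > 4 * n, "need p^k > 4n for the two cases to separate"
    stream = [j for s in player_streams(M) for j in s]  # s_1 ∘ ... ∘ s_p
    return 0 if estimate(stream, k) > 2 * n else 1

M = [
    [0, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 1],
]
print(player_streams(M), disj_p(M, k=3))  # column 3 is all ones
```

Here $p = 4$, $n = 6$ and $p^k = 64 > 4n = 24$, so the two cases are separated.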

slide-18
SLIDE 18

A lower bound using CC method

Distinct Elements

slide-19
SLIDE 19

Gap Hamming

  • Alice and Bob get $n$-bit strings $x$ and $y$, respectively.
  • Hamming distance $\mathrm{Ham}(x, y)$ is the number of positions on which $x$ and $y$ differ.
  • Output: $\mathrm{Ham}(x, y)$ with additive error $\sqrt{n}$ w.p. ≥ 2/3
  • Communication complexity of $\mathrm{Ham}(x, y)$ is $\Omega(n)$, even when $|x|$ and $|y|$ are known to both players

slide-20
SLIDE 20

Application: Distinct Elements

Thm. Every algorithm $(1 + \varepsilon)$-approximating $F_0$ (w.p. ≥ 2/3) needs $\Omega(1/\varepsilon^2)$ space.

Proof: Reduction from Gap Hamming

On input $x, y \in \{0,1\}^n$, players generate $s_1 = \{j : x_j = 1\}$ and $s_2 = \{j : y_j = 1\}$

Example: (0 0 1 1 0 0), (1 0 1 0 1 0) → ⟨3,4; 1,3,5⟩

  • Then $2F_0 = |x| + |y| + \mathrm{Ham}(x, y)$
  • When $|x|$ is known to Bob, a $(1 + \varepsilon)$-approximation of $F_0$ gives an additive approximation to $\mathrm{Ham}(x, y)$:
    $\varepsilon \cdot \frac{|x| + |y| + \mathrm{Ham}(x, y)}{2} \leq \varepsilon n \leq \sqrt{n}$ for $\varepsilon \leq 1/\sqrt{n}$
  • An $s$-space algorithm implies an $s$-bit protocol: $s = \Omega(n) = \Omega(1/\varepsilon^2)$ by communication complexity of Gap Hamming
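The identity $2F_0 = |x| + |y| + \mathrm{Ham}(x, y)$ that drives the reduction can be checked on the slide's example. This is a sketch with exact quantities; the helper names `hamming`, `distinct_elements` and `hamming_from_f0` are assumptions for illustration.

```python
def hamming(x, y):
    """Number of positions on which the 0/1 lists x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def distinct_elements(x, y):
    """F_0 of the stream s_1 ∘ s_2 built from x and y."""
    s1 = {j for j, bit in enumerate(x, start=1) if bit}
    s2 = {j for j, bit in enumerate(y, start=1) if bit}
    return len(s1 | s2)

def hamming_from_f0(f0, weight_x, weight_y):
    """Rearranged identity: Ham(x, y) = 2 F_0 - |x| - |y|."""
    return 2 * f0 - weight_x - weight_y

x = [0, 0, 1, 1, 0, 0]
y = [1, 0, 1, 0, 1, 0]
f0 = distinct_elements(x, y)
print(f0, hamming(x, y), hamming_from_f0(f0, sum(x), sum(y)))
```

The identity holds because $F_0 = |x| + |y| - |x \cap y|$ and $\mathrm{Ham}(x, y) = |x| + |y| - 2|x \cap y|$ when $x$ and $y$ are read as sets.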