Streaming Set Cover Amit Chakrabarti Dartmouth College Joint work - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Streaming Set Cover

Amit Chakrabarti

Dartmouth College

Joint work with A. Wirth

Sublinear Algorithms Workshop JHU, Jan 2016

slide-2
SLIDE 2

Combinatorial Optimisation Problems

• 1950s, 60s: Operations research
• 1970s, 80s: NP-hardness
• 1990s, 2000s: Approximation algorithms, hardness of approximation
• 2010s: Space-constrained settings, e.g., streaming

slide-3
SLIDE 3

Set Cover

slide-6
SLIDE 6

Set Cover with Sets Streamed

• Input: stream of m sets, each ⊆ [n]
• Goal: cover universe [n] using as few sets as possible

  • Use sublinear (in m) space
  • Ideally O(n polylog n) ... “semi-streaming”
  • Need Ω(n log n) space to certify: for each item, who covered it?

Think m ≥ n

slide-9
SLIDE 9

Background and Related Work

Offline results:

• Best possible poly-time approx (1 ± o(1)) ln n [Johnson’74] [Slavík’96] [Lund-Yannakakis’94] [Dinur-Steurer’14]
• Simple greedy strategy gets ln n-approx:
  • Repeatedly add set with highest contribution
  • Contribution := number of new elements covered
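
A minimal Python sketch of the offline greedy strategy in the bullets above. The function name `greedy_set_cover`, the 0-based encoding of the universe, and the small example instance are assumptions made for illustration; they do not come from the slides.

```python
def greedy_set_cover(n, sets):
    """Offline greedy: repeatedly pick the set covering the most new elements.

    n    -- universe is {0, ..., n-1} (an assumed 0-based encoding of [n])
    sets -- list of Python sets, each a subset of the universe
    Returns the indices of the chosen sets, in the order chosen.
    """
    uncovered = set(range(n))
    chosen = []
    while uncovered:
        # Contribution := number of new (still uncovered) elements in the set.
        best = max(range(len(sets)), key=lambda i: len(sets[i] & uncovered))
        if not sets[best] & uncovered:
            raise ValueError("the given sets do not cover the universe")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen


# Hypothetical instance: 5 sets over the universe {0, ..., 5}.
example = [{0, 1, 2, 3}, {0, 1}, {2, 3, 4}, {4, 5}, {5}]
print(greedy_set_cover(6, example))  # prints [0, 3]
```

Each round re-scans every set, so this sketch needs random access to the whole input; that is exactly what a streaming algorithm cannot afford.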

Streaming results:

• One pass semi-streaming O(√n) approx
• This is best possible in one semi-streaming pass [Emek-Rosén’14]
• O(log n) semi-streaming passes allow O(log n) approx [Saha-Getoor’09] [Cormode-Karloff-Wirth’10]
• There’s more: wait till the end! [Nisan’02] [Demaine-Indyk-Mahabadi-Vakilian’14] [Indyk-M-V’16]

slide-10
SLIDE 10

Related Work: In Greater Detail

Algorithms using p passes, S space, giving α-approximation

Upper bounds:

• p = 1, S = Õ(n), α = O(√n) [Emek-Rosén’14]
• p = O(log n), S = Õ(n), α = O(log n) [Cormode-Karloff-Wirth’10]
• S = Õ(m n^{1/Ω(log p)}), α = O(p) [Demaine-Indyk-Mahabadi-Vakilian’14]
• S = Õ(m n^{1/Ω(p)}), α = O(p) [Indyk-Mahabadi-Vakilian’16]

Lower bounds:

• p = 1, S = Õ(n) ⇒ α = Ω(n^{1/2−δ}) [Emek-Rosén’14]
• α < (1/2) log_2 n ⇒ S = Ω(m) [Nisan’02]
• α = O(1), deterministic ⇒ S = Ω(mn) [Demaine-I-M-V’14]
• α = 1 ⇒ S = Ω̃(n^{1+1/(2(p+1))}) [Indyk-Mahabadi-Vakilian’16]
• p = 1, α = 3/2 ⇒ S = Ω(mn) [Indyk-Mahabadi-Vakilian’16]

slide-11
SLIDE 11

Our Results

Upper bound

• With p passes, semi-streaming space, get O(n^{1/(p+1)})-approx
• Algorithm giving this approx based on very simple heuristic
• Deterministic

Lower bound

• Randomised
• In p passes, semi-streaming space, need Ω(n^{1/(p+1)}/p^2) approx
• Upper bound tight for all constant p
• Semi-streaming O(log n) approx requires Ω(log n / log log n) passes
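
The pass lower bound in the last bullet follows from the approximation lower bound by a short calculation. The sketch below is a reconstruction, not taken from the slides: the constants c and c' are introduced here for illustration, and p ≤ log n is assumed, since otherwise there is nothing to prove.

```latex
% Suppose a p-pass semi-streaming algorithm achieves approximation at most c \log n.
% The lower bound \Omega(n^{1/(p+1)}/p^2) then forces
\begin{align*}
c'\,\frac{n^{1/(p+1)}}{p^{2}} &\le c \log n
  && \text{(lower bound vs.\ achieved approximation)} \\
n^{1/(p+1)} &\le \frac{c}{c'}\, p^{2} \log n \le \frac{c}{c'} \log^{3} n
  && \text{(using } p \le \log n\text{)} \\
\frac{\log n}{p+1} &\le 3 \log\log n + O(1)
  && \text{(taking logarithms)} \\
p + 1 &\ge \frac{\log n}{3 \log\log n + O(1)} = \Omega\!\left(\frac{\log n}{\log\log n}\right).
\end{align*}
```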

slide-12
SLIDE 12

Progressive Greedy Algorithm

Recall simple greedy:

• Repeatedly add set with highest contribution
• Contribution := number of new elements covered

Progressive greedy:

• In first pass, add all sets with contribution ≥ n^{1−1/p}
• In second pass, add all sets with contribution ≥ n^{1−2/p}
• ...
• In pth pass, add all sets with contribution ≥ 1

slide-13
SLIDE 13

Progressive Greedy Algorithm

procedure GreedyPass(stream σ, threshold τ, set Sol, array Coverer)
    for each set S_i in σ do
        C ← {x : Coverer[x] ≠ 0}              ▷ the already covered elements
        if |S_i \ C| ≥ τ then                 ▷ set’s contribution ≥ threshold
            Sol ← Sol ∪ {i}
            for each x ∈ S_i \ C do Coverer[x] ← i

procedure ProgGreedyNaive(stream σ, integer n, integer p ≥ 1)
    Coverer[1 . . . n] ← 0^n; Sol ← ∅
    for j = 1 to p do GreedyPass(σ, n^{1−j/p}, Sol, Coverer)
    output Sol, Coverer
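
The pseudocode above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: `make_stream` is a hypothetical callable that returns a fresh iterator over `(index, set)` pairs, modelling one pass over the stream; elements and set indices are 0-based, and `None` (instead of 0) marks an uncovered element.

```python
def greedy_pass(stream, threshold, sol, coverer):
    """One pass: admit every set whose contribution is at least the threshold."""
    for i, s in stream:
        # Contribution: elements of s that no admitted set covers yet.
        new = [x for x in s if coverer[x] is None]
        if len(new) >= threshold:
            sol.append(i)
            for x in new:
                coverer[x] = i


def prog_greedy_naive(make_stream, n, p):
    """Naive progressive greedy: p passes with thresholds n^(1 - j/p), j = 1..p."""
    coverer = [None] * n  # coverer[x] = index of the set that covered x
    sol = []
    for j in range(1, p + 1):
        # Floating-point power; for j = p the threshold is exactly 1.0.
        greedy_pass(make_stream(), n ** (1 - j / p), sol, coverer)
    return sol, coverer


# Hypothetical instance with n = 16 and p = 2 (thresholds 4 and 1).
sets = [
    {0, 1, 2, 3, 4, 5, 6, 7},
    {6, 7, 8},
    {8, 9, 10, 11, 12},
    {13, 14},
    {15},
    {13, 15},
]
sol, coverer = prog_greedy_naive(lambda: enumerate(sets), 16, 2)
print(sol)  # prints [0, 2, 3, 4]
```

Only `sol` and `coverer` persist between sets. Every admitted set covers at least one new element, so `sol` never holds more than n indices, which keeps the state within the semi-streaming budget.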
slide-17
SLIDE 17

Progressive Greedy: Analysis Idea

Consider p = 2 passes

• First pass: admit sets iff contribution ≥ √n
• Thus, first pass adds at most √n sets to Sol
• Second pass: Opt covers remaining items with sets of contrib ≤ √n
• Thus, Sol will cover the same using ≤ √n|Opt| sets

But wait, this uses two passes for O(√n) approx!

• Logic of last pass especially simple: add set if positive contrib
• Can fold this into previous one

Final result: p passes, O(n^{1/(p+1)})-approx
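
The two-pass argument above can be written out as a short chain of inequalities. This is a reconstruction of the sketched idea for the naive two-pass version only. R, introduced here for illustration, denotes the set of elements still uncovered after the first pass. Every set rejected in the first pass had fewer than √n uncovered elements when it was seen, and admitted sets leave none of their elements in R, so every set meets R in fewer than √n elements.

```latex
\begin{align*}
\#\{\text{sets added in pass 1}\} &\le \frac{n}{\sqrt{n}} = \sqrt{n}
  && \text{(each covers at least } \sqrt{n} \text{ new elements)} \\
|R| &\le \sum_{S \in \mathrm{Opt}} |S \cap R| \le \sqrt{n}\,|\mathrm{Opt}|
  && \text{(Opt covers } R\text{)} \\
\#\{\text{sets added in pass 2}\} &\le |R| \le \sqrt{n}\,|\mathrm{Opt}|
  && \text{(each covers at least one new element)} \\
|\mathrm{Sol}| &\le \sqrt{n} + \sqrt{n}\,|\mathrm{Opt}| \le 2\sqrt{n}\,|\mathrm{Opt}|
  && \text{(since } |\mathrm{Opt}| \ge 1\text{)}
\end{align*}
```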

slide-20
SLIDE 20

Lower Bound Idea: One Pass

Reduce from index: Alice gets x ∈ {0, 1}^n, Bob gets j ∈ [n], Alice talks to Bob, who must determine x_j. Requires Ω(n)-bit message. [Ablayev’96]

[Figure: universe F_q^2, so n = q^2; labels: Alice’s sets, Bob’s set]

If Alice has Bob’s missing line, then |Opt| = 2, else |Opt| ≥ q

So Θ(√n) approx requires Ω(#lines) = Ω(q^2) = Ω(n) space
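
The gap between |Opt| = 2 and |Opt| ≥ q rests on a standard fact about the affine plane: two distinct lines of F_q^2 share at most one point, so a missing line with q points cannot be covered cheaply by other lines. The sketch below checks this by brute force. It assumes q is prime, so that arithmetic mod q is a field, and the function name `affine_lines` is made up for illustration.

```python
from itertools import combinations


def affine_lines(q):
    """All lines of the affine plane F_q^2 (q prime), as frozensets of points."""
    lines = set()
    # Non-vertical lines y = a*x + b.
    for a in range(q):
        for b in range(q):
            lines.add(frozenset((x, (a * x + b) % q) for x in range(q)))
    # Vertical lines x = c.
    for c in range(q):
        lines.add(frozenset((c, y) for y in range(q)))
    return lines


q = 5
lines = affine_lines(q)
# Largest intersection between two distinct lines.
max_overlap = max(len(l1 & l2) for l1, l2 in combinations(lines, 2))
print(len(lines), max_overlap)  # prints 30 1
```

With q^2 + q lines in total, the number of lines is Θ(q^2) = Θ(n), matching the Ω(#lines) space bound on the slide.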

slide-22
SLIDE 22

Next Steps

Goal: p semi-streaming passes require Ω(n^{1/(p+1)}) approx

• Handle more passes
  • Can’t start from index, need harder communication problem
• Increase space bound
  • Need ω(n) to rule out semi-streaming
slide-23
SLIDE 23

Tree Pointer Jumping

Multiplayer game tpj_{p+1,t} defined on complete (p + 1)-level t-ary tree

• Pointer to child at each internal level-i node (known to Player i)
• Bit at each leaf node (known to Player 1)
• Goal: output (whp) bit reached by following pointers from root

Model: p rounds of communication
Each round: player 1, player 2, . . . , player p+1

[Figure: example tree with leaf bits 1 0 0 1 1 1 1; levels labelled Level 1, Level 2, Level 3]

Theorem: Longest message is Ω(t/p^2) bits [C.-Cormode-McGregor’08]
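
To make the goal of the game concrete, here is a small Python sketch of the function the players must compute. The nested-tuple tree representation and the name `tpj_output` are assumptions for illustration. The code sees the whole input at once, so it says nothing about communication cost, which is where the difficulty of the problem lies.

```python
def tpj_output(node):
    """Follow pointers from the root down to a leaf and return the leaf's bit.

    An internal node is a pair (pointer, children); a leaf is an int bit.
    """
    while isinstance(node, tuple):
        pointer, children = node
        node = children[pointer]
    return node


# A complete 3-level binary tree (p = 2, t = 2); pointers and bits are made up.
root = (1, [(0, [1, 0]), (1, [0, 1])])
print(tpj_output(root))  # prints 1
```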

slide-28
SLIDE 28

Multi-Pass Set Cover: First Attempt

Two passes, reducing from tpj_{3,t}, using universe F_q^3 (so n = q^3)

• Three players: Alice, Bob, Carol
  • Alice encodes leaf bits: lines in F_q^3
  • Bob encodes lower pointers: planes in F_q^3 with a line deleted
  • Carol encodes root pointer: F_q^3 with a plane deleted
• (Carol set) ∪ (corresp. Bob set) = F_q^3 \ (a line)
• If Alice has the missing line, then |Opt| = 3, else |Opt| ≥ q   (*)

How good is this?

• Each pointer encoded by Bob can choose from only as many leaves as there are lines in a specific plane ⟹ t = Θ(q^2) = Θ(n^{2/3})
• Implies space Ω(n^{2/3}) for approx < q/3 = Θ(n^{1/3})

slide-30
SLIDE 30

Insight

[Figure: three-level tree with leaf bits 1 0 0 1 1 1 1. Carol: space minus a plane; Bob: plane minus a line; Alice: line. With higher degree: space minus a surface; quadric surface minus a curve; cubic curve.]

Too few lines in a plane... increase the degree!

slide-34
SLIDE 34

Edifices

Basic idea: Generalise affine plane to high-rank Buekenhout geometry

[Figure: tree with node u above node v and leaf z; labels: pointer encoded as X_u \ X_v; X_root = (F_q)^{p+1}; |X_leaf| ≥ q; |X_z ∩ X_v| ≤ 2p]

• Universe F_q^{p+1}
• Variety X_u at node u
• u above v ⟹ X_u ⊇ X_v
• Leaf z with bit = 1 encoded as set X_z
• If player 1 has the missing variety, then |Opt| = p + 1, else |Opt| ≥ q/(2p)

slide-37
SLIDE 37

Construction of an Edifice

Basic idea: Varieties at leaves are low-degree curves, at level 2 they are low-degree surfaces, and so on.

Concern: Determining “cardinality” of algebraic variety over finite field is the stuff of difficult mathematics.

Our Solution: Define varieties using equations of special format

• Coordinates (x, y_1, y_2, . . . , y_p)
• Equation at each edge of tree; at level i:
  y_i = a_1 y_1 + · · · + a_{i−1} y_{i−1} + a_i f_{p+1−i}(x), where f_j(x) = monic poly in F_q[x] of degree p + j
• Variety X_u defined by equations on root-to-u path

Cardinality bound via much simpler mathematics.

• Schwartz-Zippel lemma
• Linear independence arguments via row reduction

slide-38
SLIDE 38

Recap: Related Work and Our Results

Upper bounds (p passes, S space, α-approximation):

• p = 1, S = Õ(n), α = O(√n) [Emek-Rosén’14]
• p = O(log n), S = Õ(n), α = O(log n) [Cormode-Karloff-Wirth’10]
• S = Õ(m n^{1/Ω(log p)}), α = O(p) [Demaine-Indyk-Mahabadi-Vakilian’14]
• S = Õ(m n^{1/Ω(p)}), α = O(p) [Indyk-Mahabadi-Vakilian’16]
• S = Õ(n), α = O(p n^{1/(p+1)}) [this work]

Lower bounds:

• p = 1, S = Õ(n) ⇒ α = Ω(n^{1/2−δ}) [Emek-Rosén’14]
• α < (1/2) log_2 n ⇒ S = Ω(m) [Nisan’02]
• α = O(1), deterministic ⇒ S = Ω(mn) [Demaine-I-M-V’14]
• α = 1 ⇒ S = Ω̃(n^{1+1/(2(p+1))}) [Indyk-Mahabadi-Vakilian’16]
• p = 1, α = 3/2 ⇒ S = Ω(mn) [Indyk-Mahabadi-Vakilian’16]
• S = Õ(n) ⇒ α = Ω(n^{1/(p+1)}/p^2) [this work]

slide-39
SLIDE 39

Final Remarks

Combinatorial optimisation: old topic but relatively new territory for data stream algorithms

• Potential for many new research questions
• Fuller understanding of possible tradeoffs for set cover?