Towards a Theory of Parameterized Streaming Algorithms
Graham Cormode, Rajesh Chitnis
We increasingly have to deal with huge graphs…
- Facebook graph: $10^9$ nodes
- Brain graph: $10^9$ nodes
- Web graph: $2^{32}$ nodes
- Google Maps in USA: $10^8$ intersection nodes
- It is inconvenient or impossible to store the whole input for random access
- “Solved” problems become hard under different models of data access
- E.g., external memory, MapReduce, streaming, …
- The paradigm of streaming algorithms is one attempt to deal with Big Data
- The streaming model (for graphs) is as follows:
- The vertex set $V = \{1, 2, \dots, n\}$ is fixed and known in advance
- The edges arrive one by one (in arbitrary order)
- For each edge arrival, we need to make a (fast) decision about what information to store
- We cannot (or do not want to) store all the edges
- We allow unbounded computation at end of the stream
- Which graph problems can we solve efficiently in this model?
- The naïve algorithm for any graph problem uses $O(n^2)$ bits by storing the whole adjacency matrix
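As a minimal sketch of this naïve baseline (the class name `NaiveStreamingGraph` is hypothetical, and vertices are relabelled $0, \dots, n-1$), the algorithm packs the adjacency matrix into $n^2$ bits, sets bits on each edge arrival, and runs arbitrary post-processing (here, triangle detection) once the stream ends:

```python
from itertools import combinations


class NaiveStreamingGraph:
    """Hypothetical naive streaming algorithm: keeps the full n x n adjacency matrix."""

    def __init__(self, n: int) -> None:
        self.n = n
        # n * n bits, packed 8 per byte.
        self.bits = bytearray((n * n + 7) // 8)

    def _set(self, idx: int) -> None:
        self.bits[idx // 8] |= 1 << (idx % 8)

    def _get(self, idx: int) -> int:
        return (self.bits[idx // 8] >> (idx % 8)) & 1

    def insert(self, u: int, v: int) -> None:
        """Process one edge arrival: set both symmetric matrix entries."""
        self._set(u * self.n + v)
        self._set(v * self.n + u)

    def has_edge(self, u: int, v: int) -> bool:
        return bool(self._get(u * self.n + v))

    def has_triangle(self) -> bool:
        """Post-processing after the stream ends (computation here is unbounded)."""
        return any(
            self.has_edge(a, b) and self.has_edge(b, c) and self.has_edge(a, c)
            for a, b, c in combinations(range(self.n), 3)
        )


g = NaiveStreamingGraph(5)
for u, v in [(0, 1), (1, 2), (2, 3), (0, 2)]:
    g.insert(u, v)
print(g.has_triangle())  # True: vertices 0, 1, 2 form a triangle
```

Any other graph property could replace `has_triangle`; the point is that storage is always $\Theta(n^2)$ bits regardless of the problem.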
- Recall that the naïve algorithm for any graph problem uses $O(n^2)$ bits
- Bad news: many graph problems have a lower bound of $\Omega(n^2)$ space in the streaming model
- E.g., does the given graph contain a triangle?
- Lower bounds for streaming algorithms typically use communication complexity
- INDEX problem: Alice has a string $X \in \{0,1\}^N$, Bob has an index $i \in [N]$, and Bob wants to find the $i$-th bit of $X$
- There is a lower bound of $\Omega(N)$ bits if Alice can send only one message to Bob, even with randomization
- Communication complexity reductions: show that a streaming algorithm would solve INDEX (Alice runs the algorithm on her part of the stream and sends its memory state to Bob as the message, so the algorithm's space must be at least the communication lower bound)
[Figure: one-way communication from Alice ($X = 10010110$) to Bob ($i = 5$)]
- Sketch of a simple INDEX reduction for triangle detection:
- Alice adds edges between $Y$ and $Z$ according to her string $X$
- Then she sends her data structure to Bob
- Bob has an index $I \in [N]$ corresponding to some $(j, \ell) \in [r] \times [r]$
- Bob adds a new vertex $s$ and the edges $(s, y_j)$ and $(s, z_\ell)$
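- Let $N = r^2$
[Figure: vertex sets $Y = \{y_1, \dots, y_r\}$ and $Z = \{z_1, \dots, z_r\}$, with Bob's new vertex $s$ adjacent to $y_j$ and $z_\ell$]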
The resulting graph has a triangle iff the edge $(y_j, z_\ell)$ is present (the graph on $Y \cup Z$ is bipartite, so any triangle must use $s$), i.e., iff the $I$-th bit of $X$ is 1
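The reduction can be simulated end to end on a toy instance. This sketch is illustrative only: an edge set stands in for an arbitrary streaming algorithm, the bijection between bit positions and pairs $(j, \ell)$ is assumed to be row-major with 0-based indices, and all function names are hypothetical. The key point is that Bob resumes the algorithm from Alice's memory state, so that state is the one-way message.

```python
from itertools import combinations
from typing import FrozenSet, Iterator, List, Set, Tuple

Vertex = Tuple[str, int]  # ("y", j), ("z", l) or ("s", 0)
Edge = Tuple[Vertex, Vertex]


def alice_stream(x: List[int], r: int) -> Iterator[Edge]:
    """Alice's prefix of the stream: edge (y_j, z_l) iff her bit at phi^{-1}(j, l) is 1."""
    for idx, bit in enumerate(x):
        if bit:
            j, l = divmod(idx, r)  # assumed bijection phi: row-major order
            yield (("y", j), ("z", l))


def bob_stream(index: int, r: int) -> Iterator[Edge]:
    """Bob's suffix: a new vertex s joined to y_j and z_l, where phi(index) = (j, l)."""
    j, l = divmod(index, r)
    yield (("s", 0), ("y", j))
    yield (("s", 0), ("z", l))


def has_triangle(edges: Set[FrozenSet[Vertex]]) -> bool:
    vertices = {v for e in edges for v in e}
    return any(
        {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= edges
        for a, b, c in combinations(vertices, 3)
    )


def index_via_streaming(x: List[int], index: int, r: int) -> int:
    """Answer INDEX using only a streaming algorithm's memory as the message."""
    state: Set[FrozenSet[Vertex]] = set()  # stand-in algorithm: stores every edge
    for u, v in alice_stream(x, r):
        state.add(frozenset((u, v)))
    # One-way message: Alice hands `state` to Bob, who resumes the stream from it.
    for u, v in bob_stream(index, r):
        state.add(frozenset((u, v)))
    return int(has_triangle(state))
```

Because any correct triangle-detection algorithm could replace the stand-in, its memory must be at least the $\Omega(N) = \Omega(r^2)$ communication bound.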
- Bad news: many graph problems require $\Omega(n^2)$ space in the streaming model
- How can we cope with this (space) intractability?
[Figure: BIG Time vs. BIG Data]
- Feigenbaum et al. [ICALP '04]: finding (the size of) a minimum vertex cover (VC) needs $\Omega(n^2)$ space
- But how much space does $k$-VC (deciding whether there is a VC of size at most $k$) need?
- We design a streaming algorithm using $O(k \cdot \log n)$ bits (with $2^k$ passes over the input)
- Essentially, this is the standard branching FPT algorithm implemented in the streaming model…
Fine-grained understanding via parameterized analysis
- Streaming algorithm for $k$-VC with $O(k \cdot \log n)$ bits and $2^k$ passes
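[Figure: search tree of depth $k$ rooted at $G$; at each node an uncovered edge $e = x_iy_i$ is picked and the tree branches into deleting $x_i$ or $y_i$, e.g. $G \to G - y_1, G - x_1$, then $G - y_1 - x_3$, $G - x_1 - y_2$, …]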
- Consider all $2^k$ binary strings in $\{0,1\}^k$, one in each pass
- The binary branching (search) tree has $2^k$ leaves
- Each pass corresponds to a root-to-leaf path in the tree
- 0 for the left branch, and 1 for the right branch
- The algorithm stores only the current binary string and the corresponding VC
- Storage is $O(k \cdot \log n)$ bits: $k$ bits for the string plus at most $k$ vertex IDs of $O(\log n)$ bits each
- This is optimal if you also want to output a VC!
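A minimal Python sketch of the pass structure, assuming the stream can be replayed as a list of edges; the function names are hypothetical. Within one pass, each edge not yet covered by the partial cover consumes the next bit of the string to decide which endpoint to take. If a vertex cover $C$ with $|C| \le k$ exists, the string that always picks an endpoint lying in $C$ never exceeds the budget, so some pass succeeds.

```python
from itertools import product
from typing import Iterable, List, Optional, Sequence, Tuple

Edge = Tuple[int, int]


def one_pass(stream: Iterable[Edge], bits: Sequence[int], k: int) -> Optional[List[int]]:
    """One pass following the root-to-leaf path encoded by `bits`."""
    cover: List[int] = []  # at most k vertex IDs: O(k log n) bits
    used = 0               # number of branching decisions taken so far
    for u, v in stream:
        if u in cover or v in cover:
            continue       # edge already covered by the partial VC
        if used == k:
            return None    # uncovered edge but budget exhausted: this leaf fails
        cover.append(u if bits[used] == 0 else v)  # 0 = left branch, 1 = right
        used += 1
    return cover


def streaming_k_vc(edges: Sequence[Edge], k: int) -> Optional[List[int]]:
    """Try all 2^k strings, one pass over the (replayed) stream per string."""
    for bits in product((0, 1), repeat=k):
        cover = one_pass(iter(edges), bits, k)
        if cover is not None:
            return cover
    return None
```

For example, on the path `[(0, 1), (1, 2)]` with `k = 1`, the string `(0,)` fails and the string `(1,)` returns the cover `[1]`.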
Streaming implementation of the FPT algorithm via iterative compression: a $(k \cdot 2^k)$-pass streaming algorithm for $k$-VC that uses $O(k \cdot \log n)$ bits
Reducing the number of passes: Chitnis et al. [SODA '15] designed a 1-pass streaming algorithm for $k$-VC using $O(k^2 \cdot \log n)$ bits
Towards a general theory of (space) parameterized streaming algorithms…
- FPS (Fixed-Parameter Streaming): space $f(k) \cdot \log n$
- SubPS (sublinear dependence on the input size $n$): space $f(k) \cdot n^{1-\varepsilon} \cdot \log n$ for some $\varepsilon > 0$
- LinPS (linear dependence on the input size $n$): space $f(k) \cdot n \cdot \log n$
- BrutePS (naïvely storing the whole graph): space $O(n^2)$
Goal: Develop algorithms and lower bounds to categorize graph problems in this hierarchy
[Figure: the hierarchy populated with example problems: $k$-Vertex-Cover, $k$-MaxMatching; $k$-Path, $k$-FVS, $k$-Treewidth; $k$-Girth, $k$-Clique, $k$-Dominating-Set; 1.5-approx. for MaxMatching]
We study all problems, not just NP-hard ones!
The picture is a bit more complicated: any entry in this landscape is really a 6-tuple
[Problem, Parameter, Approximation Ratio, Type of Stream, Type of Algorithm, # of passes]
- Type of stream: insertion-only or insertion-deletion; type of algorithm: deterministic or randomized
Tight problems for the class LinPS via simple upper bounds
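- Store all edges until we have seen $O(k \cdot n)$ edges; hence this needs $O(k \cdot n \cdot \log n)$ bits
- This suffices for $k$-Path, $k$-FVS, and $k$-Treewidth, because beyond this edge threshold the answer is already determined:
- $k$-Path: if $|E| \ge k \cdot n$, then there is a $k$-path
- $k$-FVS: if there is an FVS of size $k$, then $|E| \le (k+1) \cdot n$
- $k$-Treewidth: if the treewidth is $\le k$, then $|E| \le k \cdot n$
- These problems need $\Omega(n \cdot \log n)$ space (for constant $k$); hence they are not in SubPS
- This rules out any algorithm using space $f(k) \cdot o(n \cdot \log n)$ for any function $f$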
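As an illustration for $k$-FVS, here is a sketch with hypothetical names, assuming a simple graph on vertices $0, \dots, n-1$. It keeps at most $(k+1) \cdot n$ edges: a graph with an FVS $S$ of size at most $k$ has at most $\binom{k}{2} + k(n-k)$ edges touching $S$ plus fewer than $n - k$ forest edges, which is strictly below $(k+1) \cdot n$. Post-processing is an exponential brute force over vertex sets, which is allowed since computation at the end of the stream is unbounded.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def is_forest_after_removal(n: int, edges: List[Edge], removed: Set[int]) -> bool:
    parent = list(range(n))  # union-find over vertices 0..n-1

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        if u in removed or v in removed:
            continue
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge closes a cycle
        parent[ru] = rv
    return True


def streaming_k_fvs(stream: Iterable[Edge], n: int, k: int) -> bool:
    threshold = (k + 1) * n  # a YES instance has strictly fewer edges than this
    stored: List[Edge] = []  # never more than `threshold` edges: O(k n log n) bits
    count = 0
    for u, v in stream:
        count += 1
        if count <= threshold:
            stored.append((u, v))
    if count >= threshold:
        return False  # too many edges: no FVS of size <= k can exist
    # Unbounded post-processing: brute force over all vertex sets of size <= k.
    return any(
        is_forest_after_removal(n, stored, set(s))
        for size in range(k + 1)
        for s in combinations(range(n), size)
    )
```

The same "store up to a threshold, otherwise the answer is known" pattern applies to $k$-Path and $k$-Treewidth with their respective edge bounds.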
- Hardness reduction: a “small”-space streaming algorithm for 6-Path ⇒ a one-way communication protocol for PERMUTATION of “small” cost
- PERMUTATION problem:
Alice has a permutation $\sigma: [N] \to [N]$ encoded as a bit string of length $N \cdot \log N$. Bob has an index $I \in [N \cdot \log N]$ and wants to find the $I$-th bit of $\sigma$
- Sun and Woodruff [APPROX '15]: $\Omega(N \cdot \log N)$ bits of one-way communication are needed
$\Omega(n \cdot \log n)$ bit lower bound for $k$-Path with $k = 6$
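- Alice adds edges between $Y$ and $Z$ according to the permutation $\sigma$
- For each $i \in [N]$ she adds an edge from $y_i$ to $z_{\sigma(i)}$
- Bob's index $I \in [N \cdot \log N]$ maps to the $\ell$-th bit of $\sigma(j)$ for some $j, \ell$
- Bob adds a new vertex $s$ and the edge $(s, y_j)$
- Let $S_\ell = \{z_{\sigma(r)} : \text{the } \ell\text{-th bit of } \sigma(r) \text{ is } 1\}$ (Bob can compute $S_\ell$ without knowing $\sigma$, since it equals $\{z_m : \text{the } \ell\text{-th bit of } m \text{ is } 1\}$)
- Bob adds a new vertex $t$ and edges from $t$ to each vertex of $S_\ell$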
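[Figure: vertex sets $Y = \{y_1, \dots, y_N\}$ and $Z = \{z_{\sigma(1)}, \dots, z_{\sigma(N)}\}$ joined by a perfect matching, with Bob's vertices $s$ and $t$]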
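The resulting graph has a 6-path iff $z_{\sigma(j)} \in S_\ell$, i.e., iff the $I$-th bit of $\sigma$ is 1: in that case $s, y_j, z_{\sigma(j)}, t, z_m, y_{\sigma^{-1}(m)}$ is a path on 6 vertices for any other $z_m \in S_\ell$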
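The correctness claim of the gadget can be checked by brute force on small instances. This is a sketch under stated assumptions: values of $\sigma$ are $0, \dots, N-1$ with $N$ a power of two and $N \ge 4$ (so each $S_\ell$ has at least two vertices), the index $I$ is split as $(j, \ell) = (\lfloor I / \log N \rfloor, I \bmod \log N)$, bit $\ell = 0$ is the least significant bit, and a $k$-path means a simple path on $k$ vertices. All function names are hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple

Vertex = Tuple[str, int]  # ("y", j), ("z", m), ("s", 0) or ("t", 0)
Graph = Dict[Vertex, Set[Vertex]]


def build_instance(sigma: List[int], index: int, log_n: int) -> Graph:
    """Combine Alice's and Bob's edges for the 6-Path gadget (0-based indices)."""
    adj: Graph = {}

    def add(u: Vertex, v: Vertex) -> None:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Alice: a perfect matching y_j -- z_{sigma(j)}.
    for j, target in enumerate(sigma):
        add(("y", j), ("z", target))
    # Bob: index -> (j, l); assumed split, bit 0 = least significant.
    j, l = divmod(index, log_n)
    add(("s", 0), ("y", j))
    # S_l = {z_m : bit l of m is 1}; Bob needs no knowledge of sigma for this.
    for m in range(len(sigma)):
        if (m >> l) & 1:
            add(("t", 0), ("z", m))
    return adj


def has_path_on(adj: Graph, k: int) -> bool:
    """Brute-force search for a simple path with exactly k vertices."""

    def extend(path: List[Vertex], seen: Set[Vertex]) -> bool:
        if len(path) == k:
            return True
        return any(
            extend(path + [w], seen | {w}) for w in adj[path[-1]] if w not in seen
        )

    return any(extend([v], {v}) for v in adj)


def reduction_is_correct(n: int, log_n: int, seed: int = 0) -> bool:
    """Check '6-path exists <=> queried bit of sigma is 1' for every index."""
    sigma = list(range(n))
    random.Random(seed).shuffle(sigma)
    return all(
        has_path_on(build_instance(sigma, i, log_n), 6)
        == bool((sigma[i // log_n] >> (i % log_n)) & 1)
        for i in range(n * log_n)
    )
```

When the bit is 0, the component of $s$ has only 3 vertices and the longest path through $t$ has 5 vertices, so no 6-path exists. The gadget has $2N + 2$ vertices, so $n = \Theta(N)$ and the $\Omega(N \log N)$ communication bound becomes an $\Omega(n \log n)$ space bound.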
Tight problems for the class BrutePS
How do we show a problem does not belong to the smaller class LinPS?
- Show an $\Omega(n^2)$-bit lower bound for constant $k$
- This rules out any algorithm using space $f(k) \cdot o(n^2)$
- The next slide gives the proof for 3-Girth…
Note that $k$-Girth is polynomial-time solvable, but hard in terms of space!
$k$-Girth, $k$-Clique, $k$-Dominating-Set
INDEX problem: Alice has a string $X \in \{0,1\}^N$ and Bob has an index $I \in [N]$; Bob wants to find the $I$-th bit of $X$. This requires $\Omega(N)$ bits of one-way communication from Alice to Bob.
$\Omega(n^2)$ bits lower bound for checking if the girth of a graph is $\le 3$
- Same setup as before:
- Let $N = r^2$ and fix a bijection $\varphi: [N] \to [r] \times [r]$
- Alice adds edges between $Y$ and $Z$ according to string $X$: the edge $(y_j, z_\ell)$ is present iff bit $\varphi^{-1}(j, \ell)$ of $X$ is 1
- Then she sends her data structure to Bob
- Bob's index $I \in [N]$ corresponds to $\varphi(I) = (j, \ell) \in [r] \times [r]$
- Bob adds a new vertex $s$ and the edges $(s, y_j)$ and $(s, z_\ell)$
- The lower bound of $\Omega(N)$ translates to $\Omega(n^2)$ for 3-Girth on graphs with $n$ vertices
The resulting graph has a triangle iff the edge $(y_j, z_\ell)$ is present (the graph on $Y \cup Z$ is bipartite, so any triangle must use $s$), i.e., iff the $I$-th bit of $X$ is 1
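Counting vertices makes the translation from $\Omega(N)$ to $\Omega(n^2)$ explicit: the graph consists of the $r$ vertices $y_1, \dots, y_r$, the $r$ vertices $z_1, \dots, z_r$, and Bob's extra vertex $s$.

```latex
\begin{aligned}
n &= |Y| + |Z| + |\{s\}| = 2r + 1, \\
N &= r^2 = \left(\tfrac{n-1}{2}\right)^2 = \Omega(n^2), \\
\text{space} &\geq \Omega(N) = \Omega(n^2) \ \text{bits}.
\end{aligned}
```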
Goal: Develop algorithms and lower bounds to categorize graph problems in this hierarchy
- The story so far…
- We can simulate parameterized techniques (branching, iterative compression, bidimensionality, etc.) in the streaming model
- We developed new lower bounds using communication complexity
- Beyond “standard” graph problems? Game theory, machine learning, etc.
- Connections with kernelization?
- Implement and evaluate these new parameterized streaming algorithms?
- Code for some of the $k$-VC algorithms is available at http://projects.csail.mit.edu/dnd/
[Figure: two-way flow of ideas between streaming (space) algorithms and parameterized (time) algorithms]
Looking forward…
Lower bounds inspired by kernel lower bounds
- Connections with Kernelization – a different (but related) data-compression model
- Kernelization versus streaming
- Polytime computation versus unbounded computation
- Full access to the input versus limited access to the input
- AND-compression: no polynomial kernel unless NP ⊆ coNP/poly
- New definition of AND-compatible, inspired by AND-compression
A problem Π is AND-compatible if there exists a constant $k \in \mathbb{N}$ such that
- for every $n \in \mathbb{N}$ there is a graph $G_{\mathrm{YES}}$ on $n$ vertices such that $\Pi(G_{\mathrm{YES}}, k)$ is a YES instance
- for every $n \in \mathbb{N}$ there is a graph $G_{\mathrm{NO}}$ on $n$ vertices such that $\Pi(G_{\mathrm{NO}}, k)$ is a NO instance
- for every $t \in \mathbb{N}$ we have $\Pi(G_1 \uplus G_2 \uplus \cdots \uplus G_t, k) = \bigwedge_{i=1}^{t} \Pi(G_i, k)$, where $\uplus$ denotes vertex-disjoint union
- Many natural graph problems are AND-compatible: $k$-coloring, $k$-treewidth, $k$-girth
- Our result: if a problem Π is AND-compatible, then it does not admit a streaming algorithm using space $f(k) \cdot o(n)$, for any function $f$.
- Unconditional, unlike kernel lower bounds
- Similar definition and result for OR-compatible
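To make the third condition concrete, here is a small Python check for $k = 2$ (2-coloring, i.e., bipartiteness), taking $G_{\mathrm{YES}}$ to be the edgeless graph and $G_{\mathrm{NO}}$ a triangle plus isolated vertices (so this particular choice needs $n \ge 3$). The helper names are hypothetical; the sketch only verifies the AND behaviour on disjoint unions of these two graphs, not the general theorem.

```python
from collections import deque
from itertools import product
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # adjacency sets


def is_two_colorable(g: Graph) -> bool:
    """BFS 2-coloring; returns False as soon as an edge joins equal colors."""
    color: Dict[int, int] = {}
    for start in g:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in g[u]:
                if w not in color:
                    color[w] = 1 - color[u]
                    queue.append(w)
                elif color[w] == color[u]:
                    return False
    return True


def disjoint_union(graphs: List[Graph]) -> Graph:
    """Relabel each graph onto a fresh block of integers and merge."""
    union: Graph = {}
    offset = 0
    for g in graphs:
        relabel = {v: offset + idx for idx, v in enumerate(sorted(g))}
        for v, nbrs in g.items():
            union[relabel[v]] = {relabel[w] for w in nbrs}
        offset += len(g)
    return union


def edgeless(n: int) -> Graph:
    return {v: set() for v in range(n)}


def with_triangle(n: int) -> Graph:
    """Triangle on vertices 0, 1, 2 plus isolated vertices (assumes n >= 3)."""
    g = edgeless(n)
    for a, b in [(0, 1), (1, 2), (0, 2)]:
        g[a].add(b)
        g[b].add(a)
    return g


def and_property_holds(n: int, t: int) -> bool:
    """Check Pi(G_1 + ... + G_t) == AND_i Pi(G_i) for every YES/NO choice of the G_i."""
    yes, no = edgeless(n), with_triangle(n)
    return all(
        is_two_colorable(disjoint_union(list(choice)))
        == all(is_two_colorable(g) for g in choice)
        for choice in product([yes, no], repeat=t)
    )
```

For instance, `and_property_holds(4, 3)` checks all $2^3$ combinations of YES and NO blocks of 4 vertices each.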
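- Proof sketch: consider $t$ graphs $G_1, G_2, \dots, G_t$, each having $n$ vertices
- Let $G$ be the disjoint union $G_1 \uplus G_2 \uplus \cdots \uplus G_t$
- By the pigeonhole principle, any correct algorithm for $G$ must use at least $t$ bits
- Otherwise two distinct subsets $I, J \subseteq [t]$ collide (lead to the same memory state); without loss of generality, let $i^* \in I \setminus J$
- Select $G_i = G_{\mathrm{YES}}$ for each $i \in (I \cup J) \setminus \{i^*\}$ and $G_{i^*} = G_{\mathrm{NO}}$
- This violates the correctness of the algorithm
- Hence $f(k) \cdot o(nt) \ge t$, since $G$ has $nt$ vertices
- This is a contradiction, since $k$ and $n$ are constants and we can take $t$ as large as we want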