Information Complexity and Applications
Mark Braverman, Princeton University and IAS
FoCM'17, July 17, 2017
Coding vs complexity: a tale of two theories
Coding: goal is data transmission; studies different channels; the "big" questions are answered with theorems ("BSC_{1/3} can transmit ≈ 0.052 trits per application").
Computational complexity: goal is computation; studies different models of computation; the "big" questions are conjectures ("One day, we'll prove EXP requires > n^3 NAND gates").
A key difference
- Information theory is a very effective language: it fits many coding situations perfectly.
- Shannon's channel coding theory is "continuous":
  – Turn the channel into a continuous resource;
  – Separate the communication channel from how it is used.
Theory of computation is “discrete”
- Von Neumann (~1948):
“…Thus formal logic is, by the nature of its approach, cut off from the best cultivated portions of mathematics, and forced onto the most difficult part of the mathematical terrain, into combinatorics. The theory of automata, … will have to share this unattractive property of formal logic. It will have to be, from the mathematical point of view, combinatorial rather than analytical.”
Overview
- Today: we will discuss extending the language of information theory to apply to problems in complexity theory.
Background: Shannon’s entropy
- Assume a lossless binary channel.
- A message X is distributed according to some prior μ.
- The inherent number of bits it takes to transmit X is given by its entropy H(X) = Σ_x μ[X = x] · log2(1/μ[X = x]).
[Diagram: Alice sends a message X ∼ μ to Bob over a lossless communication channel.]
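As a quick illustration (my own sketch, not from the talk), here is the entropy formula in Python; the distributions are made-up examples:

```python
import math

def entropy(mu):
    """Shannon entropy H(X) = sum_x mu[x] * log2(1/mu[x]), in bits."""
    return sum(p * math.log2(1 / p) for p in mu.values() if p > 0)

# A fair coin carries a full bit of uncertainty; a biased coin carries less.
print(entropy({"heads": 0.5, "tails": 0.5}))  # 1.0
print(entropy({"heads": 0.9, "tails": 0.1}))  # ~0.469
```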
Shannon’s Noiseless Coding Theorem
- The cost of communicating many copies of X scales as H(X).
- Shannon's source coding theorem:
  – Let C_n(X) be the cost of transmitting n independent copies of X. Then the amortized transmission cost is lim_{n→∞} C_n(X)/n = H(X).
- This operationalizes H(X).
- Example: sending a uniform trit U in {1,2,3}.
- Using the prefix-free encoding {0, 10, 11}, sending one trit U_1 costs C_1 = 5/3 ≈ 1.667 bits.
- Sending two trits (U_1 U_2) costs C_2 = 29/9 bits, using the encoding {000, 001, 010, 011, 100, 101, 110, 1110, 1111}. The cost per trit is 29/18 ≈ 1.611 < C_1.
- So C_1 + C_1 ≠ C_2.
H(X) is nicer than C_n(X)
- C_1 = 15/9, C_2 = 29/9.
- C_1 + C_1 ≠ C_2.
- The entropy H(U) = log2 3 ≈ 1.585.
- We have H(U_1 U_2) = log2 9 = H(U_1) + H(U_2).
- H(U) is additive over independent variables.
- C_n = n · log2 3 + o(n). (A quick numeric check follows.)
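A sketch checking the numbers above against the entropy bound, using the two encodings from the example:

```python
import math

# One uniform trit with the prefix-free code {0, 10, 11}:
cost_one = (1 + 2 + 2) / 3          # C_1 = 5/3 ~ 1.667 bits per trit

# Two trits with seven 3-bit codewords and two 4-bit codewords:
cost_two = (7 * 3 + 2 * 4) / 9      # C_2 = 29/9 bits per pair of trits

print(cost_one, cost_two / 2)       # 1.667 vs ~1.611 bits per trit
print(math.log2(3))                 # the entropy limit: ~1.585 bits per trit
```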
Today
- We will discuss generalizing information and coding theory to interactive computation scenarios: "using interaction over a channel to solve a computational problem".
- In computer science, the amount of communication needed to solve a problem is studied by the area of communication complexity.
Communication complexity [Yao’79]
- Considers functionalities requiring interactive computation.
- Focus on the two-party setting first.
[Diagram: Alice holds X, Bob holds Y; A & B implement a functionality F(X,Y), e.g. F(X,Y) = "X=Y?".]
Communication complexity
[Diagram: Alice holds X, Bob holds Y, and they have shared randomness R.]
Goal: implement a functionality F(X,Y). A protocol π(X,Y) computing F(X,Y) exchanges messages m1(X,R), m2(Y,m1,R), m3(X,m1,m2,R), …
Communication cost CC(π) = # of bits exchanged.
Communication complexity
- (Distributional) communication complexity with input distribution μ and error ε (error ≤ ε w.r.t. μ):
  CC(F, μ, ε) := min_{π : Pr_μ[π(X,Y) ≠ F(X,Y)] ≤ ε} CC(π).
- (Randomized/worst-case) communication complexity: CC(F, ε), with error ≤ ε on all inputs.
- Yao's minimax: CC(F, ε) = max_μ CC(F, μ, ε).
A tool for unconditional lower bounds on computation
- Streaming;
- Data structures;
- Distributed computing;
- VLSI design lower bounds;
- Circuit complexity;
- One of two main tools for unconditional lower bounds.
- Connections to other problems in complexity theory (e.g. hardness amplification).
Set disjointness and intersection
Alice and Bob are each given a set X ⊆ {1,…,n}, Y ⊆ {1,…,n} (these can be viewed as vectors in {0,1}^n).
- Intersection: Int_n(X,Y) = X ∩ Y.
- Disjointness: Disj_n(X,Y) = 1 if X ∩ Y = ∅, and 0 otherwise.
- A non-trivial theorem [Kalyanasundaram-Schnitger'87, Razborov'92]: CC(Disj_n, 1/4) = Ω(n).
- Exercise: solve Disj_n with error → 0 (say, 1/n) in 0.9n bits of communication. Can you do 0.6n? 0.4n?
Direct sum
- Int_n is just n copies of the 2-bit AND.
- ¬Disj_n is a disjunction of 2-bit ANDs.
- What is the connection between the communication cost of one AND and the communication cost of n ANDs?
- Understanding the connection between the hardness of a problem and the hardness of its pieces.
- A natural approach to lower bounds.
How does CC scale with copies?
- CC(F^n, μ^n, ε)/n → ?
- Recall: lim_{n→∞} C_n(X)/n = H(X).
- Information complexity is the corresponding scaling limit for CC(F^n, μ^n, ε)/n.
- Helps understand problems composed of smaller problems.
Interactive information complexity
- Information complexity : communication complexity :: Shannon's entropy : transmission cost.
Information theory in two slides
- For two (potentially correlated) variables X, Y, the conditional entropy of X given Y is the amount of uncertainty left in X after seeing Y: H(X|Y) := E_{y∼Y} H(X | Y = y).
- One can show H(XY) = H(Y) + H(X|Y).
- This important fact is known as the chain rule.
- If X ⊥ Y, then H(XY) = H(X) + H(Y|X) = H(X) + H(Y).
Mutual information
- The mutual information is defined as I(X;Y) = H(X) − H(X|Y) = H(Y) − H(Y|X).
- "How much does knowing X reduce the uncertainty of Y?"
- Conditional mutual information: I(X;Y|Z) := H(X|Z) − H(X|YZ).
- It has a simple, intuitive interpretation (see the numeric sketch below).
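As an illustration (my own toy example, not from the talk), the sketch below computes these quantities from a small joint distribution table, using the chain rule:

```python
import math
from collections import defaultdict

def H(dist):
    """Entropy of a distribution given as {outcome: probability}."""
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

# An arbitrary example joint distribution over (X, Y).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
mx, my = defaultdict(float), defaultdict(float)
for (x, y), p in joint.items():
    mx[x] += p
    my[y] += p

h_x_given_y = H(joint) - H(my)      # chain rule: H(X|Y) = H(XY) - H(Y)
print(H(mx) - h_x_given_y)          # I(X;Y) = H(X) - H(X|Y) ~ 0.278 bits
```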
The information cost of a protocol
- Prior distribution: (X, Y) ∼ μ.
[Diagram: Alice holds X, Bob holds Y; they run protocol π, producing transcript Π.]
IC(π, μ) = I(Π; Y|X) + I(Π; X|Y)
= (what Alice learns about Y) + (what Bob learns about X).
- Depends on both the protocol π and the prior μ.
Example
- F is "X = Y?".
- μ is a distribution where X = Y w.p. ½ and (X, Y) are random w.p. ½.
- Protocol: Alice sends SHA-256(X) [256 bits]; Bob replies with "X=Y?" [1 bit].
IC(π, μ) = I(Π; Y|X) + I(Π; X|Y) ≈ 1 + 129 = 130 bits
= (what Alice learns about Y) + (what Bob learns about X).
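A minimal sketch of this protocol (assuming byte-string inputs; any 256-bit hash would do):

```python
import hashlib

def alice_message(x: bytes) -> bytes:
    # m1: Alice sends a 256-bit digest of her input.
    return hashlib.sha256(x).digest()

def bob_answer(y: bytes, m1: bytes) -> bool:
    # m2: Bob replies with one bit -- does the digest match his input?
    return hashlib.sha256(y).digest() == m1

print(bob_answer(b"foo", alice_message(b"foo")))  # True
print(bob_answer(b"bar", alice_message(b"foo")))  # False (barring a collision)
```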
The information complexity of a problem
- Communication complexity:
  CC(F, μ, ε) := min_{π computes F with error ≤ ε} CC(π).
- Analogously:
  IC(F, μ, ε) := inf_{π computes F with error ≤ ε} IC(π, μ).
- (Easy) fact: IC(F, μ, ε) ≤ CC(F, μ, ε).
(The "inf" in the definition of IC is needed: unlike the min for CC, the infimum need not be attained by any single protocol.)
Information = amortized communication
- Recall: lim_{n→∞} C_n(X)/n = H(X).
- Theorem [B.-Rao'11]: lim_{n→∞} CC(F^n, μ^n, ε)/n = IC(F, μ, ε).
- Corollary: lim_{n→∞} CC(Int_n, 0^+)/n = IC(AND, 0).
The two-bit AND
- Alice and Bob each have a bit: X, Y ∈ {0,1}, distributed according to some μ on {0,1}^2.
- They want to compute X ∧ Y while revealing as little as possible about their inputs to each other (w.r.t. the worst μ).
- The answer, IC(AND, 0), is a number between 1 and 2.
The two-bit AND
Results [B.-Garg-Pankratov-Weinstein'13]:
- IC(AND, 0) ≈ 1.4922 bits.
- Find the value of IC(AND, μ, 0) for all priors μ, and exhibit the information-theoretically optimal protocol for computing the AND of two bits.
- Studying IC(AND, μ, 0) as a function ℝ_+^4 / ℝ_+ → ℝ_+ is a functional-minimization problem subject to a family of constraints (cf. the construction of harmonic functions).
The two-bit AND
- Studying IC(AND, μ, 0) as a function ℝ_+^4 / ℝ_+ → ℝ_+ is a functional-minimization problem subject to a family of constraints (cf. the construction of harmonic functions).
- We adopt a "guess and verify" strategy, although the general question of computing the information complexity of a function from its truth table is a very interesting one.
The optimal protocol for AND
- Alice holds X ∈ {0,1}; Bob holds Y ∈ {0,1}.
- Alice samples A: if X=1, A=1; if X=0, A ∼ U[0,1].
- Bob samples B: if Y=1, B=1; if Y=0, B ∼ U[0,1].
The optimal protocol for AND
A B If X=1, A=1 If X=0, A=U[0,1] If Y=1, B=1 If Y=0, B=U[0,1]
1
“Raise your hand when your number is reached” 𝑌 ∈ {0,1} 𝑍 ∈ {0,1}
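A sketch simulating the protocol's outcome (communication is abstracted away; the real protocol transmits the rising counter implicitly, and its information cost requires the full analysis):

```python
import random

def and_protocol(x: int, y: int) -> int:
    """Outcome of the 'raise your hand' protocol for the two-bit AND."""
    a = 1.0 if x == 1 else random.random()  # Alice's hand-raising time
    b = 1.0 if y == 1 else random.random()  # Bob's hand-raising time
    # A counter rises from 0 to 1; the first raised hand ends the protocol.
    # A hand raised before 1 reveals that party's input is 0, so AND = 0.
    return 1 if min(a, b) >= 1.0 else 0

print(and_protocol(1, 1))  # always 1
print(and_protocol(0, 1))  # always 0: Alice's hand rises at some a < 1
```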
Corollary: communication complexity of intersection
- Corollary: lim_{ε→0} CC(Int_n, ε) ≈ 1.4922 · n + o(n).
- Specifically, e.g. CC(Int_n, 1/n) ≈ 1.4922 · n + o(n).
- Note: this requires ω(1) rounds of interaction. Using r rounds results in an extra Θ(n/r^2) cost!
Communication complexity of Disjointness
- With some additional work, we obtain a tight bound on the communication complexity of Disj_n with tiny error:
  lim_{ε→0} CC(Disj_n, ε) = C_DISJ · n + o(n),
  where C_DISJ := max_{μ: μ(1,1)=0} IC(AND, μ, 0) ≈ 0.4827…
- Intuition: Disj_n is an n-wise repetition of AND where the probability of a (1,1) is very low (≪ 1).
Beyond two parties
- Disjointness in the coordinator model [B.-Ellen-Oshman-Pitassi-Vaikuntanathan'13].
- k players, each player p_i holding a subset S_i ⊆ {1,…,n}.
- Want to decide whether the intersection ∩_i S_i is empty.
Disj in the coordinator model
- k players, input length n.
- Naïve protocol: O(n · k) communication.
- This turns out to be asymptotically optimal!
- The argument uses information complexity.
  – The hard part is to design the hard distribution and the "right" information cost measure.
The “hard” distribution
- S_i ⊆ {1,…,n}; want to decide whether the intersection ∩_i S_i is empty.
- The distribution should have very few (close to 0) intersections.
- Attempt #1:
  – Plant many 0's (e.g. 50%).
[Diagram: a k × n input matrix, mostly 1's, with many 0's planted throughout.]
The “hard” distribution
The coordinator keeps querying players until she finds a 0: ~O(n) communication.
The “hard” distribution
- Attempt #2:
  – Plant one 0 in each coordinate.
[Diagram: a k × n input matrix, all 1's except a single 0 per column.]
Each player sends its 0's: still O(n log n) communication.
The "hard" distribution
- "Mix" the two attempts:
  – Each coordinate i has an RV M_i ∼ B_{1/3}.
  – If M_i = 0, plant many 0's (e.g. 50%) in column i.
  – If M_i = 1, plant a single 0 in column i.
[Diagram: a k × n input matrix mixing dense-0 columns (M_i = 0) and single-0 columns (M_i = 1); a sampler sketch follows.]
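A sampler sketch for this mixed distribution (the parameter choices are illustrative; the actual hard distribution also controls the planted intersections more carefully):

```python
import random

def sample_hard_inputs(n: int, k: int):
    """Sample a k x n 0/1 input matrix from the mixed 'hard' distribution."""
    X = [[1] * n for _ in range(k)]
    for i in range(n):
        if random.random() < 1 / 3:   # M_i = 1: plant a single 0 in column i
            X[random.randrange(k)][i] = 0
        else:                          # M_i = 0: plant many 0's (each w.p. 1/2)
            for j in range(k):
                X[j][i] = random.randint(0, 1)
    return X

X = sample_hard_inputs(n=8, k=4)       # player j's set is {i : X[j][i] == 1}
```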
The information cost notion
- Assume the coordinator knows the M_i's, and player j knows the X_ij's.
- The information cost:
  – what the coordinator learns about the X_ij's, plus
  – what each player learns about the M_i's.
- Proving that the sum total is Ω(n · k) requires some work, but the hardest part is the definitions.
Intuition for hardness
- Focus on a single (i, j) pair: the i'th coordinate, the j'th player.
- (M_i, X_ij) is equally likely to be (0,0), (0,1), or (1,1).
- If M_i = 1, then the coordinator needs to know X_ij (which is almost certainly 1 in this case).
- So either player j will learn about M_i, or the protocol will reveal too much about X_ij when M_i = 0.
Multiparty information complexity
- We don't have a multiparty information complexity theory for general distributions.
- There is a fair chance that the difficulty is conceptual.
- One key difference between 2 players and 3+ players is the existence of secure multiparty computation.
Beyond communication
- Many applications to other interactive communication regimes:
  – Distributed joint computation & estimation;
  – Streaming;
  – Noisy coding…
- We will briefly discuss a non-communication application: two-prover games.
Two-prover games
- Closely connected to hardness of approximation:
  – Probabilistically Checkable Proofs and the Unique Games Conjecture.
- A nice way of looking at constraint satisfaction problems.
The odd cycle game
- Alice and Bob want to convince the verifier Victoria that the 7-cycle is 2-colorable.
- She asks them to color the same vertex or two adjacent vertices, and accepts if the answers are consistent (equal colors for the same vertex, different colors for adjacent ones).
The odd cycle game
[Diagram, repeated across three slides: the verifier V sends vertex v_a to Alice and vertex v_b to Bob (the same vertex or two adjacent vertices); each replies with a color, and V answers "OK" if the colors are consistent.]
The odd cycle game
- An example of a "unique game".
- If the cycle is even, Alice and Bob win with probability 1.
- For an odd cycle of length m, they win with probability p_1 = 1 − 1/(2m).
- What about winning many copies of the game simultaneously?
Simultaneous challenges
- Alice gets v_a^1, v_a^2, v_a^3, …, v_a^k and returns a vector of colors.
- Bob gets v_b^1, v_b^2, v_b^3, …, v_b^k and returns a vector of colors.
- They avoid jail if all k color pairs are consistent.
Parallel repetition
- Play k = m^2 copies.
- A naïve strategy (play each copy independently) wins with probability (1 − 1/(2m))^{m^2} = e^{−Θ(m)} ≪ 1 (see the sketch below).
- Can one do better?
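A quick numeric check of the naïve bound (m is an arbitrary example value):

```python
import math

m = 51                                # odd cycle length (example value)
p_single = 1 - 1 / (2 * m)            # success probability of one copy
p_naive = p_single ** (m * m)         # play all m^2 copies independently
print(p_naive)                        # vanishingly small (~1e-11)
print(math.exp(-m / 2))               # matches the e^{-Theta(m)} estimate
```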
Parallel repetition
- Play k = m^2 copies.
- A naïve strategy: (1 − 1/(2m))^{m^2} = e^{−Θ(m)} ≪ 1.
- It turns out that one can win m^2 copies of the odd cycle game with a constant probability [Raz'08].
- The proof is by exhibiting a strategy.
Connection to foams
- Connected to "foams": tilings of ℝ^d by a shape A such that A + ℤ^d = ℝ^d.
- How small can the surface area of A be?
[Feige-Kindler-O’Donnell’07]
Connection to foams
- Obvious upper bound: O(d) (e.g. the unit cube).
- Obvious lower bound (a sphere of volume 1): Ω(√d).
- [Feige-Kindler-O'Donnell'07]: noticed a connection between the two problems.
- [Kindler-O'Donnell-Rao-Wigderson'08]: a construction of foams of surface area O(√d), based on Raz's strategy.
An information-theoretic view
[Diagram: Alice and Bob receive challenge vertices v_a, v_b on the cycle.]
- "Advice" on where to cut the cycle wins the game with probability 1 if the cut does not pass through the challenge edge.
An information-theoretic view
- Merlin can give such advice at "information cost" O(1/m^2).
The distribution
[Diagram: the advice distribution as seen from two adjacent challenge vertices v_b and v_b′.]
- The KL divergence between the two distributions is Θ(1/m^2).
- The statistical distance is Θ(1/m). (A numeric check follows.)
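A sanity check using my own toy model (two Bernoulli distributions differing by ~1/m, standing in for the two advice distributions):

```python
import math

def kl_bits(p, q):
    """KL divergence D(Bern(p) || Bern(q)) in bits."""
    return p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))

m = 100
p, q = 0.5, 0.5 + 1 / (2 * m)     # the distributions differ by ~1/m
print(kl_bits(p, q))              # ~7e-5: Theta(1/m^2)
print(abs(p - q))                 # 0.005: statistical distance Theta(1/m)
```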
Taking m^2 copies
- Total information revealed by Merlin: m^2 · O(1/m^2) = O(1).
- The advice can be simulated with O(1) communication, or with no communication at all with success probability Ω(1).
Parallel repetition
- Using similar intuition (but more technical work), one can obtain a general tight parallel repetition theorem in the "small value" regime [B.-Garg'15].
- If one copy of a game G has success probability ε < 1/2, then m copies have success probability < ε^{Ω(m)} (*for "projection games"; for general games the tight bound is a bit more complicated).
- [Dinur-Steurer'14] obtained the result for projection games using spectral techniques.
Challenges
- Information complexity beyond two communicating parties.
- Continuous measures of complexity.