

SLIDE 1

Information Complexity and Applications

Mark Braverman, Princeton University and IAS. FoCM’17, July 17, 2017.

SLIDE 2

Coding vs complexity: a tale of two theories

Coding | Computational Complexity
Goal: data transmission | Goal: computation
Different channels | Models of computation
“Big” questions are answered with theorems | “Big” questions are conjectures
“BSC_{1/3} can transmit ≈ 0.052 trits per application” | “One day, we’ll prove EXP requires > n^3 NAND gates”

SLIDE 3

A key difference

  • Information theory is a very effective language: it fits many coding situations perfectly.
  • Shannon’s channel coding theory is “continuous”:
    – Turn the channel into a continuous resource;
    – Separate the communication channel from how it is used.

SLIDE 4

Theory of computation is “discrete”

  • Von Neumann (~1948):

“…Thus formal logic is, by the nature of its approach, cut off from the best cultivated portions of mathematics, and forced onto the most difficult part of the mathematical terrain, into combinatorics. The theory of automata, … will have to share this unattractive property of formal logic. It will have to be, from the mathematical point of view, combinatorial rather than analytical.”

SLIDE 5

Overview

  • Today: extending the language of information theory to problems in complexity theory.

SLIDE 6

Background: Shannon’s entropy

  • Assume a lossless binary channel.
  • A message X is distributed according to some prior μ.
  • The inherent number of bits it takes to transmit X is given by its entropy
    H(X) = Σ_x μ[X = x] · log2(1/μ[X = x]).

[Figure: Alice sends a message X ∼ μ to Bob over a lossless communication channel.]
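A minimal sketch (Python, not from the deck) of this definition, with the distribution given as a dictionary of probabilities:

```python
import math

def entropy(dist):
    """Shannon entropy H(X) = sum_x mu[x] * log2(1/mu[x]) of a
    distribution given as {outcome: probability}."""
    return sum(p * math.log2(1.0 / p) for p in dist.values() if p > 0)

# A uniform trit has entropy log2(3) ~ 1.585 bits.
print(entropy({1: 1/3, 2: 1/3, 3: 1/3}))  # 1.584962500721156
```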

SLIDE 7

Shannon’s Noiseless Coding Theorem

  • The cost of communicating many copies of X scales as H(X).
  • Shannon’s source coding theorem:
    – Let C_n(X) be the cost of transmitting n independent copies of X. Then the amortized transmission cost is lim_{n→∞} C_n(X)/n = H(X).
  • Operationalizes H(X).

SLIDE 8

H(X) is nicer than C_n(X)

  • Sending a uniform trit U in {1,2,3}.
  • Using the prefix-free encoding {0,10,11}, sending one trit U_1 costs C_1 = 5/3 ≈ 1.667 bits.
  • Sending two trits (U_1 U_2) costs C_2 = 29/9 bits using the encoding {000,001,010,011,100,101,110,1110,1111}. The cost per trit is 29/18 ≈ 1.611 < C_1.
  • C_1 + C_1 ≠ C_2.

SLIDE 9

H(X) is nicer than C_n(X)

  • C_1 = 15/9, C_2 = 29/9.
  • C_1 + C_1 ≠ C_2.
  • The entropy H(U) = log2 3 ≈ 1.585.
  • We have H(U_1 U_2) = log2 9 = H(U_1) + H(U_2).
  • H(U) is additive over independent variables.
  • C_n = n · log2 3 ± o(n).
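A quick numeric check of these costs (a sketch, assuming the uniform-source encodings quoted above):

```python
import math

def avg_len(code):
    """Average codeword length when all codewords are equally likely."""
    return sum(map(len, code)) / len(code)

C1 = avg_len(["0", "10", "11"])                      # one trit:  5/3  ~ 1.667 bits
C2 = avg_len(["000", "001", "010", "011", "100",
              "101", "110", "1110", "1111"])         # two trits: 29/9 ~ 3.222 bits

# Per-trit cost drops with blocking, and both stay above the entropy.
print(C1, C2 / 2, math.log2(3))  # 1.667, 1.611, 1.585
```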
SLIDE 10

Today

  • We will discuss generalizing information and coding theory to interactive computation scenarios: “using interaction over a channel to solve a computational problem”.
  • In Computer Science, the amount of communication needed to solve a problem is studied by the area of communication complexity.

SLIDE 11

Communication complexity [Yao’79]

  • Considers functionalities requiring interactive computation.
  • Focus on the two-party setting first.

[Figure: Alice holds X, Bob holds Y; together they implement a functionality F(X,Y), e.g. F(X,Y) = “X=Y?”.]

SLIDE 12

Communication complexity

[Figure: Alice (input X) and Bob (input Y), with shared randomness R, exchange messages to produce F(X,Y).]

Goal: implement a functionality F(X,Y). A protocol π(X,Y) computing F(X,Y) exchanges messages m1(X,R), m2(Y,m1,R), m3(X,m1,m2,R), …

Communication cost CC(π) = # of bits exchanged.
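The deck gives no concrete protocol at this point; as an illustration of the message structure m1(X,R), m2(Y,m1,R), … and of shared randomness, here is a standard public-coin equality protocol (a sketch; the names and parameters are mine):

```python
import random

def equality_protocol(x, y, t=20, seed=0):
    """Public-coin protocol for F(X,Y) = "X == Y?" on n-bit inputs.
    Alice sends t inner-product hashes m1(X,R); Bob compares with his
    own hashes and replies with the 1-bit answer.
    Communication: t + 1 bits; error <= 2^-t when x != y."""
    n = max(x.bit_length(), y.bit_length(), 1)
    rng = random.Random(seed)                      # shared randomness R
    R = [rng.getrandbits(n) for _ in range(t)]
    m1 = [bin(x & r).count("1") % 2 for r in R]    # Alice's message
    m2 = [bin(y & r).count("1") % 2 for r in R]    # Bob's own hashes
    return m1 == m2                                # Bob's 1-bit reply

print(equality_protocol(0b101101, 0b101101))  # True
print(equality_protocol(0b101101, 0b101111))  # False (w.h.p.)
```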

SLIDE 13

Communication complexity

  • (Distributional) communication complexity with input distribution μ and error ε: CC(F, μ, ε). Error ≤ ε w.r.t. μ:
    CC(F, μ, ε) := min_{π: Pr_μ[π(X,Y) ≠ F(X,Y)] ≤ ε} CC(π).
  • (Randomized/worst-case) communication complexity: CC(F, ε). Error ≤ ε on all inputs.
  • Yao’s minimax: CC(F, ε) = max_μ CC(F, μ, ε).

SLIDE 14

A tool for unconditional lower bounds about computation

  • Streaming;
  • Data structures;
  • Distributed computing;
  • VLSI design lower bounds;
  • Circuit complexity;
  • One of two main tools for unconditional lower bounds.
  • Connections to other problems in complexity theory (e.g. hardness amplification).

SLIDE 15

Set disjointness and intersection

Alice and Bob are each given a set: X ⊆ {1, …, n}, Y ⊆ {1, …, n} (sets can be viewed as vectors in {0,1}^n).

  • Intersection: Int_n(X, Y) = X ∩ Y.
  • Disjointness: Disj_n(X, Y) = 1 if X ∩ Y = ∅, and 0 otherwise.
  • A non-trivial theorem [Kalyanasundaram-Schnitger’87, Razborov’92]: CC(Disj_n, 1/4) = Ω(n).
  • Exercise: Solve Disj_n with error → 0 (say, 1/n) in 0.9n bits of communication. Can you do 0.6n? 0.4n?

SLIDE 16

Direct sum

  • Int_n is just n times the 2-bit AND.
  • ¬Disj_n is a disjunction of 2-bit ANDs.
  • What is the connection between the communication cost of one AND and the communication cost of n ANDs?
  • Understanding the connection between the hardness of a problem and the hardness of its pieces.
  • A natural approach to lower bounds.

SLIDE 17

How does CC scale with copies?

  • CC(F^n, μ^n, ε)/n → ?

Recall:

  • lim_{n→∞} C_n(X)/n = H(X).
  • Information complexity is the corresponding scaling limit for CC(F^n, μ^n, ε)/n.
  • Helps understand problems composed of smaller problems.

SLIDE 18

Interactive information complexity

  • Information complexity : communication complexity
    as
  • Shannon’s entropy : transmission cost

SLIDE 19

Information theory in two slides

  • For two (potentially correlated) variables X, Y, the conditional entropy of X given Y is the amount of uncertainty left in X given Y:
    H(X|Y) := E_{y∼Y} H(X | Y = y).
  • One can show H(XY) = H(Y) + H(X|Y).
  • This important fact is known as the chain rule.
  • If X ⊥ Y, then H(XY) = H(X) + H(Y|X) = H(X) + H(Y).

SLIDE 20

Mutual information

  • The mutual information is defined as
    I(X;Y) = H(X) − H(X|Y) = H(Y) − H(Y|X).
  • “By how much does knowing X reduce the uncertainty of Y?”
  • Conditional mutual information:
    I(X;Y|Z) := H(X|Z) − H(X|YZ).
  • Simple intuitive interpretation.
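A small sketch (Python) of these definitions: it computes I(X;Y) via the chain-rule identity I(X;Y) = H(X) + H(Y) − H(XY), on a toy joint distribution of my choosing:

```python
import math

def H(dist):
    """Entropy of a distribution given as {outcome: probability}."""
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

def marginal(joint, axis):
    """Marginal of a joint distribution {(x, y): p} on coordinate axis."""
    m = {}
    for (x, y), p in joint.items():
        key = (x, y)[axis]
        m[key] = m.get(key, 0.0) + p
    return m

def I(joint):
    """Mutual information I(X;Y) = H(X) + H(Y) - H(XY) (chain rule)."""
    return H(marginal(joint, 0)) + H(marginal(joint, 1)) - H(joint)

# X is a uniform bit and Y = X (fully correlated): I(X;Y) = H(X) = 1 bit,
# and the chain rule gives H(X|Y) = H(XY) - H(Y) = 0.
joint = {(0, 0): 0.5, (1, 1): 0.5}
print(H(marginal(joint, 0)))              # 1.0
print(I(joint))                           # 1.0
print(H(joint) - H(marginal(joint, 1)))   # H(X|Y) = 0.0
```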

SLIDE 21

The information cost of a protocol

  • Prior distribution: (X, Y) ∼ μ.

[Figure: Alice (input X) and Bob (input Y) run protocol π, producing transcript Π.]

IC(π, μ) = I(Π; Y|X) + I(Π; X|Y)
= (what Alice learns about Y) + (what Bob learns about X).

  • Depends on both π and μ.

SLIDE 22

Example

  • F is “X = Y?”.
  • μ is a distribution where X = Y w.p. ½ and (X, Y) are random w.p. ½.

[Figure: Alice sends SHA-256(X) (256 bits); Bob replies with “X = Y?” (1 bit).]

IC(π, μ) = I(Π; Y|X) + I(Π; X|Y) ≈ 1 + 129 = 130 bits
= (what Alice learns about Y) + (what Bob learns about X).
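A toy version of this calculation (mine, not from the deck): with 2-bit strings in place of the long inputs above, and the naive protocol in which Alice sends X in the clear and Bob replies with the answer bit, both terms of IC(π, μ) can be computed exactly:

```python
import math
from collections import defaultdict

def h_bits(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Equality prior on 2-bit strings: w.p. 1/2 set Y = X, else Y independent.
N = 4
joint = defaultdict(float)
for x in range(N):
    joint[(x, x)] += 0.5 / N
    for y in range(N):
        joint[(x, y)] += 0.5 / (N * N)

# Toy protocol: Alice sends X in the clear; Bob replies with "X == Y?".
# Bob learns I(Pi; X | Y) = H(X | Y) about Alice's input:
p_y = defaultdict(float)
for (x, y), p in joint.items():
    p_y[y] += p
H_X_given_Y = sum(p * math.log2(p_y[y] / p) for (x, y), p in joint.items())

# Alice learns I(Pi; Y | X) = entropy of the 1-bit answer given X:
p_eq = sum(p for (x, y), p in joint.items() if x == y)   # = 5/8
H_answer = h_bits([p_eq, 1 - p_eq])

print(H_X_given_Y, H_answer, H_X_given_Y + H_answer)  # ~1.55 + ~0.95 ~ 2.50 bits
```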

SLIDE 23

The information complexity of a problem

  • Communication complexity:
    CC(F, μ, ε) := min_{π computes F with error ≤ ε} CC(π).
  • Analogously:
    IC(F, μ, ε) := inf_{π computes F with error ≤ ε} IC(π, μ).
    (The “inf” here is needed!)
  • (Easy) fact: IC(F, μ, ε) ≤ CC(F, μ, ε).

SLIDE 24

Information = amortized communication

  • Recall: lim_{n→∞} C_n(X)/n = H(X).

Theorem [B.-Rao’11]:

  lim_{n→∞} CC(F^n, μ^n, ε)/n = IC(F, μ, ε).

  • Corollary: lim_{n→∞} CC(Int_n, 0+)/n = IC(AND, 0).

SLIDE 25

The two-bit AND

  • Alice and Bob each have a bit: X, Y ∈ {0,1}, distributed according to some μ on {0,1}^2.
  • They want to compute X ∧ Y while revealing to each other as little as possible about their inputs (w.r.t. the worst μ).
  • The answer IC(AND, 0) is a number between 1 and 2.

SLIDE 26

The two-bit AND

Results [B.-Garg-Pankratov-Weinstein’13]:

  • IC(AND, 0) ≈ 1.4922 bits.
  • Found the value of IC(AND, μ, 0) for all priors μ, and exhibited the information-theoretically optimal protocol for computing the AND of two bits.
  • Studying IC(AND, μ, 0) as a function ℝ₊⁴/ℝ₊ → ℝ₊ is a functional minimization problem subject to a family of constraints (cf. construction of harmonic functions).

SLIDE 27

The two-bit AND

  • Studying IC(AND, μ, 0) as a function ℝ₊⁴/ℝ₊ → ℝ₊ is a functional minimization problem subject to a family of constraints (cf. construction of harmonic functions).
  • We adopt a “guess and verify” strategy, although the general question of computing the information complexity of a function from its truth table is a very interesting one.

SLIDE 28

The optimal protocol for AND

[Figure: Alice holds X ∈ {0,1} and Bob holds Y ∈ {0,1}. If X = 1, A = 1; if X = 0, A ∼ U[0,1]. If Y = 1, B = 1; if Y = 0, B ∼ U[0,1]. A clock sweeps from 0 to 1.]

SLIDE 29

The optimal protocol for AND

[Figure: the same setup: if X = 1, A = 1; if X = 0, A ∼ U[0,1]; if Y = 1, B = 1; if Y = 0, B ∼ U[0,1].]

“Raise your hand when your number is reached.”
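A minimal Monte-Carlo sketch (Python) checking that this continuous protocol always outputs X ∧ Y. The ending rule (answer 0 as soon as a hand rises before time 1) is my reading of the slide; the information-cost analysis that yields the 1.4922 constant is not attempted here:

```python
import random

def and_protocol(x, y, rng):
    """One run of the 'raise your hand' protocol for AND.
    Each party commits to a number: 1 if their bit is 1, else U[0,1).
    A clock sweeps 0 -> 1; a hand rising before time 1 reveals a 0-bit,
    so the answer is 0. If no hand rises before time 1, both bits are 1."""
    a = 1.0 if x == 1 else rng.random()
    b = 1.0 if y == 1 else rng.random()
    return 1 if min(a, b) >= 1.0 else 0

rng = random.Random(42)
for x in (0, 1):
    for y in (0, 1):
        assert all(and_protocol(x, y, rng) == (x & y) for _ in range(10_000))
print("protocol output always matches AND")
```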

SLIDE 30

Corollary: communication complexity of intersection

  • Corollary: lim_{ε→0} CC(Int_n, ε) ≈ 1.4922 · n ± o(n).
  • Specifically, e.g., CC(Int_n, 1/n) ≈ 1.4922 · n ± o(n).
  • Note: this requires ω(1) rounds of interaction. Using r rounds results in an extra Θ(n/r^2) cost!

SLIDE 31

Communication complexity of Disjointness

  • With some additional work, we obtain a tight bound on the communication complexity of Disj_n with tiny error:
    lim_{ε→0} CC(Disj_n, ε) = C_DISJ · n ± o(n),
    where C_DISJ := max_{μ: μ(1,1)=0} IC(AND, μ, 0) ≈ 0.4827…
  • Intuition: Disj_n is an n-wise repetition of AND, where the probability of a (1,1) is very low (≪ 1).

SLIDE 32

Beyond two parties

  • Disjointness in the coordinator model [B.-Ellen-Oshman-Pitassi-Vaikuntanathan’13].
  • k players.
  • Each player p_i holds a subset S_i ⊆ {1, …, n}.
  • Want to decide whether the intersection ∩_i S_i is empty.

[Figure: k players, each connected to a central coordinator.]

SLIDE 33

Disj in the coordinator model

  • k players, input length n.
  • Naïve protocol: O(n · k) communication.
  • Turns out to be asymptotically optimal!
  • The argument uses information complexity.
    – The hard part is to design the hard distribution and the “right” information cost measure.
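For concreteness, a sketch of the naïve protocol (function name and encoding mine): each player ships its full n-bit characteristic vector to the coordinator, for n · k bits total:

```python
def naive_disjointness(sets, n):
    """Coordinator-model Disj: each of the k players sends its whole set
    as an n-bit vector to the coordinator (n * k bits total), and the
    coordinator checks whether the common intersection is empty."""
    inter = set(range(1, n + 1))
    bits_sent = 0
    for S in sets:              # one n-bit message per player
        bits_sent += n
        inter &= S
    return len(inter) == 0, bits_sent

empty, cost = naive_disjointness([{1, 3}, {2, 3}, {3, 4}], n=4)
print(empty, cost)   # False (3 is common to all players), 12 bits
```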

SLIDE 34

The “hard” distribution

  • S_i ⊆ {1, …, n}. Want to decide whether the intersection ∩_i S_i is empty.
  • Should have very few (close to 0) intersections.

SLIDE 35
The “hard” distribution

  • Attempt #1:
    – Plant many 0’s (e.g. 50%).

[Figure: a k × n bit matrix, mostly 1’s, with many planted 0’s.]

The coordinator keeps querying players until she finds a 0: ~O(n) communication.

SLIDE 36

The “hard” distribution

  • Attempt #2:
    – Plant one zero in each coordinate.

[Figure: a k × n bit matrix with exactly one 0 in each coordinate.]

Each player sends its 0’s: still O(n log n) communication.

SLIDE 37
The “hard” distribution

  • “Mix” the two attempts:
    – Each coordinate has an RV M_i ∼ B_{1/3}.
    – If M_i = 0, plant many 0’s (e.g. 50%).
    – If M_i = 1, plant a single 0.

[Figure: a k × n bit matrix; the M_i’s select between many-0’s coordinates and single-0 coordinates.]
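A sampler for this distribution (a sketch; the slide fixes the parameters 1/3 and 50%, and the remaining details are my reading of the bullets above):

```python
import random

def sample_hard_instance(n, k, seed=0):
    """Sample player inputs X[j][i] (k players, n coordinates) from the
    'mixed' hard distribution: per coordinate i, M_i ~ Bernoulli(1/3);
    if M_i = 1 plant a single 0, otherwise plant many 0's (each bit is
    flipped to 0 independently with probability 1/2)."""
    rng = random.Random(seed)
    X = [[1] * n for _ in range(k)]
    M = [int(rng.random() < 1 / 3) for _ in range(n)]
    for i in range(n):
        if M[i] == 1:
            X[rng.randrange(k)][i] = 0       # one hard-to-find 0
        else:
            for j in range(k):
                if rng.random() < 0.5:       # many easy-to-find 0's
                    X[j][i] = 0
    return M, X

M, X = sample_hard_instance(n=8, k=3)
```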

SLIDE 38

The information cost notion

  • Assume the coordinator knows the M_i’s, and player j knows the X_{ij}’s.
  • The information cost:
    – what the coordinator learns about the X_{ij}’s, plus
    – what each player learns about the M_i’s.
  • Proving that the sum total is Ω(n · k) requires some work, but the hardest part is the definitions.

SLIDE 39

Intuition for hardness

  • Focus on a single (i, j) pair: the i’th coordinate, the j’th player.
  • (M_i, X_{ij}) is equally likely to be (0,0), (0,1), and (1,1).
  • If M_i = 1, then the coordinator needs to know X_{ij} (which is almost certainly 1 in this case).
  • Either player p_j will learn about M_i, or will reveal too much about X_{ij} when M_i = 0.

SLIDE 40

Multiparty information complexity

  • We don’t have a multiparty information complexity theory for general distributions.
  • There is a fair chance that the difficulty is conceptual.
  • One key difference between 2 players and 3+ players is the existence of secure multiparty computation.

SLIDE 41

Beyond communication

  • Many applications to other interactive communication regimes:
    – Distributed joint computation & estimation;
    – Streaming;
    – Noisy coding…
  • We will briefly discuss a non-communication application: two-prover games.

SLIDE 42

Two-prover games

  • Closely connected to hardness of approximation:
    – Probabilistically Checkable Proofs and the Unique Games Conjecture.
  • A nice way of looking at constraint satisfaction problems.

SLIDE 43

The odd cycle game

  • Alice and Bob want to convince Victoria that the 7-cycle is 2-colorable.
  • Victoria asks them to color the same or adjacent vertices, and accepts if the colors are consistent.

SLIDE 44

The odd cycle game

[Figure: Victoria (V) sends vertex v_a to Alice and vertex v_b to Bob; they answer with colors, and Victoria replies “OK” if the colors are consistent.]

SLIDE 45

The odd cycle game

[Figure: another round of the same game: challenges v_a and v_b, colors returned, “OK”.]

SLIDE 46

The odd cycle game

[Figure: a further round: Victoria sends v_a and v_b; Alice and Bob answer with colors.]

SLIDE 47

The odd cycle game

  • An example of a “unique game”.
  • If the cycle is even: Alice and Bob win with probability 1.
  • For an odd cycle of length m, they win with probability p_1 = 1 − 1/(2m).
  • What about winning many copies of the game simultaneously?

SLIDE 48

Simultaneous challenges

  • Alice gets v_{a_1}, v_{a_2}, v_{a_3}, …, v_{a_k} and returns a vector of colors.
  • Bob gets v_{b_1}, v_{b_2}, v_{b_3}, …, v_{b_k} and returns a vector of colors.
  • They avoid jail if all color pairs are consistent.

[Figure: the k challenge pairs (v_{a_1}, v_{b_1}), …, (v_{a_k}, v_{b_k}).]

SLIDE 49

Parallel repetition

  • Play k = m^2 copies.
  • A naïve strategy (play each copy independently): (1 − 1/(2m))^{m^2} = e^{−Θ(m)} ≪ 1.
  • Can one do better?
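Numerically (a quick sketch), the naïve independent-play success probability for, say, m = 100 is astronomically small:

```python
import math

m = 100                      # odd cycle length (for illustration)
k = m * m                    # number of parallel copies
p_one = 1 - 1 / (2 * m)      # value of a single odd-cycle game
naive = p_one ** k           # play every copy independently
print(naive, math.exp(-m / 2))   # both ~ 2e-22: e^{-Theta(m)}
```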

SLIDE 50

Parallel repetition

  • Play k = m^2 copies.
  • A naïve strategy: (1 − 1/(2m))^{m^2} = e^{−Θ(m)} ≪ 1.
  • It turns out that one can win m^2 copies of the odd cycle game with a constant probability [Raz’08].
  • Proof by exhibiting a strategy.

SLIDE 51

Connection to foams

  • Connected to “foams”: tilings of ℝ^d with a shape A so that A + ℤ^d = ℝ^d [Feige-Kindler-O’Donnell’07].
  • What can the smallest surface area of A be?

SLIDE 52

Connection to foams

  • Obvious upper bound: O(d).
  • Obvious lower bound (sphere of volume 1): Ω(√d).
  • [Feige-Kindler-O’Donnell’07]: noticed a connection between the two problems.
  • [Kindler-O’Donnell-Rao-Wigderson’08]: a construction of foams of surface area O(√d) based on Raz’s strategy.

SLIDE 53

An information-theoretic view

[Figure: Alice and Bob, holding challenge vertices v_a and v_b, receive shared advice.]

  • “Advice” on where to cut the cycle wins the game with probability 1 if the cut does not pass through the challenge edge.

SLIDE 54

An information-theoretic view

[Figure: Alice and Bob, holding v_a and v_b, receive Merlin’s advice.]

  • Merlin can give such advice at “information cost” O(1/m^2).

SLIDE 55

The distribution

[Figure: the advice distribution as seen from challenge vertices v_a, v_b, and v_b′.]

  • The KL-divergence between the two distributions is Θ(1/m^2).
  • The statistical distance is Θ(1/m).
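The exact advice distributions are not spelled out on the slide; the Θ(1/m^2) vs Θ(1/m) gap is an instance of the general fact that KL divergence scales like the square of the statistical distance (cf. Pinsker’s inequality), which a generic Bernoulli pair (my choice) illustrates:

```python
import math

def kl(p, q):
    """KL divergence (in bits) between Bernoulli(p) and Bernoulli(q)."""
    return p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))

for m in (10, 100, 1000):
    p, q = 0.5, 0.5 + 1 / (2 * m)   # statistical distance Theta(1/m)
    tv = abs(p - q)
    print(m, tv, kl(p, q))           # KL shrinks like 1/m^2
```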

SLIDE 56

Taking m^2 copies

  • Total information revealed by Merlin: m^2 · O(1/m^2) = O(1).
  • Can be simulated successfully with O(1) communication, or with no communication with probability Ω(1).

SLIDE 57

Parallel repetition

  • Using similar intuition (but more technical work), one can obtain a general tight parallel repetition theorem in the “small value” regime [B.-Garg’15].
  • If one copy of a game G has success probability δ < 1/2, then m copies have success probability < δ^{Ω(m)} (*for “projection games”; for general games the tight bound is a bit more complicated).
  • [Dinur-Steurer’14] obtained the result for projection games using spectral techniques.

SLIDE 58

Challenges

  • Information complexity beyond two communicating parties.
  • Continuous measures of complexity.

SLIDE 59

Thank You!