SLIDE 1

Lecture 14: Planted Sparse Vector

SLIDE 2

Lecture Outline

  • Part I: Planted Sparse Vector and 2 to 4 Norm
  • Part II: SOS and 2 to 4 Norm on Random Subspaces
  • Part III: Warmup: Showing ‖x‖ ≈ 1
  • Part IV: 4-Norm Analysis
  • Part V: SOS-Symmetry to the Rescue
  • Part VI: Observations and Loose Ends
  • Part VII: Open Problems

SLIDE 3

Part I: Planted Sparse Vector and 2 to 4 Norm

SLIDE 4

  • Planted Sparse Vector problem: Given the span of d − 1 random vectors in ℝⁿ and one unit vector v ∈ ℝⁿ of sparsity k, can we recover v?
  • More precisely, let V be an n × d matrix where:
  1. d − 1 columns of V are vectors of length ≈ 1 chosen randomly from ℝⁿ
  2. One column of V is a unit vector v with ≤ k nonzero entries.
  • Given VR where R is an arbitrary invertible d × d matrix, can we recover v?

Planted Sparse Vector
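
[Editor's note: to make the setup concrete, here is a minimal NumPy sketch of generating an instance of the planted distribution described above. The parameter values and the function name are illustrative assumptions, not from the lecture.]

```python
import numpy as np

def planted_instance(n=1000, d=20, k=10, seed=0):
    """Return (VR, v): an arbitrary basis for the subspace, and the hidden
    k-sparse unit vector v planted in it."""
    rng = np.random.default_rng(seed)
    # d - 1 random columns with entries N(0, 1/n), so each has length ~ 1
    V = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, d))
    # last column: a k-sparse unit vector with entries +/- 1/sqrt(k)
    v = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    v[support] = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)
    V[:, -1] = v
    # hide v by mixing the columns with a random rotation R
    R, _ = np.linalg.qr(rng.normal(size=(d, d)))
    return V @ R, v
```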

SLIDE 5

  • Theorem 1.4 [BKS14]: There is a constant c > 0 and an algorithm based on constant degree SOS such that for every vector v₀ supported on at most cn · min{1, n/d²} coordinates, if v₁, …, v_d are chosen independently at random from the Gaussian distribution on ℝⁿ, then given any basis for V = span{v₀, …, v_d}, the algorithm outputs an ε-approximation to v₀ in poly(n, log(1/ε)) time.

Theorem Statement

SLIDE 6

  • Random Distribution: We choose each entry of V independently from N(0, 1/n), the normal distribution with mean 0 and standard deviation 1/√n.
  • We then choose R to be a random d × d orthogonal/rotation matrix and take VR to be our input matrix.

Random Distribution

SLIDE 7

  • Remark: If R is any d × d orthogonal/rotation matrix then VR can also be chosen by taking each entry of V independently from N(0, 1/n).
  • Idea: Each row of V comes from a multivariate normal distribution with covariance matrix (1/n)·I_d, which is invariant under rotations.

Random Distribution

SLIDE 8

  • Planted Distribution: We choose each entry of the first d − 1 columns of V independently from N(0, 1/n). The last column of V is our sparse unit vector v.
  • We then choose R to be a random d × d orthogonal/rotation matrix and take VR to be our input matrix.

Planted Distribution

SLIDE 9

  • We ask for an x such that:
  1. ‖VRx‖ = 1
  2. VRx is k-sparse (i.e. at most k indices of VRx are nonzero).
  • Hard to search for x such that VRx is k-sparse, so we'll need to relax the problem.

Output

SLIDE 10

  • Key idea: All unit vectors have the same 2-norm. However, sparse vectors will have higher 4-norm.
  • The 4-norm of a k-sparse unit vector in ℝⁿ is at least (k · 1/k²)^(1/4) = 1/k^(1/4) (obtained by setting k coordinates to ±1/√k and the rest to 0).
  • Relaxation Attempt #1: Search for an x such that:
  1. ‖VRx‖ = 1
  2. ‖VRx‖₄ ≥ 1/k^(1/4)

Distinguishing Sparse Vectors
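
[Editor's note: a quick numeric illustration of this gap, with illustrative sizes. For a random unit vector g the Gaussian fourth-moment heuristic gives E‖g‖₄⁴ ≈ 3/n, a value that reappears later in the lecture.]

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 10000, 50

v = np.zeros(n)                                    # k-sparse unit vector:
v[:k] = rng.choice([-1.0, 1.0], k) / np.sqrt(k)    # k entries of +/- 1/sqrt(k)

g = rng.normal(size=n)                             # random unit vector
g /= np.linalg.norm(g)

print(np.linalg.norm(v, 4), k ** -0.25)        # sparse: exactly k^(-1/4)
print(np.linalg.norm(g, 4), (3 / n) ** 0.25)   # random: ~ (3/n)^(1/4), much smaller
```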

SLIDE 11

  • This is the 2 to 4 Norm Problem: Given a matrix A, find the vector x which maximizes ‖Ax‖₄ / ‖Ax‖₂.

2 to 4 Norm Problem

SLIDE 12

Part II: SOS and 2 to 4 Norm on Random Subspaces

SLIDE 13

  • Unfortunately, the 2 to 4 norm problem is hard [BBH+12]:
  – NP-hard to obtain an approximation ratio of 1 + 1/n^polylog(n)
  – Assuming ETH (the exponential time hypothesis), it is hard to approximate to within a constant factor.
  • Thus, we'll need to relax our problem further.

2 to 4 Norm Hardness

SLIDE 14

  • Relaxation: Find an Ẽ which respects the following constraints:
  1. ‖VRx‖₂² = Σ_{i=1}^n (VRx)_i² = 1
  2. ‖VRx‖₄⁴ = Σ_{i=1}^n (VRx)_i⁴ ≥ 1/k

SOS Relaxation
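
[Editor's note: spelled out, the relaxation asks for a degree-4 pseudo-expectation. The following is one standard way to write the program; the exact multiplier form of the degree-2 constraint is an assumption of this note, not copied from the slide.]

```latex
\begin{gather*}
\text{Find a degree-4 pseudo-expectation } \tilde{\mathbb{E}} \text{ on } x \in \mathbb{R}^d
\text{ with } \tilde{\mathbb{E}}[1] = 1, \\
\tilde{\mathbb{E}}\big[\big(\|VRx\|_2^2 - 1\big)\, p(x)\big] = 0
\quad \text{for all } p \text{ with } \deg p \le 2, \\
\tilde{\mathbb{E}}\big[\|VRx\|_4^4\big] \;\ge\; \tfrac{1}{k}, \\
M \succeq 0, \quad \text{where } M_{p,q} = \tilde{\mathbb{E}}[\,p \cdot q\,]
\text{ for monomials } p, q \text{ of degree} \le 2.
\end{gather*}
```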

SLIDE 15

  • Constraints:
  1. ‖VRx‖₂² = Σ_{i=1}^n (VRx)_i² = 1
  2. ‖VRx‖₄⁴ = Σ_{i=1}^n (VRx)_i⁴ ≥ 1/k
  • To show that SOS distinguishes between the random and planted distributions, it is sufficient to show that in the random case there is no Ẽ which respects these constraints and has a PSD moment matrix M.
  • Remark: Although the 2 to 4 Norm problem is hard in general, we just need to show that SOS can approximate it on random subspaces.

Showing a Distinguishing Algorithm

SLIDE 16

  • Given a random subspace, what is the expected value of the largest 4-norm of a unit vector in the subspace?
  • Trivial strategy: Any unit vector's 4-norm is at least 1/n^(1/4).

  • Can we do better?

2 to 4 Norm on Random Subspaces

SLIDE 17

  • Another strategy: Take a basis for this space and take a linear combination which maximizes one coordinate (subject to having length 1).
  • If we add together d random vectors with entries ≈ ±1/√n, w.h.p. the result will have norm Θ̃(√d). Dividing the resulting vector by Θ̃(√d), the maximized entry will have magnitude Θ̃(√(d/n)), other entries will have magnitude Õ(1/√n). (This strategy is illustrated numerically after this slide.)

2 to 4 Norm on Random Subspaces
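
[Editor's note: a numeric sketch of this strategy with illustrative sizes. With an orthonormal basis B for the subspace, the unit-length combination maximizing coordinate 0 is c = B[0]/‖B[0]‖, and ‖B[0]‖ concentrates around √(d/n).]

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 10000, 500

# orthonormal basis for a random d-dimensional subspace of R^n
B, _ = np.linalg.qr(rng.normal(size=(n, d)))

c = B[0] / np.linalg.norm(B[0])   # unit combination maximizing coordinate 0
w = B @ c                         # a unit vector in the subspace
print(w[0], np.sqrt(d / n))       # maximized entry ~ sqrt(d/n)
print(np.linalg.norm(w, 4))       # 4-norm ~ max(sqrt(d/n), n^(-1/4))
```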

SLIDE 18

  • Calling our final result w, w.h.p. the maximized entry of w contributes Θ̃(d²/n²) to ‖w‖₄⁴ while the other entries contribute Θ̃(1/n).
  • It turns out that this strategy is essentially optimal. Thus, with high probability the maximum 4-norm of a unit vector in a d-dimensional random subspace will be Θ̃(max{√(d/n), 1/n^(1/4)}).

2 to 4 Norm on Random Subspaces

SLIDE 19

  • Planted dist: max 4-norm ≥ 1/k^(1/4)
  • Random dist: max 4-norm is Θ̃(max{√(d/n), 1/n^(1/4)}).
  • If SOS can certify the upper bound for a random subspace, this gives a distinguishing algorithm when max{√(d/n), 1/n^(1/4)} ≪ 1/k^(1/4) (which happens when d ≤ √n and k ≪ n, or when d ≥ √n and k ≪ n²/d²).

Algorithm Boundary

SLIDE 20

Part III: Warmup: Showing ‖x‖ ≈ 1

SLIDE 21

  • Take w = VRx.
  • We expect that ‖w‖ ≈ ‖x‖. Since we require that ‖w‖ = 1, this implies that we will have ‖x‖ ≈ 1.
  • To check that ‖w‖ ≈ ‖x‖, observe that ‖w‖₂² = xᵀRᵀVᵀVRx. Thus, it is sufficient to show that RᵀVᵀVR ≈ I_d.

Showing ‖x‖ ≈ 1

SLIDE 22

  • We have that RᵀVᵀVR ≈ I_d because the columns of VR are d random unit vectors (where d ≪ n) and are thus approximately orthonormal.
  • However, we will use graph matrices to analyze the 4-norm, so as a warm-up, let's check that RᵀVᵀVR ≈ I_d using graph matrices.

Checking RᵀVᵀVR ≈ I_d
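
[Editor's note: before the graph matrix argument, a direct numeric sanity check of this claim, with illustrative sizes.]

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 50000, 100
V = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, d))
R, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random rotation

M = R.T @ V.T @ V @ R
print(np.linalg.norm(M - np.eye(d), 2))   # spectral norm ~ sqrt(d/n), i.e. small
```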

SLIDE 23

  • So far we have worked over {−1, +1}ⁿ.
  • How can we use graph matrices over N(0,1)ⁿ?
  • Key idea: Look at the Fourier characters over N(0,1).

Graph Matrices Over N(0,1)

SLIDE 24

  • Inner product on N(0,1): f · g = E_{x∼N(0,1)}[f(x)g(x)]
  • Fourier characters: Hermite polynomials
  • The first few Hermite polynomials (up to normalization) are as follows:
  1. h₀ = 1
  2. h₁ = x
  3. h₂ = x² − 1
  4. h₃ = x³ − 3x
  • To normalize, divide hₖ by √(k!)

Fourier Analysis Over N(0,1)
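
[Editor's note: a small check that these polynomials, after dividing hₖ by √(k!), are orthonormal under N(0,1). It uses NumPy's probabilists' Hermite module (HermiteE) and Gauss-Hermite quadrature; the quadrature size is an illustrative choice.]

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import HermiteE, hermegauss

xs, ws = hermegauss(40)        # quadrature for the weight exp(-x^2 / 2)
ws = ws / np.sqrt(2 * np.pi)   # rescale so sums become expectations over N(0,1)

h = [HermiteE([0] * k + [1]) for k in range(4)]   # h_0, h_1, h_2, h_3 as above
for j in range(4):
    for k in range(4):
        val = np.sum(ws * h[j](xs) * h[k](xs))    # E[h_j(x) h_k(x)]
        val /= math.sqrt(math.factorial(j) * math.factorial(k))
        print(j, k, round(val, 6))                # ~1 if j == k else ~0
```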

SLIDE 25

  • Graph matrices over {−1,1}ⁿ: 1 and x are a basis for functions over {−1,1}. We represent x by an edge and 1 by the absence of an edge.
  • Graph matrices over N(0,1)ⁿ: {hₖ} are a basis for functions over N(0,1). We represent hₖ by a multi-edge with multiplicity k.

Graph Matrices Over N(0,1)

SLIDE 26

  • For convenience, take A = √n·VR and think of the entries of A as the input. Now each entry of A is chosen independently from N(0,1).
  • A_{ij} is represented by an edge from node i to node j.
  • In class challenge: What is RᵀVᵀVR in terms of graph matrices?

Graph Matrices for RᵀVᵀVR

[Picture: RᵀVᵀVR written as (1/n)·AᵀA, a product of two single-edge diagrams i — j₁ and i — j₂, where row vertices have n possibilities and column vertices have d possibilities.]

SLIDE 27

  • In class challenge answer:

Graph Matrices for RᵀVᵀVR

[Picture: (1/n)·AᵀA = (1/n)·(the shape with ends j₁, j₂ joined by single edges to a middle row vertex i) + (1/n)·(the shape with both ends identified at j and a multiplicity-2 edge to i) + the identity.]

SLIDE 28

  • Here we have two different types of vertices, one for the rows of A (which has n possibilities) and one for the columns of A (which has d possibilities).
  • Can generalize the rough norm bounds to handle multiple types of vertices (writing this up is on my to-do list).

Generalizing Rough Norm Bounds

SLIDE 29

  • Generalized rough norm bounds (a toy implementation appears after this slide):
  • Each isolated vertex outside of U and V contributes a factor equal to the number of possibilities for that vertex.
  • Each vertex in the minimum separator (which minimizes the total number of possibilities for its vertices) contributes nothing.
  • Each other vertex contributes a factor equal to the square root of the number of possibilities for that vertex.

Generalizing Rough Norm Bounds
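
[Editor's note: a brute-force sketch of the rule in the bullets above. The encoding of shapes and the exhaustive minimum-separator search are illustrative choices, suitable only for tiny shapes.]

```python
import itertools
import math

def rough_norm_bound(poss, edges, U, V):
    """poss: vertex -> number of possibilities (n for rows, d for columns);
    edges: list of vertex pairs; U, V: the two index sets of the shape."""
    def separates(S):
        blocked, seen = set(S), set(u for u in U if u not in S)
        frontier = list(seen)
        while frontier:                      # can we reach V from U \ S?
            a = frontier.pop()
            for p, q in edges:
                for b in (q,) if p == a else (p,) if q == a else ():
                    if b not in blocked and b not in seen:
                        seen.add(b)
                        frontier.append(b)
        return not any(v in seen for v in V)

    # minimum separator: minimizes the total number of possibilities
    sep = min((S for r in range(len(poss) + 1)
               for S in itertools.combinations(poss, r) if separates(S)),
              key=lambda S: math.prod(poss[v] for v in S))

    touched = {v for e in edges for v in e}
    bound = 1.0
    for v, p in poss.items():
        if v in sep:
            continue                         # separator vertices: factor 1
        if v not in touched and v not in U and v not in V:
            bound *= p                       # isolated vertex outside U and V
        else:
            bound *= math.sqrt(p)            # every other vertex
    return bound

# the path shape j1 - i - j2 from R^T V^T V R: bound ~ sqrt(n * d)
n, d = 10**6, 10**3
print(rough_norm_bound({'j1': d, 'i': n, 'j2': d},
                       [('j1', 'i'), ('i', 'j2')], {'j1'}, {'j2'}))
```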

SLIDE 30

Norm Bounds for RᵀVᵀVR

[Picture: the decomposition of RᵀVᵀVR with norm bounds. After its 1/n coefficient, the path shape j₁ — i — j₂ contributes Õ(√(d/n)) and the double-edge shape contributes Õ(1/√n); the remaining term is the identity. Hence RᵀVᵀVR ≈ I_d.]

SLIDE 31

Part IV: 4-Norm Analysis

SLIDE 32

  • We want to bound ‖VRx‖₄⁴ = (1/n²)·‖Ax‖₄⁴
  • Take B to be the matrix with entries B_{i,(j₁,j₂)} = A_{ij₁}A_{ij₂}
  • (1/n²)·‖Ax‖₄⁴ = (1/n²)·(x⊗x)ᵀBᵀB(x⊗x)
  • Can try to bound ‖BᵀB‖

4-Norm Analysis
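
[Editor's note: a quick check of the identity ‖Ax‖₄⁴ = (x⊗x)ᵀBᵀB(x⊗x) at small illustrative sizes.]

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 200, 10
A = rng.normal(size=(n, d))
x = rng.normal(size=d)

# B[i, (j1, j2)] = A[i, j1] * A[i, j2], flattened to an n x d^2 matrix
B = np.einsum('ij,ik->ijk', A, A).reshape(n, d * d)
xx = np.kron(x, x)

print(np.sum((A @ x) ** 4))     # ||Ax||_4^4
print(xx @ (B.T @ B) @ xx)      # (x tensor x)^T B^T B (x tensor x): same value
```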

SLIDE 33

  • Picture for BᵀB:

Picture for BᵀB

[Picture: BᵀB as a product of two sums of diagrams. Each factor is expanded over a row vertex i (n possibilities) and column vertices j₁, j₂ (resp. j₃, j₄, d possibilities each), with a multiplicity-2 edge when the two column indices coincide.]

SLIDE 34

  • If d ≤ √n, the target norm bound on BᵀB is Õ(n), giving a bound of Õ(1/n) on ‖VRx‖₄⁴.
  • If d ≥ √n, the target norm bound on BᵀB is Õ(d²), giving a bound of Õ(d²/n²) on ‖VRx‖₄⁴.

Targets

SLIDE 35

Casework

[Picture: the shape with rows indexed by (j₁, j₂) and columns by (j₃, j₄), all distinct, each joined by a single edge to a middle row vertex i.]

Norm Õ(d√n) if d ≤ √n, norm Õ(d²) if d ≥ √n

SLIDE 36

Casework

[Picture: the shape with rows indexed by (j₁, j₂) and columns by (j₁, j₄): the two factors share the column vertex j₁.]

Norm Õ(√(dn)). Note: 0 or 2 edges between i and j₁

SLIDE 37

Casework

[Picture: the shape with rows and columns both indexed by (j₁, j₂): both factors use the same pair of column vertices.]

= n·I + term of norm Õ(√n). Note: 0 or 2 edges between i and j₁, 0 or 2 edges between i and j₂

SLIDE 38

Casework

[Picture: the shape with rows indexed by the repeated pair (j₁, j₁) and columns by (j₃, j₄).]

Norm Õ(√(nd³)). Too large! Note: 0 or 2 edges between i and j₁

SLIDE 39

Casework

[Picture: the shape with rows indexed by the repeated pair (j₁, j₁) and columns by (j₁, j₄): the repeated column vertex also appears on the other side.]

Norm Õ(√(dn)). Notes: 1 or 3 edges between i and j₁; 0 or 2 edges between i and j₁

SLIDE 40

Casework

[Picture: the shape with rows indexed by the repeated pair (j₁, j₁) and columns by the repeated pair (j₂, j₂), each joined to a middle row vertex i.]

Norm Õ(nd). Too large! Note: 0 or 2 edges between i and j₁ and between i and j₂

SLIDE 41

Casework

[Picture: the shape with rows and columns both indexed by the repeated pair (j₁, j₁).]

Turns out to be 3n·I + term of norm Õ(√n). Notes: 0 or 2 edges between i and j₁ on both ends; 0, 2, or 4 edges between i and j₁

SLIDE 42

  • Most cases have sufficiently small norm.
  • Two cases have a norm which is too large, so norm bounds alone are not enough…

Summary

SLIDE 43

Part V: SOS-Symmetry to the Rescue

SLIDE 44

  • Instead of looking at max_{w: ‖w‖=1} wᵀBᵀBw, we only need to upper bound max_{x: ‖x‖=1} (x⊗x)ᵀBᵀB(x⊗x).
  • As far as (x⊗x)ᵀBᵀB(x⊗x) is concerned, we can rearrange indices in pieces of BᵀB.

Key Idea: Rearranging Indices
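
[Editor's note: the key point can be checked directly. Permuting which slots of a 4-index tensor the indices occupy leaves the quadratic form in x⊗x unchanged, even though it can change the spectral norm of the flattened d²×d² matrix. Illustrative sketch:]

```python
import numpy as np

rng = np.random.default_rng(4)
d = 6
T = rng.normal(size=(d, d, d, d))   # entries T[(j1, j2), (j3, j4)]
x = rng.normal(size=d)

T2 = T.transpose(0, 2, 1, 3)        # rearrange: (j1, j2), (j3, j4) -> (j1, j3), (j2, j4)

quad = lambda S: np.einsum('abcd,a,b,c,d->', S, x, x, x, x)
print(quad(T), quad(T2))            # identical quadratic forms in x tensor x

print(np.linalg.norm(T.reshape(d * d, d * d), 2),
      np.linalg.norm(T2.reshape(d * d, d * d), 2))   # spectral norms differ
```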

SLIDE 45

Rearranging Indices Case #1

[Picture: the too-large shape with rows (j₁, j₁) and columns (j₂, j₂) is rearranged into the shape with rows and columns both (j₁, j₂), moving one copy of each repeated index to the other side; the result is the n·I + Õ(√n) case above.]

SLIDE 46

Rearranging Indices Case #2

[Picture: the too-large shape with rows (j₁, j₁) and columns (j₃, j₄) is rearranged into a shape where j₁ appears on both ends, as in the earlier Õ(√(dn)) case.]

SLIDE 47

  • For the two cases whose norm is too high, their norm can be reduced by rearranging indices.
  • This proves the upper bound on max_{x: ‖x‖=1} (x⊗x)ᵀBᵀB(x⊗x).

Effect of Rearranging Indices

SLIDE 48

Part VI: Observations and Loose Ends

SLIDE 49

  • Note: This 4-norm analysis roughly corresponds to pp. 33–37 of [BBH+12].
  • Remark: When d ≪ √n, with a slightly more careful analysis we can show that (1/n)·(x⊗x)ᵀBᵀB(x⊗x) = (3 ± o(1))·‖x‖₂⁴, matching the results in [BBH+12].

Observations: 4-Norm Analysis

SLIDE 50

  • How can we handle arbitrary R rather than a random orthogonal R (i.e. any span of the vectors)?
  • SOS handles it automatically!
  • Idea: The SOS-symmetry and M ≽ 0 constraints are invariant under linear transformations of the variables. Thus, having a different R merely applies a linear transformation to the pseudo-expectation values.

Loose Ends: Arbitrary R

SLIDE 51

  • We have only shown a distinguishing algorithm between the random and planted cases. How can we find the planted sparse vector v exactly?
  • Can be done in two steps:
  1. The analysis shows that degree 4 SOS will output a vector v′ which is highly correlated with v (because the random part of the subspace has nothing with high 4-norm).
  2. Using v′ as a guide, find v. This can be done by minimizing the ℓ₁ norm of a vector u in the subspace subject to u · v′ = 1, see [BKS14] for details. (A sketch of this step appears after this slide.)

Loose Ends: Finding v Exactly
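
[Editor's note: a sketch of step 2 as a linear program via SciPy. The splitting z = p − q with p, q ≥ 0 is the standard ℓ₁-minimization reformulation; the function and variable names are illustrative.]

```python
import numpy as np
from scipy.optimize import linprog

def recover(basis, v_guess):
    """min ||z||_1 over z = basis @ c subject to z . v_guess = 1."""
    n, d = basis.shape
    # variables [c, p, q] with z = p - q and p, q >= 0, so ||z||_1 = sum(p + q)
    cost = np.concatenate([np.zeros(d), np.ones(2 * n)])
    A_eq = np.vstack([
        np.hstack([basis, -np.eye(n), np.eye(n)]),              # basis @ c = p - q
        np.concatenate([np.zeros(d), v_guess, -v_guess])[None]  # v_guess . z = 1
    ])
    b_eq = np.concatenate([np.zeros(n), [1.0]])
    bounds = [(None, None)] * d + [(0, None)] * (2 * n)
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    z = res.x[d:d + n] - res.x[d + n:]
    return z / np.linalg.norm(z)
```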

SLIDE 52

Part VII: Open Problems

SLIDE 53

  • What more can we say when d ≫ √n?
  • More specifically, can we find a better algorithm using more than the 4-norm? Is there an SOS lower bound showing that k = n²/d² is tight?

Open Problems

SLIDE 54

References

  • [BBH+12] B. Barak, F. G. S. L. Brandão, A. W. Harrow, J. A. Kelner, D. Steurer, and Y. Zhou. Hypercontractivity, sum-of-squares proofs, and their applications. STOC 2012, pp. 307–326.
  • [BKS14] B. Barak, J. A. Kelner, and D. Steurer. Rounding sum-of-squares relaxations. STOC 2014.