approximation algorithms I, David Steurer, Cornell, Cargese Workshop (PowerPoint presentation)

Jan 31, 2024

SLIDE 1

Cargese Workshop, 2014

SUM-OF-SQUARES method and

approximation algorithms I

David Steurer

Cornell

SLIDE 2

meta-task

given: functions $f_1, \dots, f_m \colon \{\pm 1\}^n \to \mathbb{R}$

find: solution $x \in \{\pm 1\}^n$ to $f_1 = 0, \dots, f_m = 0$

(each $f_i$ encoded as a low-degree polynomial in $\mathbb{R}[x]$)

example: $f(x) = \sum_{i,j \in [n]} w_{ij} \cdot (x_i - x_j)^2$

examples: combinatorial optimization problems on a graph $G$

MAX CUT: $L_G = 1 - \varepsilon$ over $\{\pm 1\}^n$

MAX BISECTION: $L_G = 1 - \varepsilon$, $\sum_i x_i = 0$ over $\{\pm 1\}^n$

where $1 - \varepsilon$ is a guess for the optimum value and $L_G$ is the Laplacian $L_G(x) = \frac{1}{|E(G)|} \sum_{ij \in E(G)} \frac{1}{4}(x_i - x_j)^2$

goal: develop SDP-based algorithms with provable guarantees in terms of complexity and approximation ("on the edge of intractability" ⇒ need strongest possible relaxations)
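For intuition about the encoding, here is a small sketch (our own illustration; the graph and the helper names are hypothetical) that evaluates $L_G$ at points of $\{\pm 1\}^n$. Each term $\frac14(x_i - x_j)^2$ equals 1 exactly when edge $ij$ is cut, so $L_G(x)$ is the fraction of cut edges, and MAX CUT asks whether $L_G = 1 - \varepsilon$ is satisfiable. The brute-force search is only meant for tiny graphs.

```python
from itertools import product


def laplacian_form(edges, x):
    """L_G(x) = (1/|E|) * sum over edges ij of (x_i - x_j)^2 / 4."""
    return sum((x[i] - x[j]) ** 2 / 4 for i, j in edges) / len(edges)


def max_cut_value(n, edges):
    """Brute-force maximum of L_G over {-1, +1}^n (exponential time, tiny n only)."""
    return max(laplacian_form(edges, x) for x in product((-1, 1), repeat=n))


# Hypothetical example: the 5-cycle is an odd cycle, so no cut contains every edge.
cycle5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(laplacian_form(cycle5, (1, -1, 1, -1, 1)))  # 0.8 (4 of 5 edges cut)
print(max_cut_value(5, cycle5))                   # 0.8, i.e. epsilon = 0.2
```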

SLIDE 3

price of convexity: individual solutions → distributions over solutions

price of tractability: can only enforce "efficiently checkable knowledge" about solutions

individual solutions → distributions over solutions → "pseudo-distributions over solutions" (consistent with efficiently checkable knowledge)

SLIDE 4

distribution $D$ over $\{\pm 1\}^n$: function $D \colon \{\pm 1\}^n \to \mathbb{R}$ with

non-negativity: $D(x) \ge 0$ for all $x \in \{\pm 1\}^n$

normalization: $\sum_{x \in \{\pm 1\}^n} D(x) = 1$

distribution $D$ satisfies $f_1 = 0, \dots, f_m = 0$ for some $f_i \colon \{\pm 1\}^n \to \mathbb{R}$ ⇔ $\mathbb{E}_D (f_1^2 + \dots + f_m^2) = 0$

(equivalently: $\mathbb{P}_D[\exists i.\ f_i \ne 0] = 0$)

examples: uniform distribution $D = 2^{-n}$; fixed 2-bit parity $D(x) = (1 + x_1 x_2)/2^n$

the fixed 2-bit parity distribution satisfies $x_1 x_2 = 1$; the uniform distribution does not satisfy $f = 0$ for any $f \ne 0$

convex: $D, D'$ satisfy the conditions ⇒ $(D + D')/2$ satisfies the conditions

number of function values is exponential ⇒ need careful representation; number of independent inequalities is exponential ⇒ not efficiently checkable
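For small $n$ the definitions can be checked by brute force. A minimal sketch (helper names are hypothetical) that represents a distribution as a Python function on $\{\pm 1\}^n$ and computes $\mathbb{E}_D f^2$ for the constraint $x_1 x_2 = 1$, reproducing the two examples on the slide:

```python
from itertools import product


def expectation(D, f, n):
    """E_D f = sum over x in {-1, +1}^n of D(x) * f(x)."""
    return sum(D(x) * f(x) for x in product((-1, 1), repeat=n))


def uniform(x):
    return 2.0 ** -len(x)


def parity(x):
    """Fixed 2-bit parity: uniform over the x with x_1 * x_2 = 1."""
    return (1 + x[0] * x[1]) / 2 ** len(x)


def f(x):
    """The constraint x_1 * x_2 = 1, written as f(x) = 0."""
    return x[0] * x[1] - 1


n = 3
print(expectation(parity, lambda x: 1, n))           # 1.0: normalized
print(expectation(parity, lambda x: f(x) ** 2, n))   # 0.0: satisfies f = 0
print(expectation(uniform, lambda x: f(x) ** 2, n))  # 2.0: does not
```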

SLIDE 5

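deg.-$d$ pseudo-distribution $D$: $\sum_{x \in \{\pm 1\}^n} D(x) f(x)^2 \ge 0$ for every deg.-$d/2$ polynomial $f$ (instead of $D(x) \ge 0$)

convenient notation: $\tilde{\mathbb{E}}_D f := \sum_x D(x) f(x)$, "pseudo-expectation of $f$ under $D$"

deg.-$2n$ pseudo-distributions are actual distributions (point indicators $\mathbf{1}_y$ have deg. $n$ ⇒ $D(y) = \tilde{\mathbb{E}}_D \mathbf{1}_y = \tilde{\mathbb{E}}_D \mathbf{1}_y^2 \ge 0$)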
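The relaxed condition is genuinely weaker than being a distribution. As a hypothetical example (ours, not from the slides), take $n = 3$ and $D(x) = (1 - \frac12(x_1x_2 + x_1x_3 + x_2x_3))/8$. Every $f$ of degree ≤ 1 is a combination of $1, x_1, x_2, x_3$, so $D$ is a deg.-2 pseudo-distribution exactly when the 4×4 matrix of pseudo-moments over these monomials is p.s.d. (its (1,1) entry is the normalization $\tilde{\mathbb{E}}_D 1 = 1$). The sketch checks this with a small elimination routine written for the purpose (not a library call), and shows that $D$ is negative at $x = (1,1,1)$ and assigns the triangle graph the pseudo-cut value 3/4, above its true MAX CUT value 2/3.

```python
from itertools import product


def pseudo_expect(D, f, n):
    """Pseudo-expectation E_D f = sum_x D(x) f(x) over {-1, +1}^n."""
    return sum(D(x) * f(x) for x in product((-1, 1), repeat=n))


def is_psd(A, tol=1e-9):
    """Test positive semidefiniteness by symmetric Gaussian elimination.

    Each step replaces the trailing block by its Schur complement, which stays
    p.s.d. if A is; a negative pivot, or a zero pivot with a nonzero row, disproves it.
    """
    A = [row[:] for row in A]
    m = len(A)
    for k in range(m):
        piv = A[k][k]
        if piv < -tol:
            return False
        if piv <= tol:
            if any(abs(A[k][j]) > tol for j in range(k + 1, m)):
                return False
            continue
        for i in range(k + 1, m):
            r = A[i][k] / piv
            for j in range(k + 1, m):
                A[i][j] -= r * A[k][j]
    return True


def triangle_D(x):
    """Hypothetical D on {-1, +1}^3 with E_D[x_i x_j] = -1/2 for every pair i != j."""
    return (1 - 0.5 * (x[0] * x[1] + x[0] * x[2] + x[1] * x[2])) / 8


n = 3
# Monomials of degree <= d/2 = 1; every such f is a linear combination of them.
basis = [lambda x: 1] + [lambda x, i=i: x[i] for i in range(n)]
moments = [[pseudo_expect(triangle_D, lambda x: p(x) * q(x), n) for q in basis]
           for p in basis]
edges = [(0, 1), (0, 2), (1, 2)]
cut = sum(pseudo_expect(triangle_D, lambda x, e=e: (x[e[0]] - x[e[1]]) ** 2 / 4, n)
          for e in edges) / len(edges)

print(is_psd(moments))                                          # True
print(min(triangle_D(x) for x in product((-1, 1), repeat=n)))  # -0.0625
print(cut)                                                      # 0.75 (true MAX CUT: 2/3)
```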

SLIDE 6

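deg.-$d$ pseudo-distr. $D \colon \{\pm 1\}^n \to \mathbb{R}$

non-negativity: $\tilde{\mathbb{E}}_D f^2 \ge 0$ for every deg.-$d/2$ poly. $f$

normalization: $\tilde{\mathbb{E}}_D 1 = 1$

pseudo-distr. $D$ satisfies $f_1 = 0, \dots, f_m = 0$ for some $f_i \colon \{\pm 1\}^n \to \mathbb{R}$ ⇔ $\tilde{\mathbb{E}}_D (f_1^2 + \dots + f_m^2) = 0$

(equivalently: $\tilde{\mathbb{E}}_D f_i \cdot g = 0$ whenever $\deg g \le d - \deg f_i$)

notation: $\tilde{\mathbb{E}}_D f := \sum_x D(x) f(x)$, "pseudo-expectation of $f$ under $D$"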

SLIDE 7

claim: can compute such $D$ in time $n^{O(d)}$ if it exists (otherwise, certify that no solution to the original problem exists)

(can assume $D$ is a deg.-$d$ polynomial ⇒ separation problem $\min_f \tilde{\mathbb{E}}_D f^2$ is an $n^d$-dim. eigenvalue problem ⇒ $n^{O(d)}$ time via grad. descent / ellipsoid method)

[Shor, Parrilo, Lasserre]
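Where the $n^{O(d)}$ comes from: on $\{\pm 1\}^n$ every polynomial can be taken multilinear ($x_i^2 = 1$), so the quadratic form $f \mapsto \tilde{\mathbb{E}}_D f^2$ on polynomials of degree ≤ $d/2$ is a matrix indexed by sets $S \subseteq [n]$ with $|S| \le d/2$. The short sketch below (hypothetical helper name) counts this dimension, which is at most $(n+1)^{d/2}$.

```python
from math import comb


def moment_matrix_dim(n, d):
    """Number of multilinear monomials x_S with |S| <= d/2 (x_i^2 = 1 on the cube)."""
    return sum(comb(n, s) for s in range(d // 2 + 1))


for d in (2, 4, 6, 8):
    print(d, moment_matrix_dim(100, d))
```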

SLIDE 8

surprising property: $\tilde{\mathbb{E}}_D f \ge 0$ for many* low-degree polynomials $f$ such that $f \ge 0$ follows from $f_1 = 0, \dots, f_m = 0$ by an "explicit proof"

soon: examples of such properties and how to exploit them

SLIDE 9

emerging algorithm-design paradigm: analyze the algorithm pretending that the underlying actual distribution exists; verify only afterwards that low-deg. pseudo-distr.'s satisfy the required properties

(diagram: pseudo-distr. over optimal solutions → efficient algorithm → approximate solution (to original problem), compared with the deg.-$d$ part of an actual distr. over optimal solutions)

$n^{o(d)}$-time algorithms cannot* distinguish between deg.-$d$ pseudo-distributions and the deg.-$d$ part of actual distr.'s

SLIDE 10

dual view (sum-of-squares proof system)

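either ∃ deg.-$d$ pseudo-distribution $D$ over $\{\pm 1\}^n$ satisfying $f_1 = 0, \dots, f_m = 0$

or ∃ $g_1, \dots, g_m$ and $h_1, \dots, h_k$ such that $\sum_i f_i \cdot g_i + \sum_j h_j^2 = -1$ over $\{\pm 1\}^n$, with $\deg f_i + \deg g_i \le d$ and $\deg h_j \le d/2$

(derivation of the unsatisfiable constraint $-1 \ge 0$ from $f_1 = 0, \dots, f_m = 0$ over $\{\pm 1\}^n$)

(diagram: the point $-1$, the cone $K_d$, and a separating hyperplane $D$)

$K_d = \{ f = \sum_i f_i \cdot g_i + \sum_j h_j^2 \}$ (degree bounds as above)

if $-1 \notin K_d$, then ∃ separating hyperplane $D$ with $\tilde{\mathbb{E}}_D(-1) = -1$ and $\tilde{\mathbb{E}}_D f \ge 0$ for all $f \in K_d$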
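A tiny worked refutation makes the dual view concrete (our own example, not from the slides). The constraint $f = x_1 + x_2 + x_3 = 0$ has no solution in $\{\pm 1\}^3$ because the sum is odd. On the cube $f^2 - 1 = 2(1 + x_1x_2 + x_1x_3 + x_2x_3)$, which is 8 when all coordinates agree and 0 otherwise, so $f \cdot (-f) + h^2 + h^2 = -1$ with $h = (1 + x_1x_2 + x_1x_3 + x_2x_3)/2$: a certificate with $\deg f + \deg g = 2$ and $\deg h = 2$, i.e. $d = 4$. No degree-2 certificate exists, since the pseudo-distribution on $\{\pm 1\}^3$ with $\tilde{\mathbb{E}} x_i = 0$ and $\tilde{\mathbb{E}} x_ix_j = -1/2$ (p.s.d. moment matrix over $1, x_1, x_2, x_3$) has $\tilde{\mathbb{E}} f^2 = 3 + 6 \cdot (-1/2) = 0$. The sketch verifies the identity exactly with `Fraction`.

```python
from fractions import Fraction
from itertools import product


def f(x):
    """Constraint x_1 + x_2 + x_3 = 0; unsatisfiable on {-1, +1}^3 since the sum is odd."""
    return x[0] + x[1] + x[2]


def g(x):
    """Multiplier of degree 1, so deg f + deg g = 2."""
    return -f(x)


def h(x):
    """Degree-2 polynomial (1 + x1x2 + x1x3 + x2x3)/2: equals 2 if all x_i agree, else 0."""
    return Fraction(1 + x[0] * x[1] + x[0] * x[2] + x[1] * x[2], 2)


# Certificate with h_1 = h_2 = h:  f*g + h_1^2 + h_2^2 = -1 at every point of the cube.
for x in product((-1, 1), repeat=3):
    assert f(x) * g(x) + h(x) ** 2 + h(x) ** 2 == -1
print("degree-4 refutation verified on all 8 points")
```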

SLIDE 11

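pseudo-distributions satisfy all local properties of $\{\pm 1\}^n$

claim: suppose $f \ge 0$ is a $d/2$-junta over $\{\pm 1\}^n$ (depends on ≤ $d/2$ coordinates); then $\tilde{\mathbb{E}}_D f \ge 0$

proof: $\sqrt{f}$ is also a $d/2$-junta, so it has degree ≤ $d/2$ ⇒ $\tilde{\mathbb{E}}_D f = \tilde{\mathbb{E}}_D (\sqrt{f})^2 \ge 0$

corollary: for any set $S$ of ≤ $d/2$ coordinates, the marginal $D' = \{x_S\}_D$ is an actual distribution: $D'(x_S) = \sum_{x_{[n] \setminus S}} D(x_S, x_{[n] \setminus S}) = \tilde{\mathbb{E}}_D \mathbf{1}_{x_S} \ge 0$ (the indicator $\mathbf{1}_{x_S}$ is a $d/2$-junta)

(also captured by LP methods, e.g., Sherali–Adams hierarchies …)

example: triangle inequalities over $\{\pm 1\}^n$: $\tilde{\mathbb{E}}_D \left[(x_i - x_j)^2 + (x_j - x_k)^2 - (x_i - x_k)^2\right] \ge 0$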
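A brute-force check of the example (our sketch; helper names are hypothetical). On the cube the triangle expression only takes the values 0 and 8, so it is a non-negative 3-junta and the claim applies once $d/2 \ge 3$. The check also shows it equals $8q^2$ for the degree-2 polynomial $q = (1 + x_ix_k - x_ix_j - x_jx_k)/4$, the indicator of $x_i = x_k \ne x_j$, so already degree-4 pseudo-distributions satisfy it.

```python
from itertools import product


def triangle_expr(xi, xj, xk):
    """(x_i - x_j)^2 + (x_j - x_k)^2 - (x_i - x_k)^2, a function of 3 coordinates."""
    return (xi - xj) ** 2 + (xj - xk) ** 2 - (xi - xk) ** 2


def four_q(xi, xj, xk):
    """4q with q = (1 + x_i x_k - x_i x_j - x_j x_k)/4, the indicator of x_i = x_k != x_j."""
    return 1 + xi * xk - xi * xj - xj * xk


for x in product((-1, 1), repeat=3):
    t = triangle_expr(*x)
    assert t in (0, 8)                 # non-negative on the cube
    assert 2 * t == four_q(*x) ** 2    # t = 8 q^2 = (4q)^2 / 2
print("triangle expression = 8 q^2 on {-1, +1}^3")
```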

SLIDE 12

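conditioning pseudo-distributions

claim: ∀ $i \in [n]$, $\sigma \in \{\pm 1\}$: $D' = \{x \mid x_i = \sigma\}_D$ is a deg.-$(d-2)$ pseudo-distr.

proof: $D'(x) = \frac{1}{\mathbb{P}_D[x_i = \sigma]} D(x) \cdot \mathbf{1}_{x_i = \sigma}$ ⇒ $\tilde{\mathbb{E}}_{D'} f^2 \propto \tilde{\mathbb{E}}_D \mathbf{1}_{x_i = \sigma} f^2 = \tilde{\mathbb{E}}_D (\mathbf{1}_{x_i = \sigma} f)^2 \ge 0$, because $\deg f \le (d-2)/2$ ⇒ $\deg(\mathbf{1}_{x_i = \sigma} f) \le d/2$

(also captured by LP methods, e.g., Sherali–Adams hierarchies …)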
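The conditioning formula can be applied directly to an explicit table. A minimal sketch, with hypothetical helper names, that conditions a small pseudo-distribution on $\{\pm 1\}^3$ (pairwise pseudo-correlations $-1/2$, defined in the code) on $x_1 = 1$, which is index 0 in the code. The factor $\mathbf{1}_{x_i=\sigma} = (1 + \sigma x_i)/2$ is the degree-1 polynomial responsible for the loss of two degrees.

```python
from itertools import product


def condition(D, n, i, sigma):
    """Table of D'(x) = D(x) * 1[x_i = sigma] / P_D[x_i = sigma] on {-1, +1}^n."""
    cube = list(product((-1, 1), repeat=n))
    indicator = {x: (1 + sigma * x[i]) / 2 for x in cube}  # degree-1 polynomial in x_i
    prob = sum(D(x) * indicator[x] for x in cube)          # P_D[x_i = sigma] = E_D 1[x_i = sigma]
    return {x: D(x) * indicator[x] / prob for x in cube}


def triangle_D(x):
    """Hypothetical deg.-2 pseudo-distribution on {-1, +1}^3 with E_D[x_i x_j] = -1/2."""
    return (1 - 0.5 * (x[0] * x[1] + x[0] * x[2] + x[1] * x[2])) / 8


D_cond = condition(triangle_D, n=3, i=0, sigma=1)   # condition on x_1 = 1
print(sum(D_cond.values()))                         # 1.0: still normalized
print(sum(p * x[1] for x, p in D_cond.items()))     # -0.5: pseudo-expectation of x_2
```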

SLIDE 13

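pseudo-covariances are covariances of distributions over $\mathbb{R}^n$

claim: there exists a (Gaussian) distr. $\xi$ over $\mathbb{R}^n$ such that $\tilde{\mathbb{E}}_D x = \mathbb{E}\,\xi$ and $\tilde{\mathbb{E}}_D x x^\top = \mathbb{E}\,\xi \xi^\top$

proof: let $\mu = \tilde{\mathbb{E}}_D x$ and $M = \tilde{\mathbb{E}}_D (x - \mu)(x - \mu)^\top$; choose $\xi$ to be Gaussian with mean $\mu$ and covariance $M$; the matrix $M$ is p.s.d. because $v^\top M v = \tilde{\mathbb{E}}_D \langle v, x - \mu \rangle^2 \ge 0$ for all $v \in \mathbb{R}^n$ (square of a linear form)

consequence: $\tilde{\mathbb{E}}_D q = \mathbb{E}\, q(\xi)$ for every $q$ of deg. 2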

SLIDE 14

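claim: for every univariate $p \ge 0$ over $\mathbb{R}$ and every $n$-variate polynomial $q$ with $\deg p \cdot \deg q \le d$, $\tilde{\mathbb{E}}_D\, p(q(x)) \ge 0$

proof: enough to show that $p$ is a sum of squares (each square has degree ≤ $\deg p / 2$, so composing with $q$ gives degree ≤ $d/2$); by induction on $\deg p$:

choose a minimizer $\alpha$ of $p$ over $\mathbb{R}$, so $p(\alpha) \ge 0$

then $p(x) = p(\alpha) + (x - \alpha)^2 \cdot p'(x)$ for some polynomial $p' \ge 0$ with $\deg p' < \deg p$ (since $p - p(\alpha) \ge 0$ vanishes at $\alpha$, it is divisible by $(x - \alpha)^2$)

⇒ $p$ is a sum of squares, since $p'$ is one by the induction hypothesis

pseudo-distr.'s satisfy (compositions of) low-deg. univariate properties ⇒ useful class of non-local higher-deg. inequalities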
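A worked instance of the induction, for a polynomial chosen only for illustration: $p(x) = x^4 - 4x + 5$ has its minimum at $\alpha = 1$ with $p(1) = 2$, and two steps of the argument give an explicit sum of squares.

```latex
\begin{aligned}
p(x) &= x^4 - 4x + 5 = 2 + (x-1)^2\, p'(x), \qquad p'(x) = x^2 + 2x + 3,\\
p'(x) &= 2 + (x+1)^2 \cdot 1 \qquad (\text{minimizer } \alpha' = -1),\\
\Rightarrow\quad p(x) &= 2 + 2(x-1)^2 + \big((x-1)(x+1)\big)^2 = 2 + 2(x-1)^2 + (x^2-1)^2 .
\end{aligned}
```

Composing with an $n$-variate $q$ gives $\tilde{\mathbb{E}}_D\, p(q(x)) = 2 + 2\,\tilde{\mathbb{E}}_D (q-1)^2 + \tilde{\mathbb{E}}_D (q^2-1)^2 \ge 0$, where the last square has degree $2\deg q \le d/2$ exactly when $4 \deg q \le d$, matching $\deg p \cdot \deg q \le d$.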

SLIDE 15

MAX CUT

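given: deg.-$d$ pseudo-distr. $D$ over $\{\pm 1\}^n$ that satisfies $L_G = 1 - \varepsilon$, where $L_G = \frac{1}{|E(G)|} \sum_{ij \in E(G)} \frac{1}{4}(x_i - x_j)^2$

goal: find $y \in \{\pm 1\}^n$ with $L_G(y) \ge 1 - O(\sqrt{\varepsilon})$

algorithm: sample from Gaussian distr. $\xi$ over $\mathbb{R}^n$ with $\mathbb{E}\,\xi\xi^\top = \tilde{\mathbb{E}}_D x x^\top$; output $y = \operatorname{sgn}(\xi)$

analysis claim: $\mathbb{P}_D[x_i \ne x_j] = 1 - \eta$ ⇒ $\mathbb{P}[y_i \ne y_j] \ge 1 - O(\sqrt{\eta})$

proof: $(\xi_i, \xi_j)$ satisfies $-\mathbb{E}\,\xi_i \xi_j = -\tilde{\mathbb{E}}_D x_i x_j = 1 - 2\eta$ and $\mathbb{E}\,\xi_i^2 = \mathbb{E}\,\xi_j^2 = 1$ ⇒ (tedious calculation) ⇒ $\mathbb{P}[\operatorname{sgn} \xi_i \ne \operatorname{sgn} \xi_j] \ge 1 - O(\sqrt{\eta})$

averaging over the edges of $G$ (concavity of $\sqrt{\cdot}$) gives $\mathbb{E}\, L_G(y) \ge 1 - O(\sqrt{\varepsilon})$

[Goemans-Williamson]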
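A stdlib-only sketch of the rounding step, assuming the pseudo-moment matrix $M = \tilde{\mathbb{E}}_D xx^\top$ has already been computed (for example by an SDP solver, not shown). The Gaussian $\xi$ with $\mathbb{E}\,\xi\xi^\top = M$ is sampled as $\xi = Lg$ for a factor $M = LL^\top$, and the "tedious calculation" is replaced by the standard formula $\mathbb{P}[\operatorname{sgn}\xi_i \ne \operatorname{sgn}\xi_j] = \arccos(\rho)/\pi$ for unit-variance Gaussians with correlation $\rho = 2\eta - 1$. Function names and the triangle input are hypothetical.

```python
import math
import random


def psd_factor(M, tol=1e-9):
    """Return L with L L^T = M for a p.s.d. matrix M (Cholesky; a zero pivot gives a zero column)."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = M[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if pivot <= tol:
            continue  # assumes M is p.s.d., so the rest of this column is (numerically) zero
        L[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            L[i][j] = (M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


def hyperplane_round(L, rng):
    """Sample xi = L g with g standard Gaussian (so E xi xi^T = L L^T) and return sgn(xi)."""
    g = [rng.gauss(0.0, 1.0) for _ in L]
    xi = [sum(a * b for a, b in zip(row, g)) for row in L]
    return [1 if v >= 0 else -1 for v in xi]


def cut_probability(eta):
    """P[sgn xi_i != sgn xi_j] when E xi_i xi_j = 2*eta - 1 and E xi_i^2 = E xi_j^2 = 1."""
    return math.acos(2 * eta - 1) / math.pi


# Hypothetical pseudo-moments of a triangle: E_D x_i x_j = -1/2, so eta = 1/4 per edge.
M = [[1.0, -0.5, -0.5], [-0.5, 1.0, -0.5], [-0.5, -0.5, 1.0]]
edges = [(0, 1), (0, 2), (1, 2)]
L = psd_factor(M)
rng = random.Random(0)
samples = [hyperplane_round(L, rng) for _ in range(2000)]
avg_cut = sum(sum(y[i] != y[j] for i, j in edges) for y in samples) / (3 * len(samples))
print(avg_cut, cut_probability(0.25))  # both 2/3 up to floating-point rounding
```

For this rank-2 $M$ the three coordinates of $\xi$ always sum to zero, so every sample cuts exactly two of the three edges. In general the formula equals $1 - \frac{2}{\pi}\arcsin(\sqrt{\eta}) \ge 1 - \sqrt{\eta}$, a concrete form of the $O(\sqrt{\eta})$ bound.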

SLIDE 16

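low global correlation in (pseudo-)distributions

claim: ∀$r$. ∃ deg.-$(d - 2r)$ pseudo-distribution $D'$, obtained by conditioning $D$, with $\operatorname{Avg}_{i,j \in [n]} I_{D'}(x_i, x_j) \le 1/r$

[Barak-Raghavendra-S., Raghavendra-Tan]

proof: potential $\operatorname{Avg}_{i \in [n]} H(x_i)$; greedily condition on variables to maximize the potential decrease until global correlation is low

mutual information: $I(x, y) = H(x) - H(x \mid y)$

potential decrease ≥ $\operatorname{Avg}_{i \in [n]} H(x_i) - \operatorname{Avg}_{j \in [n]} \operatorname{Avg}_{i \in [n]} H(x_i \mid x_j) = \operatorname{Avg}_{i,j \in [n]} I_{D'}(x_i, x_j)$

how often do we need to condition? the potential starts at ≤ 1 and stays ≥ 0, and each step decreases it by more than $1/r$ while global correlation exceeds $1/r$ ⇒ only need to condition ≤ $r$ times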
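Global correlation is measured through the two-bit marginals of $D'$, which are actual distributions once $D'$ has degree ≥ 4 (junta claim), so each is determined by $\tilde{\mathbb{E}} x_i$, $\tilde{\mathbb{E}} x_j$ and $\tilde{\mathbb{E}} x_ix_j$. A small sketch with hypothetical helper names computing $I(x_i; x_j)$ in bits; since $H(x_i) \le 1$ bit, the potential $\operatorname{Avg}_i H(x_i)$ starts at ≤ 1, which is what bounds the number of conditioning steps by $r$.

```python
import math


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def pair_distribution(mean_i, mean_j, corr):
    """Joint law of two +-1 bits from their moments:
    P[x_i = a, x_j = b] = (1 + a*mean_i + b*mean_j + a*b*corr) / 4."""
    return {(a, b): (1 + a * mean_i + b * mean_j + a * b * corr) / 4
            for a in (-1, 1) for b in (-1, 1)}


def mutual_information(joint):
    """I(x_i; x_j) = H(x_i) - H(x_i | x_j) for a joint table of two +-1 bits."""
    p_i = [sum(joint[(a, b)] for b in (-1, 1)) for a in (-1, 1)]
    h_cond = 0.0
    for b in (-1, 1):
        p_b = joint[(-1, b)] + joint[(1, b)]
        if p_b > 0:
            h_cond += p_b * entropy([joint[(a, b)] / p_b for a in (-1, 1)])
    return entropy(p_i) - h_cond


print(mutual_information(pair_distribution(0, 0, 0)))     # 0.0: independent bits
print(mutual_information(pair_distribution(0, 0, 1)))     # 1.0: x_i = x_j
print(mutual_information(pair_distribution(0, 0, -0.5)))  # about 0.19
```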

SLIDE 17

MAX BISECTION

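given: deg.-$d$ pseudo-distr. $D$ over $\{\pm 1\}^n$ that satisfies $L_G = 1 - \varepsilon$, $\sum_i x_i = 0$, with $d = 1/\varepsilon^{O(1)}$

goal: find $y \in \{\pm 1\}^n$ with $L_G(y) \ge 1 - O(\sqrt{\varepsilon})$ and $\sum_i y_i = 0$

algorithm: let $D'$ be a conditioning of $D$ with global correlation ≤ $\varepsilon^{O(1)}$; sample Gaussian $\xi$ with the same deg.-2 moments as $D'$; output $y$ with $y_i = \operatorname{sgn}(\xi_i - t_i)$ (choose $t_i \in \mathbb{R}$ so that $\mathbb{E}\, y_i = \tilde{\mathbb{E}}_{D'} x_i$)

analysis almost as before: $\mathbb{P}_{D'}[x_i \ne x_j] \ge 1 - \eta$ ⇒ $\mathbb{P}[y_i \ne y_j] \ge 1 - O(\sqrt{\eta})$ ($t_i = 0$ is the worst case ⇒ same analysis as MAX CUT)

new: $I(x_i, x_j) \le \varepsilon^{O(1)}$ ⇒ $\mathbb{E}\, y_i y_j = \tilde{\mathbb{E}}_{D'} x_i x_j \pm \varepsilon^{O(1)}$ ⇒ $\mathbb{E}\left|\sum_i y_i\right| \le \left(\mathbb{E}\left(\sum_i y_i\right)^2\right)^{1/2} \le \left(\tilde{\mathbb{E}}_{D'}\left(\sum_i x_i\right)^2\right)^{1/2} + \varepsilon^{O(1)} \cdot n = \varepsilon^{O(1)} \cdot n$, since $\tilde{\mathbb{E}}_{D'}\left(\sum_i x_i\right)^2 = 0$

⇒ get bisection $y'$ from $y$ by correcting an $\varepsilon^{O(1)}$ fraction of vertices

[Raghavendra-Tan]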
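The two ingredients that go beyond MAX CUT, the per-vertex thresholds $t_i$ and the final correction to an exact bisection, can be sketched as follows. Assumptions (ours): $\xi_i$ has mean $m_i = \tilde{\mathbb{E}}_{D'} x_i$ and second moment $\tilde{\mathbb{E}}_{D'} x_i^2 = 1$, hence variance $1 - m_i^2 > 0$; the correction flips arbitrary vertices on the majority side, one simple choice whose cost is $|\sum_i y_i|/2$ flips, i.e. the $\varepsilon^{O(1)} \cdot n$ bound above. Helper names are hypothetical.

```python
import math
from statistics import NormalDist


def threshold(m):
    """t with E sgn(xi - t) = m for xi ~ N(m, 1 - m^2), i.e. xi has the moments
    E xi = m and E xi^2 = 1 of a +-1 variable with pseudo-expectation m (|m| < 1)."""
    sd = math.sqrt(1 - m * m)
    # P[xi > t] - P[xi < t] = 1 - 2 * Phi((t - m) / sd), set equal to m
    return m + sd * NormalDist().inv_cdf((1 - m) / 2)


def rebalance(y):
    """Flip |sum(y)|/2 majority-sign entries so that sum(y) == 0 (assumes len(y) even)."""
    y = list(y)
    excess = sum(y) // 2          # sum(y) is even when len(y) is even
    sign = 1 if excess > 0 else -1
    for idx, v in enumerate(y):
        if excess == 0:
            break
        if v == sign:
            y[idx] = -v
            excess -= sign
    return y


print(threshold(0.0))                   # 0.0: unbiased vertex, plain sign rounding
print(rebalance([1, 1, 1, -1, 1, -1]))  # [-1, 1, 1, -1, 1, -1]
```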

SLIDE 18

Cargese Workshop, 2014

SUM-OF-SQUARES method and

approximation algorithms II

David Steurer

Cornell

SLIDE 19

sparse vector

given: linear subspace $U \subseteq \mathbb{R}^n$ (represented by some basis), parameter $k \in [n]$

promise: ∃ $v_0 \in U$ such that $v_0$ is $k$-sparse (and $v_0 \in \{0, \pm 1\}^n$)

goal: find a $k$-sparse vector $v \in U$

an efficient approximation algorithm for $k = \Omega(n)$ would be a major step toward refuting Khot's Unique Games Conjecture and toward improved guarantees for MAX CUT, VERTEX COVER, …

planted / average-case version (benchmark for unsupervised learning tasks): subspace $U$ spanned by $d - 1$ random vectors and some $k$-sparse vector $v_0$

previous best algorithms only work for very sparse vectors: $k/n \le 1/\sqrt{d}$ [Spielman-Wang-Wright, Demanet-Hand]

here: deg.-4 pseudo-distributions work for $k/n = \Omega(1)$ up to $d \le O(\sqrt{n})$ [Barak-Kelner-S.]

SLIDE 20

limitations of ℓ∞/ℓ1 (previous best algorithm; exact via linear programming)

analytical proxies for sparsity: if vector $v$ is $k$-sparse, then $\frac{\|v\|_\infty}{\|v\|_1} \ge \frac{1}{k}$, $\left(\frac{\|v\|_2}{\|v\|_1}\right)^2 \ge \frac{1}{k}$, and $\left(\frac{\|v\|_4}{\|v\|_2}\right)^4 \ge \frac{1}{k}$ (tight if $v \in \{0, \pm 1\}^n$)

$v$ = sum of $d$ random ±1 vectors with the same first coordinate: $\|v\|_\infty \ge d$, $\|v\|_1 \le d + n\sqrt{d}$ ⇒ ratio ≈ $\sqrt{d}/n$ ⇒ the ℓ∞/ℓ1 algorithm fails for $k/n \ge 1/\sqrt{d}$

limitations of std. SDP relaxation for ℓ2/ℓ1 (best proxy for sparsity)

"ideal object": distribution $D$ over the ℓ2 unit sphere of subspace $U$ with ℓ1-constraint $\mathbb{E}_D \|v\|_1^2 \le k$; not a low-deg. polynomial in $v$ ⇒ unclear how to represent (also NP-hard in the worst case)

tractable relaxation: $\sum_{i,j} |\mathbb{E}_D v_i v_j| \le k$ [d'Aspremont-El Ghaoui-Jordan-Lanckriet]

but: for the uniform distr. $D$ over the ℓ2 sphere of a $d$-dim. random subspace, $\sum_{i,j} |\mathbb{E}_D v_i v_j| \approx n/\sqrt{d}$ ⇒ same limitation as ℓ∞/ℓ1
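A quick numerical look at the proxies and at the ℓ∞/ℓ1 failure example, with sizes $n$, $k$, $d$ picked arbitrarily for illustration (helper names are hypothetical). A $\{0, \pm 1\}$ vector with $k$ nonzeros attains $1/k$ in all three ratios, while the sum of $d$ random ±1 vectors with a shared first coordinate has $\|v\|_\infty/\|v\|_1$ close to $\sqrt{\pi/2}\cdot\sqrt{d}/n$, because each other coordinate has $\mathbb{E}|\cdot| \approx \sqrt{2d/\pi}$; so to ℓ∞/ℓ1 it looks as sparse as a planted vector with $k/n$ of order $1/\sqrt{d}$.

```python
import random


def sparsity_proxies(v):
    """(|v|_inf/|v|_1, (|v|_2/|v|_1)^2, (|v|_4/|v|_2)^4); each is >= 1/k for k-sparse v."""
    l1 = sum(abs(a) for a in v)
    l2_sq = sum(a * a for a in v)
    l4_4 = sum(a ** 4 for a in v)
    return max(abs(a) for a in v) / l1, l2_sq / l1 ** 2, l4_4 / l2_sq ** 2


def shared_first_coordinate_sum(n, d, rng):
    """Sum of d random +-1 vectors in R^n that all have first coordinate +1."""
    v = [d] + [0] * (n - 1)
    for _ in range(d):
        for i in range(1, n):
            v[i] += rng.choice((-1, 1))
    return v


rng = random.Random(1)
n, k, d = 2000, 50, 100
sparse = [rng.choice((-1, 1)) for _ in range(k)] + [0] * (n - k)  # in {0, +-1}^n
print(sparsity_proxies(sparse))  # (0.02, 0.02, 0.02): all equal 1/k

dense = shared_first_coordinate_sum(n, d, rng)
print(sparsity_proxies(dense)[0], d ** 0.5 / n)  # same order of magnitude
```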

SLIDE 21

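degree-$d$ SOS relaxation for ℓ4/ℓ2

deg.-$d$ pseudo-distr. $D \colon \{v \in U : \|v\|_2 = 1\} \to \mathbb{R}$ over the unit ℓ2-sphere of $U$

notation: $\tilde{\mathbb{E}}_D f := \int_{v \in U,\, \|v\|_2 = 1} D \cdot f$ (only consider polynomials ⇒ easy to integrate)

normalization: $\tilde{\mathbb{E}}_D 1 = 1$

non-negativity: $\tilde{\mathbb{E}}_D h(v)^2 \ge 0$ for every $h$ of deg. ≤ $d/2$

orthogonality: $\tilde{\mathbb{E}}_D \left(\|v\|_4^4 - \frac{1}{k}\right) \cdot g(v) = 0$ for every $g$ of deg. ≤ $d - 4$

pseudo-distribution satisfies $\|v\|_4^4 = 1/k$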

SLIDE 22

how to find pseudo-distributions?

set of deg.-$d$ pseudo-distributions = convex set with $n^{O(d)}$-time separation oracle

separation problem: given function $D$ (represented as a deg.-$d$ polynomial), check that the quadratic form $f \mapsto \tilde{\mathbb{E}}_D f^2$ is p.s.d., or output a violated constraint $\tilde{\mathbb{E}}_D f^2 < 0$

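The separation step is an eigenvalue computation on the matrix of the quadratic form $f \mapsto \tilde{\mathbb{E}}_D f^2$ in a monomial basis. A minimal stdlib sketch with a hypothetical function name: power iteration stands in for a proper eigensolver, and the example inputs are degree-2 moment matrices on $\{\pm 1\}^3$ in the basis $1, x_1, x_2, x_3$ (pairwise pseudo-correlations $-0.6$, which is not p.s.d., and $-0.5$, which is), not moments over the sphere of $U$.

```python
import random


def most_violated_direction(M, iters=500, seed=0):
    """Approximate unit f minimizing f^T M f for a symmetric matrix M.

    A negative returned value means f is a violated constraint (E_D f^2 < 0);
    a value >= 0 (up to accuracy) means the quadratic form is p.s.d.
    Power iteration on c*I - M with c = 1 + max absolute row sum, which exceeds
    every eigenvalue of M (Gershgorin), so c*I - M is positive definite and its
    top eigenvector is a bottom eigenvector of M.
    """
    n = len(M)
    c = 1.0 + max(sum(abs(a) for a in row) for row in M)
    rng = random.Random(seed)
    f = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        norm = sum(a * a for a in f) ** 0.5
        f = [a / norm for a in f]
        f = [c * f[i] - sum(M[i][j] * f[j] for j in range(n)) for i in range(n)]
    norm = sum(a * a for a in f) ** 0.5
    f = [a / norm for a in f]
    return f, sum(f[i] * M[i][j] * f[j] for i in range(n) for j in range(n))


def moment_matrix(rho):
    """Moments over (1, x1, x2, x3) with E x_i = 0 and E x_i x_j = rho for i != j."""
    return [[1, 0, 0, 0], [0, 1, rho, rho], [0, rho, 1, rho], [0, rho, rho, 1]]


f, value = most_violated_direction(moment_matrix(-0.6))
print(round(value, 6))  # -0.2: f is proportional to (0, 1, 1, 1), i.e. x1 + x2 + x3
print(most_violated_direction(moment_matrix(-0.5))[1] > -1e-9)  # True: no violation
```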

SLIDE 23

how to use pseudo-distributions?

rule of thumb: the set of deg.-$d$ pseudo-moments $\{\tilde{\mathbb{E}}_D f \mid \deg f \le d\}$ is difficult* to distinguish / separate from the deg.-$d$ moments of an actual distr. of solutions

(* unless you invest $n^{\Omega(d)}$ time to distinguish)

also: the values $\{\tilde{\mathbb{E}}_D f \mid \deg f > d\}$ do not carry additional information ⇒ no need to look at them

SLIDE 24

dual view (SOS certificates)

$\left(\|v\|_4^4 - \frac{1}{k}\right) \cdot g + \sum_j h_j^2 = -1$ over $\{v \in U : \|v\|_2 = 1\}$ for some $g$ of deg. ≤ $d - 4$ and $\{h_j\}$ of deg. ≤ $d/2$

⇔ no deg.-$d$ pseudo-distr. exists (⇒ no solution exists)

for approximation algorithms: need a pseudo-distr. to extract an approx. solution (hard to exploit non-existence of an SOS certificate directly)

SLIDE 25

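general properties of pseudo-distributions: the following inequalities hold as expected (same as for distributions)

let $D = \{u, v\}$ be a deg.-4 pseudo-distribution over $\mathbb{R}^n \times \mathbb{R}^n$

Cauchy–Schwarz inequality: $\tilde{\mathbb{E}}_D \langle u, v \rangle \le \left(\tilde{\mathbb{E}}_D \|u\|_2^2\right)^{1/2} \left(\tilde{\mathbb{E}}_D \|v\|_2^2\right)^{1/2}$

Hölder's inequality: $\tilde{\mathbb{E}}_D \sum_i u_i^3 v_i \le \left(\tilde{\mathbb{E}}_D \|u\|_4^4\right)^{3/4} \left(\tilde{\mathbb{E}}_D \|v\|_4^4\right)^{1/4}$

ℓ4-triangle inequality: $\left(\tilde{\mathbb{E}}_D \|u + v\|_4^4\right)^{1/4} \le \left(\tilde{\mathbb{E}}_D \|u\|_4^4\right)^{1/4} + \left(\tilde{\mathbb{E}}_D \|v\|_4^4\right)^{1/4}$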

SLIDE 26

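claim: let $U' \subseteq \mathbb{R}^n$ be a random $d$-dim. subspace with $d \ll \sqrt{n}$ and let $P'$ be the orthogonal projector onto $U'$; then w.h.p. $\|P'v\|_4^4 = O\left(\frac{1}{n}\right) \|v\|_2^4 - \sum_j h_j(v)^2$ over $v \in \mathbb{R}^n$, for some $h_j$'s (a degree-4 SOS certificate)

[Barak-Brandao-Harrow-Kelner-S.-Zhou]

proof sketch (SOS certificate for the classical inequality $\|P'v\|_4^4 \le O\left(\frac{1}{n}\right) \|v\|_2^4$):

basis change: let $x = B^\top v$, where $B$'s columns are an orthonormal basis of $U'$ (so that $P' = BB^\top$) ⇒ $\|P'v\|_4^4 = \frac{1}{n^2} \sum_i \langle b_i, x \rangle^4$, where $b_i$ is $\sqrt{n}$ times the $i$-th row of $B$ and $b_1, \dots, b_n$ are close to i.i.d. standard Gaussian vectors (so that $\mathbb{E}_b \langle b, x \rangle^2 = \|x\|_2^2$ and $\mathbb{E}_b \langle b, x \rangle^4 = 3\|x\|_2^4$)

enough to show: $\frac{1}{n} \sum_{i=1}^n \langle b_i, x \rangle^4 = O(1) \cdot \mathbb{E}_b \langle b, x \rangle^4 - \sum_j h_j'(x)^2$

reduce to deg. 2: $\frac{1}{n} \sum_{i=1}^n \langle b_i^{\otimes 2}, y \rangle^2 \le O(1) \cdot \mathbb{E}_b \langle b^{\otimes 2}, y \rangle^2$ (with $y = x^{\otimes 2}$)

⇒ use concentration inequalities for quadratic forms (aka matrices)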
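A numerical illustration (not a certificate) of why ℓ4/ℓ2 separates a planted vector from the random part, with arbitrary sizes $n = 2000$, $d = 20 < \sqrt{n}$, $k = 40$ and hypothetical helper names: vectors sampled from a random $d$-dimensional subspace have $n\|v\|_4^4/\|v\|_2^4 \approx 3$, as for Gaussian vectors, while a flat $k$-sparse vector has $n/k$. Sampling only probes typical vectors of $U'$; the claim above bounds all $v \in \mathbb{R}^n$, which is what the SOS certificate provides.

```python
import random


def orthonormalize(vectors):
    """Gram-Schmidt: orthonormal vectors spanning the same subspace."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = sum(x * y for x, y in zip(w, b))
            w = [x - c * y for x, y in zip(w, b)]
        norm = sum(x * x for x in w) ** 0.5
        basis.append([x / norm for x in w])
    return basis


def scaled_l4(v):
    """n * |v|_4^4 / |v|_2^4: about 3 for Gaussian-like vectors, n/k for flat k-sparse ones."""
    return len(v) * sum(x ** 4 for x in v) / sum(x * x for x in v) ** 2


rng = random.Random(0)
n, d, k = 2000, 20, 40
B = orthonormalize([[rng.gauss(0, 1) for _ in range(n)] for _ in range(d)])

samples = []
for _ in range(20):  # random vectors inside the random subspace U'
    coeffs = [rng.gauss(0, 1) for _ in range(d)]
    samples.append(scaled_l4([sum(c * b[i] for c, b in zip(coeffs, B)) for i in range(n)]))

flat_sparse = [1.0] * k + [0.0] * (n - k)  # |v|_4^4 / |v|_2^4 = 1/k
print(max(samples), scaled_l4(flat_sparse))  # roughly 3 versus n/k = 50
```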

SLIDE 27

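given: some basis of subspace $U = \operatorname{span}(U' \cup \{v_0\}) \subseteq \mathbb{R}^n$, where $U' \subseteq \mathbb{R}^n$ is a random $d$-dim. subspace and $v_0 \in \mathbb{R}^n$ with $v_0 \perp U'$, $\|v_0\|_4^4 = \frac{1}{k}$, and $\|v_0\|_2 = 1$ (e.g., $k$-sparse)

approximation algorithm for planted sparse vector

goal: find unit vector $w$ with $\langle w, v_0 \rangle^2 \ge 1 - O\left((k/n)^{1/4}\right)$

algorithm: compute a deg.-4 pseudo-distr. $D = \{v\}$ over the unit ℓ2-sphere of $U$ satisfying $\|v\|_4^4 = \frac{1}{k}$; sample Gaussian $w$ with $\mathbb{E}\, ww^\top = \tilde{\mathbb{E}}_D vv^\top$ and renormalize

analysis claim: $\tilde{\mathbb{E}}_D \langle v, v_0 \rangle^2 \ge 1 - O\left((k/n)^{1/4}\right)$ (⇒ Gaussian $w$ almost 1-dim.)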
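We do not solve the degree-4 program here. The sketch below assumes, hypothetically, that the pseudo-moment matrix has the shape the analysis predicts, $\tilde{\mathbb{E}}_D vv^\top \approx (1 - \delta)v_0v_0^\top$ plus residual mass $\delta$, and that it is supplied in factored form $\sum_t c_t c_t^\top$ (every p.s.d. matrix has such a factorization), so $w = \sum_t g_t c_t$ has the right second moments. It then checks that the renormalized vector $w/\|w\|$ is strongly correlated with $v_0$. All names and parameters are illustrative.

```python
import random


def sample_from_factors(factors, rng):
    """Gaussian w with E[w w^T] = sum_t c_t c_t^T, sampled as w = sum_t g_t c_t."""
    w = [0.0] * len(factors[0])
    for c in factors:
        g = rng.gauss(0.0, 1.0)
        for i, ci in enumerate(c):
            w[i] += g * ci
    return w


def mean_correlation(factors, v0, trials, rng):
    """Average of <w/|w|, v0>^2 over independent Gaussian samples w."""
    total = 0.0
    for _ in range(trials):
        w = sample_from_factors(factors, rng)
        total += sum(a * b for a, b in zip(w, v0)) ** 2 / sum(a * a for a in w)
    return total / trials


rng = random.Random(0)
n, k, m, delta = 200, 20, 10, 1e-3
v0 = [k ** -0.5] * k + [0.0] * (n - k)  # unit norm, k-sparse, |v0|_4^4 = 1/k
factors = [[(1 - delta) ** 0.5 * a for a in v0]]
for _ in range(m):  # hypothetical residual mass delta spread over m random directions
    r = [rng.gauss(0.0, 1.0) for _ in range(n)]
    s = sum(a * a for a in r) ** 0.5
    factors.append([(delta / m) ** 0.5 * a / s for a in r])

print(mean_correlation(factors, v0, 300, rng))  # close to 1
```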

SLIDE 28

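analysis claim: $\tilde{\mathbb{E}}_D \langle v, v_0 \rangle^2 \ge 1 - O\left((k/n)^{1/4}\right)$ (⇒ Gaussian $w$ almost 1-dim.)

$\frac{1}{k^{1/4}} = \left(\tilde{\mathbb{E}}_D \|v\|_4^4\right)^{1/4}$ ($D$ satisfies $\|v\|_4^4 = \frac{1}{k}$)

$= \left(\tilde{\mathbb{E}}_D \|\langle v, v_0 \rangle v_0 + P'v\|_4^4\right)^{1/4}$ (same function)

$\le \left(\tilde{\mathbb{E}}_D \|\langle v, v_0 \rangle v_0\|_4^4\right)^{1/4} + \left(\tilde{\mathbb{E}}_D \|P'v\|_4^4\right)^{1/4}$ (ℓ4-triangle inequ.)

$\le \frac{1}{k^{1/4}} \cdot \left(\tilde{\mathbb{E}}_D \langle v, v_0 \rangle^4\right)^{1/4} + O\left(\frac{1}{n^{1/4}}\right)$ (SOS cert. for $U'$)

⇒ $\tilde{\mathbb{E}}_D \langle v, v_0 \rangle^4 \ge 1 - O\left((k/n)^{1/4}\right)$ ⇒ $\tilde{\mathbb{E}}_D \langle v, v_0 \rangle^2 \ge 1 - O\left((k/n)^{1/4}\right)$

(because $\langle v, v_0 \rangle^4 = \left(1 - \|P'v\|_2^2\right) \langle v, v_0 \rangle^2$)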

SLIDE 30

products of pseudo-distributions

claim: suppose $D, D' \colon \Omega \to \mathbb{R}$ are deg.-$d$ pseudo-distr.'s over $\Omega$; then $D \otimes D' \colon \Omega \times \Omega \to \mathbb{R}$ is a deg.-$d$ pseudo-distr. over $\Omega \times \Omega$

proof: tensor products of positive semidefinite matrices are positive semidefinite
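In moment-matrix terms: the moment matrix of $D \otimes D'$ (rows indexed by products of monomials in $x$ and $x'$ of total degree ≤ $d/2$) is a principal submatrix of $M_D \otimes M_{D'}$, so it is p.s.d. whenever $M_D$ and $M_{D'}$ are. A stdlib sketch with hypothetical helper names that checks this for the 4×4 degree-2 moment matrix on $\{\pm 1\}^3$ with pairwise pseudo-correlations $-1/2$, and shows that a non-p.s.d. factor spoils it.

```python
def kron(A, B):
    """Tensor (Kronecker) product of square matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def is_psd(A, tol=1e-9):
    """P.s.d. test by symmetric Gaussian elimination (Schur complements)."""
    A = [row[:] for row in A]
    m = len(A)
    for k in range(m):
        piv = A[k][k]
        if piv < -tol:
            return False
        if piv <= tol:  # zero pivot: a p.s.d. matrix must have a zero row here
            if any(abs(A[k][j]) > tol for j in range(k + 1, m)):
                return False
            continue
        for i in range(k + 1, m):
            r = A[i][k] / piv
            for j in range(k + 1, m):
                A[i][j] -= r * A[k][j]
    return True


def moment_matrix(rho):
    """Moments over (1, x1, x2, x3) with E x_i = 0 and E x_i x_j = rho for i != j."""
    return [[1, 0, 0, 0], [0, 1, rho, rho], [0, rho, 1, rho], [0, rho, rho, 1]]


M = moment_matrix(-0.5)              # p.s.d. (a deg.-2 pseudo-distribution)
bad = moment_matrix(-0.6)            # not p.s.d.
print(is_psd(M), is_psd(kron(M, M)))  # True True
print(is_psd(kron(M, bad)))           # False
```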

SLIDE 31

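Cauchy–Schwarz inequality: $\tilde{\mathbb{E}}_D \langle u, v \rangle \le \left(\tilde{\mathbb{E}}_D \|u\|_2^2\right)^{1/2} \left(\tilde{\mathbb{E}}_D \|v\|_2^2\right)^{1/2}$

let $D = \{u, v\}$ be a deg.-2 pseudo-distribution over $\mathbb{R}^n \times \mathbb{R}^n$, and let $\{u', v'\}$ denote an independent copy (product pseudo-distr. $D \otimes D$)

proof:

$\left(\tilde{\mathbb{E}}_D \langle u, v \rangle\right)^2 = \tilde{\mathbb{E}}_{D \otimes D} \langle u, v \rangle \langle u', v' \rangle$ ($D \otimes D$ is a product pseudo-distr.)

$= \tilde{\mathbb{E}}_{D \otimes D} \sum_{ij} u_i v_i u'_j v'_j$

$\le \frac{1}{2} \tilde{\mathbb{E}}_{D \otimes D} \left(\sum_{ij} u_i^2 {v'_j}^2 + \sum_{ij} {u'_j}^2 v_i^2\right)$ ($2ab = a^2 + b^2 - (a - b)^2$ with $a = u_i v'_j$, $b = u'_j v_i$)

$= \frac{1}{2} \tilde{\mathbb{E}}_{D \otimes D} \left(\|u\|_2^2 \|v'\|_2^2 + \|u'\|_2^2 \|v\|_2^2\right)$

$= \tilde{\mathbb{E}}_D \|u\|_2^2 \cdot \tilde{\mathbb{E}}_D \|v\|_2^2$ ($D \otimes D$ is a product pseudo-distr.)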

SLIDE 32

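Hölder's inequality: $\tilde{\mathbb{E}}_D \sum_i u_i^3 v_i \le \left(\tilde{\mathbb{E}}_D \|u\|_4^4\right)^{3/4} \left(\tilde{\mathbb{E}}_D \|v\|_4^4\right)^{1/4}$

let $D = \{u, v\}$ be a deg.-4 pseudo-distribution over $\mathbb{R}^n \times \mathbb{R}^n$

proof:

$\tilde{\mathbb{E}}_D \sum_i u_i^3 v_i \le \left(\tilde{\mathbb{E}}_D \sum_i u_i^4\right)^{1/2} \cdot \left(\tilde{\mathbb{E}}_D \sum_i u_i^2 v_i^2\right)^{1/2}$ (Cauchy–Schwarz)

$\le \left(\tilde{\mathbb{E}}_D \sum_i u_i^4\right)^{1/2} \cdot \left(\tilde{\mathbb{E}}_D \sum_i u_i^4\right)^{1/4} \cdot \left(\tilde{\mathbb{E}}_D \sum_i v_i^4\right)^{1/4}$ (Cauchy–Schwarz)

we also used: $\{u, v\}$ deg.-4 pseudo-distr. ⇒ $\{u \otimes u, u \otimes v\}$ (and likewise $\{u \otimes u, v \otimes v\}$) deg.-2 pseudo-distr. (every deg.-2 poly. in $u \otimes u, u \otimes v, v \otimes v$ is a deg.-4 poly. in $u, v$)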

SLIDE 33

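ℓ4-triangle inequality: $\left(\tilde{\mathbb{E}}_D \|u + v\|_4^4\right)^{1/4} \le \left(\tilde{\mathbb{E}}_D \|u\|_4^4\right)^{1/4} + \left(\tilde{\mathbb{E}}_D \|v\|_4^4\right)^{1/4}$

let $D = \{u, v\}$ be a deg.-4 pseudo-distribution over $\mathbb{R}^n \times \mathbb{R}^n$

proof: expand $\|u + v\|_4^4$ in terms of $\sum_i u_i^4$, $\sum_i u_i^3 v_i$, $\sum_i u_i^2 v_i^2$, $\sum_i u_i v_i^3$, $\sum_i v_i^4$

bound the pseudo-expectations of the "mixed terms" using Cauchy–Schwarz / Hölder; check that the total is equal to the right-hand side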
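Spelled out, with $a := (\tilde{\mathbb{E}}_D \|u\|_4^4)^{1/4}$ and $b := (\tilde{\mathbb{E}}_D \|v\|_4^4)^{1/4}$, the expansion and the bounds on the mixed terms read as follows (Hölder for the two cubic terms, Cauchy–Schwarz applied to the vectors $(u_i^2)_i$ and $(v_i^2)_i$ for the middle one); taking fourth roots gives the claim.

```latex
\begin{aligned}
\tilde{\mathbb{E}}_D \|u+v\|_4^4
&= \tilde{\mathbb{E}}_D \sum_i u_i^4
 + 4\,\tilde{\mathbb{E}}_D \sum_i u_i^3 v_i
 + 6\,\tilde{\mathbb{E}}_D \sum_i u_i^2 v_i^2
 + 4\,\tilde{\mathbb{E}}_D \sum_i u_i v_i^3
 + \tilde{\mathbb{E}}_D \sum_i v_i^4 \\
&\le a^4 + 4a^3 b + 6a^2 b^2 + 4a b^3 + b^4 \;=\; (a+b)^4 .
\end{aligned}
```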

SLIDE 34

tensor decomposition

given: tensor $T \approx \sum_i a_i^{\otimes 4}$ (in spectral norm) for nice $a_1, \dots, a_m \in \mathbb{R}^n$

goal: find a set of vectors $B \approx \{\pm a_1, \dots, \pm a_m\}$ (for simplicity: orthonormal and $m = n$)

approach: show "uniqueness": $\sum_i a_i^{\otimes 4} \approx \sum_i b_i^{\otimes 4} \Rightarrow \{\pm a_1, \dots, \pm a_m\} \approx \{\pm b_1, \dots, \pm b_m\}$

show that the uniqueness proof translates to an SOS certificate ⇒ any pseudo-distribution over decompositions is "concentrated" on the unique decomposition $\{\pm a_1, \dots, \pm a_m\}$ ⇒ recover the decomposition by reweighting the pseudo-distribution by a degree-$\log n$ polynomial (approximation to a $\delta$ function)