approximation algorithms I David Steurer Cornell Cargese Workshop, - - PowerPoint PPT Presentation



SLIDE 1

Cargese Workshop, 2014

SUM-OF-SQUARES method and

approximation algorithms I

David Steurer

Cornell

SLIDE 2

meta-task
given: functions $f_1, \dots, f_m \colon \{\pm1\}^n \to \mathbb{R}$
find: solution $x \in \{\pm1\}^n$ to $f_1 = 0, \dots, f_m = 0$
(encoded as low-degree polynomials in $\mathbb{R}[x]$)
example: $f(x) = \sum_{i,j \in [n]} w_{ij} \cdot (x_i - x_j)^2$

examples: combinatorial optimization problems on a graph $G$
MAX CUT: $L_G = 1 - \varepsilon$ over $\{\pm1\}^n$
MAX BISECTION: $L_G = 1 - \varepsilon$, $\sum_i x_i = 0$ over $\{\pm1\}^n$
where $1 - \varepsilon$ is a guess for the optimum value and $L_G$ is the Laplacian
$L_G = \frac{1}{|E(G)|} \sum_{ij \in E(G)} \tfrac14 (x_i - x_j)^2$

goal: develop SDP-based algorithms with provable guarantees in terms of complexity and approximation ("on the edge of intractability" → need strongest possible relaxations)
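A small sketch (not part of the slides) of the MAX CUT and MAX BISECTION objectives: for $x \in \{\pm1\}^n$ each term $\tfrac14(x_i - x_j)^2$ is 1 if edge $ij$ is cut and 0 otherwise, so $L_G(x)$ is the fraction of cut edges. The edge-list representation and the function names are illustrative assumptions; the brute force is exponential in $n$ and only meant for tiny graphs.

```python
from itertools import product


def laplacian_value(edges, x):
    """L_G(x) = (1/|E|) * sum over edges ij of (x_i - x_j)^2 / 4, for x in {±1}^n."""
    return sum((x[i] - x[j]) ** 2 / 4 for i, j in edges) / len(edges)


def best_value(edges, n, bisection=False):
    """Brute-force maximum of L_G over {±1}^n, optionally restricted to sum_i x_i = 0."""
    candidates = (x for x in product((-1, 1), repeat=n) if not bisection or sum(x) == 0)
    return max(laplacian_value(edges, x) for x in candidates)


# 5-cycle: an odd cycle, so at most 4 of its 5 edges can be cut
cycle5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(best_value(cycle5, 5))  # 0.8
```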

SLIDE 3

price of convexity: individual solutions → distributions over solutions

price of tractability: can only enforce "efficiently checkable knowledge" about solutions → "pseudo-distributions over solutions" (consistent with efficiently checkable knowledge)


SLIDE 4

distribution $D$ over $\{\pm1\}^n$: function $D\colon \{\pm1\}^n \to \mathbb{R}$
non-negativity: $D(x) \ge 0$ for all $x \in \{\pm1\}^n$
normalization: $\sum_{x \in \{\pm1\}^n} D(x) = 1$

distribution $D$ satisfies $f_1 = 0, \dots, f_m = 0$ for some $f_i\colon \{\pm1\}^n \to \mathbb{R}$ ⇔ $\mathbb{E}_D\,(f_1^2 + \dots + f_m^2) = 0$
(equivalently: $\mathbb{P}_D\{\exists i.\ f_i \ne 0\} = 0$)

examples: uniform distribution $D = 2^{-n}$; fixed 2-bit parity $D(x) = (1 + x_1 x_2)/2^n$
the fixed 2-bit parity distribution satisfies $x_1 x_2 = 1$; the uniform distribution does not satisfy $f = 0$ for any $f \ne 0$

convex: if $D, D'$ satisfy the conditions, then $(D + D')/2$ satisfies them
number of function values is exponential → need careful representation
number of independent inequalities is exponential → not efficiently checkable

SLIDE 5


deg.-$d$ pseudo-distribution $D$: $\sum_{x \in \{\pm1\}^n} D(x) f(x)^2 \ge 0$ for every deg.-$d/2$ polynomial $f$

convenient notation: $\tilde{\mathbb{E}}_D f := \sum_x D(x) f(x)$, "pseudo-expectation of $f$ under $D$"

deg.-$2n$ pseudo-distributions are actual distributions (point indicators $\mathbf{1}_y$ have degree $n$ → $D(y) = \tilde{\mathbb{E}}_D \mathbf{1}_y^2 \ge 0$)
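A minimal sketch of the pseudo-expectation notation, assuming (for tiny $n$ only) that $D$ is stored as an explicit table over $\{\pm1\}^n$. Since $\mathbf{1}_y^2 = \mathbf{1}_y$ on the cube, $\tilde{\mathbb{E}}_D \mathbf{1}_y^2 = D(y)$, so a normalized function with a negative value cannot be a deg.-$2n$ pseudo-distribution. The helper names (`cube`, `pseudo_expectation`, `point_indicator`) are hypothetical.

```python
import math
from itertools import product


def cube(n):
    """All points of {±1}^n as tuples."""
    return list(product((-1, 1), repeat=n))


def pseudo_expectation(D, f):
    """~E_D f := sum_x D(x) f(x), for D given as a dict from points to reals."""
    return sum(weight * f(x) for x, weight in D.items())


def point_indicator(y):
    """1_y(x) = prod_i (1 + y_i x_i) / 2, a multilinear polynomial of degree n."""
    return lambda x: math.prod((1 + yi * xi) / 2 for yi, xi in zip(y, x))


# normalized (values sum to 1) but negative at the point (1, 1)
D = dict(zip(cube(2), [0.5, 0.5, 0.5, -0.5]))
print(pseudo_expectation(D, lambda x: 1))                                # 1.0
print(pseudo_expectation(D, lambda x: point_indicator((1, 1))(x) ** 2))  # -0.5
```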

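SLIDE 6

deg.-$d$ pseudo-distribution $D\colon \{\pm1\}^n \to \mathbb{R}$
non-negativity: $\tilde{\mathbb{E}}_D f^2 \ge 0$ for every deg.-$d/2$ polynomial $f$
normalization: $\tilde{\mathbb{E}}_D 1 = 1$

pseudo-distribution $D$ satisfies $f_1 = 0, \dots, f_m = 0$ for some $f_i\colon \{\pm1\}^n \to \mathbb{R}$ ⇔ $\tilde{\mathbb{E}}_D\,(f_1^2 + \dots + f_m^2) = 0$
(equivalently: $\tilde{\mathbb{E}}_D f_i \cdot g = 0$ whenever $\deg g \le d - \deg f_i$)

notation: $\tilde{\mathbb{E}}_D f := \sum_x D(x) f(x)$, "pseudo-expectation of $f$ under $D$"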

SLIDE 7

claim: can compute a deg.-$d$ pseudo-distribution $D$ satisfying $f_1 = 0, \dots, f_m = 0$ in time $n^{O(d)}$ if one exists (otherwise, certify that no solution to the original problem exists)

(can assume $D$ is a deg.-$d$ polynomial → the separation problem $\min_f \tilde{\mathbb{E}}_D f^2$ is an $n^d$-dim. eigenvalue problem → $n^{O(d)}$ time via gradient descent / ellipsoid method)

[Shor, Parrilo, Lasserre]
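The separation step can be sketched as an eigenvalue computation, under the simplifying assumption (tiny $n$ only) that $D$ is given as a table over $\{\pm1\}^n$ rather than by its low-degree moments. The quadratic form $f \mapsto \tilde{\mathbb{E}}_D f^2$ on multilinear polynomials of degree $\le d/2$ is the moment matrix $M_{S,T} = \tilde{\mathbb{E}}_D x^S x^T$, and an eigenvector with negative eigenvalue gives the coefficients of an $f$ with $\tilde{\mathbb{E}}_D f^2 < 0$. The function names are hypothetical, and the outer SDP loop (ellipsoid / gradient descent) is not shown.

```python
from itertools import combinations, product

import numpy as np


def monomials(n, k):
    """Multilinear monomials x^S on {±1}^n with |S| <= k, as index tuples S."""
    return [S for size in range(k + 1) for S in combinations(range(n), size)]


def evaluate(f, x):
    """Evaluate f = sum_S c_S x^S (f given as a dict S -> c_S) at a point x."""
    return sum(c * np.prod([x[i] for i in S]) for S, c in f.items())


def moment_matrix(D, n, d):
    """M[S, T] = ~E_D x^S x^T over monomials of degree <= d/2 (D: dict point -> value)."""
    mons = monomials(n, d // 2)
    M = np.zeros((len(mons), len(mons)))
    for x, weight in D.items():
        chi = np.array([np.prod([x[i] for i in S]) for S in mons])  # x^S for every S
        M += weight * np.outer(chi, chi)
    return M, mons


def separation_oracle(D, n, d, tol=1e-9):
    """None if f -> ~E_D f^2 is p.s.d. on polynomials of degree <= d/2; otherwise
    a polynomial f (dict S -> c_S) with ~E_D f^2 < 0, i.e. a violated constraint."""
    M, mons = moment_matrix(D, n, d)
    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    if eigvals[0] >= -tol:
        return None
    return dict(zip(mons, eigvecs[:, 0]))


points = list(product((-1, 1), repeat=2))
uniform = {x: 0.25 for x in points}
bad = dict(zip(points, [0.5, 0.5, 0.5, -0.5]))
print(separation_oracle(uniform, 2, 2))  # None
f = separation_oracle(bad, 2, 2)
value = sum(w * evaluate(f, x) ** 2 for x, w in bad.items())
print(value < 0)  # True
```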

SLIDE 8

surprising property: $\tilde{\mathbb{E}}_D f \ge 0$ for many* low-degree polynomials $f$ such that $f \ge 0$ follows from $f_1 = 0, \dots, f_m = 0$ by an "explicit proof"

soon: examples of such properties and how to exploit them

SLIDE 9

emerging algorithm-design paradigm: analyze the algorithm pretending that the underlying actual distribution exists; verify only afterwards that low-deg. pseudo-distributions satisfy the required properties

[figure: pseudo-distribution over optimal solutions → efficient algorithm → approximate solution (to original problem); alongside, the deg.-$d$ part of an actual distribution over optimal solutions]

$n^{o(d)}$-time algorithms cannot* distinguish between deg.-$d$ pseudo-distributions and the deg.-$d$ part of actual distributions

SLIDE 10

dual view (sum-of-squares proof system)

either ∃ deg.-$d$ pseudo-distribution $D$ over $\{\pm1\}^n$ satisfying $f_1 = 0, \dots, f_m = 0$

or ∃ $g_1, \dots, g_m$ and $h_1, \dots, h_k$ such that $\sum_i f_i \cdot g_i + \sum_j h_j^2 = -1$ over $\{\pm1\}^n$, with $\deg f_i + \deg g_i \le d$ and $\deg h_j \le d/2$
(a derivation of the unsatisfiable constraint $-1 \ge 0$ from $f_1 = 0, \dots, f_m = 0$ over $\{\pm1\}^n$)

[figure: the cone $K_d$ containing $f, f_1, f_2, \dots, f_m$, and the point $-1$ separated from it by the hyperplane $D$]

$K_d = \{f = \sum_i f_i \cdot g_i + \sum_j h_j^2\}$ (with the degree bounds above); if $-1 \notin K_d$ then ∃ separating hyperplane $D$ with $\tilde{\mathbb{E}}_D(-1) = -1$ and $\tilde{\mathbb{E}}_D f \ge 0$ for all $f \in K_d$
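As a toy instance of the dual view (chosen for illustration, not taken from the slides): the constraints $x_1 - x_2 = 0$ and $x_1 + x_2 = 0$ have no solution in $\{\pm1\}^2$, and $g_1 = g_2 = -x_1/2$ give the degree-2 certificate $(x_1 - x_2) g_1 + (x_1 + x_2) g_2 = -x_1^2 = -1$ over $\{\pm1\}^2$, with no squares needed. The hypothetical helper `check_certificate` only verifies a given certificate pointwise; finding one is the job of the SDP.

```python
from itertools import product


def check_certificate(fs, gs, hs, n):
    """Check sum_i f_i * g_i + sum_j h_j^2 == -1 at every point of {±1}^n."""
    for x in product((-1, 1), repeat=n):
        total = sum(f(x) * g(x) for f, g in zip(fs, gs)) + sum(h(x) ** 2 for h in hs)
        if abs(total + 1) > 1e-12:
            return False
    return True


# constraints x1 - x2 = 0 and x1 + x2 = 0: unsatisfiable over {±1}^2
fs = [lambda x: x[0] - x[1], lambda x: x[0] + x[1]]
# degree-1 multipliers, so deg f_i + deg g_i <= 2 (certificate of degree d = 2)
gs = [lambda x: -x[0] / 2, lambda x: -x[0] / 2]
print(check_certificate(fs, gs, [], 2))  # True
```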

SLIDE 11

pseudo-distributions satisfy all local properties of $\{\pm1\}^n$

claim: suppose $f \ge 0$ is a $d/2$-junta over $\{\pm1\}^n$ (depends on $\le d/2$ coordinates); then $\tilde{\mathbb{E}}_D f \ge 0$
proof: $\sqrt{f}$ is also a $d/2$-junta, so it has degree $\le d/2$ → $\tilde{\mathbb{E}}_D f = \tilde{\mathbb{E}}_D (\sqrt{f})^2 \ge 0$

corollary: for any set $S$ of $\le d/2$ coordinates, the marginal $D' = \{x_S\}_D$ is an actual distribution:
$D'(y_S) = \sum_{x_{[n] \setminus S}} D(y_S, x_{[n] \setminus S}) = \tilde{\mathbb{E}}_D \mathbf{1}_{y_S} \ge 0$ ($\mathbf{1}_{y_S}$ is a $d/2$-junta)

example: triangle inequalities over $\{\pm1\}^n$: $\tilde{\mathbb{E}}_D \big[(x_i - x_j)^2 + (x_j - x_k)^2 - (x_i - x_k)^2\big] \ge 0$

(also captured by LP methods, e.g., Sherali–Adams hierarchies …)

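SLIDE 12

conditioning pseudo-distributions

claim: for all $j \in [n]$ and $\sigma \in \{\pm1\}$, $D' = \{x \mid x_j = \sigma\}_D$ is a deg.-$(d-2)$ pseudo-distribution (assuming $\mathbb{P}_D[x_j = \sigma] := \tilde{\mathbb{E}}_D \mathbf{1}_{x_j = \sigma} > 0$)

proof: $D'(x) = \frac{1}{\mathbb{P}_D[x_j = \sigma]}\, D(x) \cdot \mathbf{1}_{x_j = \sigma}$
→ $\tilde{\mathbb{E}}_{D'} f^2 \propto \tilde{\mathbb{E}}_D \mathbf{1}_{x_j = \sigma} f^2 = \tilde{\mathbb{E}}_D (\mathbf{1}_{x_j = \sigma} f)^2 \ge 0$, since $\mathbf{1}_{x_j = \sigma}^2 = \mathbf{1}_{x_j = \sigma}$ and $\deg f \le (d-2)/2$ implies $\deg(\mathbf{1}_{x_j = \sigma} f) \le d/2$

(also captured by LP methods, e.g., Sherali–Adams hierarchies …)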
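A sketch of the conditioning formula, again assuming a small table representation of $D$ (the names and the example distribution are hypothetical). The example conditions an actual distribution, but the formula is the same one the claim applies to pseudo-distributions; the code only rescales and does not itself check the degree-$(d-2)$ property.

```python
from itertools import product


def condition(D, j, sigma):
    """D'(x) = D(x) * 1[x_j = sigma] / P_D[x_j = sigma], where the (pseudo-)probability
    P_D[x_j = sigma] = ~E_D 1[x_j = sigma] must be positive."""
    mass = sum(w for x, w in D.items() if x[j] == sigma)
    if mass <= 0:
        raise ValueError("cannot condition on an event of non-positive pseudo-probability")
    return {x: (w / mass if x[j] == sigma else 0.0) for x, w in D.items()}


points = list(product((-1, 1), repeat=3))
# hypothetical example: uniform distribution on the points with x_0 * x_1 = 1
D = {x: (0.25 if x[0] * x[1] == 1 else 0.0) for x in points}
D_cond = condition(D, 0, 1)
print(sum(D_cond.values()))  # 1.0
```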

SLIDE 13

pseudo-covariances are covariances of distributions over $\mathbb{R}^n$

claim: there exists a (Gaussian) distribution $\xi$ over $\mathbb{R}^n$ such that $\tilde{\mathbb{E}}_D x = \mathbb{E}\,\xi$ and $\tilde{\mathbb{E}}_D x x^\top = \mathbb{E}\,\xi \xi^\top$

proof: let $\mu = \tilde{\mathbb{E}}_D x$ and $M = \tilde{\mathbb{E}}_D (x - \mu)(x - \mu)^\top$; choose $\xi$ to be Gaussian with mean $\mu$ and covariance $M$
the matrix $M$ is p.s.d. because $v^\top M v = \tilde{\mathbb{E}}_D \big(v^\top (x - \mu)\big)^2 \ge 0$ for all $v \in \mathbb{R}^n$ (square of a linear form)

consequence: $\tilde{\mathbb{E}}_D q = \mathbb{E}\, q(\xi)$ for every $q$ of deg. $\le 2$
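A minimal sketch of the construction in the proof, assuming the degree-1 and degree-2 pseudo-moments are already available as arrays (for example from an SDP solver, which is not shown). It uses NumPy's `Generator.multivariate_normal`; the helper name `matching_gaussian` is hypothetical.

```python
import numpy as np


def matching_gaussian(first_moments, second_moments, size, rng):
    """Sample xi ~ N(mu, M) with mu = ~E x and M = ~E x x^T - mu mu^T, so that
    E xi = ~E x and E xi xi^T = ~E x x^T."""
    mu = np.asarray(first_moments, dtype=float)
    M = np.asarray(second_moments, dtype=float) - np.outer(mu, mu)
    # M is p.s.d. for a genuine pseudo-distribution (see the proof above);
    # multivariate_normal factors M with an SVD by default, so singular M is fine
    return rng.multivariate_normal(mu, M, size=size)


rng = np.random.default_rng(0)
# moments of the fixed 2-bit parity distribution on {±1}^2 (x_1 = x_2, uniformly)
samples = matching_gaussian([0.0, 0.0], [[1.0, 1.0], [1.0, 1.0]], 100_000, rng)
second = samples.T @ samples / len(samples)  # empirical E xi xi^T, close to [[1, 1], [1, 1]]
```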

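SLIDE 14

pseudo-distributions satisfy (compositions of) low-deg. univariate properties: a useful class of non-local higher-deg. inequalities

claim: for every univariate $p \ge 0$ over $\mathbb{R}$ and every $n$-variate polynomial $q$ with $\deg p \cdot \deg q \le d$, $\tilde{\mathbb{E}}_D\, p(q(x)) \ge 0$

proof: enough to show that $p$ is a sum of squares (then $p(q(x))$ is a sum of squares of polynomials of deg. $\le d/2$); by induction on $\deg p$:
choose a minimizer $\alpha$ of $p$, so $p(\alpha) \ge 0$
then $p(x) = p(\alpha) + (x - \alpha)^2 \cdot p'(x)$ for some polynomial $p' \ge 0$ (a quotient, not the derivative) with $\deg p' < \deg p$, because $\alpha$ is a double root of $p - p(\alpha) \ge 0$
$p(\alpha)$ and $(x - \alpha)^2$ are squares, and $p'$ is a sum of squares by the induction hypothesis

[figure: graph of $p$ over $\mathbb{R}$, with minimizer $\alpha$ and $p(\alpha) \ge 0$]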

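SLIDE 15

MAX CUT

given: deg.-$d$ pseudo-distribution $D$ over $\{\pm1\}^n$ satisfying $L_G = 1 - \varepsilon$, where $L_G = \frac{1}{|E(G)|} \sum_{ij \in E(G)} \tfrac14 (x_i - x_j)^2$
goal: find $y \in \{\pm1\}^n$ with $L_G(y) \ge 1 - O(\sqrt{\varepsilon})$

algorithm: sample a Gaussian distribution $\xi$ over $\mathbb{R}^n$ with $\mathbb{E}\,\xi \xi^\top = \tilde{\mathbb{E}}_D x x^\top$; output $y = \operatorname{sgn} \xi$

analysis: claim: $\mathbb{P}_D[x_i \ne x_j] = 1 - \eta \Rightarrow \mathbb{P}[y_i \ne y_j] \ge 1 - O(\sqrt{\eta})$
proof: $(\xi_i, \xi_j)$ satisfies $-\mathbb{E}\,\xi_i \xi_j = -\tilde{\mathbb{E}}_D x_i x_j = 1 - O(\eta)$ (exactly $1 - 2\eta$, since $\mathbf{1}_{x_i \ne x_j} = (1 - x_i x_j)/2$) and $\mathbb{E}\,\xi_i^2 = \mathbb{E}\,\xi_j^2 = 1$
→ (tedious calculation) → $\mathbb{P}[\operatorname{sgn} \xi_i \ne \operatorname{sgn} \xi_j] \ge 1 - O(\sqrt{\eta})$
(averaging over the edges and using concavity of $\sqrt{\cdot}$ gives $\mathbb{E}\, L_G(y) \ge 1 - O(\sqrt{\varepsilon})$)

[Goemans–Williamson]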
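A sketch of the rounding step and of the "tedious calculation", assuming the pseudo-moments $\tilde{\mathbb{E}}_D x x^\top$ are given as a matrix. For jointly Gaussian $\xi_i, \xi_j$ with unit variances and correlation $\rho$, the standard fact $\mathbb{P}[\operatorname{sgn}\xi_i \ne \operatorname{sgn}\xi_j] = \arccos(\rho)/\pi$ is compared with a Monte Carlo estimate; the value of $\eta$ and the function names are illustrative.

```python
import math

import numpy as np


def gw_round(second_moments, rng):
    """Sample xi ~ N(0, ~E x x^T) and output the sign vector sgn(xi)."""
    n = len(second_moments)
    xi = rng.multivariate_normal(np.zeros(n), second_moments)
    return np.where(xi >= 0, 1, -1)


def disagreement_probability(rho):
    """P[sgn xi_i != sgn xi_j] for jointly Gaussian xi_i, xi_j with unit variances and
    correlation rho: arccos(rho) / pi (Sheppard's formula)."""
    return math.acos(rho) / math.pi


rng = np.random.default_rng(1)
eta = 0.05                 # pseudo-probability that edge ij is NOT cut
rho = -(1 - 2 * eta)       # ~E x_i x_j = P[x_i = x_j] - P[x_i != x_j]
g1, g2 = rng.standard_normal((2, 200_000))
xi_i, xi_j = g1, rho * g1 + math.sqrt(1 - rho**2) * g2  # unit variances, correlation rho
empirical = float(np.mean(np.sign(xi_i) != np.sign(xi_j)))
print(disagreement_probability(rho), empirical, 1 - math.sqrt(eta))
```

For $\rho = -(1 - 2\eta)$ the formula equals $1 - \tfrac{2}{\pi}\arcsin\sqrt{\eta} \ge 1 - \sqrt{\eta}$, because $\arcsin t \le \tfrac{\pi}{2} t$ on $[0, 1]$; this is one way to make the $1 - O(\sqrt{\eta})$ bound explicit.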

SLIDE 16

low global correlation in (pseudo-)distributions

claim: for all $r$, there exists a deg.-$(d - 2r)$ pseudo-distribution $D'$, obtained by conditioning $D$, with $\operatorname{Avg}_{i,j \in [n]} I_{D'}(x_i, x_j) \le 1/r$

[Barak–Raghavendra–S., Raghavendra–Tan]

proof: potential $\operatorname{Avg}_{i \in [n]} H(x_i)$; greedily condition on variables to maximize the potential decrease until the global correlation is low
mutual information: $I(x, y) = H(x) - H(x \mid y)$
potential decrease $\ge \operatorname{Avg}_{i \in [n]} H(x_i) - \operatorname{Avg}_{j \in [n]} \operatorname{Avg}_{i \in [n]} H(x_i \mid x_j) = \operatorname{Avg}_{i,j \in [n]} I_{D'}(x_i, x_j)$
how often do we need to condition? → only need to condition $\le r$ times (the potential lies between 0 and 1 bit, and while the global correlation exceeds $1/r$, each step decreases it by more than $1/r$ in expectation)
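To make the potential argument concrete, here is a sketch that computes entropies and mutual informations for a small actual distribution stored as a table (all names hypothetical); for pseudo-distributions the same formulas would be applied to their one- and two-coordinate pseudo-marginals. It also checks the identity behind the potential decrease: averaged over the conditioning variable $x_j$, the drop in $\operatorname{Avg}_i H(x_i)$ equals $\operatorname{Avg}_{i,j} I(x_i, x_j)$, so the greedy choice of $j$ does at least as well. Like the slide's $\operatorname{Avg}_{i,j \in [n]}$, the average includes the pairs $i = j$.

```python
import math
from collections import Counter
from itertools import product


def entropy(probs):
    """Shannon entropy (in bits) of a collection of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def marginal(D, coords):
    """Probabilities of the marginal of D (dict point -> prob) on the given coordinates."""
    m = Counter()
    for x, p in D.items():
        m[tuple(x[c] for c in coords)] += p
    return m.values()


def potential(D, n):
    """Avg_i H(x_i)."""
    return sum(entropy(marginal(D, (i,))) for i in range(n)) / n


def expected_potential_after_conditioning(D, n):
    """Avg_j Avg_i H(x_i | x_j), using H(x_i | x_j) = H(x_i, x_j) - H(x_j)."""
    total = sum(
        entropy(marginal(D, (i, j))) - entropy(marginal(D, (j,)))
        for i in range(n)
        for j in range(n)
    )
    return total / n**2


def global_correlation(D, n):
    """Avg_{i,j in [n]} I(x_i; x_j), with I(x_i; x_j) = H(x_i) + H(x_j) - H(x_i, x_j)."""
    H = [entropy(marginal(D, (i,))) for i in range(n)]
    total = sum(
        H[i] + H[j] - entropy(marginal(D, (i, j))) for i in range(n) for j in range(n)
    )
    return total / n**2


uniform = {x: 0.25 for x in product((-1, 1), repeat=2)}
copies = {(1, 1): 0.5, (-1, -1): 0.5}  # x_1 = x_2
print(global_correlation(uniform, 2), global_correlation(copies, 2))  # 0.5 1.0
```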

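SLIDE 17

MAX BISECTION

given: deg.-$d$ pseudo-distribution $D$ over $\{\pm1\}^n$ satisfying $L_G = 1 - \varepsilon$, $\sum_i x_i = 0$
goal: find $y \in \{\pm1\}^n$ with $L_G(y) \ge 1 - O(\sqrt{\varepsilon})$ and $\sum_i y_i = 0$, with $d = 1/\varepsilon^{O(1)}$

algorithm: let $D'$ be a conditioning of $D$ with global correlation $\le \varepsilon^{O(1)}$; sample a Gaussian $\xi$ with the same deg.-2 moments as $D'$; output $y$ with $y_i = \operatorname{sgn}(\xi_i - t_i)$ (choose $t_i \in \mathbb{R}$ so that $\mathbb{E}\, y_i = \tilde{\mathbb{E}}_{D'} x_i$)

analysis almost as before: $\mathbb{P}_{D'}[x_i \ne x_j] \ge 1 - \eta \Rightarrow \mathbb{P}[y_i \ne y_j] \ge 1 - O(\sqrt{\eta})$ ($t_i = 0$ is the worst case → same analysis as MAX CUT)

new: $I(x_i, x_j) \le \varepsilon^{O(1)} \Rightarrow \mathbb{E}\, y_i y_j = \tilde{\mathbb{E}}_{D'} x_i x_j \pm \varepsilon^{O(1)}$
→ $\mathbb{E}\,\big|\sum_i y_i\big| \le \big(\mathbb{E} (\sum_i y_i)^2\big)^{1/2} \le \big(\tilde{\mathbb{E}}_{D'} (\sum_i x_i)^2\big)^{1/2} + \varepsilon^{O(1)} \cdot n = \varepsilon^{O(1)} \cdot n$, using $\tilde{\mathbb{E}}_{D'} (\sum_i x_i)^2 = 0$ ($D'$ still satisfies $\sum_i x_i = 0$)
→ get a bisection $y'$ from $y$ by correcting an $\varepsilon^{O(1)}$ fraction of vertices

[Raghavendra–Tan]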
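Two small pieces of this algorithm can be sketched directly. If $\xi_i \sim N(\mu_i, \sigma_i^2)$, then $\mathbb{E}\operatorname{sgn}(\xi_i - t) = 1 - 2\Phi((t - \mu_i)/\sigma_i)$, so the threshold with $\mathbb{E}\, y_i = m_i$ is $t_i = \mu_i + \sigma_i \Phi^{-1}((1 - m_i)/2)$; this uses Python's `statistics.NormalDist`. The correction to an exact bisection simply flips majority-sign entries; the slide does not say which vertices to correct, so that rule is an assumption, as are the function names and the numbers in the usage lines.

```python
from statistics import NormalDist

import numpy as np


def threshold(mu, sigma, target_mean):
    """Return t with E sgn(xi - t) = target_mean for xi ~ N(mu, sigma^2).

    E sgn(xi - t) = P[xi > t] - P[xi < t] = 1 - 2 * Phi((t - mu) / sigma), so
    t = mu + sigma * Phi^{-1}((1 - target_mean) / 2); needs -1 < target_mean < 1.
    """
    return mu + sigma * NormalDist().inv_cdf((1 - target_mean) / 2)


def to_bisection(y):
    """Flip |sum(y)| / 2 majority-sign entries of y in {±1}^n (n even) so the sum is 0.

    Returns the corrected vector and the number of flipped vertices.
    """
    y = np.array(y)
    excess = int(y.sum())
    majority = 1 if excess > 0 else -1
    flip = np.flatnonzero(y == majority)[: abs(excess) // 2]
    y[flip] = -majority
    return y, len(flip)


rng = np.random.default_rng(2)
mu, sigma, m = 0.3, 0.8, -0.2           # hypothetical mean, std. dev. and target E y_i
t = threshold(mu, sigma, m)
xi = rng.normal(mu, sigma, 200_000)
print(float(np.mean(np.sign(xi - t))))  # approximately -0.2
y, flipped = to_bisection([1, 1, 1, -1, 1, -1])
print(y.tolist(), flipped)              # [-1, 1, 1, -1, 1, -1] 1
```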

SLIDE 18

Cargese Workshop, 2014

SUM-OF-SQUARES method and

approximation algorithms II

David Steurer

Cornell

SLIDE 19

sparse vector
given: linear subspace $U \subseteq \mathbb{R}^n$ (represented by some basis), parameter $k \in [n]$
promise: ∃ $v_0 \in U$ such that $v_0$ is $k$-sparse (and $v_0 \in \{0, \pm1\}^n$)
goal: find a $k$-sparse vector $v \in U$

an efficient approximation algorithm for $k = \Omega(n)$ would be a major step toward refuting Khot's Unique Games Conjecture and toward improved guarantees for MAX CUT, VERTEX COVER, …

planted / average-case version (benchmark for unsupervised learning tasks): subspace $U$ spanned by $d - 1$ random vectors and some $k$-sparse vector $v_0$
previous best algorithms only work for very sparse vectors: $k/n \le 1/\sqrt{d}$ [Spielman–Wang–Wright, Demanet–Hand]
here: deg.-4 pseudo-distributions work for $k/n = \Omega(1)$ up to $d \le O(\sqrt{n})$ [Barak–Kelner–S.]

SLIDE 20

analytical proxies for sparsity: if vector $v$ is $k$-sparse then
$\frac{\|v\|_\infty}{\|v\|_1} \ge \frac1k$, $\frac{\|v\|_2^2}{\|v\|_1^2} \ge \frac1k$, and $\frac{\|v\|_4^4}{\|v\|_2^4} \ge \frac1k$ (tight if $v \in \{0, \pm1\}^n$)

limitations of $\ell_\infty/\ell_1$ (previous best algorithm; exact via linear programming):
$v$ = sum of $d$ random $\pm1$ vectors with the same first coordinate → $\|v\|_\infty \ge d$, $\|v\|_1 \le d + n\sqrt{d}$ → ratio $\approx \sqrt{d}/n$
→ the $\ell_\infty/\ell_1$ algorithm fails for $k/n \ge 1/\sqrt{d}$

limitations of the standard SDP relaxation for $\ell_2/\ell_1$ (best proxy for sparsity):
"ideal object": distribution $D$ over the $\ell_2$ unit sphere of subspace $U$ with $\ell_1$-constraint $\mathbb{E}_D \|v\|_1^2 \le k$
(not a low-deg. polynomial in $v$ → unclear how to represent; also NP-hard in the worst case)
tractable relaxation: $\sum_{i,j} |\mathbb{E}_D v_i v_j| \le k$ [d'Aspremont–El Ghaoui–Jordan–Lanckriet]
but: for the uniform distribution $D$ over the $\ell_2$ sphere of a $d$-dim. random subspace, $\sum_{i,j} |\mathbb{E}_D v_i v_j| \approx n/\sqrt{d}$ → same limitation as $\ell_\infty/\ell_1$

SLIDE 21

degree-$d$ SOS relaxation for $\ell_4/\ell_2$

deg.-$d$ pseudo-distribution $D\colon \{v \in U : \|v\|_2 = 1\} \to \mathbb{R}$ over the unit $\ell_2$-sphere of $U$; the pseudo-distribution satisfies $\|v\|_4^4 = 1/k$

notation: $\tilde{\mathbb{E}}_D f := \int_{v \in U,\, \|v\| = 1} D \cdot f$ (only consider polynomials → easy to integrate)
normalization: $\tilde{\mathbb{E}}_D 1 = 1$
non-negativity: $\tilde{\mathbb{E}}_D h(v)^2 \ge 0$ for every $h$ of deg. $\le d/2$
orthogonality: $\tilde{\mathbb{E}}_D \big(\|v\|_4^4 - \tfrac1k\big) \cdot g(v) = 0$ for every $g$ of deg. $\le d - 4$

SLIDE 22

how to find pseudo-distributions?
the set of deg.-$d$ pseudo-distributions is a convex set with an $n^{O(d)}$-time separation oracle
separation problem: given a function $D$ (represented as a deg.-$d$ polynomial), check that the quadratic form $f \mapsto \tilde{\mathbb{E}}_D f^2$ is p.s.d., or output a violated constraint $\tilde{\mathbb{E}}_D f^2 < 0$

SLIDE 23

how to use pseudo-distributions?
rule of thumb: the set of deg.-$d$ pseudo-moments $\{\tilde{\mathbb{E}}_D f \mid \deg f \le d\}$ is difficult* to distinguish / separate from the deg.-$d$ moments of actual distributions of solutions
(* unless you invest $n^{\Omega(d)}$ time to distinguish)
also: the values $\{\tilde{\mathbb{E}}_D f \mid \deg f > d\}$ do not carry additional information → no need to look at them

SLIDE 24

dual view (SOS certificates)
$\big(\|v\|_4^4 - \tfrac1k\big) \cdot g + \sum_j h_j^2 = -1$ over $\{v \in U : \|v\|_2 = 1\}$ for some $g$ of deg. $\le d - 4$ and $\{h_j\}$ of deg. $\le d/2$
⇔ no deg.-$d$ pseudo-distribution exists (→ no solution exists)
for approximation algorithms: need a pseudo-distribution to extract an approximate solution (hard to exploit the non-existence of an SOS certificate directly)

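SLIDE 25

general properties of pseudo-distributions: the following inequalities hold as expected (same as for distributions)

let $D = \{u, v\}$ be a deg.-4 pseudo-distribution over $\mathbb{R}^n \times \mathbb{R}^n$

Cauchy–Schwarz inequality: $\tilde{\mathbb{E}}_D \langle u, v\rangle \le \big(\tilde{\mathbb{E}}_D \|u\|_2^2\big)^{1/2} \big(\tilde{\mathbb{E}}_D \|v\|_2^2\big)^{1/2}$
Hölder's inequality: $\tilde{\mathbb{E}}_D \sum_i u_i^3 v_i \le \big(\tilde{\mathbb{E}}_D \|u\|_4^4\big)^{3/4} \big(\tilde{\mathbb{E}}_D \|v\|_4^4\big)^{1/4}$
$\ell_4$-triangle inequality: $\big(\tilde{\mathbb{E}}_D \|u + v\|_4^4\big)^{1/4} \le \big(\tilde{\mathbb{E}}_D \|u\|_4^4\big)^{1/4} + \big(\tilde{\mathbb{E}}_D \|v\|_4^4\big)^{1/4}$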

SLIDE 26

claim [Barak–Brandão–Harrow–Kelner–S.–Zhou]: let $U' \subseteq \mathbb{R}^n$ be a random $d$-dim. subspace with $d \ll \sqrt{n}$, and let $P'$ be the orthogonal projector onto $U'$; then w.h.p.
$\|P' v\|_4^4 = O(1/n) \cdot \|v\|_2^4 - \sum_j h_j(v)^2$ over $v \in \mathbb{R}^n$, for $h_j$'s of deg. 2 (so the sum of squares has deg. 4)

proof sketch (SOS certificate for the classical inequality $\|P' v\|_4^4 \le O(1/n) \cdot \|v\|_2^4$):
basis change: let $x = B^\top v$, where $B$'s columns are an orthonormal basis of $U'$ (so that $P' = B B^\top$)
→ $\|P' v\|_4^4 = \frac{1}{n^2} \sum_i \langle b_i, x\rangle^4$, where $b_1, \dots, b_n$ ($\sqrt{n}$ times the rows of $B$) are close to i.i.d. standard Gaussian vectors (so that $\mathbb{E}_b \langle b, x\rangle^2 = \|x\|_2^2$ and $\mathbb{E}_b \langle b, x\rangle^4 = 3\|x\|_2^4$)
enough to show: $\frac1n \sum_{i=1}^n \langle b_i, x\rangle^4 = O(1) \cdot \mathbb{E}_b \langle b, x\rangle^4 - \sum_j h'_j(x)^2$
reduce to deg. 2: $\frac1n \sum_{i=1}^n \langle b_i^{\otimes 2}, y\rangle^2 \le O(1) \cdot \mathbb{E}_b \langle b^{\otimes 2}, y\rangle^2$ (with $y = x^{\otimes 2}$)
→ use concentration inequalities for quadratic forms (aka matrices)
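A sketch of the quantities in the proof sketch, with hypothetical sizes $n = 20000$ and $d = 10$ (so $d$ is far below $\sqrt{n}$). It evaluates $n \cdot \|P'v\|_4^4$ for one random unit vector $v \in U'$, and checks the Gaussian moments $\mathbb{E}_b \langle b, x\rangle^2 = \|x\|_2^2$ and $\mathbb{E}_b \langle b, x\rangle^4 = 3\|x\|_2^4$ by sampling. This only shows typical behavior for a single $v$; the claim bounds all $v$ at once via an SOS certificate, which this code does not produce.

```python
import numpy as np


def random_subspace_basis(n, d, rng):
    """n x d matrix B with orthonormal columns spanning a uniformly random d-dim. subspace."""
    Q, _ = np.linalg.qr(rng.standard_normal((n, d)))
    return Q


rng = np.random.default_rng(4)
n, d = 20_000, 10
B = random_subspace_basis(n, d, rng)
x = rng.standard_normal(d)
x /= np.linalg.norm(x)                 # v = B x is a unit vector in U', and P' v = v
scaled = n * np.sum((B @ x) ** 4)      # n * ||P' v||_4^4 = (1/n) * sum_i <b_i, x>^4

# Gaussian moments used in the proof sketch, estimated by sampling
b = rng.standard_normal((200_000, d))
proj = b @ x
print(scaled, np.mean(proj**2), np.mean(proj**4))  # roughly 3, 1, 3
```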

SLIDE 27

approximation algorithm for planted sparse vector

given: some basis of subspace $U = \operatorname{span}(U' \cup \{v_0\}) \subseteq \mathbb{R}^n$, where $U' \subseteq \mathbb{R}^n$ is a random $d$-dim. subspace and $v_0 \in \mathbb{R}^n$ with $v_0 \perp U'$, $\|v_0\|_4^4 = 1/k$, and $\|v_0\|_2^4 = 1$ (e.g., $k$-sparse)
goal: find a unit vector $w$ with $\langle w, v_0\rangle^2 \ge 1 - O\big((k/n)^{1/4}\big)$

algorithm: compute a deg.-4 pseudo-distribution $D = \{v\}$ over the unit sphere of $U$ satisfying $\|v\|_4^4 = 1/k$; sample a Gaussian $w$ with $\mathbb{E}\, w w^\top = \tilde{\mathbb{E}}_D v v^\top$ and renormalize

analysis: claim: $\tilde{\mathbb{E}}_D \langle v, v_0\rangle^2 \ge 1 - O\big((k/n)^{1/4}\big)$ (→ the Gaussian $w$ is almost 1-dim.)
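A sketch of the planted model and of the rounding step, under loudly stated assumptions: `planted_instance` is a hypothetical generator for the input described above (with $v_0 \in \{0, \pm 1/\sqrt{k}\}^n$, so that $\|v_0\|_2 = 1$ and $\|v_0\|_4^4 = 1/k$), and since no SDP solver is run, the pseudo-moment matrix $\tilde{\mathbb{E}}_D v v^\top$ is replaced by a hypothetical nearly rank-one stand-in of the kind the analysis claim predicts.

```python
import numpy as np


def planted_instance(n, d, k, rng):
    """Hypothetical generator: a k-sparse unit vector v0 in {0, ±1/sqrt(k)}^n, a random
    d-dim. subspace U' orthogonal to v0, and a random orthonormal basis of
    U = span(U' ∪ {v0}) in which v0 is not visible as a basis vector."""
    v0 = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    v0[support] = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)
    G = rng.standard_normal((n, d))
    G -= np.outer(v0, v0 @ G)                         # make U' = span(G) orthogonal to v0
    basis, _ = np.linalg.qr(np.column_stack([v0, G]))
    mix, _ = np.linalg.qr(rng.standard_normal((d + 1, d + 1)))
    return v0, basis @ mix                            # orthonormal basis of U


def round_from_moments(moments, basis, rng):
    """Rounding step: sample c ~ N(0, moments) in basis coordinates (so that
    E (Bc)(Bc)^T = B moments B^T), map to R^n, and renormalize."""
    c = rng.multivariate_normal(np.zeros(len(moments)), moments)
    w = basis @ c
    return w / np.linalg.norm(w)


rng = np.random.default_rng(6)
v0, basis = planted_instance(n=2000, d=20, k=50, rng=rng)
a0 = basis.T @ v0                                     # coordinates of v0 (a unit vector)
delta = 1e-3
# HYPOTHETICAL stand-in for ~E_D v v^T in basis coordinates: nearly rank one along v0,
# as the analysis claim forces; no SDP is solved here
moments = (1 - delta) * np.outer(a0, a0) + delta * np.eye(len(a0)) / len(a0)
correlations = [float(round_from_moments(moments, basis, rng) @ v0) ** 2 for _ in range(21)]
print(np.median(correlations))                        # typically close to 1
```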

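SLIDE 28

analysis: claim: $\tilde{\mathbb{E}}_D \langle v, v_0\rangle^2 \ge 1 - O\big((k/n)^{1/4}\big)$ (→ Gaussian $w$ almost 1-dim.)

$\frac{1}{k^{1/4}} = \big(\tilde{\mathbb{E}}_D \|v\|_4^4\big)^{1/4}$ ($D$ satisfies $\|v\|_4^4 = 1/k$)
$= \big(\tilde{\mathbb{E}}_D \|\langle v, v_0\rangle v_0 + P' v\|_4^4\big)^{1/4}$ (same function)
$\le \big(\tilde{\mathbb{E}}_D \|\langle v, v_0\rangle v_0\|_4^4\big)^{1/4} + \big(\tilde{\mathbb{E}}_D \|P' v\|_4^4\big)^{1/4}$ ($\ell_4$-triangle inequality)
$\le \frac{1}{k^{1/4}} \cdot \big(\tilde{\mathbb{E}}_D \langle v, v_0\rangle^4\big)^{1/4} + \frac{O(1)}{n^{1/4}}$ (SOS certificate for $U'$)

→ $\tilde{\mathbb{E}}_D \langle v, v_0\rangle^4 \ge 1 - O\big((k/n)^{1/4}\big)$ → $\tilde{\mathbb{E}}_D \langle v, v_0\rangle^2 \ge 1 - O\big((k/n)^{1/4}\big)$
(because $\langle v, v_0\rangle^4 = \big(1 - \|P' v\|_2^2\big) \langle v, v_0\rangle^2$)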



SLIDE 30

products of pseudo-distributions

claim: suppose $D, D'\colon \Omega \to \mathbb{R}$ are deg.-$d$ pseudo-distributions over $\Omega$; then $D \otimes D'\colon \Omega \times \Omega \to \mathbb{R}$ is a deg.-$d$ pseudo-distribution over $\Omega \times \Omega$

proof: tensor products of positive semidefinite matrices are positive semidefinite

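SLIDE 31

Cauchy–Schwarz inequality: let $D = \{u, v\}$ be a deg.-2 pseudo-distribution over $\mathbb{R}^n \times \mathbb{R}^n$; then
$\tilde{\mathbb{E}}_D \langle u, v\rangle \le \big(\tilde{\mathbb{E}}_D \|u\|_2^2\big)^{1/2} \big(\tilde{\mathbb{E}}_D \|v\|_2^2\big)^{1/2}$

proof:
$\big(\tilde{\mathbb{E}}_D \langle u, v\rangle\big)^2 = \tilde{\mathbb{E}}_{D \otimes D} \langle u, v\rangle \langle u', v'\rangle$ ($D \otimes D$ is a product pseudo-distribution)
$= \tilde{\mathbb{E}}_{D \otimes D} \sum_{ij} u_i v_i u'_j v'_j$
$\le \frac12 \tilde{\mathbb{E}}_{D \otimes D} \Big(\sum_{ij} u_i^2 v_j'^2 + \sum_{ij} u_j'^2 v_i^2\Big)$ ($2ab = a^2 + b^2 - (a - b)^2$ with $a = u_i v'_j$, $b = u'_j v_i$)
$= \frac12 \tilde{\mathbb{E}}_{D \otimes D} \big(\|u\|_2^2 \|v'\|_2^2 + \|u'\|_2^2 \|v\|_2^2\big)$
$= \tilde{\mathbb{E}}_D \|u\|_2^2 \cdot \tilde{\mathbb{E}}_D \|v\|_2^2$ ($D \otimes D$ is a product pseudo-distribution)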

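SLIDE 32

Hölder's inequality: let $D = \{u, v\}$ be a deg.-4 pseudo-distribution over $\mathbb{R}^n \times \mathbb{R}^n$; then
$\tilde{\mathbb{E}}_D \sum_i u_i^3 v_i \le \big(\tilde{\mathbb{E}}_D \|u\|_4^4\big)^{3/4} \big(\tilde{\mathbb{E}}_D \|v\|_4^4\big)^{1/4}$

proof:
$\tilde{\mathbb{E}}_D \sum_i u_i^3 v_i \le \big(\tilde{\mathbb{E}}_D \sum_i u_i^4\big)^{1/2} \cdot \big(\tilde{\mathbb{E}}_D \sum_i u_i^2 v_i^2\big)^{1/2}$ (Cauchy–Schwarz)
$\le \big(\tilde{\mathbb{E}}_D \sum_i u_i^4\big)^{1/2} \cdot \big(\tilde{\mathbb{E}}_D \sum_i u_i^4\big)^{1/4} \cdot \big(\tilde{\mathbb{E}}_D \sum_i v_i^4\big)^{1/4}$ (Cauchy–Schwarz)

we also used: $\{u, v\}$ deg.-4 pseudo-distribution → $\{u \otimes u, u \otimes v\}$ deg.-2 pseudo-distribution (every deg.-2 polynomial in $u \otimes u, u \otimes v$ is a deg.-4 polynomial in $u, v$)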

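SLIDE 33

$\ell_4$-triangle inequality: let $D = \{u, v\}$ be a deg.-4 pseudo-distribution over $\mathbb{R}^n \times \mathbb{R}^n$; then
$\big(\tilde{\mathbb{E}}_D \|u + v\|_4^4\big)^{1/4} \le \big(\tilde{\mathbb{E}}_D \|u\|_4^4\big)^{1/4} + \big(\tilde{\mathbb{E}}_D \|v\|_4^4\big)^{1/4}$

proof: expand $\|u + v\|_4^4$ in terms of $\sum_i u_i^4$, $\sum_i u_i^3 v_i$, $\sum_i u_i^2 v_i^2$, $\sum_i u_i v_i^3$, $\sum_i v_i^4$
bound the pseudo-expectations of the "mixed terms" using Cauchy–Schwarz / Hölder
check that the total equals the right-hand side (raised to the 4th power)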
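Spelling out the last step of this proof, with $A = \tilde{\mathbb{E}}_D \|u\|_4^4$ and $B = \tilde{\mathbb{E}}_D \|v\|_4^4$: Hölder's inequality (applied to $(u, v)$ and to $(v, u)$) bounds the two cubic mixed terms, Cauchy–Schwarz gives $\tilde{\mathbb{E}}_D \sum_i u_i^2 v_i^2 \le A^{1/2} B^{1/2}$, and the result is the binomial expansion of $(A^{1/4} + B^{1/4})^4$; taking 4th roots gives the claim.

```latex
\begin{aligned}
\tilde{\mathbb{E}}_D \|u+v\|_4^4
&= A + 4\,\tilde{\mathbb{E}}_D \sum_i u_i^3 v_i + 6\,\tilde{\mathbb{E}}_D \sum_i u_i^2 v_i^2
   + 4\,\tilde{\mathbb{E}}_D \sum_i u_i v_i^3 + B \\
&\le A + 4A^{3/4}B^{1/4} + 6A^{1/2}B^{1/2} + 4A^{1/4}B^{3/4} + B \\
&= \big(A^{1/4} + B^{1/4}\big)^4 .
\end{aligned}
```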

SLIDE 34

tensor decomposition

given: tensor $T \approx \sum_i a_i^{\otimes 4}$ (in spectral norm) for nice $a_1, \dots, a_m \in \mathbb{R}^n$
goal: find a set of vectors $B \approx \{\pm a_1, \dots, \pm a_m\}$ (for simplicity: orthonormal and $m = n$)

approach: show "uniqueness": $\sum_i a_i^{\otimes 4} \approx \sum_i b_i^{\otimes 4} \Rightarrow \{\pm a_1, \dots, \pm a_m\} \approx \{\pm b_1, \dots, \pm b_m\}$
show that the uniqueness proof translates to an SOS certificate → any pseudo-distribution over decompositions is "concentrated" on the unique decomposition $\{\pm a_1, \dots, \pm a_m\}$
→ recover the decomposition by reweighting the pseudo-distribution by a degree-$\log n$ polynomial (approximation to a $\delta$ function)
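A small numerical illustration of the object on this slide (not of the SOS algorithm itself), assuming orthonormal components as in the simplified setting. For $T = \sum_i a_i^{\otimes 4}$, the form $T(x,x,x,x) = \sum_i \langle a_i, x\rangle^4$ is at most $\max_i \langle a_i, x\rangle^2 \cdot \sum_i \langle a_i, x\rangle^2 \le 1$ on the unit sphere, with equality exactly at $x = \pm a_i$. The sizes and function names are illustrative.

```python
import numpy as np


def fourth_order_tensor(A):
    """T = sum_i a_i^{⊗4} for the columns a_i of A, as an n x n x n x n array."""
    return np.einsum("ai,bi,ci,di->abcd", A, A, A, A)


def evaluate(T, x):
    """T(x, x, x, x) = sum_{abcd} T_abcd x_a x_b x_c x_d = sum_i <a_i, x>^4."""
    return float(np.einsum("abcd,a,b,c,d->", T, x, x, x, x))


rng = np.random.default_rng(5)
n = 6
A, _ = np.linalg.qr(rng.standard_normal((n, n)))  # orthonormal a_1, ..., a_n (m = n)
T = fourth_order_tensor(A)
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
print(evaluate(T, A[:, 0]), evaluate(T, x))       # 1 up to rounding, then a value below 1
```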