
Approximating the Permanent of Positive Semidefinite Matrices

Nima Anari, joint work with Leonid Gurvits, Shayan Oveis Gharan, and Amin Saberi

1 / 14

Determinant

det(M) = ∑σ∈Sn sgn(σ) M1,σ(1) · · · Mn,σ(n)

Permanent

per(M) = ∑σ∈Sn M1,σ(1) · · · Mn,σ(n)

2 × 2 Example

M = [ a b ; c d ],  det(M) = ad − bc,  per(M) = ad + bc

2 / 14
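To make the two definitions concrete, here is a minimal brute-force sketch in Python (function names are illustrative, not from the slides): it sums over all n! permutations exactly as in the formulas above and reproduces the 2 × 2 example.

```python
from itertools import permutations
from math import prod

def sign(perm):
    # Parity of a permutation: each cycle of length L contributes (-1)^(L-1).
    s, seen = 1, [False] * len(perm)
    for i in range(len(perm)):
        if seen[i]:
            continue
        j, length = i, 0
        while not seen[j]:
            seen[j] = True
            j, length = perm[j], length + 1
        if length % 2 == 0:
            s = -s
    return s

def det_bruteforce(M):
    n = len(M)
    return sum(sign(p) * prod(M[i][p[i]] for i in range(n)) for p in permutations(range(n)))

def per_bruteforce(M):
    # Same sum as the determinant, but without the sgn(σ) factor.
    n = len(M)
    return sum(prod(M[i][p[i]] for i in range(n)) for p in permutations(range(n)))

a, b, c, d = 1, 2, 3, 4
M = [[a, b], [c, d]]
print(det_bruteforce(M))  # ad - bc = -2
print(per_bruteforce(M))  # ad + bc = 10
```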

Complexity of Permanent

  • #P-hard to compute per(M) for 0/1 matrices [Valiant’79].
  • #P-hard to compute the sign of per(M) [Aaronson’11].
  • #P-hard to compute per(M) for M ⪰ 0 [Grier-Schaeffer’16].

3 / 14
Approximating the Permanent

Additive ±ϵ‖M‖^n approximation [Gurvits’05].

Positive Matrices (M ≥ 0)

  • Permanent is always nonnegative: per(M) ≥ 0.
  • Randomized (1 + ϵ)-approximation (FPRAS) [Jerrum-Sinclair-Vigoda’04].
  • Deterministic 2^n-approximation [Gurvits-Samorodnitsky’14].

PSD Matrices (M ⪰ 0)

  • Permanent is always nonnegative: per(M) ≥ 0.
  • Deterministic n!-approximation [Marcus’63], using the estimate M1,1 · · · Mn,n.
  • Improved to an n!/(k!)^{n/k}-approximation in time 2^{O(k+log n)} [Lieb’66].

4 / 14
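A small numerical sanity check of the Marcus’63 bound (a hedged sketch; it assumes numpy and uses a brute-force permanent, so it only makes sense for small n). For PSD M, the diagonal product M1,1 · · · Mn,n lower-bounds per(M) and is within a factor n! of it.

```python
import numpy as np
from itertools import permutations
from math import prod, factorial

def permanent(M):
    # Brute force over all n! permutations; only for small n.
    n = len(M)
    return sum(prod(M[i, p[i]] for i in range(n)) for p in permutations(range(n)))

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
M = A @ A.conj().T                                   # random PSD matrix

diag_prod = prod(M[i, i].real for i in range(n))     # Marcus'63 estimate
exact = permanent(M).real                            # per(M) is real for PSD M

# Marcus'63: diag_prod <= per(M) <= n! * diag_prod.
print(diag_prod <= exact <= factorial(n) * diag_prod)   # True
```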


Theorem [A-Gurvits-Oveis Gharan-Saberi'17]

The permanent of PSD matrices M ∈ C^{n×n} can be approximated, in deterministic polynomial time, within a factor of (e^{γ+1})^n ≈ 4.84^n, where γ ≈ 0.577 is the Euler–Mascheroni constant.

5 / 14

Complex Gaussians

[Figure: density of z ∼ CN(0, 1) over the complex plane, with axes Re(z) and Im(z); P[z] = (1/π) e^{−|z|²}.]

  • Standard multivariate complex Gaussian: z = (z1, . . . , zn) with the zi ∼ CN(0, 1) i.i.d.
  • General (circularly-symmetric) complex Gaussian: g = Cz, so that g ∼ CN(0, CC†).

Wick's Formula

E[ |g1|² · · · |gn|² ] = per(CC†).

6 / 14
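A Monte-Carlo check of Wick’s formula (a hedged sketch, assuming numpy; the estimator has high variance, so expect only rough agreement): draw g = Cz with z standard complex Gaussian and average |g1|² · · · |gn|².

```python
import numpy as np
from itertools import permutations
from math import prod

def permanent(M):
    # Brute force over all n! permutations; only for small n.
    n = len(M)
    return sum(prod(M[i, p[i]] for i in range(n)) for p in permutations(range(n)))

rng = np.random.default_rng(1)
n, samples = 3, 200_000

C = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# z standard complex Gaussian: real and imaginary parts are N(0, 1/2), so E|z_i|^2 = 1.
z = (rng.standard_normal((samples, n)) + 1j * rng.standard_normal((samples, n))) / np.sqrt(2)
g = z @ C.T                                          # each row is a sample of g = Cz

estimate = np.mean(np.prod(np.abs(g) ** 2, axis=1))  # E[|g_1|^2 ... |g_n|^2]
exact = permanent(C @ C.conj().T).real               # per(CC†)
print(estimate, exact)                               # roughly equal
```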

Schur Power

The Schur power of an n × n matrix M is the n! × n! matrix schur(M), with rows and columns indexed by permutations σ, τ ∈ Sn and entries

schur(M)σ,τ = Mσ(1),τ(1) · · · Mσ(n),τ(n).

  • The Schur power is a principal submatrix of M⊗n, so M ⪰ 0 ⟹ schur(M) ⪰ 0.
  • The permanent is an eigenvalue: schur(M)·1 = per(M)·1, so M ⪰ 0 ⟹ per(M) ≥ 0.
  • The permanent is monotone w.r.t. ⪰:

Permanent is Loewner-Monotone

M1 ⪰ M2 ⪰ 0 ⟹ per(M1) ≥ per(M2) ≥ 0.

7 / 14
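The two facts above are easy to verify numerically for small n (a hedged sketch, assuming numpy): build the n! × n! Schur power explicitly, check that it is PSD when M is, and that the all-ones vector is an eigenvector with eigenvalue per(M).

```python
import numpy as np
from itertools import permutations
from math import prod

def permanent(M):
    n = len(M)
    return sum(prod(M[i, p[i]] for i in range(n)) for p in permutations(range(n)))

rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
M = A @ A.conj().T                                   # PSD input

perms = list(permutations(range(n)))                 # index set S_n, size n!
schur = np.array([[prod(M[s[i], t[i]] for i in range(n)) for t in perms] for s in perms])

print(np.linalg.eigvalsh(schur).min() >= -1e-9)      # M ⪰ 0 ⟹ schur(M) ⪰ 0
print(np.allclose(schur @ np.ones(len(perms)), permanent(M)))  # schur(M)·1 = per(M)·1
```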

Approximation using Monotonicity

Permanent is monotone w.r.t. ⪰: D ⪰ M ⪰ vv† ⟹ per(D) ≥ per(M) ≥ per(vv†).

Theorem [A-Gurvits-Oveis Gharan-Saberi'17]

For any M ⪰ 0 there exist a diagonal matrix D and a rank-1 matrix vv† such that D ⪰ M ⪰ vv† and per(D) ≤ 4.85^n per(vv†).

8 / 14

Computing the Approximation

  • Solve and output the following: inf_D per(D), subject to D diagonal and D ⪰ M.
  • Equivalently, solve the convex program: inf_{D^{-1}} log per((D^{-1})^{-1}), subject to M^{-1} ⪰ D^{-1} ⪰ 0.
  • No such convex program is known for the best rank-1 matrix.

9 / 14
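A hedged sketch of this convex program using CVXPY (an assumption: CVXPY and a conic solver such as SCS are installed; names are illustrative). Writing X = D^{-1} with diagonal entries x, the program becomes: minimize −∑ log x_i subject to 0 ⪯ diag(x) ⪯ M^{-1}, and the reported diagonal matrix is D = diag(x)^{-1}.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))
M = A @ A.T + 0.1 * np.eye(n)            # random real PSD matrix (kept real for simplicity)

Minv = np.linalg.inv(M)
Minv = (Minv + Minv.T) / 2               # symmetrize to avoid floating-point asymmetry

x = cp.Variable(n)                        # diagonal of X = D^{-1}
constraints = [Minv - cp.diag(x) >> 0]    # M^{-1} ⪰ D^{-1}; x > 0 is implicit in log's domain
prob = cp.Problem(cp.Minimize(-cp.sum(cp.log(x))), constraints)
prob.solve()                              # needs a solver handling exponential + SDP cones (e.g. SCS)

D = np.diag(1.0 / x.value)                # optimal diagonal D ⪰ M
print(np.prod(np.diag(D)))                # per(D): permanent of a diagonal matrix is ∏ D_ii
```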

Sketch of Proof

  • Renormalize rows and columns so that we may assume D = I.
  • By duality, there is B ⪰ 0 with diag(B) = 1 such that (I − M)B = 0, i.e. B = MB; B is called a correlation matrix.
  • Let P = proj_image(B). Then M ⪰ P, because x ∈ image(B) ⟹ x = By ⟹ Mx = MBy = By = x = Px.
  • Prove the “PSD Van der Waerden” bound below.

PSD Van der Waerden [A-Gurvits-Oveis Gharan-Saberi'17]

If B is a correlation matrix and P the orthogonal projection onto the image of B, then per(P) ≥ 4.85^{−n}.

10 / 14

PSD Van der Waerden

  • Given a correlation matrix B (i.e. B ⪰ 0 and diag(B) = 1), we want to show per(proj_image(B)) ≥ 4.85^{−n}.
  • By monotonicity it suffices to find a unit vector v ∈ image(B) with per(vv†) ≥ 4.85^{−n}.
  • Let B be the Gram matrix of unit vectors u1, . . . , un. Generate v by projecting u1, . . . , un onto some direction g and normalizing:

v = [g†u1, . . . , g†un] / |[g†u1, . . . , g†un]|.

11 / 14
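A numerical sketch of this construction (hedged; numpy assumed, names illustrative). For a rank-1 matrix, per(vv†) = n! · ∏|v_i|², and the lemma only guarantees the 4.85^{−n} bound for some direction g, so a single random draw can fall slightly short.

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(4)
n = 6
U = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
U /= np.linalg.norm(U, axis=0)           # columns u_1..u_n are unit vectors; B = U†U is a correlation matrix

g = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)   # random complex Gaussian direction
w = U.conj().T @ g                        # inner products of each u_i with g (only |w_i| matters below)
v = w / np.linalg.norm(w)                 # the normalized vector from the slide

per_vv = factorial(n) * np.prod(np.abs(v) ** 2)   # per(vv†) = n! ∏ |v_i|²
print(per_vv, 4.85 ** (-n))               # typically comparable to the claimed lower bound
```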

GM-AM Ratio

Let u be a random vector (e.g., sampled uniformly from u1, . . . , un). Define the GM-AM ratio as

GM-AM(u) = e^{E[log(|u|²)]} / E[|u|²].

  • The GM-AM ratio is always ≤ 1.
  • Equality holds exactly when |u| is (almost surely) constant.

Lemma [A-Gurvits-Oveis Gharan-Saberi'17]

If u is a random unit vector, there exists g such that the GM-AM ratio of g†u is at least e^{−γ}.

[Figure: unit vectors u1, u2 and a direction g.]

12 / 14

Complex Gaussians Come Back

Let g be a standard complex Gaussian. Then with positive probability we have:

GM-AM(g†u) ≥ E[ e^{E[log(|g†u|²)]} ] / E[ |g†u|² ] ≥ e^{E[log(|g†u|²)]} / E[ |g†u|² ]

(inner expectations are over the random u, outer over g; the last step is Jensen’s inequality). But E[ log(|g†u|²) ] = −γ and E[ |g†u|² ] = 1, so this ratio is at least e^{−γ}.

13 / 14
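Both expectations are easy to check numerically (a hedged sketch, assuming numpy): for a fixed unit vector u and standard complex Gaussian g, g†u ∼ CN(0, 1), so |g†u|² is exponentially distributed with mean 1 and E[log |g†u|²] = −γ ≈ −0.5772.

```python
import numpy as np

rng = np.random.default_rng(5)
n, samples = 8, 500_000

u = rng.standard_normal(n) + 1j * rng.standard_normal(n)
u /= np.linalg.norm(u)                               # fixed unit vector

g = (rng.standard_normal((samples, n)) + 1j * rng.standard_normal((samples, n))) / np.sqrt(2)
inner = g.conj() @ u                                 # g†u for each sample, distributed as CN(0, 1)

print(np.mean(np.abs(inner) ** 2))                   # ≈ 1
print(np.mean(np.log(np.abs(inner) ** 2)))           # ≈ −0.5772 = −γ
```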


Conclusion and Open Questions

  • (e^{γ+1})^n-approximation for the permanent of PSD matrices. The analysis is tight.
  • Can we improve by sandwiching between block-diagonal matrices and higher-rank matrices?
  • Use lifts to get better approximations?
  • Markov chains to get a (1 + ϵ)-approximation?

14 / 14