SLIDE 1

Phase Retrieval using Partial Unitary Sensing Matrices

Rishabh Dudeja, Milad Bakhshizadeh, Junjie Ma, Arian Maleki


SLIDE 2

The Phase Retrieval Problem

◮ Recover the unknown signal $x_\star$ from $y = |A x_\star|$ (a numerical sketch of this model follows after this list).
◮ $x_\star \in \mathbb{C}^n$: signal vector.
◮ $y \in \mathbb{R}^m$: measurements.
◮ $A$: sensing matrix.
◮ $\delta = m/n$: sampling ratio.
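
A minimal numerical sketch of this measurement model (the dimensions, the random signal, and the Gaussian sensing matrix of the next slide are illustrative choices):

```python
import numpy as np

n, m = 64, 256                     # signal dimension and number of measurements
delta = m / n                      # sampling ratio
rng = np.random.default_rng(0)

# Unknown signal x_star in C^n, normalized so that ||x_star||^2 = n.
x_star = rng.normal(size=n) + 1j * rng.normal(size=n)
x_star *= np.sqrt(n) / np.linalg.norm(x_star)

# Gaussian sensing matrix: A_ij i.i.d. CN(0, 1/n).
A = (rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))) / np.sqrt(2 * n)

y = np.abs(A @ x_star)             # phaseless measurements y = |A x_star|
```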

SLIDE 3

Popular sensing matrices (in theoretical work)

◮ Gaussian: $A_{ij}$ i.i.d. $\sim \mathcal{CN}(0, 1/n)$.
◮ Coded diffraction pattern (CDP), sketched in code below:

  $A_{\mathrm{CDP}} = \begin{bmatrix} F D_1 \\ \vdots \\ F D_L \end{bmatrix}$

  ◮ $D_l = \mathrm{Diag}(e^{i\phi_1^{(l)}}, \ldots, e^{i\phi_n^{(l)}})$
  ◮ $F$: Fourier matrix
  ◮ $\phi_1^{(l)}, \ldots, \phi_n^{(l)}$: independent uniform phases

◮ Objective: Which matrix performs better from a purely theoretical standpoint?
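
A minimal sketch of the CDP construction above, stacking $L$ randomly masked copies of a unitary DFT (the value of $L$ and the normalization are illustrative choices):

```python
import numpy as np

n, L = 64, 4
rng = np.random.default_rng(1)

F = np.fft.fft(np.eye(n)) / np.sqrt(n)               # unitary n x n Fourier matrix

blocks = []
for _ in range(L):
    phases = rng.uniform(0.0, 2.0 * np.pi, size=n)   # independent uniform phases
    D = np.diag(np.exp(1j * phases))                 # random phase mask D_l
    blocks.append(F @ D)

A_cdp = np.vstack(blocks)                 # A_CDP = [F D_1; ...; F D_L], so m = n L
```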

SLIDE 4

Flashback to compressed sensing:

◮ Performance of partial orthogonal versus Gaussian matrices for the LASSO:
  ◮ Noiseless measurements: same phase transition.
  ◮ Noisy measurements: partial orthogonal (Fourier) is better.

Related work:

◮ Phenomenon originally observed by Donoho, Tanner (2009).
◮ Phase transition analysis of Gaussian matrices: Donoho, Tanner (2006).
◮ Mean square error calculation for Gaussian matrices: Donoho, Maleki, Montanari (2011); Bayati, Montanari (2011); Thrampoulidis, Oymak, Hassibi (2015).
◮ Mean square error calculation for partial orthogonal matrices: Tulino, Verdú, Caire (2013); Thrampoulidis, Hassibi (2015).

SLIDE 5

This talk: Spectral Estimator

[P. Netrapalli, P. Jain & S. Sanghavi (2015)]

◮ The spectral estimator $\hat{x}$: the leading eigenvector of the matrix

  $M = A^H T A$

◮ $T \triangleq \mathrm{Diag}(\mathcal{T}(y_1), \ldots, \mathcal{T}(y_m))$
◮ $\mathcal{T} : \mathbb{R}_{\geq 0} \to [0, 1]$ is a continuous trimming function.

SLIDE 6

This talk: Spectral Estimator

[P. Netrapalli, P. Jain & S. Sanghavi (2015)]

◮ The spectral estimator $\hat{x}$: the leading eigenvector of the matrix

  $M = A^H T A$

◮ $T \triangleq \mathrm{Diag}(\mathcal{T}(y_1), \ldots, \mathcal{T}(y_m))$
◮ $\mathcal{T} : \mathbb{R}_{\geq 0} \to [0, 1]$ is a continuous trimming function.

◮ Population behaviour: $\mathbb{E}[M] = \lambda_1 x_\star x_\star^H + \lambda_2 (I_n - x_\star x_\star^H)$
  ◮ $\lambda_1 = \mathbb{E}[T |Z|^2]$
  ◮ $\lambda_2 = \mathbb{E}[T]$
  ◮ $Z \sim \mathcal{CN}(0, 1)$
  ◮ $T = \mathcal{T}(|Z| / \sqrt{\delta})$
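
A minimal sketch of the estimator as defined above; the helper name `spectral_estimator` and the commented-out trimming function are illustrative:

```python
import numpy as np

def spectral_estimator(A, y, trim):
    """Leading eigenvector of M = A^H T A, T = Diag(trim(y_1), ..., trim(y_m))."""
    Tvals = trim(y)                          # trimming function applied entrywise
    M = A.conj().T @ (Tvals[:, None] * A)    # A^H T A without forming Diag explicitly
    eigvals, eigvecs = np.linalg.eigh(M)     # M is Hermitian; eigenvalues ascending
    return eigvecs[:, -1]                    # eigenvector of the largest eigenvalue

# One of the trimming functions plotted on the next slide (delta = m / n):
# trim = lambda y: delta * y**2 / (delta * y**2 + 0.1)
```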

SLIDE 7

CDP behaves like the oversampled Haar model

◮ Overlap: $\rho = \frac{1}{n} |x_\star^H \hat{x}|$.

[Figure: $\rho^2$ versus $\delta \in [1, 10]$ for three trimming functions, $\mathcal{T}(y) = \delta y^2 / (\delta y^2 + 0.1)$, $\mathcal{T}(y) = \delta y^2 / (\delta y^2 + \sqrt{\delta} - 1)$, and $\mathcal{T}(y) = \delta y^2 \, \mathbb{I}(\delta y^2 \leq 2) / 4$.]

The oversampled Haar model explains CDP.

SLIDE 8

Refined objective

◮ Compare the spectral estimator on:
  ◮ Gaussian: $A_{ij}$ i.i.d. $\sim \mathcal{CN}(0, 1/n)$.
  ◮ Oversampled Haar (sampled as in the sketch below):

    $H_m \sim \mathrm{Unif}(\mathbb{U}(m)), \quad A = H_m S_{m,n}, \quad y = |A x_\star|.$

  ◮ $S_{m,n}$: selects $n$ of the $m$ columns at random.

◮ We use the asymptotic framework:
  1. $m, n \to \infty$
  2. $m/n \to \delta$
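
A minimal sketch of sampling from the oversampled Haar model: a Haar unitary via QR of a complex Gaussian matrix (with the standard phase correction), then a random column selection (sizes are illustrative):

```python
import numpy as np

m, n = 256, 64
rng = np.random.default_rng(2)

# Haar-distributed unitary H_m ~ Unif(U(m)).
G = rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))
Q, R = np.linalg.qr(G)
H = Q * (np.diag(R) / np.abs(np.diag(R)))    # rephase columns -> Haar distribution

cols = rng.choice(m, size=n, replace=False)  # S_{m,n}: pick n of the m columns
A_haar = H[:, cols]                          # partial unitary: A^H A = I_n
```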

SLIDE 9

Sharp Asymptotics: Gaussian Sensing Matrices

[Y. Lu and G. Li (2016)]

◮ For $\delta > 1$: there exists a spectral estimator $\hat{x}$ with $|\hat{x}^H x_\star|^2 / n \to \rho^2 > 0$.

[Y. Lu and G. Li (2016); M. Mondelli and A. Montanari (2017)]

◮ Lu and Li also showed how $\rho$ can be calculated.

SLIDE 10

Main Result: Oversampled Haar Matrices

Theorem. We have

  $\dfrac{|x_\star^H \hat{x}|^2}{n} \xrightarrow{P} \begin{cases} 0, & \psi_1(\tau_\star) < \frac{\delta}{\delta - 1}, \\ \rho_{\mathcal{T}}^2(\delta), & \psi_1(\tau_\star) > \frac{\delta}{\delta - 1}, \end{cases}$

where, for $\tau \in [1, \infty)$,

  $\Lambda(\tau) = \tau - \dfrac{1 - 1/\delta}{\mathbb{E}\left[\frac{1}{\tau - T}\right]}, \qquad \psi_1(\tau) = \dfrac{\mathbb{E}\left[\frac{|Z|^2}{\tau - T}\right]}{\mathbb{E}\left[\frac{1}{\tau - T}\right]},$

and $\tau_\star = \arg\min_\tau \Lambda(\tau)$, $Z \sim \mathcal{CN}(0, 1)$, $T = \mathcal{T}(|Z| / \sqrt{\delta})$.
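
A Monte Carlo sketch for evaluating $\tau_\star$ and the weak-recovery condition of the theorem; the trimming function, sample size, and search interval are illustrative choices:

```python
import numpy as np
from scipy.optimize import minimize_scalar

delta = 4.0
rng = np.random.default_rng(0)

# Z ~ CN(0, 1) and T = trim(|Z| / sqrt(delta)), as in the theorem.
Z = (rng.normal(size=500_000) + 1j * rng.normal(size=500_000)) / np.sqrt(2)
trim = lambda y: delta * y**2 / (delta * y**2 + np.sqrt(delta) - 1)  # illustrative
T = trim(np.abs(Z) / np.sqrt(delta))

Lam = lambda tau: tau - (1 - 1 / delta) / np.mean(1.0 / (tau - T))
psi1 = lambda tau: np.mean(np.abs(Z)**2 / (tau - T)) / np.mean(1.0 / (tau - T))

# tau_star = argmin of Lambda over [1, infinity); here T takes values in [0, 1).
tau_star = minimize_scalar(Lam, bounds=(1.0 + 1e-9, 50.0), method="bounded").x

print("tau_star:", tau_star)
print("weak recovery:", psi1(tau_star) > delta / (delta - 1))
```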

SLIDE 11

Application: Optimal Trimming Functions

◮ Weak recovery threshold of $\mathcal{T}$:

  $\delta_{\mathcal{T}} = \inf\{\delta \geq 1 : \rho_{\mathcal{T}}^2(\delta) > 0\}$

SLIDE 12

Application: Optimal Trimming Functions

◮ Weak recovery threshold of $\mathcal{T}$:

  $\delta_{\mathcal{T}} = \inf\{\delta \geq 1 : \rho_{\mathcal{T}}^2(\delta) > 0\}$

◮ For the oversampled Haar measurement matrix, the optimal trimming function is

  $\mathcal{T}_\star(y) = 1 - \dfrac{1}{\delta y^2}, \qquad \delta_{\mathcal{T}_\star} = 2.$

SLIDE 13

Application: Optimal Trimming Functions

◮ Weak recovery threshold of $\mathcal{T}$:

  $\delta_{\mathcal{T}} = \inf\{\delta \geq 1 : \rho_{\mathcal{T}}^2(\delta) > 0\}$

◮ For the oversampled Haar measurement matrix, the optimal trimming function is

  $\mathcal{T}_\star(y) = 1 - \dfrac{1}{\delta y^2}, \qquad \delta_{\mathcal{T}_\star} = 2.$

◮ For Gaussian sensing: $\delta_{\mathcal{T}_\star} = \delta_{\mathrm{IT}} = 1$; see Luo, Alghamdi and Lu (2018).
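
A rough numerical check of the $\delta_{\mathcal{T}_\star} = 2$ claim, reusing the Monte Carlo recipe from the previous sketch with $\mathcal{T}_\star$ on either side of $\delta = 2$ (a sketch, not a substitute for the analysis):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
Z = (rng.normal(size=500_000) + 1j * rng.normal(size=500_000)) / np.sqrt(2)

# With y = |Z| / sqrt(delta), T_star(y) = 1 - 1/(delta y^2) simplifies to
# T = 1 - 1/|Z|^2, which happens not to depend on delta.
T = 1.0 - 1.0 / np.abs(Z)**2

for delta in (1.5, 2.5):                      # straddle the claimed threshold 2
    Lam = lambda tau, d=delta: tau - (1 - 1 / d) / np.mean(1.0 / (tau - T))
    tau_star = minimize_scalar(Lam, bounds=(1.0 + 1e-9, 50.0), method="bounded").x
    psi1 = np.mean(np.abs(Z)**2 / (tau_star - T)) / np.mean(1.0 / (tau_star - T))
    print(delta, psi1 > delta / (delta - 1))  # expect: False below 2, True above
```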

SLIDE 14

Remainder of the Talk: A sketch of the proof.


SLIDE 15

Main ingredient: free probability

◮ Classical probability theory:
  ◮ Consider two independent random variables $X \sim f_X(x)$, $Y \sim f_Y(y)$:
  ◮ $f_{X+Y}(t) = (f_X * f_Y)(t) = \int f_X(z) f_Y(t - z) \, dz$.
  ◮ $f_{XY}(t) = \int f_X(x) f_Y(t/x) \frac{1}{|x|} \, dx$.

SLIDE 16

Main ingredient: free probability

◮ Classical probability theory:
  ◮ Consider two independent random variables $X \sim f_X(x)$, $Y \sim f_Y(y)$:
  ◮ $f_{X+Y}(t) = (f_X * f_Y)(t) = \int f_X(z) f_Y(t - z) \, dz$.
  ◮ $f_{XY}(t) = \int f_X(x) f_Y(t/x) \frac{1}{|x|} \, dx$.

◮ Free probability theory (for random matrices):
  ◮ Let $X$ and $Y$ be "freely independent".
  ◮ Let $\mu_X$ denote the empirical spectral distribution of $X$.
  ◮ $\mu_{X+Y} = \mu_X \boxplus \mu_Y$ (free "additive" convolution)
  ◮ $\mu_{XY} = \mu_X \boxtimes \mu_Y$ (free "multiplicative" convolution)
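
A numerical illustration of free additive convolution (a sketch; sizes and spectra are illustrative): conjugating one Hermitian matrix by a Haar unitary makes the pair asymptotically freely independent, so the spectrum of the sum follows $\boxplus$ rather than the classical convolution:

```python
import numpy as np

n = 2000
rng = np.random.default_rng(2)

# Two Hermitian matrices, each with spectral distribution (delta_{-1} + delta_{+1}) / 2.
X = np.diag(rng.choice([-1.0, 1.0], size=n))
Y = np.diag(rng.choice([-1.0, 1.0], size=n))

# Haar unitary via QR of a complex Gaussian matrix (with phase correction).
G = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
Q, R = np.linalg.qr(G)
U = Q * (np.diag(R) / np.abs(np.diag(R)))

eigs = np.linalg.eigvalsh(X + U @ Y @ U.conj().T)
# A histogram of `eigs` follows mu_X ⊞ mu_Y -- here the arcsine law on [-2, 2] --
# whereas the classical convolution would put point masses at -2, 0, and 2.
```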

SLIDE 17

Step 1: Reduction to Rank-1 Additive Deformation

◮ Rotational invariance $\Rightarrow$ assume $x_\star = \sqrt{n} \, e_1$.

◮ Partition $M$ (with $A_1$ the first column of $A$ and $A_{-1}$ the remaining columns):

  $M = \begin{bmatrix} A_1^H T A_1 & A_1^H T A_{-1} \\ A_{-1}^H T A_1 & A_{-1}^H T A_{-1} \end{bmatrix}$
SLIDE 18

Step 1: Reduction to Rank-1 Additive Deformation

◮ Rotational invariance $\Rightarrow$ assume $x_\star = \sqrt{n} \, e_1$.

◮ Partition $M$ (with $A_1$ the first column of $A$ and $A_{-1}$ the remaining columns):

  $M = \begin{bmatrix} A_1^H T A_1 & A_1^H T A_{-1} \\ A_{-1}^H T A_1 & A_{-1}^H T A_{-1} \end{bmatrix}$

Proposition (Lu & Li, 2017). Let $a \in \mathbb{R}$, $\mu \in \mathbb{R}_{\geq 0}$, and

  $M(a) = \begin{bmatrix} a & q^H \\ q & P \end{bmatrix}, \qquad \widetilde{M}(\mu) = P + \mu q q^H.$

Then there exists $\mu_{\mathrm{eff}}(a)$ such that:
(a) $\lambda_1(M(a)) = \lambda_1(\widetilde{M}(\mu_{\mathrm{eff}}(a)))$.
(b) $|e_1^H v_1|^2 = \frac{d}{da} \lambda_1(M(a))$, where $v_1$ is the leading eigenvector of $M(a)$.

SLIDE 19

Step 1: Reduction to Rank-1 Additive Deformation

◮ Rotational invariance $\Rightarrow$ assume $x_\star = \sqrt{n} \, e_1$.

◮ Partition $M$ (with $A_1$ the first column of $A$ and $A_{-1}$ the remaining columns):

  $M = \begin{bmatrix} A_1^H T A_1 & A_1^H T A_{-1} \\ A_{-1}^H T A_1 & A_{-1}^H T A_{-1} \end{bmatrix}$

Proposition (Lu & Li, 2017). Let $a \in \mathbb{R}$, $\mu \in \mathbb{R}_{\geq 0}$, and

  $M(a) = \begin{bmatrix} a & q^H \\ q & P \end{bmatrix}, \qquad \widetilde{M}(\mu) = P + \mu q q^H.$

Then there exists $\mu_{\mathrm{eff}}(a)$ such that:
(a) $\lambda_1(M(a)) = \lambda_1(\widetilde{M}(\mu_{\mathrm{eff}}(a)))$.
(b) $|e_1^H v_1|^2 = \frac{d}{da} \lambda_1(M(a))$, where $v_1$ is the leading eigenvector of $M(a)$ (a numerical check of this identity follows below).

New Goal: Analyze

  $L(\mu) = \lambda_1\left( A_{-1}^H \left( T + \mu \, T A_1 (T A_1)^H \right) A_{-1} \right)$
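
A small numerical sanity check of part (b) of the proposition (the random $P$, $q$, and the finite-difference step are illustrative):

```python
import numpy as np

n = 50
rng = np.random.default_rng(3)

G = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
P = (G + G.conj().T) / 2                       # Hermitian lower-right block
q = rng.normal(size=n) + 1j * rng.normal(size=n)

def M(a):
    """Assemble M(a) = [[a, q^H], [q, P]]."""
    out = np.zeros((n + 1, n + 1), dtype=complex)
    out[0, 0] = a
    out[0, 1:] = q.conj()
    out[1:, 0] = q
    out[1:, 1:] = P
    return out

lam1 = lambda a: np.linalg.eigvalsh(M(a))[-1]  # top eigenvalue of M(a)

a, eps = 3.0, 1e-6
deriv = (lam1(a + eps) - lam1(a - eps)) / (2 * eps)
v1 = np.linalg.eigh(M(a))[1][:, -1]            # leading eigenvector of M(a)
print(deriv, np.abs(v1[0])**2)                 # the two values agree numerically
```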

SLIDE 20

Why free probability?

◮ Analyze $L(\mu) = \lambda_1(A_{-1}^H (T + \mu \, T A_1 (T A_1)^H) A_{-1})$.

SLIDE 21

Why free probability?

◮ Analyze $L(\mu) = \lambda_1(A_{-1}^H (T + \mu \, T A_1 (T A_1)^H) A_{-1})$.
◮ $A_{-1}$ and $A_1$ are dependent, so classical independence-based arguments do not apply directly.

SLIDE 22

Conclusion

◮ Compared the oversampled Haar sensing matrix with Gaussian:
  ◮ Oversampled Haar sensing with optimal trimming: $\delta = 2$.
  ◮ Gaussian sensing with optimal trimming: $\delta = 1$.

SLIDE 23

Conclusion

◮ Compared the oversampled Haar sensing matrix with Gaussian:
  ◮ Oversampled Haar sensing with optimal trimming: $\delta = 2$.
  ◮ Gaussian sensing with optimal trimming: $\delta = 1$.

◮ Oversampled Haar approximates the CDP sensing matrices.

[Figure: $\rho^2$ versus $\delta \in [1, 10]$ for the trimming functions $\mathcal{T}(y) = \delta y^2 / (\delta y^2 + 0.1)$, $\mathcal{T}(y) = \delta y^2 / (\delta y^2 + \sqrt{\delta} - 1)$, and $\mathcal{T}(y) = \delta y^2 \, \mathbb{I}(\delta y^2 \leq 2) / 4$; here $\rho = \frac{1}{n} |x_\star^H \hat{x}|$. The oversampled Haar model explains CDP.]