

SLIDE 1

Sparse Regression Codes

Andrew Barron (Yale University) and Ramji Venkataramanan (University of Cambridge)
Joint work with Antony Joseph, Sanghee Cho, Cynthia Rush, Adam Greig, Tuhin Sarkar, Sekhar Tatikonda
ISIT 2016

SLIDE 2

Part II of the tutorial:

  • Approximate message passing (AMP) decoding
  • Power-allocation schemes to improve finite block-length performance

(Joint work with Cynthia Rush and Adam Greig)

SLIDE 3

SPARC Decoding

[Diagram: the design matrix A has n rows and ML columns, divided into L sections of M columns each. The message vector β has exactly one non-zero entry per section; the non-zero entry in section ℓ equals √(nP_ℓ):]

β = (0, …, √(nP₁), …, 0 | 0, …, √(nP₂), …, 0 | … | 0, …, √(nP_L), …, 0)^T

Channel output: y = Aβ + ε. We want an efficient algorithm to decode β from y. (An encoding sketch follows below.)
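To make the setup concrete, here is a minimal NumPy sketch of SPARC encoding over an AWGN channel. All numerical choices here (L, M, n, the snr, the exponentially decaying allocation) are illustrative assumptions of this sketch, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

L, M, n = 32, 64, 256            # sections, columns per section, code length
N = L * M
sigma2 = 1.0                     # noise variance
snr = 15.0
P = snr * sigma2                 # total codeword power
C = 0.5 * np.log(1 + snr)        # AWGN capacity in nats

# Exponentially decaying allocation P_l proportional to e^{-2Cl/L}, summing to P
Pl = np.exp(-2 * C * np.arange(1, L + 1) / L)
Pl *= P / Pl.sum()

# i.i.d. N(0, 1/n) design matrix
A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, N))

# Message beta: one non-zero entry per section, value sqrt(n * P_l) in section l
sent = rng.integers(0, M, size=L)        # sent column index within each section
beta = np.zeros(N)
beta[np.arange(L) * M + sent] = np.sqrt(n * Pl)

# Channel output
y = A @ beta + rng.normal(0.0, np.sqrt(sigma2), size=n)
```

Later sketches reuse these variables (A, y, beta, Pl, sent, and the scalars) to illustrate the decoder.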

SLIDE 4

AMP for Compressed Sensing

  • Approximation of loopy belief propagation for dense graphs [Donoho-Montanari-Maleki ’09, Rangan ’11, Krzakala et al. ’11, …]
  • Compressed sensing (CS): want to recover β from y = Aβ + ε, where A is an n × N measurement matrix and β is i.i.d. with a known prior

[Factor graph: variable nodes β₁, β₂, …, β_N connected to the n factor nodes]

In CS, we often solve the LASSO: β̂ = arg min_β ∥y − Aβ∥₂² + λ∥β∥₁

SLIDES 5–7

Min-Sum Message Passing for LASSO

Want to compute β̂ = arg min_β ∑_{i=1}^n (y_i − (Aβ)_i)² + λ ∑_{j=1}^N |β_j|

On the factor graph, for j = 1, …, N and i = 1, …, n, iterate:

M^t_{j→i}(β_j) = λ|β_j| + ∑_{i′∈[n]∖i} M̂^{t−1}_{i′→j}(β_j)

M̂^t_{i→j}(β_j) = min_{β∖β_j} [ (y_i − (Aβ)_i)² + ∑_{j′∈[N]∖j} M^t_{j′→i}(β_{j′}) ]

SLIDE 8

But computing these messages is infeasible:
  — Each message must be computed for every β_j ∈ R
  — There are nN such messages

Further, the factor graph is nothing like a tree!

SLIDES 9–10

Quadratic Approximation of Messages

Approximating each message by a quadratic reduces it to two numbers:

r^t_{i→j} = y_i − ∑_{j′∈[N]∖j} A_{ij′} β^t_{j′→i}

β^{t+1}_{j→i} = η_t( ∑_{i′∈[n]∖i} A_{i′j} r^t_{i′→j} )

  • For LASSO, η_t is the soft-thresholding operator
  • We still have nN messages in each step …

SLIDE 11

Rewrite each message as a full sum plus a correction term:

r^t_{i→j} = y_i − ∑_{j′∈[N]} A_{ij′} β^t_{j′→i} + A_{ij} β^t_{j→i}

β^{t+1}_{j→i} = η_t( ∑_{i′∈[n]} A_{i′j} r^t_{i′→j} − A_{ij} r^t_{i→j} )

Using Taylor approximations, the 2nN messages collapse to n + N quantities, giving the AMP algorithm …

SLIDES 12–13

The AMP Algorithm for LASSO

[Donoho-Montanari-Maleki ’09, Rangan ’11, Krzakala et al. ’11, …]

r^t = y − Aβ^t + r^{t−1} ∥β^t∥₀ / n

β^{t+1} = η_t(A^T r^t + β^t)

  • AMP iteratively produces estimates β⁰ = 0, β¹, …, β^t, …
  • r^t is the ‘modified residual’ after step t
  • η_t denoises the effective observation to produce β^{t+1}

The momentum term in r^t ensures that asymptotically

A^T r^t + β^t ≈ β + τ_t Z_t, where Z_t is N(0, I)

⇒ The effective observation A^T r^t + β^t is the true signal observed in independent Gaussian noise. (A sketch of this iteration follows below.)
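A minimal sketch of this AMP iteration for LASSO, assuming a fixed soft-threshold level theta; in practice the threshold would be tuned (e.g., via state evolution), which these slides do not cover. The function names are conventions of this sketch.

```python
import numpy as np

def soft_threshold(x, theta):
    """Componentwise soft thresholding: the LASSO denoiser eta_t."""
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def amp_lasso(y, A, theta, num_iters=30):
    n, N = A.shape
    beta = np.zeros(N)               # beta^0 = 0
    r = np.copy(y)                   # r^0 = y when beta^0 = 0
    for t in range(num_iters):
        # Denoise the effective observation A^T r^t + beta^t
        beta_new = soft_threshold(A.T @ r + beta, theta)
        # Modified residual with the momentum term r^t * ||beta^{t+1}||_0 / n
        r = y - A @ beta_new + r * np.count_nonzero(beta_new) / n
        beta = beta_new
    return beta
```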

SLIDE 14

AMP for SPARC Decoding

[A/β diagram as on Slide 3]

y = Aβ + ε, with ε i.i.d. ∼ N(0, σ²). SPARC decoding is a different optimization problem from LASSO:

  • Want arg min_β ∥y − Aβ∥² s.t. β is a valid SPARC message
  • β has one non-zero entry per section, and the section size M → ∞
  • The undersampling ratio n/(ML) → 0

Let us revisit the (approximated) min-sum updates …

SLIDES 15–16

Approximated Min-Sum

r^t_{i→j} = y_i − ∑_{j′∈[N]∖j} A_{ij′} β^t_{j′→i}

β^{t+1}_{j→i} = η_{t,j}(stat_{t,j}),  where  stat_{t,j} = ∑_{i′∈[n]∖i} A_{i′j} r^t_{i′→j}

If, for j ∈ [N], stat_{t,j} is approximately distributed as β_j + τ_t Z_{t,j}, then the Bayes-optimal choice of η_{t,j} is …

SLIDES 17–18

Bayes-Optimal η_t

η_t(stat_t = s) = E[β | β + τ_t Z = s]:

η_{t,j}(s) = √(nP_ℓ) · exp( s_j √(nP_ℓ)/τ_t² ) / ∑_{j′∈sec_ℓ} exp( s_{j′} √(nP_ℓ)/τ_t² ),   j ∈ section ℓ

Note that β^{t+1} is
  • the MMSE estimate of β given the observation β + τ_t Z_t
  • proportional to the posterior probability of entry j of β being non-zero

As before, rewrite the messages as full sums plus correction terms and apply Taylor approximations … (A sketch of the resulting denoiser follows below.)
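A sketch of this denoiser in NumPy: within each section it returns √(nP_ℓ) times a softmax of s_j √(nP_ℓ)/τ². The function name eta and the (L, M) reshaping are conventions of this sketch.

```python
import numpy as np

def eta(s, Pl, n, tau2):
    """Bayes-optimal sectionwise denoiser.

    s:   effective observation, reshaped to (L, M)
    Pl:  per-section powers, shape (L,)
    """
    c = np.sqrt(n * Pl)[:, None]          # non-zero value per section
    u = s * c / tau2
    u -= u.max(axis=1, keepdims=True)     # stabilize the softmax
    w = np.exp(u)
    w /= w.sum(axis=1, keepdims=True)     # posterior prob. of each entry being the sent one
    return c * w                          # MMSE estimate, section by section
```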

SLIDES 19–20

AMP Decoder

Set β⁰ = 0. For t ≥ 0:

r^t = y − Aβ^t + (r^{t−1}/τ²_{t−1}) ( P − ∥β^t∥²/n )

β^{t+1}_j = η_{t,j}(A^T r^t + β^t),  for j = 1, …, ML

with η_{t,j}(s) = √(nP_ℓ) exp( s_j √(nP_ℓ)/τ_t² ) / ∑_{j′∈sec_ℓ} exp( s_{j′} √(nP_ℓ)/τ_t² ) for j ∈ section ℓ.

β^{t+1} is the MMSE estimate of β given that β^t + A^T r^t ≈ β + τ_t Z_t. (A sketch of the full decoder follows below.)
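Putting the pieces together, a sketch of the AMP decoder, reusing the hypothetical eta() above and the encoder variables (A, y, Pl, sent, and the scalars) from the earlier sketch. Updating τ_t² online from ∥β^t∥² and stopping after a fixed number of iterations are simplifications of this sketch.

```python
import numpy as np

def amp_decode(y, A, Pl, P, sigma2, L, M, num_iters=25):
    n = len(y)
    beta = np.zeros(L * M)                    # beta^0 = 0
    r = np.copy(y)                            # r^0 = y when beta^0 = 0
    tau2 = sigma2 + P                         # tau_0^2 = sigma^2 + P
    for t in range(num_iters):
        # Effective observation beta^t + A^T r^t, approximately beta + tau_t Z
        s = (beta + A.T @ r).reshape(L, M)
        beta = eta(s, Pl, n, tau2).ravel()
        # Modified residual with momentum term (r^t / tau_t^2)(P - ||beta||^2 / n)
        r = y - A @ beta + (r / tau2) * (P - np.linalg.norm(beta) ** 2 / n)
        # Online estimate of tau_{t+1}^2 = sigma^2 + P(1 - x_{t+1})
        tau2 = sigma2 + P - np.linalg.norm(beta) ** 2 / n
    # Hard decision: pick the largest entry in each section
    return beta.reshape(L, M).argmax(axis=1)

decoded = amp_decode(y, A, Pl, P, sigma2, L, M)
print("section error rate:", np.mean(decoded != sent))
```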

SLIDE 21

The Statistic β^t + A^T r^t

Suppose r^t = y − Aβ^t (the plain residual). Then

β^t + A^T r^t = β + A^T ε + (I − A^T A)(β^t − β)

where the entries of A^T ε are ≈ N(0, σ²) and the entries of (I − A^T A) are ≈ N(0, 1/n).

SLIDE 22

The Statistic β^t + A^T r^t

With the AMP residual r^t = y − Aβ^t + (r^{t−1}/τ²_{t−1})(P − ∥β^t∥²/n):

β^t + A^T r^t = β + A^T ε + (I − A^T A)(β^t − β) + A^T r^{t−1} (P − ∥β^t∥²/n) / τ²_{t−1}

  • The momentum term asymptotically cancels out the dependence between (I − A^T A) and (β − β^t), so that β^t + A^T r^t ≈ β + τ_t Z_t, where τ_t² = σ² + (1/n) E∥β − β^t∥²
  • Recall that the plain residual does not give a statistic with the desired representation

SLIDE 23

Iteratively Compute the Variances τ_t²

τ₀² = σ² + P
τ_t² = σ² + P(1 − x_t(τ_{t−1})),  t ≥ 1

where

x_t(τ_{t−1}) = ∑_{ℓ=1}^L (P_ℓ/P) E[ exp( (√(nP_ℓ)/τ_{t−1}) (U^ℓ_1 + √(nP_ℓ)/τ_{t−1}) ) / { exp( (√(nP_ℓ)/τ_{t−1}) (U^ℓ_1 + √(nP_ℓ)/τ_{t−1}) ) + ∑_{j=2}^M exp( (√(nP_ℓ)/τ_{t−1}) U^ℓ_j ) } ]

and the {U^ℓ_j} are i.i.d. ∼ N(0, 1).

With stat_t = β + τ_t Z:

(1/n) E∥β − β^t∥² = P(1 − x_t)  and  (1/n) E[β^T β^t] = (1/n) E∥β^t∥² = P x_t

  • x_t: expected power-weighted fraction of correctly decoded sections after step t (a Monte Carlo sketch follows below)
  • P(1 − x_t): interference due to undecoded sections
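A Monte Carlo sketch of this recursion, assuming the variables (Pl, P, n, M, sigma2) from the earlier sketches; the ratio inside the expectation is rewritten in a numerically stable form before sampling.

```python
import numpy as np

def se_x(tau, Pl, P, n, M, num_samples=1000, rng=None):
    """Monte Carlo estimate of x(tau) = sum_l (P_l / P) E[ratio_l]."""
    rng = rng or np.random.default_rng(1)
    x = 0.0
    for l in range(len(Pl)):
        a = np.sqrt(n * Pl[l]) / tau
        U = rng.standard_normal((num_samples, M))
        # Stable form of exp(a(U1+a)) / (exp(a(U1+a)) + sum_{j>=2} exp(a U_j))
        ratio = 1.0 / (1.0 + np.exp(a * (U[:, 1:] - U[:, :1] - a)).sum(axis=1))
        x += (Pl[l] / P) * ratio.mean()
    return x

# Iterate tau_t^2 = sigma^2 + P(1 - x(tau_{t-1})), starting from tau_0^2 = sigma^2 + P
tau2 = sigma2 + P
for t in range(10):
    tau2 = sigma2 + P * (1 - se_x(np.sqrt(tau2), Pl, P, n, M))
    print(f"t={t + 1}: tau_t^2 = {tau2:.4f}")
```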

SLIDE 24

x_t vs. t

[Plot: x_t and x*_t versus iteration t, for a SPARC with M = 512, L = 1024, snr = 15, R = 0.7C, P_ℓ ∝ 2^{−2Cℓ/L}]

x_t = (1/(nP)) E[β^T β^t],  x*_t = (1/(nP)) β^T β^t

“Power-weighted fraction of correctly decoded sections in β^t”

SLIDE 25

State Evolution

stat_t = A^T r^t + β^t ≈ β + τ_t Z_t, with τ_t² = σ² + P(1 − x_t), where x_t = x(τ_{t−1}) is given by the expectation on Slide 23.

KEY property:

  • Starting from 0, x_t increases with t for a finite number of steps T_n, with x_{T_n} ≈ 1
  • Starting from τ₀² = σ² + P, the τ_t² decrease with t until they reach ≈ σ²
  • AMP has effectively converted the matrix A into an identity!

SLIDE 26

State Evolution Asymptotics

Lemma [Rush, Greig, Venkataramanan ’15]:

x̄(τ) := lim_{L→∞} x(τ) = lim_{L→∞} ∑_{ℓ=1}^L (P_ℓ/P) 1{c_ℓ > 2Rτ²},

where c_ℓ = LP_ℓ and R is in nats. In the large system limit:

  • In step t + 1, all sections with power c_ℓ > 2Rτ_t² will be decodable, i.e., the sent terms have weights very close to 1
  • The other sections will not be decodable
  • Can use this to understand the decoding progression for any power allocation (a sketch follows below)
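The lemma makes the large-L limit a one-line computation; a sketch follows, with the function name x_bar being a convention of this sketch.

```python
import numpy as np

def x_bar(tau2, Pl, P, R):
    """Large-L limit: power-weighted fraction of sections with c_l > 2*R*tau^2 (R in nats)."""
    Pl = np.asarray(Pl)
    c = len(Pl) * Pl              # c_l = L * P_l
    return Pl[c > 2 * R * tau2].sum() / P
```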

SLIDES 27–28

Asymptotics for P_ℓ ∝ e^{−2Cℓ/L}

x̄_t := lim x_t = [ (1 + snr) − (1 + snr)^{1−ξ_{t−1}} ] / snr

τ̄_t² := lim τ_t² = σ² + P(1 − x̄_t) = σ² (1 + snr)^{1−ξ_{t−1}}

where ξ₋₁ = 0 and, for t ≥ 0, ξ_t = min{ (1/(2C)) log(C/R) + ξ_{t−1}, 1 }.

For R < C, x̄_t ↗ 1 and τ̄_t² ↘ σ² in exactly T* = ⌈2C / log(C/R)⌉ steps.

Run the AMP decoder for T* steps to get β¹, …, β^{T*} → β̂
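A sketch of this closed-form progression; the values snr = 15, σ² = 1, and R = 0.7C are illustrative assumptions of the example, not from the slide.

```python
import numpy as np

snr, sigma2 = 15.0, 1.0
P = snr * sigma2
C = 0.5 * np.log(1 + snr)       # capacity in nats
R = 0.7 * C                     # an example rate below capacity

T_star = int(np.ceil(2 * C / np.log(C / R)))
xi = 0.0                        # xi_{-1} = 0
for t in range(1, T_star + 1):
    xi = min(np.log(C / R) / (2 * C) + xi, 1.0)        # xi_{t-1}
    x_t = ((1 + snr) - (1 + snr) ** (1 - xi)) / snr    # fraction decoded after step t
    tau2_t = sigma2 * (1 + snr) ** (1 - xi)
    print(f"t={t}: x_t={x_t:.4f}, tau_t^2={tau2_t:.4f}")
```

In the final step xi reaches 1, so x_t = 1 and tau_t^2 = sigma^2, matching the limits on the slide.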

SLIDE 29

Performance of the AMP Decoder

The section error rate of a decoder for a SPARC S is

E_sec(S) := (1/L) ∑_{ℓ=1}^L 1{β̂_ℓ ≠ β_ℓ}

Theorem [Rush, Greig, Venkataramanan ’15]: Fix a rate R < C and b > 0. Consider a sequence of rate-R SPARCs {S_n} indexed by block length n, with design-matrix parameters L and M = L^b, and power allocation P_ℓ ∝ e^{−2Cℓ/L}. Then the section error rate of the AMP decoder converges to zero almost surely, i.e., for any ϵ > 0,

lim_{n₀→∞} P( E_sec(S_n) < ϵ, ∀ n ≥ n₀ ) = 1

SLIDE 30

Error Exponent

Theorem [Rush, Venkataramanan ’16]: For sufficiently large n and L, the section error rate of the AMP decoder satisfies

P( E_sec(S_n) > ϵ ) ≤ P( ∥β^{T*} − β∥²/n > ϵ σ² ln(1 + snr)/4 ) ≤ K exp( −κ L ϵ² / (log M)^{2T*} )

SLIDES 31–32

Proof Idea

Step 1: Characterize the conditional distributions of the statistic and the residual as i.i.d. Gaussian plus a deviation term. Show

(A^T r^t + β^t − β) | past, β, ε  =_d  τ_t Z_t + Δ_t

(r^t − ε) | past, β, ε  =_d  √(τ_t² − σ²) Z′_t + Δ′_t

where the deviation term is explicit but unwieldy:

Δ_t = ∑_{r=0}^{t−1} (α^t_r − α̂^t_r) h^{r+1} + [ ( ∥m^t_⊥∥/√n − τ^⊥_t ) I − ( ∥m^t_⊥∥/√n ) P^∥_{Q_{t+1}} ] Z_t + Q_{t+1} ( Q^T_{t+1} Q_{t+1}/n )^{−1} ( B^T_{t+1} m^t_⊥/n − Q^T_{t+1} q^t_⊥/n ),    Δ′_t = …

SLIDE 33

Proof Idea

Steps:
  1. Characterize the conditional distributions of the statistic and residual in terms of an i.i.d. Gaussian plus a deviation term.
  2. Inductively obtain concentration results to show that the deviation terms are small with high probability.

  • The result also shows that ∥r^t∥²/n concentrates around τ_t²
  • So the decoder could use these empirical estimates instead of precomputing the τ_t²

“Finite-Sample Analysis of AMP”: talk by Cynthia Rush on Monday at 17:30

SLIDE 34

Empirical Performance

[Plot: section error rate vs. rate for a SPARC with snr = 15]

Power allocation plays a key role at finite block lengths!

SLIDES 35–36

Power Allocation

[Picture: exponential decay up to section fL, flat thereafter]

P_ℓ = κ · e^{−2aCℓ/L}  for 0 < ℓ/L ≤ f
P_ℓ = κ · e^{−2aCf}   for f < ℓ/L ≤ 1

Parameters a, f ∈ [0, 1]:

  • Smaller a makes the exponential decay gentler, allocating less power to the initial sections
  • f controls where the allocation flattens
  • Given R and snr, we can optimize over all (a, f) that give asymptotic x̄_T = 1 for some finite T (a sketch of this allocation follows below)

Can we have a non-parametric algorithm to find the optimal power allocation for a given R and snr?
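A sketch of the (a, f)-parametrized allocation, normalized so the section powers sum to P; the function name power_alloc is a convention of this sketch.

```python
import numpy as np

def power_alloc(L, C, P, a, f):
    """Exponential decay e^{-2aC*l/L} up to section f*L, flat thereafter; sums to P."""
    ell = np.arange(1, L + 1) / L
    Pl = np.where(ell <= f, np.exp(-2 * a * C * ell), np.exp(-2 * a * C * f))
    return Pl * (P / Pl.sum())
```

Setting a = 1 and f = 1 recovers the pure exponential allocation from the earlier slides.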

SLIDES 37–41

Algorithmic Power Allocation

x̄(τ_t²) = lim_{L→∞} ∑_{ℓ=1}^L (P_ℓ/P) 1{c_ℓ > 2Rτ_t²}, where c_ℓ = LP_ℓ

  • Fix a target number of decoding steps T*
  • Asymptotically, we want to decode L/T* sections in each step

With τ₀² = σ² + P, set c_ℓ = 2Rτ₀² + δ for ℓ ≤ L/T*. Then, for t = 1, 2, 3, …, compute

τ_t² = σ² + P(1 − x̄(τ²_{t−1}))

and compare 2Rτ_t² + δ with allocating the remaining power equally over the remaining sections; choose the greater for the next L/T* sections. (A sketch of this procedure follows below.)
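A sketch of this greedy procedure, assuming R is in nats; the final renormalization is a safeguard of the sketch, not part of the slides' description.

```python
import numpy as np

def algorithmic_alloc(L, P, sigma2, R, T_star, delta):
    """Greedy power allocation: decode ~L/T_star sections per AMP step."""
    Pl = np.zeros(L)
    chunk = L // T_star
    tau2 = sigma2 + P                        # tau_0^2
    allocated = 0.0
    for t in range(T_star):
        lo = t * chunk
        hi = L if t == T_star - 1 else lo + chunk
        # Equal split of the remaining power over all remaining sections
        flat = (P - allocated) / (L - lo)
        # Per-section power just above the decoding threshold: c_l = 2*R*tau^2 + delta
        Pl[lo:hi] = max((2 * R * tau2 + delta) / L, flat)
        allocated = Pl.sum()
        # Sections allocated so far clear the threshold, so x_bar = allocated / P
        tau2 = sigma2 + P - allocated        # tau_{t+1}^2 = sigma^2 + P(1 - x_bar)
    return Pl * (P / Pl.sum())               # renormalize any leftover mismatch
```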

SLIDE 42

[Plot: section error rate vs. R for a SPARC with snr = 15, comparing power allocations]

  • The algorithmic power allocation performs close to (or even beats!) the numerically optimized one over a wide range of snr and R values
  • The optimal value of the tolerance δ varies (slightly) with snr

SLIDE 43

Spatially-Coupled Design Matrices

[Barbier, Krzakala ’15]: another way to improve performance at finite block lengths

[Figure: band-diagonal design matrix built from blocks, with coupling width w and an oversampled first block column]

  • Band-diagonal structure for A; each block consists of random rows of a Hadamard matrix
  • The first few sections of β are oversampled to kick-start the decoding progression
  • Good empirical results at finite block lengths
  • A (non-rigorous) replica analysis indicates that this construction is asymptotically capacity-achieving

SLIDES 44–45

Complexity of the AMP Decoder

Complexity is determined by the matrix-vector multiplications Aβ^t and A^T r^t. For Gaussian A:

  • Complexity and memory are both O(nN)
  • Hard to implement for very large M, L, e.g., M = L = 1024

For a practical implementation, use a Hadamard design:

  • For N = 2^m, the Hadamard matrix H_m ∈ R^{N×N} is defined by

    H_m = [ H_{m−1}   H_{m−1} ]
          [ H_{m−1}  −H_{m−1} ],    H₀ = 1

  • Design matrix A: pick n random rows of H_m
  • Multiplications via the fast Walsh-Hadamard transform ⇒ complexity O(N log N) ∼ n^{1+ϵ} (see the sketch below)
  • No need to store A in memory!
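A sketch of applying a Hadamard-based design without storing A, using an in-place fast Walsh-Hadamard transform. The row selection and the omitted column normalization (e.g., scaling by 1/√n) are simplifications of this sketch; the dimensions match the earlier encoding example (N = 2048 = L·M).

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform H_m @ x in O(N log N); len(x) must be 2^m."""
    x = np.array(x, dtype=float)
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return x

m, n = 11, 256
N = 2 ** m
rng = np.random.default_rng(2)
rows = rng.choice(N, size=n, replace=False)   # the n random rows of H_m defining A

def A_mul(beta):
    """A @ beta via one FWHT: evaluate H_m beta, keep the selected rows."""
    return fwht(beta)[rows]

def AT_mul(r):
    """A.T @ r via one FWHT, using the symmetry of H_m."""
    z = np.zeros(N)
    z[rows] = r
    return fwht(z)
```

Plugging A_mul and AT_mul into the AMP decoder sketch in place of A @ beta and A.T @ r removes the need to store A.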

SLIDES 46–47

Comparison of SPARC Decoders

In both decoders, β^{t+1}_j = η_{t,j}(stat_t), for j = 1, …, ML.

AMP:

r^t = y − Aβ^t + (r^{t−1}/τ²_{t−1})(P − ∥β^t∥²/n),   stat_t = β^t + A^T r^t

Adaptive soft-decision decoding:

G_t = part of Aβ^t orthogonal to Aβ^{t−1}, …, y
Z_t = √n A^T G_t / ∥G_t∥
Z^comb_t = λ₀ Z₀ + λ₁ Z₁ + … + λ_t Z_t,  with ∑ λ_k² = 1
stat_t = τ_t Z^comb_t + β^t

  • For both decoders, P( |(1/n) β^T β^t − x_t P| > δ ) ≤ K_t exp( −κ_t n δ² / (log M)^{2t} )
  • The constants and analysis techniques are different

SLIDE 48

Summary + Future Directions

  • AMP gives low-complexity, parameter-free SPARC decoding
  • For any rate R < C, the probability of the section error rate exceeding ϵ decays exponentially in Lϵ²/(log M)^{2T*}
  • With Hadamard-based matrices, the decoder can be implemented at large block lengths, making SPARCs an attractive alternative to coded modulation

Open questions:

  1. Theoretical guarantees for Hadamard-based SPARCs
  2. Can we combine power allocation and spatial coupling to get good empirical performance close to C at reasonable block lengths?
  3. The BIG question: can we design feasible decoders whose gap to capacity is O(1/n^a) for some a ∈ (0, 1/2)?

SLIDE 49

References

– C. Rush, A. Greig, and R. Venkataramanan, “Capacity-achieving Sparse Superposition Codes with Approximate Message Passing Decoding,” https://arxiv.org/abs/1501.05892, 2015 (short version at ISIT ’15)
– J. Barbier and F. Krzakala, “Approximate Message-Passing Decoder and Capacity-Achieving Sparse Superposition Codes,” https://arxiv.org/abs/1503.08040, 2015
– C. Rush and R. Venkataramanan, “Finite-Sample Analysis of Approximate Message Passing,” https://arxiv.org/abs/1606.01800, 2016 (ISIT ’16)