From Supervised to Unsupervised Computational Sensing
Ali Mousavi (PowerPoint Presentation)


SLIDE 1

From Supervised to Unsupervised Computational Sensing

Ali Mousavi

Aug 12th 2019

Brain Vision Summit
SLIDE 2

Collaborators

Rich Baraniuk, Rice University
Reinhard Heckel, Rice University
Arian Maleki, Columbia University
Chris Metzler, Stanford University
Gautam Dasarathy, Arizona State University
SLIDE 3

Computational Sensing

[Diagram: conventional sensing acquires the subject directly with expensive hardware (Ψ); computational sensing acquires measurements with simpler hardware (Φ) and recovers the subject with computation/software]

  • Conventional Sensing
  • Computational Sensing: reduce costs in acquisition systems by replacing expensive hardware with cheap hardware + computation
SLIDE 4

Large Scale Datasets

SLIDE 5

Data-Driven Computational Sensing

[Diagram: Subject → Simpler Hardware → Measurements → Computational Software → Recovered Subject]
SLIDE 6

Model

[Diagram: Subject → Simpler Hardware (Φ) → Measurements → Computation (Software)]

$$x \in \mathbb{R}^N \xrightarrow{\;\Phi(\cdot)\;} y = \Phi(x) \in \mathbb{R}^M \xrightarrow{\;\Phi^{-1}(\cdot)\;} \hat{x} \in \mathbb{R}^N$$

[Diagram: three matrix pictures of $y = \Phi x$ for the three regimes]

  • $M < N$: underdetermined
  • $M = N$: determined
  • $M > N$: overdetermined
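(Illustrative sketch, not from the slides: in the underdetermined case $M < N$ the system $y = \Phi x$ has infinitely many solutions, so recovery needs prior knowledge about $x$. The dimensions and sparse ground truth below are my own example; it shows that the minimum-norm solution fits the measurements yet misses the true signal because it ignores the prior.)

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 50, 200                                   # M < N: underdetermined
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x_o = np.zeros(N)
x_o[rng.choice(N, 10, replace=False)] = 1.0      # sparse ground truth
y = Phi @ x_o

# The minimum-l2-norm solution fits the measurements exactly...
x_min_norm = np.linalg.pinv(Phi) @ y
print(np.allclose(Phi @ x_min_norm, y))          # True
# ...but differs from x_o, because it ignores the prior (here: sparsity).
print(np.linalg.norm(x_min_norm - x_o))          # noticeably > 0
```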
SLIDE 7

Model

[Diagram: $y = \Phi x$ with a wide $M \times N$ matrix $\Phi$; Subject → Simpler Hardware (Φ) → Measurements → Computation (Software)]

$$x \in \mathbb{R}^N \xrightarrow{\;\Phi(\cdot)\;} y = \Phi(x) \in \mathbb{R}^M \xrightarrow{\;\Phi^{-1}(\cdot)\;} \hat{x} \in \mathbb{R}^N$$
SLIDE 8

Applications

SLIDE 9

Data-Driven Computational Sensing

$$\min_x \; \|y - \Phi x\|_2^2 + \lambda \cdot f(x)$$

[Diagram: $y = \Phi x_o$ with an $M \times N$ matrix $\Phi$]
SLIDE 10

Iterative Algorithms

Initial Estimate → Calculate the Residual → Update the Estimate → repeat Until Convergence

$$\min_x \; \|y - \Phi x\|_2^2 + \lambda \cdot f(x), \qquad y = \Phi x_o, \quad M \ll N$$
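(Illustrative sketch, not from the slides: the generic residual/update loop above, written as plain Python. `prox` is a placeholder for whatever projection/denoising step encodes the prior $f$; the step size is the standard $1/L$ choice for the quadratic term.)

```python
import numpy as np

def iterative_recovery(y, Phi, prox, step=None, iters=100):
    """Generic residual/update loop: gradient step on ||y - Phi x||^2,
    then a proximal (projection/denoising) step encoding the prior f."""
    M, N = Phi.shape
    if step is None:
        step = 1.0 / np.linalg.norm(Phi, 2) ** 2   # 1/L for the quadratic term
    x = np.zeros(N)                                # initial estimate
    for _ in range(iters):
        z = y - Phi @ x                            # residual
        x = prox(x + step * (Phi.T @ z))           # update the estimate
    return x
```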
SLIDE 11

Iterative Algorithms

[Diagram: the affine set $\{x : y = \Phi x\}$ intersecting the model set $C$ at $x_o$]

$$\min_x \; \|y - \Phi x\|_2^2 + \lambda \cdot f(x), \qquad y = \Phi x_o, \quad M \ll N$$
SLIDE 12

Data-Driven Computational Sensing

$$\min_x \; \|y - \Phi x\|_2^2 + \lambda \cdot f(x)$$

[Diagram: $y = \Phi x_o$ with an $M \times N$ matrix $\Phi$]
SLIDE 13

Data-Driven Computational Sensing

$$\min_x \; \|y - \Phi x\|_2^2 + \lambda \cdot f(x)$$

[Diagram: $y = \Phi x_o$ with an $M \times N$ matrix $\Phi$]
SLIDE 14

Sparse Regression

$$\min_x \; \|y - \Phi x\|_2^2 + \lambda \|x\|_1$$

(the general formulation $\min_x \|y - \Phi x\|_2^2 + \lambda \cdot f(x)$ with $f(x) = \|x\|_1$)

[Diagram: $y = \Phi x_o$ with an $M \times N$ matrix $\Phi$, $M \ll N$]
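(Illustrative sketch, not from the slides: a minimal NumPy implementation of ISTA, the classic iterative solver for this $\ell_1$-regularized problem; up to the usual 1/2 scaling of the quadratic term, each iteration is exactly the gradient step plus shrinkage described above.)

```python
import numpy as np

def soft_threshold(u, tau):
    """eta(u; tau): zero out |u| <= tau, shrink the rest toward zero by tau."""
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

def ista(y, Phi, lam, iters=200):
    """Iterative Shrinkage/Thresholding for min ||y - Phi x||_2^2 + lam ||x||_1
    (equivalent up to rescaling lam by the conventional 1/2 factor)."""
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2
    x = np.zeros(Phi.shape[1])
    for _ in range(iters):
        x = soft_threshold(x + step * Phi.T @ (y - Phi @ x), step * lam)
    return x
```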
SLIDE 15

Approximate Message Passing

  • Approximate Message Passing (AMP) [Donoho, Maleki, Montanari 2009]

$$x^{t+1} = \eta\!\left(x^t + \Phi^\top z^t;\; \tau^t\right)$$
$$z^t = y - \Phi x^t + \frac{1}{\delta}\, z^{t-1} \left\langle \eta'\!\left(x^{t-1} + \Phi^\top z^{t-1}\right) \right\rangle$$

(here $\langle \cdot \rangle$ denotes the average of the entries and $\delta = M/N$)

[Diagram: geometry of $y = \Phi x$, the model set $C$, $x_o$, and the estimate $\eta(\Phi^\top y)$]

$$\min_x \; \|y - \Phi x\|_2^2 + \lambda \|x\|_1, \qquad M \ll N$$
SLIDE 16

Approximate Message Passing

  • Approximate Message Passing (AMP) [Donoho, Maleki, Montanari 2009]

$$x^{t+1} = \eta\!\left(x^t + \Phi^\top z^t;\; \tau^t\right)$$
$$z^t = \underbrace{y - \Phi x^t}_{\text{Residual}} + \frac{1}{\delta}\, z^{t-1} \left\langle \eta'\!\left(x^{t-1} + \Phi^\top z^{t-1}\right) \right\rangle$$

[Diagram as on Slide 15; $\min_x \|y - \Phi x\|_2^2 + \lambda \|x\|_1$, $M \ll N$]
SLIDE 17

Approximate Message Passing

  • Approximate Message Passing (AMP) [Donoho, Maleki, Montanari 2009]

$$x^{t+1} = \eta\!\left(\underbrace{x^t + \Phi^\top z^t}_{\text{Gradient Step}};\; \tau^t\right)$$
$$z^t = \underbrace{y - \Phi x^t}_{\text{Residual}} + \frac{1}{\delta}\, z^{t-1} \left\langle \eta'\!\left(x^{t-1} + \Phi^\top z^{t-1}\right) \right\rangle$$

[Diagram as on Slide 15; $\min_x \|y - \Phi x\|_2^2 + \lambda \|x\|_1$, $M \ll N$]
SLIDE 18

Approximate Message Passing

  • Approximate Message Passing (AMP) [Donoho, Maleki, Montanari 2009]

$$x^{t+1} = \underbrace{\eta}_{\text{Projection Operator}}\!\left(\underbrace{x^t + \Phi^\top z^t}_{\text{Gradient Step}};\; \tau^t\right)$$
$$z^t = \underbrace{y - \Phi x^t}_{\text{Residual}} + \frac{1}{\delta}\, z^{t-1} \left\langle \eta'\!\left(x^{t-1} + \Phi^\top z^{t-1}\right) \right\rangle$$

[Diagram as on Slide 15; $\min_x \|y - \Phi x\|_2^2 + \lambda \|x\|_1$, $M \ll N$]
SLIDE 19

  • Approximate Message Passing (AMP) [Donoho, Maleki, Montanari 2009]

$$x^{t+1} = \underbrace{\eta}_{\text{Projection Operator}}\!\left(\underbrace{x^t + \Phi^\top z^t}_{\text{Gradient Step}};\; \tau^t\right)$$
$$z^t = \underbrace{y - \Phi x^t}_{\text{Residual}} + \frac{1}{\delta}\, z^{t-1} \left\langle \eta'\!\left(x^{t-1} + \Phi^\top z^{t-1}\right) \right\rangle$$

[Plot: Soft Thresholding. The function $\eta(x, \tau)$ is zero on $[-\tau, \tau]$ and shrinks $x$ toward zero by $\tau$ elsewhere]

[Diagram as on Slide 15; $\min_x \|y - \Phi x\|_2^2 + \lambda \|x\|_1$, $M \ll N$]
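(Illustrative sketch, not the authors' implementation: AMP with the soft-thresholding nonlinearity and the Onsager correction term, under the standard i.i.d.-Gaussian-$\Phi$ assumptions; the threshold policy $\tau^t = \|z^t\|_2/\sqrt{M}$ is a common simplified choice.)

```python
import numpy as np

def soft_threshold(u, tau):
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

def amp(y, Phi, iters=30):
    """AMP for sparse recovery [Donoho, Maleki, Montanari 2009] (sketch)."""
    M, N = Phi.shape
    delta = M / N
    x, z = np.zeros(N), y.copy()
    for _ in range(iters):
        tau = np.linalg.norm(z) / np.sqrt(M)        # effective-noise level
        pseudo = x + Phi.T @ z                      # gradient step
        x_new = soft_threshold(pseudo, tau)         # projection / denoising step
        onsager = (z / delta) * np.mean(np.abs(pseudo) > tau)  # <eta'> term
        z = y - Phi @ x_new + onsager               # residual + Onsager correction
        x = x_new
    return x
```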
SLIDE 20

Sparse Regression

  • Approximate Message Passing (AMP) [Donoho, Maleki, Montanari 2009]

$$x^{t+1} = \eta\!\left(x^t + \Phi^\top z^t;\; \tau^t\right)$$
$$x^t + \Phi^\top z^t = x_o + \underbrace{v^t}_{\text{Effective Noise}}$$

That is, at every iteration the input to $\eta$ looks like the true signal plus (approximately Gaussian) effective noise, so each AMP step is a denoising problem.

[Diagram as on Slide 15; $\min_x \|y - \Phi x\|_2^2 + \lambda \|x\|_1$, $M \ll N$]
SLIDE 21

Structured Regression

  • Denoising Approximate Message Passing (D-AMP) [Metzler, Maleki, Baraniuk 2015]

$$x^{t+1} = D^t\!\left(x^t + \Phi^\top z^t\right)$$

[Diagram: geometry of $y = \Phi x$, the model set $C$, $x_o$, and the estimate $D(\Phi^\top y)$]

$$\min_x \; \|y - \Phi x\|_2^2 + \lambda f(x), \qquad M \ll N$$
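(Illustrative sketch, not the authors' code: since each AMP iteration is a denoising problem, the soft threshold can be swapped for any off-the-shelf denoiser `denoise(u, sigma)`. The Onsager term then needs the denoiser's divergence, estimated here by the Monte Carlo trick that reappears on Slide 34; the `(1/m) z div D` form matches Slide 38.)

```python
import numpy as np

def damp(y, Phi, denoise, iters=10, seed=0):
    """D-AMP sketch [Metzler, Maleki, Baraniuk 2015]: the AMP loop with a
    generic denoiser D(u, sigma) and a Monte Carlo divergence estimate."""
    rng = np.random.default_rng(seed)
    M, N = Phi.shape
    x, z = np.zeros(N), y.copy()
    for _ in range(iters):
        sigma = np.linalg.norm(z) / np.sqrt(M)      # effective-noise level
        pseudo = x + Phi.T @ z                      # looks like x_o + noise
        x = denoise(pseudo, sigma)
        # Monte Carlo divergence: div D ~ b^T (D(u + eps*b) - D(u)) / eps
        b = rng.standard_normal(N)
        eps = max(sigma, 1e-6) / 1000
        div = b @ (denoise(pseudo + eps * b, sigma) - x) / eps
        z = y - Phi @ x + (z / M) * div             # Onsager correction
    return x
```

A crude stand-in such as `lambda u, s: scipy.ndimage.gaussian_filter(u, 1.0)` already runs; the point of D-AMP is that stronger denoisers (e.g., BM3D) plug in the same way.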
SLIDE 22

Unrolling Iterative Algorithms

Iterative Algorithm: Initial Estimate → Calculate the Residual → Update the Estimate → repeat Until Convergence

Unrolled Algorithm: Initial Estimate → Updated Residual → Updated Estimate → Updated Residual → Updated Estimate → … (a fixed number of iterations unrolled into the layers of a network, whose parameters are then learned) [Gregor and LeCun, 2010]
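(Illustrative sketch in the spirit of LISTA [Gregor and LeCun, 2010], not the original implementation: each ISTA iteration becomes a layer with its own learnable matrix and threshold. Dimensions, initialization, and layer count are my own assumptions.)

```python
import torch
import torch.nn as nn

class UnrolledISTA(nn.Module):
    """T ISTA iterations unrolled into T layers with learnable weights
    and thresholds (LISTA-style sketch)."""
    def __init__(self, Phi: torch.Tensor, T: int = 5):
        super().__init__()
        step = 1.0 / torch.linalg.matrix_norm(Phi, 2) ** 2
        self.W = nn.ParameterList(
            [nn.Parameter(step * Phi.T.clone()) for _ in range(T)])
        self.theta = nn.ParameterList(
            [nn.Parameter(torch.tensor(0.1)) for _ in range(T)])
        self.register_buffer("Phi", Phi)

    def forward(self, y):
        x = y.new_zeros(self.Phi.shape[1])
        for W, theta in zip(self.W, self.theta):
            u = x + W @ (y - self.Phi @ x)                   # learned gradient step
            x = torch.sign(u) * torch.relu(u.abs() - theta)  # learned soft threshold
        return x
```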
SLIDE 23

Learned-Denoising-AMP (LDAMP)

  • We use a 20-layer convolutional network as a denoiser [Zhang et al. 2017]
  • Two layers of the LDAMP network:

[Diagram: two unrolled LDAMP layers, each applying $\Phi$, $\Phi^\top$, and a denoiser network]

$$x^{l+1} = D^l\!\left(x^l + \Phi^\top z^l\right), \qquad z^l = y - \Phi x^l + \frac{1}{\delta}\, z^{l-1} \left\langle \operatorname{div} D^l\!\left(x^{l-1} + \Phi^\top z^{l-1}\right) \right\rangle$$

[Metzler, Mousavi, Baraniuk, NIPS 2017]
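(Illustrative sketch, not the paper's code: one LDAMP layer in PyTorch, with a small CNN standing in for the 20-layer DnCNN denoiser; layer counts, channel widths, and the helper names `make_denoiser`/`LDAMPLayer` are my own assumptions.)

```python
import torch
import torch.nn as nn

def make_denoiser(channels: int = 32, depth: int = 4) -> nn.Sequential:
    """Small conv net standing in for the 20-layer DnCNN of [Zhang et al. 2017]."""
    layers = [nn.Conv2d(1, channels, 3, padding=1), nn.ReLU()]
    for _ in range(depth - 2):
        layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU()]
    layers += [nn.Conv2d(channels, 1, 3, padding=1)]
    return nn.Sequential(*layers)

class LDAMPLayer(nn.Module):
    """One unrolled LDAMP layer: x <- D(x + Phi^T z), with the
    Onsager-corrected residual z <- y - Phi x + (1/m) z * div D."""
    def __init__(self, side: int):
        super().__init__()
        self.D, self.side = make_denoiser(), side

    def _denoise(self, u):
        return self.D(u.reshape(1, 1, self.side, self.side)).reshape(-1)

    def forward(self, x, z, y, Phi, eps: float = 1e-3):
        m = Phi.shape[0]
        pseudo = x + Phi.T @ z                       # vectorized pseudo-data
        x_new = self._denoise(pseudo)
        b = torch.randn_like(pseudo)                 # Monte Carlo divergence of D
        div = b @ (self._denoise(pseudo + eps * b) - x_new) / eps
        z_new = y - Phi @ x_new + z * div / m        # Onsager correction
        return x_new, z_new
```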
SLIDE 24

Training LDAMP and LDIT

End-to-End Training: train all layers L1 L2 L3 L4 L5 jointly.

Layer-by-Layer Training: train L1; then L1 L2; then L1 L2 L3; and so on up to L1 L2 L3 L4 L5.

Denoiser-by-Denoiser Training: train a bank of denoisers D1, D2, . . . , Dq (one per noise level) independently, then plug them into the layers L1 L2 L3 L4 L5.
SLIDE 25

Training LDAMP

[Training strategies as on Slide 24: end-to-end, layer-by-layer, denoiser-by-denoiser]

  • Lemma 1: Layer-by-layer training of LDAMP is MMSE optimal. [Metzler, Mousavi, Baraniuk, NIPS 2017]
  • Lemma 2: Denoiser-by-denoiser training of LDAMP is MMSE optimal. [Metzler, Mousavi, Baraniuk, NIPS 2017]
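(Illustrative sketch of the layer-by-layer strategy, under my own naming and using the `LDAMPLayer` sketch above: layer k is trained to minimize the MSE of its own output while the earlier layers stay frozen, which is exactly the greedy schedule the lemma refers to.)

```python
import torch

def train_layer_by_layer(layers, dataset, epochs=10):
    """Greedy training of unrolled layers: layer k is optimized while
    layers 1..k-1 stay frozen, so each stage minimizes its own MSE."""
    for k, layer in enumerate(layers):
        opt = torch.optim.Adam(layer.parameters(), lr=1e-4)
        for _ in range(epochs):
            for x_true, y, Phi in dataset:
                x, z = torch.zeros_like(x_true), y.clone()
                with torch.no_grad():                # frozen earlier layers
                    for frozen in layers[:k]:
                        x, z = frozen(x, z, y, Phi)
                x, z = layer(x, z, y, Phi)           # trainable layer k
                loss = torch.mean((x - x_true) ** 2)
                opt.zero_grad(); loss.backward(); opt.step()
    return layers
```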
SLIDE 26

Training LDAMP

[Training strategies as on Slide 24: end-to-end, layer-by-layer, denoiser-by-denoiser]

  • Lemma 1: Layer-by-layer training of LDAMP is MMSE optimal. [Metzler, Mousavi, Baraniuk, NIPS 2017]
  • Lemma 2: Denoiser-by-denoiser training of LDAMP is MMSE optimal. [Metzler, Mousavi, Baraniuk, NIPS 2017]

[Table: average PSNR (dB) of one hundred 40x40 images recovered from i.i.d. Gaussian measurements]

  • Denoiser-by-denoiser training is more generalizable.
  • Noise discretization degrades the performance.
SLIDE 27

Compressive Image Recovery

512x512 images, 20x undersampling, noiseless measurements:

  • Original Image
  • TVAL3: 26.4 dB, 6.85 sec
  • BM3D-AMP: 27.2 dB, 75.04 sec
  • LDAMP: 28.1 dB, 1.22 sec
SLIDE 28

Summary So Far

$$\arg\min_x \; \|y - \Phi x\|_2^2 \quad \text{subject to} \quad x \in C$$

$$\min_x \; \|y - \Phi x\|_2^2 + \lambda \cdot f(x)$$

Training data: $x_1, x_2, \ldots, x_L$

[Diagram: $y = \Phi x_o$ with an $M \times N$ matrix $\Phi$]
SLIDE 29

Data-Driven Computational Sensing

$$\min_x \; \|y - \Phi x\|_2^2 + \lambda \cdot f(x)$$

[Diagram: $y = \Phi x_o$ with an $M \times N$ matrix $\Phi$]
SLIDE 30

Data-Driven Computational Sensing

$$\min_x \; \|y - \Phi x\|_2^2 + \lambda \cdot f(x)$$

[Diagram: $y = \Phi x_o$ with an $M \times N$ matrix $\Phi$]

  • Mousavi, Maleki, Baraniuk, 'Consistent Parameter Estimation', Annals of Statistics 2017
  • Mousavi, Dasarathy, Baraniuk, 'Data-Driven Sparse Representation', ICLR 2019
SLIDE 31

Summary so far

$$\arg\min_x \; \|y - \Phi x\|_2^2 \quad \text{subject to} \quad x \in C$$

$$\min_x \; \|y - \Phi x\|_2^2 + \lambda \cdot f(x)$$

Training data: $x_1, x_2, \ldots, x_L$ (Supervised)

[Diagram: $y = \Phi x_o$ with an $M \times N$ matrix $\Phi$]
SLIDE 32

Next Step

$$\arg\min_x \; \|y - \Phi x\|_2^2 \quad \text{subject to} \quad x \in C$$

$$\min_x \; \|y - \Phi x\|_2^2 + \lambda \cdot f(x)$$

Training data: $x_1, x_2, \ldots, x_L$ → Unsupervised (no ground-truth training signals)

[Diagram: $y = \Phi x_o$ with an $M \times N$ matrix $\Phi$]
SLIDE 33

Stein's Unbiased Risk Estimator (SURE) [Stein '81]

  • A statistical model selection technique.
  • For $y = x + w$ with unknown $x$, $w \sim \mathcal{N}(0, \sigma^2 I)$, and a weakly differentiable estimator $f_\theta(\cdot)$, the risk can be estimated without access to $x$:

$$\mathbb{E}\!\left[\tfrac{1}{N}\|x - f_\theta(y)\|_2^2\right] = \mathbb{E}\!\left[\tfrac{1}{N}\|y - f_\theta(y)\|_2^2\right] - \sigma^2 + \tfrac{2\sigma^2}{N}\,\mathbb{E}\!\left[\operatorname{div}_y f_\theta(y)\right]$$
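(Illustrative sketch, my own example: a concrete check of SURE for the soft-thresholding denoiser, whose divergence is simply the number of entries above the threshold. The `sure` value uses only the noisy data, yet matches the true MSE closely.)

```python
import numpy as np

rng = np.random.default_rng(0)
N, sigma, tau = 10_000, 0.5, 0.7
x = np.where(rng.random(N) < 0.1, 3 * rng.standard_normal(N), 0.0)  # sparse signal
y = x + sigma * rng.standard_normal(N)

f = np.sign(y) * np.maximum(np.abs(y) - tau, 0.0)   # soft-threshold estimate
div = np.sum(np.abs(y) > tau)                       # analytic divergence

sure = np.mean((y - f) ** 2) - sigma**2 + 2 * sigma**2 * div / N
mse = np.mean((x - f) ** 2)
print(sure, mse)   # agree closely, with no access to x in `sure`
```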
SLIDE 34

Monte-Carlo SURE [Ramani, Blu, Unser, 2008]

  • For bounded functions, the divergence can be written as the limit of a randomized finite difference.
  • Challenge: computing the divergence of a black-box denoiser in closed form.
  • Approximation: with $b \sim \mathcal{N}(0, I)$ and a small $\epsilon$,

$$\operatorname{div}_y f(y) \approx \frac{1}{\epsilon}\, b^\top \left( f(y + \epsilon b) - f(y) \right)$$
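(Illustrative sketch with my own function names: the Monte Carlo divergence estimate above, applicable to any black-box denoiser `f`, and the resulting SURE value.)

```python
import numpy as np

def mc_divergence(f, y, eps=1e-3, seed=0):
    """Monte Carlo divergence of a black-box denoiser f at y, as in
    [Ramani, Blu, Unser, 2008]: div f(y) ~ b^T (f(y + eps*b) - f(y)) / eps."""
    b = np.random.default_rng(seed).standard_normal(y.shape)
    return b @ (f(y + eps * b) - f(y)) / eps

def mc_sure(f, y, sigma, eps=1e-3):
    """SURE with the Monte Carlo divergence: unbiased risk estimate without x."""
    N = y.size
    return (np.mean((y - f(y)) ** 2) - sigma**2
            + 2 * sigma**2 * mc_divergence(f, y, eps) / N)
```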
SLIDE 35

Denoising with Noisy Data

  • DnCNN Denoiser: a deep convolutional denoising network [Zhang et al. 2017]
  • Training Data: noisy images only (no clean ground truth)
  • Loss Function: MSE (requires ground truth) vs. SURE (requires only the noisy data and σ)
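(Illustrative PyTorch sketch, my own naming: the SURE training loss for a denoising network. Minimizing it needs only noisy images `y` and the noise level `sigma`; gradients flow through both network evaluations.)

```python
import torch

def sure_loss(net, y, sigma, eps=1e-3):
    """SURE surrogate for the unavailable MSE against clean images:
    ||y - net(y)||^2 / N - sigma^2 + (2 sigma^2 / N) * MC-divergence."""
    N = y.numel()
    f = net(y)
    b = torch.randn_like(y)
    div = (b * (net(y + eps * b) - f)).sum() / eps   # Monte Carlo divergence
    return ((y - f) ** 2).sum() / N - sigma**2 + 2 * sigma**2 * div / N
```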
SLIDE 36

Denoising with Noisy Data: Results

  • Original Noisy Image
  • BM3D: 26.0 dB, 4.01 sec.
  • DnCNN SURE: 26.5 dB, 0.04 sec.
  • DnCNN MSE: 26.7 dB, 0.04 sec.
SLIDE 37

Compressive Image Recovery w/ Noisy Data

  • Problem Formulation: $y = \Phi x + w$

Image: $x \in \mathbb{R}^N$. Measurements: $y \in \mathbb{R}^M$. Measurement Operator: $\Phi \in \mathbb{R}^{M \times N}$. Noise: $w \in \mathbb{R}^M$. Setting: $M \ll N$.

[Diagram: $y = \Phi x_o + w$ with an $M \times N$ matrix $\Phi$]
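(For concreteness, a tiny NumPy sketch of this measurement model; the dimensions and noise level are my own illustrative choices.)

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 64 * 64, 820                              # ~5x undersampling of a 64x64 image
Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # i.i.d. Gaussian measurement matrix
x = rng.random(N)                                # vectorized image (stand-in)
sigma_w = 0.01
y = Phi @ x + sigma_w * rng.standard_normal(M)   # noisy compressive measurements
```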
SLIDE 38

Recovery Algorithm

  • Learning Denoising-based AMP (LDAMP) neural network (for k = 1, …, K):

$$z^k = y - \Phi x^k + \frac{1}{m}\, z^{k-1} \operatorname{div} D^{k-1}_{\theta^{k-1}}\!\left(x^{k-1} + \Phi^* z^{k-1}\right)$$
$$\hat{\sigma}^k = \frac{\|z^k\|_2}{\sqrt{m}}, \qquad x^{k+1} = D^k_{\theta^k}\!\left(x^k + \Phi^* z^k\right)$$

  • Decouples image recovery into a series of denoising problems:

$$x^k + \Phi^* z^k = x_o + \sigma v$$

[Donoho et al. 2009, 2011] [Bayati and Montanari, 2011]

  • Layerwise Training of the LDAMP Network: train each denoiser $D^k_{\theta^k}$ in sequence, with either the MSE loss (needs ground-truth images) or the SURE loss (needs only the noisy measurements).

[Diagram: layer-by-layer training schedule, as on Slide 24]
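(Illustrative sketch tying together the pieces above; `LDAMPLayer` and `sure_loss` are the hypothetical helpers defined earlier, not the paper's code. Unsupervised LDAMP training just swaps the per-layer loss: each denoiser is trained on its own effective-noise denoising problem using only $(y, \Phi)$.)

```python
import torch

def train_ldamp_sure(layers, measurements, epochs=10):
    """Layer-by-layer LDAMP training with the SURE loss (no clean images)."""
    for k, layer in enumerate(layers):
        opt = torch.optim.Adam(layer.parameters(), lr=1e-4)
        for _ in range(epochs):
            for y, Phi in measurements:              # only measurements needed
                m, n = Phi.shape
                x, z = y.new_zeros(n), y.clone()
                with torch.no_grad():                # earlier layers are frozen
                    for frozen in layers[:k]:
                        x, z = frozen(x, z, y, Phi)
                sigma_k = z.norm() / m ** 0.5        # effective-noise level
                pseudo = x + Phi.T @ z               # ~ x_o + sigma_k * v
                loss = sure_loss(layer._denoise, pseudo, sigma_k)
                opt.zero_grad(); loss.backward(); opt.step()
    return layers
```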
SLIDE 39

Compressive Image Recovery

5x undersampling:

  • Original Image
  • BM3D-AMP: 31.3 dB, 13.2 sec.
  • LDAMP MSE: 34.6 dB, 0.4 sec.
  • LDAMP SURE: 31.9 dB, 0.4 sec.
SLIDE 40

Take-away Messages!

  • There are three major paradigms for signal acquisition.
  • Each paradigm puts resources on one of the sampling, modeling, or reconstruction tasks.
  • There seems to be a preservation of computation between different paradigms.

[Table: how Nyquist-rate sampling (~1900), compressive sensing (~2007), and our work allocate resources across sampling, modeling, and reconstruction]