From Supervised to Unsupervised Computational Sensing
Ali Mousavi (PowerPoint Presentation)


SLIDE 1

From Supervised to Unsupervised Computational Sensing

Ali Mousavi

Aug 12th 2019

Brain Vision Summit
SLIDE 2

Collaborators

Rich Baraniuk, Rice University
Reinhard Heckel, Rice University
Arian Maleki, Columbia University
Chris Metzler, Stanford University
Gautam Dasarathy, Arizona State University
SLIDE 3

Computational Sensing

[Diagram: conventional sensing acquires the subject directly with expensive hardware (Ψ); computational sensing acquires measurements with simpler hardware (Φ) and recovers the subject with computation/software]

  • Conventional Sensing
  • Computational Sensing: reduce costs in acquisition systems by replacing expensive hardware with cheap hardware + computation
SLIDE 4

Large Scale Datasets

SLIDE 5

Data-Driven Computational Sensing

[Diagram: Subject → Simpler Hardware → Measurements → Computational Software → Recovered Subject]
SLIDE 6

Model

[Diagram: Subject → Simpler Hardware (Φ) → Measurements → Computation (Software)]

$$x \in \mathbb{R}^N \xrightarrow{\;\Phi(\cdot)\;} y = \Phi(x) \in \mathbb{R}^M \xrightarrow{\;\Phi^{-1}(\cdot)\;} \hat{x} \in \mathbb{R}^N$$

[Diagram: three matrix pictures of $y = \Phi x$ for the three regimes]

  • $M < N$: underdetermined
  • $M = N$: determined
  • $M > N$: overdetermined
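(Illustrative sketch, not from the slides: in the underdetermined case $M < N$ the system $y = \Phi x$ has infinitely many solutions, so recovery needs prior knowledge about $x$. The dimensions and sparse ground truth below are my own example; it shows that the minimum-norm solution fits the measurements yet misses the true signal because it ignores the prior.)

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 50, 200                                   # M < N: underdetermined
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x_o = np.zeros(N)
x_o[rng.choice(N, 10, replace=False)] = 1.0      # sparse ground truth
y = Phi @ x_o

# The minimum-l2-norm solution fits the measurements exactly...
x_min_norm = np.linalg.pinv(Phi) @ y
print(np.allclose(Phi @ x_min_norm, y))          # True
# ...but differs from x_o, because it ignores the prior (here: sparsity).
print(np.linalg.norm(x_min_norm - x_o))          # noticeably > 0
```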
SLIDE 7

Model

[Diagram: $y = \Phi x$ with a wide $M \times N$ matrix $\Phi$; Subject → Simpler Hardware (Φ) → Measurements → Computation (Software)]

$$x \in \mathbb{R}^N \xrightarrow{\;\Phi(\cdot)\;} y = \Phi(x) \in \mathbb{R}^M \xrightarrow{\;\Phi^{-1}(\cdot)\;} \hat{x} \in \mathbb{R}^N$$
SLIDE 8

Applications

SLIDE 9

Data-Driven Computational Sensing

$$\min_x \; \|y - \Phi x\|_2^2 + \lambda \cdot f(x)$$

[Diagram: $y = \Phi x_o$ with an $M \times N$ matrix $\Phi$]
SLIDE 10

Iterative Algorithms

Initial Estimate → Calculate the Residual → Update the Estimate → repeat Until Convergence

$$\min_x \; \|y - \Phi x\|_2^2 + \lambda \cdot f(x), \qquad y = \Phi x_o, \quad M \ll N$$
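(Illustrative sketch, not from the slides: the generic residual/update loop above, written as plain Python. `prox` is a placeholder for whatever projection/denoising step encodes the prior $f$; the step size is the standard $1/L$ choice for the quadratic term.)

```python
import numpy as np

def iterative_recovery(y, Phi, prox, step=None, iters=100):
    """Generic residual/update loop: gradient step on ||y - Phi x||^2,
    then a proximal (projection/denoising) step encoding the prior f."""
    M, N = Phi.shape
    if step is None:
        step = 1.0 / np.linalg.norm(Phi, 2) ** 2   # 1/L for the quadratic term
    x = np.zeros(N)                                # initial estimate
    for _ in range(iters):
        z = y - Phi @ x                            # residual
        x = prox(x + step * (Phi.T @ z))           # update the estimate
    return x
```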
SLIDE 11

Iterative Algorithms

[Diagram: the affine set $\{x : y = \Phi x\}$ intersecting the model set $C$ at $x_o$]

$$\min_x \; \|y - \Phi x\|_2^2 + \lambda \cdot f(x), \qquad y = \Phi x_o, \quad M \ll N$$
SLIDE 12

Data-Driven Computational Sensing

$$\min_x \; \|y - \Phi x\|_2^2 + \lambda \cdot f(x)$$

[Diagram: $y = \Phi x_o$ with an $M \times N$ matrix $\Phi$]
SLIDE 13

Data-Driven Computational Sensing

$$\min_x \; \|y - \Phi x\|_2^2 + \lambda \cdot f(x)$$

[Diagram: $y = \Phi x_o$ with an $M \times N$ matrix $\Phi$]
SLIDE 14

Sparse Regression

$$\min_x \; \|y - \Phi x\|_2^2 + \lambda \|x\|_1$$

(the general formulation $\min_x \|y - \Phi x\|_2^2 + \lambda \cdot f(x)$ with $f(x) = \|x\|_1$)

[Diagram: $y = \Phi x_o$ with an $M \times N$ matrix $\Phi$, $M \ll N$]
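(Illustrative sketch, not from the slides: a minimal NumPy implementation of ISTA, the classic iterative solver for this $\ell_1$-regularized problem; up to the usual 1/2 scaling of the quadratic term, each iteration is exactly the gradient step plus shrinkage described above.)

```python
import numpy as np

def soft_threshold(u, tau):
    """eta(u; tau): zero out |u| <= tau, shrink the rest toward zero by tau."""
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

def ista(y, Phi, lam, iters=200):
    """Iterative Shrinkage/Thresholding for min ||y - Phi x||_2^2 + lam ||x||_1
    (equivalent up to rescaling lam by the conventional 1/2 factor)."""
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2
    x = np.zeros(Phi.shape[1])
    for _ in range(iters):
        x = soft_threshold(x + step * Phi.T @ (y - Phi @ x), step * lam)
    return x
```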
SLIDE 15

Approximate Message Passing

  • Approximate Message Passing (AMP) [Donoho, Maleki, Montanari 2009]

$$x^{t+1} = \eta\!\left(x^t + \Phi^\top z^t;\; \tau^t\right)$$
$$z^t = y - \Phi x^t + \frac{1}{\delta}\, z^{t-1} \left\langle \eta'\!\left(x^{t-1} + \Phi^\top z^{t-1}\right) \right\rangle$$

(here $\langle \cdot \rangle$ denotes the average of the entries and $\delta = M/N$)

[Diagram: geometry of $y = \Phi x$, the model set $C$, $x_o$, and the estimate $\eta(\Phi^\top y)$]

$$\min_x \; \|y - \Phi x\|_2^2 + \lambda \|x\|_1, \qquad M \ll N$$
SLIDE 16

Approximate Message Passing

  • Approximate Message Passing (AMP) [Donoho, Maleki, Montanari 2009]

$$x^{t+1} = \eta\!\left(x^t + \Phi^\top z^t;\; \tau^t\right)$$
$$z^t = \underbrace{y - \Phi x^t}_{\text{Residual}} + \frac{1}{\delta}\, z^{t-1} \left\langle \eta'\!\left(x^{t-1} + \Phi^\top z^{t-1}\right) \right\rangle$$

[Diagram as on Slide 15; $\min_x \|y - \Phi x\|_2^2 + \lambda \|x\|_1$, $M \ll N$]
SLIDE 17

Approximate Message Passing

  • Approximate Message Passing (AMP) [Donoho, Maleki, Montanari 2009]

$$x^{t+1} = \eta\!\left(\underbrace{x^t + \Phi^\top z^t}_{\text{Gradient Step}};\; \tau^t\right)$$
$$z^t = \underbrace{y - \Phi x^t}_{\text{Residual}} + \frac{1}{\delta}\, z^{t-1} \left\langle \eta'\!\left(x^{t-1} + \Phi^\top z^{t-1}\right) \right\rangle$$

[Diagram as on Slide 15; $\min_x \|y - \Phi x\|_2^2 + \lambda \|x\|_1$, $M \ll N$]
SLIDE 18

Approximate Message Passing

  • Approximate Message Passing (AMP) [Donoho, Maleki, Montanari 2009]

$$x^{t+1} = \underbrace{\eta}_{\text{Projection Operator}}\!\left(\underbrace{x^t + \Phi^\top z^t}_{\text{Gradient Step}};\; \tau^t\right)$$
$$z^t = \underbrace{y - \Phi x^t}_{\text{Residual}} + \frac{1}{\delta}\, z^{t-1} \left\langle \eta'\!\left(x^{t-1} + \Phi^\top z^{t-1}\right) \right\rangle$$

[Diagram as on Slide 15; $\min_x \|y - \Phi x\|_2^2 + \lambda \|x\|_1$, $M \ll N$]
SLIDE 19

  • Approximate Message Passing (AMP) [Donoho, Maleki, Montanari 2009]

$$x^{t+1} = \underbrace{\eta}_{\text{Projection Operator}}\!\left(\underbrace{x^t + \Phi^\top z^t}_{\text{Gradient Step}};\; \tau^t\right)$$
$$z^t = \underbrace{y - \Phi x^t}_{\text{Residual}} + \frac{1}{\delta}\, z^{t-1} \left\langle \eta'\!\left(x^{t-1} + \Phi^\top z^{t-1}\right) \right\rangle$$

[Plot: Soft Thresholding. The function $\eta(x, \tau)$ is zero on $[-\tau, \tau]$ and shrinks $x$ toward zero by $\tau$ elsewhere]

[Diagram as on Slide 15; $\min_x \|y - \Phi x\|_2^2 + \lambda \|x\|_1$, $M \ll N$]
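(Illustrative sketch, not the authors' implementation: AMP with the soft-thresholding nonlinearity and the Onsager correction term, under the standard i.i.d.-Gaussian-$\Phi$ assumptions; the threshold policy $\tau^t = \|z^t\|_2/\sqrt{M}$ is a common simplified choice.)

```python
import numpy as np

def soft_threshold(u, tau):
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

def amp(y, Phi, iters=30):
    """AMP for sparse recovery [Donoho, Maleki, Montanari 2009] (sketch)."""
    M, N = Phi.shape
    delta = M / N
    x, z = np.zeros(N), y.copy()
    for _ in range(iters):
        tau = np.linalg.norm(z) / np.sqrt(M)        # effective-noise level
        pseudo = x + Phi.T @ z                      # gradient step
        x_new = soft_threshold(pseudo, tau)         # projection / denoising step
        onsager = (z / delta) * np.mean(np.abs(pseudo) > tau)  # <eta'> term
        z = y - Phi @ x_new + onsager               # residual + Onsager correction
        x = x_new
    return x
```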
SLIDE 20

Sparse Regression

  • Approximate Message Passing (AMP) [Donoho, Maleki, Montanari 2009]

$$x^{t+1} = \eta\!\left(x^t + \Phi^\top z^t;\; \tau^t\right)$$
$$x^t + \Phi^\top z^t = x_o + \underbrace{v^t}_{\text{Effective Noise}}$$

That is, at every iteration the input to $\eta$ looks like the true signal plus (approximately Gaussian) effective noise, so each AMP step is a denoising problem.

[Diagram as on Slide 15; $\min_x \|y - \Phi x\|_2^2 + \lambda \|x\|_1$, $M \ll N$]
SLIDE 21

Structured Regression

  • Denoising Approximate Message Passing (D-AMP) [Metzler, Maleki, Baraniuk 2015]

$$x^{t+1} = D^t\!\left(x^t + \Phi^\top z^t\right)$$

[Diagram: geometry of $y = \Phi x$, the model set $C$, $x_o$, and the estimate $D(\Phi^\top y)$]

$$\min_x \; \|y - \Phi x\|_2^2 + \lambda f(x), \qquad M \ll N$$
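(Illustrative sketch, not the authors' code: since each AMP iteration is a denoising problem, the soft threshold can be swapped for any off-the-shelf denoiser `denoise(u, sigma)`. The Onsager term then needs the denoiser's divergence, estimated here by the Monte Carlo trick that reappears on Slide 34; the `(1/m) z div D` form matches Slide 38.)

```python
import numpy as np

def damp(y, Phi, denoise, iters=10, seed=0):
    """D-AMP sketch [Metzler, Maleki, Baraniuk 2015]: the AMP loop with a
    generic denoiser D(u, sigma) and a Monte Carlo divergence estimate."""
    rng = np.random.default_rng(seed)
    M, N = Phi.shape
    x, z = np.zeros(N), y.copy()
    for _ in range(iters):
        sigma = np.linalg.norm(z) / np.sqrt(M)      # effective-noise level
        pseudo = x + Phi.T @ z                      # looks like x_o + noise
        x = denoise(pseudo, sigma)
        # Monte Carlo divergence: div D ~ b^T (D(u + eps*b) - D(u)) / eps
        b = rng.standard_normal(N)
        eps = max(sigma, 1e-6) / 1000
        div = b @ (denoise(pseudo + eps * b, sigma) - x) / eps
        z = y - Phi @ x + (z / M) * div             # Onsager correction
    return x
```

A crude stand-in such as `lambda u, s: scipy.ndimage.gaussian_filter(u, 1.0)` already runs; the point of D-AMP is that stronger denoisers (e.g., BM3D) plug in the same way.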
SLIDE 22

Unrolling Iterative Algorithms

Iterative Algorithm: Initial Estimate → Calculate the Residual → Update the Estimate → repeat Until Convergence

Unrolled Algorithm: Initial Estimate → Updated Residual → Updated Estimate → Updated Residual → Updated Estimate → … (a fixed number of iterations unrolled into the layers of a network, whose parameters are then learned) [Gregor and LeCun, 2010]
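(Illustrative sketch in the spirit of LISTA [Gregor and LeCun, 2010], not the original implementation: each ISTA iteration becomes a layer with its own learnable matrix and threshold. Dimensions, initialization, and layer count are my own assumptions.)

```python
import torch
import torch.nn as nn

class UnrolledISTA(nn.Module):
    """T ISTA iterations unrolled into T layers with learnable weights
    and thresholds (LISTA-style sketch)."""
    def __init__(self, Phi: torch.Tensor, T: int = 5):
        super().__init__()
        step = 1.0 / torch.linalg.matrix_norm(Phi, 2) ** 2
        self.W = nn.ParameterList(
            [nn.Parameter(step * Phi.T.clone()) for _ in range(T)])
        self.theta = nn.ParameterList(
            [nn.Parameter(torch.tensor(0.1)) for _ in range(T)])
        self.register_buffer("Phi", Phi)

    def forward(self, y):
        x = y.new_zeros(self.Phi.shape[1])
        for W, theta in zip(self.W, self.theta):
            u = x + W @ (y - self.Phi @ x)                   # learned gradient step
            x = torch.sign(u) * torch.relu(u.abs() - theta)  # learned soft threshold
        return x
```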
SLIDE 23

Learned-Denoising-AMP (LDAMP)

  • We use a 20-layer convolutional network as a denoiser [Zhang et al. 2017]
  • Two layers of the LDAMP network:

[Diagram: two unrolled LDAMP layers, each applying $\Phi$, $\Phi^\top$, and a denoiser network]

$$x^{l+1} = D^l\!\left(x^l + \Phi^\top z^l\right), \qquad z^l = y - \Phi x^l + \frac{1}{\delta}\, z^{l-1} \left\langle \operatorname{div} D^l\!\left(x^{l-1} + \Phi^\top z^{l-1}\right) \right\rangle$$

[Metzler, Mousavi, Baraniuk, NIPS 2017]
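(Illustrative sketch, not the paper's code: one LDAMP layer in PyTorch, with a small CNN standing in for the 20-layer DnCNN denoiser; layer counts, channel widths, and the helper names `make_denoiser`/`LDAMPLayer` are my own assumptions.)

```python
import torch
import torch.nn as nn

def make_denoiser(channels: int = 32, depth: int = 4) -> nn.Sequential:
    """Small conv net standing in for the 20-layer DnCNN of [Zhang et al. 2017]."""
    layers = [nn.Conv2d(1, channels, 3, padding=1), nn.ReLU()]
    for _ in range(depth - 2):
        layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU()]
    layers += [nn.Conv2d(channels, 1, 3, padding=1)]
    return nn.Sequential(*layers)

class LDAMPLayer(nn.Module):
    """One unrolled LDAMP layer: x <- D(x + Phi^T z), with the
    Onsager-corrected residual z <- y - Phi x + (1/m) z * div D."""
    def __init__(self, side: int):
        super().__init__()
        self.D, self.side = make_denoiser(), side

    def _denoise(self, u):
        return self.D(u.reshape(1, 1, self.side, self.side)).reshape(-1)

    def forward(self, x, z, y, Phi, eps: float = 1e-3):
        m = Phi.shape[0]
        pseudo = x + Phi.T @ z                       # vectorized pseudo-data
        x_new = self._denoise(pseudo)
        b = torch.randn_like(pseudo)                 # Monte Carlo divergence of D
        div = b @ (self._denoise(pseudo + eps * b) - x_new) / eps
        z_new = y - Phi @ x_new + z * div / m        # Onsager correction
        return x_new, z_new
```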
SLIDE 24

Training LDAMP and LDIT

End-to-End Training: train all layers L1 L2 L3 L4 L5 jointly.

Layer-by-Layer Training: train L1; then L1 L2; then L1 L2 L3; and so on up to L1 L2 L3 L4 L5.

Denoiser-by-Denoiser Training: train a bank of denoisers D1, D2, . . . , Dq (one per noise level) independently, then plug them into the layers L1 L2 L3 L4 L5.
SLIDE 25

Training LDAMP

[Training strategies as on Slide 24: end-to-end, layer-by-layer, denoiser-by-denoiser]

  • Lemma 1: Layer-by-layer training of LDAMP is MMSE optimal. [Metzler, Mousavi, Baraniuk, NIPS 2017]
  • Lemma 2: Denoiser-by-denoiser training of LDAMP is MMSE optimal. [Metzler, Mousavi, Baraniuk, NIPS 2017]
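(Illustrative sketch of the layer-by-layer strategy, under my own naming and using the `LDAMPLayer` sketch above: layer k is trained to minimize the MSE of its own output while the earlier layers stay frozen, which is exactly the greedy schedule the lemma refers to.)

```python
import torch

def train_layer_by_layer(layers, dataset, epochs=10):
    """Greedy training of unrolled layers: layer k is optimized while
    layers 1..k-1 stay frozen, so each stage minimizes its own MSE."""
    for k, layer in enumerate(layers):
        opt = torch.optim.Adam(layer.parameters(), lr=1e-4)
        for _ in range(epochs):
            for x_true, y, Phi in dataset:
                x, z = torch.zeros_like(x_true), y.clone()
                with torch.no_grad():                # frozen earlier layers
                    for frozen in layers[:k]:
                        x, z = frozen(x, z, y, Phi)
                x, z = layer(x, z, y, Phi)           # trainable layer k
                loss = torch.mean((x - x_true) ** 2)
                opt.zero_grad(); loss.backward(); opt.step()
    return layers
```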
SLIDE 26

Training LDAMP

[Training strategies as on Slide 24: end-to-end, layer-by-layer, denoiser-by-denoiser]

  • Lemma 1: Layer-by-layer training of LDAMP is MMSE optimal. [Metzler, Mousavi, Baraniuk, NIPS 2017]
  • Lemma 2: Denoiser-by-denoiser training of LDAMP is MMSE optimal. [Metzler, Mousavi, Baraniuk, NIPS 2017]

[Table: average PSNR (dB) of one hundred 40x40 images recovered from i.i.d. Gaussian measurements]

  • Denoiser-by-denoiser training is more generalizable.
  • Noise discretization degrades the performance.
SLIDE 27

Compressive Image Recovery

512x512 images, 20x undersampling, noiseless measurements:

  • Original Image
  • TVAL3: 26.4 dB, 6.85 sec
  • BM3D-AMP: 27.2 dB, 75.04 sec
  • LDAMP: 28.1 dB, 1.22 sec
SLIDE 28

Summary So Far

$$\arg\min_x \; \|y - \Phi x\|_2^2 \quad \text{subject to} \quad x \in C$$

$$\min_x \; \|y - \Phi x\|_2^2 + \lambda \cdot f(x)$$

Training data: $x_1, x_2, \ldots, x_L$

[Diagram: $y = \Phi x_o$ with an $M \times N$ matrix $\Phi$]
SLIDE 29

Data-Driven Computational Sensing

$$\min_x \; \|y - \Phi x\|_2^2 + \lambda \cdot f(x)$$

[Diagram: $y = \Phi x_o$ with an $M \times N$ matrix $\Phi$]
SLIDE 30

Data-Driven Computational Sensing

$$\min_x \; \|y - \Phi x\|_2^2 + \lambda \cdot f(x)$$

[Diagram: $y = \Phi x_o$ with an $M \times N$ matrix $\Phi$]

  • Mousavi, Maleki, Baraniuk, 'Consistent Parameter Estimation', Annals of Statistics 2017
  • Mousavi, Dasarathy, Baraniuk, 'Data-Driven Sparse Representation', ICLR 2019
SLIDE 31

Summary so far

$$\arg\min_x \; \|y - \Phi x\|_2^2 \quad \text{subject to} \quad x \in C$$

$$\min_x \; \|y - \Phi x\|_2^2 + \lambda \cdot f(x)$$

Training data: $x_1, x_2, \ldots, x_L$ (Supervised)

[Diagram: $y = \Phi x_o$ with an $M \times N$ matrix $\Phi$]
SLIDE 32

Next Step

$$\arg\min_x \; \|y - \Phi x\|_2^2 \quad \text{subject to} \quad x \in C$$

$$\min_x \; \|y - \Phi x\|_2^2 + \lambda \cdot f(x)$$

Training data: $x_1, x_2, \ldots, x_L$ → Unsupervised (no ground-truth training signals)

[Diagram: $y = \Phi x_o$ with an $M \times N$ matrix $\Phi$]
SLIDE 33

Stein's Unbiased Risk Estimator (SURE) [Stein '81]

  • A statistical model selection technique.
  • For $y = x + w$ with unknown $x$, $w \sim \mathcal{N}(0, \sigma^2 I)$, and a weakly differentiable estimator $f_\theta(\cdot)$, the risk can be estimated without access to $x$:

$$\mathbb{E}\!\left[\tfrac{1}{N}\|x - f_\theta(y)\|_2^2\right] = \mathbb{E}\!\left[\tfrac{1}{N}\|y - f_\theta(y)\|_2^2\right] - \sigma^2 + \tfrac{2\sigma^2}{N}\,\mathbb{E}\!\left[\operatorname{div}_y f_\theta(y)\right]$$
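(Illustrative sketch, my own example: a concrete check of SURE for the soft-thresholding denoiser, whose divergence is simply the number of entries above the threshold. The `sure` value uses only the noisy data, yet matches the true MSE closely.)

```python
import numpy as np

rng = np.random.default_rng(0)
N, sigma, tau = 10_000, 0.5, 0.7
x = np.where(rng.random(N) < 0.1, 3 * rng.standard_normal(N), 0.0)  # sparse signal
y = x + sigma * rng.standard_normal(N)

f = np.sign(y) * np.maximum(np.abs(y) - tau, 0.0)   # soft-threshold estimate
div = np.sum(np.abs(y) > tau)                       # analytic divergence

sure = np.mean((y - f) ** 2) - sigma**2 + 2 * sigma**2 * div / N
mse = np.mean((x - f) ** 2)
print(sure, mse)   # agree closely, with no access to x in `sure`
```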
SLIDE 34

Monte-Carlo SURE [Ramani, Blu, Unser, 2008]

  • For bounded functions, the divergence can be written as the limit of a randomized finite difference.
  • Challenge: computing the divergence of a black-box denoiser in closed form.
  • Approximation: with $b \sim \mathcal{N}(0, I)$ and a small $\epsilon$,

$$\operatorname{div}_y f(y) \approx \frac{1}{\epsilon}\, b^\top \left( f(y + \epsilon b) - f(y) \right)$$
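(Illustrative sketch with my own function names: the Monte Carlo divergence estimate above, applicable to any black-box denoiser `f`, and the resulting SURE value.)

```python
import numpy as np

def mc_divergence(f, y, eps=1e-3, seed=0):
    """Monte Carlo divergence of a black-box denoiser f at y, as in
    [Ramani, Blu, Unser, 2008]: div f(y) ~ b^T (f(y + eps*b) - f(y)) / eps."""
    b = np.random.default_rng(seed).standard_normal(y.shape)
    return b @ (f(y + eps * b) - f(y)) / eps

def mc_sure(f, y, sigma, eps=1e-3):
    """SURE with the Monte Carlo divergence: unbiased risk estimate without x."""
    N = y.size
    return (np.mean((y - f(y)) ** 2) - sigma**2
            + 2 * sigma**2 * mc_divergence(f, y, eps) / N)
```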
SLIDE 35

Denoising with Noisy Data

  • DnCNN Denoiser: a deep convolutional denoising network [Zhang et al. 2017]
  • Training Data: noisy images only (no clean ground truth)
  • Loss Function: MSE (requires ground truth) vs. SURE (requires only the noisy data and σ)
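(Illustrative PyTorch sketch, my own naming: the SURE training loss for a denoising network. Minimizing it needs only noisy images `y` and the noise level `sigma`; gradients flow through both network evaluations.)

```python
import torch

def sure_loss(net, y, sigma, eps=1e-3):
    """SURE surrogate for the unavailable MSE against clean images:
    ||y - net(y)||^2 / N - sigma^2 + (2 sigma^2 / N) * MC-divergence."""
    N = y.numel()
    f = net(y)
    b = torch.randn_like(y)
    div = (b * (net(y + eps * b) - f)).sum() / eps   # Monte Carlo divergence
    return ((y - f) ** 2).sum() / N - sigma**2 + 2 * sigma**2 * div / N
```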
SLIDE 36

Denoising with Noisy Data: Results

  • Original Noisy Image
  • BM3D: 26.0 dB, 4.01 sec.
  • DnCNN SURE: 26.5 dB, 0.04 sec.
  • DnCNN MSE: 26.7 dB, 0.04 sec.
SLIDE 37

Compressive Image Recovery w/ Noisy Data

  • Problem Formulation: $y = \Phi x + w$

Image: $x \in \mathbb{R}^N$. Measurements: $y \in \mathbb{R}^M$. Measurement Operator: $\Phi \in \mathbb{R}^{M \times N}$. Noise: $w \in \mathbb{R}^M$. Setting: $M \ll N$.

[Diagram: $y = \Phi x_o + w$ with an $M \times N$ matrix $\Phi$]
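(For concreteness, a tiny NumPy sketch of this measurement model; the dimensions and noise level are my own illustrative choices.)

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 64 * 64, 820                              # ~5x undersampling of a 64x64 image
Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # i.i.d. Gaussian measurement matrix
x = rng.random(N)                                # vectorized image (stand-in)
sigma_w = 0.01
y = Phi @ x + sigma_w * rng.standard_normal(M)   # noisy compressive measurements
```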
SLIDE 38

Recovery Algorithm

  • Learning Denoising-based AMP (LDAMP) neural network (for k = 1, …, K):

$$z^k = y - \Phi x^k + \frac{1}{m}\, z^{k-1} \operatorname{div} D^{k-1}_{\theta^{k-1}}\!\left(x^{k-1} + \Phi^* z^{k-1}\right)$$
$$\hat{\sigma}^k = \frac{\|z^k\|_2}{\sqrt{m}}, \qquad x^{k+1} = D^k_{\theta^k}\!\left(x^k + \Phi^* z^k\right)$$

  • Decouples image recovery into a series of denoising problems:

$$x^k + \Phi^* z^k = x_o + \sigma v$$

[Donoho et al. 2009, 2011] [Bayati and Montanari, 2011]

  • Layerwise Training of the LDAMP Network: train each denoiser $D^k_{\theta^k}$ in sequence, with either the MSE loss (needs ground-truth images) or the SURE loss (needs only the noisy measurements).

[Diagram: layer-by-layer training schedule, as on Slide 24]
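(Illustrative sketch tying together the pieces above; `LDAMPLayer` and `sure_loss` are the hypothetical helpers defined earlier, not the paper's code. Unsupervised LDAMP training just swaps the per-layer loss: each denoiser is trained on its own effective-noise denoising problem using only $(y, \Phi)$.)

```python
import torch

def train_ldamp_sure(layers, measurements, epochs=10):
    """Layer-by-layer LDAMP training with the SURE loss (no clean images)."""
    for k, layer in enumerate(layers):
        opt = torch.optim.Adam(layer.parameters(), lr=1e-4)
        for _ in range(epochs):
            for y, Phi in measurements:              # only measurements needed
                m, n = Phi.shape
                x, z = y.new_zeros(n), y.clone()
                with torch.no_grad():                # earlier layers are frozen
                    for frozen in layers[:k]:
                        x, z = frozen(x, z, y, Phi)
                sigma_k = z.norm() / m ** 0.5        # effective-noise level
                pseudo = x + Phi.T @ z               # ~ x_o + sigma_k * v
                loss = sure_loss(layer._denoise, pseudo, sigma_k)
                opt.zero_grad(); loss.backward(); opt.step()
    return layers
```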
SLIDE 39

Compressive Image Recovery

5x undersampling:

  • Original Image
  • BM3D-AMP: 31.3 dB, 13.2 sec.
  • LDAMP MSE: 34.6 dB, 0.4 sec.
  • LDAMP SURE: 31.9 dB, 0.4 sec.
SLIDE 40

Take-away Messages!

  • There are three major paradigms for signal acquisition.
  • Each paradigm puts resources on one of the sampling, modeling, or reconstruction tasks.
  • There seems to be a preservation of computation between different paradigms.

[Table: how Nyquist-rate sampling (~1900), compressive sensing (~2007), and our work allocate resources across sampling, modeling, and reconstruction]