Fast Data Driven Compressed Sensing and application to compressed - - PowerPoint PPT Presentation
Fast Data Driven Compressed Sensing and application to compressed - - PowerPoint PPT Presentation
IDCOM, University of Edinburgh Fast Data Driven Compressed Sensing and application to compressed quantitative MRI Mike Davies & Mohammad Golbabaee Joint work with Zhouye Chen, Yves Wiaux Zaid Mahbub, Ian Marshall IDCOM, University of
IDCOM, University of Edinburgh
Outline
- Iterative Projected Gradients (IPG)
- Approximate/inexact oracles
- Robustness of inexact IPG
- Application to data driven Compressed Sensing
- Fast MR Fingerprinting reconstruction
- IPG with Approximate Nearest Neighbour searches
- Cover trees for fast ANN
- Numerical results
IDCOM, University of Edinburgh
Inverse problems
π§=π΅π¦+π₯ββπΊβ πΊβπ , Β‘ Β‘ Β‘ Β‘ Β‘ Β‘ Β‘ Β‘ Β‘ Β‘ Β‘ Β‘ Β‘ Β‘ Β‘ Β‘π΅:βπΊβ πΊβπ ββπΊβ πΊβπ where πβͺπ
Challenge: Missing information Complete measurements can be costly, time consuming and sometimes just impossible! Compressed sensing to address the challenge
[Donohoβ06; CandΓ¨s, Romberg, and Tao,β06]
IDCOM, University of Edinburgh
Data models/priors
Manifolds
IDCOM, University of Edinburgh
Solving Compressed Sensing/Inv. Problems
Estimating by constrained least squares
βπ¦ βargminβ {π(π¦)ββ1/2 β||π§βπ΅π¦||β2β2 Β‘} Β‘ Β‘ Β‘ Β‘π‘.π’. Β‘ Β‘ Β‘π¦βπ·
!! NP-hard for most interesting models (e.g. sparsity [Natarajanβ95]) Iterative Gradient Projection (IPG) Generally proximal-gradient algorithms are very popular: βπ¦βπ =βπΈβ
πΈβπ· (π¦βπβ1 βπβAβT (βAxβkβ1 βy))
Gradient βAβT (βAxβkβ1 βy)), step size π, , Euclidean projection
βπΈβ πΈβπ· (π¦)βargminβ||π¦ββπ¦ββ² ||β2 Β‘ Β‘ Β‘ Β‘π‘.π’. Β‘ Β‘ Β‘ Β‘βπ¦ββ² βπ· Β‘
Signal/data model projection (observation) nonlinear approximation (reconstruction)
IDCOM, University of Edinburgh
Embedding: key to CS/IPG stability
Model π·ββπΊ βπΊβπ π π(π) π Β‘(unstructured) Β‘points βπββ2 log(π) βπβπβπΏ -flats βπββ2 (πΏ+log(π)) rank π Β‘ Β‘(ββ π Γββ π ) matrices βπββ2 π βπ βsmoothβ π dim. manifold βπββ2 π πΏ tree-sparse βπββ2 πΏ
[Johnson, Lindenstraussβ89] [Blumensath, Daviesβ09] [CandΓ¨s,Recht;Ma etal.β09] [Wakin,Baraniukβ09] [Baraniuk etal.β09]
Bi-Lipschitz embeddable sets: βπ¦,π¦β²βπ·
βπ½||π¦ββπ¦ββ² ||β2 β€||π΅(π¦ββπ¦ββ² )||β2 β€βπΎ||π¦ββπ¦ββ² ||β2 Theorem.[Blumensathβ11] For any (π·, Β‘π΅) Β‘if holds πΎβ€1.5π½, ,
IPG Γ ο‘ stable & linear convergence: |β|π¦βπΏ ββπ¦β0 ||βπ(π₯)+π, Β‘ Β‘ Β‘ Β‘πΏβΌ(log Β‘βπββ1 )
Global optimality even for nonconvex programs! Sample complexity e.g. π΅~ i.i.d. subgaussian ~ i.i.d. subgaussian
IDCOM, University of Edinburgh
A limitationβ¦
Exact oracles might be too expensive or even do not exist! Gradient βAβT (βAxβkβ1 βy))
- π΅ too large to fully access or fully compute/update πΌπ Β‘
too large to fully access or fully compute/update πΌπ Β‘
- Noise in communication in distributed solvers
Projection βπΈβ
πΈβπ· (π¦)βargminβ||π¦ββπ¦ββ² ||β2 Β‘π‘.π’. Β‘ Β‘βπ¦ββ² βπ·
- βπΈβ
πΈβπ· may not be analytic and requires solving an auxiliary optimization (e.g. inclusions π·= Β‘βπββ Β‘ βπ·βπ , total variation ball, low-rank, tree-sparse,β¦)
- βπΈβ
πΈβπ· might be NP hard! (e.g. analysis sparsity, low-rank tensor decomposition)
Is IPG robust against inexact/approximate oracles?
IDCOM, University of Edinburgh
Inexact oracles I: Fixed Precision
βπ¦βπ =βπΈ πΈ βπ· (π¦βπβ1 βπ Β‘βπΌ Β‘π(βπ¦βπβ1 ))
Fixed Precision (FP) approximate oracles:
β||βπΌ Β‘π(.)βπΌπ(.)||β2 β€βπβπ , Β‘ Β‘ Β‘ Β‘ Β‘β||βπΈβ πΈβπ· Β‘(.)ββπΈβ πΈβπ· (.)||β2 β€βπβπ , Β‘ Β‘ Β‘(β Β‘ Β‘βπΈ βπ· (.)βπ· Β‘) Examples: TV ball, inclusions (e.g. Djkstra alg.), and many moreβ¦ (in convex settings, Duality gap Γ ο‘ FP proj.)
Progressive Fixed Precision (PFP) oracles:
β||βπΌ Β‘π(.)βπΌπ(.)||β2 β€βπβπβπ , Β‘ Β‘ Β‘ Β‘ Β‘β||βπΈ πΈ Β‘(.)βπΈ(.)||β2 β€βπβπβπ Examples: Any FP oracle with progressive refinement of the approx. levels e.g. convex sparse CUR factorization for βπβπβπ βΌπ(1/βπβ3 ) [Schmidt etal.β11] x xβ C
IDCOM, University of Edinburgh
Inexact oracles II: (1+ 1+π)-optimal
(1+π)-approximate projections combined with FP/PFP gradient oracle
βπ¦βπ =βπΈβ
πΈβπ βπ· (π¦βπβ1 βπ Β‘βπΌβk Β‘π(βπ¦βπβ1 )) Gradient β||βπΌ βπ Β‘π(.)βπΌπ(.)||β2 β€βπβπβπ
Projection β||βπΈβ
πΈβπ βπ· (π¦)βπ¦||β2 β€β(π+π)||βπΈβ πΈβπ· Β‘(π¦)βπ¦||β2 Β‘ Β‘ Β‘ Β‘ Β‘
Examples. Many nonconvex constraints: Cheaper low-rank proxies based on randomized lin. algebra [Halko etal.β11], K-tree sparse signals [Hegde etal.β14], Tensor low-rank (Tucker) decomposition [Rauhut etal.β16], β¦ and shortly, ANN for data driven CS!
βπΈβ πΈβπ βπ· (π¦)βπ·
IDCOM, University of Edinburgh
Robustness & linear convergence
- f the
inexact IPG
IDCOM, University of Edinburgh
IPG with (P)FP oracles
βπ¦βπ =βπΈ πΈ βπ· (π¦βπβ1 βπ Β‘βπΌ Β‘π(βπ¦βπβ1 ))
Remark:
βπββπ supresses the early stages errors
β use βprogressiveβ approximations to get as good as exact!
- Theorem. For any β(π¦β0 βπ·, Β‘π·,π΅) Β‘if πΎβ€βπββ1 <2βπ½β0 then
||βπ¦βπ ββπ¦β0 ||β€βπβπ (||βπ¦β0 ||+βπ=1βπββπββπ βπβπ )+β2ββ πΎ /(1βπ)βπ½β0 π₯ where, π=ββ β1βπβπ½β0 Β‘ β1 Β‘ and βπβπ =β2βπβπβπ /βπ½β0 +ββ βπβπβπ /βππ½β0
IDCOM, University of Edinburgh
IPG with (P)FP oracles
βπ¦βπ =βπΈ πΈ βπ· (π¦βπβ1 βπ Β‘βπΌ Β‘π(βπ¦βπβ1 ))
Corollary I. After πΏ=π(log(βπββ1 ) Β‘) iterations IPG-FP achieves
||βπ¦βπΏ ββπ¦β0 ||β€π(π₯+βπβπ +ββ βπβπ )+π Β‘ Β‘ Β‘ Β‘ Β‘ Β‘ Β‘
linear convergence at rate π=ββ β1βπβπ½β0 Β‘ β1 Β‘ . Corollary II. Assume ββπ <1 Β‘ Β‘π‘.π’. Β‘ Β‘πβπ =π(βπ βπ ), then after πΏ=π(log(βπββ1 ) Β‘) iterations IPG-PFP achieves
||βπ¦βπΏ ββπ¦β0 ||β€π(π₯)+π Β‘ Β‘ Β‘ Β‘ Β‘ Β‘ Β‘
linear convergence at rate βπ ={ββ‘max Β‘(π, Β‘π ) Β‘ Β‘πβ π β π+π Β‘ Β‘ Β‘ Β‘ Β‘ Β‘ Β‘ Β‘ Β‘π=π Β‘ Β‘ (for any small π>0) Β‘
IDCOM, University of Edinburgh
IPG with (1+ 1+π) π)-approximate projection
βπ¦βπ =βπΈβ πΈβπ·βπ (π¦βπβ1 βπ Β‘βπΌβk Β‘π(βπ¦βπβ1 ))
- Theorem. Assume for any β(π¦β0 βπ·, Β‘π·,π΅) Β‘πππ Β‘ππ Β‘πβ₯0 it holds
ββ 2π+βπβ2 β€πβββ βπ½β0 /|||π΅||| Β‘ Β‘ Β‘ Β‘ Β‘ Β‘ Β‘ Β‘ Β‘ Β‘ Β‘πππ Β‘ Β‘ Β‘ Β‘ Β‘ Β‘ Β‘ Β‘ Β‘ Β‘πΎβ€βπββ1 <β(2β2π+βπβ2 )π½ββπ¦β0 Then, ||βπ¦βπ ββπ¦β0 ||β€βπβπ (||βπ¦β0 ||+βπβπ βπ=1βπββπββπ βπβπβπ )+βπβπ¨ /
(1βπ) π₯
where, π=ββ β1βπβπ½β0 Β‘ β1 +π Β‘ Β‘ Β‘ Β‘ Β‘ Β‘βπβπ =β2/βπ½β0 +ββπ/|||π΅||| π Β‘ Β‘ Β‘ Β‘ Β‘ Β‘πππ Β‘ Β‘ Β‘ Β‘βπβπ¨ =β2ββ πΎ /βπ½β0
+ββ π π
Remarks.
- Requires stronger embedding cond., slower convergence!
- Still linear conv. & π(π₯)+π Β‘ accuracy after π(log Β‘βπββ1 Β‘) iterations
- higher noise amplification
IDCOM, University of Edinburgh
Application in data driven CS
IDCOM, University of Edinburgh
Data driven CS
In the absence of (semi) algebraic physical models (l0, l1, rank,β¦) Collect a possibly large dictionary (sample the model)
π·=ββͺβπ=1,β¦,π {βπβπ } Β‘ Β‘ Β‘ Β‘ Β‘ Β‘ Β‘βπβπ Β‘ππ’πππ‘ Β‘ππ Β‘ Β‘ ‘ΨββπβπΓπ
Examples in multi-dim. imaging: Hyperspectral, Mass/Raman/MALDI,β¦ spectroscopy [Golbabaee et al β13;
Duarte,Baraniukβ12;β¦]
IDCOM, University of Edinburgh
MR Fingerprinting: fast CQ-MRI
Goal: Measuring fast the NMR properties (relaxation times: T1, T2) [Ma etal.β13]
1. Multiple random/optimized excitations (magnetic field rotation) [Cohenβ15;Mahbub, Golbabaee,
D., Marshall β16]
2. Subsampling the k-space (per excitation) 3. Construct a (huge) dictionary of βfingerprintsβ i.e. solving Bloch dynamic eq. for all possible parameters (T1,T2)β¦ 4. Use this dictionary to solve inverse problem
- Exact IPG [D. et al β15]
βπ Β‘m(t)/π Β‘π’ = Β‘m(t) ‘à ‘γ Β‘hRF(t) Β‘β Β‘ Β‘ Β‘(ββ‘βmβπ¦ (π’)/T2@βπβπ§ (π’)/T2@β(πβπ¨ (π’)ββmβeq )/T1 ) π Ξ¨=β[β¬]βπ
Continuous model (Bloch manifold) exists,
- but not explicit i.e. requires solving dynamic ODEs
- MRF sacrifices memory ~ enables βπΈβ
πΈβπ· computation
IDCOM, University of Edinburgh
CS in product (tensor) space
Per pixel data driven model π·=ββͺβπ=1,β¦,π {βπβπ } Β‘ Β‘ Β‘ Β‘ Β‘ Β‘ Β‘βπβπ Β‘ππ’πππ‘ Β‘ππ ‘ΨββπβπΓπ Multi dim. image: βπββπβπΓπ Β‘ Β‘π₯βππ π Β‘ Β‘ Β‘πβπ βπ· Β‘ Β‘ Β‘βπ=1,β¦,π Inverse problem: βminβ¬βπβπ€ππ ββπββπ· β||π§βπ΅(π)||β2 Direct recovery complexity O(βPβd )!
IPG (I-NNsearch-G)
βπβπβπ =πβπβπ· (β[βπβπβ1 βπβπ΅βπΌ (π΅(βπβπβ1 )βπ§)]βπ ), Β‘βπ Complexity = iter Γ (gradient+O(Pd)) Sufficient RIP stability π=π(π Β‘π) Β‘, (ignoring inter channel structures) π= RIP complexity of each channel
. . .
Big datasets?
MRF uses large dictionaries d~ 500K
Can we do better?
IDCOM, University of Edinburgh
Fast NN searches
Trees: (historical) approach to fast NN
- Hierarchical partitioning + Brand & bound
- search. e.g., kd-trees, Metric/Ball-trees,β¦
βCurse of dimensionalityβ: exact NN in βπΊβ
πΊβπ cannot achieve π(ππ) Β‘in time with a reasonable memory. Approximate NN can! If datasets ~ low dim.
Nagivating nets, Cover trees
[Krauthgamer,Leeβ04; Beygelzimmerβ06]
At scale π Β‘= Β‘1,β¦,π
- Covering (parent nodes) βπ2ββl
- Separation (nodes appearing at scale l) βπ2ββl
Kd-tree Partitions Search
CT builds multi-resolution cover-nets Brand & bound search
IDCOM, University of Edinburgh
CT provably good for low dim data
Search options with cover trees
1. (1+π)-ANN: as proposed by [Beygelzimmer et al.β06]
- 2. FP-ANN: truncated tree L=βlogβ(πβπ /π)β+ exact NN
(complexity could be arbitrary large/theoretically)
- 3. PFP-ANN: progressing on truncation level βπβπβπ =π(βπ βπ ), Β‘ Β‘π <1
Inexact IPG
βπβπβπ =βπ΅ππβπ· (β[βπβπβ1 βπβπ΅βπΌ (π΅(βπβπβ1 )βπ§)]βπ ), Β‘βπ
- Theorem. Cover tree (1+π)-ANN complexity: [Krauthgamer + Lee 04]
β2βπ«(βdimβ (π·) ) βlogβ Ξ +βπββπ«(βdimβ (π·) ) in time, π(π) space
(typically βlogβ Ξ =π«(βlogβ (π) ))
IDCOM, University of Edinburgh
Numerical experiments 1: Toy problem
IDCOM, University of Edinburgh
2D manifold data
π·=ββͺβπ=1,β¦,π {βπβπ } Β‘ Β‘ Β‘βπβπ Β‘ππ’πππ‘ ‘ΨββπβπΓπ CT level 2 CT level 3 CT level 4 CT level 5
IDCOM, University of Edinburgh
Solution accuracy vs. Iterations (FP)
Signal: πββπΊβ
πΊβπΓπ Β‘π=200 Β‘, Β‘π=50 Β‘(randomly chosen βC) mπΓππ i.i.d. Normal A, CS ratio=m/n (noiseless)
i.i.d. Normal A, CS ratio=m/n (noiseless)
βminβ¬βπβπ€ππ ββπββπ· β||π§βπ΅(π)||β2 Β‘ Β‘ Β‘ Β‘β Β‘βπβπβπ =π΅πβπβπ· (β[βπβπβ1 βπβπ΅βπΌ (π΅(βπβπ β1 )βπ§)]βπ ), Β‘βπ Swiss roll
IDCOM, University of Edinburgh
Solution accuracy vs. Iterations (PFP)
Signal: πββπΊβ
πΊβπΓπ Β‘π=200 Β‘, Β‘π=50 Β‘(randomly chosen βC) mπΓππ i.i.d. Normal A, CS ratio=m/n (noiseless)
i.i.d. Normal A, CS ratio=m/n (noiseless)
βminβ¬βπβπ€ππ ββπββπ· β||π§βπ΅(π)||β2 Β‘ Β‘ Β‘ Β‘β Β‘βπβπβπ =π΅πβπβπ· (β[βπβπβ1 βπβπ΅βπΌ (π΅(βπβπ β1 )βπ§)]βπ ), Β‘βπ Swiss roll βπβπβπ =βπ βπ
IDCOM, University of Edinburgh
Solution accuracy vs. Iterations (1+ 1+π)-ANN
Signal: πββπΊβ
πΊβπΓπ Β‘π=200 Β‘, Β‘π=50 Β‘(randomly chosen βC) mπΓππ i.i.d. Normal A, CS ratio=m/n (noiseless)
i.i.d. Normal A, CS ratio=m/n (noiseless)
βminβ¬βπβπ€ππ ββπββπ· β||π§βπ΅(π)||β2 Β‘ Β‘ Β‘ Β‘β Β‘βπβπβπ =βπ΅ππβπ· (β[βπβπβ1 βπβπ΅βπΌ (π΅(βπβπ β1 )βπ§)]βπ ), Β‘βπ Swiss roll
IDCOM, University of Edinburgh
Phase transitions
(1+π)-ANN IPG
PFP-ANN IPG Β‘ Β‘ Β‘βπβπβπ =βπ βπ
Signal: πββπΊβ
πΊβπΓπ Β‘π=200 Β‘, Β‘π=50 Β‘(randomly chosen βC) mπΓππ i.i.d. Normal A, CS ratio=m/n (noiseless) ~ averaged 25trials
i.i.d. Normal A, CS ratio=m/n (noiseless) ~ averaged 25trials
Recovery PT: Black/white = low/high sol. nMSE, red curve = recovery region nMSE<10e-4)
IDCOM, University of Edinburgh
Numerical experiments 2: MRF
IDCOM, University of Edinburgh
MRF cone & EPI acquisition
- K-space subsampling: Eco Planar Imaging (EPI)
- Anatomical phantom {Grey/White matters, CSF, muscle, skin}
- Bloch eq. dictionary Ξ¨ββββ512 Β‘Γ~ Β‘50β²000
- βminβ¬ β||π§βπ΅(π)||β2 Β‘ Β‘ Β‘ Β‘π‘.π’. Β‘ Β‘ Β‘ Β‘βπβπ€ππ ββπββππππ(Ξ¨)
- 1. βπβπ =β Β‘πβπβkβ1 βπβAβH (β Β‘A(πβπβkβ1 )βy))
- 2. βπβπ =π΅πβπβπππ πππππ‘ππ{Ξ¨} (βπβπβπ /||βπβπβπ ||)
- 3. βπβπβπ =βπβπ /βπβπ
- 4. βπβπβπ =βπβπ βπβπ , Β‘ Β‘βπ
π Ξ¨=β[β¬]βπ
PD map T2map T1 map
IDCOM, University of Edinburgh Dominant costΓ ο‘ NN/ANN (since π΅ is FFT) is FFT) Projection cost = # matches calculated (i.e. visited nodes on the tree)
Accuracy vs. computation
Brute force CTβs exact NN
π Β‘~ 0.2-0.8
IDCOM, University of Edinburgh
Summary
- IPG robustness to inexact oracles (under embedding assumption)
- Linear convergence result:
- PFP/(1+π)-oracles: same final accuracy vs. exact IPG
- PFP: same convergence rate vs. exact IPG
- (1+π): stronger assumptions/sensitive to conditioning of A
- Implications in data driven CS (using ANN)
- Cover trees for fast ANN: complexity ~ intrinsic dim(data)
- O(10e3) faster parameter estimation in MRF