

SLIDE 1

 Suppose the matrix A = ΦΨ of size m × n (where the sensing matrix Φ has size m × n, and the basis matrix Ψ has size n × n) has the RIP of order 2S with δ_2S < 0.41. Let θ* denote the solution of the following problem (for signal f = Ψθ, measurement vector y = ΦΨθ):

$$\min_{\boldsymbol\theta} \|\boldsymbol\theta\|_1 \ \text{ such that } \ \|\mathbf{y} - \boldsymbol\Phi\boldsymbol\Psi\boldsymbol\theta\|_2 \leq \varepsilon$$

Then we have:

$$\|\boldsymbol\theta^* - \boldsymbol\theta\|_2 \;\leq\; \frac{C_1\,\|\boldsymbol\theta - \boldsymbol\theta_S\|_1}{\sqrt{S}} + C_2\,\varepsilon$$

Here θ_S is created by retaining the S largest magnitude elements of θ, and setting the rest to 0.

SLIDE 2

 The proof can be found in the paper by Candès, "The restricted isometry property and its implications for compressed sensing", published in 2008.

 The proof as such is just between 1 and 2 pages long.

SLIDE 3

 The proof uses various properties of vectors in Euclidean space.

 The Cauchy-Schwarz inequality:

$$|\langle \mathbf{w}, \mathbf{v} \rangle| \leq \|\mathbf{w}\|_2\,\|\mathbf{v}\|_2$$

 The triangle inequality:

$$\|\mathbf{w} + \mathbf{v}\|_2 \leq \|\mathbf{w}\|_2 + \|\mathbf{v}\|_2$$

 The reverse triangle inequality:

$$\|\mathbf{w} + \mathbf{v}\|_2 \geq \|\mathbf{w}\|_2 - \|\mathbf{v}\|_2$$
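As a quick numerical sanity check, a minimal Python snippet (illustrative only; the vectors are arbitrary) verifying the three inequalities on random vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
w, v = rng.standard_normal(10), rng.standard_normal(10)

assert abs(np.dot(w, v)) <= np.linalg.norm(w) * np.linalg.norm(v)      # Cauchy-Schwarz
assert np.linalg.norm(w + v) <= np.linalg.norm(w) + np.linalg.norm(v)  # triangle inequality
assert np.linalg.norm(w + v) >= np.linalg.norm(w) - np.linalg.norm(v)  # reverse triangle
```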

SLIDE 4

 Relationship between various norms: a vector v ∈ ℝⁿ is k-sparse if at most k of its entries are non-zero. For such a vector:

$$\|\mathbf{v}\|_2 \leq \|\mathbf{v}\|_1 \leq \sqrt{k}\,\|\mathbf{v}\|_2$$

 Refer to Theorem 3. For the sake of simplicity alone, we shall assume Ψ to be the identity matrix. Hence x = θ. Even if Ψ were not the identity, the proof as such does not change.
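A quick numerical check of this norm relationship (a sketch; k = 3 and the dimension are arbitrary choices):

```python
import numpy as np

v = np.zeros(100)
v[[3, 41, 77]] = np.random.default_rng(1).standard_normal(3)  # a 3-sparse vector
n1, n2 = np.linalg.norm(v, 1), np.linalg.norm(v, 2)
assert n2 <= n1 <= np.sqrt(3) * n2  # ||v||_2 <= ||v||_1 <= sqrt(k) ||v||_2
```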

SLIDE 5

 We have:

$$\|\boldsymbol\Phi(\mathbf{x}^* - \mathbf{x}_0)\|_2 \;\leq\; \|\boldsymbol\Phi\mathbf{x}^* - \mathbf{y}\|_2 + \|\boldsymbol\Phi\mathbf{x}_0 - \mathbf{y}\|_2 \;\leq\; 2\varepsilon$$

The first inequality is the triangle inequality; the second follows from the given constraint (‖y − Φx₀‖₂ ≤ ε) plus feasibility of the solution x*.

 This result is called the Tube constraint: the true signal x₀ lies within a tube of radius 2ε around the set {x : Φx = y}. In the following, x₀ = true signal.

SLIDE 6

 Define the vector h = x* − x₀.

 Decompose h into vectors h_{T0}, h_{T1}, h_{T2}, …, which are all at most s-sparse: T0 contains the indices of the s largest magnitude elements of x₀, T1 contains the indices of the s largest magnitude elements of h_{T0ᶜ} = h − h_{T0}, T2 contains the indices of the s largest magnitude elements of h_{(T0 ∪ T1)ᶜ} = h − h_{T0} − h_{T1}, and so on.

SLIDE 7

 We will assume x₀ is s-sparse (later we will remove this requirement).

 We now establish the so-called cone constraint. Since x* minimizes the ℓ₁ norm and x₀ is feasible:

$$\|\mathbf{x}_0\|_1 \geq \|\mathbf{x}^*\|_1 = \|\mathbf{x}_0 + \mathbf{h}\|_1$$

$$\|\mathbf{x}_0 + \mathbf{h}\|_1 = \sum_{i \in T_0}|x_{0,i} + h_i| + \sum_{i \in T_0^c}|x_{0,i} + h_i| \;\geq\; \|\mathbf{x}_{0,T_0}\|_1 - \|\mathbf{h}_{T_0}\|_1 + \|\mathbf{h}_{T_0^c}\|_1 - \|\mathbf{x}_{0,T_0^c}\|_1$$

Since x₀ is s-sparse, x_{0,T₀ᶜ} = 0 and ‖x_{0,T₀}‖₁ = ‖x₀‖₁, which yields the cone constraint:

$$\|\mathbf{h}_{T_0^c}\|_1 \leq \|\mathbf{h}_{T_0}\|_1$$

 The vector h has its origin at x₀ and it lies in the intersection of the L1 ball and the tube. The vector h must also necessarily obey this constraint, the cone constraint.

SLIDE 8

 We will now prove that such a vector h is approximately orthogonal to the null-space of Φ.

 In fact, we will prove that:

$$\|\mathbf{h}\|_2 \leq C\,\|\boldsymbol\Phi\mathbf{h}\|_2$$

for a constant C depending only on δ_2s.

 In other words, we will prove that the magnitude of h is not much greater than 2ε, which means that the solution x* of the optimization problem is close enough to x₀.

SLIDE 9

 In step 3, we use a series of algebraic manipulations to prove that the magnitude of h outside of T0 ∪ T1 is upper bounded by the magnitude of h on T0 ∪ T1.

 In other words, we prove that:

$$\|\mathbf{h}_{(T_0 \cup T_1)^c}\|_2 \;\leq\; \|\mathbf{h}_{T_0}\|_2 \;\leq\; \|\mathbf{h}_{T_0 \cup T_1}\|_2$$

 The algebra involves the various inequalities mentioned earlier.

SLIDE 10

 We now prove that the magnitude of h on T0 ∪ T1 is upper bounded by a reasonable quantity.

 For this, we show using the RIP of Φ of order 2s and a series of manipulations that:

$$(1-\delta_{2s})\,\|\mathbf{h}_{T_0 \cup T_1}\|_2^2 \;\leq\; \|\boldsymbol\Phi\mathbf{h}_{T_0 \cup T_1}\|_2^2 \;\leq\; \|\mathbf{h}_{T_0 \cup T_1}\|_2\left(2\varepsilon\sqrt{1+\delta_{2s}} + \sqrt{2}\,\delta_{2s}\,\|\mathbf{h}_{T_0}\|_2\right)$$

 This implies that:

$$\|\mathbf{h}_{T_0 \cup T_1}\|_2 \;\leq\; \frac{2\sqrt{1+\delta_{2s}}}{1-(1+\sqrt{2})\,\delta_{2s}}\,\varepsilon, \quad\text{and hence (using the previous slide)}\quad \|\mathbf{h}\|_2 \;\leq\; 2\,\|\mathbf{h}_{T_0\cup T_1}\|_2 \;\leq\; \frac{4\sqrt{1+\delta_{2s}}}{1-(1+\sqrt{2})\,\delta_{2s}}\,\varepsilon$$

SLIDE 11

 The steps change a bit when x₀ is not exactly s-sparse. The cone constraint changes to:

$$\|\mathbf{x}_0\|_1 \geq \|\mathbf{x}^*\|_1 = \|\mathbf{x}_0 + \mathbf{h}\|_1$$

$$\|\mathbf{x}_0 + \mathbf{h}\|_1 \;\geq\; \|\mathbf{x}_{0,T_0}\|_1 - \|\mathbf{h}_{T_0}\|_1 + \|\mathbf{h}_{T_0^c}\|_1 - \|\mathbf{x}_{0,T_0^c}\|_1 \;\Rightarrow\; \|\mathbf{h}_{T_0^c}\|_1 \;\leq\; \|\mathbf{h}_{T_0}\|_1 + 2\,\|\mathbf{x}_{0,T_0^c}\|_1$$

SLIDE 12

 All the other steps remain as is, except the last one, which produces the following bound:

$$\|\mathbf{h}\|_2 \;\leq\; \frac{4\sqrt{1+\delta_{2s}}}{1-(1+\sqrt{2})\,\delta_{2s}}\,\varepsilon \;+\; \frac{2\left(1+(\sqrt{2}-1)\,\delta_{2s}\right)}{1-(1+\sqrt{2})\,\delta_{2s}} \cdot \frac{\|\mathbf{x}_0 - \mathbf{x}_{0,T_0}\|_1}{\sqrt{s}}$$

SLIDE 13

 Step 3 of the proof uses the following corollary of the RIP for two s-sparse unit vectors x₁, x₂ with disjoint support:

$$|\langle \boldsymbol\Phi\mathbf{x}_1, \boldsymbol\Phi\mathbf{x}_2 \rangle| \leq \delta_{2s}$$

 Proof of corollary: x₁ ± x₂ is 2s-sparse with ‖x₁ ± x₂‖₂² = 2 (unit norms, disjoint supports), so by the RIP:

$$(1-\delta_{2s})\,\|\mathbf{x}_1 \pm \mathbf{x}_2\|_2^2 \;\leq\; \|\boldsymbol\Phi(\mathbf{x}_1 \pm \mathbf{x}_2)\|_2^2 \;\leq\; (1+\delta_{2s})\,\|\mathbf{x}_1 \pm \mathbf{x}_2\|_2^2$$

Hence, by the polarization identity:

$$|\langle \boldsymbol\Phi\mathbf{x}_1, \boldsymbol\Phi\mathbf{x}_2 \rangle| = \tfrac{1}{4}\left|\,\|\boldsymbol\Phi(\mathbf{x}_1+\mathbf{x}_2)\|_2^2 - \|\boldsymbol\Phi(\mathbf{x}_1-\mathbf{x}_2)\|_2^2\,\right| \;\leq\; \tfrac{1}{4}\left[2(1+\delta_{2s}) - 2(1-\delta_{2s})\right] = \delta_{2s}$$
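The polarization identity at the heart of this proof is easy to verify numerically (a minimal sketch; the matrix, sparsity, and supports are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
Phi = rng.standard_normal((20, 50)) / np.sqrt(20)
x1, x2 = np.zeros(50), np.zeros(50)
x1[:5] = rng.standard_normal(5)     # s-sparse, support {0..4}
x2[10:15] = rng.standard_normal(5)  # s-sparse, disjoint support {10..14}

lhs = np.dot(Phi @ x1, Phi @ x2)
rhs = 0.25 * (np.linalg.norm(Phi @ (x1 + x2))**2 - np.linalg.norm(Phi @ (x1 - x2))**2)
assert np.isclose(lhs, rhs)  # <Phi x1, Phi x2> = (||Phi(x1+x2)||^2 - ||Phi(x1-x2)||^2) / 4
```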

SLIDE 14

 This step also uses the following corollary of the RIP for two s-sparse unit vectors with disjoint support:

$$|\langle \boldsymbol\Phi\mathbf{x}_1, \boldsymbol\Phi\mathbf{x}_2 \rangle| \leq \delta_{2s}$$

 What if the original vectors x₁ and x₂ were not unit vectors, but both were s-sparse (still with disjoint supports)? Then, applying the corollary to x₁/‖x₁‖₂ and x₂/‖x₂‖₂:

$$|\langle \boldsymbol\Phi\mathbf{x}_1, \boldsymbol\Phi\mathbf{x}_2 \rangle| \leq \delta_{2s}\,\|\mathbf{x}_1\|_2\,\|\mathbf{x}_2\|_2$$

SLIDE 15

 The bound is:

$$\|\mathbf{h}\|_2 \;\leq\; \frac{4\sqrt{1+\delta_{2s}}}{1-(1+\sqrt{2})\,\delta_{2s}}\,\varepsilon \;+\; \frac{2\left(1+(\sqrt{2}-1)\,\delta_{2s}\right)}{1-(1+\sqrt{2})\,\delta_{2s}} \cdot \frac{\|\mathbf{x}_0 - \mathbf{x}_{0,T_0}\|_1}{\sqrt{s}}$$

 Note the requirement that δ_2s should be less than √2 − 1 ≈ 0.414 (i.e., 2^{0.5} − 1).

 You can prove that the two constant factors, one before ε and the other before ‖x₀ − x₀,T₀‖₁, are both increasing functions of δ_2s in the domain [0, √2 − 1).

 So sensing matrices with smaller values of δ_2s are always nicer!

SLIDE 16

 Suppose the matrix A = ΦΨ of size m × n (where the sensing matrix Φ has size m × n, and the basis matrix Ψ has size n × n) has the RIP of order S with δ_S < 0.307. Let θ* denote the solution of the following problem (for signal f = Ψθ, measurement vector y = ΦΨθ):

$$\min_{\boldsymbol\theta} \|\boldsymbol\theta\|_1 \ \text{ such that } \ \|\mathbf{y} - \boldsymbol\Phi\boldsymbol\Psi\boldsymbol\theta\|_2 \leq \varepsilon$$

Then we have:

$$\|\boldsymbol\theta^* - \boldsymbol\theta\|_2 \;\leq\; \frac{C_1\,\|\boldsymbol\theta - \boldsymbol\theta_S\|_1}{\sqrt{S}} + C_2\,\varepsilon$$

where the constants C₁ and C₂ depend only on δ_S and remain finite as long as δ_S < 0.307. θ_S is created by retaining the S largest magnitude elements of θ, and setting the rest to 0.

SLIDE 17

 Theorems 3, 5 and 6 refer to orthonormal bases in which the signal has a sparse or compressible representation.

 However, that is not a necessary condition. There exist the so-called "over-complete bases" (dictionaries) in which the number of columns exceeds the number of rows (n × K, K > n).

 Such matrices afford even sparser signal representations.

SLIDE 18

 Why? We explain with an example.

 A cosine wave (with grid-aligned frequency) will have a sparse representation in the DCT basis V1.

 An impulse signal has a sparse representation in the identity basis V2.

 Now consider a signal which is the superposition of a small number of cosines and impulses.

 The combined signal has a sparse representation in neither the DCT basis nor the identity basis.

 But the combined signal will have a sparse representation in the combined dictionary [V1 V2].
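A small numpy illustration of this point (a sketch; the dimensions, atom indices, and coefficient values are arbitrary choices): a signal built from one DCT atom plus one impulse is dense in each basis alone, but exactly 2-sparse in the concatenated dictionary.

```python
import numpy as np

n = 64
t = np.arange(n)
# Orthonormal DCT-II basis: column f is a sampled cosine of frequency f
V1 = np.cos(np.pi * (2 * t[:, None] + 1) * np.arange(n)[None, :] / (2 * n)) * np.sqrt(2 / n)
V1[:, 0] /= np.sqrt(2)
V2 = np.eye(n)  # identity (impulse) basis

x = 3.0 * V1[:, 5] + 2.0 * V2[:, 17]  # one cosine + one impulse

print(np.sum(np.abs(V1.T @ x) > 1e-10))  # dense in the DCT basis: the impulse spreads out
print(np.sum(np.abs(V2.T @ x) > 1e-10))  # dense in the identity basis: the cosine spreads out
# In the dictionary [V1 V2], x = 3 * (atom 5 of V1) + 2 * (atom 17 of V2): exactly 2-sparse.
```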

SLIDE 19

 We know that certain classes of random matrices satisfy the RIP with very high probability.

 However, we also know that small RICs are desirable.

 This gives rise to the question: can we design matrices with a smaller RIC than a randomly generated matrix?

SLIDE 20

 Unfortunately, there is no known efficient algorithm for even computing the RIC of a given fixed matrix!

 But we know that the mutual coherence μ of ΦΨ gives an upper bound on the RIC:

$$\delta_s \leq (s-1)\,\mu$$

 So we can design a CS matrix by starting with a random one, and then performing gradient descent on the mutual coherence to reach a matrix with a smaller mutual coherence!

SLIDE 21

 The procedure is summarized below:

Randomly pick an m × n matrix Φ.
Repeat until convergence: { Φ ← Φ − α ∇_Φ μ(ΦΨ) }

Pick the step-size α adaptively so that you actually descend on the mutual coherence.

$$\mu(\boldsymbol\Phi\boldsymbol\Psi) = \max_{i \neq j} \frac{\left|(\boldsymbol\Phi\boldsymbol\Psi)_i^T(\boldsymbol\Phi\boldsymbol\Psi)_j\right|}{\|(\boldsymbol\Phi\boldsymbol\Psi)_i\|_2\,\|(\boldsymbol\Phi\boldsymbol\Psi)_j\|_2}$$

where (ΦΨ)_i denotes the i-th column of ΦΨ.
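In code, the mutual coherence of the effective dictionary D = ΦΨ can be computed as follows (a minimal sketch; the function name is mine):

```python
import numpy as np

def mutual_coherence(D):
    """mu(D) = max over i != j of |<d_i, d_j>| / (||d_i||_2 ||d_j||_2)."""
    Dn = D / np.linalg.norm(D, axis=0, keepdims=True)  # normalize each column
    G = np.abs(Dn.T @ Dn)                              # absolute normalized Gram matrix
    np.fill_diagonal(G, 0.0)                           # exclude the diagonal (i == j)
    return G.max()
```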

SLIDE 22

 The aforementioned is one example of a procedure to "design" a CS matrix, as opposed to picking one randomly.

 Note that mutual coherence has one more advantage over the RIC: the former is not tied to any particular sparsity level!

 But one must bear in mind that the mutual coherence is only an upper bound on the RIC!

SLIDE 23

 The main problem is how to find a derivative of the "max" function, which is non-differentiable!

 Use the softmax (log-sum-exp) function, which is differentiable:

$$\max\{x_1, \ldots, x_n\} = \lim_{\lambda \to \infty} \frac{1}{\lambda}\log\sum_{i=1}^{n}\exp(\lambda x_i)$$
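A short numerical illustration of this limit (the vector and the λ values are arbitrary):

```python
import numpy as np

def smooth_max(x, lam):
    """(1/lam) * log(sum_i exp(lam * x_i)): a smooth, differentiable surrogate for max(x)."""
    z = lam * np.asarray(x)
    m = z.max()  # subtract the max before exponentiating, for numerical stability
    return (m + np.log(np.exp(z - m).sum())) / lam

x = np.array([0.3, 0.9, 0.5])
for lam in (1.0, 10.0, 100.0):
    print(lam, smooth_max(x, lam))  # tends to max(x) = 0.9 as lam grows
```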

SLIDE 24

 This method can now be used to design CS matrices.

 As the mutual coherence function is expected to be very non-convex, one must use a multi-start strategy in practice.

 For each "start", you begin with a random Φ and do gradient descent on μ till convergence.

 Repeat this procedure many times, each time beginning from a different randomly chosen initial condition.

 Among all these starts, choose the result with the smallest value of μ. (A sketch of the full procedure is given below.)
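A minimal sketch of the whole procedure. All function names and parameter values are illustrative assumptions, and the gradient is taken numerically purely to keep the sketch short; a real implementation would use an analytic gradient or automatic differentiation.

```python
import numpy as np

def smooth_max(x, lam):
    """Differentiable surrogate for max(x): (1/lam) * log-sum-exp(lam * x)."""
    z = lam * np.asarray(x)
    m = z.max()
    return (m + np.log(np.exp(z - m).sum())) / lam

def smooth_coherence(Phi, Psi, lam=100.0):
    """Softmax-smoothed mutual coherence of D = Phi @ Psi."""
    D = Phi @ Psi
    Dn = D / np.linalg.norm(D, axis=0, keepdims=True)
    G = np.abs(Dn.T @ Dn)
    return smooth_max(G[~np.eye(G.shape[0], dtype=bool)], lam)  # off-diagonal entries only

def numerical_grad(f, X, eps=1e-5):
    """Central-difference gradient of a scalar function of a matrix (slow; sketch only)."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

def design_cs_matrix(m, n, Psi, n_starts=5, n_iters=100, seed=0):
    """Multi-start gradient descent on the smoothed coherence; keep the best start."""
    rng = np.random.default_rng(seed)
    best_Phi, best_mu = None, np.inf
    for _ in range(n_starts):
        Phi = rng.standard_normal((m, n))
        for _ in range(n_iters):
            g = numerical_grad(lambda P: smooth_coherence(P, Psi), Phi)
            step = 1.0  # adaptive step-size: halve until the objective actually decreases
            while step > 1e-10 and smooth_coherence(Phi - step * g, Psi) >= smooth_coherence(Phi, Psi):
                step /= 2
            Phi = Phi - step * g
        mu = smooth_coherence(Phi, Psi)
        if mu < best_mu:
            best_Phi, best_mu = Phi, mu
    return best_Phi, best_mu
```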

SLIDE 25

 In many cases, one needs additional constraints on the matrix Φ.

 For example, in the Hitomi video camera architecture, Φ is a concatenation of non-negative and diagonal matrices.

 The non-negativity can be imposed by means of projected gradient descent.

 See the next slides for the modified algorithm to maintain non-negativity.

SLIDE 26

https://arxiv.org/abs/1609.02135

SLIDE 27

Randomly pick an m × n matrix Φ.
Repeat until convergence: { Φ ← Φ − α ∇_Φ μ(Φ); set the negative entries in Φ to 0. }

Pick the step-size adaptively so that you actually descend on the mutual coherence after setting the negative entries to 0.
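Relative to the earlier design sketch, the only change is that each update is followed by a projection onto the non-negative orthant (a minimal illustration; the function name is mine):

```python
import numpy as np

def projected_step(Phi, g, step):
    """One projected gradient step: descend on the coherence, then zero out negatives."""
    return np.maximum(Phi - step * g, 0.0)
```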

SLIDE 28

 This method does not directly target μ but instead considers the Gram matrix DᵀD, where D = ΦΨ with all columns unit-normalized.

 The aim is to design Φ in such a way that the Gram matrix resembles the identity matrix as much as possible; in other words, we want:

$$\boldsymbol\Psi^T\boldsymbol\Phi^T\boldsymbol\Phi\boldsymbol\Psi \approx \mathbf{I}$$
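As a sketch, this objective can be coded directly as a Frobenius distance and minimized with the same gradient-descent machinery as before (the function name is an assumption of mine):

```python
import numpy as np

def gram_objective(Phi, Psi):
    """|| D^T D - I ||_F^2 for D = Phi @ Psi with unit-normalized columns."""
    D = Phi @ Psi
    D = D / np.linalg.norm(D, axis=0, keepdims=True)
    G = D.T @ D
    return np.sum((G - np.eye(G.shape[0])) ** 2)
```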

SLIDE 29

 We have extensively examined the issue of reconstruction of signals or images from compressive measurements: algorithms, systems, as well as theory (theorems).

 Now imagine you had compressive measurements for each of a set of K classes of images.

 The task is to classify the measurements into one of the K categories without intermediate reconstruction.

 This is called the problem of compressive classification.

SLIDE 30

 Consider a vector y which is a noisy measurement of a vector x in the following way: y = x + η, η ~ N(0, σ²I).

 Let us suppose that x belongs to one of P classes, each class containing a single representative vector s_i, 1 ≤ i ≤ P.

 The likelihood that y is a noisy sample from the i-th class is given as:

$$p(\mathbf{y}\,|\,\mathbf{s}_i) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\frac{\|\mathbf{y}-\mathbf{s}_i\|_2^2}{2\sigma^2}\right)$$

SLIDE 31

 The maximum likelihood classifier assigns to y the class j such that:

$$j = \arg\max_{k \in \{1,2,\ldots,P\}} p(\mathbf{y}\,|\,\mathbf{s}_k) = \arg\max_{k \in \{1,2,\ldots,P\}} \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\frac{\|\mathbf{y}-\mathbf{s}_k\|_2^2}{2\sigma^2}\right)$$

 By taking the negative log, we see that this reduces to a nearest neighbour classifier with Euclidean distance.
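In code, this classifier is a one-line nearest neighbour search (a minimal sketch; the array layout is my assumption):

```python
import numpy as np

def ml_classify(y, S):
    """ML classification under white Gaussian noise = nearest neighbour in Euclidean distance.

    y: observed vector of length n; S: (P, n) array whose i-th row is the template s_i."""
    return int(np.argmin(np.linalg.norm(S - y, axis=1)))  # -log p(y|s_i) grows with ||y - s_i||
```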

SLIDE 32

 Consider that the earlier problem was an image classification problem where we had P image templates in a database.

 We observed Gaussian noisy versions of one of these templates and wanted to determine which one it was.

 Now, in addition, let us suppose that the noisy version of the image was acquired in some different "pose", i.e. with some translation and/or rotation.

 Let the pose parameters be denoted by a vector θ which belongs to a set of values Θ.

 In such a case, this becomes a joint problem of classification as well as pose estimation.

SLIDE 33

 The problem is solved as follows (this is the generalized maximum likelihood classifier, or GMLC):

$$j = \arg\max_{k \in \{1,2,\ldots,P\}} p(\mathbf{y}\,|\,\mathbf{s}_k, \tilde{\boldsymbol\theta}_k), \qquad p(\mathbf{y}\,|\,\mathbf{s}_i, \tilde{\boldsymbol\theta}_i) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\frac{\|\mathbf{y}-\mathbf{s}_i(\tilde{\boldsymbol\theta}_i)\|_2^2}{2\sigma^2}\right), \qquad \tilde{\boldsymbol\theta}_i = \arg\max_{\boldsymbol\theta \in \Theta} p(\mathbf{y}\,|\,\mathbf{s}_i, \boldsymbol\theta)$$

Here s_i(θ̃_i) denotes image s_i deformed by the pose parameter θ̃_i. By "deformation" we mean rotation and/or translation in this example. In general, it could mean any other type of transformation, including geometric scaling, blurring, etc.

SLIDE 34

 If the parameters θ denoted pure translation, then one way to classify y is to determine for which i the following quantity is maximized:

$$\int \mathbf{s}_i(\mathbf{t} - \boldsymbol\theta)\,\mathbf{y}(\mathbf{t})\,d\mathbf{t}$$

where t = (x, y) denotes spatial coordinates.

 This is called a matched filter, and it is equivalent to the GMLC if all the candidate signals s_i have the same magnitude and we assume additive white Gaussian noise.
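For 1-D signals under pure translation, the matched filter score can be computed with a discrete cross-correlation (a sketch; `np.correlate` slides the template over the observation):

```python
import numpy as np

def matched_filter_peak(s, y):
    """Max over translations theta of sum_t s(t - theta) * y(t), for 1-D signals."""
    return np.correlate(y, s, mode="full").max()

# Classification: pick the template with the highest peak correlation, e.g.
# j = max(range(P), key=lambda i: matched_filter_peak(templates[i], y))
```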

SLIDE 35

 Consider the following compressive acquisition model:

$$\mathbf{y} = \boldsymbol\Phi\mathbf{x} + \boldsymbol\eta, \quad \boldsymbol\eta \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I}), \quad \boldsymbol\Phi \in \mathbb{R}^{m \times n}, \ \mathbf{x} \in \mathbb{R}^n, \ \mathbf{y} \in \mathbb{R}^m$$

 The MLC for this case is now:

$$j = \arg\max_{k \in \{1,2,\ldots,P\}} p(\mathbf{y}\,|\,\mathbf{s}_k) = \arg\max_{k \in \{1,2,\ldots,P\}} \frac{1}{(2\pi\sigma^2)^{m/2}}\exp\left(-\frac{\|\mathbf{y}-\boldsymbol\Phi\mathbf{s}_k\|_2^2}{2\sigma^2}\right)$$

SLIDE 36

 But the compressive measurement y could be acquired from an image which was in a different pose than any of the images in the database.

 The GMLC is now given as:

$$j = \arg\max_{k \in \{1,2,\ldots,P\}} p(\mathbf{y}\,|\,\mathbf{s}_k, \tilde{\boldsymbol\theta}_k), \qquad p(\mathbf{y}\,|\,\mathbf{s}_i, \tilde{\boldsymbol\theta}_i) = \frac{1}{(2\pi\sigma^2)^{m/2}}\exp\left(-\frac{\|\mathbf{y}-\boldsymbol\Phi\,\mathbf{s}_i(\tilde{\boldsymbol\theta}_i)\|_2^2}{2\sigma^2}\right), \qquad \tilde{\boldsymbol\theta}_i = \arg\max_{\boldsymbol\theta \in \Theta} p(\mathbf{y}\,|\,\mathbf{s}_i, \boldsymbol\theta)$$

 This has an interesting name: the smashed filter (derived from the name "matched filter"), taking into account the compressive nature of the measurements. (A sketch of this classifier in code follows below.)
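A minimal sketch of the smashed filter, assuming a user-supplied `deform(s, theta)` that applies the pose transformation; all names here are illustrative. Maximizing the Gaussian likelihood is equivalent to minimizing the residual in measurement space.

```python
import numpy as np

def smashed_filter(y, Phi, templates, poses, deform):
    """GMLC on compressive measurements: jointly search over class and pose.

    y: (m,) compressive measurement; Phi: (m, n) sensing matrix;
    templates: list of (n,) class templates; poses: iterable of pose parameters;
    deform(s, theta): returns template s transformed by pose theta."""
    best_class, best_pose, best_residual = None, None, np.inf
    for i, s in enumerate(templates):
        for theta in poses:
            r = np.linalg.norm(y - Phi @ deform(s, theta))  # residual in measurement space
            if r < best_residual:
                best_class, best_pose, best_residual = i, theta, r
    return best_class, best_pose
```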