

  1. Pooling fidelity and phase recovery
     Joan Bruna, Arthur Szlam, and Yann LeCun

  2. Neural networks:
     • (Simplest version) functions of the form $L_k \circ L_{k-1} \circ \cdots \circ L_0$,
     • each $L_j$ is of the form $L_j(x_{j-1}) = h(A_j x_{j-1} - b_j)$,
     • $A_j$ is a matrix and $b_j$ is a vector,
     • $h$ is an elementwise nonlinearity,
     • $A_j$ is optimized for a given task, usually via gradient descent.
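To make the composition concrete, here is a minimal NumPy sketch of this "simplest version" forward pass; the layer sizes, the choice of $h = \tanh$, and the random weights are illustrative assumptions, not the authors' setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, A, b, h=np.tanh):
    # One layer L_j(x) = h(A x - b): an affine map followed by an
    # elementwise nonlinearity.
    return h(A @ x - b)

# A three-layer network L_2 o L_1 o L_0 with illustrative sizes.
sizes = [(16, 8), (8, 8), (8, 4)]   # (input dim, output dim) per layer
params = [(rng.standard_normal((m, n)), rng.standard_normal(m))
          for n, m in sizes]

x = rng.standard_normal(16)
for A, b in params:
    x = layer(x, A, b)              # x_j = L_j(x_{j-1})
print(x.shape)                      # (4,)
```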

  3. Convolutional neural networks:
     • The input $x_j$ has a grid structure, and $A_j$ specializes to a convolution.
     • The pointwise nonlinearity is followed by a pooling operator.
     • Pooling introduces invariance (on the grid) at the cost of lower resolution (on the grid).

  4. Pooling in neural networks:
     • Usually block $\ell_p$:
     $$[P(z)]_i = \left( |z_{i_1}|^p + |z_{i_2}|^p + \cdots + |z_{i_s}|^p \right)^{1/p} = \|z_{I_i}\|_p$$
     • In words: the $i$-th coordinate of the output is the $\ell_p$ norm of the $i$-th block of $z$.
     • In convolutional nets, blocks of indices are usually small spatial blocks,
     • $p$ is either 1, 2, or most usually, $\infty$.
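A minimal NumPy sketch of block $\ell_p$ pooling over contiguous blocks; the block size and test values are illustrative assumptions:

```python
import numpy as np

def block_lp_pool(z, block_size, p=2):
    # [P(z)]_i = || z_{I_i} ||_p, where I_i is the i-th contiguous block.
    blocks = z.reshape(-1, block_size)
    if np.isinf(p):
        return np.abs(blocks).max(axis=1)       # l_inf: max pooling
    return (np.abs(blocks) ** p).sum(axis=1) ** (1.0 / p)

z = np.array([1.0, -2.0, 0.5, 3.0, -1.0, 0.0])
print(block_lp_pool(z, 2, p=2))       # l2 norm of each pair
print(block_lp_pool(z, 2, p=np.inf))  # max |.| of each pair
```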

  5. Examples/History:
     • Hubel & Wiesel 1962
       – neuroscientists' description of lower-level mammalian vision
     • Fukushima 1971: Neocognitron
       – An artificial version.
       – Filters hand-designed in first versions; later used Hebbian learning.
       – Each layer has the same architecture.

  6. • LeCun 1988, many others: Convolutional nets
       – Filters trained discriminatively (original versions), and for reconstruction and discrimination (2005+).
       – Training end to end via backpropagation (the chain rule) and stochastic gradient descent.
       – Currently state of the art in object recognition, localization, and detection in images, and in various speech recognition, localization, and detection tasks.
       – Mathematically poorly understood. (not controversial)
       – Poorly understood. (controversial)

  7. Some wild speculation:
     • sparse coding
     • piecewise linear maps
     • pooling is key! (sparsify/decorrelate, then contract)
     • the output of the network should be invariant to things we don't care about, but sensitive to things we do care about.

  8. We have the results of Mallat and co-authors:
     • A convnet with $\ell_2$ pooling, specially chosen filters, and some other modifications is provably invariant to deformations while preserving signal energy.
     • What about sensitivity to things we care about?

  9. The phase recovery problem:
     • Classical version: the goal is to find a signal $z \in \mathbb{C}^d$ given the absolute value of its (discrete) Fourier transform $|\hat{z}|$.
     • With no additional information, this is not possible:
       1. each Fourier coefficient can be rotated independently;
       2. even if $z$ is real, the absolute value of the Fourier transform is translation invariant;
       3. in a strong sense, the majority of the information in signals we know and love is in the Fourier phase, not the magnitude:

  10. [Figure: photographs of a cow and a duck.]

  11. [Figure: phase of duck with magnitude (abs) of cow; phase of cow with magnitude (abs) of duck. Each reconstruction resembles its phase source, not its magnitude source.]
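The cow/duck demonstration is easy to reproduce; below is a minimal NumPy sketch, where img_a and img_b are hypothetical same-shape grayscale arrays standing in for the cow and duck photographs:

```python
import numpy as np

def swap_phase(img_mag, img_phase):
    # Combine the Fourier magnitude of one image with the Fourier
    # phase of another, then invert the transform.
    mag = np.abs(np.fft.fft2(img_mag))
    phase = np.angle(np.fft.fft2(img_phase))
    return np.real(np.fft.ifft2(mag * np.exp(1j * phase)))

# Hypothetical stand-ins for the cow and duck images.
rng = np.random.default_rng(0)
img_a = rng.random((64, 64))
img_b = rng.random((64, 64))

hybrid = swap_phase(img_a, img_b)   # magnitude of a, phase of b
# On natural images, `hybrid` looks like the phase source (img_b).
```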

  12. • Much simpler: consider a real dictionary; the "phase" is the signs of the analysis coefficients.
      • If the dictionary is orthogonal, one can flip the sign of an inner product at will.
      • If the dictionary is overcomplete, interactions force rigidity:

  13. Proposition 1 (Balan et al. 2006, 2013). Let $F = (f_1, \ldots, f_m)$ with $f_i \in \mathbb{R}^n$, set $d(x, x') = \min(\|x - x'\|, \|x + x'\|)$, and let $\lambda_-(G)$ and $\lambda_+(G)$ be the lower and upper frame bounds of a set of vectors $G$. The mapping $M(x) = \{ |\langle x, f_i \rangle| \}_{i \le m}$ satisfies
      $$\forall x, x' \in \mathbb{R}^n, \quad A\, d(x, x') \le \|M(x) - M(x')\| \le B\, d(x, x'), \qquad (1)$$
      where
      $$A = \min_{S \subset \{1 \ldots m\}} \sqrt{\lambda_-^2(F_S) + \lambda_-^2(F_{S^c})}, \qquad (2)$$
      $$B = \lambda_+(F). \qquad (3)$$
      In particular, $M$ is injective if and only if for any subset $S \subseteq \{1, \ldots, m\}$, either $F_S$ or $F_{S^c}$ is an invertible frame.
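The "in particular" condition can be checked by brute force on a small frame. A minimal NumPy sketch (the function name is ours; this enumerates all $2^m$ subsets, so it is only for toy examples):

```python
import numpy as np
from itertools import combinations

def abs_measurement_injective(F, tol=1e-10):
    # F: (n, m) array, columns are the frame vectors f_1..f_m.
    # M(x) = {|<x, f_i>|} is injective (up to sign) iff for every
    # subset S of columns, F_S or F_{S^c} spans R^n.
    n, m = F.shape
    cols = range(m)
    for r in range(m + 1):
        for S in combinations(cols, r):
            Sc = [i for i in cols if i not in S]
            rank_S = np.linalg.matrix_rank(F[:, list(S)], tol=tol) if S else 0
            rank_Sc = np.linalg.matrix_rank(F[:, Sc], tol=tol) if Sc else 0
            if rank_S < n and rank_Sc < n:
                return False, S   # offending subset: neither side spans
    return True, None
```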

  14. Proof of "in particular" (assuming $F$ is spanning):
      • Suppose that for any subset $S \subseteq \{1, \ldots, m\}$, either $F_S$ or $F_{S^c}$ is spanning. Fix $x, x' \in \mathbb{R}^n$ with $|f_i^T x| = |f_i^T x'|$ for all $i$, and let $S$ be the set $S = \{ i : \mathrm{sign}(f_i^T x) = \mathrm{sign}(f_i^T x') \}$. If $F_S$ is spanning, $x = x'$; otherwise $F_{S^c}$ is spanning and $x = -x'$.
      • Suppose not; let $S$ be the offending set of indices. Pick $x \ne 0$ such that $F_S^T x = 0$, and $x' \ne 0$ such that $F_{S^c}^T x' = 0$. Then $x + x'$ and $x - x'$ have the same modulus.

  15. An example for the $\ell_2$ subspace case:
      • Consider
      $$F = \begin{pmatrix} 1 & 0 & 1/\sqrt{2} & 0 & 1/\sqrt{2} & 1/\sqrt{2} \\ 0 & 1 & 1/\sqrt{2} & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & -1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}$$
      with groups $I_1 = \{1, 2\}$, $I_2 = \{3, 4\}$, and $I_3 = \{5, 6\}$.
      • Is $\ell_2$ pooling invertible here?

  16. An example for the $\ell_2$ subspace case (same $F$ and groups):
      • Is $\ell_2$ pooling invertible here?
      • No.
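One way to see the answer numerically is to exhibit two signals that are not equal up to a global sign but have identical pooled values. The colliding pair below is our own illustration (found by hand), not from the slides; a quick NumPy check:

```python
import numpy as np

s = 1 / np.sqrt(2)
F = np.array([[1, 0, s, 0,  s, s],
              [0, 1, s, 0,  0, 0],
              [0, 0, 0, 1, -s, s]])
blocks = [[0, 1], [2, 3], [4, 5]]   # I_1, I_2, I_3 (0-indexed)

def P2(x):
    z = F.T @ x                     # analysis coefficients
    return np.array([np.linalg.norm(z[I]) for I in blocks])

x = np.array([1.0, 0.0, 0.0])
y = np.array([s, -s, -s])           # neither y = x nor y = -x

print(P2(x))                        # [1.0, 0.7071..., 1.0]
print(P2(y))                        # same pooled values
print(np.allclose(P2(x), P2(y)))    # True -> P2 is not injective here
```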

  17. Proposition 2 (Casazza et al. 2013, BLS 2013). The $\ell_2$ pooling operator $P_2$ satisfies
      $$\forall x, x', \quad A_2\, d(x, x') \le \|P_2(x) - P_2(x')\| \le B_2\, d(x, x'), \qquad (4)$$
      where
      $$A_2 = \min_{G \in Q_2} \min_{S \subset \{1 \ldots m\}} \sqrt{\lambda_-^2(G_S) + \lambda_-^2(G_{S^c})}, \qquad B_2 = \lambda_+(F). \qquad (5)$$
      In particular, $P_2$ is injective (up to a global sign) if and only if for any subset $S \subseteq \{1, \ldots, m\}$ and every $G \in Q_2$, either $G_S$ or $G_{S^c}$ is an invertible frame. Here $Q_2$ is the set of all block orthogonal transforms applied to $F$.

  18. Proof of "in particular" (assuming $F$ is spanning):
      • Suppose that for any subset $S \subseteq \{1, \ldots, m\}$, either $G_S$ or $G_{S^c}$ is spanning for every $G \in Q_2$.
        – Fix $x, x' \in \mathbb{R}^n$ with $P_2(x) = P_2(x')$.
        – Choose orthonormal bases $G_k$ for the subspaces spanned by the $F_k$ so that the coordinates $u = G_k^T x$ and $u' = G_k^T x'$ satisfy
          ∗ $u_i = u'_i = 0$ for $i \in \{3, \ldots, d\}$,
          ∗ $u_1 = u'_1$ and $|u_2| = |u'_2|$.
        Now use the previous argument.
      • Suppose not; rotate into the bad coordinates and use the previous method. ($P_2$ is invariant to block rotations.)
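The closing remark is easy to check numerically: applying an orthogonal transform within each block of frame vectors leaves every pooled $\ell_2$ value unchanged. A minimal sketch reusing the $F$ and $P_2$ from the example above (our own illustration):

```python
import numpy as np

s = 1 / np.sqrt(2)
F = np.array([[1, 0, s, 0,  s, s],
              [0, 1, s, 0,  0, 0],
              [0, 0, 0, 1, -s, s]])
blocks = [[0, 1], [2, 3], [4, 5]]

def P2(x, frame):
    z = frame.T @ x
    return np.array([np.linalg.norm(z[I]) for I in blocks])

# Apply a random orthogonal transform within each block: F_k -> F_k O_k.
rng = np.random.default_rng(0)
G = F.copy()
for I in blocks:
    O, _ = np.linalg.qr(rng.standard_normal((len(I), len(I))))
    G[:, I] = F[:, I] @ O

x = rng.standard_normal(3)
print(np.allclose(P2(x, F), P2(x, G)))   # True: P2 sees only block norms
```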

  19. Corollary 3 (less than a week old). If $K > 2n$ and $F$ has the property that any $n$ columns are spanning, then $P_2$ is invertible (in particular, for random block orthogonal $F$ with $K > 2n$).

  20. Half rectification:
      • Let $\Omega = \Omega(F, \alpha)$ be the set of subsets $S$ of $\{1, \ldots, m\}$ such that for some $x$, $f_i^T x > \alpha$ for $i \in S$ and $f_i^T x \le \alpha$ for $i \in S^c$.

      Proposition 4. Let $A_0 = \min_{S \in \Omega} \lambda_-(F_S \cup V_S)$. Then the half-rectification operator $M_\alpha(x) = \rho_\alpha(F^T x)$ is injective if and only if $A_0 > 0$. Moreover, it satisfies
      $$\forall x, x', \quad A_0 \|x - x'\| \le \|M_\alpha(x) - M_\alpha(x')\| \le B_0 \|x - x'\|, \qquad (6)$$
      with $B_0 = \max_{S \in \Omega} \lambda_+(F_S) \le \lambda_+(F)$.
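For concreteness, a minimal sketch of the half-rectification operator. We assume $\rho_\alpha(z) = \max(z - \alpha, 0)$, which matches the thresholds in the definition of $\Omega$ but is our reading, not spelled out on the slide:

```python
import numpy as np

def half_rectify(F, x, alpha=0.0):
    # M_alpha(x) = rho_alpha(F^T x); assumption: rho_alpha(z) = max(z - alpha, 0).
    return np.maximum(F.T @ x - alpha, 0.0)

def active_set(F, x, alpha=0.0):
    # The subset S in Omega realized by x: coefficients above the threshold.
    return np.nonzero(F.T @ x > alpha)[0]
```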

  21. Corollary 5. Let $d = 2$. Then the rectified $\ell_2$ pooling operator $R_2$ satisfies
      $$\forall x, x', \quad \tilde{A}_2\, d(x, x') \le \|R_2(x) - R_2(x')\| \le B_2\, d(x, x'), \qquad (7)$$
      where
      $$\tilde{A}_p = \inf_{x, x'} \min_{S \subset S_x \cap S_{x'}} \min_{F' \in \tilde{Q}_{p, x, x'}} \left( \lambda_-^2\big(F_{S_x \cup S_{x'} \setminus (S_x \cap S_{x'})}\big) + \lambda_-^2(F'_S) + \lambda_-^2(F'_{S^c}) \right)^{1/2}.$$

  22. • For $\ell_1$ and $\ell_\infty$: need to replace $Q_2$ with a suitable set of transforms.
      • The statements are somewhat messier and not tight.
      • For random (block orthonormal) frames with $K > 4n$: invertibility with probability 1.

  23. We will now discuss some experiments. But first, we need algorithms for phase recovery:
      • alternating minimization
      • PhaseLift [Candès et al.] and PhaseCut [Waldspurger et al.]

  24. • As above, denote the frame $\{f_1, \ldots, f_m\} = F$ and let $F^{(-1)}$ be the pseudoinverse of $F$;
      • let $F_k$ be the frame vectors in the $k$-th block, with $I_k$ the indices of the $k$-th block.
      • Starting from an initial signal $x^{(0)}$, update:
      $$1.\ \ y^{(n)}_{I_k} = (P_p(x))_k \, \frac{F_k x^{(n)}}{\|F_k x^{(n)}\|_p}, \quad k = 1, \ldots, K,$$
      $$2.\ \ x^{(n+1)} = F^{(-1)} y^{(n)}.$$
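A minimal NumPy sketch of these two steps for $\ell_2$ pooling; the function name, iteration count, and the small-norm guard are our own choices, and `target` stands for the known pooled measurements $(P_2(x))_k$:

```python
import numpy as np

def alt_min_p2(F, blocks, target, x0, n_iter=200):
    """Alternating minimization for inverting l2 pooling.

    F      : (n, m) array whose columns are the frame vectors f_1..f_m.
    blocks : list of index lists I_1..I_K partitioning {0,...,m-1}.
    target : (K,) known pooled values (P_2(x))_k = ||F_k^T x||_2.
    x0     : (n,) initial guess x^(0).
    """
    Ft_pinv = np.linalg.pinv(F.T)    # F^(-1): pseudoinverse of the analysis op
    x = x0.astype(float).copy()
    for _ in range(n_iter):
        y = F.T @ x                  # current analysis coefficients
        for k, I in enumerate(blocks):
            nrm = np.linalg.norm(y[I])
            if nrm > 1e-12:          # step 1: impose the known block norms
                y[I] *= target[k] / nrm
        x = Ft_pinv @ y              # step 2: synthesize back to signal space
    return x
```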

  25. • alternating minimization
      • PhaseLift [Candès et al.] and PhaseCut [Waldspurger et al.]

  26. • PhaseLift and PhaseCut both use the lifting trick of [Balan et al.]: consider a matrix variable corresponding to $xx^*$.
      • Absolute value constraints are linear when lifted.
      • Many more variables.
      • An ugly nonconvex rank-1 constraint. PhaseLift and PhaseCut are different relaxations of the lifted problem.
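To spell out why the lifted constraints are linear (a standard identity, not shown on the slide):

$$|\langle x, f_i \rangle|^2 = f_i^* x x^* f_i = \mathrm{Tr}(f_i f_i^* X), \qquad X = x x^*,$$

which is linear in the matrix variable $X$. The difficulty moves entirely into the constraint $\mathrm{rank}(X) = 1$ (with $X \succeq 0$), which PhaseLift and PhaseCut relax in different ways.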

  27. • Alternating minimization is not, as far as we know, guaranteed to converge to the correct solution, even when $P_p$ is invertible.
      • PhaseCut and PhaseLift are guaranteed with high probability for the (classical) phase recovery problem if we have enough (random!) measurements.
      • In practice, if the inversion is easy enough, or if $x^{(0)}$ is close to the true solution, alternating minimization can work well. Moreover,
      • alternating minimization can be run essentially unchanged for each $\ell_p$; for half rectification, we only use the nonnegative entries in $y$ for reconstruction.

  28. • We would like to use the same basic algorithm in all settings to get an idea of the relative difficulty of the recovery problem for different $p$,
      • but if our algorithm simply returns poor results in every case, differences between the cases might be masked.
      • Alternating minimization can be very effective when well initialized.
