Pooling fidelity and phase recovery
Joan Bruna, Arthur Szlam, and Yann LeCun
Neural networks:
- (Simplest version) functions of the form
Lk ◦ Lk−1 ◦ ... ◦ L0,
- each Lj is of the form
Lj(xj−1) = h(Aj xj−1 − bj),
- Aj is a matrix and bj is a vector
- h is an elementwise nonlinearity.
- Aj optimized for a given task, usually via gradient descent.
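As a concrete illustration, a minimal numpy sketch of this composition (the widths, weights, and the choice h = tanh are illustrative, not from the slides):

```python
import numpy as np

def layer(x, A, b, h=np.tanh):
    """One layer L_j(x) = h(A x - b), with h an elementwise nonlinearity."""
    return h(A @ x - b)

rng = np.random.default_rng(0)
widths = [8, 16, 16, 4]                                   # illustrative layer widths
params = [(rng.standard_normal((widths[j + 1], widths[j])),
           rng.standard_normal(widths[j + 1]))
          for j in range(len(widths) - 1)]

# Apply L_k o L_{k-1} o ... o L_0 to a random input.
x = rng.standard_normal(widths[0])
for A, b in params:
    x = layer(x, A, b)
print(x.shape)                                            # (4,)
```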
Convolutional neural networks:
- The input xj has a grid structure, and Aj specializes to a convolution.
- The pointwise nonlinearity is followed by a pooling operator.
- Pooling introduces invariance (on the grid) at the cost of lower resolution (on the grid).
Pooling in neural networks:
- Usually block ℓp (see the sketch below):
  [P(z)]_i = ( |z_{i1}|^p + |z_{i2}|^p + ... + |z_{is}|^p )^{1/p} = ||z_{I_i}||_p
- In words: the ith coordinate of the output is the ℓp norm of the ith block of z.
- In convolutional nets, blocks of indices are usually small spatial blocks,
- p is either 1, 2, or most usually, ∞.
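A small sketch of block ℓp pooling over contiguous blocks of size s (the contiguous layout and the helper name are illustrative; in convnets the blocks are small spatial windows):

```python
import numpy as np

def block_lp_pool(z, s, p=2):
    """[P(z)]_i = ||z_{I_i}||_p, where I_i is the i-th contiguous block of s coordinates."""
    blocks = np.abs(z.reshape(-1, s))            # assumes len(z) is a multiple of s
    if np.isinf(p):
        return blocks.max(axis=1)                # l-infinity pooling, i.e. max pooling
    return (blocks ** p).sum(axis=1) ** (1.0 / p)

z = np.array([1.0, -2.0, 0.5, 3.0, 0.0, -1.0])
print(block_lp_pool(z, s=2, p=1))                # [3.  3.5 1. ]
print(block_lp_pool(z, s=2, p=2))                # l2 norm of each pair
print(block_lp_pool(z, s=2, p=np.inf))           # [2. 3. 1.]
```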
Examples/History:
- Hubel & Wiesel 1962
– neuroscientists’ description of lower level mammalian vision
- Fukushima 1971: Neocognitron
– An artificial version.
– Filters hand designed in first versions; later used Hebbian learning.
– Each layer has the same architecture.
- LeCun 1988, many others: Convolutional nets
– Filters trained discriminatively (original versions), and for reconstruction and discrimination (2005+).
– Training end to end via backpropagation (the chain rule) and stochastic gradient descent.
– Currently state of the art in object recognition, localization, and detection in images, and in various speech recognition, localization, and detection tasks.
– Mathematically poorly understood. (not controversial)
– Poorly understood. (controversial)
Some wild speculation:
- sparse coding
- piecewise linear maps
- pooling is key! (sparsify/decorrelate then contract).
- output of network should be invariant to things we don't care about, but sensitive to things we care about.
Results of Mallat and co-authors:
- A convnet with ℓ2 pooling, specially chosen filters, and some other modifications is provably invariant to deformations while preserving signal energy.
- What about sensitivity to things we care about?
The phase recovery problem:
- Classical version: the goal is to find a signal z ∈ C^d given the absolute value of its (discrete) Fourier transform, |F(z)|.
- With no additional information, this is not possible:
- 1. Each Fourier coefficient's phase can be rotated independently.
- 2. Even if z is real, the absolute value of the Fourier transform is translation invariant.
- 3. In a strong sense, the majority of the information in signals we know and love is in the Fourier phase, not the magnitude:
[Images: cow; duck; phase of duck with magnitude of cow; phase of cow with magnitude of duck.]
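A quick numerical check of points 1 and 2 (numpy; the signal and the shift amount are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(64)

# 2. |F(z)| is unchanged by a circular shift of z (translation invariance).
print(np.allclose(np.abs(np.fft.fft(z)), np.abs(np.fft.fft(np.roll(z, 7)))))   # True

# 1. Each Fourier coefficient's phase can be rotated independently without changing |F(z)|.
phases = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, 64))
z2 = np.fft.ifft(np.fft.fft(z) * phases)            # generally a very different (complex) signal
print(np.allclose(np.abs(np.fft.fft(z2)), np.abs(np.fft.fft(z))))               # True
```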
- Much simpler: consider a real dictionary; the "phase" is then the signs of the analysis coefficients.
- If the dictionary is orthogonal, the sign of an inner product can be flipped at will.
- If the dictionary is overcomplete, interactions force rigidity:
Proposition 1 (Balan et al 2006, 2013). Let F = (f1, . . . , fm) with fi ∈ R^n, set d(x, x′) = min(||x − x′||, ||x + x′||), and let λ−(G) and λ+(G) be the lower and upper frame bounds of a set of vectors G. The mapping M(x) = {|⟨x, fi⟩|}_{i≤m} satisfies

∀ x, x′ ∈ R^n,   A d(x, x′) ≤ ||M(x) − M(x′)|| ≤ B d(x, x′),   (1)

where

A = min_{S ⊂ {1,...,m}} ( λ−²(F_S) + λ−²(F_{S^c}) )^{1/2},   (2)
B = λ+(F).   (3)

In particular, M is injective if and only if for any subset S ⊆ {1, . . . , m}, either F_S or F_{S^c} is an invertible frame.
Proof of "in particular" (assuming F is spanning):
- Suppose for any subset S ⊆ {1, . . . , m}, either F_S or F_{S^c} is spanning. Fix x, x′ ∈ R^n with |fi^T x| = |fi^T x′| for all i, and let S be the set
  S = {i : sign(fi^T x) = sign(fi^T x′)}.
  Then F_S^T (x − x′) = 0 and F_{S^c}^T (x + x′) = 0, so if F_S is spanning, x = x′; else F_{S^c} is spanning and x = −x′.
- Suppose not; let S be the offending set of indices. Pick x ≠ 0 such that F_S^T x = 0, and x′ ≠ 0 such that F_{S^c}^T x′ = 0. Then x + x′ and x − x′ have the same coefficient moduli, but are not equal up to sign.
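A brute-force check of this criterion for small frames (an illustrative helper, exponential in m, so only usable for tiny examples):

```python
import numpy as np
from itertools import chain, combinations

def abs_measurements_injective(F):
    """F: n x m matrix whose columns are the frame vectors f_i. Returns True iff
    for every subset S of columns, F_S or F_{S^c} spans R^n (the Prop. 1 criterion)."""
    n, m = F.shape

    def spans(cols):
        return len(cols) >= n and np.linalg.matrix_rank(F[:, list(cols)]) == n

    for S in chain.from_iterable(combinations(range(m), r) for r in range(m + 1)):
        Sc = tuple(i for i in range(m) if i not in S)
        if not (spans(S) or spans(Sc)):
            return False
    return True

rng = np.random.default_rng(0)
print(abs_measurements_injective(rng.standard_normal((3, 5))))  # m = 2n - 1: generically True
print(abs_measurements_injective(rng.standard_normal((3, 4))))  # m < 2n - 1: always False
```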
An example for the ℓ2 subspace case:
- consider
F =
  [  1      1/√2    1/√2 ]
  [ 1/√2     1      1/√2 ]
  [  1     −1/√2    1/√2 ]
with groups I1 = {1, 2}, I2 = {3, 4}, and I3 = {5, 6}
- Is ℓ2 pooling invertible here?
- No.
Proposition 2 (Casazza et al 2013, BLS 2013). The ℓ2 pooling operator P2 satisfies

∀ x, x′,   A2 d(x, x′) ≤ ||P2(x) − P2(x′)|| ≤ B2 d(x, x′),   (4)

where

A2 = min_{G ∈ Q2} min_{S ⊂ {1,...,m}} ( λ−²(G_S) + λ−²(G_{S^c}) )^{1/2},   B2 = λ+(F).   (5)

In particular, P2 is injective (up to a global sign) if and only if for all G ∈ Q2 and any subset S ⊆ {1, . . . , m}, either G_S or G_{S^c} is an invertible frame. Here Q2 is the set of all block orthogonal transforms applied to F.
Proof of "in particular" (assuming F is spanning):
- Suppose for every G ∈ Q2 and any subset S ⊆ {1, . . . , m}, either G_S or G_{S^c} is spanning.
  – Fix x, x′ ∈ R^n with P2(x) = P2(x′).
  – Choose orthonormal bases G_k for the subspaces spanned by the F_k so that the coordinates u = G_k^T x and u′ = G_k^T x′ satisfy
    ∗ u_i = u′_i = 0 for i ∈ {3, ..., d}
    ∗ u_1 = u′_1 and |u_2| = |u′_2|.
  – Now use the previous argument.
- Suppose not; rotate into the bad coordinates and use the previous method. P2 is invariant to block rotations.
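A quick numerical check of that last point: ℓ2 pooling sees x only through the block norms ||F_k^T x||, so it cannot distinguish F from any block-rotated frame G ∈ Q2 (sizes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, s, K = 6, 2, 4                                       # signal dim, block size, number of blocks
F_blocks = [rng.standard_normal((n, s)) for _ in range(K)]

def P2(blocks, x):
    """l2 pooling of the analysis coefficients: (||F_1^T x||_2, ..., ||F_K^T x||_2)."""
    return np.array([np.linalg.norm(Fk.T @ x) for Fk in blocks])

# Rotate the frame vectors inside each block by an orthogonal matrix (an element of Q2).
G_blocks = []
for Fk in F_blocks:
    R, _ = np.linalg.qr(rng.standard_normal((s, s)))
    G_blocks.append(Fk @ R)

x = rng.standard_normal(n)
print(np.allclose(P2(F_blocks, x), P2(G_blocks, x)))    # True: P2 cannot tell F and G apart
```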
Corollary 3 (less than a week old). If K > 2n and F has the property that any n columns are spanning, P2 is invertible (in particular, for random block orthogonal F with K > 2n).
Half rectification:
- Let Ω = Ω(F, α) be the set of subsets S of {1, ..., m} such that for some x, fi^T x > α for i ∈ S and fi^T x ≤ α for i ∈ S^c.

Proposition 4. Let A0 = min_{S ∈ Ω} λ−(F_S ∪ V_S). Then the half-rectification operator Mα(x) = ρα(F^T x) is injective if and only if A0 > 0. Moreover, it satisfies

∀ x, x′,   A0 ||x − x′|| ≤ ||Mα(x) − Mα(x′)|| ≤ B0 ||x − x′||,   (6)

with B0 = max_{S ∈ Ω} λ+(F_S) ≤ λ+(F).
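A minimal sketch of the half-rectified measurement map, under the assumption that ρα acts entrywise as ρα(u) = max(u − α, 0) (the exact form of ρα is not spelled out in the slides):

```python
import numpy as np

def M_alpha(F, x, alpha=0.0):
    """Half-rectified analysis coefficients; assumes rho_alpha(u) = max(u - alpha, 0)."""
    return np.maximum(F.T @ x - alpha, 0.0)

rng = np.random.default_rng(0)
F = rng.standard_normal((3, 8))          # 8 frame vectors in R^3, one per column
x = rng.standard_normal(3)
y = M_alpha(F, x, alpha=0.1)
S = np.flatnonzero(y > 0)                # the active set realized by this x (an element of Omega)
print(y.round(2), S)
```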
Corollary 5. Let d = 2. Then the rectified ℓ2 pooling operator R2 satisfies

∀ x, x′,   Ã2 d(x, x′) ≤ ||R2(x) − R2(x′)|| ≤ B2 d(x, x′),   (7)

where

Ãp = inf_{x,x′} min_{F′ ∈ Q_{p,x,x′}} min_{S ⊂ S_x ∩ S_{x′}} ( λ−²(F_{S_x ∪ S_{x′} \ (S_x ∩ S_{x′})}) + λ−²(F′_S) + λ−²(F′_{S^c}) )^{1/2}.
- For ℓ1 and ℓ∞: one needs to replace Q2 with an appropriate set of transforms.
- The statements are somewhat messier, and not tight.
- For random (block orthonormal) frames with K > 4n, invertibility holds with probability 1.
Will now discuss some experiments. But first, we need algorithms for phase recovery:
- alternating minimization
- phaselift [Candès et al] and phasecut [Waldspurger et al]
- As above, denote the frame {f1, . . . , fm} = F and let F^(−1) be the pseudoinverse of F; let F_k be the frame vectors in the kth block, with I_k the indices of the kth block.
- Starting with an initial signal x0, update (see the sketch after this list):
  1. y^(n)_{I_k} = (P_p(x))_k · F_k x^(n) / ||F_k x^(n)||_p,   k = 1, . . . , K,
  2. x^(n+1) = F^(−1) y^(n).
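A sketch of this iteration for ℓ2 pooling (the block layout, iteration count, and success metric are illustrative choices, not the authors' code):

```python
import numpy as np

def am_recover_l2(F, blocks, y, x0, iters=500):
    """Alternating minimization: given y_k = ||F_k^T x||_2, alternately
    (1) rescale the current block coefficients to the target norms,
    (2) map back to signal space with the pseudoinverse of the analysis operator."""
    F_pinv = np.linalg.pinv(F.T)                 # F^(-1) in the slides' notation
    x = x0.copy()
    for _ in range(iters):
        c = F.T @ x                              # current analysis coefficients
        z = np.zeros_like(c)
        for k, Ik in enumerate(blocks):
            nrm = np.linalg.norm(c[Ik])
            z[Ik] = y[k] * c[Ik] / nrm if nrm > 0 else y[k] / np.sqrt(len(Ik))
        x = F_pinv @ z                           # x^(n+1) = F^(-1) y^(n)
    return x

rng = np.random.default_rng(0)
n, s, K = 4, 2, 10                               # signal dim, block size, number of blocks
F = rng.standard_normal((n, s * K))              # random frame, columns grouped in pairs
blocks = [list(range(k * s, (k + 1) * s)) for k in range(K)]
x_true = rng.standard_normal(n)
y = np.array([np.linalg.norm(F[:, Ik].T @ x_true) for Ik in blocks])

x_hat = am_recover_l2(F, blocks, y, x0=rng.standard_normal(n))
corr = abs(x_hat @ x_true) / (np.linalg.norm(x_hat) * np.linalg.norm(x_true))
print(round(corr, 3))                            # near 1 only when the iteration succeeds
```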
- alternating minimization
- phaselift [Candès et al] and phasecut [Waldspurger et al]
- phaselift and phasecut both use the lifting trick of [Balan et al]: consider a matrix variable corresponding to xx*.
- Absolute value constraints are linear when lifted.
- Many more variables.
- Ugly nonconvex rank-1 constraint; phaselift and phasecut are different relaxations of the lifted problem.
- Alternating minimization is not, as far as we know, guaranteed to converge to the correct solution, even when P_p is invertible.
- phasecut and phaselift are guaranteed with high probability for the (classical) phase recovery problem if we have enough (random!) measurements.
- In practice, if the inversion is easy enough, or if x0 is close to the true solution, alternating minimization can work well. Moreover,
- alternating minimization can be run essentially unchanged for each ℓp; for half rectification, we only use the nonnegative entries in y for reconstruction.
- We would like to use the same basic algorithm for all settings to get an idea of the relative difficulty of the recovery problem for different p,
- but if our algorithm simply returns poor results in every case, differences between the cases might be masked.
- The alternating minimization can be very effective when well initialized.
- When given a training set of the data to recover, we use a simple regression to find x0.
- Fix a number of neighbors q (in the experiments below we use q = 10), and suppose X is the training set.
- Set G = Pp(X), and for a new point x to recover from Pp(x), find the q nearest neighbors of Pp(x) in G, and take the principal component of the corresponding training points to serve as x0 in the alternating minimization algorithm (see the sketch below).
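A sketch of this initializer (whether the neighbors are centered before taking the principal component is not specified in the slides; this version skips centering, and the names `pool`, `X`, and `block_lp_pool` are assumptions):

```python
import numpy as np

def nn_init(pool, X, y_target, q=10):
    """Nearest-neighbor regression initializer: pool the training set, find the q
    training points whose poolings are closest to y_target, and return the top
    principal direction of those training points as the starting signal x0."""
    G = np.stack([pool(x) for x in X])                  # G = P_p(X)
    nearest = np.argsort(np.linalg.norm(G - y_target, axis=1))[:q]
    neighbors = X[nearest]                              # the corresponding training signals
    _, _, Vt = np.linalg.svd(neighbors, full_matrices=False)
    return Vt[0]                                        # top right singular vector, used as x0

# usage sketch (names hypothetical):
#   x0 = nn_init(lambda v: block_lp_pool(F.T @ v, s=2, p=2), X_train, Pp_of_new_point, q=10)
```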
- random data, random dictionary
- phasecut+alternating minimization vs. alternating minimization vs. rectified alternating minimization
[Plot: number of samples vs. reconstruction accuracy, mean |r^T x|² / (||r||² ||x||²); curves: phaselift+am, linear am, relu am.]
- structured data with training samples, random dictionary
- phasecut+am vs. phasecut vs. nn-regress am vs. am
[Plots: number of samples vs. reconstruction accuracy, mean |r^T x|² / (||r||² ||x||²); curves: phasecut, phasecut+am, am + nn init, am.]
MNIST PATCHES
- Alternating minimization with good initialization gives great results!
- See also [Yang et al 2013] on phaselift with a sparse prior.
- But this is trivial and much faster.
[Plots: number of samples vs. reconstruction accuracy, mean |r^T x|² / (||r||² ||x||²); curves: linear, relu.]
[Plots (up to 300 samples): number of samples vs. reconstruction accuracy, mean |r^T x|² / (||r||² ||x||²); curves: linear and relu, each with random init, am + nn init, and nn regress.]
[Plots (up to 800 samples): number of samples vs. reconstruction accuracy, mean |r^T x|² / (||r||² ||x||²); curves: linear and relu, each with random init, am + nn init, and nn regress.]
The experiments show that:
- For every data set, with random initializations and dictionaries, recovery is easier with half rectification before pooling than without.
- ℓ∞, ℓ1, and ℓ2 pooling are all roughly the same difficulty to invert.
- Good initialization improves performance; indeed, alternating minimization with nearest neighbor regression outperforms phaselift and phasecut (which of course do not have the luxury of samples from the prior, as the regressor does).
- Adapted analysis "frames" (with regularization) are easier to invert than random analysis frames, with or without regularization.
- Each of these conclusions is unfortunately only true up to the opti-