SLIDE 1
Multiscale Sparse Models in Deep Convolutional Networks
Tomás Anglès, Roberto Leonarduzzi, Stéphane Mallat, Louis Thiry, John Zarka, Sixin Zhang
Collège de France, École Normale Supérieure, Flatiron Institute
www.di.ens.fr/data
SLIDE 2 Deep Convolutional Network
- Deep convolutional neural network to predict y = f(x), for x ∈ R^d:
  x → L_1 → ρ → ... → L_j → ρ → ... → linear → f̃(x)
- L_j: spatial convolutions and linear combinations of channels, organised along a scale axis
- ρ(a) = max(a, 0) (ReLU)
- Supervised learning of the L_j from n examples {x_i, f(x_i)}_{i≤n}
- Exceptional results for images, speech, language, bio-data, quantum chemistry regressions, ...
- How does it reduce dimensionality? Multiscale structure, sparsity and invariants map x ∈ R^d to a low dimension. (A minimal sketch of such a network follows.)
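A minimal sketch of such a network in Python, assuming 1D signals and random (untrained) filters; the layer widths and kernel sizes below are illustrative assumptions, not the slides' architecture:

import numpy as np

rng = np.random.default_rng(0)

def conv_relu_layer(x, w):
    """One layer L_j followed by rho: x is (channels_in, d), w is (c_out, c_in, k)."""
    c_out, c_in, _ = w.shape
    y = np.zeros((c_out, x.shape[1]))
    for o in range(c_out):
        for i in range(c_in):
            # spatial convolution; the sum over i is the linear combination of channels
            y[o] += np.convolve(x[i], w[o, i], mode="same")
    return np.maximum(y, 0.0)                    # rho(a) = max(a, 0)

x = rng.standard_normal((1, 64))                 # x in R^d with d = 64
w1 = rng.standard_normal((8, 1, 5)) * 0.2        # L_1
w2 = rng.standard_normal((16, 8, 5)) * 0.05      # L_2
h = conv_relu_layer(conv_relu_layer(x, w1), w2)  # cascade L_1, rho, L_2, rho
f_tilde = rng.standard_normal(16) @ h.mean(axis=1)   # final linear output ~ f(x)
print(f_tilde)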
SLIDE 3 Statistical Models from 1 Example
- Supervised network training (e.g., on ImageNet)
- For 1 realisation x of X (6×10^4 pixels), compute each layer
- Compute the correlation statistics of the network coefficients (2×10^5 correlations) [M. Bethge et al.]
- Synthesize x̃ having similar statistics
- What mathematical interpretation?
SLIDE 4 Learned Generative Networks
- Wasserstein autoencoder trained on n examples {x_i}_{i≤n}:
  encoder Z = Φ(X): X → L_1 → ρ → ... → L_j → ρ → Z
  decoder X̃ = G(Z): Z → W_1 → W_2 → ... → W_j → X̃, with Z Gaussian white
- Network trained on bedroom images: interpolations Z = αZ_1 + (1 − α)Z_2 between codes Z_1 and Z_2
- Network trained on faces of celebrities: samples G(Z)
- What mathematical interpretation? Linearization of deformations.
SLIDE 5
Image Classification: ImageNet 2012
1000 classes, 1.2 million labeled training images of 224 × 224 pixels
                Alex-Net   ResNet
  Top-5 error   20%        10%
SLIDE 6 Scale Separation and Interactions
- Interactions of d bodies represented by x(u): particles, pixels, ...
- Multiscale regrouping of the interactions of d bodies into interactions of O(log d) groups ⇒ wavelet transforms: scale separation and interactions across scales
- How to capture scale interactions? Critical harmonic analysis problems since the 1970's.
SLIDE 7 Overview
- Scale separation with wavelets and interactions through phase
- Linear scale interaction models:
  – Compressive signal approximations
  – Stochastic models of stationary processes
- Non-linear scale interaction models with sparse dictionaries:
  – Generative autoencoders
  – Classification of ImageNet
- All these roads lead to Convolutional Neural Networks…
SLIDE 8 Scale Separation with Wavelets
- Wavelets rotated and dilated: ψ_λ(u) = 2^{−2j} ψ(2^{−j} r_θ u), with λ = (2^j, θ); each ψ̂_{2^j,θ}(ω) covers a band of the frequency plane (ω_1, ω_2), with real and imaginary parts.
- Wavelet transform, invertible: Wx = ( x ⋆ φ_{2^J} , x ⋆ ψ_λ )_λ
- In the Fourier domain: (x ⋆ ψ_λ)^∧(ω) = x̂(ω) ψ̂_λ(ω)
- Zero mean and no correlations across scales:
  Σ_u x ⋆ ψ_λ(u) (x ⋆ ψ_{λ'}(u))^* = Σ_ω |x̂(ω)|² ψ̂_λ(ω) ψ̂_{λ'}(ω)^* ≈ 0 if λ ≠ λ'
- Problem: the dependence across scales is carried by the phases. (See the sketch below for the transform and the decorrelation.)
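A sketch of this scale separation, assuming 1D signals and Gaussian band-pass filters (psi_hat, xi0 and sigma are illustrative stand-ins for the rotated and dilated wavelets): x ⋆ ψ_λ is computed in the Fourier domain, and coefficients at distinct scales come out near-decorrelated:

import numpy as np

d = 1024
rng = np.random.default_rng(0)
x = rng.standard_normal(d)
omega = np.fft.fftfreq(d) * 2 * np.pi

def psi_hat(j, xi0=3.0, sigma=0.6):
    """Analytic band-pass filter centred at 2^{-j} xi0 (assumption: Gaussian window)."""
    return np.exp(-((omega - 2.0**(-j) * xi0) ** 2) / (2 * (2.0**(-j) * sigma) ** 2))

# x * psi_j computed as ifft(fft(x) * psi_hat_j)
wx = {j: np.fft.ifft(np.fft.fft(x) * psi_hat(j)) for j in range(4)}

# sum_u (x * psi_j)(u) (x * psi_{j+1})(u)^* is small relative to the auto term
for j in range(3):
    auto = abs(np.vdot(wx[j], wx[j])) / d
    cross = abs(np.vdot(wx[j + 1], wx[j])) / d
    print(f"j={j}: auto {auto:.4f}, cross with j+1 {cross:.4f}")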
SLIDE 9
Wavelet Transform Filter Cascade
[Figure: cascade of wavelet filters across scales, from 2^0 to 2^J]
- How to capture multiscale similarities? ReLU & phase.
SLIDE 10 Rectified Wavelet Coefficients
- Multiphase real wavelets ψ_{α,λ} = Real(e^{−iα} ψ_λ), rectified with ρ(a) = max(a, 0):
  Ux = ( x ⋆ φ_{2^J} , ρ(x ⋆ ψ_{α,λ}) )_{α,λ}
- Linearly invertible: ρ(a) − ρ(−a) = a ⇒ x = U^{−1} Ux with U^{−1} linear
- The ReLU creates non-zero means and correlations across scales:
  Σ_u ρ(x ⋆ ψ_{α,λ}(u)) and Σ_u ρ(x ⋆ ψ_{α,λ}(u)) ρ(x ⋆ ψ_{α',λ'}(u)) : conv. net. coefficients
  (A numerical check of the invertibility follows.)
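A numerical check of the linear invertibility, assuming a stand-in complex array z for x ⋆ ψ_λ: the phases α and α + π give ρ(a) and ρ(−a), and ρ(a) − ρ(−a) = a recovers the linear coefficients:

import numpy as np

rng = np.random.default_rng(1)
z = rng.standard_normal(256) + 1j * rng.standard_normal(256)  # stand-in for x * psi_lambda
rho = lambda t: np.maximum(t, 0.0)

for alpha in (0.0, np.pi / 2):
    a = np.real(np.exp(-1j * alpha) * z)      # x * psi_{alpha,lambda}
    # the phase pair (alpha, alpha + pi) inverts the rectifier: rho(a) - rho(-a) = a
    assert np.allclose(rho(a) - rho(-a), a)
print("U is linearly invertible from phase pairs (alpha, alpha + pi)")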
SLIDE 11
Linear Rectifiers act on Phase
- Ux(u, α, λ) = ρ(x ⋆ Real(e^{iα} ψ_λ)(u)) = ρ(Real(e^{iα} x ⋆ ψ_λ(u)))
- With x ⋆ ψ_λ = |x ⋆ ψ_λ| e^{iφ(x⋆ψ_λ)} and homogeneity (ρ(βa) = β ρ(a) if β > 0):
  Ux(u, α, λ) = |x ⋆ ψ_λ(u)| ρ(cos(α + φ(x ⋆ ψ_λ(u))))
- Phase harmonics: for all z = |z| e^{iφ(z)} ∈ C, define [z]^k := |z| e^{ikφ(z)}
- A ReLU computes phase harmonics (as does any homogeneous non-linearity ρ), through a Fourier transform along the phase α:
  Theorem: with γ(α) = ρ(cos α),
  Ûx(u, k, λ) = γ̂(k) |x ⋆ ψ_λ(u)| e^{ik φ(x⋆ψ_λ(u))}
  (A numerical check follows.)
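A numerical check of the theorem, for a single stand-in coefficient z = x ⋆ ψ_λ(u): sampling Ux on a phase grid and taking an FFT along α recovers γ̂(k) |z| e^{ikφ(z)}, up to phase-grid discretisation error:

import numpy as np

rho = lambda t: np.maximum(t, 0.0)
z = 1.7 * np.exp(1j * 0.9)                      # stand-in value of x * psi_lambda(u)
N = 256
alpha = 2 * np.pi * np.arange(N) / N

Ux = np.abs(z) * rho(np.cos(alpha + np.angle(z)))    # Ux(u, alpha, lambda)
Ux_hat = np.fft.fft(Ux) / N                          # Fourier coefficients along alpha
gamma_hat = np.fft.fft(rho(np.cos(alpha))) / N       # gamma(alpha) = rho(cos alpha)

for k in range(4):                                   # phase harmonics k = 0, 1, 2, 3
    predicted = gamma_hat[k] * np.abs(z) * np.exp(1j * k * np.angle(z))
    print(f"k={k}: FFT {Ux_hat[k]:.4f}  theorem {predicted:.4f}")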
SLIDE 12
Phase Harmonics: Frequency Transpositions
- [x ⋆ ψ_λ]^k = |x ⋆ ψ_λ(u)| e^{ik φ(x⋆ψ_λ(u))}
- A phase harmonic performs a non-linear frequency dilation / transposition with no time dilation: for k = 1, 2, 3 the frequency support of (x ⋆ ψ_λ)^∧ moves from λ to 2λ, 3λ, towards that of (x ⋆ ψ_{λ'})^∧.
- x ⋆ ψ_λ and x ⋆ ψ_{λ'} are not correlated, but [x ⋆ ψ_λ]^k and x ⋆ ψ_{λ'} are correlated if kλ ≈ λ'.
SLIDE 13
Scale Transposition with Harmonics
[Figure: modulus |x ⋆ ψ_{j,θ}(u)|, phase φ(x ⋆ ψ_{j,θ}(u)) and harmonic phase k φ(x ⋆ ψ_{j,θ}(u)) for k = 2, with frequency supports shown in the (ω_1, ω_2) plane across scales j]
- The harmonic k = 2 transposes the frequency support across scales: phase harmonics become correlated with wavelet coefficients at neighbouring scales.
SLIDE 14 Linear Prediction Across Scales/Freq.
- ReLU means and correlations, invariant to translations:
  M(α, λ) = d^{−1} Σ_u ρ(x ⋆ ψ_{α,λ}(u))
  C(α, λ, α', λ') = d^{−1} Σ_u ρ(x ⋆ ψ_{α,λ}(u)) ρ(x ⋆ ψ_{α',λ'}(u))
- Define a linear autoregressive model from low to high frequencies: linear prediction of ρ(x ⋆ ψ_{α',λ'}) from the ρ(x ⋆ ψ_{α,λ}) at lower frequencies. (A sketch of the moments follows.)
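A sketch of these moments, assuming a stand-in random array for the rectified wavelet coefficients; the last lines fit the linear prediction of one channel from the others by least squares:

import numpy as np

rng = np.random.default_rng(0)
n_alpha, n_lambda, d = 4, 5, 512
coeffs = rng.standard_normal((n_alpha, n_lambda, d))   # stand-in for x * psi_{alpha,lambda}(u)
r = np.maximum(coeffs, 0.0)                            # rho(x * psi_{alpha,lambda}(u))

M = r.mean(axis=-1)                                    # M(alpha, lambda), shape (4, 5)
flat = r.reshape(n_alpha * n_lambda, d)
C = flat @ flat.T / d                                  # C(alpha, lambda, alpha', lambda')

# linear autoregressive prediction of one channel from the others
target, predictors = flat[0], flat[1:]
w, *_ = np.linalg.lstsq(predictors.T, target, rcond=None)
rel_err = np.linalg.norm(target - predictors.T @ w) / np.linalg.norm(target)
print(M.shape, C.shape, f"relative prediction error {rel_err:.3f}")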
SLIDE 15 Compressive Reconstructions
- From m phase harmonic means Mx and covariances Cx:
  x̃ = argmin_y ‖ Cx − Cy + (Mx − My)(Mx − My)^* ‖²
- If x ⋆ ψ_λ is sparse then x is recovered from m ≪ d moments.
- Approximation rate optimal for total variation signals: ‖x − x̃‖ ∼ m^{−2}
  [Figure: PSNR (dB) versus log10(m/d)]
  Gaspar Rochette, Sixin Zhang
  (A gradient-descent sketch of this reconstruction follows.)
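A hedged sketch of this reconstruction, assuming random stand-in filters and using PyTorch autodiff to descend the moment-matching loss from the slide; an illustration, not the authors' implementation:

import torch

def phase_moments(y, filters):
    """Means and correlations of rectified filter responses (filters given in Fourier)."""
    w = torch.fft.ifft(torch.fft.fft(y) * filters).real
    r = torch.relu(w)
    return r.mean(dim=-1), (r @ r.T) / y.shape[-1]

d, n_filt = 256, 8
torch.manual_seed(0)
filters = torch.randn(n_filt, d, dtype=torch.cfloat)   # stand-in for psi_hat_{alpha,lambda}
x = torch.randn(d)
Mx, Cx = phase_moments(x, filters)

y = torch.randn(d, requires_grad=True)                 # reconstruction variable
opt = torch.optim.Adam([y], lr=0.05)
for _ in range(300):
    My, Cy = phase_moments(y, filters)
    dM = Mx - My
    loss = ((Cx - Cy + torch.outer(dM, dM)) ** 2).sum()
    opt.zero_grad(); loss.backward(); opt.step()
print(f"final moment-matching loss: {loss.item():.3e}")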
SLIDE 16
Compressive Reconstructions
- Approximation rate optimal for total variation signals: ‖x − x̃‖ ∼ m^{−1}
  [Figure: PSNR (dB) versus log10(m/d)]
SLIDE 17
Gaussian Models of Stationary Processes
- Gaussian model x̃ with the same power spectrum as x (d = 6×10^4), estimated from d empirical moments: d^{−1} Σ_u x(u) x(u − τ)
- No correlation is captured across scales and frequencies: random phases.
- Kolmogorov model: what stochastic models for turbulence? How to capture non-Gaussianity and long-range interactions?
SLIDE 18 Models of Stationary Processes
- Stationary processes conditioned by translation-invariant moments.
- If ergodic, the empirical moments converge as d → ∞:
  d^{−1} Σ_u ρ(x ⋆ ψ_{α,λ}(u)) → E( ρ(X ⋆ ψ_{α,λ}) )
  d^{−1} Σ_u ρ(x ⋆ ψ_{α,λ}(u)) ρ(x ⋆ ψ_{α',λ'}(u)) → E( ρ(X ⋆ ψ_{α,λ}) ρ(X ⋆ ψ_{α',λ'}) )
  Sixin Zhang
SLIDE 19 Ergodic Stationary Processes
- Syntheses x̃ conditioned on m = 3×10^3 moments of x (d = 6×10^4): same quality as with learned deep networks, with far fewer moments.
- Phase coherence is captured.
  Sixin Zhang
SLIDE 20 Multifractal Models
- Multifractal properties: E[ |X ⋆ ψ_j|^q ] ∼ 2^{jζ(q)}
- Probability distribution: P(|x|)
- Leverage correlation: L(τ) = E[ |X(t + τ)|² X(t) ]
- Financial S&P 500 returns: the model reproduces the high-order moments and the time asymmetry, which are lost without high-order moments. (An estimation sketch for ζ(q) follows.)
  Roberto Leonarduzzi
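A sketch of estimating ζ(q) from the scaling law above, assuming Haar-like increments as stand-ins for the wavelet transform and a random walk as toy data (for which ζ(q) ≈ q/2):

import numpy as np

rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal(2**16))      # toy signal with Brownian-like scaling

def wavelet_moment(x, j, q):
    """q-th moment of Haar-like increments at scale 2^j (a simplifying assumption)."""
    s = 2**j
    return np.mean(np.abs(x[s:] - x[:-s]) ** q)

# regress log2 of the empirical q-th moments against the scale index j
js = np.arange(1, 9)
for q in (1, 2, 4):
    logm = [np.log2(wavelet_moment(x, j, q)) for j in js]
    zeta_q = np.polyfit(js, logm, 1)[0]
    print(f"q={q}: zeta(q) ~ {zeta_q:.2f}")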
SLIDE 21 Learned Generative Networks
- Variational autoencoder trained on n examples {x_i}_{i≤n}:
  encoder Z = Φ(X): X → L_1 → ρ → ... → L_j → ρ → Z
  decoder X̃ = G(Z): Z → W_1 → W_2 → ... → W_j → X̃, with Z Gaussian white
- Network trained on bedroom images: interpolations Z = αZ_1 + (1 − α)Z_2 between codes Z_1 and Z_2
- Linearization of deformations: the encoder is Lipschitz continuous to the action of deformations
- How to build such autoencoders?
SLIDE 22 Averaged Rectified Wavelets
- Scale separation and spatial averaging at a large scale 2^J with φ_J:
  S_J x = Ux ⋆ φ_J = ( x ⋆ φ_{2^J}(2^J n) , ρ(x ⋆ ψ_{α,λ}) ⋆ φ_J(2^J n) )_{α,λ}
- Linearizes small deformations:
  Theorem: if D_τ x(u) = x(u − τ(u)) then lim_{J→∞} ‖S_J D_τ x − S_J x‖ ≤ C ‖∇τ‖_∞ ‖x‖
- Gaussianization. (A minimal sketch of S_J follows.)
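A minimal sketch of S_J, assuming the same 1D Gaussian band-pass filters as in the earlier sketch, a Gaussian low-pass φ_J, and four phases α; each rectified channel is averaged by φ_J and subsampled at 2^J:

import numpy as np

d, J = 1024, 4
rng = np.random.default_rng(0)
x = rng.standard_normal(d)
omega = np.fft.fftfreq(d) * 2 * np.pi

def band_pass(j, xi0=3.0, sigma=0.6):
    """Stand-in wavelet filter at scale 2^j (assumption: Gaussian window)."""
    return np.exp(-((omega - 2.0**(-j) * xi0) ** 2) / (2 * (2.0**(-j) * sigma) ** 2))

phi_J_hat = np.exp(-(omega ** 2) * (2.0**J) ** 2 / 2)   # low-pass phi_J at scale 2^J

def smooth(y):
    return np.real(np.fft.ifft(np.fft.fft(y) * phi_J_hat))

channels = [smooth(x)[::2**J]]                           # x * phi_{2^J}(2^J n)
for j in range(J):
    w = np.fft.ifft(np.fft.fft(x) * band_pass(j))
    for alpha in (0.0, np.pi / 2, np.pi, 3 * np.pi / 2):
        r = np.maximum(np.real(np.exp(-1j * alpha) * w), 0.0)
        channels.append(smooth(r)[::2**J])               # rho(x * psi_{alpha,j}) * phi_J(2^J n)
SJx = np.stack(channels)
print(SJx.shape)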
SLIDE 23 Multiscale Autoencoder
- Encoder: x (d = 10^4) → Ux ⋆ φ_J → spatial decorrelation and dimension reduction Id − P_r → innovation Ix → linear map L → Z (d_0 = 10^2), with Z ≈ Gaussian white noise
- Innovations: prediction errors are decorrelated across scales
- Decoder: Z → L^{−1} → pseudo-inverse (Id − P_r)^{−1} of Ix + ε → sparse deconvolution of Ux ⋆ φ_J + ε' by a convolutional network (CNN) → U^{−1} → x̃
- Non-linear dictionary model
  Tomás Anglès
SLIDE 24 Progressive Sparse Deconvolution
- Progressive sparse deconvolution of x ⋆ φ_j for decreasing j: a CNN maps Ux ⋆ φ_{j+1} + ε to Ux ⋆ φ_j + ε'.
- Learns a dictionary D_j in which Ux ⋆ φ_j is sparse: the CNN computes a sparse code α such that Ux ⋆ φ_j + ε' = D_j α, by minimising the average error ‖ε'‖² over a database.
- The CNN is learned jointly with D_j. What sparse code is computed by the CNN? Could it be an ℓ1 sparse code? (A sketch of the dictionary fit follows.)
  Tomás Anglès
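A sketch of the dictionary fit, assuming the sparse codes α are given (random stand-ins here; the slides compute them with a CNN): one least-squares update of D_j minimising the mean error ‖ε'‖², in a row-vector convention targets ≈ codes @ D:

import numpy as np

rng = np.random.default_rng(0)
n, p, m = 200, 64, 128                         # examples, target dim, dictionary atoms
targets = rng.standard_normal((n, p))          # stand-ins for U x_i * phi_j
codes = rng.standard_normal((n, m)) * (rng.random((n, m)) < 0.1)   # sparse stand-in codes

D = np.linalg.lstsq(codes, targets, rcond=None)[0]   # one dictionary-update step
err = np.mean(np.sum((targets - codes @ D) ** 2, axis=1))
print(f"mean ||eps'||^2 = {err:.3f}")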
SLIDE 25
Training Reconstruction
- Training images x_i and their reconstructions G(S_J(x_i)), with 2^J = 16.
- Databases: polygons and faces of celebrities.
  Tomás Anglès
SLIDE 26
Testing Reconstruction
- Test images x_t and their reconstructions G(S_J(x_t)).
  Tomás Anglès
SLIDE 27
Generative Interpolations
- Decoded interpolations G(Z) with Z = αZ_1 + (1 − α)Z_2, between codes Z_1 and Z_2.
- Shown for celebrities and polygons.
  Tomás Anglès
SLIDE 28 Random Sampling
- Images synthesised from a Gaussian white noise
Tomás Anglès
SLIDE 29 Classification by Dictionary Learning
- 1000 classes, 1.2 million labeled training images of 224 × 224 pixels.
- Pipeline: x → phase harmonics U (10^3 channels) → spatial pooling: averaging Ux ⋆ φ_J at scale 2^J → linear classifier W (logistic) → class.

                Alex-Net   Wavelets
  Top-5 error   20%        70%

  Louis Thiry, John Zarka
SLIDE 30 Classification by Dictionary Learning
- 1000 classes, 1.2 million labeled training images of 224 × 224 pixels.
- Pipeline: x → phase harmonics U (10^3 channels) → spatial pooling: averaging Ux ⋆ φ_J at scale 2^J → linear dimension reduction L → sparse dictionary expansion α (CNN, D) → linear classifier W (logistic) → class.
- Invariants and a sparse multiscale representation: the expansion is an ℓ1 sparse coding in D.
  Louis Thiry, John Zarka
SLIDE 31 l1 Sparse Coding: LISTA
- ℓ1 sparse coefficients in a convolutional dictionary D:
  α̃ = argmin_z ‖x − Dz‖² + γ ‖z‖_1
- LISTA (Gregor & LeCun): a deep neural network implemented with D and W_e, a CNN with soft-threshold non-linearity:
  α_{k+1} = soft-thresh( α_k − W_e (D α_k − x) )
- Can be used to learn the dictionary D (Giryes et al.). (A minimal ISTA sketch follows.)
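A minimal ISTA sketch of this ℓ1 sparse coding; LISTA would replace the fixed matrix W_e below with a learned one and truncate to a few iterations:

import numpy as np

def soft_thresh(a, t):
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def ista(x, D, gamma=0.1, n_iter=200):
    """Minimise ||x - D alpha||^2 + gamma ||alpha||_1 by proximal gradient descent."""
    L = np.linalg.norm(D, 2) ** 2               # spectral norm squared: step-size control
    We = D.T / L                                # LISTA learns this matrix instead
    alpha = np.zeros(D.shape[1])
    for _ in range(n_iter):
        alpha = soft_thresh(alpha - We @ (D @ alpha - x), gamma / (2 * L))
    return alpha

rng = np.random.default_rng(0)
D = rng.standard_normal((32, 64))               # stand-in (non-convolutional) dictionary
x = D @ (rng.standard_normal(64) * (rng.random(64) < 0.1))   # signal with a sparse code
alpha = ista(x, D)
print(f"non-zeros in the recovered code: {np.count_nonzero(np.abs(alpha) > 1e-6)}")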
SLIDE 32 Classification by Dictionary Learning
- 1000 classes, 1.2 million labeled training images of 224 × 224 pixels.
- Pipeline: x → phase harmonics U (10^3 channels) → spatial pooling: averaging Ux ⋆ φ_J at scale 2^J → linear dimension reduction L → sparse dictionary expansion α (LISTA, D) → linear classifier W (logistic) → class.
- CNN architecture: convolutions, ReLU and soft-thresholdings; D: sparse informative patterns across scales; end-to-end optimisation of L, D, W.

                Alex-Net   Wavelets   Wavelets + Sparse
  Top-5 error   20%        70%        30%

  Louis Thiry, John Zarka
SLIDE 33 Multiscale Approximations
- A ReLU on multiscale wavelet filters can produce scale interactions: it creates phase harmonics.
- Autoregressive models over multiscale phase harmonics approximate sparse signals and large classes of non-Gaussian, long-range interaction processes.
- Non-linear models based on sparse dictionaries may reproduce some CNN results for generation and classification.
- Still needed: functional analysis models and approximation theorems with decay rates.