Learning Music, Images and Physics with Deep Neural Networks
Joan Bruna, Matthew Hirn, Stéphane Mallat, Vincent Lostanlen, Edouard Oyallon, Nicolas Poilvert, Laurent Sifre, Irène Waldspurger
École Normale Supérieure
www.di.ens.fr/data

High Dimensional Learning
• High-dimensional $x = (x(1), ..., x(d)) \in \mathbb{R}^d$.
• Classification: estimate a class label $f(x)$ given $n$ sample values $\{x_i,\ y_i = f(x_i)\}_{i \le n}$.
Image classification: $d = 10^6$. Huge variability inside classes.
[Figure: example images from the classes Anchor, Joshua Tree, Beaver, Lotus, Water Lily]

High Dimensional Learning
• High-dimensional $x = (x(1), ..., x(d)) \in \mathbb{R}^d$.
• Classification: estimate a class label $f(x)$ given $n$ sample values $\{x_i,\ y_i = f(x_i)\}_{i \le n}$.
Audio: instrument recognition. Huge variability inside classes.

High Dimensional Learning
• High-dimensional $x = (x(1), ..., x(d)) \in \mathbb{R}^d$.
• Regression: approximate a functional $f(x)$ given $n$ sample values $\{x_i,\ y_i = f(x_i) \in \mathbb{R}\}_{i \le n}$.
Physics: energy $f(x)$ of a state vector $x$ (astronomy, quantum chemistry).

Curse of Dimensionality
• $f(x)$ can be approximated from examples $\{x_i, f(x_i)\}_i$ by local interpolation if $f$ is regular and there are close examples.
• Need $\epsilon^{-d}$ points to cover $[0,1]^d$ at a Euclidean distance $\epsilon$ $\Rightarrow$ $\|x - x_i\|$ is always large.
Huge variability inside classes.
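
To make the $\epsilon^{-d}$ count concrete, here is a minimal Python sketch. The function name `covering_points` and the choice $\epsilon = 0.1$ are illustrative assumptions, and the count is the order-of-magnitude grid estimate quoted on the slide, not a sharp covering number.

```python
def covering_points(eps: float, d: int) -> float:
    """Order-of-magnitude number of points, eps**(-d), needed to cover [0, 1]^d at resolution eps."""
    return (1.0 / eps) ** d

# With eps = 0.1 the count grows from 10 points in 1-D to about 1e100 in 100-D.
for d in (1, 2, 10, 100):
    print(d, covering_points(0.1, d))
```

For an image with $d = 10^6$ no realistic training set comes close to such a count, which is why the nearest example stays far away.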

Learning by Euclidean Embedding
Data: $x \in \mathbb{R}^d$, where $\|x - x'\|$ is non-informative.
Representation: $\Phi x \in \mathcal{H}$, where classes are Gaussian and separated in $\|\Phi x - \Phi x'\|$, followed by a linear classifier.
"Similarity" metric: $\Delta(x, x')$
Equivalent Euclidean metric: $C_1 \|\Phi x - \Phi x'\| \le \Delta(x, x') \le C_2 \|\Phi x - \Phi x'\|$
How to define $\Phi$?

Deep Convolution Networks
• The revival of an old (1950) idea: Y. LeCun, G. Hinton
[Figure: $x$ → $L_1$ (linear convolution) → $\rho$ → $L_2$ (linear convolution) → $\rho$ → ... → $\Phi(x)$ → linear classification]
$\rho(u) = |u|$: non-linear scalar function (neuron).
Optimize the $L_k$ with support constraints: over $10^9$ parameters.
Exceptional results for images, speech, bio-data classification. Products by FaceBook, IBM, Google, Microsoft, Yahoo...
Why does it work so well?

Overview
• Deep multiscale networks: invariant and stable metrics on groups
• Image classification
• Models of audio and image textures: information theory
• Learning physics: quantum chemistry energy regression

Image Metrics
• Low-dimensional "geometric shapes" $x(u)$, $x'(u)$ (classic mechanics)
Deformation metric (Grenander), with diffeomorphism action $D_\tau x(u) = x(u - \tau(u))$:
$\Delta(x, x') \sim \min_\tau \|D_\tau x - x'\| + \|\nabla \tau\|_\infty \|x\|$
The first term is invariant to translations; the second term measures the diffeomorphism amplitude.
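
The diffeomorphism action $D_\tau x(u) = x(u - \tau(u))$ can be sketched for a sampled 1-D signal as follows. This is only an illustration under stated assumptions: the signal is treated as periodic, samples are linearly interpolated, and the helper name `deform` is hypothetical.

```python
import numpy as np

def deform(x, tau):
    """Samples of x(u - tau(u)) for a periodic signal x, using linear interpolation."""
    n = len(x)
    u = np.arange(n)
    # period=n wraps the warped coordinates u - tau(u) back into [0, n)
    return np.interp(u - tau, u, x, period=n)

grid = np.arange(64)
x = np.sin(2 * np.pi * grid / 64)

# Constant tau is a pure translation: identical to a circular shift by 3 samples.
shifted = deform(x, np.full(64, 3.0))

# A smooth, non-constant tau gives a genuine deformation.
warped = deform(x, 2.0 * np.sin(2 * np.pi * grid / 64))
```

A constant $\tau$ has $\nabla \tau = 0$, so only the non-constant warp contributes to the deformation-amplitude term of the metric.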

Image Metrics
• High-dimensional textures $X(u)$: ergodic stationary processes, highly non-Gaussian (example: 2D turbulence).
• A Euclidean metric is a maximum likelihood on Gaussian models.
• Can we find $\Phi$ so that $\Phi(X)$ is nearly Gaussian, without losing information?

Euclidean Metric Embedding
• Stability to additive perturbations: $\|\Phi x - \Phi x'\| \le C \|x - x'\|$
• Invariance to translations: $x_c(u) = x(u - c) \Rightarrow \Phi(x_c) = \Phi(x)$
• Stability to deformations: $x_\tau(u) = x(u - \tau(u)) \Rightarrow \|\Phi x - \Phi x_\tau\| \le C \|\nabla \tau\|_\infty \|x\|$
Failure of Fourier and classic invariants.
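
The slide does not spell out why Fourier fails. The usual explanation is that the Fourier modulus is translation invariant but unstable to deformations at high frequencies, and the NumPy sketch below illustrates this. All specifics are illustrative assumptions: the Gaussian-windowed cosine test signal, the dilation $\tau(u) = s\,u$ with $s = 0.05$, and the helper names.

```python
import numpy as np

N = 4096
t = np.arange(N) - N // 2
sigma, s = 200.0, 0.05   # window width and dilation factor (illustrative values)

def packet(xi, scale=1.0):
    """Gaussian-windowed cosine of frequency xi, evaluated at scale * t."""
    ts = scale * t
    return np.exp(-ts**2 / (2 * sigma**2)) * np.cos(xi * ts)

def fourier_modulus(x):
    return np.abs(np.fft.fft(x))

def relative_change(xi):
    """|| |Fx| - |Fx_tau| || / ||Fx|| for x_tau(u) = x(u - s*u) = x((1 - s) u)."""
    a = fourier_modulus(packet(xi))
    b = fourier_modulus(packet(xi, scale=1.0 - s))
    return np.linalg.norm(a - b) / np.linalg.norm(a)

# Translation invariance: a circular shift leaves the Fourier modulus unchanged.
x = packet(1.0)
shift_error = np.linalg.norm(fourier_modulus(x) - fourier_modulus(np.roll(x, 37)))

# Same small deformation (gradient of tau = 0.05) at a low and a high frequency.
low, high = relative_change(0.02), relative_change(1.0)
print(shift_error, low, high)
```

The same 5% dilation barely changes the Fourier modulus of the low-frequency packet but moves the high-frequency packet to an almost disjoint band, so no constant $C$ can bound the change by $C \|\nabla \tau\|_\infty \|x\|$ at all frequencies.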

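Wavelet Transform
• Dilated wavelets: $\psi_\lambda(t) = 2^{-j/Q}\, \psi(2^{-j/Q} t)$ with $\lambda = 2^{-j/Q}$
[Figure: wavelets $\psi_\lambda(t)$, $\psi_{\lambda'}(t)$ and their Fourier transforms $|\hat\phi(\omega)|^2$, $|\hat\psi_\lambda(\omega)|^2$, $|\hat\psi_{\lambda'}(\omega)|^2$ along the frequency axis $\omega$]
$\hat\psi_\lambda$: Q-constant band-pass filters.
• Wavelet transform: $Wx = \left( x \star \phi_{2^J}(t),\ x \star \psi_\lambda(t) \right)_{\lambda \le 2^J}$, where $x \star \phi_{2^J}$ is the average and $x \star \psi_\lambda$ carries the higher frequencies.
Preserves norm: $\|Wx\|^2 = \|x\|^2$.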
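
The norm-preservation property can be checked numerically with a small filter bank. The sketch below is an assumption-laden illustration, not the wavelets used in the talk: it uses cosine-shaped bumps in log-frequency, analytic filters (zero on negative frequencies), an odd signal length to avoid the Nyquist bin, and hypothetical helper names `filter_bank` and `wavelet_transform`.

```python
import numpy as np

def filter_bank(n, octaves=5, Q=2):
    """Fourier-domain analytic band-pass filters psi_hat[j] and low-pass phi_hat.

    Built so that |phi_hat(w)|^2 + 0.5 * sum_j (|psi_hat_j(w)|^2 + |psi_hat_j(-w)|^2) = 1.
    """
    w = 2 * np.pi * np.fft.fftfreq(n)            # frequencies in [-pi, pi)
    pos = w > 0
    log_w = np.log2(w[pos] / np.pi)              # octaves below pi
    psi_hat = np.zeros((octaves * Q, n))
    for j in range(octaves * Q):
        u = Q * log_w + j                        # offset from the centre pi * 2**(-j/Q)
        bump = np.where(np.abs(u) < 1, np.cos(0.5 * np.pi * u), 0.0)
        psi_hat[j, pos] = np.sqrt(2) * bump      # zero on negative frequencies
    energy = 0.5 * (psi_hat**2).sum(axis=0)
    energy = energy + np.roll(energy[::-1], 1)   # add the mirrored frequencies -w
    phi_hat = np.sqrt(np.clip(1.0 - energy, 0.0, 1.0))
    return phi_hat, psi_hat

def wavelet_transform(x, phi_hat, psi_hat):
    """Return (x * phi, [x * psi_j]) computed by circular convolution in Fourier."""
    x_hat = np.fft.fft(x)
    low = np.fft.ifft(x_hat * phi_hat)
    high = np.fft.ifft(x_hat[None, :] * psi_hat, axis=-1)
    return low, high

rng = np.random.default_rng(0)
x = rng.standard_normal(1023)                    # odd length: no Nyquist bin
phi_hat, psi_hat = filter_bank(len(x))
low, high = wavelet_transform(x, phi_hat, psi_hat)
energy_in = np.sum(x**2)
energy_out = np.sum(np.abs(low)**2) + np.sum(np.abs(high)**2)
print(energy_in, energy_out)
```

Each filter is a dilation of the same bump in frequency, so bandwidth is proportional to centre frequency (constant Q). The two energies agree for real signals because the squared filter responses sum to one at every frequency.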

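Scale separation with Wavelets
• Complex wavelet: $\psi(t) = g(t) \exp(i\, \xi \cdot t)$, $t = (t_1, t_2)$
Rotated and dilated: $\psi_\lambda(t) = 2^{-j}\, \psi(2^{-j} r_\theta t)$ with $\lambda = (2^j, \theta)$
[Figure: real parts and imaginary parts of the wavelets, and their Fourier supports $|\hat\psi_\lambda(\omega)|^2$ in the $(\omega_1, \omega_2)$ plane]
• Wavelet transform: $Wx = \left( x \star \phi_{2^J}(t),\ x \star \psi_\lambda(t) \right)_{\lambda \le 2^J}$, where $x \star \phi_{2^J}$ is the average and $x \star \psi_\lambda$ carries the higher frequencies.
Preserves norm: $\|Wx\|^2 = \|x\|^2$.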

Fast Wavelet Transform
[Figure: $|W_1|$ applied across scales $2^0, 2^1, ..., 2^J$, showing $|x \star \psi_{2^1,\theta}|$]

Wavelet Transform
[Figure: cascade of $|W_1|$ over scales $2^0, 2^1, 2^2, 2^3, ..., 2^J$ (depth = scale), showing $|x \star \psi_{2^1,\theta}|$, $|x \star \psi_{2^2,\theta}|$, $|x \star \psi_{2^3,\theta}|$]
$x \star \phi_J$: locally invariant by translation.
How to make everything invariant to translation?

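Wavelet Translation Invariance
First wavelet transform:
$W_1 x = \left( x \star \phi_{2^J},\ x \star \psi_{\lambda_1} \right)_{\lambda_1}$, $|W_1| x = \left( x \star \phi_{2^J},\ |x \star \psi_{\lambda_1}| \right)_{\lambda_1}$
• $x \star \phi_{2^J}(t)$: local translation invariance of $x(t)$; full translation invariance for $2^J = \infty$.
• Modulus improves invariance: with $x \star \psi_{\lambda_1}(t) = x \star \psi^a_{\lambda_1}(t) + i\, x \star \psi^b_{\lambda_1}(t)$,
$|x \star \psi_{\lambda_1}(t)| = \sqrt{|x \star \psi^a_{\lambda_1}(t)|^2 + |x \star \psi^b_{\lambda_1}(t)|^2}$, but it is covariant; averaging gives $|x \star \psi_{\lambda_1}| \star \phi_{2^J}(t)$.
Second wavelet transform modulus:
$|W_2|\, |x \star \psi_{\lambda_1}| = \left( |x \star \psi_{\lambda_1}| \star \phi_{2^J}(t),\ ||x \star \psi_{\lambda_1}| \star \psi_{\lambda_2}(t)| \right)_{\lambda_2}$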

Scattering Transform
[Figure: $x$ → $|W_1|$ → average $x \star \phi_{2^J}$ and moduli $|x \star \psi_{\lambda_1}(t)|$, $|x \star \psi_{\lambda_1'}(t)|$, $|x \star \psi_{\lambda_1''}(t)|$, $|x \star \psi_{\lambda_1'''}(t)|$]

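Scattering Transform
[Figure: $x$ → $|W_1|$ → $x \star \phi_{2^J}$ and $|x \star \psi_{\lambda_1}|$; then $|x \star \psi_{\lambda_1}|$ → $|W_2|$ → $|x \star \psi_{\lambda_1}| \star \phi_{2^J}$ and $||x \star \psi_{\lambda_1}| \star \psi_{\lambda_2}(t)|$]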

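Scattering Neural Network
[Figure: cascade $x$ → $|W_1|$ → $|W_2|$ → $|W_3|$. Outputs at each layer: $x \star \phi_{2^J}$; $|x \star \psi_{\lambda_1}| \star \phi_{2^J}$; $||x \star \psi_{\lambda_1}| \star \psi_{\lambda_2}| \star \phi_{2^J}$. Propagated to the next layer: $|||x \star \psi_{\lambda_1}| \star \psi_{\lambda_2}| \star \psi_{\lambda_3}|$]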

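Scattering Properties
$S_J x = \begin{pmatrix} x \star \phi_{2^J} \\ |x \star \psi_{\lambda_1}| \star \phi_{2^J} \\ ||x \star \psi_{\lambda_1}| \star \psi_{\lambda_2}| \star \phi_{2^J} \\ |||x \star \psi_{\lambda_1}| \star \psi_{\lambda_2}| \star \psi_{\lambda_3}| \star \phi_{2^J} \\ \vdots \end{pmatrix}_{\lambda_1, \lambda_2, \lambda_3, ...} = \dots |W_3|\, |W_2|\, |W_1|\, x$
$W_k$ is unitary $\Rightarrow$ $|W_k|$ is contractive.
Theorem: For appropriate wavelets, a scattering
• is contractive: $\|S_J x - S_J y\| \le \|x - y\|$ ($L^2$ stability);
• preserves norms: $\|S_J x\| = \|x\|$;
• is translation invariant and stable to deformations: if $x_\tau(u) = x(u - \tau(u))$ then $\lim_{J \to \infty} \|S_J x_\tau - S_J x\| \le C \|\nabla \tau\|_\infty \|x\|$.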
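
The cascade and two of its properties can be illustrated on a 1-D signal. The sketch below makes several assumptions: it truncates the scattering at order 2 (so norm preservation is not expected to hold exactly), uses an illustrative cosine-bump filter bank rather than the wavelets of the talk, and all function names are hypothetical.

```python
import numpy as np

def filter_bank(n, octaves=6, Q=1):
    """Analytic band-pass filters psi_hat[j] and low-pass phi_hat (Fourier domain)."""
    w = 2 * np.pi * np.fft.fftfreq(n)
    pos = w > 0
    log_w = np.log2(w[pos] / np.pi)
    psi_hat = np.zeros((octaves * Q, n))
    for j in range(octaves * Q):
        u = Q * log_w + j                          # offset from the centre pi * 2**(-j/Q)
        psi_hat[j, pos] = np.sqrt(2) * np.where(np.abs(u) < 1, np.cos(0.5 * np.pi * u), 0.0)
    energy = 0.5 * (psi_hat**2).sum(axis=0)
    energy = energy + np.roll(energy[::-1], 1)     # add the mirrored frequencies -w
    return np.sqrt(np.clip(1.0 - energy, 0.0, 1.0)), psi_hat

def scattering(x, phi_hat, psi_hat):
    """Order 0, 1 and 2 scattering coefficients of a real signal, stacked as rows."""
    def conv(u, h_hat):
        return np.fft.ifft(np.fft.fft(u) * h_hat)
    def average(u):
        return conv(u, phi_hat).real               # u * phi
    out = [average(x)]
    for psi1 in psi_hat:
        u1 = np.abs(conv(x, psi1))                 # |x * psi_l1|
        out.append(average(u1))
        for psi2 in psi_hat:
            u2 = np.abs(conv(u1, psi2))            # ||x * psi_l1| * psi_l2|
            out.append(average(u2))
    return np.array(out)

rng = np.random.default_rng(0)
n = 1023                                           # odd length: no Nyquist bin
x, y = rng.standard_normal(n), rng.standard_normal(n)
phi_hat, psi_hat = filter_bank(n)
Sx, Sy = scattering(x, phi_hat, psi_hat), scattering(y, phi_hat, psi_hat)

# Contraction: ||Sx - Sy|| <= ||x - y||.
contraction_ratio = np.linalg.norm(Sx - Sy) / np.linalg.norm(x - y)

# Local translation invariance: a 2-sample shift changes x a lot but Sx only slightly.
S_shift = scattering(np.roll(x, 2), phi_hat, psi_hat)
shift_ratio = np.linalg.norm(S_shift - Sx) / np.linalg.norm(x)
input_ratio = np.linalg.norm(np.roll(x, 2) - x) / np.linalg.norm(x)
print(contraction_ratio, shift_ratio, input_ratio)
```

Contraction follows from the filter bank being norm preserving on real signals and the modulus being contractive. The small `shift_ratio` reflects the final averaging by $\phi_{2^J}$, which makes every output row low-pass; the invariance is only local because the shift is small compared with the averaging scale.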

Digit Classification: MNIST (Joan Bruna)
[Figure: $x$ → $S_J x$ → linear classifier → $y = f(x)$]
Classification errors:
Training size | Conv. Net. (LeCun et al.) | Scattering
50000         | 0.5%                      | 0.4%

Classification of Textures (J. Bruna)
CUREt database: 61 classes
[Figure: texture $x$ → $S_J x$ (scattering moments) → linear classifier → $y = f(x)$, with $2^J$ = image size]
Classification errors:
Training per class | Fourier Spectr. | Histogr. Features | Scattering
46                 | 1%              | 1%                | 0.2%

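Scattering Moments of Processes
The scattering transform of a stationary process $X(t)$:
$S_J X = \begin{pmatrix} X \\ |X \star \psi_{\lambda_1}| \\ ||X \star \psi_{\lambda_1}| \star \psi_{\lambda_2}| \\ |||X \star \psi_{\lambda_1}| \star \psi_{\lambda_2}| \star \psi_{\lambda_3}| \\ \vdots \end{pmatrix}_{\lambda_1, \lambda_2, \lambda_3, ...} \star \phi_{2^J}$: Gaussian for $2^J$ large.
If $X$ is ergodic, then as $J \to \infty$, $S_J X$ converges to
$E(SX) = \begin{pmatrix} E(X) \\ E(|X \star \psi_{\lambda_1}|) \\ E(||X \star \psi_{\lambda_1}| \star \psi_{\lambda_2}|) \\ E(|||X \star \psi_{\lambda_1}| \star \psi_{\lambda_2}| \star \psi_{\lambda_3}|) \\ \vdots \end{pmatrix}_{\lambda_1, \lambda_2, \lambda_3, ...}$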

Representation of Random Processes
$E(SX) = \begin{pmatrix} E(X) = E(U_0 X) \\ E(|X \star \psi_{\lambda_1}|) = E(U_1 X) \\ E(||X \star \psi_{\lambda_1}| \star \psi_{\lambda_2}|) = E(U_2 X) \\ E(|||X \star \psi_{\lambda_1}| \star \psi_{\lambda_2}| \star \psi_{\lambda_3}|) = E(U_3 X) \\ \vdots \end{pmatrix}_{\lambda_1, \lambda_2, \lambda_3, ...}$
Theorem (Boltzmann): The distribution $p(x)$ which satisfies
$\int_{\mathbb{R}^N} U_m x\ p(x)\, dx = E(U_m X)$
with a maximum entropy $H_{\max} = -\int p(x) \log p(x)\, dx$ is
$p(x) = \frac{1}{Z} \exp\Big( \sum_{m=1}^{\infty} \lambda_m \cdot U_m x \Big)$
$H_{\max} \ge H(X)$ (entropy of $X$).
Little loss of information: $H_{\max} \approx H(X)$.
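
The exponential form in the theorem can be recovered by a standard Lagrange-multiplier argument. The sketch below is formal (it ignores integrability conditions), and the multiplier $\mu$ for the normalization constraint is notation introduced here, not on the slide.

```latex
% Maximize the entropy under the moment constraints and \int p = 1:
\mathcal{L}(p) = -\int p(x) \log p(x)\, dx
  + \sum_{m} \lambda_m \cdot \Big( \int U_m x\; p(x)\, dx - E(U_m X) \Big)
  + \mu \Big( \int p(x)\, dx - 1 \Big)

% Setting the variation with respect to p(x) to zero:
-\log p(x) - 1 + \sum_{m} \lambda_m \cdot U_m x + \mu = 0
\quad\Longrightarrow\quad
p(x) = \frac{1}{Z} \exp\Big( \sum_{m} \lambda_m \cdot U_m x \Big),
\qquad Z = e^{1 - \mu}

% The distribution of X itself satisfies the constraints, hence
H_{\max} \ge H(X)
```

The last line is the reason the gap $H_{\max} - H(X)$ measures how much information about $X$ the scattering moments fail to capture.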

Ergodic Texture Reconstructions (Joan Bruna)
[Figure: original textures (including 2D turbulence); Gaussian process model with same second order moments; second order Gaussian scattering model]
Second order Gaussian scattering: $O(\log N^2)$ moments $E(|x \star \psi_{\lambda_1}|)$, $E(||x \star \psi_{\lambda_1}| \star \psi_{\lambda_2}|)$

Representation of Audio Textures (Joan Bruna)
[Figure: time-frequency images (time $t$, frequency $\omega$) of three audio textures (Applause, Paper, Cocktail Party), each shown as Original, Gaussian in time, and Gaussian in scattering]

Failures: Harmonic Sounds (V. Lostanlen)
Need to express frequency channel interactions: time-frequency image.
[Figure: time-frequency images of Bird, Speech, Cello]

Harmonic Spiral (V. Lostanlen)
Need to capture frequency variability and structures.
[Figure: harmonics placed on a spiral, with the frequency $\lambda$ split into an octave index $j$ and an angle $\theta$, plotted against time $t$]
• Alignment of harmonics in two main groups. More regular variations along $(\theta, j)$ than $\lambda$.

Rotation and Scaling Invariance (Laurent Sifre)
UIUC database: 25 classes
Scattering classification errors:
Training | Scat. Translation
20       | 20%

Extension to Rigid Movements (Laurent Sifre)
Need to capture the variability of spatial directions.
• Group of rigid displacements: translations and rotations.
• Action on wavelet coefficients, with $x_j(u, \theta) = |x \star \psi_{2^j,\theta}(u)|$: a rotation and translation of the image, $x(u) \to x(r_\alpha(u - c))$, becomes a rotation and translation in $u$ together with a translation of the angle, $x_j(u, \theta) \to x_j(r_\alpha(u - c), \theta - \alpha)$.
[Figure: $|W_1|$ applied to $x(u)$ and to $x(r_\alpha(u - c))$, each with the invariant output $\int x(u)\, du$]

Extension to Rigid Movements (Laurent Sifre)
• To build invariants: second wavelet transform on $L^2(G)$: convolutions of $x_j(u, \theta)$ with wavelets $\psi_{\lambda_2}(u, \theta)$
$x_j \circledast \psi_{\lambda_2}(u, \theta) = \int_{\mathbb{R}^2} \int_0^{2\pi} x_j(v, \alpha)\, \psi_{\lambda_2}(u - v, \theta - \alpha)\, dv\, d\alpha$
• Scattering on rigid movements:
[Figure: $x(u)$ → $|W_1|$ (wavelets on translations) → $x_j(u, \theta)$ → $|W_2|$ (wavelets on rigid mvt.) → $|x_j \circledast \psi_{\lambda_2}(v, \theta)|$ → $|W_3|$ (wavelets on rigid mvt.). Invariant outputs: $\int x(u)\, du$, $\int x_j(u, \theta)\, du\, d\theta$, $\int |x_j \circledast \psi_{\lambda_2}(v, \theta)|\, du\, d\theta$]
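
A discrete version of the convolution over $(u, \theta)$, written exactly as in the formula on the slide, can be sketched as follows. It rests on simplifying assumptions: small random arrays stand in for $x_j$ and $\psi_{\lambda_2}$, the position $u$ is treated as periodic, and the code does not model any coupling between the rotation angle and the position. The function names are hypothetical.

```python
import numpy as np

def rigid_conv_fft(xj, psi):
    """Circular convolution over (u1, u2, theta) of two arrays of equal shape."""
    return np.fft.ifftn(np.fft.fftn(xj) * np.fft.fftn(psi))

def rigid_conv_direct(xj, psi):
    """Same sum written out: sum over (v, alpha) of xj(v, alpha) psi(u - v, theta - alpha)."""
    n1, n2, k = xj.shape
    out = np.zeros(xj.shape, dtype=complex)
    for v1 in range(n1):
        for v2 in range(n2):
            for a in range(k):
                # psi shifted by (v1, v2, a) with periodic wrap-around
                out += xj[v1, v2, a] * np.roll(psi, (v1, v2, a), axis=(0, 1, 2))
    return out

rng = np.random.default_rng(0)
xj = rng.random((6, 6, 4))                                   # stand-in for x_j(u, theta)
psi = rng.standard_normal((6, 6, 4)) + 1j * rng.standard_normal((6, 6, 4))

fast = rigid_conv_fft(xj, psi)
slow = rigid_conv_direct(xj, psi)

# Summing the modulus over (u, theta) is unchanged when x_j is shifted in position and angle.
moved = np.roll(xj, (1, 2, 1), axis=(0, 1, 2))
invariant_original = np.abs(fast).sum()
invariant_moved = np.abs(rigid_conv_fft(moved, psi)).sum()
print(invariant_original, invariant_moved)
```

Shifting $x_j$ in $(u, \theta)$ only shifts the convolution output, so the integrated modulus, the discrete analogue of $\int |x_j \circledast \psi_{\lambda_2}(v, \theta)|\, du\, d\theta$, stays the same.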
