SLIDE 1

Understanding (or not) Deep Convolutional Networks

Stéphane Mallat École Normale Supérieure

www.di.ens.fr/data

SLIDE 2

Deep Neural Networks

  • Approximations of high-dimensional functions from examples, for classification and regression.

  • Applications: computer vision, audio and music classification, natural language analysis, bio-medical data, unstructured data…

  • Related to: neurophysiology of vision and audition, quantum and statistical physics, linguistics, …

  • Mathematics: statistics, probability, harmonic analysis, geometry, optimization. Little is understood.

SLIDE 3

High Dimensional Learning

  • High-dimensional data x = (x(1), ..., x(d)) ∈ ℝ^d, given n sample values {x_i, y_i = f(x_i)}_{i≤n}.
  • Classification: estimate a class label f(x).

Image Classification: d = 10^6

[Example classes: Anchor, Joshua Tree, Beaver, Lotus, Water Lily]

Huge variability inside classes ⇒ find invariants.

SLIDE 4

Curse of Dimensionality

  • f(x) can be approximated from examples {x_i, f(x_i)}_i by local interpolation if f is regular and there are close examples.
  • Need ε^{−d} points to cover [0, 1]^d at a Euclidean distance ε
    ⇒ in high dimension, ‖x − x_i‖ is always large.
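A quick numerical illustration of this (a minimal sketch in Python; the sample size n = 1000 and the dimensions are arbitrary choices, not from the talk): with n fixed, the nearest-neighbour distance among uniform points in [0, 1]^d keeps growing with d, so local interpolation has no close examples to work with.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # number of examples (arbitrary)

for d in (2, 10, 100, 1000):
    x = rng.uniform(0.0, 1.0, size=(n, d))
    # pairwise squared distances via |a - b|^2 = |a|^2 + |b|^2 - 2 <a, b>
    sq = (x ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    np.fill_diagonal(d2, np.inf)  # exclude self-distances
    nn = np.sqrt(d2.min(axis=1))  # distance of each point to its nearest example
    print(f"d = {d:4d}: mean nearest-neighbour distance = {nn.mean():.3f}")
```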

SLIDE 5

Linearisation by Change of Variable

Data: x ∈ ℝ^d  →  Φ  →  linear classifier

  • Change of variable Φ(x) = {φ_k(x)}_{k≤d'}, with Φ(x) ∈ ℝ^{d'}, to nearly linearize f(x), which is approximated by:

    f̃(x) = ⟨Φ(x), w⟩ = Σ_k w_k φ_k(x).

[Figure: level sets of f and their 1D projection on w]
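A toy instance of this pipeline (a sketch with synthetic data and a hand-picked Φ; none of it comes from the talk): the label below depends on ‖x‖², so it is not linear in x, but the change of variable Φ(x) = (x(1)², x(2)², √2 x(1)x(2), 1) makes it linear, and a least-squares linear classifier on Φ(x) then suffices.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(2000, 2))
y = np.where((x ** 2).sum(axis=1) > 2.0, 1.0, -1.0)  # label depends on |x|^2: not linear in x

def phi(x):
    # hand-picked change of variable that linearizes |x|^2
    return np.stack([x[:, 0] ** 2, x[:, 1] ** 2,
                     np.sqrt(2.0) * x[:, 0] * x[:, 1],
                     np.ones(len(x))], axis=1)

z = phi(x)
w = np.linalg.lstsq(z, y, rcond=None)[0]  # linear classifier f~(x) = <Phi(x), w>
print("training accuracy:", (np.sign(z @ w) == y).mean())  # close to 1.0
```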

SLIDE 6

Deep Convolutional Networks

x → linear convolution L_1 → ρ → linear convolution L_2 → ρ → … → Φ(x) → linear classification

ρ is a non-linear scalar "neuron", e.g. ρ(u) = |u|.

  • Exceptional results for images, speech, bio-data classification. Products by Facebook, IBM, Google, Microsoft, Yahoo...
  • The revival of an old (1950) idea: Y. LeCun, G. Hinton.
  • Optimize the L_j with architecture constraints: over 10^9 parameters.

Why does it work so well?

SLIDE 7

ImageNet Database

  • Database with 1 million images and 2000 classes.
SLIDE 8
Alex Deep Convolutional Network

  • A. Krizhevsky, I. Sutskever, G. Hinton in 2012.
  • ImageNet supervised training: 1.2 × 10^6 examples, 10^3 classes ⇒ 15.3% testing error.
  • New networks reach 5% errors, with up to 150 layers!

[Figure: learned first-layer filters, which resemble wavelets]

SLIDE 9

Image Classification

SLIDE 10

Scene Labeling / Car Driving

SLIDE 11

Overview

  • Linearisation of symmetries
  • Deep convolutional networks architectures
  • Simplified convolutional trees: wavelet scattering
  • Deep networks: contractions, linearization and separations
SLIDE 12

Separation and Linearization with Φ

  • Separation: the change of variable f(x) = f̃(Φ(x)) requires
    Φ(x) ≠ Φ(x') if f(x) ≠ f(x'),
    and f̃(z) is Lipschitz ⇔ ‖Φ(x) − Φ(x')‖ ≥ ε |f(x) − f(x')|.

  • Linearization: f̃(z) = ⟨w, z⟩ must linearize the level sets Ω_t = {x : f(x) = t}:
    the sets Φ(Ω_t), for all t, lie in parallel linear spaces, so that
    ∀x ∈ Ω_t, f(x) = ⟨Φ(x), w⟩ = t.

SLIDE 13

Linearization of Symmetries

  • No local estimation because of the dimensionality curse ⇒ the estimation must be global.

  • A symmetry is an operator g which preserves the level sets:
    ∀x, f(g.x) = f(x).
    If g_1 and g_2 are symmetries then g_1.g_2 is also a symmetry ⇒ groups G of symmetries.

  • A change of variable Φ(x) must linearize the orbits {g.x}_{g∈G}.
    Problem: find the symmetries and linearise them, in high dimension.

SLIDE 14

Contract and Linearize Symmetries

  • Regularize the orbit, remove high curvature ⇒ linearisation.
  • A change of variable Φ(x) must linearize the orbits {g.x}_{g∈G}.
    Problem: find the symmetries and linearise them.

SLIDE 15

Translation and Deformations

[Figure: x(u) and a deformed x'(u); video by Philipp Scott Johnson]

  • Digit classification:
    • globally invariant to the translation group (a small group);
    • locally invariant to small diffeomorphisms (a huge group).

SLIDE 16

Deep Convolutional Networks

x(u) → ρL_1 → x_1(u, k_1) → ρL_2 → x_2(u, k_2) → … → ρL_J → x_J(u, k_J) → classification,
with x_j = ρ L_j x_{j−1}, up to J = 150 layers.

  • ρ is a pointwise contractive non-linearity:
    ∀(α, α') ∈ ℝ², |ρ(α) − ρ(α')| ≤ |α − α'|.
    Examples: ρ(u) = max(u, 0) or ρ(u) = |u|.
  • What is the role of the linear operators L_j and of ρ?
  • Optimisation of the L_j to minimise the training error, with stochastic gradient descent and back-propagation.
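Both examples satisfy this inequality pointwise, which makes each layer ρL_j contractive whenever ‖L_j‖ ≤ 1. A minimal numerical check on random samples (an illustrative sketch, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(2)
a, b = rng.normal(size=(2, 100_000))

for name, rho in [("ReLU rho(u) = max(u, 0)", lambda u: np.maximum(u, 0.0)),
                  ("modulus rho(u) = |u|", np.abs)]:
    # check |rho(a) - rho(b)| <= |a - b| on every sample pair
    ok = np.all(np.abs(rho(a) - rho(b)) <= np.abs(a - b) + 1e-12)
    print(f"{name}: contraction holds on all samples -> {ok}")
```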

SLIDE 17

Deep Convolutional Networks

x(u) → ρL_1 → x_1(u, k_1) → ρL_2 → x_2(u, k_2) → … → ρL_J → x_J(u, k_J) → classification,
with x_j = ρ L_j x_{j−1}, up to J = 150.

L_j has several roles:

  • L_j eliminates useless linear variables: dimension reduction.
  • L_j is a linear preprocessing for the next layers.
  • L_j computes appropriate variables contracted by ρ.

⇒ Linearizes and computes invariants to groups of symmetries.

SLIDE 18

Deep Convolutional Networks

  • L_j is a linear combination of convolutions and subsamplings, summed across channels:

    x_j(u, k_j) = ρ( Σ_k x_{j−1}(·, k) ⋆ h_{k_j,k}(u) )

  • Optimization of the filters h_{k_j,k}(u) to minimise the training error.
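This formula transcribes directly into code. A minimal numpy/scipy sketch of one such layer (filter sizes, channel counts, and the ReLU choice are illustrative, and the subsampling step is omitted):

```python
import numpy as np
from scipy.signal import convolve2d

def conv_layer(x_prev, h, rho=lambda u: np.maximum(u, 0.0)):
    """One layer x_j(u, k_j) = rho( sum_k x_{j-1}(., k) * h_{k_j, k}(u) ).

    x_prev: (K_in, H, W) array, the previous layer with K_in channels.
    h:      (K_out, K_in, s, s) array, one filter per (output, input) channel pair.
    """
    K_out, K_in = h.shape[:2]
    x_j = np.empty((K_out,) + x_prev.shape[1:])
    for kj in range(K_out):
        acc = sum(convolve2d(x_prev[k], h[kj, k], mode="same")  # sum across channels
                  for k in range(K_in))
        x_j[kj] = rho(acc)  # pointwise contractive non-linearity
    return x_j

rng = np.random.default_rng(3)
x0 = rng.normal(size=(3, 32, 32))          # e.g. a small RGB image
h1 = rng.normal(size=(8, 3, 5, 5)) / 25.0  # random, untrained filters
print(conv_layer(x0, h1).shape)            # (8, 32, 32)
```

Stacking J such layers, each with its own filters h, gives the cascade x_J = ρL_J … ρL_1 x.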
SLIDE 19

Simplified Convolutional Networks

  • No channel combination (no channel interaction):

    x_j(u, k_j) = ρ( x_{j−1}(·, k_{j−1}) ⋆ h_{k_j,k_{j−1}}(u) )

  • If α ≥ 0 then ρ(α) = α
    ⇒ if h_{k_j,k_{j−1}} is an averaging filter (so the output stays positive), then

    x_j(u, k_j) = x_{j−1}(·, k_{j−1}) ⋆ h_{k_j,k_{j−1}}(u).

SLIDE 20

Convolution Tree Network

[Figure: a tree of ρ nodes cascading ρL_1, ρL_2, …, ρL_J from x through x_1, x_2, …, x_J; legend: averaging filters and band-pass filters]

  • No channel combination ⇒ the network splits into a tree of filterings.
SLIDE 21

Wavelet Transform

  • ρW_1: a cascade of low-pass filters and a band-pass filter.

[Figure: filter-bank tree computing ρW_1 from x; legend: averaging filters and band-pass filters]

SLIDE 22

Wavelet Filter Bank

  • ρ(α) = |α| applied to wavelet coefficients: |x ⋆ ψ_{2^j,θ}(u)| at scales 2^0, 2^1, 2^2, …, 2^J (ψ_{2^j,θ}: equivalent filter).
  • Sparse representation.

[Figure: x(u) and its wavelet modulus coefficients |x ⋆ ψ_{2^1,θ}|, |x ⋆ ψ_{2^2,θ}|, … across scales, computed by |W_1|]

SLIDE 23

Scale Separation with Wavelets

  • Complex wavelet: ψ(u) = g(u) exp(iξu), u ∈ ℝ², rotated and dilated:

    ψ_{2^j,θ}(u) = 2^{−j} ψ(2^{−j} r_θ u)

    [Figure: real parts and imaginary parts of the rotated and dilated wavelets]

  • Wavelet transform:

    Wx = ( x ⋆ φ_{2^J}(u) , x ⋆ ψ_{2^j,θ}(u) )_{j≤J,θ}

    where x ⋆ φ_{2^J} is an average and the x ⋆ ψ_{2^j,θ} carry the higher frequencies.

  • |x ⋆ ψ_{2^j,θ}(u)| eliminates the phase, which encodes local translation.
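A sketch of such a wavelet family in numpy (assuming a Gaussian envelope g and Morlet-style parameters ξ and σ, which are illustrative; the zero-mean correction of a true Morlet is omitted):

```python
import numpy as np

def morlet_2d(size, scale, theta, xi=3.0 * np.pi / 4.0, sigma=0.8):
    """psi_{2^j,theta}(u) = 2^{-j} psi(2^{-j} r_theta u), with
    psi(u) = g(u) exp(i xi u_1) and g a Gaussian envelope."""
    half = size // 2
    u1, u2 = np.meshgrid(np.arange(-half, half), np.arange(-half, half),
                         indexing="ij")
    c, s = np.cos(theta), np.sin(theta)
    ur = (c * u1 + s * u2) / scale   # rotate by r_theta, dilate by the scale 2^j
    vr = (-s * u1 + c * u2) / scale
    g = np.exp(-(ur ** 2 + vr ** 2) / (2.0 * sigma ** 2))
    return g * np.exp(1j * xi * ur) / scale  # 2^{-j} normalisation

# a small filter bank: 3 scales (2^1, 2^2, 2^3) x 4 orientations
bank = [morlet_2d(32, 2.0 ** j, k * np.pi / 4.0) for j in (1, 2, 3) for k in range(4)]
print(len(bank), bank[0].shape)  # 12 (32, 32)
```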

SLIDE 24

Wavelet Scattering Network

x → ρW_1 → ρW_2 → … → ρW_J → x_J, with ρ(α) = |α| and averaging filters φ_{2^J} at the output:

    Sx = { ||| x ⋆ ψ_{2^{j_1},θ_1} | ⋆ ψ_{2^{j_2},θ_2} | ⋆ … | ⋆ ψ_{2^{j_m},θ_m} | ⋆ φ_{2^J} }_{j_k,θ_k}
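A one-dimensional sketch of these scattering coefficients (assuming Gaussian averaging filters and Morlet-style wavelets with illustrative scales; a faithful 2D implementation would use the rotated and dilated filter bank of the previous slides):

```python
import numpy as np

def gauss(N, sigma):
    t = np.arange(N) - N // 2
    g = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    return g / g.sum()                        # averaging filter phi_{2^J}

def morlet(N, xi, sigma):
    t = np.arange(N) - N // 2
    return np.exp(-t ** 2 / (2.0 * sigma ** 2)) * np.exp(1j * xi * t)  # band-pass psi

def conv(x, h):                               # circular convolution via FFT
    return np.fft.ifft(np.fft.fft(x) * np.fft.fft(np.fft.ifftshift(h)))

N, J = 1024, 6
phi = gauss(N, 2.0 ** J)
psi = [morlet(N, np.pi / 2.0 ** j, 2.0 ** j) for j in range(1, J)]  # scales 2^1 .. 2^{J-1}

x = np.random.default_rng(4).normal(size=N)
S0 = conv(x, phi).real                                     # x * phi
S1 = [conv(np.abs(conv(x, p1)), phi).real for p1 in psi]   # |x * psi_1| * phi
S2 = [conv(np.abs(conv(np.abs(conv(x, p1)), p2)), phi).real
      for i, p1 in enumerate(psi) for p2 in psi[i + 1:]]   # ||x * psi_1| * psi_2| * phi
print(len(S1), len(S2))  # 5 first-order and 10 second-order coefficient maps
```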

SLIDE 25

Scattering Properties

    S_J x = ( x ⋆ φ_{2^J} , |x ⋆ ψ_{λ_1}| ⋆ φ_{2^J} , ||x ⋆ ψ_{λ_1}| ⋆ ψ_{λ_2}| ⋆ φ_{2^J} , |||x ⋆ ψ_{λ_1}| ⋆ ψ_{λ_2}| ⋆ ψ_{λ_3}| ⋆ φ_{2^J} , ... )_{λ_1,λ_2,λ_3,...}
        = ... |W_3| |W_2| |W_1| x

Theorem: For appropriate wavelets, a scattering is

  • contractive: ‖S_J x − S_J y‖ ≤ ‖x − y‖ (L² stability), since
    ‖W_k x‖ = ‖x‖ ⇒ ‖|W_k x| − |W_k x'|‖ ≤ ‖x − x'‖;
  • translation invariant (as J → ∞) and linearizes small deformations:
    if D_τ x(u) = x(u − τ(u)) then
    lim_{J→∞} ‖S_J D_τ x − S_J x‖ ≤ C ‖∇τ‖_∞ ‖x‖.

Lemma: ‖[W_k, D_τ]‖ = ‖W_k D_τ − D_τ W_k‖ ≤ C ‖∇τ‖_∞.

SLIDE 26

Digit Classification: MNIST (Joan Bruna)

x → S_J x → supervised linear classifier → y = f(x)

Classification errors:

    Training size | Conv. Net. (LeCun et al.) | Scattering
    50000         | 0.5%                      | 0.4%

Scattering: invariant to translations, linearises small deformations, separates different patterns, no learning; a convolutional network also learns invariants to specific deformations.

SLIDE 27
Classification of Textures (J. Bruna)

CUReT database, 61 classes; scattering moments with 2^J = image size.

x → S_J x → supervised linear classifier → y = f(x)

Classification errors:

    Training per class | Fourier Spectr. | Histogr. Features | Scattering
    46                 | 1%              | 1%                | 0.2%

SLIDE 28

Reconstruction from Scattering

  • Second order scattering:

    S_J x = { x ⋆ φ_{2^J} , |x ⋆ ψ_{2^{j_1},θ_1}| ⋆ φ_{2^J} , ||x ⋆ ψ_{2^{j_1},θ_1}| ⋆ ψ_{2^{j_2},θ_2}| ⋆ φ_{2^J} }

    is translation invariant.
  • If x has N² pixels and J = log₂ N, then S_J x has O((log₂ N)²) coefficients.
  • Gradient descent reconstruction: given a random initialisation x_0, iteratively update x_n to minimise ‖S_J x − S_J x_n‖.
  • If x(u) is a stationary process:

    S_J x ≈ { E(x) , E(|x ⋆ ψ_{2^{j_1},θ_1}|) , E(||x ⋆ ψ_{2^{j_1},θ_1}| ⋆ ψ_{2^{j_2},θ_2}|) }
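A sketch of this inversion loop using PyTorch autograd (the transform S below is a toy stand-in built from fixed random band-pass filters, a modulus and an average, chosen only to keep the example short and differentiable; it is not the wavelet scattering S_J of the slides):

```python
import torch

torch.manual_seed(0)

# Fixed random zero-mean (band-pass) filters for a toy translation-invariant S.
filters = torch.randn(16, 1, 9, 9)
filters -= filters.mean(dim=(-2, -1), keepdim=True)

def S(x):
    u = torch.nn.functional.conv2d(x, filters, padding=4)
    # global averages of x and of the filter moduli
    return torch.cat([x.mean(dim=(-2, -1)).flatten(),
                      u.abs().mean(dim=(-2, -1)).flatten()])

x_true = torch.rand(1, 1, 32, 32)        # the image to recover
target = S(x_true).detach()

x_n = torch.rand(1, 1, 32, 32, requires_grad=True)  # random initialisation x_0
opt = torch.optim.Adam([x_n], lr=1e-2)
for n in range(2000):
    opt.zero_grad()
    loss = ((S(x_n) - target) ** 2).sum()  # || S x - S x_n ||^2
    loss.backward()
    opt.step()
print(float(loss))  # the loss decreases; x_n becomes one image with matching coefficients
```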

SLIDE 29

Translation Invariant Models

Joan Bruna

[Figure: original textures (including 2D turbulence) next to realisations of a Gaussian process model with the same second-order moments, and reconstructions from O((log₂ N)²) sparse scattering coefficients of order 2]

SLIDE 30

Complex Image Classification

Edouard Oyallon

x → S_J x → supervised linear classifier → y = f(x), with no learning of the representation.

[Example classes: Boat, Water Lily, Metronome, Beaver, Joshua Tree, Anchor]

Classification errors:

    Database | Deep-Net | Scat./Unsupervised
    CIFAR-10 | 7%       | 20%

SLIDE 31

Generation with Deep Networks

  • Unsupervised generative models with convolutional networks (A. Radford, L. Metz, S. Chintala).
  • Trained on a database of faces: linearization.
  • On a database including bedrooms: interpolations.
SLIDE 32

Contractions and Separations

  • A deep network progressively contracts the space while preserving margins across classes:

    ‖x_{j−1} − x'_{j−1}‖ ≥ ε if f(x) ≠ f(x'),
    and x_j = ρ L_j x_{j−1} ⇒ ‖ρL_j x_{j−1} − ρL_j x'_{j−1}‖ ≥ ε if f(x) ≠ f(x').

    ⇒ contract in directions along which f remains constant.

  • Combining multiple layer channels.
SLIDE 33

From Translations to Symmetries

  • f has a group G of symmetries: the value of f remains constant along an orbit {g.x_{j−1}}_{g∈G}.
  • A two-step process:
    – ρL_j transforms the orbit of x_{j−1} into a parallel transport in x_j: g.x_j(v) = x_j(g.v);
    – ρL_{j+1} linearizes it by a convolution with wavelets along the fibers.

[Figure: a layer x_j(u, k_j) with fibers indexed by a group G_j]

SLIDE 34

Scattering Transform: Time-Frequency Fibers

    x_1(t, λ_1) = |x ⋆ ψ_{λ_1}(t)|

[Figure: x(t) and its time / log-frequency plane (t, log λ_1), with time convolutions along t and time-frequency convolutions along the fibers]

  • Applied to audio classification.
SLIDE 35

Scale-Rotation-Translation Fibers

[Figure: average x ⋆ φ_{2^J} and wavelet moduli |x ⋆ ψ_{2^1,θ}|, |x ⋆ ψ_{2^2,θ}|, |x ⋆ ψ_{2^3,θ}| computed by |W_1|, organised along scale and angle θ]

  • Scalings and rotations define a parallel transport in (u, θ, 2^j).
  • Linear covariant operators: convolutions on the group.
  • Applied to object recognition.

SLIDE 36

Separate Support Vectors

  • Support vectors are pairs x_{j−1}, x'_{j−1} with

    ‖x_{j−1} − x'_{j−1}‖ ≈ ε and f(x) ≠ f(x').

    Their distance must not be reduced.
  • The operator ρL_j must separate them into different fibers:
    ⇒ sparse representations along fibers;
    ⇒ the rows of L_j encode the support vectors: a memory of discriminative patterns.

SLIDE 37

Complex Optimization

  • The operators L_j have many roles:
    – transform symmetries into transports within network layers;
    – convolutions along fibers to linearize symmetries and reduce dimensions;
    – separate support vectors along different fibers: sparsity.
  • It is difficult to separate these roles when analyzing learned networks.

SLIDE 38

Conclusions

  • Deep neural networks have spectacular high-dimensional approximation capabilities.
  • They seem to compute hierarchical invariants of complex symmetries.
  • They store a memory of discriminative patterns.
  • Neurophysiological models of audition and vision.
  • Outstanding mathematical problem to understand them: notions of complexity, regularity, approximation theorems…