DATA Edouard Oyallon
Organising Deep Networks
advisor: Stéphane Mallat
following the works of Laurent Sifre, Joan Bruna, …
collaborators: Eugene Belilovsky, Sergey Zagoruyko, Jörn Jacobsen, …
Training set to predict labels
[Figure: example images labelled "Rhino", Not a "rhino", "Rhinos"]
Estimation problem: $(x_i, y_i) \in \mathbb{R}^{224^2} \times \{1, \dots, 1000\}$, $i < 10^6$ $\longrightarrow$ $\hat{y}(x)$?
… simple (say Euclidean) classifier can estimate the label:
… low dimensional space which is regular with respect to the class:
[Diagram: $x \in \mathbb{R}^D \mapsto \Phi x \in \mathbb{R}^d$, $D \gg d$; a classifier $w$ applied to $\Phi x$; labels $y$, $\hat{y}$]
$\|\Phi x - \Phi x'\| \ll 1 \Rightarrow \hat{y}(x) = \hat{y}(x')$
Completely solved by the deep blackbox
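The regularity condition on this slide says that inputs whose representations are close should receive the same label, so a plain Euclidean nearest-neighbour rule can act on Φx. A minimal Python sketch of that idea follows. The representation `phi` (distance to the origin) and the toy training set are assumptions chosen for illustration; they do not come from the slides.

```python
import math

def phi(x):
    # Hypothetical representation (an assumption for illustration):
    # maps a 2-D point to its distance from the origin, a 1-D feature.
    return (math.hypot(x[0], x[1]),)

def euclidean(u, v):
    # Euclidean distance between two equal-length tuples.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest_neighbour_label(x, train, rep=phi):
    # Predict the label of the training point whose representation is closest.
    fx = rep(x)
    _, label = min(
        ((euclidean(fx, rep(xi)), yi) for xi, yi in train),
        key=lambda t: t[0],
    )
    return label

# Toy training set: class 0 near the origin, class 1 on a ring of radius 3.
train = [((0.1, 0.0), 0), ((0.0, -0.2), 0), ((3.0, 0.0), 1), ((0.0, 3.0), 1)]

query = (-3.0, 0.0)  # lies on the ring, far from every class-1 training point
pred = nearest_neighbour_label(query, train)                        # in Phi-space
pred_raw = nearest_neighbour_label(query, train, rep=lambda x: x)   # in input space
print(pred, pred_raw)
```

In this toy setting the rule applied to Φx assigns the query to the ring class, while the same rule applied to the raw coordinates picks the class near the origin, because the raw distance ignores the rotational regularity of the classes.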
[Diagram: $x_0 \to x_1 \to x_2 \to \dots \to$ Classifier]
Ref.: ImageNet Classification with Deep Convolutional Neural Networks. A Krizhevsky et al.
Solving it: Deep Network
Cascade of operators: $\rho_0 W_0,\ \rho_1 W_1,\ \dots,\ \rho_{J-1} W_{J-1}$
$x_{j+1} = \rho_j W_j x_j$, $x_J = \Phi x$
$W_j$: linear operator, learned from labeled data
$\rho_j$: non-linear operator
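The recursion $x_{j+1} = \rho_j W_j x_j$ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every $\rho_j$ is taken to be a ReLU, the layer widths are hypothetical, and the $W_j$ are random stand-ins, whereas the slide says they are learned from labeled data.

```python
import random

def relu(v):
    # Pointwise non-linearity; assumed here for every rho_j.
    return [max(0.0, a) for a in v]

def matvec(W, x):
    # Dense matrix-vector product: W is a list of rows.
    return [sum(w * a for w, a in zip(row, x)) for row in W]

def random_matrix(rows, cols, rng):
    # Stand-in for learned weights (assumption: real W_j come from training).
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def cascade(x, weights, rho=relu):
    # Applies x_{j+1} = rho(W_j x_j) for j = 0..J-1 and returns x_J = Phi x.
    for W in weights:
        x = rho(matvec(W, x))
    return x

rng = random.Random(0)
dims = [8, 6, 4, 2]  # hypothetical layer widths, J = 3 layers
weights = [random_matrix(dims[j + 1], dims[j], rng) for j in range(len(dims) - 1)]
x0 = [rng.uniform(-1.0, 1.0) for _ in range(dims[0])]
phi_x = cascade(x0, weights)
print(len(phi_x), phi_x)
```

The output dimension is set by the last operator, so the cascade maps the 8-dimensional input to a 2-dimensional representation on which a classifier could then act.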
Many rely on a "manifold hypothesis".
Ex: stability to diffeomorphisms
… the inputs might have a large impact on the system. And this happens.
… not explain the generalization properties.
… priors?) The deep cascade makes features hard to interpret.
Ref.: Intriguing properties of neural networks.
Ref.: Understanding deep learning requires rethinking generalization
Ref.: Deep Roto-Translation Scattering for Object Classification. EO and S Mallat
[Figure: Answers × Questions arrays. Panels: Organizing questions; Organizing answers; Both]
… 0 or 1 to some question. What does organizing mean?
… neighbours become meaningful: local metrics
In general, works tackle only …
Ref.: Harmonic Analysis of Digital Data Bases, Coifman R. et al.
… neighbourhood of amplitude $2^J$, for images.
Small deformations + Translation
Ref.: Invariant Convolutional Scattering Network, J Bruna and S Mallat
Ref.: Rotation, Scaling and Deformation Invariant Scattering for texture discrimination, L Sifre and S Mallat
Rotation + Scale
All variabilities are known
[Figure labels: $2^J$, $S_J$]
Ref.: Group Invariant Scattering, Mallat S
… art result on ImageNet 2012:
[Diagram: $x \to S_J \to$ ResNet. Learning?]
Ref.: Scaling the Scattering Transform: Deep Hybrid Networks EO, E Belilovsky, S Zagoruyko
             Accuracy  Depth  #params
AlexNet      80.1      9      61M
ResNet       88.8      18     11.7M
Scat+ResNet  88.6      10     12.8M
1M images for training, 400k testing, 1000 classes
… the particular case of limited-sample situations, without reducing the number of parameters.
STL10: 5k training, 8k testing, 10 classes + 100k unlabeled (not used!!)
CIFAR10: 10 classes
Ref.: Scaling the Scattering Transform: Deep Hybrid Networks EO, E Belilovsky, S Zagoruyko
Geometry helps
                Accuracy
Scattering+CNN  76
Deep            70
Unsupervised    75
[Plot: accuracy versus number of samples (100, 500, 1000, 50000), curves for CNN and Scattering+CNN]
[Diagram: $x \to S_J \to$ ResNet]
… appropriate representation must contract the space:
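$\|\Phi x - \Phi x'\| \le \|x - x'\|$
$\exists \epsilon > 0,\ y(x) \ne y(x') \Rightarrow \|\Phi x - \Phi x'\| \ge \epsilon$
[Figure: action of $\Phi$, showing the boundary of the training set, the classification boundary, and the margin $\epsilon$]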
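The two conditions on this slide (contraction, and separation of differently labelled points) can be checked empirically on a finite sample. The Python sketch below does this; the representation `phi` (a coordinate projection) and the four sample points are hypothetical. A check over sample pairs is only a necessary test: it does not prove the property on the whole space.

```python
import math
from itertools import combinations

def dist(u, v):
    # Euclidean distance between two equal-length tuples.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def is_contractive(phi, xs, tol=1e-12):
    # Checks ||phi(x) - phi(x')|| <= ||x - x'|| on every pair of samples.
    return all(
        dist(phi(a), phi(b)) <= dist(a, b) + tol
        for a, b in combinations(xs, 2)
    )

def margin(phi, xs, ys):
    # Smallest feature-space distance between samples with different labels;
    # a strictly positive value is an empirical epsilon for the separation condition.
    gaps = [
        dist(phi(a), phi(b))
        for (a, ya), (b, yb) in combinations(list(zip(xs, ys)), 2)
        if ya != yb
    ]
    return min(gaps)

def phi(x):
    # Hypothetical representation: keep the first coordinate, drop the second.
    return (x[0],)

xs = [(0.0, 0.0), (0.0, 5.0), (2.0, 1.0), (2.0, -4.0)]
ys = [0, 0, 1, 1]
contractive = is_contractive(phi, xs)
eps = margin(phi, xs, ys)
print(contractive, eps)
```

Here the projection discards the coordinate that varies within each class while keeping the one that separates them, so it contracts the sample and still leaves a positive margin. A constant map would also be contractive but would give a margin of zero, which is why both conditions are needed together.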
Ref.: Understanding deep convolutional networks, S Mallat
[Plot: # boundary points versus complexity, one curve per layer (layer 1 to layer 5, last layer)]
[Plot: classification accuracy (%) versus depth]
… boundary (estimating the local dimensionality is hard)
Explains the improvement
… complexity
Ref.: Building a Regular Decision Boundary with Deep Networks, EO
What variabilities are reduced?
… covariance:
… interpolations):
Ref.: Understanding deep features with computer-generated imagery, M Aubry, B Russell
Ref.: Unsupervised Representation Learning with Deep Convolutional GAN, Radford, Metz & Chintala
[Figure: not organised layers; defining an order]
Ref.: Multiscale Hierarchical Convolutional Networks J Jacobsen, EO, S Mallat, AWM Smeulders
[Figure: organised layers]
[Plot: #params (axis ticks 6, 12), organised versus not organised]
… important aspects…
http://www.di.ens.fr/data/
http://www.di.ens.fr/~oyallon/
Eugene Belilovsky, Sergey Zagoruyko, Stéphane Mallat, Jörn Jacobsen