Organising Deep Networks, Edouard Oyallon, advisor: Stéphane Mallat (PowerPoint presentation)



SLIDE 1

DATA Edouard Oyallon


Organising Deep Networks

advisor: Stéphane Mallat

following the works of Laurent Sifre, Joan Bruna, …

collaborators: Eugene Belilovsky, Sergey Zagoruyko, Jörn Jacobsen, …

SLIDE 2

  • Caltech 101, etc


Training set used to predict labels.

[Example images: a "Rhino", not a "rhino", "Rhinos"]

High-dimensional classification

Estimation problem: given samples (xᵢ, yᵢ) ∈ ℝ^{224²} × {1, …, 1000}, i < 10⁶, estimate ŷ(x).

SLIDE 3

Fighting the curse of dimensionality

  • Objective: build a representation Φx of x such that a simple (say euclidean) classifier can estimate the label ŷ.

  • Designing Φ consists of building an approximation of a low-dimensional space which is regular with respect to the class.

  • Necessary dimensionality and variance reduction.



Φ : ℝ^D → ℝ^d with D ≫ d, and

	‖Φx − Φx′‖ ≲ 1 ⟹ ŷ(x) = ŷ(x′)

Completely solved by the deep blackbox.
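To make the "simple (say euclidean) classifier" concrete, here is a minimal sketch; the Gaussian clusters, dimensions, and the nearest-centroid rule are my own toy illustration of a euclidean classifier on a representation Φx, not code from this work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy representation space: two classes, Gaussian clusters in R^d.
d = 16
phi_train = np.concatenate([rng.normal(0.0, 0.3, (50, d)),
                            rng.normal(2.0, 0.3, (50, d))])
y_train = np.array([0] * 50 + [1] * 50)

# Nearest-centroid ("euclidean") classifier on top of the representation.
centroids = np.stack([phi_train[y_train == c].mean(axis=0) for c in (0, 1)])

def predict(phi_x):
    # Assign to the class whose centroid is closest in euclidean distance.
    dists = np.linalg.norm(centroids - phi_x, axis=1)
    return int(np.argmin(dists))

print(predict(rng.normal(0.0, 0.3, d)))  # -> 0
print(predict(rng.normal(2.0, 0.3, d)))  # -> 1
```

If Φ makes nearby points share a label, this trivial classifier suffices, which is exactly the regularity the slide asks of the representation.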

SLIDE 4


[Diagram: x₀ → x₁ → x₂ → … → classifier]

Solving it: a deep network

	x_{j+1} = ρ_j W_j x_j,   x_J = Φx,   j = 0, …, J−1

where each W_j is a linear operator learned from labelled data, and each ρ_j is a non-linear operator.

Ref.: ImageNet Classification with Deep Convolutional Neural Networks, A. Krizhevsky et al.
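The cascade x_{j+1} = ρ_j W_j x_j can be sketched in a few lines of numpy. The layer widths and random weights below are illustrative placeholders only; in a real network the W_j are convolutions learned from labelled data.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(u):
    # A common choice of non-linear operator rho_j.
    return np.maximum(u, 0.0)

# Hypothetical layer widths; W_j drawn at random for illustration.
widths = [64, 32, 16, 8]
Ws = [rng.normal(0.0, 1.0 / np.sqrt(m), (n, m))
      for m, n in zip(widths[:-1], widths[1:])]

def phi(x):
    # x_{j+1} = rho_j(W_j x_j); the final layer x_J is Phi x.
    for W in Ws:
        x = relu(W @ x)
    return x

x0 = rng.normal(size=widths[0])
print(phi(x0).shape)  # -> (8,)
```

The shrinking widths mirror the dimensionality reduction of slide 3: the cascade maps ℝ^D progressively down to a small output space.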

SLIDE 5

Why mathematics about deep learning is important

  • Pure black box: few mathematical results are available. Many rely on a "manifold hypothesis". Ex: stability to diffeomorphisms.

  • No stability results: "small" variations of the input might have a large impact on the output. And this happens.

  • No generalisation result: Rademacher complexity cannot explain the generalisation properties.

  • Shall we learn each layer from scratch? (geometric priors?) The deep cascade makes features hard to interpret.


Ref.: Intriguing properties of neural networks, C. Szegedy et al.
Ref.: Understanding deep learning requires rethinking generalization, C. Zhang et al.
Ref.: Deep Roto-Translation Scattering for Object Classification, EO and S. Mallat

SLIDE 6

Organisation is key


[Diagram: questionnaire matrices of answers × questions, with three strategies: organising questions, organising answers, or both]

  • Consider a questionnaire problem: people answer 0 or 1 to some questions. What does organising mean? Neighbours become meaningful: local metrics.

  • In general, existing works tackle only one of these aspects.

Ref.: Harmonic Analysis of Digital Data Bases, R. Coifman et al.

SLIDE 7

Structuring the input with the Scattering Transform

  • The Scattering Transform S_J is a deep local descriptor of image neighbourhoods of amplitude 2^J.

  • It is a representation built via geometry, with limited learning (~SIFT).

  • Successfully used in several applications: digits, textures.


All variabilities are known: small deformations + translation (digits), rotation + scale (textures).

Ref.: Invariant Scattering Convolution Networks, J. Bruna and S. Mallat
Ref.: Rotation, Scaling and Deformation Invariant Scattering for texture discrimination, L. Sifre and S. Mallat
Ref.: Group Invariant Scattering, S. Mallat
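As a rough illustration of the scattering idea (modulus of high-pass responses followed by averaging at scale 2^J), here is a toy descriptor. It substitutes finite differences for the actual wavelet filters, so it is only a sketch of the structure of S_J, not the real transform.

```python
import numpy as np

def scattering_like(x, J=2):
    """Toy first-order scattering-style descriptor (illustration only):
    modulus of oriented high-pass responses, then averaging over windows
    of size 2^J. A real S_J uses wavelets; finite differences stand in."""
    s = 2 ** J
    # Oriented "high-pass" channels: horizontal and vertical differences.
    dx = np.abs(np.diff(x, axis=1, prepend=x[:, :1]))
    dy = np.abs(np.diff(x, axis=0, prepend=x[:1, :]))
    chans = [x, dx, dy]  # low-pass channel plus two modulus channels
    h, w = x.shape
    # Local averaging over 2^J x 2^J windows -> invariance to small shifts.
    out = [c[:h - h % s, :w - w % s]
             .reshape(h // s, s, w // s, s).mean(axis=(1, 3))
           for c in chans]
    return np.stack(out)

x = np.random.default_rng(0).normal(size=(32, 32))
print(scattering_like(x).shape)  # -> (3, 8, 8)
```

The averaging step is what buys invariance to translations smaller than 2^J, at the cost of spatial resolution; the modulus keeps the high-frequency energy that plain averaging would destroy.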

SLIDE 8

Scattering on ImageNet: Geometry in CNNs

  • Cascading a scattering transform with a modern CNN leads to almost state-of-the-art results on ImageNet2012:

  • Demonstrates no loss of information + fewer layers.


[Diagram: x → S_J → ResNet; how much learning is needed?]

Ref.: Scaling the Scattering Transform: Deep Hybrid Networks EO, E Belilovsky, S Zagoruyko

             Accuracy  Depth  #params
AlexNet      80.1      9      61M
ResNet       88.8      18     11.7M
Scat+ResNet  88.6      10     12.8M

1M images for training, 400k testing, 1000 classes

SLIDE 9

Benchmarking Scattering + small data

  • Adding a geometric prior regularises the CNN input in limited-sample situations, without reducing the number of parameters.

  • State-of-the-art results on STL10 and CIFAR10:


STL10: 5k training, 8k testing, 10 classes, +100k unlabelled (not used!). CIFAR10: 10 classes.

Ref.: Scaling the Scattering Transform: Deep Hybrid Networks EO, E Belilovsky, S Zagoruyko

Geometry helps

STL10 accuracy:

                Accuracy
Scattering+CNN  76
Deep            70
Unsupervised    75

[Plot: accuracy vs number of training samples (100 to 50000) on CIFAR10, CNN vs Scattering+CNN]

[Diagram: x → S_J → ResNet]

SLIDE 10

Necessary mechanism: Separation - Contraction

  • In high dimension, typical distances are huge, thus an appropriate representation must contract the space:

  • While preventing the different classes from collapsing:


	‖Φx − Φx′‖ ≤ ‖x − x′‖   (contraction)

	∃ε > 0 : y(x) ≠ y(x′) ⟹ ‖Φx − Φx′‖ ≥ ε   (separation)

[Diagram: Φ maps the boundary of the training set relative to the classification boundary]
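The contraction side can be checked numerically for a single layer: if ‖W‖ ≤ 1 in spectral norm, then x ↦ ρ(Wx) with a ReLU is non-expansive. A minimal sketch, assuming spectrally normalised random weights (my construction, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

# A ReLU layer with spectrally normalised weights is non-expansive,
# so it contracts distances: ||rho(Wx) - rho(Wx')|| <= ||x - x'||.
W = rng.normal(size=(20, 20))
W /= np.linalg.norm(W, 2)  # spectral norm 1 => W is 1-Lipschitz

def layer(x):
    return np.maximum(W @ x, 0.0)  # ReLU is also 1-Lipschitz

x, xp = rng.normal(size=20), rng.normal(size=20)
d_in = np.linalg.norm(x - xp)
d_out = np.linalg.norm(layer(x) - layer(xp))
print(d_out <= d_in)  # -> True
```

Composing such 1-Lipschitz layers keeps the whole cascade contractive; the hard part, which the slide emphasises, is contracting without collapsing distinct classes below the margin ε.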

Ref.: Understanding deep convolutional networks, S. Mallat

SLIDE 11

Complexity measure


[Plots: boundary complexity (# boundary points) per layer, layers 1 to the last, and classification accuracy (%) vs depth]

  • Measuring the complexity of the classification boundary (estimating the local dimensionality is hard).

  • Progressive contraction of the space at each layer explains the improvement in complexity.

Ref.: Building a Regular Decision Boundary with Deep Networks, EO

What variabilities are reduced?
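One simple proxy for boundary complexity counts training points whose nearest neighbour carries a different label. This is my own illustrative estimator, not necessarily the exact measure used in the referenced paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def n_boundary_points(X, y):
    """Count samples whose nearest neighbour has a different label:
    a crude proxy for how tangled the class boundary is."""
    # Pairwise squared euclidean distances.
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(D, np.inf)   # ignore self-distance
    nn = D.argmin(axis=1)         # index of each point's nearest neighbour
    return int((y != y[nn]).sum())

y = np.array([0] * 30 + [1] * 30)

# Two well-separated clusters -> few boundary points.
X = np.concatenate([rng.normal(0, 0.2, (30, 2)),
                    rng.normal(5, 0.2, (30, 2))])
print(n_boundary_points(X, y))  # -> 0

# Heavily overlapping clusters -> many boundary points.
X2 = rng.normal(0, 1.0, (60, 2))
print(n_boundary_points(X2, y))
```

Applying such a count layer by layer is one way to see a boundary become progressively simpler as the space is contracted.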

SLIDE 12

Identifying the variabilities?

  • Several works showed that a DeepNet exhibits some covariances.

  • Manifold of faces at a certain depth (e.g. good interpolations).

  • It is hard to enumerate the variabilities…


Ref.: Understanding deep features with computer-generated imagery, M. Aubry and B. Russell
Ref.: Unsupervised Representation Learning with Deep Convolutional GANs, A. Radford, L. Metz and S. Chintala

SLIDE 13

Flattening the variability


  • Defining an order on layers of neurons.

[Diagram: non-organised layers vs organised layers; #params comparison (6 vs 12)]

Ref.: Multiscale Hierarchical Convolutional Networks, J. Jacobsen, EO, S. Mallat, A. W. M. Smeulders

SLIDE 14

Conclusion


  • Stability, generalisation results and interpretability are important aspects…

  • Check the website of the team DATA: 


http://www.di.ens.fr/data/

  • Check my webpage for software and papers: 


http://www.di.ens.fr/~oyallon/

Thank you!

Eugene Belilovsky Sergey Zagoruyko Stéphane Mallat Jörn Jacobsen