SLIDE 1

Structural Priors in Deep Neural Networks

YANI IOANNOU, MAR. 12TH 2018

SLIDE 2

About Me

  • Yani Ioannou (yu-an-nu)
  • Ph.D. Student, University of Cambridge
  • Dept. of Engineering, Machine Intelligence Lab
  • Prof. Roberto Cipolla, Dr. Antonio Criminisi (MSR)
  • Research scientist at Wayve
  • Self-driving car start-up in Cambridge
  • Have lived in 4 countries (Canada, UK, Cyprus and Japan)


SLIDE 3

Research Background

  • M.Sc. Computing, Queen’s University
  • Prof. Michael Greenspan
  • 3D Computer Vision
  • Segmentation and recognition in massive unorganized point clouds of urban environments
  • “Difference of Normals” multi-scale operator (published at 3DIMPVT)

SLIDE 4

Research Background

  • Ph.D. Engineering, University of Cambridge (2014 - 2018)
  • Prof. Roberto Cipolla, Dr. Antonio Criminisi (Microsoft Research)
  • Microsoft PhD Scholarship, 9-month internship at Microsoft Research
[Figure: two images, captioned c. 1496 and c. 2012]
SLIDE 5

Ph.D. – Collaborative Work

  • Segmentation of brain tumour tissues with CNNs
    D. Zikic, Y. Ioannou, M. Brown, A. Criminisi (MICCAI-BRATS 2014)
    One of the first papers using deep learning for volumetric/medical imagery
  • Using CNNs for Malaria Diagnosis (Intellectual Ventures / Gates Foundation)
    Designed a CNN for the classification of malaria parasites in blood smears
  • Measuring Neural Net Robustness with Constraints
    O. Bastani, Y. Ioannou, L. Lampropoulos, D. Vytiniotis, A. Nori, A. Criminisi (NIPS 2016)
    Found that not all adversarial images can be used to improve network robustness
  • Refining Architectures of Deep Convolutional Neural Networks
    S. Shankar, D. Robertson, Y. Ioannou, A. Criminisi, R. Cipolla (CVPR 2016)
    Proposed a method for adapting neural network architectures to new datasets
SLIDE 6

Ph.D. – First Author

  • Thesis: “Structural Priors in Deep Neural Networks”
  • Training CNNs with Low-Rank Filters for Efficient Image Classification
    Yani Ioannou, Duncan Robertson, Jamie Shotton, Roberto Cipolla, Antonio Criminisi (ICLR 2016)
  • Deep Roots: Improving CNN Efficiency with Hierarchical Filter Groups
    Yani Ioannou, Duncan Robertson, Roberto Cipolla, Antonio Criminisi (CVPR 2017)
  • Decision Forests, Convolutional Networks and the Models In-Between
    Y. Ioannou, D. Robertson, D. Zikic, P. Kontschieder, J. Shotton, M. Brown, A. Criminisi (Microsoft Research Tech. Report, 2015)

SLIDE 7

Motivation

  • Deep Neural Networks are massive!
  • AlexNet1 (2012)
  • 61 million parameters
  • 724 million FLOPs
  • Most compute in conv. layers
1 Krizhevsky, Sutskever, and Hinton, “ImageNet Classification with Deep Convolutional Neural Networks” 2 He, Zhang, Ren, and Sun, “Deep Residual Learning for Image Recognition”
SLIDE 8

Motivation

  • Deep Neural Networks are massive!
  • AlexNet1 (2012)
  • 61 million parameters
  • 724 million FLOPs
  • 96% of parameters in F.C. layers!
1 Krizhevsky, Sutskever, and Hinton, “ImageNet Classification with Deep Convolutional Neural Networks” 2 He, Zhang, Ren, and Sun, “Deep Residual Learning for Image Recognition”
SLIDE 9

Motivation

  • Deep Neural Networks are massive!
  • AlexNet1 (2012)
  • 61 million parameters
  • 7.24×10⁸ FLOPs
  • ResNet² 200 (2015)
  • 62.5 million parameters
  • 5.65×10¹² FLOPs
  • 2-3 weeks of training on 8 GPUs
1 Krizhevsky, Sutskever, and Hinton, “ImageNet Classification with Deep Convolutional Neural Networks” 2 He, Zhang, Ren, and Sun, “Deep Residual Learning for Image Recognition”
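
To make the parameter/compute split concrete, here is a back-of-the-envelope Python sketch (my own illustrative helper functions, with sizes roughly matching AlexNet’s conv1 and fc6) counting parameters and multiply-accumulates for a convolutional layer versus a fully connected layer:

```python
def conv_stats(h_out, w_out, k, c_in, c_out):
    """Parameters and multiply-accumulates (MACs) for a conv layer."""
    params = k * k * c_in * c_out
    macs = params * h_out * w_out          # each filter is applied at every output position
    return params, macs

def fc_stats(n_in, n_out):
    """Parameters and multiply-accumulates for a fully connected layer."""
    params = n_in * n_out
    return params, params                  # each weight is used exactly once

# Illustrative layers, roughly AlexNet-like (biases ignored):
print(conv_stats(55, 55, 11, 3, 96))       # conv1: ~35K params, ~105M MACs
print(fc_stats(6 * 6 * 256, 4096))         # fc6:   ~38M params, ~38M MACs
```

The weight sharing of convolution means few parameters but many operations, while a fully connected layer uses each of its many parameters only once, which is why most parameters sit in the F.C. layers but most compute sits in the conv layers.
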
SLIDE 10

Motivation

  • Until very recently, state-of-the-art DNNs for ImageNet were only getting more computationally complex
  • Each generation increased in depth and width
  • Is it necessary to increase complexity to improve generalization?

[Figure: top-5 error vs. log10(multiply-accumulate operations) for AlexNet, VGG-11/13/16/19, GoogLeNet (1x/10x/144x), MSRA-A/B/C, ResNet-50 and pre-ResNet-200, comparing crop & mirror augmentation with extra augmentation.]

SLIDE 11

Over-parameterization of DNNs

  • There are many proposed methods for improving the test-time efficiency of DNNs, showing that trained DNNs are over-parameterized:
  • Compression
  • Pruning
  • Reduced Representation
SLIDE 12

Structural Prior

Incorporating our prior knowledge of the problem and its representation into the connective structure of a neural network

  • Optimization of neural networks needs to learn what weights not to use
  • This is usually achieved with regularization
  • Can we structure networks closer to the specialized components used for learning from images, using our prior knowledge of the problem and its representation?
  • Structural Priors ⊂ Network Architecture
  • Architecture is a more general term, e.g. number of layers, activation functions, pooling, etc.
SLIDE 13

Regularization

  • Regularization does help training, but is not a substitute for good structural priors
  • MacKay (1991): regularization is not enough to make an over-parameterized network generalize as well as a network with a more appropriate parameterization
  • We liken regularization to a weak structural prior
  • Used where our only prior knowledge is that our network is greatly over-parameterized
SLIDE 14

Rethinking Regularization

  • “Understanding deep learning requires rethinking generalization”, Zhang et al., 2016
  • “Deep neural networks easily fit random labels.”
  • Identifies types of “regularization”:
  • “Explicit regularization” – e.g. weight decay, dropout and data augmentation
  • “Implicit regularization” – e.g. early stopping, batch normalization
  • “Network architecture”
  • Explicit regularization has little effect on fitting random labels, while implicit regularization and network architecture do
  • Highlights the importance of network architecture, and by extension structural priors, for good generalization

SLIDE 15

Convolutional Neural Networks

Prior Knowledge for Natural Images:

  • Local correlations are very important → convolutional filters
  • We don’t need to learn a different filter for each pixel → shared weights
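
A rough illustration of how much these two priors buy (illustrative sizes, not from the slides): connecting a 32 × 32 × 3 input to a 32 × 32 × 16 output densely, with local connectivity only, and with local connectivity plus shared weights (a convolution):

```python
H, W, c1, c2, k = 32, 32, 3, 16, 3

dense_params = (H * W * c1) * (H * W * c2)   # every input unit connected to every output unit
local_params = (k * k * c1) * (H * W * c2)   # local connectivity, but a separate filter per position
conv_params  = (k * k * c1) * c2             # local connectivity + shared weights

print(dense_params)  # 50,331,648
print(local_params)  # 442,368
print(conv_params)   # 432
```
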
SLIDE 16

Convolutional Neural Networks

Structural Prior for Natural Images

[Figure: a convolutional layer – an H × W × c1 input image/feature map is convolved with c2 filters (the parameters) of size h1 × w1 × c1, followed by a ReLU, giving an H × W × c2 output.]
SLIDE 17

Convolutional Neural Networks

Structural Prior for Natural Images

Fully connected layer: no kernel – every input pixel is connected to every output pixel, each with its own weight.
Convolutional layer: a 3 × 3 kernel – each output pixel is connected only to a local 3 × 3 neighbourhood of input pixels, with shared weights.

[Figure: (a) fully connected vs. (b) convolutional connectivity between a zero-padded 3 × 4 input image and a 4 × 3 output feature map.]
SLIDE 18

Ph.D. Thesis Outline

My thesis is based on three novel contributions, each exploring a separate aspect of structural priors in DNNs:

  • I. Spatial Connectivity
  • II. Inter-Filter Connectivity
  • III. Conditional Connectivity
SLIDE 19

Spatial Connectivity

SLIDE 20

Spatial Connectivity

Prior Knowledge:

  • Many of the filters learned in CNNs appear to be representing vertical/horizontal edges/relationships
  • Many others appear to be representable by combinations of low-rank filters
  • Previous work had shown that full-rank filters could be replaced with low-rank approximations, e.g. Jaderberg (2014)

Does every filter need to be square in a CNN?

SLIDE 21

Approximated Low-Rank Filters

Jaderberg, Max, Andrea Vedaldi, and Andrew Zisserman (2014) “Speeding up Convolutional Neural Networks with Low Rank Expansions”.

[Figure: a bank of c2 full w × h × c1 filters approximated by a smaller basis of d vertical (h × 1) filters followed by horizontal (1 × w) filters.]
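
A minimal NumPy/SciPy sketch of the underlying idea (just the single-channel, rank-1 case, not the paper’s full method): the best rank-1 approximation of a 3 × 3 filter, obtained by SVD, factors into a 3 × 1 vertical filter followed by a 1 × 3 horizontal filter, so two cheap convolutions reproduce the low-rank approximation of the full one.

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 3))          # a single-channel 3x3 filter

# Best rank-1 approximation via SVD: W ≈ s1 * u1 @ v1^T
U, S, Vt = np.linalg.svd(W)
vertical = np.sqrt(S[0]) * U[:, :1]      # 3x1 filter
horizontal = np.sqrt(S[0]) * Vt[:1, :]   # 1x3 filter
W_rank1 = vertical @ horizontal

image = rng.standard_normal((32, 32))

# Convolving with the rank-1 filter equals convolving with the two
# separable filters in sequence (up to floating-point error).
full = convolve2d(image, W_rank1, mode="valid")
separable = convolve2d(convolve2d(image, vertical, mode="valid"),
                       horizontal, mode="valid")
print(np.allclose(full, separable))                      # True
print(np.linalg.norm(W - W_rank1) / np.linalg.norm(W))   # approximation error
```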

SLIDE 22

CNN with Low-Dimensional Embedding

Typical sub-architecture found in Network-in-Network, ResNet/Inception

[Figure: a low-dimensional embedding – an H × W × c1 input is convolved with c2 filters of size h1 × w1 × c1 (ReLU), then with c3 filters of size 1 × 1 × c2 (ReLU), giving an H × W × c3 output.]

SLIDE 23

Proposed: Low-Rank Basis

Same total number of filters on each layer as original network, but 50% are 1x3, and 50% are 3x1

[Figure: the low-rank basis layer – the c2 spatial filters are split between 1 × 3 and 3 × 1 filters (ReLU), followed by the c3 1 × 1 filters (ReLU) as before.]
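
A minimal PyTorch sketch of such a layer (a hypothetical LowRankBasisBlock, not the thesis code): half of the c2 spatial filters are 1 × 3, half are 3 × 1, and their outputs are concatenated before the usual 1 × 1 layer.

```python
import torch
import torch.nn as nn

class LowRankBasisBlock(nn.Module):
    """Hypothetical sketch: replace c2 full 3x3 filters with
    c2/2 1x3 filters and c2/2 3x1 filters, then a 1x1 layer."""
    def __init__(self, c1, c2, c3):
        super().__init__()
        self.h = nn.Conv2d(c1, c2 // 2, kernel_size=(1, 3), padding=(0, 1))
        self.v = nn.Conv2d(c1, c2 - c2 // 2, kernel_size=(3, 1), padding=(1, 0))
        self.relu = nn.ReLU(inplace=True)
        self.pointwise = nn.Conv2d(c2, c3, kernel_size=1)

    def forward(self, x):
        # Concatenate horizontal and vertical low-rank responses channel-wise.
        y = self.relu(torch.cat([self.h(x), self.v(x)], dim=1))
        return self.relu(self.pointwise(y))

block = LowRankBasisBlock(c1=64, c2=128, c3=128)
print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 128, 32, 32])
```

Compared with c2 full 3 × 3 filters (9·c1·c2 weights), the two low-rank filter banks use only 3·c1·c2 weights.
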

SLIDE 24

Proposed Structural Prior: Low-Rank + Full Basis

25% of total filters are full 3x3

[Figure: the low-rank + full basis layer – the c2 spatial filters are a mix of 1 × 3, 3 × 1 and full 3 × 3 filters, followed by the c3 1 × 1 filters as before.]

SLIDE 25

Inception

Learning a Filter-Size Basis – learning many small filters (1x1, 3x3), and fewer of the larger (5x5, 7x7)

[Figure: an Inception module as a filter-size basis – many 1 × 1 and 3 × 3 filters, fewer 5 × 5 and 7 × 7 filters, concatenated into the c2 output channels before the c3 1 × 1 filters.]

SLIDE 26

ImageNet Results

  • gmp: vgg-11 with global max pooling
  • gmp-lr-2x: 60% less computation
  • gmp-lr-join-wfull: 16% less computation, 1% pt. lower error
SLIDE 27

Low-Rank Basis

Structural Prior for CNNs

[Figure: the low-rank basis layer, as above.]

VGG-11 on ILSVRC:
  • 21% fewer parameters, 41% less computation (low-rank only)
  • 1% pt. higher accuracy, 16% less computation (low/full-rank mix)

SLIDE 28

Rethinking the Inception Architecture for Computer Vision

Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens (Google Inc.), Zbigniew Wojna (University College London) – arXiv:1512.00567 [cs.CV], Dec 2015

[Embedded paper excerpt: abstract and Figures 6–7, showing Inception modules with n × n convolutions factorized into 1 × n and n × 1 convolutions.]

Inception v.3

Google’s Inception architecture (v.3 and higher) uses our low-rank filters!

SLIDE 29

Inter-Filter Connectivity

SLIDE 30

Inter-filter Connectivity

Prior Knowledge:

  • CNNs learn sparse, distributed representations
  • Most filters on adjacent layers have low correlation

Does every filter need to be connected to every other filter on a previous layer in a CNN?

[Figure: Network-in-Network conv1/conv2 filter covariance, alongside the conv1 and conv2 filters.]

SLIDE 31

AlexNet Filter Groups

  • AlexNet1 used model parallelization to fit in the GPU memory constraints of the time
  • “filter groups” used to split the network into two on all conv layers (except conv3)
1 Krizhevsky, Sutskever, and Hinton, “ImageNet Classification with Deep Convolutional Neural Networks”
SLIDE 32

AlexNet Filter Groups

Convolutional filters in layers with g groups operate on only 1/g of the input channels: each group’s filters have shape h1 × w1 × (c1/g) and produce c2/g of the c2 output channels.

[Figure: a grouped convolutional layer with the c2 filters split into g groups, followed by ReLU.]

SLIDE 33

AlexNet Filter Groups

  • Filter groups reduce connectivity between filters, allowing easier model parallelization
  • Filter groups drastically reduce the number of parameters, and computation
  • … and they don’t seem to affect the generalization of AlexNet?!
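
A minimal PyTorch sketch of the parameter reduction from filter groups (channel sizes loosely modelled on AlexNet’s conv2; nn.Conv2d’s groups argument implements this connectivity):

```python
import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

# A conv layer with and without filter groups: 96 input channels, 256 filters of 5x5.
full    = nn.Conv2d(96, 256, kernel_size=5, padding=2)            # every filter sees all 96 channels
grouped = nn.Conv2d(96, 256, kernel_size=5, padding=2, groups=2)  # each filter sees only 96/2 = 48 channels

print(n_params(full))     # 96*256*5*5 + 256 = 614,656
print(n_params(grouped))  # 48*256*5*5 + 256 = 307,456  (~2x fewer weights)
```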
SLIDE 34

CNN with Low-Dimensional Embedding

Typical sub-architecture found in Network-in-Network, ResNet/Inception

[Figure: a low-dimensional embedding – c2 filters of size h1 × w1 × c1 (ReLU), then c3 filters of size 1 × 1 × c2 (ReLU), as above.]

SLIDE 35

Root-2 Module

Structural Prior for CNNs with Sparse Inter-Filter Relationships

[Figure: the root-2 module – the c2 spatial filters are split into g = 2 groups, each operating on c1/2 input channels (ReLU), followed by a full 1 × 1 convolution with c3 filters over all c2 channels (ReLU).]

SLIDE 36

Root-4 Module

Structural Prior for CNNs with Sparse Inter-Filter Relationships

[Figure: the root-4 module – as above, but with g = 4 filter groups in the spatial convolution.]
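
A minimal PyTorch sketch of a root module (a hypothetical RootModule helper, not the Deep Roots code): a grouped spatial convolution giving sparse inter-filter connectivity, followed by a full 1 × 1 convolution that re-mixes channels across the groups.

```python
import torch
import torch.nn as nn

class RootModule(nn.Module):
    """Hypothetical sketch of a root module: a grouped spatial conv
    (sparse inter-filter connectivity) followed by a full 1x1 conv
    that mixes channels across the groups."""
    def __init__(self, c1, c2, c3, groups, kernel_size=3):
        super().__init__()
        self.grouped = nn.Conv2d(c1, c2, kernel_size,
                                 padding=kernel_size // 2, groups=groups)
        self.pointwise = nn.Conv2d(c2, c3, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.pointwise(self.relu(self.grouped(x))))

root2 = RootModule(c1=64, c2=128, c3=128, groups=2)
root4 = RootModule(c1=64, c2=128, c3=128, groups=4)
for m in (root2, root4):
    print(sum(p.numel() for p in m.parameters()))
# groups=2: (64/2)*128*9 + 128 + 128*128 + 128 = 53,504
# groups=4: (64/4)*128*9 + 128 + 128*128 + 128 = 35,072
```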

SLIDE 37

Network in Network Filter Groups

  • Replace non-spatial convolutional layers with root modules
SLIDE 38

Filter Group Topologies

  • But how many groups to use? Should this change with depth?
  • We explored 3 basic topologies on CIFAR10:

  • Tree: increase # filter groups with depth
  • Root: decrease # filter groups with depth
  • Column: maintain a constant # of filter groups

[Figure: the tree, root and column topologies, each drawn from network input to output.]
SLIDE 39

CIFAR-10 Results

SLIDE 40

CIFAR-10 Results

SLIDE 41

Covariance

  • Block-diagonal sparsity induced by a root module is visible in the inter-layer correlation
SLIDE 42

ILSVRC12 Results – ResNet 50

  • root-16
  • 27% fewer parameters
  • 37% less computation
  • CPU 23% faster
  • GPU 13% faster
  • (not optimized!)
  • 0.2% pt. lower error
  • root-64
  • 40% fewer parameters
  • 45% less computation
  • CPU 31% faster
  • GPU 12% faster
  • 0.1% pt. higher error
SLIDE 43

ILSVRC12 Results – ResNet 200

  • root-64
  • 27% fewer parameters
  • 48% less computation
  • 0.2% pt. lower error
  • 0.14% lower error
SLIDE 44

Root Module

Structural Prior for CNNs with Sparse Inter-Filter Relationships

[Figure: the root module – a grouped spatial convolution (ReLU) followed by a full 1 × 1 convolution (ReLU), as above.]

ResNet-200 on ILSVRC: 48% fewer parameters, 27% less computation, identical (or very slightly higher) accuracy

SLIDE 45

Deep Roots

SLIDE 46

Xception

Google’s Xception architecture uses a form of root modules (#channels = #filter groups) - “Depthwise Separable Convolution”
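
A minimal PyTorch sketch of that extreme case (illustrative channel counts): a depthwise 3 × 3 convolution with groups equal to the number of channels, followed by a pointwise 1 × 1 convolution, compared against a standard 3 × 3 convolution.

```python
import torch.nn as nn

c_in, c_out = 128, 256

# Depthwise: one 3x3 filter per input channel (groups == channels),
# then pointwise: a full 1x1 conv mixing channels.
depthwise_separable = nn.Sequential(
    nn.Conv2d(c_in, c_in, kernel_size=3, padding=1, groups=c_in),
    nn.Conv2d(c_in, c_out, kernel_size=1),
)
standard = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(depthwise_separable))  # 128*9 + 128 + 128*256 + 256 = 34,304
print(count(standard))             # 128*256*9 + 256 = 295,168
```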

SLIDE 47

ResNeXt

Facebook’s ResNeXt architecture uses root modules, denoted “Aggregated Residual Transformations”

SLIDE 48

Conclusion

  • Structural priors are important for both generalization and efficiency
  • They are not simply replaced by strong regularization
  • They simplify the optimization of deep neural networks by constraining the search space/dimensionality
  • There is still a lot we don’t understand about the optimization of deep neural networks!

SLIDE 49

Research Directions

  • I. Automatically Discovering Structural Priors
  • II. Learning with “Natural” Datasets
  • III. Jointly Exploiting Random Exploration and Imitation
SLIDE 50

Research Directions

  • I. Automatically Discovering Structural Priors
  • Can we find methods of automatically discovering good structural priors from data?
  • Pruning does not improve generalization
  • Greedily growing networks leads to poor generalization
  • Results by Han et al.¹ show some promise: a pruning/growing cycle
  • Infer connectivity by analyzing inter-channel correlations in training data?

¹ Han, Song, Jeff Pool, John Tran, and William J. Dally (2015). “Learning both weights and connections for efficient neural networks”

SLIDE 51

Research Directions

  • II. Learning with “Natural” Datasets
  • Both ML and Computer Vision are dataset-driven fields
  • ImageNet, CIFAR and MNIST are class-balanced
  • Current solutions involve either throwing away data or fiddling with loss weighting
SLIDE 52

Research Directions

  • III. Exploiting both random exploration and imitation
  • RL is appealing - agents learn entirely from experience in an environment
  • For many problems this isn’t data-efficient enough or feasible:
  • e.g. learning to drive a car - randomly exploring in a real environment is dangerous and time-consuming
  • But we can easily collect data from a human driver for real-world driving
  • Use supervised learning to bootstrap RL
SLIDE 53

Questions

http://yani.io/annou yai20@cam.ac.uk