A Kernel Perspective for Regularizing Deep Neural Networks
SLIDE 1

A Kernel Perspective for Regularizing Deep Neural Networks

Alberto Bietti* Grégoire Mialon* Dexiong Chen Julien Mairal

Inria

ICML 2019, Long Beach

Bietti, Mialon, Chen and Mairal Kernel regularization of deep nets ICML 2019, Long Beach 1 / 5

SLIDE 3

Regularization in Deep Learning

Two issues with today's deep learning models:
- Poor performance on small datasets
- Lack of robustness to adversarial perturbations

Questions: Can regularization address this?

min_f (1/n) Σ_{i=1}^{n} ℓ(y_i, f(x_i)) + λ Ω(f)

What is a good choice of Ω(f) for deep (convolutional) networks?
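As a concrete toy instance of this penalized objective, the sketch below trains a linear model f(x) = ⟨w, x⟩ with squared loss and Ω(f) = ‖w‖², which is the RKHS norm of a linear function under the linear kernel. All names here are illustrative, not from the paper.

```python
import numpy as np

# Toy instance of the slide's objective:
#   min_f (1/n) * sum_i loss(y_i, f(x_i)) + lam * Omega(f)
# with f(x) = w.x, squared loss, and Omega(f) = ||w||^2.

def objective(w, X, y, lam):
    """(1/n) sum_i (w.x_i - y_i)^2 + lam * ||w||^2."""
    residuals = X @ w - y
    return float(np.mean(residuals ** 2) + lam * (w @ w))

def gradient(w, X, y, lam):
    n = X.shape[0]
    return (2.0 / n) * (X.T @ (X @ w - y)) + 2.0 * lam * w

def train(X, y, lam=0.1, lr=0.05, steps=500):
    """Plain gradient descent on the penalized objective."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * gradient(w, X, y, lam)
    return w
```

Larger λ shrinks ‖w‖ and hence the Lipschitz constant of f, which is exactly the smoothness mechanism the RKHS-norm view of the next slides exploits.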


SLIDE 8

Regularization with the RKHS Norm

Kernel methods: f(x) = ⟨f, Φ(x)⟩_H
- Φ(x) captures useful properties of the data
- ‖f‖_H controls model complexity and smoothness: |f(x) − f(y)| ≤ ‖f‖_H · ‖Φ(x) − Φ(y)‖_H

Our work: view a generic CNN f_θ as an element of an RKHS H and regularize using ‖f_θ‖_H.

Kernels for deep convolutional architectures (Bietti and Mairal, 2019):
- ‖Φ(x) − Φ(y)‖_H ≤ ‖x − y‖_2
- ‖Φ(x_τ) − Φ(x)‖_H ≤ C(τ) for a small transformation x_τ of x
- CNNs f_θ with ReLUs are (approximately) in the RKHS, with norm ‖f_θ‖_H ≤ ω(‖W_1‖_2, …, ‖W_L‖_2)
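The layer-wise spectral norms ‖W_l‖_2 appearing in this bound can be estimated cheaply by power iteration, as is standard when constraining or penalizing them during training. This is a minimal numpy sketch under that standard recipe, not the authors' implementation.

```python
import numpy as np

def spectral_norm(W, n_iters=100):
    """Estimate ||W||_2 (largest singular value) by power iteration.

    Alternately applying W and W.T drives (u, v) toward the top
    singular vector pair, after which ||W||_2 ~= u . (W v).
    """
    rng = np.random.default_rng(0)
    v = rng.standard_normal(W.shape[1])
    v /= np.linalg.norm(v)
    u = W @ v
    for _ in range(n_iters):
        u = W @ v
        u /= np.linalg.norm(u)
        v = W.T @ u
        v /= np.linalg.norm(v)
    return float(u @ (W @ v))
```

In practice one or two iterations per training step suffice, since the weights (and hence the singular vectors) change slowly between steps.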


SLIDE 16

Approximating the RKHS norm

Our approach: use upper and lower bound approximations of ‖f‖_H.

Upper bound: constraint/penalty on spectral norms.

Lower bounds: use ‖f‖_H = sup_{‖u‖_H ≤ 1} ⟨f, u⟩_H ⇒ consider tractable subsets of the RKHS unit ball:
- ‖f‖_H ≥ sup_{x, ‖δ‖ ≤ 1} f(x + δ) − f(x) (adversarial perturbations)
- ‖f‖_H ≥ sup_{x, C(τ) ≤ 1} f(x_τ) − f(x) (adversarial deformations)
- ‖f‖_H ≥ sup_x ‖∇f(x)‖_2 (gradient penalty)

Best performance is obtained by combining the upper and lower bound approaches.
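The gradient-penalty lower bound can be estimated by taking the max of ‖∇f(x)‖_2 over a batch of inputs. The sketch below uses central finite differences so it stays framework-free (in practice one would use automatic differentiation); the function names are illustrative.

```python
import numpy as np

def grad_norm(f, x, eps=1e-5):
    """Estimate ||grad f(x)||_2 by central finite differences."""
    g = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2.0 * eps)
    return float(np.linalg.norm(g))

def gradient_penalty(f, xs):
    """Lower bound ||f||_H >= sup_x ||grad f(x)||_2, taken over samples xs."""
    return max(grad_norm(f, x) for x in xs)
```

Sanity check: for a linear f(x) = ⟨w, x⟩ the penalty recovers ‖w‖_2 exactly, which is indeed the RKHS norm of f under the linear kernel.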


SLIDE 17

More Perspectives and Experiments

Regularization approaches: a unified view of various existing strategies, including links with robust optimization.

Theoretical insights: guarantees on adversarial generalization via margin bounds; insights on regularization for training generative models.

Experiments: improved performance in small-data scenarios on vision and biological datasets; robustness benefits under large adversarial perturbations.

Poster #223
