A Kernel Perspective for Regularizing Deep Neural Networks
SLIDE 1

A Kernel Perspective for Regularizing Deep Neural Networks

Alberto Bietti* Grégoire Mialon* Dexiong Chen Julien Mairal

Inria

ICML 2019, Long Beach

Bietti, Mialon, Chen and Mairal Kernel regularization of deep nets ICML 2019, Long Beach 1 / 5

SLIDE 3

Regularization in Deep Learning

Two issues with today's deep learning models:
- Poor performance on small datasets
- Lack of robustness to adversarial perturbations

Questions: Can regularization address this?

min_f (1/n) Σ_{i=1}^{n} ℓ(y_i, f(x_i)) + λ Ω(f)

What is a good choice of Ω(f) for deep (convolutional) networks?
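As a concrete toy instance of this penalized objective, the sketch below trains a linear model f(x) = ⟨w, x⟩ with squared loss and Ω(f) = ‖w‖², which is the RKHS norm of a linear function under the linear kernel. All names here are illustrative, not from the paper.

```python
import numpy as np

# Toy instance of the slide's objective:
#   min_f (1/n) * sum_i loss(y_i, f(x_i)) + lam * Omega(f)
# with f(x) = w.x, squared loss, and Omega(f) = ||w||^2.

def objective(w, X, y, lam):
    """(1/n) sum_i (w.x_i - y_i)^2 + lam * ||w||^2."""
    residuals = X @ w - y
    return float(np.mean(residuals ** 2) + lam * (w @ w))

def gradient(w, X, y, lam):
    n = X.shape[0]
    return (2.0 / n) * (X.T @ (X @ w - y)) + 2.0 * lam * w

def train(X, y, lam=0.1, lr=0.05, steps=500):
    """Plain gradient descent on the penalized objective."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * gradient(w, X, y, lam)
    return w
```

Larger λ shrinks ‖w‖ and hence the Lipschitz constant of f, which is exactly the smoothness mechanism the RKHS-norm view of the next slides exploits.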


SLIDE 8

Regularization with the RKHS Norm

Kernel methods: f(x) = ⟨f, Φ(x)⟩_H
- Φ(x) captures useful properties of the data
- ‖f‖_H controls model complexity and smoothness: |f(x) − f(y)| ≤ ‖f‖_H · ‖Φ(x) − Φ(y)‖_H

Our work: view a generic CNN f_θ as an element of an RKHS H and regularize using ‖f_θ‖_H.

Kernels for deep convolutional architectures (Bietti and Mairal, 2019):
- ‖Φ(x) − Φ(y)‖_H ≤ ‖x − y‖_2
- ‖Φ(x_τ) − Φ(x)‖_H ≤ C(τ) for a small transformation x_τ of x
- CNNs f_θ with ReLUs are (approximately) in the RKHS, with norm ‖f_θ‖_H ≤ ω(‖W_1‖_2, …, ‖W_L‖_2)
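The layer-wise spectral norms ‖W_l‖_2 appearing in this bound can be estimated cheaply by power iteration, as is standard when constraining or penalizing them during training. This is a minimal numpy sketch under that standard recipe, not the authors' implementation.

```python
import numpy as np

def spectral_norm(W, n_iters=100):
    """Estimate ||W||_2 (largest singular value) by power iteration.

    Alternately applying W and W.T drives (u, v) toward the top
    singular vector pair, after which ||W||_2 ~= u . (W v).
    """
    rng = np.random.default_rng(0)
    v = rng.standard_normal(W.shape[1])
    v /= np.linalg.norm(v)
    u = W @ v
    for _ in range(n_iters):
        u = W @ v
        u /= np.linalg.norm(u)
        v = W.T @ u
        v /= np.linalg.norm(v)
    return float(u @ (W @ v))
```

In practice one or two iterations per training step suffice, since the weights (and hence the singular vectors) change slowly between steps.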


SLIDE 16

Approximating the RKHS norm

Our approach: use upper and lower bound approximations of ‖f‖_H.

Upper bound: constraint/penalty on spectral norms.

Lower bounds: use ‖f‖_H = sup_{‖u‖_H ≤ 1} ⟨f, u⟩_H ⇒ consider tractable subsets of the RKHS unit ball:
- ‖f‖_H ≥ sup_{x, ‖δ‖ ≤ 1} f(x + δ) − f(x) (adversarial perturbations)
- ‖f‖_H ≥ sup_{x, C(τ) ≤ 1} f(x_τ) − f(x) (adversarial deformations)
- ‖f‖_H ≥ sup_x ‖∇f(x)‖_2 (gradient penalty)

Best performance is obtained by combining the upper and lower bound approaches.
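The gradient-penalty lower bound can be estimated by taking the max of ‖∇f(x)‖_2 over a batch of inputs. The sketch below uses central finite differences so it stays framework-free (in practice one would use automatic differentiation); the function names are illustrative.

```python
import numpy as np

def grad_norm(f, x, eps=1e-5):
    """Estimate ||grad f(x)||_2 by central finite differences."""
    g = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2.0 * eps)
    return float(np.linalg.norm(g))

def gradient_penalty(f, xs):
    """Lower bound ||f||_H >= sup_x ||grad f(x)||_2, taken over samples xs."""
    return max(grad_norm(f, x) for x in xs)
```

Sanity check: for a linear f(x) = ⟨w, x⟩ the penalty recovers ‖w‖_2 exactly, which is indeed the RKHS norm of f under the linear kernel.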


SLIDE 17

More Perspectives and Experiments

Regularization approaches: a unified view of various existing strategies, including links with robust optimization.

Theoretical insights: guarantees on adversarial generalization via margin bounds; insights on regularization for training generative models.

Experiments: improved performance in small-data scenarios on vision and biological datasets; robustness benefits under large adversarial perturbations.

Poster #223
