Sorting Out Lipschitz Function Approximation

slide-1
SLIDE 1

Sorting Out Lipschitz Function Approximation

Pacific Ballroom Poster #15 (6:30 – 9:00 PM)

Cem Anil* James Lucas* Roger Grosse

*Equal contribution

slide-5
SLIDE 5

Goal

Train neural networks subject to a strict Lipschitz constraint while maintaining expressive power.

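Norm of Output Change ≤ Lipschitz Constant × Norm of Input Change:

$\|f(x_1) - f(x_2)\| \le K \, \|x_1 - x_2\|$ for all inputs $x_1, x_2$

Gradient Norm ≤ Lipschitz Constant:

$\|\nabla_x f(x)\| \le K$ wherever $f$ is differentiable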
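To make the constraint concrete, here is a minimal sketch that samples random input pairs and records the largest ratio of output change to input change for a scalar function. The helper name `empirical_lipschitz` and the sampling interval are assumptions made for illustration; they do not come from the slides. The value it returns is only a lower bound on the true Lipschitz constant.

```python
import random

def empirical_lipschitz(f, lo, hi, num_pairs=2000, seed=0):
    """Largest observed |f(a) - f(b)| / |a - b| over random pairs in [lo, hi].

    Hypothetical helper: sampling can only under-estimate the true constant.
    """
    rng = random.Random(seed)
    best = 0.0
    for _ in range(num_pairs):
        a, b = rng.uniform(lo, hi), rng.uniform(lo, hi)
        if a != b:
            best = max(best, abs(f(a) - f(b)) / abs(a - b))
    return best

# The absolute value function is 1-Lipschitz; x -> 3x is 3-Lipschitz.
ratio_abs = empirical_lipschitz(abs, -1.0, 1.0)
ratio_scaled = empirical_lipschitz(lambda x: 3.0 * x, -1.0, 1.0)
```

A sampled ratio above K would disprove a claimed K-Lipschitz bound, but a ratio below K proves nothing, which is why the deck enforces the constraint through the architecture instead.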

slide-6
SLIDE 6

Why Care?

  • Provable Adversarial Robustness (Cisse et al., 2018)
  • Wasserstein Distance Estimation (Arjovsky et al., 2017)
  • Training Generative Models (Arjovsky et al., 2017; Behrmann et al., 2019)
  • Computing Generalization Bounds (Bartlett et al., 1998, 2017)
  • Stabilizing Neural Net Training (Xiao et al., 2018; Odena et al., 2018)
  • ...
slide-7
SLIDE 7

Lipschitz via Architectural Constraints

Design an architecture that is:

  • Constrained enough: never violates a prescribed K-Lipschitz constraint.
  • Expressive enough: can approximate any K-Lipschitz function (universality).

Together: Universal Lipschitz Function Approximation

slide-9
SLIDE 9

Main Contributions

Propose an expressive Lipschitz-constrained architecture that

  • Overcomes a previously unidentified limitation in prior art.
  • Can recover universal Lipschitz function approximation.

Apply this architecture to

  • Train classifiers provably robust to adversarial perturbations.
  • Obtain tight estimates of Wasserstein distance.


slide-10
SLIDE 10

Lipschitz via Architectural Constraints

  • Compose Lipschitz linear layers and Lipschitz activations.

x → Lipschitz Linear → Lipschitz Activation → Lipschitz Linear → Lipschitz Activation → Lipschitz Linear → y (Lipschitz Network)

slide-11
SLIDE 11

Lipschitz via Architectural Constraints

x → 1-Lipschitz Linear → 1-Lipschitz Activation → 1-Lipschitz Linear → 1-Lipschitz Activation → 1-Lipschitz Linear → y (1-Lipschitz Network)

  • Compose Lipschitz linear layers and Lipschitz activations.
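The composition argument rests on one fact: the Lipschitz constant of a composition is at most the product of the constants of its parts. The sketch below illustrates this for the L-infinity norm, where the operator norm of a matrix is its maximum absolute row sum. The helpers `inf_operator_norm`, `project_rows`, and `lipschitz_upper_bound` and the example weights are hypothetical, chosen for illustration; the slides defer the actual linear-layer construction to the paper.

```python
def inf_operator_norm(W):
    """Operator norm induced by the L-infinity norm: maximum absolute row sum."""
    return max(sum(abs(w) for w in row) for row in W)

def project_rows(W):
    """Rescale every row whose L1 norm exceeds 1 (illustrative projection only)."""
    out = []
    for row in W:
        s = sum(abs(w) for w in row)
        scale = 1.0 if s <= 1.0 else 1.0 / s
        out.append([w * scale for w in row])
    return out

def lipschitz_upper_bound(layer_constants):
    """The product of per-layer constants bounds the constant of the composition."""
    bound = 1.0
    for k in layer_constants:
        bound *= k
    return bound

W1 = [[2.0, -1.0], [0.25, 0.5]]   # arbitrary example weights
W2 = [[0.5, 0.5], [3.0, 0.0]]
P1, P2 = project_rows(W1), project_rows(W2)

# Element-wise ReLU and tanh are 1-Lipschitz, so each activation contributes a factor of 1.
bound = lipschitz_upper_bound(
    [inf_operator_norm(P1), 1.0, inf_operator_norm(P2), 1.0]
)
```

The bound is only an upper bound: a network built this way can never exceed the prescribed constant, but it may fall well short of it.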
slide-12
SLIDE 12

Lipschitz via Architectural Constraints

x → 1-Lipschitz Linear → 1-Lipschitz Activation → 1-Lipschitz Linear → 1-Lipschitz Activation → 1-Lipschitz Linear → y (1-Lipschitz Network)

First thing to try: approximate the absolute value function.

slide-13
SLIDE 13

Lipschitz via Architectural Constraints

First thing to try: approximate the absolute value function.

x → 1-Lipschitz Linear → tanh → 1-Lipschitz Linear → tanh → 1-Lipschitz Linear → y (1-Lipschitz Network)

slide-15
SLIDE 15

Lipschitz via Architectural Constraints

First thing to try: approximate the absolute value function.

x → 1-Lipschitz Linear → ReLU → 1-Lipschitz Linear → ReLU → 1-Lipschitz Linear → y (1-Lipschitz Network)

slide-16
SLIDE 16

Lipschitz via Architectural Constraints

What went wrong?

x → 1-Lipschitz Linear → ReLU → 1-Lipschitz Linear → ReLU → 1-Lipschitz Linear → y (1-Lipschitz Network)

slide-20
SLIDE 20

Lipschitz via Architectural Constraints

  • Diagnosing the issue: Inspect gradient norms!

x → 1-Lipschitz Linear → ReLU → 1-Lipschitz Linear → ReLU → 1-Lipschitz Linear → y

[Plot: Gradient Norms of Output wrt. Activations. Vertical axis: norm of gradients (tick at 1). Horizontal axis: input, after W1, after ReLU, after W2, after ReLU, output.]

Problem: Architecture is losing gradient norm!
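A hand-built example shows how gradient norm can get lost. The sketch assumes the 2-norm and uses hand-picked weights, each with spectral norm exactly 1; it is not the trained network behind the plot. The function names `tiny_relu_net` and `slope` are hypothetical.

```python
import math

def relu(v):
    return [max(0.0, t) for t in v]

def tiny_relu_net(x):
    """x -> W1 -> ReLU -> W2, where W1 and W2 each have spectral norm exactly 1."""
    s = 1.0 / math.sqrt(2.0)
    hidden = relu([s * x, -s * x])          # W1 = [[s], [-s]]
    return s * hidden[0] + s * hidden[1]    # W2 = [[s, s]]

def slope(f, x, h=1e-6):
    """Central finite-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

value_at_2 = tiny_relu_net(2.0)             # |x| / 2 evaluated at x = 2
slope_at_1 = slope(tiny_relu_net, 1.0)
```

This network computes |x| / 2 rather than |x|. For any nonzero input the ReLU zeroes one of the two hidden units, so the backpropagated gradient norm falls from 1 to 1/√2 at the activation and then to 1/2 at the input. The unit-norm weights cannot make up the loss.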

slide-24
SLIDE 24

Solution: Gradient Norm Preservation

  • Activation: GroupSort
      • Nonlinear, continuous, and differentiable almost everywhere.
      • Gradient norm preserving.
  • Linear Transformation: described in the paper.
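The slide names GroupSort without defining it. As we understand it from the paper, the activation splits the pre-activations into groups and sorts each group. The sketch below assumes that definition, with descending order and a group size of 2 picked for illustration; `group_sort`, `l2_norm`, and `tiny_groupsort_net` are hypothetical names. Sorting only permutes entries, so the norm of the vector is unchanged, as is the norm of any gradient passed back through it.

```python
import math

def group_sort(v, group_size=2):
    """Sort each consecutive group of `group_size` entries in descending order."""
    if len(v) % group_size != 0:
        raise ValueError("length of v must be a multiple of group_size")
    out = []
    for i in range(0, len(v), group_size):
        out.extend(sorted(v[i:i + group_size], reverse=True))
    return out

def l2_norm(v):
    return math.sqrt(sum(t * t for t in v))

def tiny_groupsort_net(x):
    """x -> W1 -> GroupSort -> W2, where W1 and W2 each have spectral norm exactly 1."""
    s = 1.0 / math.sqrt(2.0)
    hidden = group_sort([s * x, -s * x])    # W1 = [[s], [-s]]
    return s * hidden[0] - s * hidden[1]    # W2 = [[s, -s]]

v = [3.0, 1.0, -2.0, 5.0]
sorted_v = group_sort(v)                    # groups (3, 1) and (-2, 5) sorted separately
abs_of_minus_1_5 = tiny_groupsort_net(-1.5)
```

With these hand-picked unit-norm weights, the two-unit network represents |x| exactly. This is an illustration of why norm preservation helps, not the construction used in the paper's universality result.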

slide-25
SLIDE 25

Gradient Norm Preservation => Expressive Power

slide-29
SLIDE 29

Universal Lipschitz Function Approximation

  • Norm-constrained GroupSort architectures can recover universal Lipschitz function approximation!

Subtleties and details in the paper/poster

slide-30
SLIDE 30

Wasserstein Distance Estimation

  • Much tighter estimates of Wasserstein distance
  • Training Wasserstein GANs (Arjovsky et al., 2017)
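The link between Lipschitz networks and Wasserstein distance is the Kantorovich-Rubinstein duality: for any 1-Lipschitz critic f, E_P[f] - E_Q[f] is a lower bound on the Wasserstein-1 distance, and the supremum over all such critics attains it. The sketch below checks this on a toy one-dimensional example. The samples and helper names are made up for illustration, and the exact formula shown holds only for equal-size one-dimensional samples.

```python
def exact_w1_1d(p, q):
    """Exact Wasserstein-1 distance between two equal-size 1-D samples."""
    if len(p) != len(q):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(p), sorted(q))) / len(p)

def dual_estimate(f, p, q):
    """Kantorovich-Rubinstein objective E_p[f] - E_q[f] for a 1-Lipschitz critic f."""
    return sum(f(a) for a in p) / len(p) - sum(f(b) for b in q) / len(q)

p = [0.0, 1.0, 2.0, 3.0]      # toy samples, not data from the talk
q = [-2.0, -1.0, 1.0, 2.0]

exact = exact_w1_1d(p, q)
tight = dual_estimate(lambda x: x, p, q)         # critic with slope 1 everywhere
loose = dual_estimate(lambda x: 0.5 * x, p, q)   # critic that wastes half its slope
```

Both critics satisfy the 1-Lipschitz constraint, but only the one whose gradient norm is 1 everywhere reaches the exact distance. A critic that loses gradient norm underestimates it.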
slide-31
SLIDE 31

Provable Adversarial Robustness

  • L-inf constrained GroupSort networks trained with a multi-class hinge loss give provable adversarial robustness with little loss in accuracy.
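A standard margin argument is one way to see where the guarantee comes from; the sketch below assumes it rather than quoting the paper's exact certificate. If every logit is K-Lipschitz with respect to the L-inf norm of the input, a perturbation of size epsilon moves each logit by at most K * epsilon. The gap between the top two logits therefore shrinks by at most 2 * K * epsilon. The function names and example logits are hypothetical.

```python
def certified_radius(logits, lipschitz_constant=1.0):
    """L-inf radius inside which the predicted class provably cannot change.

    Assumes each logit is `lipschitz_constant`-Lipschitz w.r.t. the L-inf input norm.
    """
    ranked = sorted(logits, reverse=True)
    margin = ranked[0] - ranked[1]
    return margin / (2.0 * lipschitz_constant)

def is_certified(logits, epsilon, lipschitz_constant=1.0):
    """True when the margin is large enough to rule out any attack of size epsilon."""
    return certified_radius(logits, lipschitz_constant) > epsilon

confident = [2.0, 0.5, -1.0]   # margin 1.5, so radius 0.75 when K = 1
borderline = [2.0, 1.9]        # margin of about 0.1, so radius of about 0.05
```

A hinge loss fits this certificate because it pushes the margin above a chosen threshold, and the threshold can be set to match the perturbation size one wants to certify.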

slide-32
SLIDE 32

Main Contributions

Propose Lipschitz GroupSort networks that

  • Gain expressivity via gradient norm preservation.
  • Can recover universal Lipschitz function approximation.

Apply GroupSort Networks to

  • Train classifiers provably robust to adversarial perturbations.
  • Obtain tight estimates of Wasserstein distance.
slide-33
SLIDE 33

Sorting Out Lipschitz Function Approximation

Pacific Ballroom Poster #15 (6:30 – 9:00 PM)

Cem Anil* James Lucas* Roger Grosse

*Equal contribution