SLIDE 1

Polynomial Optimization for Bounding Lipschitz Constants of Deep Networks

Victor Magron, MAC Team, CNRS–LAAS. Jointly certified with T. Chen, J.-B. Lasserre and E. Pauwels. IPAM, UCLA, 28 February 2020.

[Figure: feed-forward network diagram (input, hidden and output layers)]

SLIDE 2

Lipschitz constant of neural networks

[Figure: feed-forward network with input, hidden and output layers]

• Applications: WGAN, certification
• Existing works: [Latorre et al. '18], based on linear programming (LP)
• Network setting: K-classifier, ReLU network, 1 + m layers (1 input layer + m hidden layers), weights Ai, biases bi
• Score of label k ≤ K: ck^T xm, where xm is the last activation vector


SLIDE 3–6

Lipschitz constant of neural networks

Network recursion: input x0 ∈ R^p; for i = 1, …, m, zi = Ai xi−1 + bi ∈ R^{pi} and xi = ReLU(zi).

LIPSCHITZ CONSTANT:
L_{||·||}(f) = inf{L : ∀x, y ∈ X, |f(x) − f(y)| ≤ L ||x − y||}
             = sup{||∇f(x)||_* : x ∈ X}
             = sup{t^T ∇f(x) : x ∈ X, ||t|| ≤ 1}

GRADIENT for a fixed label k:
∇f(x0) = [ ∏_{i=1}^m Ai^T diag(ReLU′(zi)) ] ck
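To make the chain-rule product concrete, here is a minimal NumPy sketch (the toy sizes, random weights and the choice of the Euclidean norm are assumptions, not from the slides): it evaluates ∇f(x0) = [∏ Ai^T diag(ReLU′(zi))] ck by a backward pass, then uses the sup characterization to get an LBS-style sampled lower bound on the Lipschitz constant.

```python
# Hypothetical toy setup: random 2-hidden-layer ReLU network, Euclidean norm.
import numpy as np

rng = np.random.default_rng(0)
p, widths = 4, [5, 3]                              # input dim, hidden widths
A = [rng.standard_normal((wout, win))
     for win, wout in zip([p] + widths[:-1], widths)]
b = [rng.standard_normal(w) for w in widths]
c = rng.standard_normal(widths[-1])                # last-layer vector c_k

def grad_f(x0):
    """Gradient of f(x0) = c^T x_m, via A_1^T D_1 A_2^T D_2 ... A_m^T D_m c."""
    x, diags = x0, []
    for Ai, bi in zip(A, b):
        z = Ai @ x + bi
        diags.append((z > 0).astype(float))        # D_i = ReLU'(z_i), a.e.
        x = np.maximum(z, 0.0)                     # x_i = ReLU(z_i)
    g = c
    for Ai, Di in reversed(list(zip(A, diags))):   # backward pass
        g = Ai.T @ (Di * g)
    return g

# L >= ||grad f(x)||_* at every sample x; || . ||_2 is its own dual norm.
samples = rng.standard_normal((1000, p))
print(max(np.linalg.norm(grad_f(x)) for x in samples))
```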

SLIDE 7

A polynomial optimization formulation

ReLU (left) & its semialgebraicity (right)

[Plots omitted: graph of u = ReLU(x) and of its semialgebraic description]

u = max{x, 0}  ⟺  u(u − x) = 0, u ≥ x, u ≥ 0

SLIDE 8

A polynomial optimization formulation

ReLU′ (left) & its semialgebraicity (right)

[Plots omitted: graph of u = ReLU′(x) and of its semialgebraic description]

u = 1_{x≥0}  ⟺  u(u − 1) = 0, (u − 1/2) x ≥ 0

(the two sides coincide except at the tie x = 0, where both u = 0 and u = 1 are feasible)
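Both encodings are easy to sanity-check numerically; a quick NumPy sketch (the sample size and seed are arbitrary choices), with the converse direction spelled out in the comments:

```python
# Sanity check of both semialgebraic encodings on random points.
import numpy as np

x = np.random.default_rng(1).uniform(-1, 1, 1000)

# ReLU: u = max(x, 0) satisfies u(u - x) = 0, u >= x, u >= 0. Conversely,
# u(u - x) = 0 forces u in {0, x}, and the two inequalities leave only max(x, 0).
u = np.maximum(x, 0.0)
assert np.allclose(u * (u - x), 0) and (u >= x).all() and (u >= 0).all()

# ReLU': u = 1_{x>=0} satisfies u(u - 1) = 0, (u - 1/2)x >= 0. Conversely,
# u in {0, 1} and the sign condition pin u down whenever x != 0.
v = (x >= 0).astype(float)
assert np.allclose(v * (v - 1), 0) and ((v - 0.5) * x >= 0).all()

print("both encodings hold on", x.size, "samples")
```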

SLIDE 9–11

A polynomial optimization formulation

Local Lipschitz constant: x0 ∈ ball of center x̄0 and radius ε.

One single hidden layer (m = 1):

sup_{x,u,z,t}  t^T A^T diag(u) c
s.t.   (z − Ax − b)^2 = 0,
       t^2 ≤ 1,   (x − x̄0 + ε)(x − x̄0 − ε) ≤ 0,
       u(u − 1) = 0,   (u − 1/2) z ≥ 0

"CHEAP" and "TIGHT" upper bound?
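For readers who want to see the polynomial data explicitly, here is a SymPy sketch of the m = 1 problem (the dimension p = p1 = 2 and the weights, biases, c and ε are made-up toy values; the vector constraints are read entrywise, which is one natural interpretation of the slide):

```python
# Hypothetical 2-dimensional instance of the single-hidden-layer POP.
import sympy as sp

x = sp.Matrix(sp.symbols('x0 x1'))
u = sp.Matrix(sp.symbols('u0 u1'))
z = sp.Matrix(sp.symbols('z0 z1'))
t = sp.Matrix(sp.symbols('t0 t1'))
A = sp.Matrix([[1, -2], [3, 1]])            # toy weights
b = sp.Matrix([0, 1])                       # toy biases
c = sp.Matrix([1, -1])                      # toy last-layer vector c_k
xbar, eps = sp.Matrix([0, 0]), sp.Rational(1, 10)

# Objective: t^T A^T diag(u) c, a degree-2 polynomial in (t, u).
obj = sp.expand((t.T * A.T * sp.diag(*u) * c)[0, 0])

# Equalities h = 0 (the slide squares the affine ones; h = 0 and h^2 = 0
# define the same set) and inequalities written as g >= 0, entrywise.
eqs = list(z - A * x - b) + [ui * (ui - 1) for ui in u]
ineqs = [1 - ti**2 for ti in t]
ineqs += [-(xi - xb + eps) * (xi - xb - eps) for xi, xb in zip(x, xbar)]
ineqs += [(ui - sp.Rational(1, 2)) * zi for ui, zi in zip(u, z)]

print(obj)   # t0*u0 - 3*t0*u1 - 2*t1*u0 - t1*u1
```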

SLIDE 12

The moment-sums of squares hierarchy

NP-hard NONCONVEX problem: f⋆ = sup f(x)

Theory: an INFINITE LP
(Primal)  sup ∫ f dµ  over probability measures µ
(Dual)  inf λ  such that λ − f ≥ 0

SLIDE 13

The moment-sums of squares hierarchy

NP-hard NONCONVEX problem: f⋆ = sup f(x)

Practice: an SDP
(Primal relaxation)  finitely many moments ∫ x^α dµ
(Dual strengthening)  λ − f = sum of squares of fixed degree

LASSERRE'S HIERARCHY of CONVEX PROBLEMS ↑ f⋆ [Lasserre/Parrilo 01]

degree d & n vars ⇒ (n+2d choose n) SDP variables
numeric solvers ⇒ approximate certificates
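The (n+2d choose n) count is the number of moments of degree at most 2d in n variables; a two-line check (the sample sizes are an arbitrary choice) shows how fast it blows up:

```python
# Number of SDP (moment) variables C(n + 2d, n) at relaxation order d.
from math import comb

for n in (6, 20, 80):
    print(n, [comb(n + 2 * d, n) for d in (1, 2, 3)])
# 6  [28, 210, 924]
# 20 [231, 10626, 230230]
# 80 [3321, 1929501, 470155077]
```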

SLIDE 14–15

The sparse hierarchy [Waki, Lasserre 06]

Correlative sparsity pattern:
f = x2 x5 + x3 x6 − x2 x3 − x5 x6 + x1 (−x1 + x2 + x3 − x4 + x5 + x6)

Chordal graph: [figure omitted: 6 vertices, one per variable, joined when two variables share a term of f, plus edges added by the chordal extension]

1. Subsets C1 = {1, 4}, C2 = {1, 2, 3, 5}, C3 = {1, 3, 5, 6}
2. Average clique size κ ❀ (κ+2d choose κ) vars

Dense SDP: 210 vars. Sparse SDP: 115 vars.
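Both counts can be reproduced directly: dense = all monomials of degree ≤ 2d = 4 in 6 variables, sparse = those supported on one of the three cliques, each counted once (which is why the answer is 115 rather than 15 + 70 + 70). A short sketch:

```python
# Reproduce the dense (210) and sparse (115) moment-variable counts.
from itertools import product
from math import comb

n, d = 6, 2
cliques = [{1, 4}, {1, 2, 3, 5}, {1, 3, 5, 6}]    # 1-indexed as on the slide

dense = comb(n + 2 * d, n)

sparse = set()
for alpha in product(range(2 * d + 1), repeat=n): # exponent vectors
    if sum(alpha) <= 2 * d:
        support = {i + 1 for i, a in enumerate(alpha) if a > 0}
        if any(support <= C for C in cliques):    # supported on some clique
            sparse.add(alpha)

print(dense, len(sparse))   # 210 115
```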

SLIDE 16–19

Our "heuristic relaxation" method: HR-2

Go between the 1ST & 2ND stairs of the SPARSE hierarchy:

sup_{x,u,z,t}  t^T A^T diag(u) c
s.t.   (z − Ax − b)^2 = 0,
       t^2 ≤ 1,   (x − x̄0 + ε)(x − x̄0 − ε) ≤ 0,
       u(u − 1) = 0,   (u − 1/2) z ≥ 0

• Pick SDP variables for products in {x, t}, {u, z} up to degree 4
• Pick SDP variables for products in {x, z}, {t, u} up to degree 2
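One way to read the last two bullets: the moment basis mixes degree-≤4 monomials inside the blocks {x, t} and {u, z} with degree-≤2 monomials mixing {x, z} and {t, u}. A toy enumeration with one scalar variable per block (our interpretation of the slide, not the paper's exact construction):

```python
# Hypothetical illustration of the mixed HR-2 monomial basis.
from itertools import combinations_with_replacement

def monomials(block, max_deg):
    """All monomials in the given variables up to max_deg, as sorted tuples."""
    return {m for k in range(max_deg + 1)
            for m in combinations_with_replacement(sorted(block), k)}

basis = (monomials({'x', 't'}, 4) | monomials({'u', 'z'}, 4)
         | monomials({'x', 'z'}, 2) | monomials({'t', 'u'}, 2))
print(len(basis), sorted(basis, key=lambda m: (len(m), m)))
```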

SLIDE 20

HR-2 on random (80, 80) networks

• Weight matrix A with band structure of width s
• SHOR: Shor's relaxation, given by the 1ST stair of the hierarchy
• LipOpt-3: LP-based method
• LBS: lower bound given by 10^4 random samples

[Plots omitted: upper bounds and running times versus bandwidth s, for HR-2, SHOR, LipOpt-3 and LBS]

SLIDE 21

HR-2 on trained (784, 500) network

MNIST classifier (SDP-NN) from Raghunathan et al., Certified defenses against adversarial examples, ICLR'18.

                            HR-2       SHOR     LipOpt-3     LBS
Global Lipschitz   Bound    14.56   <  17.85    Out of RAM   9.69
                   Time     12246   >  2869     Out of RAM
Local Lipschitz    Bound    12.70   <  16.07                 8.20
                   Time     20596   >  4217

SLIDE 22–25

What's next?

• MORE LAYERS ⇒ higher-degree polynomials
• TSSOS HIERARCHY: exploit term sparsity [Wang-M.-Lasserre 19]
• Term sparsity pattern graph and chordal extension; link with Jared Miller's poster! (see the sketch after this list)
  [Figure omitted: term sparsity graph on the monomials 1, x, y, z, xy, yz]
• CERTIFIED bounds: embed ML into "CRITICAL" dynamical systems
• Open PhD/Postdoc positions
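A hedged sketch of the term sparsity pattern graph from the TSSOS paper: nodes are the candidate basis monomials, and two monomials β, γ are joined when β + γ lies in the support of f or equals twice a basis exponent. The polynomial and basis below are made-up toy data, not the example on the slide:

```python
# Term sparsity pattern graph on exponent tuples (toy data).
from itertools import combinations

def tsp_graph(support, basis):
    """Edges {b1, b2} such that b1 + b2 is in supp(f) or is twice a basis exponent."""
    allowed = set(support) | {tuple(2 * e for e in m) for m in basis}
    return [(a, b) for a, b in combinations(basis, 2)
            if tuple(i + j for i, j in zip(a, b)) in allowed]

# Toy example in variables (x, y): f = x^2*y^2 + x*y + 1, basis 1, x, y, xy.
support = [(2, 2), (1, 1), (0, 0)]
basis = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(tsp_graph(support, basis))   # [((0, 0), (1, 1)), ((1, 0), (0, 1))]
```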

SLIDE 26

Thank you for your attention!

https://homepages.laas.fr/vmagron

Chen, Lasserre, Magron and Pauwels. Polynomial Optimization for Bounding Lipschitz Constants of Deep Networks. arXiv:2002.03657.

Wang, Magron and Lasserre. TSSOS: a moment-SOS hierarchy that exploits term sparsity. arXiv:1912.08899.