SLIDE 1

Complexity of Linear Regions in Deep Nets

Boris Hanin

Facebook AI Research and Texas A&M

March 5, 2019. Joint work with David Rolnick.

SLIDE 5

Theoretical vs. Practical Expressivity

Brain: Why deep nets, Pinky?

Pinky: Expressivity, Brain!

Brain: What about learnability?

SLIDE 6

Numerical Instability for Large Numbers of Regions

Figure: Random perturbation of an example with the maximal number of regions.

SLIDE 7

Theoretical Expressivity

SLIDE 8

Practical Expressivity at Init

SLIDE 9

Practical Expressivity

SLIDE 13

How To Do Theory?

  • Goal. Characterize the typical complexity of functions drawn from µA,init and µA,train.

  • Intuition. Probability measures in high dimensions are often concentrated around low-dimensional sets.

  • Idea. For networks with piecewise linear activations, the complexity of µA,init and µA,train is encoded in the corresponding partition of the input space.

SLIDE 18

Overview

N − a depth-d ReLU net with nout = 1

x → N(x) is a continuous, piecewise linear function

Fixed weights/biases partition R^nin into convex pieces on which N is linear

  • Goal. Understand average complexity of this partition
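This partition can be probed numerically. Below is a minimal sketch (my own illustration, not code from the talk): for a one-hidden-layer ReLU net with nin = 1, the activation pattern — which neurons are on — is constant on each linear region, so counting distinct patterns along a fine grid counts the regions the grid crosses. The width, bias scale, and interval here are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden = 8
W = rng.normal(0.0, np.sqrt(2.0), size=n_hidden)   # Var[weights] = 2/fan-in, fan-in = 1
b = rng.normal(0.0, 1.0, size=n_hidden)            # sigma_b = 1 > 0

def activation_pattern(x: float) -> tuple:
    """On/off pattern of the hidden neurons at scalar input x."""
    return tuple(W * x + b > 0)

xs = np.linspace(-3.0, 3.0, 10_001)
patterns = [activation_pattern(x) for x in xs]
# Regions crossed by the grid = 1 + number of pattern changes.
n_regions = 1 + sum(p != q for p, q in zip(patterns, patterns[1:]))
```

For a single hidden layer each neuron contributes at most one breakpoint, so the count can never exceed n_hidden + 1; the interest is in how the count grows once layers are composed.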

SLIDE 19

ReLU Net with nin = nout = 1 at Initialization

SLIDE 20

Input Space Partition with nin = 2 at Initialization

SLIDE 21

Evolution of Input Partition Through Network

SLIDE 26

Complexity v1.0: Number of Regions

Deterministic Bounds: 1 ≤ #regions ≤ 2^#neurons

Moral of Prior Work. There exist very special weight/bias settings for deep, skinny nets that saturate the upper bound.

  • Q1. What is the average number of regions at init?
  • Q2. What happens to regions during training (practical vs. theoretical expressivity)?
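A concrete instance of such special settings (a sketch I am adding; this is the classical sawtooth construction from this literature, not necessarily the talk's own example): the tent map t(y) = 2·relu(y) − 4·relu(y − 1/2) is a one-hidden-layer ReLU net with 2 neurons, and composing it d times produces 2^d linear pieces on [0, 1] from only 2d hidden neurons — exponential in depth.

```python
import numpy as np

def tent(y: np.ndarray) -> np.ndarray:
    """One ReLU layer with 2 neurons: t(y) = 2*relu(y) - 4*relu(y - 1/2)."""
    return 2.0 * np.maximum(y, 0.0) - 4.0 * np.maximum(y - 0.5, 0.0)

d = 4
xs = np.linspace(0.0, 1.0, 200_001)
y = xs.copy()
for _ in range(d):
    y = tent(y)              # compose the tent map d times

# Count linear pieces by counting slope changes along the grid.
slopes = np.diff(y) / np.diff(xs)
n_regions = 1 + int(np.sum(np.abs(np.diff(slopes)) > 1e-6))
```

With d = 4 this yields 2^4 = 16 pieces; generic random weights, by contrast, come nowhere near this count.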

SLIDE 27

Number of Regions when nin = nout = 1

SLIDE 32

Number of Regions when nin = nout = 1

Theorem (H-Rolnick)
Suppose weights and biases are independent with Var[weights] = 2/fan-in and Var[bias] = σ_b^2 > 0. For any compact S ⊂ R there are c = c(σ_b) and C = C(σ_b) so that

  c · #{neurons} ≤ (1/|S|) · E[#{regions in S}] ≤ C · #{neurons}

Remark

  1. Comes from a formula that holds throughout training
  2. Holds for any network connectivity
  3. Holds along any 1D curve inside a high-dimensional input space
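A rough empirical companion to the theorem (my own sketch, not the paper's experiments): draw a net with nin = nout = 1 at the stated initialization and count the regions crossed by an interval via activation patterns. The widths (6, 6, 6), σ_b = 1, and S = [−2, 2] are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(1)
widths = [1, 6, 6, 6, 1]                 # nin = nout = 1, three hidden layers
layers = []
for fan_in, fan_out in zip(widths[:-1], widths[1:]):
    W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))
    b = rng.normal(0.0, 1.0, size=fan_out)          # sigma_b = 1 > 0
    layers.append((W, b))

def pattern(x: float) -> tuple:
    """Concatenated on/off pattern of all hidden neurons at input x."""
    h = np.array([x])
    bits = []
    for W, b in layers[:-1]:             # the last layer is the linear readout
        pre = W @ h + b
        bits.extend(pre > 0)
        h = np.maximum(pre, 0.0)
    return tuple(bits)

xs = np.linspace(-2.0, 2.0, 20_001)
pats = [pattern(x) for x in xs]
n_regions = 1 + sum(p != q for p, q in zip(pats, pats[1:]))
n_neurons = sum(widths[1:-1])            # 18 hidden neurons
```

With 18 neurons the count lands far below the 2^18 worst case; the assertions below only use the guaranteed compositional bound (6+1)^3 = 343 for a 1D input.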

SLIDE 33

Number of Regions on 1D Line Through Training

SLIDE 35

Maximal # Regions on 2D Plane

Figure: Heuristic: #{regions on a k-dim slice} ∼ (#neurons)^k. When k = 2, should have ≈ (16 · 3)^2 = 2304 regions.

SLIDE 36

Maximal # Regions on 2D Plane

Figure: Heuristic: #{regions on a k-dim slice} ∼ (#neurons)^k. When k = 2, should have ≈ (32 · 3)^2 = 9216 regions.
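The heuristic can be sanity-checked in the simplest setting (my sketch, hypothetical sizes): for one hidden layer with nin = 2, the region boundaries form an arrangement of n lines, so a 2D slice sees up to 1 + n + n(n−1)/2 ≈ n²/2 regions while a 1D slice sees at most n + 1 — consistent with the (#neurons)^k scaling in k.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8                                    # hidden neurons, nin = 2
W = rng.normal(0.0, 1.0, size=(n, 2))    # Var[weights] = 2/fan-in = 1
b = rng.normal(0.0, 1.0, size=n)

grid = np.linspace(-1.0, 1.0, 201)
xx, yy = np.meshgrid(grid, grid)
X2 = np.column_stack([xx.ravel(), yy.ravel()])      # full 2D grid
X1 = np.column_stack([grid, np.zeros_like(grid)])   # 1D slice y = 0

# Distinct activation patterns on the slice vs. on the plane.
pats_1d = {tuple(row) for row in (X1 @ W.T + b > 0)}
pats_2d = {tuple(row) for row in (np.vstack([X2, X1]) @ W.T + b > 0)}
```

The 2D count exceeds the 1D count and is capped by the line-arrangement bound 1 + n + n(n−1)/2.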

SLIDE 44

Complexity v2.0: Volume of Linear Region Boundaries

Basic Object of Study: BN := {linear region boundaries of N}

nin = 1: vol(BN) + 1 = #regions

nin > 1: the natural analogue of the region count in S is vol(BN ∩ S)

Motivation 1. vol(BN) controls the average distance to the boundary:

  P(dist(x, BN) ≤ ε) ≃ ε · vol(BN ∩ S),   x ∼ Unif(S)

Motivation 2. vol(BN) controls the correlation length:

  corr. length of N ≈ dist(x, BN)   (conjectural)
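Motivation 1's linear-in-ε scaling can be checked exactly for one hidden layer, where BN is a union of lines and dist(x, BN) has a closed form (a Monte Carlo sketch I am adding; all sizes are hypothetical choices). Doubling ε should roughly double the probability.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10                                    # hidden neurons, nin = 2
W = rng.normal(0.0, 1.0, size=(n, 2))     # Var[weights] = 2/fan-in = 1
b = rng.normal(0.0, 0.5, size=n)          # sigma_b = 0.5

# For one hidden layer, BN is the union of the lines w_i . x + b_i = 0,
# so dist(x, BN) = min_i |w_i . x + b_i| / ||w_i|| exactly.
X = rng.uniform(-0.5, 0.5, size=(200_000, 2))    # x ~ Unif(S), S a unit square
dists = np.min(np.abs(X @ W.T + b) / np.linalg.norm(W, axis=1), axis=1)

eps = 0.005
p1 = float(np.mean(dists <= eps))
p2 = float(np.mean(dists <= 2 * eps))
ratio = p2 / p1          # should be close to 2 for small eps
```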

SLIDE 48

Volume of BN

Theorem (H-Rolnick)
Suppose weights and biases are independent with Var[weights] = 2/fan-in and Var[bias] = σ_b^2 > 0. For compact S ⊂ R^nin there are c = c(σ_b) and C = C(σ_b) so that

  c · #{neurons} ≤ (1/vol(S)) · E[vol(BN ∩ S)] ≤ C · #{neurons}

Corollary
Let x ∈ S = [0,1]^nin be uniform. There exists c = c(σ_b) so that

  E[dist(x, BN)] ≥ c / #{neurons}
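A quick check of the corollary's 1/#neurons scaling (my own sketch, hypothetical sizes): for one hidden layer with nin = 1, BN is the finite set of breakpoints −b_i/w_i, so the distance is exact; averaging over random nets, the mean distance should shrink as the width grows.

```python
import numpy as np

rng = np.random.default_rng(4)

def mean_dist(n_neurons: int, n_nets: int = 500, n_x: int = 200) -> float:
    """Mean distance from x ~ Unif[0,1] to the nearest breakpoint of a
    random one-hidden-layer ReLU net with nin = 1 (exact for one layer)."""
    total = 0.0
    for _ in range(n_nets):
        w = rng.normal(0.0, np.sqrt(2.0), size=n_neurons)  # Var = 2/fan-in
        b = rng.normal(0.0, 1.0, size=n_neurons)
        breaks = -b / w                   # BN = {x : w_i x + b_i = 0}
        x = rng.uniform(0.0, 1.0, size=n_x)
        total += np.mean(np.min(np.abs(x[:, None] - breaks[None, :]), axis=1))
    return total / n_nets

d5 = mean_dist(5)
d50 = mean_dist(50)
```

The 50-neuron nets sit markedly closer to their boundaries than the 5-neuron nets, as the corollary predicts.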

SLIDE 49

Distance to BN vs. Number of Neurons

SLIDE 51

Distance to BN vs. Test Accuracy

SLIDE 52

Input Space Partition with nin = 2 at Initialization

SLIDE 53

Input Space Partition with nin = 2 after 1 Epoch

SLIDE 54

Input Space Partition with nin = 2 after Training

SLIDE 55

Distribution of Distance to Linear Region Boundary

SLIDE 64

Main Technical Theorem (for ReLU Nets)

Theorem (H-Rolnick)
Let N be a ReLU net with nout = 1 and random weights/biases, so that the bias b_z at neuron z has density ρ_bz. Then, for S ⊂ R^nin,

  E[vol(BN ∩ S)] = Σ_{neurons z} ∫_S E[ ||∇z(x)|| · ρ_bz(z(x)) · 1{∂N/∂Z(x) ≠ 0} ] dx,

where z(x) is the pre-activation of neuron z and Z(x) = max{b_z, z(x)} is the post-activation.

Remark

  1. Analogous to the Kac-Rice formula, but easier because b_z is random
  2. Holds throughout training, since the weights/biases may be correlated
  3. Holds for any connectivity
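The formula can be sanity-checked in the simplest possible case (my sketch): a single neuron with nin = 1, z(x) = x (weight fixed to 1), Z(x) = max{b, z(x)}, and output N = Z. Then BN ∩ S is the single kink {x = b}, ∂N/∂Z ≡ 1, and the formula reduces to E[#kinks in S] = ∫_S ρ_b(x) dx, which Monte Carlo confirms.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(5)

# One neuron, nin = 1: z(x) = x, Z(x) = max(b, x), N = Z, b ~ N(0, 1).
# BN is the single kink {x = b} and dN/dZ = 1, so the theorem reduces to
#   E[# kinks in S] = integral over S of rho_b(x) dx.
S = (-1.0, 1.0)
b = rng.normal(0.0, 1.0, size=200_000)

lhs = float(np.mean((b > S[0]) & (b < S[1])))   # Monte Carlo E[# kinks in S]
rhs = erf(1.0 / sqrt(2.0))                      # exact integral of rho_b over S
```

Both sides come out ≈ 0.68, the mass of a standard Gaussian bias inside S = [−1, 1].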

SLIDE 70

Interpretation and Intuition

For fixed x ∈ S, each term

  E[ ||∇z(x)|| · ρ_bz(z(x)) · 1{∂N/∂Z(x) ≠ 0} ] dx

has the interpretation:

  ||∇z(x)|| dx − the size of dx under x → z(x)

  ρ_bz(z(x)) · ||∇z(x)|| dx − P(b_z creates a kink in [x ± dx])

  1{∂N/∂Z(x) ≠ 0} − the event that a kink at x survives to the output

  • Intuition. If ||∇z(x)|| = O(1) and b_z is not too concentrated, then z(x) = b_z can only be solved in O(1) regions.
