SLIDE 1

Complexity of Linear Regions in Deep Nets

Boris Hanin

Facebook AI Research and Texas A&M

March 5, 2019. Joint work with David Rolnick.

SLIDE 5

Theoretical vs. Practical Expressivity

Brain: Why deep nets, Pinky?

Pinky: Expressivity, Brain!

Brain: What about learnability?

SLIDE 6

Numerical Instability for Large Numbers of Regions

Figure: Random perturbation of an example with the maximal number of regions.

SLIDE 7

Theoretical Expressivity

SLIDE 8

Practical Expressivity at Init

SLIDE 9

Practical Expressivity

SLIDE 13

How To Do Theory?

  • Goal. Characterize the typical complexity of functions drawn from µA,init and µA,train.

  • Intuition. Probability measures in high dimensions are often concentrated around low-dimensional sets.

  • Idea. For networks with piecewise linear activations, the complexity of µA,init and µA,train is encoded in the corresponding partition of the input space.

SLIDE 18

Overview

N − a depth-d ReLU net with nout = 1

x → N(x) is a continuous, piecewise linear function

Fixed weights/biases partition R^nin into convex pieces on which N is linear

  • Goal. Understand average complexity of this partition
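This partition can be probed numerically. Below is a minimal sketch (my own illustration, not code from the talk): for a one-hidden-layer ReLU net with nin = 1, the activation pattern — which neurons are on — is constant on each linear region, so counting distinct patterns along a fine grid counts the regions the grid crosses. The width, bias scale, and interval here are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden = 8
W = rng.normal(0.0, np.sqrt(2.0), size=n_hidden)   # Var[weights] = 2/fan-in, fan-in = 1
b = rng.normal(0.0, 1.0, size=n_hidden)            # sigma_b = 1 > 0

def activation_pattern(x: float) -> tuple:
    """On/off pattern of the hidden neurons at scalar input x."""
    return tuple(W * x + b > 0)

xs = np.linspace(-3.0, 3.0, 10_001)
patterns = [activation_pattern(x) for x in xs]
# Regions crossed by the grid = 1 + number of pattern changes.
n_regions = 1 + sum(p != q for p, q in zip(patterns, patterns[1:]))
```

For a single hidden layer each neuron contributes at most one breakpoint, so the count can never exceed n_hidden + 1; the interest is in how the count grows once layers are composed.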

SLIDE 19

ReLU Net with nin = nout = 1 at Initialization

SLIDE 20

Input Space Partition with nin = 2 at Initialization

SLIDE 21

Evolution of Input Partition Through Network

SLIDE 26

Complexity v1.0: Number of Regions

Deterministic Bounds: 1 ≤ #regions ≤ 2^#neurons

Moral of Prior Work. There exist very special weight/bias settings for deep, skinny nets that saturate the upper bound.

  • Q1. What is the average number of regions at init?
  • Q2. What happens to regions during training (practical vs. theoretical expressivity)?
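A concrete instance of such special settings (a sketch I am adding; this is the classical sawtooth construction from this literature, not necessarily the talk's own example): the tent map t(y) = 2·relu(y) − 4·relu(y − 1/2) is a one-hidden-layer ReLU net with 2 neurons, and composing it d times produces 2^d linear pieces on [0, 1] from only 2d hidden neurons — exponential in depth.

```python
import numpy as np

def tent(y: np.ndarray) -> np.ndarray:
    """One ReLU layer with 2 neurons: t(y) = 2*relu(y) - 4*relu(y - 1/2)."""
    return 2.0 * np.maximum(y, 0.0) - 4.0 * np.maximum(y - 0.5, 0.0)

d = 4
xs = np.linspace(0.0, 1.0, 200_001)
y = xs.copy()
for _ in range(d):
    y = tent(y)              # compose the tent map d times

# Count linear pieces by counting slope changes along the grid.
slopes = np.diff(y) / np.diff(xs)
n_regions = 1 + int(np.sum(np.abs(np.diff(slopes)) > 1e-6))
```

With d = 4 this yields 2^4 = 16 pieces; generic random weights, by contrast, come nowhere near this count.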

SLIDE 27

Number of Regions when nin = nout = 1

SLIDE 32

Number of Regions when nin = nout = 1

Theorem (H-Rolnick)
Suppose weights and biases are independent with Var[weights] = 2/fan-in and Var[bias] = σ_b^2 > 0. For any compact S ⊂ R there are c = c(σ_b) and C = C(σ_b) so that

  c · #{neurons} ≤ (1/|S|) · E[#{regions in S}] ≤ C · #{neurons}

Remark

  1. Comes from a formula that holds throughout training
  2. Holds for any network connectivity
  3. Holds along any 1D curve inside a high-dimensional input space
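A rough empirical companion to the theorem (my own sketch, not the paper's experiments): draw a net with nin = nout = 1 at the stated initialization and count the regions crossed by an interval via activation patterns. The widths (6, 6, 6), σ_b = 1, and S = [−2, 2] are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(1)
widths = [1, 6, 6, 6, 1]                 # nin = nout = 1, three hidden layers
layers = []
for fan_in, fan_out in zip(widths[:-1], widths[1:]):
    W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))
    b = rng.normal(0.0, 1.0, size=fan_out)          # sigma_b = 1 > 0
    layers.append((W, b))

def pattern(x: float) -> tuple:
    """Concatenated on/off pattern of all hidden neurons at input x."""
    h = np.array([x])
    bits = []
    for W, b in layers[:-1]:             # the last layer is the linear readout
        pre = W @ h + b
        bits.extend(pre > 0)
        h = np.maximum(pre, 0.0)
    return tuple(bits)

xs = np.linspace(-2.0, 2.0, 20_001)
pats = [pattern(x) for x in xs]
n_regions = 1 + sum(p != q for p, q in zip(pats, pats[1:]))
n_neurons = sum(widths[1:-1])            # 18 hidden neurons
```

With 18 neurons the count lands far below the 2^18 worst case; the assertions below only use the guaranteed compositional bound (6+1)^3 = 343 for a 1D input.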

SLIDE 33

Number of Regions on 1D Line Through Training

SLIDE 35

Maximal # Regions on 2D Plane

Figure: Heuristic: #{regions on a k-dim slice} ∼ (#neurons)^k. When k = 2, should have ≈ (16 · 3)^2 = 2304 regions.

SLIDE 36

Maximal # Regions on 2D Plane

Figure: Heuristic: #{regions on a k-dim slice} ∼ (#neurons)^k. When k = 2, should have ≈ (32 · 3)^2 = 9216 regions.
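The heuristic can be sanity-checked in the simplest setting (my sketch, hypothetical sizes): for one hidden layer with nin = 2, the region boundaries form an arrangement of n lines, so a 2D slice sees up to 1 + n + n(n−1)/2 ≈ n²/2 regions while a 1D slice sees at most n + 1 — consistent with the (#neurons)^k scaling in k.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8                                    # hidden neurons, nin = 2
W = rng.normal(0.0, 1.0, size=(n, 2))    # Var[weights] = 2/fan-in = 1
b = rng.normal(0.0, 1.0, size=n)

grid = np.linspace(-1.0, 1.0, 201)
xx, yy = np.meshgrid(grid, grid)
X2 = np.column_stack([xx.ravel(), yy.ravel()])      # full 2D grid
X1 = np.column_stack([grid, np.zeros_like(grid)])   # 1D slice y = 0

# Distinct activation patterns on the slice vs. on the plane.
pats_1d = {tuple(row) for row in (X1 @ W.T + b > 0)}
pats_2d = {tuple(row) for row in (np.vstack([X2, X1]) @ W.T + b > 0)}
```

The 2D count exceeds the 1D count and is capped by the line-arrangement bound 1 + n + n(n−1)/2.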

SLIDE 44

Complexity v2.0: Volume of Linear Region Boundaries

Basic Object of Study: BN := {linear region boundaries of N}

nin = 1: vol(BN) + 1 = #regions

nin > 1: the natural analogue of the region count in S is vol(BN ∩ S)

Motivation 1. vol(BN) controls the average distance to the boundary:

  P(dist(x, BN) ≤ ε) ≃ ε · vol(BN ∩ S),   x ∼ Unif(S)

Motivation 2. vol(BN) controls the correlation length:

  corr. length of N ≈ dist(x, BN)   (conjectural)
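Motivation 1's linear-in-ε scaling can be checked exactly for one hidden layer, where BN is a union of lines and dist(x, BN) has a closed form (a Monte Carlo sketch I am adding; all sizes are hypothetical choices). Doubling ε should roughly double the probability.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10                                    # hidden neurons, nin = 2
W = rng.normal(0.0, 1.0, size=(n, 2))     # Var[weights] = 2/fan-in = 1
b = rng.normal(0.0, 0.5, size=n)          # sigma_b = 0.5

# For one hidden layer, BN is the union of the lines w_i . x + b_i = 0,
# so dist(x, BN) = min_i |w_i . x + b_i| / ||w_i|| exactly.
X = rng.uniform(-0.5, 0.5, size=(200_000, 2))    # x ~ Unif(S), S a unit square
dists = np.min(np.abs(X @ W.T + b) / np.linalg.norm(W, axis=1), axis=1)

eps = 0.005
p1 = float(np.mean(dists <= eps))
p2 = float(np.mean(dists <= 2 * eps))
ratio = p2 / p1          # should be close to 2 for small eps
```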

SLIDE 48

Volume of BN

Theorem (H-Rolnick)
Suppose weights and biases are independent with Var[weights] = 2/fan-in and Var[bias] = σ_b^2 > 0. For compact S ⊂ R^nin there are c = c(σ_b) and C = C(σ_b) so that

  c · #{neurons} ≤ (1/vol(S)) · E[vol(BN ∩ S)] ≤ C · #{neurons}

Corollary
Let x ∈ S = [0,1]^nin be uniform. There exists c = c(σ_b) so that

  E[dist(x, BN)] ≥ c / #{neurons}
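A quick check of the corollary's 1/#neurons scaling (my own sketch, hypothetical sizes): for one hidden layer with nin = 1, BN is the finite set of breakpoints −b_i/w_i, so the distance is exact; averaging over random nets, the mean distance should shrink as the width grows.

```python
import numpy as np

rng = np.random.default_rng(4)

def mean_dist(n_neurons: int, n_nets: int = 500, n_x: int = 200) -> float:
    """Mean distance from x ~ Unif[0,1] to the nearest breakpoint of a
    random one-hidden-layer ReLU net with nin = 1 (exact for one layer)."""
    total = 0.0
    for _ in range(n_nets):
        w = rng.normal(0.0, np.sqrt(2.0), size=n_neurons)  # Var = 2/fan-in
        b = rng.normal(0.0, 1.0, size=n_neurons)
        breaks = -b / w                   # BN = {x : w_i x + b_i = 0}
        x = rng.uniform(0.0, 1.0, size=n_x)
        total += np.mean(np.min(np.abs(x[:, None] - breaks[None, :]), axis=1))
    return total / n_nets

d5 = mean_dist(5)
d50 = mean_dist(50)
```

The 50-neuron nets sit markedly closer to their boundaries than the 5-neuron nets, as the corollary predicts.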

SLIDE 49

Distance to BN vs. Number of Neurons

SLIDE 51

Distance to BN vs. Test Accuracy

SLIDE 52

Input Space Partition with nin = 2 at Initialization

SLIDE 53

Input Space Partition with nin = 2 after 1 Epoch

SLIDE 54

Input Space Partition with nin = 2 after Training

SLIDE 55

Distribution of Distance to Linear Region Boundary

SLIDE 64

Main Technical Theorem (for ReLU Nets)

Theorem (H-Rolnick)
Let N be a ReLU net with nout = 1 and random weights/biases, so that the bias b_z at neuron z has density ρ_bz. Then, for S ⊂ R^nin,

  E[vol(BN ∩ S)] = Σ_{neurons z} ∫_S E[ ||∇z(x)|| · ρ_bz(z(x)) · 1{∂N/∂Z(x) ≠ 0} ] dx,

where z(x) is the pre-activation of neuron z and Z(x) = max{b_z, z(x)} is the post-activation.

Remark

  1. Analogous to the Kac-Rice formula, but easier because b_z is random
  2. Holds throughout training, since the weights/biases may be correlated
  3. Holds for any connectivity
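The formula can be sanity-checked in the simplest possible case (my sketch): a single neuron with nin = 1, z(x) = x (weight fixed to 1), Z(x) = max{b, z(x)}, and output N = Z. Then BN ∩ S is the single kink {x = b}, ∂N/∂Z ≡ 1, and the formula reduces to E[#kinks in S] = ∫_S ρ_b(x) dx, which Monte Carlo confirms.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(5)

# One neuron, nin = 1: z(x) = x, Z(x) = max(b, x), N = Z, b ~ N(0, 1).
# BN is the single kink {x = b} and dN/dZ = 1, so the theorem reduces to
#   E[# kinks in S] = integral over S of rho_b(x) dx.
S = (-1.0, 1.0)
b = rng.normal(0.0, 1.0, size=200_000)

lhs = float(np.mean((b > S[0]) & (b < S[1])))   # Monte Carlo E[# kinks in S]
rhs = erf(1.0 / sqrt(2.0))                      # exact integral of rho_b over S
```

Both sides come out ≈ 0.68, the mass of a standard Gaussian bias inside S = [−1, 1].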

SLIDE 70

Interpretation and Intuition

For fixed x ∈ S, each term

  E[ ||∇z(x)|| · ρ_bz(z(x)) · 1{∂N/∂Z(x) ≠ 0} ] dx

has the interpretation:

  ||∇z(x)|| dx − the size of dx under x → z(x)

  ρ_bz(z(x)) · ||∇z(x)|| dx − P(b_z creates a kink in [x ± dx])

  1{∂N/∂Z(x) ≠ 0} − the event that a kink at x survives to the output

  • Intuition. If ||∇z(x)|| = O(1) and b_z is not too concentrated, then z(x) = b_z can only be solved in O(1) regions.
