SLIDE 1

An Isabelle Formalization of the Expressiveness of Deep Learning

Alexander Bentkamp

Vrije Universiteit Amsterdam

Jasmin Blanchette

Vrije Universiteit Amsterdam

Dietrich Klakow

Universität des Saarlandes

SLIDE 3

Motivation

◮ Case study of proof assistance in the field of machine learning
◮ Development of general-purpose libraries
◮ Study of the mathematics behind deep learning

Just wanted to formalize something!

SLIDE 4

Fundamental Theorem of Network Capacity

(Cohen, Sharir & Shashua, 2015)

A shallow network needs exponentially more nodes to express the same function as a deep network, for the vast majority of functions*.

[Diagrams: a shallow network and a deep network, each computing f(x) from input x.]

* except for a Lebesgue null set
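The asterisk can be read measure-theoretically (a sketch; the precise statement is the final theorem at the end of the talk): the exceptional weight configurations form a Lebesgue null set, i.e.

\[
\lambda\{\, w \mid \text{some small shallow network expresses the deep network's function } f_w \,\} \;=\; 0.
\]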

SLIDE 5

Deep convolutional arithmetic circuit

[Diagram: input → representational layer (M units per input position) → alternating 1×1 convolutions (multiplication by a weight matrix) and pooling layers (componentwise multiplication, the non-linear functions), with layer widths r0, r1, r2, … → output Y.]

SLIDE 6

Shallow convolutional arithmetic circuit

[Diagram: input → representational layer (M units per input position) → a single 1×1 convolution of width Z (multiplication by a weight matrix) → one global pooling layer (componentwise multiplication) → output Y.]

SLIDE 7

Lebesgue measure

Isabelle’s standard probability library:

definition lborel :: (α :: euclidean_space) measure

vs. my new definition:

definition lborel_f :: nat ⇒ (nat ⇒ real) measure where
  lborel_f n = (Π⇩M b ∈ {..<n}. (lborel :: real measure))
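In conventional notation, lborel_f n is the n-fold product of the one-dimensional Lebesgue measure λ, so on products of measurable sets:

\[
\lambda_f^{\,n} \;=\; \bigotimes_{b<n} \lambda, \qquad \lambda_f^{\,n}\Bigl(\prod_{b<n} A_b\Bigr) \;=\; \prod_{b<n} \lambda(A_b).
\]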

SLIDE 11

Matrices

◮ Isabelle’s multivariate analysis library: matrix dimension fixed by the type
◮ Sternagel & Thiemann’s matrix library (Archive of Formal Proofs, 2010)
◮ Thiemann & Yamada’s matrix library (Archive of Formal Proofs, 2015): lacking many necessary lemmas

I added definitions and lemmas for

◮ matrix rank
◮ submatrices (a sketch follows below)
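As an illustration, a minimal sketch of how a submatrix operation over Thiemann & Yamada’s matrix type could look (not necessarily the exact formalized definition; pick I i is a hypothetical helper returning the i-th smallest element of the set I):

(* Sketch: keep the rows with indices in I and the columns with indices in J,
   in increasing index order. *)
definition submatrix :: α mat ⇒ nat set ⇒ nat set ⇒ α mat where
  submatrix A I J =
    mat (card {i. i < dim_row A ∧ i ∈ I})
        (card {j. j < dim_col A ∧ j ∈ J})
        (λ(i, j). A $$ (pick I i, pick J j))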

SLIDE 12

Multivariate polynomials

Lochbihler & Haftmann’s polynomial library

I added various definitions, lemmas, and the theorem:

◮ "Zero sets of polynomials ≢ 0 are Lebesgue null sets."

theorem
  fixes p :: real mpoly
  assumes p ≠ 0 and vars p ⊆ {..<n}
  shows {x ∈ space (lborel_f n). insertion x p = 0} ∈ null_sets (lborel_f n)
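In conventional notation: for every real polynomial p ≢ 0 in the variables x_0, …, x_{n−1},

\[
\lambda^{n}\{\, x \in \mathbb{R}^{n} \mid p(x) = 0 \,\} \;=\; 0,
\]

which can be proved by induction on the number of variables, using Fubini’s theorem together with the fact that a nonzero univariate polynomial has only finitely many roots.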

SLIDE 13

My tensor library

typedef α tensor = {(ds :: nat list, as :: α list). length as = prod_list ds}

◮ Addition, multiplication by scalars, tensor product, matricization, CP-rank
◮ Powerful induction principle uses subtensors:
  slices a d1 × d2 × · · · × dN tensor into d1 subtensors of dimension d2 × · · · × dN

definition subtensor :: α tensor ⇒ nat ⇒ α tensor
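A hedged sketch of the shape such an induction principle can take (an illustration only, not the exact formalized rule; dims is assumed here to be the accessor returning the dimension list ds of the typedef):

(* Sketch: prove P for every tensor by covering order-0 tensors and showing
   that P propagates from the d1 subtensors of A to A itself. *)
lemma subtensor_induct_sketch:
  assumes ⋀A. dims A = [] ⟹ P A
    and ⋀A. dims A ≠ [] ⟹ (⋀i. i < hd (dims A) ⟹ P (subtensor A i)) ⟹ P A
  shows P A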

SLIDE 14

The proof on one slide

Def1 Define a tensor A(w) that describes the function expressed by the deep network with weights w
Lem1 The CP-rank of A(w) indicates how many nodes the shallow network needs to express the same function
Def2 Define a polynomial p with the deep network weights w as variables
Lem2 If p(w) ≠ 0, then A(w) has a high CP-rank
Lem3 p(w) ≠ 0 almost everywhere
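Chaining the lemmas (a sketch in conventional notation, writing r^{N/2} for the rank bound called rN_half later in the talk): by Lem3 and Lem2, cprank A(w) ≥ r^{N/2} for almost every w, and by Lem1 any shallow network with Z nodes expressing the same function satisfies

\[
Z \;\ge\; \operatorname{cprank} \mathcal{A}(w) \;\ge\; r^{N/2}.
\]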

SLIDE 16

Restructuring the proof

Before*

Def1 Tensors
Lem1 Tensors, shallow network (induction over the deep network)
Lem2 Polynomials, Matrices
Def2 Polynomials, Tensors
Lem3a Matrices, Tensors
Lem3b Measures, Polynomials

* except for a Lebesgue null set

After*

Def1 Tensors
Lem1 Tensors, shallow network (induction over the deep network)
Def2 Polynomials, Tensors
Lem2 Polynomials, Matrices (induction over the deep network)
Lem3a Matrices, Tensors
Lem3b Measures, Polynomials

* except for a zero set of a polynomial

SLIDE 19

Type for convolutional arithmetic circuits

datatype α cac =
    Input nat
  | Conv α (α cac)
  | Pool (α cac) (α cac)

fun insert_weights ::
    (nat × nat) cac   (* network without weights *)
  ⇒ (nat ⇒ real)     (* weights *)
  ⇒ real mat cac      (* network with weights *)

fun evaluate_net ::
    real mat cac    (* network *)
  ⇒ real vec list  (* input *)
  ⇒ real vec       (* output *)
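A small illustration of the weightless stage (hypothetical dimensions, not from the talk): a network with two size-3 inputs, one convolution per branch annotated with the 2 × 3 dimensions of its future weight matrix, and a pooling on top:

(* A tiny (nat × nat) cac value: before weights are inserted, each Conv node
   carries only the dimensions of its weight matrix. *)
definition tiny_net :: (nat × nat) cac where
  tiny_net = Pool (Conv (2, 3) (Input 3)) (Conv (2, 3) (Input 3))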
SLIDE 20

Deep network parameters

locale deep_net_params =
  fixes rs :: nat list
  assumes deep: length rs ≥ 3
    and no_zeros: ⋀r. r ∈ set rs ⟹ 0 < r

SLIDE 21

Deep and shallow networks

deep_net =
  Conv (r0×r1)
    (Pool (Conv (r1×r2)
             (Pool (Conv (r2×r3) Input)
                   (Conv (r2×r3) Input)))
          (Conv (r1×r2)
             (Pool (Conv (r2×r3) Input)
                   (Conv (r2×r3) Input))))

shallow_net Z =
  Conv (r0×Z)
    (Pool (Pool (Pool (Conv (Z×r3) Input)
                      (Conv (Z×r3) Input))
                (Conv (Z×r3) Input))
          (Conv (Z×r3) Input))

SLIDE 22


Def1 Define a tensor A(w) that describes the function expressed by the deep network with weights w

definition A :: (nat ⇒ real) ⇒ real tensor where
  A w = tensor_from_net (insert_weights deep_net w)

The function tensor_from_net represents networks by tensors:

fun tensor_from_net :: real mat cac ⇒ real tensor

If two networks express the same function, the representing tensors are the same
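Background on why this works (a sketch in the notation of Cohen et al., not a formula from the slides): a convolutional arithmetic circuit computes a function that is multilinear in the outputs of the representational layer rep, with the representing tensor as coefficients:

\[
f_w(x_1, \ldots, x_N) \;=\; \sum_{d_1, \ldots, d_N = 1}^{M} \mathcal{A}(w)_{d_1, \ldots, d_N} \, \prod_{i=1}^{N} \operatorname{rep}(x_i)_{d_i}.
\]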

SLIDE 23


Lem1 The CP-rank of A(w) indicates how many nodes the shallow network needs to express the same function

lemma cprank_shallow_model:
  shows cprank (tensor_from_net (insert_weights (shallow_net Z) w)) ≤ Z

◮ Can be proved by definition of the CP-rank
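For background, the CP-rank in conventional notation is the least number of rank-one (outer-product) terms that sum to the tensor; a shallow network with Z hidden nodes yields a sum of Z such terms, giving the bound above:

\[
\operatorname{cprank}(\mathcal{A}) \;=\; \min\Bigl\{\, Z \;\Bigm|\; \mathcal{A} = \sum_{z=1}^{Z} a_z^{(1)} \otimes \cdots \otimes a_z^{(N)} \,\Bigr\}.
\]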

SLIDE 24

Def2 Define a polynomial p with the deep network weights w as variables

Easy to define as a function:

definition pfunc :: (nat ⇒ real) ⇒ real where
  pfunc w = det (submatrix [A w] rows_with_1 rows_with_1)

where [A w] denotes the matricization of the tensor A w.

But we must prove that pfunc is a polynomial function: the entries of [A w] are polynomials in the weights, and the determinant is a polynomial in the entries.

SLIDE 25

Lem2 If p(w) ≠ 0, then A(w) has a high CP-rank

lemma
  assumes pfunc w ≠ 0
  shows rN_half ≤ cprank (A w)

◮ Follows directly from the definition of p, using properties of matricization and of matrix rank
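The chain behind this (a sketch, assuming the chosen submatrix is of order rN_half): a nonzero minor of that order forces the matricization [A w] to have matrix rank at least rN_half, and the matrix rank of a matricization bounds the CP-rank from below:

\[
\operatorname{cprank}\,\mathcal{A}(w) \;\ge\; \operatorname{rank}\,[\mathcal{A}(w)] \;\ge\; r^{N/2}.
\]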

SLIDE 26

Lem3 p(w) ≠ 0 almost everywhere

Zero sets of polynomials ≢ 0 are Lebesgue null sets
⟹ It suffices to show that p ≢ 0
⟹ We need a weight configuration w with p(w) ≠ 0

SLIDE 27

Final theorem

theorem
  ∀ae wd w.r.t. lborel_f weight_space_dim.
    ∄ ws Z. Z < rN_half ∧
      (∀is. input_correct is ⟶
         evaluate_net (insert_weights deep_net wd) is
         = evaluate_net (insert_weights (shallow_net Z) ws) is)

SLIDE 28

Conclusion

Outcome

◮ First formalization on deep learning
  Substantial development (∼7000 lines including developed libraries)
◮ Development of libraries
  New tensor library and extension of other libraries
◮ Generalization of the theorem
  Proof restructuring led to a more precise result

More information: http://matryoshka.gforge.inria.fr/#Publications

◮ AITP abstract
◮ Archive of Formal Proofs entry
◮ ITP paper draft (coming soon)
◮ M.Sc. thesis