An Isabelle Formalization of the Expressiveness of Deep Learning
Alexander Bentkamp
Vrije Universiteit Amsterdam
Jasmin Blanchette
Vrije Universiteit Amsterdam
Dietrich Klakow
Universität des Saarlandes
Motivation
◮ Case study of proof assistance in the field of machine learning
◮ Development of general-purpose libraries
◮ Study of the mathematics behind deep learning
(Cohen, Sharir & Shashua, 2015):
A shallow network needs exponentially more nodes to express the same function as a deep network, for the vast majority of functions*
* except for a Lebesgue null set
[Figure: the deep network as a convolutional arithmetic circuit. Input, representational layer (non-linear functions), then alternating 1x1 convolutions (multiplication by a weight matrix) and pooling layers (componentwise multiplication) of widths r0, r1, r2, down to the output Y.]

[Figure: the corresponding shallow network, with a single hidden layer of width Z between the representational layer and the output Y.]
Isabelle's standard probability library:

definition lborel :: (α :: euclidean_space) measure

vs. my new definition:

definition lborelf :: nat ⇒ (nat ⇒ real) measure
  where lborelf n = …
Existing matrix libraries:
◮ Isabelle's multivariate analysis library (matrix dimension fixed by the type)
◮ Sternagel & Thiemann's matrix library (Archive of Formal Proofs, 2010)
◮ Thiemann & Yamada's matrix library (Archive of Formal Proofs, 2015), lacking many necessary lemmas
I added definitions and lemmas for
◮ matrix rank
◮ submatrices
Lochbihler & Haftmann's polynomial library

I added various definitions, lemmas, and the theorem:
◮ "Zero sets of polynomials ≢ 0 are Lebesgue null sets."

theorem
  fixes p :: real mpoly
  assumes p ≠ 0 and vars p ⊆ {..<n}
  shows {x ∈ space (lborelf n). insertion x p = 0} ∈ null_sets (lborelf n)
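The theorem can be illustrated numerically. The following is a hypothetical Python sketch (not part of the Isabelle development): a polynomial that is not identically zero vanishes only on a Lebesgue null set, so uniformly random points essentially never land on its zero set.

```python
import random

def p(x, y):
    # Example polynomial p(x, y) = x^2 - y, not identically zero;
    # its zero set {(x, y) | y = x^2} is a parabola, a null set in the plane.
    return x * x - y

random.seed(0)
samples = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(10_000)]
hits = sum(1 for x, y in samples if p(x, y) == 0.0)
print(hits)  # 0: none of the random samples lies on the zero set
```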
typedef α tensor =
  {(ds :: nat list, as :: α list). length as = prod_list ds}

◮ addition, multiplication by scalars, tensor product, matricization, CP-rank
◮ Powerful induction principle uses subtensors:
  slices a d1 × d2 × · · · × dN tensor into d1 subtensors of dimension d2 × · · · × dN

definition subtensor :: α tensor ⇒ nat ⇒ α tensor
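A hypothetical Python analogue of this representation (names assumed, not the Isabelle code): a tensor is a dimension list ds paired with a flat row-major value list, subject to the typedef's invariant, and subtensor picks out one slice along the first mode.

```python
from math import prod

def mk_tensor(ds, flat):
    # Invariant of the typedef: length of the flat list = product of dims.
    assert len(flat) == prod(ds)
    return (ds, flat)

def subtensor(t, i):
    # Slice a d1 x d2 x ... x dN tensor into the i-th of its d1 subtensors
    # of dimension d2 x ... x dN (the basis of the induction principle).
    ds, flat = t
    block = prod(ds[1:])
    return mk_tensor(ds[1:], flat[i * block : (i + 1) * block])

# A 2 x 3 tensor with entries 0..5, row-major:
t = mk_tensor([2, 3], [0, 1, 2, 3, 4, 5])
print(subtensor(t, 1))  # ([3], [3, 4, 5])
```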
Def1: Define a tensor A(w) that describes the function expressed by the deep network with weights w.
Lem1: The CP-rank of A(w) indicates how many nodes the shallow network needs to express the same function.
Def2: Define a polynomial p with the deep network weights w as variables.
Lem2: If p(w) ≠ 0, then A(w) has a high CP-rank.
Lem3: p(w) ≠ 0 almost everywhere.
Before*
  Def1 Tensors
  Lem1 Tensors, shallow network (induction over the deep network)
  Lem2 Polynomials, Matrices
  Def2 Polynomials, Tensors
  Lem3a Matrices, Tensors
  Lem3b Measures, Polynomials
* except for a Lebesgue null set

After*
  Def1 Tensors
  Lem1 Tensors, shallow network (induction over the deep network)
  Def2 Polynomials, Tensors
  Lem2 Polynomials, Matrices (induction over the deep network)
  Lem3a Matrices, Tensors
  Lem3b Measures, Polynomials
* except for a zero set of a polynomial
datatype α cac = Input nat
               | Conv α (α cac)
               | Pool (α cac) (α cac)

fun insert_weights :: (nat × nat) cac ⇒ (nat ⇒ real) ⇒ real mat cac

fun evaluate_net :: real mat cac ⇒ real vec list ⇒ real vec
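A hypothetical Python analogue (representation and names assumed, not the Isabelle code) of how evaluate_net interprets such a circuit: Conv multiplies by a weight matrix, Pool multiplies the outputs of its two subnets componentwise.

```python
# Nodes of the circuit, mirroring the cac datatype as tagged tuples.
def conv(matrix, child): return ("Conv", matrix, child)
def pool(left, right):   return ("Pool", left, right)
def inp(index):          return ("Input", index)

def mat_vec(m, v):
    # Multiplication of a matrix (list of rows) by a vector.
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def evaluate_net(net, inputs):
    tag = net[0]
    if tag == "Input":
        return inputs[net[1]]
    if tag == "Conv":
        return mat_vec(net[1], evaluate_net(net[2], inputs))
    # Pool: componentwise multiplication of the two subnets' outputs.
    l = evaluate_net(net[1], inputs)
    r = evaluate_net(net[2], inputs)
    return [a * b for a, b in zip(l, r)]

# A tiny network: Conv [[1, 2]] (Pool (Input 0) (Input 1))
net = conv([[1.0, 2.0]], pool(inp(0), inp(1)))
print(evaluate_net(net, [[1.0, 2.0], [3.0, 4.0]]))  # [19.0]
```

Pool computes [1*3, 2*4] = [3, 8], and the convolution yields 1*3 + 2*8 = 19.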
locale deep_net_params =
  fixes rs :: nat list
  assumes deep: length rs ≥ 3
    and no_zeros: ⋀r. r ∈ set rs ⟹ 0 < r
deep_net =
  Conv (r0×r1)
    (Pool (Conv (r1×r2)
            (Pool (Conv (r2×r3) Input)
                  (Conv (r2×r3) Input)))
          (Conv (r1×r2)
            (Pool (Conv (r2×r3) Input)
                  (Conv (r2×r3) Input))))

shallow_net Z =
  Conv (r0×Z)
    (Pool (Pool (Conv (Z×r3) Input)
                (Conv (Z×r3) Input))
          (Pool (Conv (Z×r3) Input)
                (Conv (Z×r3) Input)))
Def1: Define a tensor A(w) that describes the function expressed by the deep network with weights w.

definition A :: (nat ⇒ real) ⇒ real tensor where
  A w = tensor_from_net (insert_weights deep_net w)

The function tensor_from_net represents networks by tensors:

fun tensor_from_net :: real mat cac ⇒ real tensor

If two networks express the same function, the representing tensors are the same.
Lem1: The CP-rank of A(w) indicates how many nodes the shallow network needs to express the same function.

lemma cprank_shallow_model:
  shows cprank (tensor_from_net (insert_weights (shallow_net Z) w)) ≤ Z

◮ Can be proved directly from the definition of the CP-rank
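The idea behind the bound, in a hypothetical Python sketch (not the Isabelle proof): the CP-rank is the minimal number of rank-one terms (outer products) whose sum equals the tensor, and a shallow network with Z hidden nodes produces a sum of Z such terms, so its CP-rank is at most Z.

```python
def outer(u, v):
    # Rank-one (order-2) tensor: the outer product of two vectors.
    return [[a * b for b in v] for a in u]

def add(m1, m2):
    # Entrywise sum of two equally shaped tensors.
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

# Sum of Z = 2 rank-one terms: the result has CP-rank at most 2.
t = add(outer([1.0, 2.0], [3.0, 4.0]),
        outer([0.0, 1.0], [1.0, 0.0]))
print(t)  # [[3.0, 4.0], [7.0, 8.0]]
```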
Def2: Define a polynomial p with the deep network weights w as variables.

Easy to define as a function:

definition pfunc :: (nat ⇒ real) ⇒ real where
  pfunc w = det (submatrix [A w] rows_with_1 rows_with_1)

But we must prove that pfunc is a polynomial function.
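The two building blocks here, matricization [.] and submatrices, can be sketched in Python (a simplified hypothetical variant: this splits off only the first mode, whereas the formalization's matricization partitions the modes differently). The determinant of a submatrix is then visibly a polynomial in the tensor's entries.

```python
from math import prod

def matricize(ds, flat):
    # Flatten a tensor (dims, row-major flat list) into a matrix whose rows
    # are indexed by the first mode and whose columns by the remaining modes.
    cols = prod(ds[1:])
    return [flat[i * cols : (i + 1) * cols] for i in range(ds[0])]

def submatrix(m, rows, cols):
    # Keep only the chosen row and column indices.
    return [[m[i][j] for j in cols] for i in rows]

m = matricize([2, 2, 2], list(range(8)))
print(m)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
s = submatrix(m, [0, 1], [0, 3])
# Determinant of the 2x2 submatrix, a polynomial in the entries:
print(s[0][0] * s[1][1] - s[0][1] * s[1][0])  # 0*7 - 3*4 = -12
```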
Lem2: If p(w) ≠ 0, then A(w) has a high CP-rank.

lemma
  assumes pfunc w ≠ 0
  shows rN_half ≤ cprank (A w)

◮ Follows directly from the definition of p using properties of matricization and of matrix rank
Lem3: p(w) ≠ 0 almost everywhere.

Zero sets of polynomials ≢ 0 are Lebesgue null sets
⟹ it suffices to show that p ≢ 0
⟹ we need a weight configuration w with p(w) ≠ 0
theorem
  ∀ae wd w.r.t. lborelf weight_space_dim.
    ∄ws Z. Z < rN_half ∧
      (∀is. input_correct is ⟶
        evaluate_net (insert_weights deep_net wd) is =
        evaluate_net (insert_weights (shallow_net Z) ws) is)
Outcome
◮ First formalization on deep learning: substantial development (∼7000 lines including developed libraries)
◮ Development of libraries: new tensor library and extension of other libraries
◮ Generalization of the theorem: proof restructuring led to a more precise result

More information: http://matryoshka.gforge.inria.fr/#Publications
◮ AITP abstract
◮ Archive of Formal Proofs entry
◮ ITP paper draft (coming soon)
◮ M.Sc. thesis