SLIDE 1

An Isabelle Formalization of the Expressiveness of Deep Learning

Alexander Bentkamp

Vrije Universiteit Amsterdam

Jasmin Blanchette

Vrije Universiteit Amsterdam

Dietrich Klakow

Universität des Saarlandes

SLIDE 3

Motivation

◮ Case study of proof assistance in the field of machine learning
◮ Development of general-purpose libraries
◮ Study of the mathematics behind deep learning

Just wanted to formalize something!

SLIDE 4

Fundamental Theorem of Network Capacity

(Cohen, Sharir & Shashua, 2015)

A shallow network needs exponentially more nodes to express the same function as a deep network, for the vast majority of functions*.

[Diagrams: a shallow network and a deep network, each computing f(x) from input x.]

* except for a Lebesgue null set
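The asterisk can be read measure-theoretically (a sketch; the precise statement is the final theorem at the end of the talk): the exceptional weight configurations form a Lebesgue null set, i.e.

\[
\lambda\{\, w \mid \text{some small shallow network expresses the deep network's function } f_w \,\} \;=\; 0.
\]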

SLIDE 5

Deep convolutional arithmetic circuit

[Diagram: input → representational layer (M units per input position) → alternating 1×1 convolutions (multiplication by a weight matrix) and pooling layers (componentwise multiplication, the non-linear functions), with layer widths r0, r1, r2, … → output Y.]

SLIDE 6

Shallow convolutional arithmetic circuit

[Diagram: input → representational layer (M units per input position) → a single 1×1 convolution of width Z (multiplication by a weight matrix) → one global pooling layer (componentwise multiplication) → output Y.]

SLIDE 7

Lebesgue measure

Isabelle’s standard probability library:

definition lborel :: (α :: euclidean_space) measure

vs. my new definition:

definition lborel_f :: nat ⇒ (nat ⇒ real) measure where
  lborel_f n = (Π⇩M b ∈ {..<n}. (lborel :: real measure))
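In conventional notation, lborel_f n is the n-fold product of the one-dimensional Lebesgue measure λ, so on products of measurable sets:

\[
\lambda_f^{\,n} \;=\; \bigotimes_{b<n} \lambda, \qquad \lambda_f^{\,n}\Bigl(\prod_{b<n} A_b\Bigr) \;=\; \prod_{b<n} \lambda(A_b).
\]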

SLIDE 11

Matrices

◮ Isabelle’s multivariate analysis library: matrix dimension fixed by the type
◮ Sternagel & Thiemann’s matrix library (Archive of Formal Proofs, 2010)
◮ Thiemann & Yamada’s matrix library (Archive of Formal Proofs, 2015): lacking many necessary lemmas

I added definitions and lemmas for

◮ matrix rank
◮ submatrices (a sketch follows below)
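As an illustration, a minimal sketch of how a submatrix operation over Thiemann & Yamada’s matrix type could look (not necessarily the exact formalized definition; pick I i is a hypothetical helper returning the i-th smallest element of the set I):

(* Sketch: keep the rows with indices in I and the columns with indices in J,
   in increasing index order. *)
definition submatrix :: α mat ⇒ nat set ⇒ nat set ⇒ α mat where
  submatrix A I J =
    mat (card {i. i < dim_row A ∧ i ∈ I})
        (card {j. j < dim_col A ∧ j ∈ J})
        (λ(i, j). A $$ (pick I i, pick J j))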

SLIDE 12

Multivariate polynomials

Lochbihler & Haftmann’s polynomial library

I added various definitions, lemmas, and the theorem:

◮ "Zero sets of polynomials ≢ 0 are Lebesgue null sets."

theorem
  fixes p :: real mpoly
  assumes p ≠ 0 and vars p ⊆ {..<n}
  shows {x ∈ space (lborel_f n). insertion x p = 0} ∈ null_sets (lborel_f n)
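In conventional notation: for every real polynomial p ≢ 0 in the variables x_0, …, x_{n−1},

\[
\lambda^{n}\{\, x \in \mathbb{R}^{n} \mid p(x) = 0 \,\} \;=\; 0,
\]

which can be proved by induction on the number of variables, using Fubini’s theorem together with the fact that a nonzero univariate polynomial has only finitely many roots.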

SLIDE 13

My tensor library

typedef α tensor = {(ds :: nat list, as :: α list). length as = prod_list ds}

◮ Addition, multiplication by scalars, tensor product, matricization, CP-rank
◮ Powerful induction principle uses subtensors:
  slices a d1 × d2 × · · · × dN tensor into d1 subtensors of dimension d2 × · · · × dN

definition subtensor :: α tensor ⇒ nat ⇒ α tensor
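A hedged sketch of the shape such an induction principle can take (an illustration only, not the exact formalized rule; dims is assumed here to be the accessor returning the dimension list ds of the typedef):

(* Sketch: prove P for every tensor by covering order-0 tensors and showing
   that P propagates from the d1 subtensors of A to A itself. *)
lemma subtensor_induct_sketch:
  assumes ⋀A. dims A = [] ⟹ P A
    and ⋀A. dims A ≠ [] ⟹ (⋀i. i < hd (dims A) ⟹ P (subtensor A i)) ⟹ P A
  shows P A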

SLIDE 14

The proof on one slide

Def1 Define a tensor A(w) that describes the function expressed by the deep network with weights w
Lem1 The CP-rank of A(w) indicates how many nodes the shallow network needs to express the same function
Def2 Define a polynomial p with the deep network weights w as variables
Lem2 If p(w) ≠ 0, then A(w) has a high CP-rank
Lem3 p(w) ≠ 0 almost everywhere
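Chaining the lemmas (a sketch in conventional notation, writing r^{N/2} for the rank bound called rN_half later in the talk): by Lem3 and Lem2, cprank A(w) ≥ r^{N/2} for almost every w, and by Lem1 any shallow network with Z nodes expressing the same function satisfies

\[
Z \;\ge\; \operatorname{cprank} \mathcal{A}(w) \;\ge\; r^{N/2}.
\]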

SLIDE 16

Restructuring the proof

Before*

Def1 Tensors
Lem1 Tensors, shallow network (induction over the deep network)
Lem2 Polynomials, Matrices
Def2 Polynomials, Tensors
Lem3a Matrices, Tensors
Lem3b Measures, Polynomials

* except for a Lebesgue null set

After*

Def1 Tensors
Lem1 Tensors, shallow network (induction over the deep network)
Def2 Polynomials, Tensors
Lem2 Polynomials, Matrices (induction over the deep network)
Lem3a Matrices, Tensors
Lem3b Measures, Polynomials

* except for a zero set of a polynomial

SLIDE 19

Type for convolutional arithmetic circuits

datatype α cac =
    Input nat
  | Conv α (α cac)
  | Pool (α cac) (α cac)

fun insert_weights ::
    (nat × nat) cac   (* network without weights *)
  ⇒ (nat ⇒ real)     (* weights *)
  ⇒ real mat cac      (* network with weights *)

fun evaluate_net ::
    real mat cac    (* network *)
  ⇒ real vec list  (* input *)
  ⇒ real vec       (* output *)
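A small illustration of the weightless stage (hypothetical dimensions, not from the talk): a network with two size-3 inputs, one convolution per branch annotated with the 2 × 3 dimensions of its future weight matrix, and a pooling on top:

(* A tiny (nat × nat) cac value: before weights are inserted, each Conv node
   carries only the dimensions of its weight matrix. *)
definition tiny_net :: (nat × nat) cac where
  tiny_net = Pool (Conv (2, 3) (Input 3)) (Conv (2, 3) (Input 3))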
SLIDE 20

Deep network parameters

locale deep_net_params =
  fixes rs :: nat list
  assumes deep: length rs ≥ 3
    and no_zeros: ⋀r. r ∈ set rs ⟹ 0 < r

SLIDE 21

Deep and shallow networks

deep_net =
  Conv (r0×r1)
    (Pool (Conv (r1×r2)
             (Pool (Conv (r2×r3) Input)
                   (Conv (r2×r3) Input)))
          (Conv (r1×r2)
             (Pool (Conv (r2×r3) Input)
                   (Conv (r2×r3) Input))))

shallow_net Z =
  Conv (r0×Z)
    (Pool (Pool (Pool (Conv (Z×r3) Input)
                      (Conv (Z×r3) Input))
                (Conv (Z×r3) Input))
          (Conv (Z×r3) Input))

SLIDE 22


Def1 Define a tensor A(w) that describes the function expressed by the deep network with weights w

definition A :: (nat ⇒ real) ⇒ real tensor where
  A w = tensor_from_net (insert_weights deep_net w)

The function tensor_from_net represents networks by tensors:

fun tensor_from_net :: real mat cac ⇒ real tensor

If two networks express the same function, the representing tensors are the same
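Background on why this works (a sketch in the notation of Cohen et al., not a formula from the slides): a convolutional arithmetic circuit computes a function that is multilinear in the outputs of the representational layer rep, with the representing tensor as coefficients:

\[
f_w(x_1, \ldots, x_N) \;=\; \sum_{d_1, \ldots, d_N = 1}^{M} \mathcal{A}(w)_{d_1, \ldots, d_N} \, \prod_{i=1}^{N} \operatorname{rep}(x_i)_{d_i}.
\]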

SLIDE 23


Lem1 The CP-rank of A(w) indicates how many nodes the shallow network needs to express the same function

lemma cprank_shallow_model:
  shows cprank (tensor_from_net (insert_weights (shallow_net Z) w)) ≤ Z

◮ Can be proved by definition of the CP-rank
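For background, the CP-rank in conventional notation is the least number of rank-one (outer-product) terms that sum to the tensor; a shallow network with Z hidden nodes yields a sum of Z such terms, giving the bound above:

\[
\operatorname{cprank}(\mathcal{A}) \;=\; \min\Bigl\{\, Z \;\Bigm|\; \mathcal{A} = \sum_{z=1}^{Z} a_z^{(1)} \otimes \cdots \otimes a_z^{(N)} \,\Bigr\}.
\]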

SLIDE 24

Def2 Define a polynomial p with the deep network weights w as variables

Easy to define as a function:

definition pfunc :: (nat ⇒ real) ⇒ real where
  pfunc w = det (submatrix [A w] rows_with_1 rows_with_1)

where [A w] denotes the matricization of the tensor A w.

But we must prove that pfunc is a polynomial function: the entries of [A w] are polynomials in the weights, and the determinant is a polynomial in the entries.

SLIDE 25

Lem2 If p(w) ≠ 0, then A(w) has a high CP-rank

lemma
  assumes pfunc w ≠ 0
  shows rN_half ≤ cprank (A w)

◮ Follows directly from the definition of p, using properties of matricization and of matrix rank
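The chain behind this (a sketch, assuming the chosen submatrix is of order rN_half): a nonzero minor of that order forces the matricization [A w] to have matrix rank at least rN_half, and the matrix rank of a matricization bounds the CP-rank from below:

\[
\operatorname{cprank}\,\mathcal{A}(w) \;\ge\; \operatorname{rank}\,[\mathcal{A}(w)] \;\ge\; r^{N/2}.
\]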

SLIDE 26

Lem3 p(w) ≠ 0 almost everywhere

Zero sets of polynomials ≢ 0 are Lebesgue null sets
⟹ It suffices to show that p ≢ 0
⟹ We need a weight configuration w with p(w) ≠ 0

SLIDE 27

Final theorem

theorem
  ∀ae wd w.r.t. lborel_f weight_space_dim.
    ∄ ws Z. Z < rN_half ∧
      (∀is. input_correct is ⟶
         evaluate_net (insert_weights deep_net wd) is
         = evaluate_net (insert_weights (shallow_net Z) ws) is)

SLIDE 28

Conclusion

Outcome

◮ First formalization on deep learning
  Substantial development (∼7000 lines including developed libraries)
◮ Development of libraries
  New tensor library and extension of other libraries
◮ Generalization of the theorem
  Proof restructuring led to a more precise result

More information: http://matryoshka.gforge.inria.fr/#Publications

◮ AITP abstract
◮ Archive of Formal Proofs entry
◮ ITP paper draft (coming soon)
◮ M.Sc. thesis