slide-1
SLIDE 1

Bruno Gavranović SYCO2 Compositional Deep Learning December 18, 2018 1 / 36

slide-2
SLIDE 2

Compositional Deep Learning

Bruno Gavranović

Faculty of Electrical Engineering and Computing (FER) University of Zagreb, Croatia bruno.gavranovic@fer.hr

December 18, 2018


slide-7
SLIDE 7

Overview

  • Usage of rudimentary category theory
  • Neural networks:
    they're compositional (you can stack layers and get better results), and they're discovering (compositional) structures in data
  • Work in progress
  • Experiments

slide-8
SLIDE 8

Generative modelling - State of the art - 2018

We can generate completely realistic-looking images



slide-11
SLIDE 11

Space of all possible images

Natural images form a low-dimensional manifold in their embedding space.

slide-13
SLIDE 13

Generative Adversarial Networks

But we have minimal control over the network output!

Image source: http://dl-ai.blogspot.com/2017/08/gan-problems.html

slide-15
SLIDE 15

Claim

It's possible to assign semantics to the network training procedure using the same schemas as in Functorial Data Migration:

                Functorial Data Migration    Compositional Deep Learning
  Functor       F : C → Set                  F : C → Para
  F is          fixed                        learned

slide-18
SLIDE 18

Functorial data migration

Categorical schema generated by a graph G and a path equivalence relation: C := (G, ≃)

Schema: Beatle --Played--> Rock-and-roll instrument

A database instance is a functor F : C → Set:

  Beatle    Played
  George    Lead guitar
  John      Rhythm guitar
  Paul      Bass guitar
  Ringo     Drums

  Rock-and-roll instrument
  Bass guitar
  Drums
  Keyboard
  Lead guitar
  Rhythm guitar

In databases, we have sets of data and clear mappings between them.

¹ https://arxiv.org/abs/1803.05316
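As a toy illustration (the encoding is ours, not from the talk), the instance above is nothing more than finite sets with a function between them:

```python
# A database instance for the schema Beatle --Played--> Instrument,
# encoded as a functor: objects |-> finite sets, the arrow |-> a function.
beatle = {"George", "John", "Paul", "Ringo"}
instrument = {"Bass guitar", "Drums", "Keyboard", "Lead guitar", "Rhythm guitar"}

played = {  # the arrow "Played", a total function Beatle -> Instrument
    "George": "Lead guitar",
    "John": "Rhythm guitar",
    "Paul": "Bass guitar",
    "Ringo": "Drums",
}

# Functoriality here just means: played is defined on all of beatle
# and lands inside instrument.
assert set(played) == beatle
assert set(played.values()) <= instrument
```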

slide-19
SLIDE 19

Neural networks

In machine learning, all we have is plenty of data, but no known implementations of the functions:

  Input          Output
  DataSample1    ExpectedOutput1
  DataSample2    ExpectedOutput2
  DataSample3    ExpectedOutput3
  DataSample4    ExpectedOutput4

slide-20
SLIDE 20

[Image slides: CycleGAN examples. Source: https://arxiv.org/abs/1703.10593]

slide-22
SLIDE 22

Style transfer


slide-28
SLIDE 28

CycleGAN


slide-30
SLIDE 30

Previous work

Backprop as Functor

  • Compositional perspective on supervised learning
  • The category of learners, Learn
  • The category of differentiable parametrized functions, Para

The Simple Essence of Automatic Differentiation

Compositional, side-effect free way of performing mode-independent automatic differentiation


slide-37
SLIDE 37

Category of differentiable parametrized functions

Para:

  • Objects A, B, C, ... are Euclidean spaces.
  • For each two objects A and B, we specify a set Para(A, B) whose elements are differentiable functions of type P × A → B, for some parameter space P.
  • For every object A, we specify an identity morphism id_A ∈ Para(A, A), a function of type 1 × A → A, which is just a projection.
  • For every three objects A, B, C and morphisms f ∈ Para(A, B) and g ∈ Para(B, C), one specifies a morphism g ∘ f ∈ Para(A, C) in the following way:

    (∘) : (Q × B → C) × (P × A → B) → ((P × Q) × A → C)    (1)
    g ∘ f = λ((p, q), a) → g(q, f(p, a))                   (2)

Note: the coherence conditions hold only up to isomorphism! We can consider equivalence classes of morphisms, or consider Para as a bicategory.
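The composition rule (1)-(2) can be sketched directly (a toy illustration, not code from the talk), representing a morphism of Para as a plain function of (parameter, input):

```python
# Sketch: composing parametrized functions f : P x A -> B and g : Q x B -> C
# into g.f : (P x Q) x A -> C, exactly as in eq. (2).
def compose(g, f):
    """g.f = lambda ((p, q), a): g(q, f(p, a)) -- parameters pair up."""
    return lambda pq, a: g(pq[1], f(pq[0], a))

# Toy morphisms: f(p, a) = p * a with parameter p, g(q, b) = q + b with parameter q.
f = lambda p, a: p * a
g = lambda q, b: q + b
h = compose(g, f)            # h((p, q), a) = q + p * a
print(h((2.0, 1.0), 3.0))    # -> 7.0
```

Note how the parameter spaces multiply under composition: the composite carries the pair (p, q), which is why identities live over the trivial parameter space 1.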

slide-38
SLIDE 38

Category of learners

Learn: Let A and B be sets. A supervised learning algorithm, or simply learner, A → B is a tuple (P, I, U, r) where P is a set and I, U, and r are functions of types:

  I : P × A → B
  U : P × A × B → P
  r : P × A × B → A

Update:  U_I(p, a, b) := p − ε ∇_p E_I(p, a, b)
Request: r_I(p, a, b) := f_a((1/α) ∇_a E_I(p, a, b))

slide-40
SLIDE 40

Many overlapping notions

The update function U_I(p, a, b) := p − ε ∇_p E_I(p, a, b) is computing two different things:

  • It's calculating the gradient p_g = ∇_p E_I(p, a, b)
  • It's computing the parameter update by the rule of stochastic gradient descent: (p, p_g) ↦ p − ε p_g

The request function r in itself encodes the computation of ∇_a E_I. Embedded inside both r and U is a notion of a cost function, which is fixed for all learners.

Problem: these concepts are not separated into abstractions that reuse and compose well!
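To make the entanglement concrete, here is a minimal sketch (all names and the toy learner are ours) writing the two ingredients of U_I as separate, reusable pieces:

```python
# Sketch: the two things U_I entangles, as separate abstractions.
def grad(E, p, a, b, eps=1e-6):
    """Numerical gradient of the cost E with respect to the parameter p."""
    return (E(p + eps, a, b) - E(p - eps, a, b)) / (2 * eps)

def sgd_step(p, p_g, lr=0.05):
    """The stochastic gradient descent rule (p, p_g) |-> p - lr * p_g."""
    return p - lr * p_g

# Toy learner I(p, a) = p * a with squared-error cost E_I(p, a, b) = (p*a - b)^2.
E = lambda p, a, b: (p * a - b) ** 2
p = 0.0
for _ in range(100):                      # fit p so that I(p, 2) ~ 6
    p = sgd_step(p, grad(E, p, 2.0, 6.0))
print(round(p, 3))                        # -> 3.0
```

Swapping the cost function or the update rule now means swapping one small piece, rather than rewriting U and r together.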

slide-46
SLIDE 46

The Simple Essence of Automatic Differentiation

  • "Category of differentiable functions" is tricky to get right in a computational setting!
  • Implementing an efficient, composable differentiation framework is more art than science.
  • The chain rule isn't compositional: (g ∘ f)′(x) = g′(f(x)) · f′(x)
    The derivative of the composition can't be expressed only as a composition of derivatives!
  • You need to store the output of every function you evaluate.
  • Every deep learning framework has a carefully crafted implementation of side effects.

slide-52
SLIDE 52

The Simple Essence of Automatic Differentiation

Automatic differentiation: a category D of differentiable functions.

  • A morphism A → B is a function of type a → b × (a ⊸ b)
  • Composition: g ∘ f = λa → let (b, f′) = f(a), (c, g′) = g(b) in (c, g′ ∘ f′)
  • Structure for splitting and joining wires
  • Generalization to more than just linear maps:
    forward-mode automatic differentiation, reverse-mode automatic differentiation, backpropagation (D over Dual→+)
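A minimal sketch (names ours) of this composition: a D-morphism is a function a ↦ (b, derivative at a), and composing morphisms applies the chain rule locally, with no global tape:

```python
import math

# Sketch: D-morphisms as a -> (value, linear map at that point),
# composed exactly as in the let-expression above.
def compose(g, f):
    def gf(a):
        b, fp = f(a)                       # (b, f') = f(a)
        c, gp = g(b)                       # (c, g') = g(b)
        return c, (lambda da: gp(fp(da)))  # (c, g' . f')
    return gf

# Example morphisms, each returning (value, linear map at the point).
square = lambda x: (x * x, lambda dx: 2 * x * dx)
sine   = lambda x: (math.sin(x), lambda dx: math.cos(x) * dx)

y, dy = compose(sine, square)(2.0)  # sin(x^2) at x = 2
print(y, dy(1.0))                   # dy(1.0) = 2x * cos(x^2) = 4 * cos(4)
```

The intermediate output b is captured in the returned closure, which is exactly the "store the output of every function" requirement, handled compositionally instead of by side effects.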

slide-58
SLIDE 58

BackpropFunctor + SimpleAD

  • BackpropFunctor doesn't mention categorical differentiation
  • SimpleAD doesn't talk about learning itself
  • Both are talking about similar concepts
  • For each P × A → B in Hom(A, B) in Para, we'd like to specify a set of functions of type P × A → B × ((P × A) ⊸ B) instead of just P × A → B
  • Separate the structure needed for parametricity from the structure needed for composable differentiability
  • Solution: ?
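For illustration only (this wrapper and its names are ours, and the linear part is merely approximated by finite differences), a combined morphism of type P × A → B × ((P × A) ⊸ B) can be sketched as:

```python
# Sketch: a Para-morphism that also carries its derivative, i.e. a function of
# type P x A -> B x ((P x A) -o B), with the linear map built numerically.
def with_derivative(f, eps=1e-6):
    def wrapped(p, a):
        b = f(p, a)
        # Central finite differences approximate the partials at (p, a).
        dfdp = (f(p + eps, a) - f(p - eps, a)) / (2 * eps)
        dfda = (f(p, a + eps) - f(p, a - eps)) / (2 * eps)
        return b, (lambda dp, da: dfdp * dp + dfda * da)  # the linear part
    return wrapped

layer = with_derivative(lambda p, a: p * a)  # a one-parameter "network"
b, lin = layer(3.0, 2.0)
print(b, lin(1.0, 0.0), lin(0.0, 1.0))       # 6.0; ~2.0 (d/dp), ~3.0 (d/da)
```

The point is the type: the same morphism exposes both the parametrized forward map and the linear map in (p, a) that composition and learning need.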

slide-65
SLIDE 65

Main result

  • Specify the semantics of your datasets with a categorical schema C := (G, ≃)
  • Learn a functor P : C → Para:
    start with a functor Free(G) → Para,
    iteratively update it using samples from your datasets;
    the learned functor will also preserve ≃
  • A novel regularization mechanism for neural networks.

Schema: Horse --f--> Zebra, Zebra --g--> Horse, with f . g = id_h and g . f = id_z.
Its image under P: R^(64×64×3) --Pf--> R^(64×64×3), with Pg in the other direction.

slide-74
SLIDE 74

R^(64×64×3) --Pf--> R^(64×64×3), with Pg in the other direction.

Start with a functor Free(G) → Para:

  • Specify how it acts on objects.
  • Start with randomly initialized morphisms: every morphism in Para is a function parametrized by some P, so initializing P randomly amounts to "initializing" a morphism.

Get data samples d_a, d_b, ... corresponding to every object in C, and in every iteration:

  • For every morphism f : A → B in the transitive reduction of morphisms in C, find Pf and minimize the distance between (Pf)(d_a) and the corresponding image manifold.
  • For all path equations f = g between paths A → B, compute both f(d_a) and g(d_a), calculate the distance d = ||f(d_a) − g(d_a)||, and minimize d, updating all parameters of f and g.

The path-equation regularization term forces the optimization procedure to select functors which preserve the path equivalence relation and, thus, C.
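A toy sketch of the path-equation term (all choices ours: scalar "networks" f(x) = p·x and g(x) = q·x, a single sample, plain SGD). For the Horse/Zebra schema the minimized quantity ||g(f(x)) − x|| is the cycle-consistency penalty of CycleGAN in this degenerate one-dimensional case:

```python
# Sketch: enforcing the path equation g . f = id at a sample x,
# with scalar "networks" f(x) = p*x and g(x) = q*x.
p, q = 0.5, 0.5                      # "randomly" initialized parameters of Pf, Pg
x = 1.0                              # a data sample d_a
for _ in range(200):
    r = q * (p * x) - x              # residual of the path equation at x
    # SGD on the loss 0.5 * r^2: dL/dp = r*q*x, dL/dq = r*p*x
    p, q = p - 0.1 * (r * q * x), q - 0.1 * (r * p * x)
print(abs(q * p * x - x) < 1e-6)     # -> True: the path equation holds at x
```

With real image data the scalars become convolutional networks and the single sample becomes minibatches, but the regularization term has the same shape.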

slide-77
SLIDE 77

Some possible schemas

This procedure generalizes several existing network architectures. But it also allows us to ask: what other interesting schemas are possible?

slide-78
SLIDE 78

Some possible schemas

  • GAN:       Latent space --f--> Image;  no equations
  • CycleGAN:  Horse --f--> Zebra, Zebra --g--> Horse;  f . g = id_h, g . f = id_z
  • Equalizer: A --f--> B, with g, h : B → C;  f . h = f . g
  • Product:   A --f--> B × C, B × C --g--> A;  f . g = id_A, g . f = id_{B×C}

slide-79
SLIDE 79

Equalizer schema

Schema: A --f--> B, with g, h : B → C and the path equation f . h = f . g.

Given two networks h, g : B → C, find a subset B′ ⊆ B such that B′ = {b ∈ B | h(b) = g(b)}
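A toy sketch (the example functions are ours) of what the equalizer schema asks for, over a finite B where the subset can be computed exactly:

```python
# Sketch: the equalizer of two maps g, h : B -> C over a finite B,
# B' = {b in B | h(b) == g(b)}.
B = range(-5, 6)
g = lambda b: b * b       # one "network"
h = lambda b: b + 6       # another "network"
B_prime = [b for b in B if g(b) == h(b)]
print(B_prime)            # -> [-2, 3]
```

In the neural setting B is a continuous space and equality is relaxed to minimizing ||h(b) − g(b)||, so the learned f : A → B is pushed to land on the (approximate) equalizer.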

slide-80
SLIDE 80

Consider two sets of images

Left: a background of color X with a circle of fixed size and position, of color Y.
Right: a background of color Z.

slide-84
SLIDE 84

Product schema

Schema: A --f--> B × C, B × C --g--> A;  f . g = id_A, g . f = id_{B×C}

The same learning algorithm can learn to remove both types of objects.

slide-86
SLIDE 86

Experiments

  • CelebA: a dataset of 200K images of human faces
  • Conveniently, there is a "glasses" annotation

slide-87
SLIDE 87

Experiments

P(C):  R^(32×32×3) --Pf--> R^(32×32×3) × R^100, with Pg in the other direction;  f . g = id_H, g . f = id_Z

  • A collection of neural networks with 40M parameters in total
  • 7h of training on a GeForce GTX 1080
  • Successful results


slide-89
SLIDE 89

Experiments

Figure: Same image, different Z vector


slide-90
SLIDE 90

Experiments

Figure: Same Z vector, different image


slide-91
SLIDE 91

Experiments

Figure: Top row: original images; bottom row: glasses removed

slide-95
SLIDE 95

Conclusions

  • Specify a collection of neural networks which is closed under composition
  • Specify composition invariants
  • Given the right data and parametrized functions of sufficient complexity, it's possible to train them with the right inductive bias
  • A common language to talk about the semantics of data and of the training procedure

slide-101
SLIDE 101

Future work

  • This is still rough around the edges
  • What other schemas can we think of?
  • Can we quantify the type of information we're giving to the network using these schemas?
  • Do data migration functors make sense in the context of neural networks?
  • Can game-theoretic properties of Generative Adversarial Networks be expressed categorically?
  • Coding these ideas in Idris

slide-102
SLIDE 102

Thank you!

Bruno Gavranović
Faculty of Electrical Engineering and Computing
University of Zagreb
bruno.gavranovic@fer.hr

Feel free to drop me an email with any questions!