

Bundles, Lenses & Machine Learning

Jules Hedges¹, joint work with Brendan Fong², Eliana Lorch³ and David Spivak²

¹Max Planck Institute for Mathematics in the Sciences  ²MIT  ³University of Oxford

SYCO 5, Birmingham

Outline: Motivation · Backprop as Functor · Bundles · Putting it together

Featuring zero string diagrams :(

Motivation

Machine learning is categorical in two different ways:

  • Backprop as Functor (a compositional description of ML using monoidal categories)
  • ML as differential geometry

In this talk: smoosh them together. (Why? Why not!)

It clarifies Backprop as Functor more than anything else.

Open learners

Definition (Fong, Spivak & Tuyéras): An open learner X → Y consists of:

  • A set P of parameters
  • A function I : P × X → Y (the implementation)
  • A function u : P × X × Y → P (the update)
  • A function r : P × X × Y → X (the request)

Composition of open learners is fiddly. They form a symmetric monoidal category¹ called Learn.

¹who cares about monoidal bicategories
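The fiddly composition can be made concrete with a small Python sketch. The names `OpenLearner` and `compose_learners` are mine, not the talk's; the composite's update and request follow the Fong–Spivak–Tuyéras recipe of feeding the second learner's request to the first learner as its training signal.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class OpenLearner:
    # An open learner X -> Y with parameter set P:
    implement: Callable[[Any, Any], Any]    # I : P x X -> Y
    update: Callable[[Any, Any, Any], Any]  # u : P x X x Y -> P
    request: Callable[[Any, Any, Any], Any] # r : P x X x Y -> X

def compose_learners(f: OpenLearner, g: OpenLearner) -> OpenLearner:
    """Composite g . f : X -> Z with parameter set P x Q.
    The fiddliness: g's request becomes the training signal for f."""
    def implement(pq, x):
        p, q = pq
        return g.implement(q, f.implement(p, x))
    def update(pq, x, z):
        p, q = pq
        y = f.implement(p, x)
        y_req = g.request(q, y, z)  # what g wanted from f
        return (f.update(p, x, y_req), g.update(q, y, z))
    def request(pq, x, z):
        p, q = pq
        y = f.implement(p, x)
        return f.request(p, x, g.request(q, y, z))
    return OpenLearner(implement, update, request)
```

For example, composing two learners with implementation `p * x` gives a composite whose implementation on parameters (2, 3) and input 4 is 24.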

Lenses

A lens X → Y is a function X → Y together with a function X × Y → X.

Composition of lenses is also fiddly!

Theorem (Fong & Johnson): Open learners compose by pullback of lenses:

[diagram: composition of the lenses ℓ₁ and ℓ₂ by pullback, with objects P × Q × X, P × X, Q × Y, X, Y, Z and projections π₂]
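The fiddly lens composition can also be sketched in a few lines of Python (the names `Lens` and `compose_lenses` are illustrative, not from the talk): the backward pass of the composite has to re-run the forward pass of the first lens.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Lens:
    get: Callable[[Any], Any]       # the forward function X -> Y
    put: Callable[[Any, Any], Any]  # the backward function X x Y -> X

def compose_lenses(l1: Lens, l2: Lens) -> Lens:
    """Composite X -> Z; note put threads back through l1.get."""
    return Lens(
        get=lambda x: l2.get(l1.get(x)),
        put=lambda x, z: l1.put(x, l2.put(l1.get(x), z)),
    )
```

For instance, the "first component" lens on pairs composes with itself to focus on the first component of a nested pair.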

The Para construction

Let C be a monoidal category. Define a category¹ Para(C) by:

  • Objects: objects of C
  • Morphisms X → Y: pairs (A, f), with A an object of C and f : X ⊗ A → Y
  • Identity on X: (I, X ⊗ I ≅ X)
  • Composition (B, g) ∘ (A, f): (A ⊗ B, X ⊗ A ⊗ B −f⊗B→ Y ⊗ B −g→ Z)

⊗ lifts to a monoidal product on Para(C).

¹who cares about monoidal bicategories
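For the special case C = (Set, ×), the Para composition above is short enough to sketch in Python (the names `ParaMap` and `compose_para` are mine): a morphism is a function taking an input and a parameter, and composition pairs up the parameter objects.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ParaMap:
    # A morphism X -> Y in Para(Set): a function f : X x A -> Y
    # for some parameter set A (left implicit in this sketch).
    f: Callable[[Any, Any], Any]

# Identity on X: the parameter object is the monoidal unit I,
# modelled here by ignoring the (trivial) parameter.
identity = ParaMap(lambda x, a: x)

def compose_para(pf: ParaMap, pg: ParaMap) -> ParaMap:
    # (B, g) . (A, f) has parameter object A x B:
    # run f with the A part, then g with the B part.
    return ParaMap(lambda x, ab: pg.f(pf.f(x, ab[0]), ab[1]))
```

For example, composing a parametrised scaling with a parametrised shift gives a parametrised affine map with parameter set A × B.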

The structure of Para(−)

A lax symmetric monoidal functor F : C → D lifts to Para(F) : Para(C) → Para(D) by

  Para(F)(A, f) : F(X) ⊗ F(A) −φ→ F(X ⊗ A) −F(f)→ F(Y)

Proposition (probably): Para(−) defines a monad on [symmetric monoidal categories, lax symmetric monoidal functors].

Backprop as Functor

Theorem (Fong, Spivak & Tuyéras): Fix a learning rate ε > 0 and a differentiable cost function² C : R² → R. Then there is a symmetric monoidal functor F_{ε,C} : Para(Euc) → Learn defined by:

  • On objects: X ↦ the underlying set of X
  • On morphisms f : P × X → Y:
    • Parameters: P
    • Implementation: I = f
    • Update: U(a, x, y) = a − ε∇_a E(a, x, y)
    • Request: r(a, x, y) = (too awkward to write down)

where E(a, x, y) = Σ_{i=1}^{dim(Y)} C(f(a, x)_i, y_i) is the total error.

Update is gradient descent, and request is backpropagation.

²such that ∂C/∂y (x, y) is invertible for every x
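The update map can be made numerically concrete. A minimal sketch (my own toy instance, not the full construction): take the one-parameter map f(a, x) = a·x with quadratic cost C(y₁, y₂) = (y₁ − y₂)², so E(a, x, y) = (a·x − y)² and the update U(a, x, y) = a − ε·2x(a·x − y) is ordinary gradient descent.

```python
# Toy instance of F_{eps,C} on f(a, x) = a * x with C(y1, y2) = (y1 - y2)**2.
eps = 0.1

def f(a, x):
    return a * x

def update(a, x, y):
    # U(a, x, y) = a - eps * dE/da, where dE/da = 2 * x * (a*x - y)
    return a - eps * 2 * x * (f(a, x) - y)

# Repeatedly training on the single example (x, y) = (2, 6)
# drives the parameter toward a = 3, i.e. f(a, 2) = 6.
a = 0.0
for _ in range(100):
    a = update(a, 2.0, 6.0)
```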

ML doesn't work like that

Actual backpropagation backpropagates gradients. The request instead backpropagates a finite step in the gradient direction.

This is a hack, because the objects of Learn don't have differentiable structure. (The benefit is that Learn is more general than just ML.)

Bundles

Work in a category with finite limits. A bundle over X is a morphism p : E → X.

Examples:

  1. The trivial bundle π₁ : X × Y → X
  2. The tangent bundle π : TM → M over a differentiable manifold M
  3. The cotangent bundle π* : T*M → M

Bundles over Euclidean spaces

If X = Rⁿ is a Euclidean space then:

  • every T_x(X) ≅ X
  • so T(X) ≅ X × X
  • so the tangent bundle is trivial: π₂ : T(X) → X

Moreover:

  • every T*_x(X) ≅ X unnaturally (since X* ≅ X)
  • so T*(X) ≅ X × X unnaturally
  • elements of X × X are called dual numbers
  • the cotangent bundle is unnaturally equivalent to a trivial bundle
  • Nb. Euc doesn't have finite limits, so we work in Top
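Dual numbers are exactly the data structure behind forward-mode automatic differentiation. A minimal sketch: an element (a, b) of the trivialised tangent bundle R × R behaves like a + b·ε with ε² = 0, and pushing the unit tangent through a function reads off its derivative.

```python
from dataclasses import dataclass

@dataclass
class Dual:
    # A dual number (a, b) ~ a + b*eps with eps**2 = 0:
    # a point of R together with a tangent vector at it.
    a: float
    b: float

    def __add__(self, other):
        return Dual(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a + b eps)(c + d eps) = ac + (ad + bc) eps
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

def derivative(f, x):
    # Push the unit tangent vector through f and read off f'(x).
    return f(Dual(x, 1.0)).b
```

For example, the derivative of x ↦ x·x at 3 comes out as 6, with no symbolic or numerical differentiation.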
Morphisms of bundles

A bundle morphism from p : E → X to q : F → Y consists of:

  • morphisms f : X → Y and f# : X ×_Y F → E
  • such that the square

    [diagram: square with X ×_Y F, F, E, X, Y and morphisms f#, q, p, f]

    is a pullback
  • Equivalently: f# such that the projection X ×_Y F → X factors through p

"Every algebraic geometer knows this definition" – David Spivak

The category of bundles

Identity morphism on p : E → X: the identity on X together with X ×_X E ≅ E.

Composition of morphisms: [diagram: given bundle morphisms (f, f#) : (p : E → X) → (q : F → Y) and (g, g#) : (q : F → Y) → (r : G → Z), form X ×_Z G ≅ X ×_Y (Y ×_Z G), apply f*(g#) : X ×_Z G → X ×_Y F, then f# : X ×_Y F → E]

Where does this come from?

From the Grothendieck construction:

  Bund(C) = ∫^{X ∈ C} (C/X)^op

This buys us (conjecture) a monoidal structure:

  (p : E → X) ⊗ (q : F → Y) = (p × q : E × F → X × Y)

(this might not be the right one!)

Lenses

A (bimorphic) lens λ : (S, T) → (A, B) consists of:

  • a morphism λ_v : S → A called the view
  • a morphism λ_u : S × B → T called the update

Composition of lenses is fiddly.

Where does this come from? The Grothendieck construction:

  Lens(C) = ∫^{X ∈ C} coKl(X × −)^op

Theorem (Lambek): coKl(X × −) ≅ C[x], where C[x] is the polynomial category formed by freely adjoining x : 1 → X and closing under finite products.

Lenses are bundle morphisms

Another theorem (Lambek): coEM(X × −) ≅ C/X

So there is a canonical embedding C[x] ↪ C/X.

Grothendieck them all together to get a functor Lens(C) → Bund(C). It takes a lens λ : (S, T) → (A, B) to the bundle morphism

[diagram: from π₂ : S × T → S to π₂ : A × B → A, with forward part λ_v : S → A and backward part (π₁, λ_u) : S × B → S × T]

Morphisms of cotangent bundles

There is a functor Cot(−) : DiffMfd → Bund(Top). It takes f : X → Y to the bundle morphism from π* : T*(X) → X to π* : T*(Y) → Y whose forward part is f and whose backward part is

  f′ : X ×_Y T*(Y) → T*(X),   f′(x, c) = (x, c ∘ J_x(f))

where J_x(f) is the Jacobian (matrix of partial derivatives) of f at x.
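The backward part c ↦ c ∘ J_x(f) is what automatic-differentiation libraries call a vector–Jacobian product. A minimal numerical sketch (the helper names `jacobian` and `pullback` are mine, and the Jacobian is approximated by finite differences):

```python
def f(x):
    # An example map R^2 -> R^2.
    x1, x2 = x
    return [x1 * x2, x1 + x2]

def jacobian(g, x, h=1e-6):
    # Forward-difference approximation to J_x(g), one column per input.
    y = g(x)
    J = [[0.0] * len(x) for _ in y]
    for j in range(len(x)):
        xp = list(x)
        xp[j] += h
        yp = g(xp)
        for i in range(len(y)):
            J[i][j] = (yp[i] - y[i]) / h
    return J

def pullback(g, x, c):
    # f'(x, c) = (x, c . J_x(g)): pull a covector c on Y back to X.
    J = jacobian(g, x)
    return [sum(c[i] * J[i][j] for i in range(len(c))) for j in range(len(x))]
```

At x = (2, 3) the Jacobian of f is [[3, 2], [1, 1]], so pulling back the covector (1, 0) gives approximately (3, 2).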

The chain rule

Functoriality of Cot(−):

[diagram: the composite of Cot(f) and Cot(g) for f : X → Y and g : Y → Z, built from the pullbacks X ×_Z T*(Z) ≅ X ×_Y (Y ×_Z T*(Z)), with backward maps f*(g′), g′, f′ and the cotangent projections π*]

(g ∘ f)′ = f′ ∘ f*(g′) is the chain rule of differential geometry.
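In one dimension the identity (g ∘ f)′ = f′ ∘ f*(g′) reduces to the familiar chain rule (g ∘ f)′(x) = g′(f(x)) · f′(x), which can be checked numerically (a sketch with my own helper `deriv`, using central differences):

```python
def deriv(h, x, eps=1e-6):
    # Central-difference derivative of h at x.
    return (h(x + eps) - h(x - eps)) / (2 * eps)

f = lambda x: x * x          # f : R -> R
g = lambda y: 3.0 * y + 1.0  # g : R -> R
x = 2.0

# Left: differentiate the composite directly.
lhs = deriv(lambda t: g(f(t)), x)
# Right: pull g's derivative back along f, then apply f's derivative.
rhs = deriv(g, f(x)) * deriv(f, x)
```

Both sides come out to 6x = 12 at x = 2, up to floating-point error.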

From Para(Bund(Top)) to Learn

Consider a morphism of Para(Bund(Top)) in the image of Para(Cot) : Para(Euc) → Para(Bund(Top)). It looks like

[diagram: a bundle morphism from π* : T*(X × A) → X × A to π* : T*(Y) → Y, with forward part f : X × A → Y and backward part f′ : (X × A) ×_Y T*(Y) → T*(X × A)]

We're going to turn it into an open learner, given ε > 0 and a differentiable C : R² → R.

The setup

Obviously, the parameters are A and the implementation is f.

We need to define U, r : A × X × Y → A × X. So fix a ∈ A, x ∈ X and y ∈ Y, and form the total error

  C_y(y′) = Σ_{i=1}^{dim(Y)} C(y_i, y′_i)

Consider the diagram. . .

The brain exploding part

[diagram: a tower of pullback squares over X × A −f→ Y −C_y→ R, relating T*(X × A), (X × A) ×_Y T*(Y), Y ×_R T*(R), T*(R) and 1 ≅ T*(1), with backward maps f′, C′_y and f*(C′_y) and cotangent projections π*; the point (x, a) : 1 → X × A picks out the fibres T*_(x,a)(X × A), T*_{f(x,a)}(Y) and T*_{C_y(f(x,a))}(R) ≅ R]

The part we don't understand

Now: chase 1 ∈ R to T*(X × A), and then apply μ_ε : T*(X × A) → X × A. The result is (r, U)(a, x, y).

μ_ε takes a finite step in the gradient direction:

  μ_ε((x, a), (v, w)) = (x + v, a + εw)

What is μ_ε? We couldn't find any nice properties. It looks a bit like a thing called an exponential map.

The catch

Conjecture: This defines a symmetric monoidal functor Para(Bund(Top)) ⊇ Im(Para(Cot)) → Learn.

Another conjecture: the triangle

  [diagram: Para(Euc) −Para(Cot)→ Im(Para(Cot)) → Learn, together with the direct functor Para(Euc) → Learn]

commutes.

The catch: We think Para(Cot) is an equivalence of categories onto its image. So, we've just rewritten Backprop as Functor in a different way!

Even more hard questions

What happens if we extend the functor to the whole of Para(Bund(Top))? We have no idea!

Optimistic hope: this allows defining general "ML-like" systems, not necessarily involving gradients (e.g. "discrete ML" on Bayesian networks).