

SLIDE 1

A Functional Reboot for Deep Learning

Conal Elliott

Target

August 2019

Conal Elliott A Functional Reboot for Deep Learning August 2019 1 / 23

SLIDE 2

Goal

Extract the essence of DL. Shed accidental complexity and artificial limitations, i.e., simplify and generalize.

SLIDE 3

Essence

Optimization: best element of a set (by objective function). Usually via differentiation and gradient following. For machine learning, sets of functions. Objective function is defined via set of input/output pairs.

SLIDE 4

Accidental complexity in deep learning

SLIDE 5

Accidental complexity in DL (overview)

Imperative programming
Weak typing
Graphs (neural networks)
Layers
Tensors/arrays
Back propagation
Linearity bias
Hyper-parameters
Manual differentiation

SLIDE 6

Imperative programming

Thwarts correctness/dependability (usually “not even wrong”). Thwarts efficiency (parallelism). Unnecessary for expressiveness. Poor fit: DL is math, so express it in a mathematical language.

SLIDE 7

Weak typing

Requires people to manage detail & consistency. Run-time errors.

SLIDE 8

Graphs (neural networks)

Clutters API, distracting from purpose. Purpose: a representation of functions. We already have a better one: programming language. Can we differentiate?

An issue of implementation, not language or library definition. Fix accordingly.

SLIDE 10

Layers

Strong bias toward sequential composition. Neglects equally important forms: parallel & conditional. Awkward patches: “skip connections”, ResNet, HighwayNet. Don’t patch the problem; eliminate it. Replace with binary sequential, parallel, conditional composition.
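The three composition forms can be written directly in Haskell; `seqC`, `parC`, and `condC` are illustrative names, not operators from the talk:

```haskell
import Control.Arrow ((***))

-- Sequential composition: pipe one function into the next.
seqC :: (a -> b) -> (b -> c) -> (a -> c)
seqC f g = g . f

-- Parallel composition: run two functions side by side on a pair.
parC :: (a -> c) -> (b -> d) -> (a, b) -> (c, d)
parC = (***)

-- Conditional composition: choose a function per input.
condC :: (a -> c) -> (b -> c) -> Either a b -> c
condC = either
```

In GHCi, `seqC (+1) (*2) 3` yields `8`, and `parC (+1) not (3, True)` yields `(4, False)`. Sequential layering is just the first of these; the other two are equally fundamental and need no “skip connection” patches.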

SLIDE 11

“Tensors”

Really, multi-dimensional arrays. Awkward: imagine you could program only with arrays (as in Fortran). Unsafe without dependent types. Multiple intents / weakly typed: even as linear maps, what is the meaning of an m × n array? Limited: missing almost all differentiable types, and missing more natural & compositional data types, e.g., trees.

SLIDE 12

Back propagation

Specialization and rediscovery of reverse-mode auto-diff. Described in terms of graphs. Highly complex due to graph formulation. Stateful:

Hinders parallelism/efficiency. High memory use, limiting problem size.

SLIDE 13

Linearity bias

“Dense” & “fully connected” mean arbitrary linear transformation. Sprinkle in “activation functions” as exceptions to linearity. Misses simpler and more efficient architectures.
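As a point of reference for what this slide critiques, here is a minimal sketch of the dense-plus-activation pattern, with lists standing in for vectors (names and representation are illustrative only):

```haskell
-- A "dense"/"fully connected" layer: an arbitrary affine map followed by
-- a pointwise activation. Lists stand in for vectors purely for illustration.
type Vec = [Double]

dot :: Vec -> Vec -> Double
dot xs ys = sum (zipWith (*) xs ys)

-- Arbitrary linear transformation plus bias: every output sees every input.
affine :: [Vec] -> Vec -> Vec -> Vec
affine rows bias x = zipWith (+) (map (`dot` x) rows) bias

-- The "activation function", sprinkled in as the exception to linearity.
relu :: Vec -> Vec
relu = map (max 0)

dense :: [Vec] -> Vec -> Vec -> Vec
dense rows bias = relu . affine rows bias
```

Note how the linear part is completely unconstrained; structured, sparser maps never arise in this vocabulary.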

SLIDE 14

Hyper-parameters

Same essential purpose as parameters, but different mechanisms for expression and search. Inefficient and ad hoc.

SLIDE 15

A functional reboot

SLIDE 16

Values

Precision: meaning, reasoning, correctness. Simplicity: practical rigor/dependability. Generality: room to grow; design guidance.

SLIDE 17

Essence

Optimization: best element of a set (by objective function). Usually via differentiation and gradient following. For machine learning, sets of functions. Objective function is defined via set of input/output pairs.

SLIDE 18

Optimization

Describe a set of values as the range of a function: f :: p → c. Objective function: q :: c → ℝ. Find argMin (q ∘ f) :: p. When q ∘ f is differentiable, gradient descent can help. Otherwise, other methods. Consider also global optimization, e.g., with interval methods.
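A minimal numeric sketch of this recipe, with p specialized to Double and a finite-difference gradient standing in for the compile-time differentiation advocated later (`argMinGD` is a hypothetical helper, not the talk's API):

```haskell
-- A central finite difference stands in for real differentiation.
gradApprox :: (Double -> Double) -> Double -> Double
gradApprox h x = (h (x + e) - h (x - e)) / (2 * e)
  where e = 1e-6

-- Follow the (approximate) gradient of the objective q . f downhill.
argMinGD :: (Double -> c) -> (c -> Double) -> Double -> Double
argMinGD f q = go (200 :: Int)
  where
    obj = q . f
    go 0 p = p
    go n p = go (n - 1) (p - 0.1 * gradApprox obj p)
```

For example, `argMinGD id (\x -> (x - 3) ^ 2) 0` converges to approximately `3.0`. Nothing here mentions networks, graphs, or layers; only a function and an objective.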

SLIDE 19

Learning functions

Special case of optimization, where c = a → b, i.e., f :: p → (a → b), and q :: (a → b) → ℝ. Objective function often based on a sample set S ⊆ a × b. Measure mis-predictions (loss). Additivity enables a parallel, log-time learning step.
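A small sketch of this setup, assuming a squared-error loss and b specialized to Double (names are illustrative):

```haskell
-- Learning as optimization: the objective sums a per-sample loss.
type Sample a b = (a, b)

-- Squared-error loss for a single prediction.
loss :: (a -> Double) -> Sample a Double -> Double
loss h (x, y) = (h x - y) ^ (2 :: Int)

-- Total loss is a sum, i.e., a monoid fold: each chunk of the sample set
-- can be reduced independently and combined, which is the additivity that
-- enables a parallel, log-depth learning step.
totalLoss :: (a -> Double) -> [Sample a Double] -> Double
totalLoss h = sum . map (loss h)
```

For instance, `totalLoss (* 2) [(1, 2), (2, 5)]` is `1.0`: the first sample is predicted exactly, the second misses by 1.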

SLIDE 20

Differentiable functional programming

Directly on Haskell (etc) programs:

Not a library/DSEL. No graphs/networks/layers.


Differentiated at compile time. Simple, principled, and general (The Simple Essence of Automatic Differentiation). Generates efficient run-time code. Amenable to massively parallel execution (GPUs, etc.).

SLIDE 22

Beyond “tensors”

Most differentiable types are not vectors (uniform n-tuples), and most derivatives (linear maps) are not matrices. A more general alternative:

Free vector space over s: i → s ≅ f s (“i indexes f”). Special case: Fin n → s ≅ Vec n s. Algebra of representable functors: f × g, 1, g ∘ f, Id. Your (representable) functor via deriving Generic.


Linear map (f s ⊸ g s) ≅ g (f s) ≅ (g ∘ f) s (a generalized matrix). Other representations for efficient reverse-mode AD (without tears). Use with Functor, Foldable, Traversable, Scannable, etc. No need for special/limited array “reshaping” operations. Compositional and naturally parallel-friendly (Generic Parallel Functional Programming).
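A hand-rolled sketch of the “generalized matrix” idea, using a concrete `Pair` functor indexed by `Bool` rather than the `Representable`/`Generic` machinery the slide alludes to:

```haskell
-- A tiny representable functor: Pair is indexed by Bool, so i -> s ≅ f s.
data Pair s = Pair s s deriving (Eq, Show)

instance Functor Pair where
  fmap h (Pair x y) = Pair (h x) (h y)

instance Foldable Pair where
  foldr h z (Pair x y) = h x (h y z)

-- tabulate/index witness the isomorphism (Bool -> s) ≅ Pair s.
tabulate :: (Bool -> s) -> Pair s
tabulate h = Pair (h False) (h True)

index :: Pair s -> Bool -> s
index (Pair x _) False = x
index (Pair _ y) True  = y

-- A linear map Pair s ⊸ g s, represented as g (Pair s): a "generalized
-- matrix" whose rows live in g. Apply it by dotting each row with the input.
dotP :: Num s => Pair s -> Pair s -> s
dotP (Pair a b) (Pair x y) = a * x + b * y

apply :: (Functor g, Num s) => g (Pair s) -> Pair s -> g s
apply m v = fmap (`dotP` v) m
```

With `g ~ Pair` this is an ordinary 2 × 2 matrix, but `g` can just as well be a triple, a tree, or any other functor of rows; nothing forces rectangular arrays.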

SLIDE 24

Modularity

How to build function families from pieces, as in DL? A category of indexed sets of functions. Extract a monolithic function after composing. Other uses, including satisfiability. Prototyped, but currently blocked by a problem with the GHC type-checker.

SLIDE 25

Progress

Simple & efficient reverse-mode AD. Some simple regressions, simple DL, and CNN. Some implementation challenges with robustness. Looking for collaborators, including

GHC internals (compiling-to-categories plugin). Background in machine learning and statistics.

SLIDE 26

Summary

Generalize & simplify DL (more for less). Essence of DL: pure FP with argMin. Generalize from “tensors” (for composition & safety). Collaboration welcome!
