SLIDE 1

Differentiable Functional Programming

Atılım Güneş Baydin

University of Oxford http://www.robots.ox.ac.uk/~gunes/

F#unctional Londoners Meetup, April 28, 2016

SLIDE 2

About me

Current (from 11 April 2016): Postdoctoral researcher, Machine Learning Research Group, University of Oxford: http://www.robots.ox.ac.uk/~parg/
Previously: Brain and Computation Lab, National University of Ireland Maynooth: http://www.bcl.hamilton.ie/
Working primarily with F#, on algorithmic differentiation, functional programming, machine learning

SLIDE 3

Today’s talk

Derivatives in computer programs
Differentiable functional programming
DiffSharp + Hype libraries
Two demos

SLIDE 4

Derivatives in computer programs
How do we compute them?

SLIDE 5

Manual differentiation

f(x) = sin(exp x)

let f x = sin (exp x)

Calculus 101: differentiation rules

d(fg)/dx = (df/dx)g + f(dg/dx)
d(af + bg)/dx = a(df/dx) + b(dg/dx)
…

f′(x) = cos(exp x) × exp x

let f' x = (cos (exp x)) * (exp x)

SLIDE 8

Manual differentiation

It can get complicated:

f(x) = 64x(1 − x)(1 − 2x)²(1 − 8x + 8x²)²
(4th iteration of the logistic map lₙ₊₁ = 4lₙ(1 − lₙ), l₁ = x)

let f x = 64. * x * (1. - x) * ((1. - 2.*x) ** 2.) * ((1. - 8.*x + 8.*x*x) ** 2.)

f′(x) = 128x(1 − x)(−8 + 16x)(1 − 2x)²(1 − 8x + 8x²) + 64(1 − x)(1 − 2x)²(1 − 8x + 8x²)² − 64x(1 − 2x)²(1 − 8x + 8x²)² − 256x(1 − x)(1 − 2x)(1 − 8x + 8x²)²

let f' x =
    128. * x * (1. - x) * (-8. + 16.*x) * (1. - 2.*x)**2. * (1. - 8.*x + 8.*x*x)
    + 64. * (1. - x) * (1. - 2.*x)**2. * (1. - 8.*x + 8.*x*x)**2.
    - 64. * x * (1. - 2.*x)**2. * (1. - 8.*x + 8.*x*x)**2.
    - 256. * x * (1. - x) * (1. - 2.*x) * (1. - 8.*x + 8.*x*x)**2.

SLIDE 10

Symbolic differentiation

Computer algebra packages help: Mathematica, Maple, Maxima
But it has some serious drawbacks

SLIDE 12

Symbolic differentiation

We get “expression swell”

Logistic map lₙ₊₁ = 4lₙ(1 − lₙ), l₁ = x:

n | lₙ | (d/dx) lₙ
1 | x | 1
2 | 4x(1 − x) | 4(1 − x) − 4x
3 | 16x(1 − x)(1 − 2x)² | 16(1 − x)(1 − 2x)² − 16x(1 − 2x)² − 64x(1 − x)(1 − 2x)
4 | 64x(1 − x)(1 − 2x)²(1 − 8x + 8x²)² | 128x(1 − x)(−8 + 16x)(1 − 2x)²(1 − 8x + 8x²) + 64(1 − x)(1 − 2x)²(1 − 8x + 8x²)² − 64x(1 − 2x)²(1 − 8x + 8x²)² − 256x(1 − x)(1 − 2x)(1 − 8x + 8x²)²

[Plot: number of terms in lₙ and (d/dx) lₙ versus n, growing rapidly into the hundreds by n = 5]
SLIDE 13

Symbolic differentiation

We are limited to closed-form formulae
You can find the derivative of math expressions:

f(x) = 64x(1 − x)(1 − 2x)²(1 − 8x + 8x²)²

But not of algorithms, branching, control flow:

let f x n =
    if n = 1 then x
    else
        let mutable v = x
        for i = 2 to n do
            v <- 4. * v * (1. - v)
        v

let a = f x 4

SLIDE 16

Numerical differentiation

A very common hack: use the limit definition of the derivative

df/dx = lim (h → 0) [f(x + h) − f(x)] / h

to approximate the numerical value of the derivative

let diff f x =
    let h = 0.00001
    (f (x + h) - f x) / h

Again, some serious drawbacks

SLIDE 19

Numerical differentiation

We must select a proper value of h and we face approximation errors

[Plot: approximation error versus step size h (10⁻¹⁷ to 10⁻¹); round-off error dominates for small h, truncation error for large h]

Computed using E(h, x*) = |(f(x* + h) − f(x*)) / h − f′(x*)|, with f(x) = 64x(1 − x)(1 − 2x)²(1 − 8x + 8x²)² and x* = 0.2
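A short sketch reproducing this error sweep, assuming the f and exact f' definitions from the earlier manual-differentiation slide:

let errorSweep () =
    let x = 0.2
    for e in -17 .. -1 do
        let h = 10.0 ** float e
        let approx = (f (x + h) - f x) / h       // forward difference
        printfn "h = 1e%d  error = %e" e (abs (approx - f' x))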

SLIDE 20

Numerical differentiation

Better approximations exist:
Higher-order finite differences, e.g. the central difference
∂f(x)/∂xᵢ = [f(x + h eᵢ) − f(x − h eᵢ)] / 2h + O(h²)
Richardson extrapolation
Differential quadrature
but they increase rapidly in complexity and never completely eliminate the error
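A minimal sketch of the central difference above; the O(h²) truncation error makes it more accurate than the forward difference at the same h, but it is still an approximation:

let diffCentral f x =
    let h = 1e-5
    (f (x + h) - f (x - h)) / (2.0 * h)

// e.g. diffCentral sin 1.0 is approximately cos 1.0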

SLIDE 21

Numerical differentiation

Poor performance: for f : Rⁿ → R, approximate the gradient

∇f = (∂f/∂x₁, …, ∂f/∂xₙ)

using

∂f(x)/∂xᵢ ≈ [f(x + h eᵢ) − f(x)] / h, 0 < h ≪ 1

We must repeat the function evaluation n times to get ∇f
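A sketch making the cost explicit: a forward-difference gradient needs n + 1 evaluations of f, one per dimension plus the base point:

let numGrad (f: float[] -> float) (x: float[]) =
    let h = 1e-5
    let fx = f x                                  // base evaluation
    Array.init x.Length (fun i ->                 // n more evaluations
        let xh = Array.copy x
        xh.[i] <- xh.[i] + h
        (f xh - fx) / h)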

SLIDE 22

Algorithmic differentiation (AD)

SLIDE 23

Algorithmic differentiation

Also known as automatic differentiation (Griewank & Walther, 2008)
Gives numeric code that computes the function AND its derivatives at a given point

f(a, b):
    c = a * b
    d = sin c
    return d

f'(a, a', b, b'):
    (c, c') = (a * b, a' * b + a * b')
    (d, d') = (sin c, c' * cos c)
    return (d, d')

Derivatives propagated at the elementary operation level, as a side effect, at the same time the function itself is computed
→ prevents the "expression swell" of symbolic derivatives
Full expressive capability of the host language
→ including conditionals, looping, branching
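To make the mechanism concrete, here is a minimal dual-number sketch of forward-mode AD in F# (an illustration of the idea only, not DiffSharp's implementation): every elementary operation propagates a tangent alongside its primal value.

type Dual =
    { P: float; T: float }                                 // primal and tangent
    static member (+) (a: Dual, b: Dual) = { P = a.P + b.P; T = a.T + b.T }
    static member (*) (a: Dual, b: Dual) = { P = a.P * b.P; T = a.T * b.P + a.P * b.T }

let sinD a = { P = sin a.P; T = a.T * cos a.P }            // d(sin u) = cos u du
let logD a = { P = log a.P; T = a.T / a.P }                // d(log u) = du / u

// Full control flow is available, unlike with symbolic differentiation:
let f (a: Dual) (b: Dual) =
    let c = a * b
    if c.P > 0. then logD c else sinD c

// df/da at (2, 3): seed a with tangent 1, b with tangent 0
let d = f { P = 2.; T = 1. } { P = 3.; T = 0. }            // d.P = 1.791..., d.T = 0.5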

SLIDE 24

Function evaluation traces

All numeric evaluations are sequences of elementary operations: a “trace,” also called a “Wengert list” (Wengert, 1964)

f(a, b):
    c = a * b
    if c > 0
        d = log c
    else
        d = sin c
    return d

SLIDE 28

Function evaluation traces

Evaluating f(2, 3):

(primal)
a = 2
b = 3
c = a * b = 6
d = log c = 1.791
return d

(tangent)
a = 2, a' = 1
b = 3, b' = 0
c = a * b = 6, c' = a' * b + a * b' = 3
d = log c = 1.791, d' = c' * (1 / c) = 0.5
return d, d'

i.e., a Jacobian-vector product: Jf (1, 0)ᵀ |(2,3) = (∂/∂a) f(a, b) |(2,3) = 0.5

This is called the forward (tangent) mode of AD
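The tangent trace transcribes directly into F#; a minimal sketch of the forward sweep at (2, 3) with seed (a′, b′) = (1, 0):

let a, a' = 2.0, 1.0
let b, b' = 3.0, 0.0
let c, c' = a * b, a' * b + a * b'     // 6.0, 3.0
let d, d' = log c, c' * (1.0 / c)      // 1.791..., 0.5
printfn "f = %f  df/da = %f" d d'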


SLIDE 32

Function evaluation traces

Evaluating the same f(2, 3), now propagating adjoints in reverse:

(primal)
a = 2
b = 3
c = a * b = 6
d = log c = 1.791
return d

(adjoint)
d' = 1
c' = d' * (1 / c) = 0.166
b' = c' * a = 0.333
a' = c' * b = 0.5
return d, a', b'

i.e., a transposed Jacobian-vector product: Jfᵀ (1) |(2,3) = ∇f |(2,3) = (0.5, 0.333)

This is called the reverse (adjoint) mode of AD
Backpropagation is just a special case of the reverse mode: code your neural network objective computation, apply reverse AD
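A direct transcription of the adjoint trace, as a sketch of what reverse AD does mechanically (real implementations record the trace at run time and replay it backwards):

let a, b = 2.0, 3.0
let c = a * b              // forward (primal) sweep
let d = log c
let d' = 1.0               // reverse sweep, seeded with the output adjoint
let c' = d' * (1.0 / c)    // 0.1666...
let b' = c' * a            // 0.333...
let a' = c' * b            // 0.5
printfn "f = %f  grad = (%f, %f)" d a' b'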


SLIDE 33

How is this useful?

SLIDE 34

Forward vs reverse

In the extreme cases, in just one evaluation:
for F : R → Rᵐ, forward AD can compute all of (∂F₁/∂x, …, ∂Fₘ/∂x)
for f : Rⁿ → R, reverse AD can compute ∇f = (∂f/∂x₁, …, ∂f/∂xₙ)

In general, for f : Rⁿ → Rᵐ, the Jacobian J ∈ Rᵐˣⁿ takes
O(n × time(f)) with forward AD
O(m × time(f)) with reverse AD
Reverse mode performs better when n ≫ m

SLIDE 36

How is this useful?

Traditional application domains of AD in industry and academia (Corliss et al., 2002):
Computational fluid dynamics
Atmospheric chemistry
Engineering design optimization
Computational finance

SLIDE 37

Functional AD

or "differentiable functional programming"

SLIDE 38

AD and functional programming

AD has been around since the 1960s (Wengert, 1964; Speelpenning, 1980; Griewank, 1989)
The foundations for AD in a functional framework (Siskind and Pearlmutter, 2008; Pearlmutter and Siskind, 2008)
With research implementations:
R6RS-AD https://github.com/qobi/R6RS-AD
Stalingrad http://www.bcl.hamilton.ie/~qobi/stalingrad/
Alexey Radul's DVL https://github.com/axch/dysvunctional-language
Recently, my DiffSharp library http://diffsharp.github.io/DiffSharp/

SLIDE 40

Differentiable functional programming

Deep learning: neural network models are assembled from building blocks and trained with backpropagation
Traditional building blocks: feedforward, convolutional, recurrent

SLIDE 41

Differentiable functional programming

Newer additions: make algorithmic elements continuous and differentiable → enables use in deep learning

[Figure: NTM on copy task (Graves et al., 2014)]

Neural Turing Machine (Graves et al., 2014) → can infer algorithms: copy, sort, recall
Stack-augmented RNN (Joulin & Mikolov, 2015)
End-to-end memory network (Sukhbaatar et al., 2015)
Stack, queue, deque (Grefenstette et al., 2015)
Discrete interfaces (Zaremba & Sutskever, 2015)

SLIDE 42

Differentiable functional programming

Stacking of many layers, trained through backpropagation:

[Architecture diagrams:
AlexNet, 8 layers (ILSVRC 2012)
VGG, 19 layers (ILSVRC 2014)
ResNet, 152 layers, deep residual learning (ILSVRC 2015)]

(He, Zhang, Ren, Sun. "Deep Residual Learning for Image Recognition." 2015. arXiv:1512.03385)

SLIDE 43

Differentiable functional programming

One way of viewing deep learning systems is "differentiable functional programming"
Two main characteristics:
Differentiability → optimization
Chained function composition → successive transformations → successive levels of distributed representations (Bengio, 2013) → the chain rule of calculus propagates derivatives

SLIDE 44

The bigger picture

In a functional interpretation:
Weight-tying or multiple applications of the same neuron (e.g., ConvNets and RNNs) resemble function abstraction
Structural patterns of composition resemble higher-order functions (e.g., map, fold, unfold, zip)
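For instance, a simple recurrent net is literally a fold of one weight-tied step function over the input sequence; a minimal sketch with made-up scalar parameters:

let rnnStep (w, u, b) h x = tanh (w * h + u * x + b)   // the same parameters reused at every step: weight-tying
let rnn ps h0 xs = List.fold (rnnStep ps) h0 xs        // the structural pattern is just a higher-order function

// e.g. rnn (0.5, 1.0, 0.1) 0.0 [0.2; 0.7; 0.1]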

SLIDE 45

The bigger picture

Even when you have complex compositions, differentiability ensures that they can be trained end-to-end with backpropagation

(Vinyals, Toshev, Bengio, Erhan. “Show and tell: a neural image caption generator.” 2014. arXiv:1411.4555)

SLIDE 46

The bigger picture

Christopher Olah's blog post (September 3, 2015): http://colah.github.io/posts/2015-09-NN-Types-FP/
"The field does not (yet) have a unifying insight or narrative"
David Dalrymple's essay (January 2016): http://edge.org/response-detail/26794
"The most natural playground ... would be a new language that can run back-propagation directly on functional programs."
AD in a functional framework is a manifestation of this vision.

SLIDE 48

DiffSharp

SLIDE 49

The ambition

Deeply embedded AD (forward and/or reverse) as part of the language infrastructure
Rich API of differentiation operations as higher-order functions
High-performance matrix operations for deep learning (GPU support, model and data parallelism): gradients, Hessians, Jacobians, directional derivatives, matrix-free Hessian- and Jacobian-vector products
I have been working on these issues with Barak Pearlmutter and created DiffSharp: http://diffsharp.github.io/DiffSharp/

SLIDE 51

DiffSharp

"Generalized AD as a first-class function in an augmented λ-calculus" (Pearlmutter and Siskind, 2008)
Forward, reverse, and any nested combination thereof, instantiated according to usage scenario
Nested lambda expressions with free-variable references:

min (λx . (f x) + min (λy . g x y))

let m = min (fun x -> (f x) + min (fun y -> g x y))

Must handle "perturbation confusion" (Manzyuk et al., 2012):

d/dx (x · (d/dy (x + y))|y=1)|x=1 =? 1

The inner derivative is 1, so the whole expression should reduce to d/dx (x)|x=1 = 1; an implementation that confuses the two nested perturbations returns 2 instead.

let d = diff (fun x -> x * (diff (fun y -> x + y) 1.)) 1.
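The same nested derivative as a runnable sketch, assuming DiffSharp 0.7's Float64 AD module (check the API docs for the exact names in your version):

open DiffSharp.AD.Float64

// d/dx (x * (d/dy (x + y) at y = 1)) at x = 1; correct nesting semantics give 1.0
let d = diff (fun x -> x * diff (fun y -> x + y) (D 1.)) (D 1.)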

SLIDE 53

DiffSharp

Higher-order differentiation API

Op. | Value | Type signature | AD | Num. | Sym.

(AD column: X = exact derivative via AD, with mode F = forward, R = reverse, F/R = either, R-F and F-R = nested; Num.: A = numerical approximation available; Sym.: X = symbolic implementation available)

f : R → R
diff | f′ | (R → R) → R → R | X, F | A | X
diff' | (f, f′) | (R → R) → R → (R × R) | X, F | A | X
diff2 | f′′ | (R → R) → R → R | X, F | A | X
diff2' | (f, f′′) | (R → R) → R → (R × R) | X, F | A | X
diff2'' | (f, f′, f′′) | (R → R) → R → (R × R × R) | X, F | A | X
diffn | f⁽ⁿ⁾ | N → (R → R) → R → R | X, F | | X
diffn' | (f, f⁽ⁿ⁾) | N → (R → R) → R → (R × R) | X, F | | X

f : Rⁿ → R
grad | ∇f | (Rⁿ → R) → Rⁿ → Rⁿ | X, R | A | X
grad' | (f, ∇f) | (Rⁿ → R) → Rⁿ → (R × Rⁿ) | X, R | A | X
gradv | ∇f · v | (Rⁿ → R) → Rⁿ → Rⁿ → R | X, F | A |
gradv' | (f, ∇f · v) | (Rⁿ → R) → Rⁿ → Rⁿ → (R × R) | X, F | A |
hessian | Hf | (Rⁿ → R) → Rⁿ → Rⁿˣⁿ | X, R-F | A | X
hessian' | (f, Hf) | (Rⁿ → R) → Rⁿ → (R × Rⁿˣⁿ) | X, R-F | A | X
hessianv | Hf v | (Rⁿ → R) → Rⁿ → Rⁿ → Rⁿ | X, F-R | A |
hessianv' | (f, Hf v) | (Rⁿ → R) → Rⁿ → Rⁿ → (R × Rⁿ) | X, F-R | A |
gradhessian | (∇f, Hf) | (Rⁿ → R) → Rⁿ → (Rⁿ × Rⁿˣⁿ) | X, R-F | A | X
gradhessian' | (f, ∇f, Hf) | (Rⁿ → R) → Rⁿ → (R × Rⁿ × Rⁿˣⁿ) | X, R-F | A | X
gradhessianv | (∇f · v, Hf v) | (Rⁿ → R) → Rⁿ → Rⁿ → (R × Rⁿ) | X, F-R | A |
gradhessianv' | (f, ∇f · v, Hf v) | (Rⁿ → R) → Rⁿ → Rⁿ → (R × R × Rⁿ) | X, F-R | A |
laplacian | tr(Hf) | (Rⁿ → R) → Rⁿ → R | X, R-F | A | X
laplacian' | (f, tr(Hf)) | (Rⁿ → R) → Rⁿ → (R × R) | X, R-F | A | X

f : Rⁿ → Rᵐ
jacobian | Jf | (Rⁿ → Rᵐ) → Rⁿ → Rᵐˣⁿ | X, F/R | A | X
jacobian' | (f, Jf) | (Rⁿ → Rᵐ) → Rⁿ → (Rᵐ × Rᵐˣⁿ) | X, F/R | A | X
jacobianv | Jf v | (Rⁿ → Rᵐ) → Rⁿ → Rⁿ → Rᵐ | X, F | A |
jacobianv' | (f, Jf v) | (Rⁿ → Rᵐ) → Rⁿ → Rⁿ → (Rᵐ × Rᵐ) | X, F | A |
jacobianT | Jfᵀ | (Rⁿ → Rᵐ) → Rⁿ → Rⁿˣᵐ | X, F/R | A | X
jacobianT' | (f, Jfᵀ) | (Rⁿ → Rᵐ) → Rⁿ → (Rᵐ × Rⁿˣᵐ) | X, F/R | A | X
jacobianTv | Jfᵀ v | (Rⁿ → Rᵐ) → Rⁿ → Rᵐ → Rⁿ | X, R | |
jacobianTv' | (f, Jfᵀ v) | (Rⁿ → Rᵐ) → Rⁿ → Rᵐ → (Rᵐ × Rⁿ) | X, R | |
jacobianTv'' | (f, Jfᵀ (·)) | (Rⁿ → Rᵐ) → Rⁿ → (Rᵐ × (Rᵐ → Rⁿ)) | X, R | |
curl | ∇ × f | (R³ → R³) → R³ → R³ | X, F | A | X
curl' | (f, ∇ × f) | (R³ → R³) → R³ → (R³ × R³) | X, F | A | X
div | ∇ · f | (Rⁿ → Rⁿ) → Rⁿ → R | X, F | A | X
div' | (f, ∇ · f) | (Rⁿ → Rⁿ) → Rⁿ → (Rⁿ × R) | X, F | A | X
curldiv | (∇ × f, ∇ · f) | (R³ → R³) → R³ → (R³ × R) | X, F | A | X
curldiv' | (f, ∇ × f, ∇ · f) | (R³ → R³) → R³ → (R³ × R³ × R) | X, F | A | X
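A couple of these operations in use, assuming DiffSharp 0.7's Float64 API (D, DV, toDV; names may differ across versions):

open DiffSharp.AD.Float64

let f (x: DV) = sin x.[0] * exp x.[1]        // f : R^2 -> R
let x0 = toDV [0.5; 1.2]
let g  = grad f x0                           // exact gradient in one reverse pass
let hv = hessianv f x0 (toDV [1.0; 0.0])     // matrix-free Hessian-vector product (forward-on-reverse)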

SLIDE 54

DiffSharp

Matrix operations: http://diffsharp.github.io/DiffSharp/api-overview.html
High-performance OpenBLAS backend by default; work on a CUDA-based GPU backend underway
Support for 64- and 32-bit floats (the latter faster on many systems)
Benchmarking tool: http://diffsharp.github.io/DiffSharp/benchmarks.html
A growing collection of tutorials: gradient-based optimization algorithms, clustering, Hamiltonian Monte Carlo, neural networks, inverse kinematics

SLIDE 55

Hype

SLIDE 56

Hype

http://hypelib.github.io/Hype/
An experimental library for "compositional machine learning and hyperparameter optimization", built on DiffSharp
A robust optimization core with highly configurable functional modules: SGD, conjugate gradient, Nesterov, AdaGrad, RMSProp, Newton's method
Uses nested AD for gradient-based hyperparameter optimization (Maclaurin et al., 2015)
Researching the differentiable functional programming paradigm for machine learning

SLIDE 57

Hype

Extracts from Hype neural network code: with higher-order functions, you don't think about gradients or backpropagation

https://github.com/hypelib/Hype/blob/master/src/Hype/Neural.fs

SLIDE 58

Hype

Extracts from Hype optimization code

https://github.com/hypelib/Hype/blob/master/src/Hype/Optimize.fs

Optimization and training as higher-order functions → can be composed, nested
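A minimal sketch (not Hype's actual code) of why this composes: an optimizer that takes the gradient as a function is itself just a higher-order function, so it can be nested inside other optimizers or objectives:

// Gradient descent on f : R -> R, given its gradient as a function
let rec gd (grad: float -> float) eta steps x =
    if steps = 0 then x
    else gd grad eta (steps - 1) (x - eta * grad x)

// e.g. minimize (x - 3)^2 from x = 0, using its gradient 2(x - 3):
let xMin = gd (fun x -> 2.0 * (x - 3.0)) 0.1 100 0.0   // ≈ 3.0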

SLIDE 59

Hype

The user doesn't need to think about derivatives; they are instantiated within the optimization code

SLIDE 61

Hype

But they can use derivatives within their models, if needed:
→ input sensitivities
→ complex objective functions
→ adaptive PID controllers
→ integrating differential equations
Thanks to nested generalized AD, you can optimize components that are internally using differentiation; the resulting higher-order derivatives propagate via forward/reverse AD as needed

SLIDE 63

Hype

We also provide a Torch-like API for neural networks
A cool thing: thanks to AD, we can freely code any F# function as a layer, and it just works

SLIDE 64

Hype

http://hypelib.github.io/Hype/feedforwardnets.html
We also have some nice additions for F# Interactive

SLIDE 65

Roadmap

Transformation-based, context-aware AD: F# quotations (Syme, 2006) give us a direct path for deeply embedding AD
Currently experimenting with GPU backends (CUDA, ArrayFire, Magma)
Generalizing to tensors (for elegant implementations of, e.g., ConvNets)

SLIDE 66

Demos

SLIDE 67

Thank You!

References

  • Baydin AG, Pearlmutter BA, Radul AA, Siskind JM (submitted) Automatic differentiation in machine learning: a survey [arXiv:1502.05767]
  • Baydin AG, Pearlmutter BA, Siskind JM (submitted) DiffSharp: automatic differentiation library [arXiv:1511.07727]
  • Bengio Y (2013) Deep learning of representations: looking forward. Statistical Language and Speech Processing. LNCS 7978:1–37 [arXiv:1404.7456]
  • Graves A, Wayne G, Danihelka I (2014) Neural Turing machines [arXiv:1410.5401]
  • Grefenstette E, Hermann KM, Suleyman M, Blunsom P (2015) Learning to transduce with unbounded memory [arXiv:1506.02516]
  • Griewank A, Walther A (2008) Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Society for Industrial and Applied Mathematics, Philadelphia [DOI 10.1137/1.9780898717761]
  • He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition [arXiv:1512.03385]
  • Joulin A, Mikolov T (2015) Inferring algorithmic patterns with stack-augmented recurrent nets [arXiv:1503.01007]
  • Maclaurin D, Duvenaud D, Adams RP (2015) Gradient-based hyperparameter optimization through reversible learning [arXiv:1502.03492]
  • Manzyuk O, Pearlmutter BA, Radul AA, Rush DR, Siskind JM (2012) Confusion of tagged perturbations in forward automatic differentiation of higher-order functions [arXiv:1211.4892]
  • Pearlmutter BA, Siskind JM (2008) Reverse-mode AD in a functional framework: Lambda the ultimate backpropagator. ACM TOPLAS 30(2):7 [DOI 10.1145/1330017.1330018]
  • Siskind JM, Pearlmutter BA (2008) Nesting forward-mode AD in a functional framework. Higher-Order and Symbolic Computation 21(4):361–76 [DOI 10.1007/s10990-008-9037-1]
  • Sukhbaatar S, Szlam A, Weston J, Fergus R (2015) Weakly supervised memory networks [arXiv:1503.08895]
  • Syme D (2006) Leveraging .NET meta-programming components from F#: integrated queries and interoperable heterogeneous execution. 2006 Workshop on ML. ACM
  • Vinyals O, Toshev A, Bengio S, Erhan D (2014) Show and tell: a neural image caption generator [arXiv:1411.4555]
  • Wengert RE (1964) A simple automatic derivative evaluation program. Communications of the ACM 7(8):463–464
  • Zaremba W, Sutskever I (2015) Reinforcement learning neural Turing machines [arXiv:1505.00521]