differentiable functional programming



  1. Differentiable Functional Programming Atılım Güneş Baydin University of Oxford http://www.robots.ox.ac.uk/~gunes/ F#unctional Londoners Meetup, April 28, 2016

  2. About me. Current (from 11 April 2016): Postdoctoral researcher, Machine Learning Research Group, University of Oxford (http://www.robots.ox.ac.uk/~parg/). Previously: Brain and Computation Lab, National University of Ireland Maynooth (http://www.bcl.hamilton.ie/). Working primarily with F#, on algorithmic differentiation, functional programming, and machine learning.

  3. Today’s talk: derivatives in computer programs; differentiable functional programming; the DiffSharp and Hype libraries; two demos.

  4. Derivatives in computer programs: how do we compute them?

  5. Manual differentiation. Take $f(x) = \sin(\exp x)$:

         let f x = sin (exp x)

     Calculus 101, differentiation rules:

     $\frac{d(fg)}{dx} = \frac{df}{dx} g + f \frac{dg}{dx}$

     $\frac{d(af + bg)}{dx} = a \frac{df}{dx} + b \frac{dg}{dx}$

     ...

     Applying them gives $f'(x) = \cos(\exp x) \times \exp x$:

         let f' x = (cos (exp x)) * (exp x)

  8. Manual differentiation. It can get complicated. Take $f(x) = 64x(1-x)(1-2x)^2(1-8x+8x^2)^2$ (4th iteration of the logistic map $l_{n+1} = 4 l_n (1 - l_n)$, $l_1 = x$):

         let f x = 64.0*x * (1.0-x) * ((1.0 - 2.0*x) ** 2.0) * ((1.0 - 8.0*x + 8.0*x*x) ** 2.0)

     $f'(x) = 128x(1-x)(-8+16x)(1-2x)^2(1-8x+8x^2) + 64(1-x)(1-2x)^2(1-8x+8x^2)^2 - 64x(1-2x)^2(1-8x+8x^2)^2 - 256x(1-x)(1-2x)(1-8x+8x^2)^2$

         let f' x =
             128.0*x * (1.0-x) * (-8.0+16.0*x) * (1.0-2.0*x)**2.0 * (1.0-8.0*x+8.0*x*x) +
             64.0 * (1.0-x) * (1.0-2.0*x)**2.0 * (1.0-8.0*x+8.0*x*x)**2.0 -
             64.0*x * (1.0-2.0*x)**2.0 * (1.0-8.0*x+8.0*x*x)**2.0 -
             256.0*x * (1.0-x) * (1.0-2.0*x) * (1.0-8.0*x+8.0*x*x)**2.0
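One way to gain some confidence in a hand-derived formula like this is to compare it against a numerical estimate at a test point. The following is a minimal sketch, not part of the original slides: the helper `centralDiff`, the test point 0.2, and the step size are illustrative assumptions.

```fsharp
// 4th iterate of the logistic map, written with float literals.
let f (x: float) =
    64.0 * x * (1.0 - x) * ((1.0 - 2.0 * x) ** 2.0) * ((1.0 - 8.0 * x + 8.0 * x * x) ** 2.0)

// Hand-derived derivative, split into the four terms of the formula above.
let f' (x: float) =
    let a = 1.0 - 2.0 * x
    let b = 1.0 - 8.0 * x + 8.0 * x * x
    let t1 = 128.0 * x * (1.0 - x) * (-8.0 + 16.0 * x) * a ** 2.0 * b
    let t2 = 64.0 * (1.0 - x) * a ** 2.0 * b ** 2.0
    let t3 = 64.0 * x * a ** 2.0 * b ** 2.0
    let t4 = 256.0 * x * (1.0 - x) * a * b ** 2.0
    t1 + t2 - t3 - t4

// Hypothetical helper: central difference estimate of the derivative of g at x.
let centralDiff (g: float -> float) (x: float) (h: float) =
    (g (x + h) - g (x - h)) / (2.0 * h)

let manual = f' 0.2
let numeric = centralDiff f 0.2 1e-5
printfn "manual = %.8f, numeric = %.8f" manual numeric
```

If the two printed values disagree beyond a few decimal places, the hand derivation most likely contains a slip.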

  10. Symbolic differentiation. Computer algebra packages help: Mathematica, Maple, Maxima. But it has some serious drawbacks.

  12. Symbolic differentiation. We get “expression swell”. For the logistic map $l_{n+1} = 4 l_n (1 - l_n)$, $l_1 = x$:

      n = 1: $l_n = x$; $\frac{d}{dx} l_n = 1$

      n = 2: $l_n = 4x(1-x)$; $\frac{d}{dx} l_n = 4(1-x) - 4x$

      n = 3: $l_n = 16x(1-x)(1-2x)^2$; $\frac{d}{dx} l_n = 16(1-x)(1-2x)^2 - 16x(1-2x)^2 - 64x(1-x)(1-2x)$

      n = 4: $l_n = 64x(1-x)(1-2x)^2(1-8x+8x^2)^2$; $\frac{d}{dx} l_n = 128x(1-x)(-8+16x)(1-2x)^2(1-8x+8x^2) + 64(1-x)(1-2x)^2(1-8x+8x^2)^2 - 64x(1-2x)^2(1-8x+8x^2)^2 - 256x(1-x)(1-2x)(1-8x+8x^2)^2$

      [Figure: number of terms in $l_n$ and $\frac{d}{dx} l_n$ plotted against n = 1, ..., 5; vertical axis from 0 to 600.]

  13. Symbolic differentiation. We are limited to closed-form formulae. You can find the derivative of math expressions:

      $f(x) = 64x(1-x)(1-2x)^2(1-8x+8x^2)^2$

      But not of algorithms, branching, control flow:

          let f x n =
              if n = 1 then
                  x
              else
                  let mutable v = x
                  // l_1 = x, so reaching l_n takes n - 1 applications of the map
                  for i = 1 to n - 1 do
                      v <- 4.0 * v * (1.0 - v)
                  v

          let a = f x 4

  16. Numerical differentiation. A very common hack: use the limit definition of the derivative, $\frac{df}{dx} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$, to approximate the numerical value of the derivative:

          let diff f x =
              let h = 0.00001
              (f (x + h) - f (x)) / h

      Again, some serious drawbacks.
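As a usage sketch (not from the slides), the forward-difference `diff` can be compared with the exact derivative of the earlier example $f(x) = \sin(\exp x)$. The evaluation point 0.5 and the type annotations are illustrative assumptions.

```fsharp
// Forward-difference approximation with a fixed step, as on the slide.
let diff (f: float -> float) (x: float) =
    let h = 0.00001
    (f (x + h) - f x) / h

// Example function and its hand-derived derivative.
let f (x: float) = sin (exp x)
let f' (x: float) = cos (exp x) * exp x

let approx = diff f 0.5
let exact = f' 0.5
printfn "approx = %.8f, exact = %.8f, error = %.2e" approx exact (abs (approx - exact))
```

The two values agree only to a limited number of digits, which is the approximation error the following slides discuss.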

  19. Numerical differentiation. We must select a proper value of h, and we face approximation errors. [Figure: log-log plot of the error $E(h, x^*) = \left| \frac{f(x^* + h) - f(x^*)}{h} - \frac{d}{dx} f(x) \big|_{x^*} \right|$ against $h$ from $10^{-17}$ to $10^{-1}$, computed using $f(x) = 64x(1-x)(1-2x)^2(1-8x+8x^2)^2$ at $x^* = 0.2$. Round-off error is dominant for small $h$, truncation error is dominant for large $h$.]
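The trade-off between round-off and truncation error can be reproduced with a short sweep over step sizes. This is a sketch under stated assumptions: the names `xStar` and `error` and the particular list of exponents are illustrative, and the exact derivative is the hand-derived formula from the manual differentiation slide.

```fsharp
// Function from the slide and its hand-derived derivative.
let f (x: float) =
    64.0 * x * (1.0 - x) * ((1.0 - 2.0 * x) ** 2.0) * ((1.0 - 8.0 * x + 8.0 * x * x) ** 2.0)

let f' (x: float) =
    let a = 1.0 - 2.0 * x
    let b = 1.0 - 8.0 * x + 8.0 * x * x
    let t1 = 128.0 * x * (1.0 - x) * (-8.0 + 16.0 * x) * a ** 2.0 * b
    let t2 = 64.0 * (1.0 - x) * a ** 2.0 * b ** 2.0
    let t3 = 64.0 * x * a ** 2.0 * b ** 2.0
    let t4 = 256.0 * x * (1.0 - x) * a * b ** 2.0
    t1 + t2 - t3 - t4

let xStar = 0.2

// Absolute error of the forward difference with step h at xStar.
let error (h: float) =
    abs ((f (xStar + h) - f xStar) / h - f' xStar)

// Sweep h over several orders of magnitude and print the error for each.
for k in [ 1; 3; 5; 7; 9; 11; 13; 15; 17 ] do
    let h = 10.0 ** float (-k)
    printfn "h = 1e-%02d  error = %.3e" k (error h)
```

For the smallest step the sum `xStar + h` rounds back to `xStar` in double precision, so the estimate collapses to zero and the error equals the magnitude of the derivative itself.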

  20. Numerical differentiation. Better approximations exist: higher-order finite differences, e.g. $\frac{\partial f(\mathbf{x})}{\partial x_i} = \frac{f(\mathbf{x} + h\mathbf{e}_i) - f(\mathbf{x} - h\mathbf{e}_i)}{2h} + O(h^2)$; Richardson extrapolation; differential quadrature. But they increase rapidly in complexity and never completely eliminate the error.
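A small sketch of the central difference next to the forward difference, for the scalar case. The function names `forwardDiff` and `centralDiff`, the test function, and the step size are assumptions made for illustration.

```fsharp
// First-order forward difference: truncation error O(h).
let forwardDiff (f: float -> float) (x: float) (h: float) =
    (f (x + h) - f x) / h

// Second-order central difference: truncation error O(h^2).
let centralDiff (f: float -> float) (x: float) (h: float) =
    (f (x + h) - f (x - h)) / (2.0 * h)

let f (x: float) = sin (exp x)
let exact = cos (exp 0.5) * exp 0.5

let h = 1e-3
let forwardError = abs (forwardDiff f 0.5 h - exact)
let centralError = abs (centralDiff f 0.5 h - exact)
printfn "forward error = %.3e, central error = %.3e" forwardError centralError
```

The central difference costs the same two function evaluations here but is noticeably more accurate at this step size; it still carries a nonzero error.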

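  21. Numerical differentiation. Poor performance: for $f: \mathbb{R}^n \to \mathbb{R}$, approximate the gradient $\nabla f = \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right)$ using $\frac{\partial f(\mathbf{x})}{\partial x_i} \approx \frac{f(\mathbf{x} + h\mathbf{e}_i) - f(\mathbf{x})}{h}$, $0 < h \ll 1$. We must repeat the function evaluation n times to get $\nabla f$.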
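The cost can be made visible by counting function evaluations. In this sketch the names `numGradient`, `g`, and `evals` are hypothetical, and the sum-of-squares test function is an illustrative choice; the count comes out as one base evaluation plus n perturbed ones.

```fsharp
// Counter for how many times the objective is evaluated.
let evals = ref 0

// Example objective: sum of squares, whose exact gradient is 2x.
let g (x: float[]) =
    evals.Value <- evals.Value + 1
    Array.sumBy (fun xi -> xi * xi) x

// Forward-difference gradient: one perturbed evaluation per coordinate.
let numGradient (f: float[] -> float) (x: float[]) (h: float) =
    let fx = f x
    Array.init x.Length (fun i ->
        let xh = Array.copy x
        xh.[i] <- xh.[i] + h
        (f xh - fx) / h)

let grad = numGradient g [| 1.0; 2.0; 3.0 |] 1e-6
printfn "gradient = %A, evaluations = %d" grad evals.Value
```

For a function of millions of parameters, as is common in machine learning, this linear growth in evaluations becomes prohibitive.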

  22. Algorithmic differentiation (AD)

  23. Algorithmic differentiation. Also known as automatic differentiation (Griewank & Walther, 2008). Gives numeric code that computes the function AND its derivatives at a given point:

          f(a, b):
            c = a * b
            d = sin c
            return d

          f'(a, a', b, b'):
            (c, c') = (a*b, a'*b + a*b')
            (d, d') = (sin c, c' * cos c)
            return (d, d')

      Derivatives are propagated at the elementary operation level, as a side effect, at the same time as the function itself is computed. This prevents the “expression swell” of symbolic derivatives. The full expressive capability of the host language is available, including conditionals, looping, and branching.
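The pseudocode pair f and f' translates almost line for line into F#. This is a hand-written sketch of the idea, not the way DiffSharp is implemented; the tuple-based signature and the inputs a = 2, b = 3 are assumptions for illustration.

```fsharp
// Original function: d = sin (a * b).
let f (a: float) (b: float) =
    let c = a * b
    let d = sin c
    d

// Derivative-carrying version: each value travels with its tangent.
let f' (a: float, a': float) (b: float, b': float) =
    let c, c' = a * b, a' * b + a * b'   // product rule
    let d, d' = sin c, c' * cos c        // chain rule through sin
    d, d'

// Seed a' = 1, b' = 0 to differentiate with respect to a.
let value, derivative = f' (2.0, 1.0) (3.0, 0.0)
printfn "f = %.6f, df/da = %.6f" value derivative
```

Seeding with a' = 0, b' = 1 instead would give the derivative with respect to b.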

  24. Function evaluation traces. All numeric evaluations are sequences of elementary operations: a “trace,” also called a “Wengert list” (Wengert, 1964).

          f(a, b):
            c = a * b
            if c > 0
              d = log c
            else
              d = sin c
            return d

  27. Function evaluation traces. Evaluating f(2, 3) for the f(a, b) defined above gives a primal trace of values and a tangent trace that carries a derivative alongside each value (seeded with a' = 1, b' = 0, i.e. differentiating with respect to a):

          Primal trace:
            a = 2
            b = 3
            c = a * b = 6
            d = log c = 1.791
            return d

          Tangent trace:
            a = 2
            a' = 1
            b = 3
            b' = 0
            c = a * b = 6
            c' = a' * b + a * b' = 3
            d = log c = 1.791
            d' = c' * (1 / c) = 0.5
            return d, d'
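The tangent trace can be produced mechanically with a small dual-number type and operator overloading, so the branching function is written once and differentiated as it runs. This is a minimal sketch under stated assumptions: the `Dual` record and the helpers `dlog` and `dsin` are hypothetical names, not the DiffSharp API.

```fsharp
// A value paired with its tangent (derivative with respect to the seeded input).
type Dual =
    { Primal: float; Tangent: float }
    // Product rule for multiplication.
    static member (*) (x: Dual, y: Dual) =
        { Primal = x.Primal * y.Primal
          Tangent = x.Tangent * y.Primal + x.Primal * y.Tangent }

// Elementary functions with their derivative rules.
let dlog (x: Dual) = { Primal = log x.Primal; Tangent = x.Tangent / x.Primal }
let dsin (x: Dual) = { Primal = sin x.Primal; Tangent = x.Tangent * cos x.Primal }

// The function from the slide, including its branch on the primal value.
let f (a: Dual) (b: Dual) =
    let c = a * b
    if c.Primal > 0.0 then dlog c else dsin c

// Evaluate at (2, 3) with a' = 1, b' = 0.
let result = f { Primal = 2.0; Tangent = 1.0 } { Primal = 3.0; Tangent = 0.0 }
printfn "d = %f, d' = %f" result.Primal result.Tangent
```

Only the branch actually taken is differentiated, which is why control flow poses no problem for this approach.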
