Implementing autograd: slides by Matthew Johnson



slide-1
SLIDE 1

Implementing autograd

Slides by Matthew Johnson

slide-2
SLIDE 2

github.com/hips/autograd

  • differentiates native Python code
  • handles most of Numpy + Scipy
  • loops, branching, recursion, closures
  • arrays, tuples, lists, dicts...
  • derivatives of derivatives
  • a one-function API!

Autograd’s implementation

Dougal Maclaurin, David Duvenaud, Matt Johnson

slide-3
SLIDE 3

autodiff implementation options

  • A. direct specification of computation graph
  • B. source code inspection
  • C. monitoring function execution
slide-4
SLIDE 4

ingredients:

  • 1. tracing composition of primitive functions
  • 2. vector-Jacobian product for each primitive
  • 3. composing VJPs backward
slide-5
SLIDE 5

ingredients:

  • 1. tracing composition of primitive functions
  • 2. vector-Jacobian product for each primitive
  • 3. composing VJPs backward
slide-6
SLIDE 6

numpy.sum

slide-7
SLIDE 7

autograd.numpy.sum = primitive(numpy.sum)

slide-8
SLIDE 8

autograd.numpy.sum = primitive(numpy.sum)

Node ã
  value: a
  function: F
  parents: [x]

slide-9
SLIDE 9

autograd.numpy.sum = primitive(numpy.sum)

Node ã
  value: a
  function: F
  parents: [x]

unbox: ã → a

slide-10
SLIDE 10

autograd.numpy.sum = primitive(numpy.sum)

Node ã
  value: a
  function: F
  parents: [x]

Node b̃
  value: b
  function: anp.sum
  parents: [ã]

unbox: ã → a
box: b → b̃
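The boxing and unboxing steps on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not autograd's actual code: the `Node` class doubles as the box, the built-in `sum` stands in for `numpy.sum`, and `anp_sum`, `a_tilde` and `b_tilde` are names assumed for the example.

```python
class Node:
    """A boxed value: the result, the function that produced it, and its parents."""
    def __init__(self, value, function, parents):
        self.value = value
        self.function = function
        self.parents = parents


def primitive(raw_fn):
    """Wrap raw_fn so that calls on Nodes are recorded (simplified sketch)."""
    def wrapped(*args):
        parents = [a for a in args if isinstance(a, Node)]
        # Unbox: the raw function only ever sees plain values.
        raw_args = [a.value if isinstance(a, Node) else a for a in args]
        result = raw_fn(*raw_args)
        if not parents:
            return result  # nothing is being traced: behave like raw_fn
        # Box: wrap the result so later primitives keep extending the graph.
        return Node(result, wrapped, parents)
    return wrapped


anp_sum = primitive(sum)                    # stand-in for autograd.numpy.sum

a_tilde = Node([1.0, 2.0, 3.0], None, [])   # pretend some F(x) produced this
b_tilde = anp_sum(a_tilde)                  # value 6.0, function anp_sum, parents [a_tilde]
```

Calling `anp_sum` on a plain list skips the boxing entirely, so untraced code pays almost nothing for the wrapper.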

slide-11
SLIDE 11
slide-12
SLIDE 12
slide-13
SLIDE 13
slide-14
SLIDE 14

start_node

x

slide-15
SLIDE 15

start_node: x
a = A(x)

slide-16
SLIDE 16

start_node: x
a = A(x)
b = B(a)

slide-17
SLIDE 17

start_node: x
a = A(x)
b = B(a)
c = C(b)

slide-18
SLIDE 18

start_node: x
a = A(x)
b = B(a)
c = C(b)
end_node: y = D(c)

slide-19
SLIDE 19

start_node end_node

No control flow!
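Why the recorded graph has no control flow can be seen with a toy tracer. The sketch below is an assumption-laden illustration (the `Traced` class, the global `tape` and the function `f` are invented for it): Python runs the loop and the branch itself, and only the primitive calls that actually execute are recorded.

```python
tape = []  # flat record of primitive calls, in execution order


def _raw(v):
    """Return the plain value behind v, whether or not it is traced."""
    return v.value if isinstance(v, Traced) else v


class Traced:
    def __init__(self, value):
        self.value = value

    def __gt__(self, other):
        # Comparisons are answered from the concrete value and are not recorded.
        return self.value > _raw(other)

    def __add__(self, other):
        tape.append("add")
        return Traced(self.value + _raw(other))

    def __mul__(self, other):
        tape.append("mul")
        return Traced(self.value * _raw(other))


def f(x, n):
    y = x
    for _ in range(n):      # the loop is executed by Python, not recorded
        if y > 10.0:        # the branch is decided on the concrete value
            y = y + 1.0
        else:
            y = y * x
    return y


out = f(Traced(3.0), 3)
print(tape)        # ['mul', 'mul', 'add']
print(out.value)   # 28.0
```

A different input can take different branches and produce a different tape, which is why the trace is rebuilt on every call.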

slide-20
SLIDE 20

ingredients:

  • 1. tracing composition of primitive functions
  • 2. vector-Jacobian product for each primitive
  • 3. composing VJPs backward
slide-21
SLIDE 21

x a = A(x)

slide-22
SLIDE 22

x a = A(x)

$\frac{\partial y}{\partial a}$

slide-23
SLIDE 23

x a = A(x)

$\frac{\partial y}{\partial a}$

$\frac{\partial y}{\partial x} = \;?$

slide-24
SLIDE 24

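x a = A(x)

$\frac{\partial y}{\partial a}$

$\frac{\partial y}{\partial x} = \frac{\partial y}{\partial a} \cdot \frac{\partial a}{\partial x}$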

slide-25
SLIDE 25

x a = A(x)

$\frac{\partial y}{\partial a}$

$\frac{\partial y}{\partial x} = \frac{\partial y}{\partial a} \cdot A'(x)$

vector-Jacobian product
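A small worked instance of a vector-Jacobian product, assuming an elementwise square as the primitive A (an example chosen here, not taken from the slides). The point is that the VJP returns g · A'(x) without ever building the Jacobian; the explicit Jacobian is constructed only to check the result.

```python
def square(x):
    """The primitive: elementwise square of a list of floats."""
    return [xi * xi for xi in x]


def square_vjp(g, x):
    """Return g · J for J = diag(2x), without materialising J."""
    return [gi * 2.0 * xi for gi, xi in zip(g, x)]


def square_jacobian(x):
    """Explicit Jacobian of square at x, used only for checking."""
    n = len(x)
    return [[2.0 * x[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def vec_mat(g, J):
    """Row vector times matrix."""
    return [sum(g[i] * J[i][j] for i in range(len(g))) for j in range(len(J[0]))]


x = [1.0, 2.0, 3.0]
g = [0.5, -1.0, 2.0]          # plays the role of ∂y/∂a

print(square_vjp(g, x))                    # [1.0, -4.0, 12.0]
print(vec_mat(g, square_jacobian(x)))      # same numbers, via the full Jacobian
```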

slide-26
SLIDE 26
slide-27
SLIDE 27

ingredients:

  • 1. tracing composition of primitive functions
  • 2. vector-Jacobian product for each primitive
  • 3. composing VJPs backward
slide-28
SLIDE 28

start_node: x
a = A(x)
b = B(a)
c = C(b)
end_node: y = D(c)

slide-29
SLIDE 29

start_node: x
a = A(x)
b = B(a)
c = C(b)
end_node: y = D(c)        $\partial y/\partial y = 1$

slide-30
SLIDE 30

start_node: x
a = A(x)
b = B(a)
c = C(b)        $\partial y/\partial c$
end_node: y = D(c)        $\partial y/\partial y = 1$

slide-31
SLIDE 31

start_node: x
a = A(x)
b = B(a)        $\partial y/\partial b$
c = C(b)        $\partial y/\partial c$
end_node: y = D(c)        $\partial y/\partial y = 1$

slide-32
SLIDE 32

start_node: x
a = A(x)        $\partial y/\partial a$
b = B(a)        $\partial y/\partial b$
c = C(b)        $\partial y/\partial c$
end_node: y = D(c)        $\partial y/\partial y = 1$

slide-33
SLIDE 33

start_node: x        $\partial y/\partial x$
a = A(x)        $\partial y/\partial a$
b = B(a)        $\partial y/\partial b$
c = C(b)        $\partial y/\partial c$
end_node: y = D(c)        $\partial y/\partial y = 1$
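The backward sweep through the chain x → a → b → c → y can be sketched by hand, without any tracing. The concrete choices of A, B, C and D below are assumptions made for the example; each is paired with a VJP of the form `vjp(g, x) = g * f'(x)`.

```python
import math

# Each primitive is a pair (function, vjp); vjp(g, x) returns g * f'(x).
A = (math.sin, lambda g, x: g * math.cos(x))
B = (math.exp, lambda g, x: g * math.exp(x))
C = (lambda x: x * x, lambda g, x: g * 2.0 * x)
D = (lambda x: 3.0 * x, lambda g, x: g * 3.0)


def value_and_grad(chain, x):
    """Run the chain forward, then pull dy/dy = 1 back through every VJP."""
    inputs = []
    for fn, _ in chain:
        inputs.append(x)    # remember the input each primitive saw
        x = fn(x)
    y = x

    g = 1.0                 # dy/dy
    for (_, vjp), x_in in zip(reversed(chain), reversed(inputs)):
        g = vjp(g, x_in)    # dy/dc, then dy/db, dy/da, and finally dy/dx
    return y, g


y, dy_dx = value_and_grad([A, B, C, D], 0.7)
```

For this chain y = 3·exp(sin x)², so the result can be compared against the closed form 6·exp(2 sin x)·cos x.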

slide-34
SLIDE 34

higher-order autodiff just works: the backward pass can itself be traced
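One way to see this claim at work is a toy reverse-mode tracer whose VJPs are themselves written with traced primitives. The sketch below is an illustration under assumptions, not autograd's implementation: `Box`, the integer trace `level`, and this particular `primitive` signature are invented for the example. Each `grad` call opens a new trace level; a primitive unboxes only the innermost level, so the operations of an inner backward pass are recorded on the outer trace.

```python
import math


class Box:
    """A traced value tagged with the level of the trace it belongs to."""
    def __init__(self, value, level, parents=(), vjps=()):
        self.value, self.level = value, level
        self.parents, self.vjps = parents, vjps

    def __add__(self, other): return add(self, other)
    def __mul__(self, other): return mul(self, other)
    def __neg__(self): return neg(self)
    __radd__, __rmul__ = __add__, __mul__   # both operations commute


def primitive(raw_fn, vjp_makers):
    def wrapped(*args):
        levels = [a.level for a in args if isinstance(a, Box)]
        if not levels:
            return raw_fn(*args)
        top = max(levels)
        # Unbox only the innermost trace; outer boxes pass through, so the
        # recursive call records the same operation on the outer trace.
        raw = [a.value if isinstance(a, Box) and a.level == top else a for a in args]
        out = wrapped(*raw)
        idx = [i for i, a in enumerate(args) if isinstance(a, Box) and a.level == top]
        return Box(out, top, [args[i] for i in idx],
                   [vjp_makers[i](out, *raw) for i in idx])
    return wrapped


# VJPs use the overloaded operators and the primitives themselves,
# which is what makes the backward pass traceable.
add = primitive(lambda a, b: a + b, [lambda ans, a, b: lambda g: g,
                                     lambda ans, a, b: lambda g: g])
mul = primitive(lambda a, b: a * b, [lambda ans, a, b: lambda g: g * b,
                                     lambda ans, a, b: lambda g: g * a])
neg = primitive(lambda a: -a, [lambda ans, a: lambda g: -g])
sin = primitive(math.sin, [lambda ans, a: lambda g: g * cos(a)])
cos = primitive(math.cos, [lambda ans, a: lambda g: -(g * sin(a))])


def backward(end, start):
    order, seen = [], set()

    def visit(node):                 # depth-first search: parents before children
        if id(node) not in seen:
            seen.add(id(node))
            for p in node.parents:
                visit(p)
            order.append(node)

    visit(end)
    grads = {id(end): 1.0}
    for node in reversed(order):     # children before parents
        g = grads[id(node)]
        for parent, vjp in zip(node.parents, node.vjps):
            key, contrib = id(parent), vjp(g)
            grads[key] = grads[key] + contrib if key in grads else contrib
    return grads[id(start)]


_depth = [0]  # number of traces currently being recorded


def grad(f):
    def df(x):
        _depth[0] += 1
        level = _depth[0]
        try:
            start = Box(x, level)
            end = f(start)
        finally:
            _depth[0] -= 1
        if not (isinstance(end, Box) and end.level == level):
            return 0.0               # the output does not depend on x
        return backward(end, start)
    return df


f = lambda x: sin(x) * x
first = grad(f)(1.0)                 # cos(1)·1 + sin(1)
second = grad(grad(f))(1.0)          # 2·cos(1) - 1·sin(1)
```

Nothing in `grad` is special-cased for the second derivative; the outer call simply traces whatever the inner backward pass computes.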

slide-35
SLIDE 35

start_node: x
a = A(x)
b = B(a)
c = C(b)
end_node: y = D(c)        $\partial y/\partial y = 1$

slide-36
SLIDE 36

start_node: x
a = A(x)
b = B(a)
c = C(b)        $\partial y/\partial c$
end_node: y = D(c)        $\partial y/\partial y = 1$

slide-37
SLIDE 37

start_node: x
a = A(x)
b = B(a)        $\partial y/\partial b$
c = C(b)        $\partial y/\partial c$
end_node: y = D(c)        $\partial y/\partial y = 1$

slide-38
SLIDE 38

start_node: x
a = A(x)        $\partial y/\partial a$
b = B(a)        $\partial y/\partial b$
c = C(b)        $\partial y/\partial c$
end_node: y = D(c)        $\partial y/\partial y = 1$

slide-39
SLIDE 39

start_node: x        $\partial y/\partial x$
a = A(x)        $\partial y/\partial a$
b = B(a)        $\partial y/\partial b$
c = C(b)        $\partial y/\partial c$
end_node: y = D(c)        $\partial y/\partial y = 1$

slide-40
SLIDE 40

start_node: x
a = A(x)
b = B(a)
c = C(b)
y = D(c)        $\partial y/\partial y = 1$

end_node

slide-41
SLIDE 41
slide-42
SLIDE 42
slide-43
SLIDE 43

ingredients:

  • 1. tracing composition of primitive functions
       Node, primitive, forward_pass
  • 2. vector-Jacobian product for each primitive
       defvjp
  • 3. composing VJPs backward
       backward_pass, make_vjp, grad
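How the seven names on this slide might fit together is sketched below. Only the names come from the slide; the bodies and signatures are a simplified guess for scalar functions, not autograd's actual code (for instance, VJPs here take `(g, ans, *args)` directly and are kept in a hypothetical `_vjps` dictionary).

```python
import math


class Node:
    """Ingredient 1: a recorded primitive call."""
    def __init__(self, value, function, args, parents):
        self.value, self.function = value, function
        self.args, self.parents = args, parents      # parents: [(argnum, Node)]

    def __add__(self, other): return add(self, other)
    def __mul__(self, other): return mul(self, other)


def primitive(raw_fn):
    def wrapped(*args):
        parents = [(i, a) for i, a in enumerate(args) if isinstance(a, Node)]
        if not parents:
            return raw_fn(*args)
        raw_args = [a.value if isinstance(a, Node) else a for a in args]
        return Node(raw_fn(*raw_args), wrapped, raw_args, parents)
    return wrapped


def forward_pass(f, x):
    start_node = Node(x, None, (), [])
    return start_node, f(start_node)


_vjps = {}  # Ingredient 2: one VJP per argument of each primitive


def defvjp(fn, *vjps):
    _vjps[fn] = vjps


def backward_pass(g, end_node, start_node):
    """Ingredient 3: compose VJPs from end_node back to start_node."""
    order, seen = [], set()

    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for _, parent in node.parents:
                visit(parent)
            order.append(node)

    visit(end_node)
    grads = {id(end_node): g}
    for node in reversed(order):
        out_grad = grads[id(node)]
        for argnum, parent in node.parents:
            vjp = _vjps[node.function][argnum]
            contrib = vjp(out_grad, node.value, *node.args)
            grads[id(parent)] = grads.get(id(parent), 0.0) + contrib
    return grads[id(start_node)]


def make_vjp(f, x):
    start_node, end_node = forward_pass(f, x)
    if not isinstance(end_node, Node):          # output independent of x
        return (lambda g: 0.0), end_node
    return (lambda g: backward_pass(g, end_node, start_node)), end_node.value


def grad(f):
    def gradfun(x):
        vjp, _ = make_vjp(f, x)
        return vjp(1.0)
    return gradfun


sin = primitive(math.sin)
add = primitive(lambda a, b: a + b)
mul = primitive(lambda a, b: a * b)
defvjp(sin, lambda g, ans, x: g * math.cos(x))
defvjp(add, lambda g, ans, a, b: g, lambda g, ans, a, b: g)
defvjp(mul, lambda g, ans, a, b: g * b, lambda g, ans, a, b: g * a)

df = grad(lambda x: sin(x) * x + x)
```

With these definitions `df(2.0)` should agree with the hand-derived 2·cos(2) + sin(2) + 1.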

slide-44
SLIDE 44

what’s the point? easy to extend!

  • develop autograd!
  • forward mode
  • log joint densities from sampler programs
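As a flavour of the forward-mode extension mentioned above, here is a minimal sketch using dual numbers. Dual numbers are a standard textbook route to forward mode and are swapped in here for illustration; the slides do not say how forward mode would be built on autograd's tracing machinery. The names `Dual`, `_lift` and `derivative` are assumptions.

```python
import math


class Dual:
    """A value paired with its tangent (derivative along the input direction)."""
    def __init__(self, value, tangent):
        self.value, self.tangent = value, tangent

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.tangent + other.tangent)

    def __mul__(self, other):
        other = _lift(other)
        # Product rule, applied as the computation runs forward.
        return Dual(self.value * other.value,
                    self.tangent * other.value + self.value * other.tangent)

    __radd__, __rmul__ = __add__, __mul__   # both operations commute


def _lift(v):
    """Treat plain numbers as constants with zero tangent."""
    return v if isinstance(v, Dual) else Dual(v, 0.0)


def sin(x):
    if isinstance(x, Dual):
        return Dual(math.sin(x.value), math.cos(x.value) * x.tangent)
    return math.sin(x)


def derivative(f):
    def df(x):
        out = f(Dual(x, 1.0))               # seed dx/dx = 1
        return out.tangent if isinstance(out, Dual) else 0.0
    return df


d = derivative(lambda x: sin(x) * x + 3.0 * x)(2.0)
```

Unlike the backward sweep, nothing is stored for later: each primitive pushes its tangent forward as soon as it runs.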