Implementing autograd
Slides by Matthew Johnson
Autograd’s implementation
github.com/hips/autograd
Dougal Maclaurin, David Duvenaud, Matt Johnson
differentiates native Python code
handles most of Numpy + Scipy
loops, branching,
autodiff implementation options
ingredients:
numpy.sum → primitive → autograd.numpy.sum
Node ã: value: a, function: F, parents: [x]
unbox: ã → a
b = numpy.sum(a)
box: b → b̃
Node b̃: value: b, function: anp.sum, parents: [ã]
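The boxing and unboxing step can be sketched in a few lines. This is a minimal illustration, not autograd's actual source: the `Node` fields follow the slide (value, function, parents), while `anp_sum` and `a_boxed` are hypothetical names used only here.

```python
import numpy as np

class Node:
    """Boxed value that remembers how it was produced."""
    def __init__(self, value, function, parents):
        self.value = value        # the raw (unboxed) result
        self.function = function  # the primitive that produced it
        self.parents = parents    # boxed inputs it was computed from

def primitive(raw_fun):
    """Wrap raw_fun so it unboxes Node arguments and boxes its result."""
    def wrapped(*args):
        parents = [a for a in args if isinstance(a, Node)]
        raw_args = [a.value if isinstance(a, Node) else a for a in args]
        result = raw_fun(*raw_args)
        if not parents:
            return result  # nothing is being traced: behave like raw_fun
        return Node(result, wrapped, parents)
    wrapped.__name__ = raw_fun.__name__
    return wrapped

anp_sum = primitive(np.sum)  # hypothetical stand-in for autograd.numpy.sum

a_boxed = Node(np.array([1.0, 2.0, 3.0]), None, [])  # plays the role of ã
b_boxed = anp_sum(a_boxed)                           # plays the role of b̃
```

Calling `anp_sum` on a plain array skips the boxing entirely, so in this sketch wrapped functions stay usable outside of differentiation.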
start_node: x
a = A(x)
b = B(a)
c = C(b)
end_node: y = D(c)
No control flow!
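A small sketch of why the recorded graph has no control flow: the Python loop and branch run on concrete values, and only the primitive calls that actually execute are recorded. The `Node` and `primitive` definitions are simplified assumptions, `double`, `f` and `trace_names` are hypothetical helpers, and the branch reads `x.value` directly, whereas a real box type would overload comparisons.

```python
import math

class Node:
    def __init__(self, value, function, parents):
        self.value, self.function, self.parents = value, function, parents

def primitive(raw_fun):
    def wrapped(*args):
        parents = [a for a in args if isinstance(a, Node)]
        raw = [a.value if isinstance(a, Node) else a for a in args]
        out = raw_fun(*raw)
        return Node(out, wrapped, parents) if parents else out
    wrapped.__name__ = raw_fun.__name__
    return wrapped

sin = primitive(math.sin)

@primitive
def double(x):
    return 2.0 * x

def forward_pass(fun, x):
    """Run fun on a boxed input and return both ends of the trace."""
    start_node = Node(x, None, [])
    end_node = fun(start_node)
    return start_node, end_node

def f(x):
    for _ in range(3):        # the loop is unrolled in the trace
        if x.value > 0.5:     # the branch is decided by the concrete value
            x = sin(x)
        else:
            x = double(x)
    return x

def trace_names(end_node):
    """Walk parent links back to the start node, collecting primitive names."""
    names, node = [], end_node
    while node.function is not None:
        names.append(node.function.__name__)
        node = node.parents[0]
    return names[::-1]

start, end = forward_pass(f, 0.3)
names = trace_names(end)  # a straight chain of three primitive calls
```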
x → a = A(x)
given $\frac{\partial y}{\partial a}$: $\frac{\partial y}{\partial x} = \frac{\partial y}{\partial a} \cdot \frac{\partial a}{\partial x} = \frac{\partial y}{\partial a} \cdot A'(x)$
vector-Jacobian product
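A vector-Jacobian product can be written without ever forming the Jacobian. The sketch below registers one VJP maker per function in a plain dictionary. The name `defvjp` comes from the slides, but this one-argument signature and the `vjp_makers` registry are simplifying assumptions, not autograd's actual interface.

```python
import numpy as np

vjp_makers = {}  # hypothetical registry: function -> VJP maker

def defvjp(fun, vjp_maker):
    """Associate fun with a maker (ans, x) -> (g -> g times Jacobian)."""
    vjp_makers[fun] = vjp_maker

# tanh is elementwise, so its Jacobian is diagonal: diag(1 - tanh(x)^2)
defvjp(np.tanh, lambda ans, x: lambda g: g * (1.0 - ans ** 2))
# sum maps a vector to a scalar, so its Jacobian is a row of ones
defvjp(np.sum, lambda ans, x: lambda g: g * np.ones_like(x))

x = np.array([0.1, -0.4, 2.0])
a = np.tanh(x)
g = np.array([1.0, 2.0, 3.0])        # stands in for dy/da
dy_dx = vjp_makers[np.tanh](a, x)(g)

# Same result via the explicit Jacobian, which is what the VJP avoids building
J = np.diag(1.0 - np.tanh(x) ** 2)
explicit = g @ J
```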
start_node: x
a = A(x), b = B(a), c = C(b)
end_node: y = D(c)
backward pass, from end_node back to start_node:
$\frac{\partial y}{\partial y} = 1 \to \frac{\partial y}{\partial c} \to \frac{\partial y}{\partial b} \to \frac{\partial y}{\partial a} \to \frac{\partial y}{\partial x}$
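The walk from end_node back to start_node can be sketched for the chain x → a → b → c → y. This version assumes single-argument primitives and a graph with no fan-out, so it is a simplification of what a general `backward_pass` needs. The concrete choices for A, B, C and D are hypothetical, picked so that y = sin(x)² and the result can be checked against sin(2x).

```python
import math

class Node:
    def __init__(self, value, function, parents):
        self.value, self.function, self.parents = value, function, parents

vjp_makers = {}  # hypothetical registry: primitive -> VJP maker

def primitive(raw_fun):
    def wrapped(x):
        if not isinstance(x, Node):
            return raw_fun(x)
        return Node(raw_fun(x.value), wrapped, [x])
    return wrapped

def defvjp(fun, vjp_maker):
    vjp_makers[fun] = vjp_maker

A = primitive(math.sin)
B = primitive(math.exp)
C = primitive(math.log)
D = primitive(lambda c: c * c)

defvjp(A, lambda ans, x: lambda g: g * math.cos(x))
defvjp(B, lambda ans, x: lambda g: g * ans)
defvjp(C, lambda ans, x: lambda g: g / x)
defvjp(D, lambda ans, x: lambda g: g * 2.0 * x)

def backward_pass(g, end_node, start_node):
    """Apply each node's VJP in turn, walking parent links (chain only)."""
    node = end_node
    while node is not start_node:
        (parent,) = node.parents
        g = vjp_makers[node.function](node.value, parent.value)(g)
        node = parent
    return g

def make_vjp(fun, x):
    start_node = Node(x, None, [])
    end_node = fun(start_node)
    return (lambda g: backward_pass(g, end_node, start_node)), end_node.value

def grad(fun):
    def gradfun(x):
        vjp, _ = make_vjp(fun, x)
        return vjp(1.0)  # seed the pass with dy/dy = 1
    return gradfun

def f(x):
    return D(C(B(A(x))))

dy_dx = grad(f)(0.7)  # analytically sin(2 * 0.7)
```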
higher-order autodiff just works: the backward pass can itself be traced
start_node: x
a = A(x), b = B(a), c = C(b)
end_node: y = D(c)
backward pass (traced as it runs): $\frac{\partial y}{\partial y} = 1 \to \frac{\partial y}{\partial c} \to \frac{\partial y}{\partial b} \to \frac{\partial y}{\partial a} \to \frac{\partial y}{\partial x}$
ingredients:
Node, primitive, forward_pass
defvjp
backward_pass, make_vjp, grad
what’s the point? easy to extend!