SLIDE 17 Automatic Differentiation
Automatic differentiation software: Theano, TensorFlow, PyTorch
- Only need to program the function g(x, y, w); all derivatives w.r.t. all entries in w are computed automatically (see the sketch below)
- This is typically done by caching info during the forward computation pass of f, and then doing a backward pass = “backpropagation”
Autodiff / backpropagation can often be done at computational cost comparable to the forward pass
- Know that this exists
- How this is done is outside the scope of CS 4100
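Since PyTorch is one of the packages listed above, here is a minimal sketch of the idea, assuming an arbitrary made-up scalar function g(x, y, w); the values and the particular choice of g are illustrative only, not from the slides.

```python
# Program only the forward function g(x, y, w); PyTorch's backward pass
# then fills in the partial derivatives with respect to every entry of w.
import torch

x = torch.tensor([1.0, 2.0])                        # input (illustrative)
y = torch.tensor(1.0)                               # label (illustrative)
w = torch.tensor([0.5, -0.3], requires_grad=True)   # parameters to differentiate w.r.t.

g = y * torch.log(torch.sigmoid(w @ x))             # forward pass: info cached for backprop
g.backward()                                        # backward pass = "backpropagation"
print(w.grad)                                       # vector of partial derivatives dg/dw
```

The backward call costs roughly as much as the forward pass, which is the point made above.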
Summary of Key Ideas
Optimize probability of label given input
Continuous optimization
Gradient ascent
- Compute the gradient (= steepest uphill direction) – just a vector of partial derivatives
- Take a step in the gradient direction
- Repeat (until held-out data accuracy starts to drop = “early stopping”); a sketch follows below
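A minimal sketch of this loop for binary logistic regression, assuming NumPy; the helper functions, array shapes, and hyperparameters are illustrative assumptions, not part of the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_ll(w, X, y):
    # Gradient of sum_i log P(y_i | x_i; w) for logistic regression: X^T (y - sigma(Xw))
    return X.T @ (y - sigmoid(X @ w))

def accuracy(w, X, y):
    return np.mean((sigmoid(X @ w) > 0.5) == y)

def gradient_ascent(w, X_tr, y_tr, X_ho, y_ho, lr=0.1, max_steps=1000):
    best_w, best_acc = w.copy(), accuracy(w, X_ho, y_ho)
    for _ in range(max_steps):
        w = w + lr * grad_ll(w, X_tr, y_tr)    # step in the steepest-uphill direction
        acc = accuracy(w, X_ho, y_ho)
        if acc < best_acc:                     # held-out accuracy drops -> early stopping
            return best_w
        best_w, best_acc = w, acc
    return best_w
```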
Deep neural nets
- Last layer: still logistic regression
- Now also many more layers before this last layer
- = computing the features
- Features are learned rather than hand-designed (see the sketch below)
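A sketch of this view in PyTorch (layer sizes are arbitrary assumptions): the earlier layers compute learned features, and the final linear layer followed by a sigmoid is exactly logistic regression on those features.

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),    # earlier layers: compute learned features
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),  # last layer: logistic regression on those features
)
```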
Universal function approximation theorem
- If neural net is large enough
- Then neural net can represent any continuous mapping from input to output with arbitrary accuracy
- But remember: need to avoid overfitting / memorizing the training data → early stopping!
Automatic differentiation gives the derivatives efficiently (how? = outside of scope of CS 4100)
$$\max_w \; ll(w) = \max_w \sum_i \log P(y^{(i)} \mid x^{(i)}; w)$$
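For concreteness, a sketch of this objective for binary logistic regression, with the gradient obtained by automatic differentiation rather than derived by hand; the data is randomly generated and purely illustrative (PyTorch assumed).

```python
import torch

X = torch.randn(100, 3)                  # made-up inputs x^(i)
y = (torch.rand(100) > 0.5).float()      # made-up labels y^(i)
w = torch.zeros(3, requires_grad=True)

p = torch.sigmoid(X @ w)                                      # P(y=1 | x; w)
ll = (y * torch.log(p) + (1 - y) * torch.log(1 - p)).sum()    # sum_i log P(y^(i) | x^(i); w)
ll.backward()                                                 # autodiff supplies d ll / dw
print(w.grad)                            # gradient used by the ascent step above
```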