
autograd

January 31, 2019

1 Automatic Differentiation

1.1 Import autograd and create a variable

In [1]: from mxnet import autograd, nd
        x = nd.arange(4).reshape((4, 1))
        print(x)

[[0.]
 [1.]
 [2.]
 [3.]]
<NDArray 4x1 @cpu(0)>

1.2 Attach gradient to x

  • It allocates memory to store its gradient, which has the same shape as x.
  • It also tells the system that we need to compute its gradient.

In [3]: x.attach_grad()
        x.grad

Out[3]:
[[0.]
 [0.]
 [0.]
 [0.]]
<NDArray 4x1 @cpu(0)>

1.3 Forward

Now compute y = 2x⊤x by placing the code inside a with autograd.record(): block. MXNet will build the corresponding computation graph.


In [4]: with autograd.record():
            y = 2 * nd.dot(x.T, x)
        y

Out[4]:
[[28.]]
<NDArray 1x1 @cpu(0)>

1.4 Backward

In [5]: y.backward()

1.5 Get the gradient

Given y = 2x⊤x, we know ∂y/∂x = 4x. Now verify the result:

In [6]: print((x.grad - 4 * x).norm().asscalar() == 0)
        print(x.grad)

True
[[ 0.]
 [ 4.]
 [ 8.]
 [12.]]
<NDArray 4x1 @cpu(0)>
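The analytic gradient can also be cross-checked with central finite differences in plain NumPy. This is a sketch that does not require MXNet; the function name y here is just a stand-in for the recorded expression above:

```python
import numpy as np

def y(x):
    # y = 2 * x^T x, a scalar function of the column vector x
    return 2.0 * float(x.T @ x)

x = np.arange(4, dtype=np.float64).reshape(4, 1)

# Central finite differences: grad_i ≈ (y(x + eps*e_i) - y(x - eps*e_i)) / (2*eps)
eps = 1e-5
grad = np.zeros_like(x)
for i in range(x.size):
    e = np.zeros_like(x)
    e.flat[i] = eps
    grad.flat[i] = (y(x + e) - y(x - e)) / (2 * eps)

print(np.allclose(grad, 4 * x))  # True
```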

1.6 Backward on non-scalar

For a non-scalar y, y.backward() is equivalent to y.sum().backward().

In [7]: with autograd.record():
            y = 2 * x * x
        print(y.shape)
        y.backward()
        print(x.grad)

(4, 1)
[[ 0.]
 [ 4.]
 [ 8.]
 [12.]]
<NDArray 4x1 @cpu(0)>
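That equivalence can be checked independently of MXNet: differentiating the sum of the non-scalar output y = 2 * x * x should give 4x elementwise. A NumPy finite-difference sketch:

```python
import numpy as np

x = np.arange(4, dtype=np.float64).reshape(4, 1)

def loss(x):
    # Sum-reduce the non-scalar output, as y.backward() implicitly does
    return float((2 * x * x).sum())

eps = 1e-5
grad = np.zeros_like(x)
for i in range(x.size):
    e = np.zeros_like(x)
    e.flat[i] = eps
    grad.flat[i] = (loss(x + e) - loss(x - e)) / (2 * eps)

print(np.allclose(grad, 4 * x))  # True
```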


1.7 Training mode and prediction mode

The record scope also switches the mode, on the assumption that gradients are only required for training. This matters because some layers, e.g. batch normalization, behave differently in the training and prediction modes.

In [7]: print(autograd.is_training())
        with autograd.record():
            print(autograd.is_training())

False
True
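To illustrate why the mode matters, here is a minimal NumPy sketch (not MXNet's implementation) of inverted dropout, another layer whose behavior depends on a training flag:

```python
import numpy as np

def dropout(x, p=0.5, training=True):
    """Inverted dropout: during training, keep each element with
    probability 1 - p and rescale so the expected value is unchanged;
    at prediction time, pass the input through untouched."""
    if not training:
        return x
    mask = (np.random.rand(*x.shape) >= p).astype(x.dtype)
    return mask * x / (1 - p)

x = np.ones((2, 3))
print(dropout(x, training=False))  # identical to x at prediction time
```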

1.8 Computing the gradient of Python control flow

Autograd also works with Python functions and control flow.

In [10]: def f(a):
             b = a * 2
             while b.norm().asscalar() < 1000:
                 b = b * 2
             if b.sum().asscalar() > 0:
                 c = b
             else:
                 c = 100 * b
             return c
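Since f only ever scales its input by constants, it is piecewise linear, and that claim can be checked in plain Python without MXNet. In this sketch, abs() stands in for .norm().asscalar() on a 1-element NDArray:

```python
def f(a):
    # Scalar mirror of the notebook's f
    b = a * 2
    while abs(b) < 1000:
        b = b * 2
    return b if b > 0 else 100 * b

a = 1.7
g = f(a) / a  # slope of the linear piece containing a
eps = 1e-6
num_grad = (f(a + eps) - f(a - eps)) / (2 * eps)
print(g, num_grad)  # both 1024.0 for this a
```

For a = 1.7, the loop doubles b ten times before exceeding 1000, so f(a) = 1024·a on this piece and both the analytic slope g and the numerical derivative agree.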

1.9 Function behavior depends on its inputs

In [11]: a = nd.random.normal(shape=1)
         a.attach_grad()
         with autograd.record():
             d = f(a)
         d.backward()

1.10 Verify the results

f is piecewise linear in its input a. There exists a scalar g such that f(a) = g·a and ∂f/∂a = g. Verify the result:

In [12]: print(a.grad == (d / a))

[1.]
<NDArray 1 @cpu(0)>


1.11 Head gradients and the chain rule

We can break the chain rule manually. Assume ∂z/∂x = (∂z/∂y)(∂y/∂x). Calling y.backward() will only compute ∂y/∂x.

To get ∂z/∂x, we can first compute ∂z/∂y, and then pass it as the head gradient to y.backward.

In [11]: with autograd.record():
             y = x * 2
         y.attach_grad()
         with autograd.record():
             z = y * x
         z.backward()  # y.grad = \partial z / \partial y
         y.backward(y.grad)  # x.grad = \partial z / \partial x
         x.grad == 2*x

Out[11]:
[[1.]
 [1.]
 [1.]
 [1.]]
<NDArray 4x1 @cpu(0)>
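The bookkeeping behind this head-gradient trick can be sketched by hand in NumPy. With y treated as its own variable (as y.attach_grad() does above), ∂z/∂y = x for z = y·x elementwise, and multiplying that head gradient by ∂y/∂x = 2 recovers x.grad = 2x:

```python
import numpy as np

x = np.arange(4, dtype=np.float64).reshape(4, 1)
y = 2 * x                    # first stage
z = y * x                    # second stage, with y held fixed w.r.t. x

head = x.copy()              # dz/dy for z = y * x (elementwise)
dy_dx = 2.0                  # dy/dx for y = 2 * x
x_grad = head * dy_dx        # chain rule: dz/dx along the y path

print(np.allclose(x_grad, 2 * x))  # True
```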