
SLIDE 1

Some Notes on Automatic Differentiation

Chih-Jen Lin

National Taiwan University

Last updated: May 25, 2020

Chih-Jen Lin (National Taiwan Univ.) 1 / 13

SLIDE 2

Here we give some notes on the slides at https://www.cs.toronto.edu/~rgrosse/courses/csc321_2018/slides/lec10.pdf


SLIDE 3

P6 I

The expression on the right means

∂L/∂L = 1
∂L/∂y = y − t
∂L/∂z = (∂L/∂y) σ′(z)
∂L/∂w = (∂L/∂z) · x
∂L/∂b = (∂L/∂z) · 1
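These expressions can be traced in code. A minimal sketch, assuming the one-neuron setting of the referenced lecture slide, z = wx + b, y = σ(z), L = ½(y − t)² (these model details come from that slide, not from the notes above); the backward pass computes the derivatives in exactly the order listed:

```python
import math

def backprop(w, b, x, t):
    """Reverse-mode sweep for L = 0.5*(y - t)**2, y = sigmoid(z), z = w*x + b."""
    # Forward pass
    z = w * x + b
    y = 1.0 / (1.0 + math.exp(-z))   # sigmoid
    L = 0.5 * (y - t) ** 2
    # Backward pass: each line matches one expression on the slide
    dL_dL = 1.0
    dL_dy = dL_dL * (y - t)          # dL/dy = y - t
    dL_dz = dL_dy * y * (1.0 - y)    # sigma'(z) = y*(1-y)
    dL_dw = dL_dz * x                # dL/dw = dL/dz * x
    dL_db = dL_dz * 1.0              # dL/db = dL/dz * 1
    return L, dL_dw, dL_db

# Finite-difference check that the hand-written backward pass is right
w, b, x, t = 0.5, -0.3, 1.2, 1.0
eps = 1e-6
_, dw, db = backprop(w, b, x, t)
num_dw = (backprop(w + eps, b, x, t)[0] - backprop(w - eps, b, x, t)[0]) / (2 * eps)
num_db = (backprop(w, b + eps, x, t)[0] - backprop(w, b - eps, x, t)[0]) / (2 * eps)
```

The finite-difference check at the end is there only to confirm the analytic derivatives.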


SLIDE 4

P6 II

“transform the left-hand side into the right-hand side”: we want to calculate ∂L/∂w, and it can be replaced by (∂L/∂z) · x. What we have discussed is the so-called reverse mode of automatic differentiation.


SLIDE 5

P6 III

We notice that in every expression we deal with ∂L/∂(something). Note that the backward setting used to calculate the gradient of neural networks is a special case of reverse-mode automatic differentiation. There is also a forward mode of automatic differentiation, in which every node is ∂(something)/∂(variables).
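To make the contrast concrete, here is a minimal forward-mode sketch using dual numbers (the `Dual` class and `dexp` are illustrative names, not from the slides): each value carries ∂(itself)/∂(one chosen input) alongside it, instead of ∂L/∂(itself) as in the reverse mode.

```python
import math

class Dual:
    """Minimal forward-mode AD: a (value, d value / d chosen input) pair."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule, propagated forward with the value
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def dexp(d):
    """Forward-mode rule for exp: (exp x, exp(x) * x_dot)."""
    y = math.exp(d.val)
    return Dual(y, y * d.dot)

# d/dx of f(x) = x*x + exp(x) at x = 1: seed the input with dot = 1
x = Dual(1.0, 1.0)
f = x * x + dexp(x)
# f.dot is 2*1 + exp(1), i.e. about 4.718
```

One forward sweep yields the derivative with respect to one input; reverse mode instead yields derivatives of one output with respect to all inputs in a single backward sweep.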


SLIDE 6

P6 IV

We will probably see an example of the forward mode later, when discussing Newton methods.


SLIDE 7

P13 I

Things shown on this slide are more general. Even for scalars we do the same thing. We will see this on p15.


SLIDE 8

P15 I

We want to calculate

∂L/∂x = (∂L/∂y)(∂y/∂x)

So we need ∂L/∂y and ∂y/∂x. Further, ∂y/∂x may involve y or x.


SLIDE 9

P15 II

That’s why it says that to get ∂L/∂x we need ∂L/∂y, x, and y.


SLIDE 10

P15 III

Example: y = exp(x)

∂L/∂x = (∂L/∂y)(∂y/∂x) = (∂L/∂y) exp(x) = (∂L/∂y) y


SLIDE 11

P15 IV

For this case we need ∂L/∂y and y.
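As a hypothetical sketch of this point (the function names here are made up for illustration, not Autograd's API), the backward step for exp can be written so that it touches only the incoming gradient ∂L/∂y and the stored output y, never x:

```python
import math

def exp_forward(x):
    """Forward step: compute y = exp(x) and keep y for the backward step."""
    y = math.exp(x)
    return y

def exp_vjp(dL_dy, y):
    """Backward step: dL/dx = dL/dy * dy/dx = dL/dy * exp(x) = dL/dy * y.
    Note x itself is not needed; the saved output y suffices."""
    return dL_dy * y

x = 0.7
y = exp_forward(x)
dL_dy = 2.0                   # pretend gradient flowing in from above
dL_dx = exp_vjp(dL_dy, y)     # equals 2.0 * exp(0.7), with no second call to exp
```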


SLIDE 12

P17 I

The lines

    for argnum, parent in zip(argnums, node.parents):
        vjp = primitive_vjps[fun][argnum]
        parent_grad = vjp(outgrad, value, *args, **kwargs)
        outgrads[parent] = add_outgrads(outgrads.get(parent), parent_grad)

roughly correspond to

∂L/∂xj = Σi (∂L/∂yi)(∂yi/∂xj)

on p13
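The sum over i is what the accumulation into outgrads[parent] implements. A toy illustration (this is not Autograd's actual code; `add_outgrads_toy` is a stand-in name): when x feeds two operations, its gradient is accumulated one consumer at a time.

```python
import math

def add_outgrads_toy(prev, new):
    """Accumulate one term of the chain-rule sum into the running gradient."""
    return new if prev is None else prev + new

# L = y1 + y2, with y1 = x*x and y2 = exp(x), so
# dL/dx = sum_i (dL/dyi)(dyi/dx), built up term by term
x = 0.3
y1, y2 = x * x, math.exp(x)
dL_dy1, dL_dy2 = 1.0, 1.0            # since L = y1 + y2

grad_x = None
grad_x = add_outgrads_toy(grad_x, dL_dy1 * 2 * x)        # term from y1
grad_x = add_outgrads_toy(grad_x, dL_dy2 * math.exp(x))  # term from y2
# grad_x == 2*x + exp(x)
```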


SLIDE 13

P20 I

We do not discuss pages 20-23.
