How to compute a derivative

Computing derivatives of complicated functions
• How do you compute the derivatives in an LSTM or GRU cell?
• How do you compute derivatives of complicated functions in general?
• These slides give some hints
• The slides assume vector functions and vector activations
• Scalar versions of the equations are also given to provide intuition
• The two sets are almost identical, except that with vector functions:
  • The notation becomes uglier and less intuitive
  • We must ensure that the dimensions come out right
• Compare the vector versions of the equations to their scalar counterparts for better intuition, if needed

First: Some notation and conventions
• We will refer to the derivative of a scalar $L$ with respect to $x$ as $\nabla_x L$
  • Regardless of whether the derivative is a scalar, vector, matrix or tensor
• The derivative of a scalar $L$ w.r.t. an $N \times 1$ column vector $x$ is a $1 \times N$ row vector $\nabla_x L$
• The derivative of a scalar $L$ w.r.t. an $N \times M$ matrix $W$ is an $M \times N$ matrix $\nabla_W L$
  • Remember our gradient update rule: $W = W - \eta (\nabla_W L)^\top$
• The derivative of an $N \times 1$ vector $y$ w.r.t. an $M \times 1$ vector $x$ is an $N \times M$ matrix $J_x(y)$
  • The Jacobian
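The shape convention can be checked numerically. The sketch below assumes a made-up loss $L(x) = \sum_i x_i^2$ and a made-up step size; the helper name `numerical_row_gradient` is hypothetical. It only illustrates that the derivative of a scalar w.r.t. a column vector is stored as a row vector and must be transposed before it is used in an update.

```python
import numpy as np

def numerical_row_gradient(f, x, eps=1e-6):
    """Central-difference derivative of scalar f w.r.t. an N x 1 column vector,
    returned as a 1 x N row vector (the convention used in these slides)."""
    n = x.shape[0]
    grad = np.zeros((1, n))
    for i in range(n):
        step = np.zeros_like(x)
        step[i, 0] = eps
        grad[0, i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

def loss(v):
    # Hypothetical scalar loss, chosen only for illustration: L(x) = sum_i x_i^2
    return float(np.sum(v ** 2))

x = np.array([[1.0], [2.0], [3.0]])        # 3 x 1 column vector
grad_x = numerical_row_gradient(loss, x)   # 1 x 3 row vector, close to 2 * x.T
eta = 0.1                                  # hypothetical step size
x_new = x - eta * grad_x.T                 # transpose so the shapes match x
```

The same transpose appears in the update rule for a weight matrix, since the derivative w.r.t. an $N \times M$ matrix is $M \times N$.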

Rules: 1 (scalar)
$z = Wx$, so $\nabla_x L = (\nabla_z L) W$ and $\nabla_W L = x (\nabla_z L)$
• All terms are scalars
• $\nabla_z L$ is known

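Rules: 1 (vector)
$z = Wx$, so $\nabla_x L = (\nabla_z L) W$ and $\nabla_W L = x (\nabla_z L)$
• $z$ is an $N \times 1$ vector
• $x$ is an $M \times 1$ vector
• $W$ is an $N \times M$ matrix
• $L$ is a function of $z$
• $\nabla_z L$ is known (and is a $1 \times N$ vector)
Please verify that the dimensions match!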
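A minimal numerical sketch of this rule, assuming it refers to the linear map $z = Wx$. The loss $L = c z$ with a fixed row vector `c` is a made-up choice so that $\nabla_z L = c$ is known; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 3, 4
W = rng.standard_normal((N, M))   # N x M matrix
x = rng.standard_normal((M, 1))   # M x 1 vector
c = rng.standard_normal((1, N))   # defines the hypothetical loss L = c z

z = W @ x                         # N x 1
grad_z = c                        # dL/dz, a 1 x N row vector

grad_x = grad_z @ W               # (1 x N)(N x M) = 1 x M
grad_W = x @ grad_z               # (M x 1)(1 x N) = M x N, transposed shape of W

def loss(W_, x_):
    return (c @ (W_ @ x_)).item()

# Finite-difference check of both derivatives.
eps = 1e-6
num_grad_x = np.zeros((1, M))
for j in range(M):
    d = np.zeros((M, 1))
    d[j, 0] = eps
    num_grad_x[0, j] = (loss(W, x + d) - loss(W, x - d)) / (2 * eps)

num_grad_W = np.zeros((M, N))
for i in range(N):
    for j in range(M):
        D = np.zeros((N, M))
        D[i, j] = eps
        # Entry (j, i) of the derivative corresponds to entry (i, j) of W.
        num_grad_W[j, i] = (loss(W + D, x) - loss(W - D, x)) / (2 * eps)
```

Both `grad_x` and `grad_W` should agree with their finite-difference counterparts up to rounding.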

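Rules: 2 (vector, Schur multiply)
$z = x \circ y$, so $\nabla_x L = (\nabla_z L) \circ y^\top$ and $\nabla_y L = (\nabla_z L) \circ x^\top$
• $x$, $y$ and $z$ are all $N \times 1$ vectors
• "$\circ$" represents component-wise multiplication
• $\nabla_z L$ is known (and is a $1 \times N$ vector)
Please verify that the dimensions match!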
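A small sketch of the component-wise rule, assuming $z = x \circ y$ and an arbitrary known $\nabla_z L$ (the numbers are made up). It also shows that the result matches the Jacobian form, since the Jacobian of $z$ w.r.t. $x$ is the diagonal matrix built from $y$.

```python
import numpy as np

x = np.array([[1.0], [2.0], [3.0]])      # N x 1
y = np.array([[4.0], [5.0], [6.0]])      # N x 1
z = x * y                                # component-wise (Schur) product

grad_z = np.array([[0.1, 0.2, 0.3]])     # assumed known, 1 x N

grad_x = grad_z * y.T                    # 1 x N
grad_y = grad_z * x.T                    # 1 x N

# Equivalent Jacobian form: dz/dx = diag(y)
grad_x_jacobian = grad_z @ np.diag(y.ravel())
```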

Rules: 3 (scalar)
$z = x + y$, so $\nabla_x L = \nabla_z L$ and $\nabla_y L = \nabla_z L$
• All terms are scalars
• $\nabla_z L$ is known

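Rules: 3 (vector)
$z = x + y$, so $\nabla_x L = \nabla_z L$ and $\nabla_y L = \nabla_z L$
• $x$, $y$ and $z$ are all $N \times 1$ vectors
• $\nabla_z L$ is known (and is a $1 \times N$ vector)
Please verify that the dimensions match!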

Rules: 4 (scalar)
$z = g(x)$, so $\nabla_x L = (\nabla_z L)\, g'(x)$
• $x$ and $z$ are scalars
• $\nabla_z L$ is known

Rules: 4 (vector)
$z = g(x)$, so $\nabla_x L = (\nabla_z L)\, J_x(g)$
• $x$ and $z$ are vectors
• $\nabla_z L$ is known (and is a row vector)
• $J_x(g)$ is the Jacobian of $g(x)$ with respect to $x$
  • May be a diagonal matrix
Please verify that the dimensions match!

Rules: 4b (vector), component-wise multiply notation
$z = g(x)$, so $\nabla_x L = (\nabla_z L) \circ g'(x)^\top$
• $x$ and $z$ are vectors
• $\nabla_z L$ is known (and is a row vector)
• $g(x)$ is actually a vector of component-wise functions
  • i.e. $z_i = g(x_i)$
• $g'(x)$ is a column vector consisting of the derivatives of the individual components of $z$ w.r.t. the corresponding components of $x$
Please verify that the dimensions match!
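The Jacobian form (Rule 4) and the component-wise form (Rule 4b) give the same result for a component-wise activation. The sketch below uses the sigmoid as an assumed example of $g$; the input values and the known $\nabla_z L$ are made up.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

x = np.array([[-1.0], [0.0], [2.0]])      # N x 1
z = sigmoid(x)                            # z_i = g(x_i)

grad_z = np.array([[0.5, -1.0, 2.0]])     # assumed known, 1 x N

g_prime = z * (1.0 - z)                   # column vector of dz_i/dx_i
J = np.diag(g_prime.ravel())              # Jacobian is diagonal for component-wise g

grad_x_jacobian = grad_z @ J              # Rule 4:  (1 x N)(N x N)
grad_x_componentwise = grad_z * g_prime.T # Rule 4b: Schur product with g'(x)^T
```

The component-wise form avoids building the $N \times N$ Jacobian, which is why it is usually preferred for activations.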

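Rule 5: Addition of derivatives
• Given two variables $y = g(x)$ and $z = h(x)$
• And given $\nabla_y L$ and $\nabla_z L$
• we get $\nabla_x L = (\nabla_y L)\, g'(x) + (\nabla_z L)\, h'(x)$
• The rule also extends to vector derivatives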
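A scalar sketch of the addition rule, under the assumption that it describes a variable $x$ feeding two later variables. The functions $y = x^2$, $z = \sin x$ and the loss $L = yz$ are made up for illustration; the derivative of $x$ is accumulated with "+=", once per path.

```python
import math

x = 0.7
y = x ** 2            # y = g(x)
z = math.sin(x)       # z = h(x)
L = y * z             # hypothetical loss depending on both y and z

grad_y = z            # dL/dy
grad_z = y            # dL/dz

grad_x = 0.0
grad_x += grad_y * (2 * x)        # contribution through y
grad_x += grad_z * math.cos(x)    # contribution through z

# Direct derivative of L = x^2 sin(x), for comparison.
expected = 2 * x * math.sin(x) + x ** 2 * math.cos(x)
```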

Computing derivatives of complex functions
• We are now prepared to compute very complex derivatives
• Procedure:
  • Express the computation as a series of computations of intermediate values
  • Each computation must comprise either a unary or a binary relation
    • Unary relation: RHS has one argument, e.g. $y = g(x)$
    • Binary relation: RHS has two arguments, e.g. $z = x + y$ or $z = xy$
  • Work your way backward through the derivatives of the simple relations
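The procedure can be sketched on a small scalar example. The function $L = \tfrac{1}{2}(\tanh(wx + b) - t)^2$ is a made-up choice, and the intermediate names `z1`, `z2`, `z3` are illustrative. Each forward line is a unary or binary relation, and the backward pass visits them in reverse order.

```python
import math

def forward_backward(w, x, b, t):
    # Forward: one unary or binary relation per line.
    z1 = w * x             # binary
    z2 = z1 + b            # binary
    y = math.tanh(z2)      # unary
    z3 = y - t             # binary (t is a fixed target)
    L = 0.5 * z3 ** 2      # unary

    # Backward: same relations, last to first.
    d_z3 = z3                      # L = 0.5 * z3^2
    d_y = d_z3                     # z3 = y - t
    d_z2 = d_y * (1.0 - y ** 2)    # y = tanh(z2)
    d_z1 = d_z2                    # z2 = z1 + b
    d_b = d_z2
    d_w = d_z1 * x                 # z1 = w * x
    d_x = d_z1 * w
    return L, {"w": d_w, "x": d_x, "b": d_b}

loss_value, grads = forward_backward(w=0.5, x=-1.2, b=0.3, t=0.1)
```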

Example: LSTM
• Full set of LSTM equations, Equations 1-6, in the order in which they must be computed [equations not preserved in this transcript]
• It's actually much cleaner to separate the individual components, so let's do that first
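One common peephole-style LSTM formulation with six equations is sketched below. It is consistent with the quantities named later in these slides ($x_t$, $h_{t-1}$, $C_{t-1}$, $h_t$, $C_t$), but it is an assumption rather than the slide's own equations, and the weight and bias names are illustrative.

```latex
\begin{align}
f_t &= \sigma\left(W_{fC} C_{t-1} + W_{fh} h_{t-1} + W_{fx} x_t + b_f\right) \\
i_t &= \sigma\left(W_{iC} C_{t-1} + W_{ih} h_{t-1} + W_{ix} x_t + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_{Ch} h_{t-1} + W_{Cx} x_t + b_C\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tilde{C}_t \\
o_t &= \sigma\left(W_{oC} C_t + W_{oh} h_{t-1} + W_{ox} x_t + b_o\right) \\
h_t &= o_t \circ \tanh(C_t)
\end{align}
```

Under this assumed form, $C_{t-1}$ appears in three equations, so its derivative receives several contributions during the backward pass.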

LSTM
• This is the full set of equations in the order in which they must be computed
• Let's rewrite these in terms of unary and binary operations
• The rewritten computation is Equations 1-14, 15-22 and 23-31 [equations not preserved in this transcript]

LSTM forward
• The full forward computation of the LSTM can be performed by computing Equations 1-31 in sequence
• Every one of these equations is unary or binary
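A forward pass written only with unary and binary operations might look like the sketch below. It assumes a peephole-style LSTM; the parameter names, the helper names and the exact ordering of the 31 steps are assumptions, since the slide's own equations are not available here.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_params(n_in, n_hid, seed=0):
    """Random parameters under hypothetical names (W<gate><input>, b<gate>)."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in "fio":                                   # forget, input, output gates
        p["W" + g + "C"] = rng.standard_normal((n_hid, n_hid))  # peephole (assumed)
        p["W" + g + "h"] = rng.standard_normal((n_hid, n_hid))
        p["W" + g + "x"] = rng.standard_normal((n_hid, n_in))
        p["b" + g] = rng.standard_normal((n_hid, 1))
    p["WCh"] = rng.standard_normal((n_hid, n_hid))    # candidate cell value
    p["WCx"] = rng.standard_normal((n_hid, n_in))
    p["bC"] = rng.standard_normal((n_hid, 1))
    return p

def gate_preactivation(g, C, h, x, p):
    """Six unary/binary steps shared by each gate."""
    z1 = p["W" + g + "C"] @ C
    z2 = p["W" + g + "h"] @ h
    z3 = z1 + z2
    z4 = p["W" + g + "x"] @ x
    z5 = z3 + z4
    return z5 + p["b" + g]

def lstm_forward_serialized(x, h_prev, C_prev, p):
    f = sigmoid(gate_preactivation("f", C_prev, h_prev, x, p))  # steps 1-7
    i = sigmoid(gate_preactivation("i", C_prev, h_prev, x, p))  # steps 8-14
    z15 = p["WCh"] @ h_prev
    z16 = p["WCx"] @ x
    z17 = z15 + z16
    z18 = z17 + p["bC"]
    C_tilde = np.tanh(z18)                                      # step 19
    z20 = f * C_prev
    z21 = i * C_tilde
    C = z20 + z21                                               # step 22
    o = sigmoid(gate_preactivation("o", C, h_prev, x, p))       # steps 23-29
    z30 = np.tanh(C)
    h = o * z30                                                 # step 31
    return h, C

p = init_params(n_in=3, n_hid=2)
x = np.ones((3, 1))
h_prev = np.zeros((2, 1))
C_prev = np.zeros((2, 1))
h, C = lstm_forward_serialized(x, h_prev, C_prev, p)
```

Every line of the cell is a matrix-vector product, a sum, a component-wise product or a component-wise function, so each one falls under one of the rules above.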

Computing derivatives
• We will now work our way backward
• We assume derivatives $\nabla_{h_t} L$ and $\nabla_{C_t} L$ of the loss w.r.t. $h_t$ and $C_t$ are given
• We must compute $\nabla_{C_{t-1}} L$, $\nabla_{h_{t-1}} L$ and $\nabla_{x_t} L$
• And also derivatives w.r.t. the parameters within the cell
• Recall: the shape of the derivative for any variable will be transposed with respect to that variable

LSTM: backward pass
• The derivatives are built up one forward equation at a time, first for forward Equations 23-31 and then for forward Equations 15-22 [derivative equations not preserved in this transcript]
• On the slides, equations highlighted in yellow show derivatives w.r.t. parameters
• When a derivative for $C_{t-1}$ is computed a second time, we increment the existing derivative ("+=")
• Note the "+=" wherever a derivative is computed again for the same variable

Continuing the computation
• Continue the backward progression until the derivatives from forward Equation 1 have been computed
• At this point all derivatives will have been computed
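One stretch of this backward progression can be sketched for the tail of the cell, assuming the relations $C_t = f_t \circ C_{t-1} + i_t \circ \tilde{C}_t$ and $h_t = o_t \circ \tanh(C_t)$. To keep the sketch short, the gate values are treated as plain inputs, so peephole paths are ignored; function and variable names are hypothetical. Note where the derivative of $C_t$ is incremented rather than overwritten.

```python
import numpy as np

def cell_tail_forward(f, i, C_tilde, o, C_prev):
    a = f * C_prev           # binary
    b = i * C_tilde          # binary
    C = a + b                # binary
    tC = np.tanh(C)          # unary
    h = o * tC               # binary
    return h, C, tC

def cell_tail_backward(f, i, C_tilde, o, C_prev, tC, grad_h, grad_C):
    """grad_h and grad_C are the given 1 x N derivatives of the loss."""
    grad_C = grad_C.copy()
    # h = o * tC
    grad_o = grad_h * tC.T
    grad_tC = grad_h * o.T
    # tC = tanh(C): C already has a derivative, so increment it ("+=")
    grad_C += grad_tC * (1.0 - tC ** 2).T
    # C = a + b
    grad_a = grad_C
    grad_b = grad_C
    # b = i * C_tilde
    grad_i = grad_b * C_tilde.T
    grad_C_tilde = grad_b * i.T
    # a = f * C_prev
    grad_f = grad_a * C_prev.T
    grad_C_prev = grad_a * f.T
    return {"o": grad_o, "i": grad_i, "f": grad_f,
            "C_tilde": grad_C_tilde, "C_prev": grad_C_prev}

# Made-up values for a 2-unit cell.
f = np.array([[0.9], [0.2]])
i = np.array([[0.5], [0.7]])
o = np.array([[0.3], [0.8]])
C_tilde = np.array([[0.4], [-0.6]])
C_prev = np.array([[1.0], [-2.0]])
grad_h = np.array([[1.0, -1.0]])
grad_C = np.array([[0.5, 0.25]])

h, C, tC = cell_tail_forward(f, i, C_tilde, o, C_prev)
grads = cell_tail_backward(f, i, C_tilde, o, C_prev, tC, grad_h, grad_C)
```

In a full cell the derivatives w.r.t. the gates would then be pushed back through the gate equations, adding further "+=" contributions to $C_{t-1}$, $h_{t-1}$ and $x_t$.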

Overall procedure
• Express the overall computation as a sequence of unary or binary operations
  • Can be automated
• Compute derivatives incrementally, going backward over the sequence of equations
• Since each atomic computation is simple and belongs to one of a small set of possibilities, the conversion to derivatives is trivial once the computation is serialized as above
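The remark that the procedure can be automated can be illustrated with a very small scalar "tape". This is a toy sketch, not how any particular library works: the class name `Tape`, the operation tables and the example function are all made up. The forward pass records each unary or binary operation, and the backward pass replays the record in reverse, accumulating derivatives with "+=".

```python
import math

FORWARD = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "tanh": lambda a: math.tanh(a),
}
# Local derivatives of each operation's output w.r.t. each of its inputs.
LOCAL = {
    "add": lambda a, b: (1.0, 1.0),
    "mul": lambda a, b: (b, a),
    "tanh": lambda a: (1.0 - math.tanh(a) ** 2,),
}

class Tape:
    def __init__(self):
        self.values = []   # value of every node, in order of creation
        self.ops = []      # (operation name, input node indices) per node

    def leaf(self, value):
        self.values.append(float(value))
        self.ops.append((None, ()))
        return len(self.values) - 1

    def apply(self, op, *args):
        inputs = [self.values[a] for a in args]
        self.values.append(FORWARD[op](*inputs))
        self.ops.append((op, args))
        return len(self.values) - 1

    def backward(self, out):
        grads = [0.0] * len(self.values)
        grads[out] = 1.0
        for node in range(out, -1, -1):       # reverse order of the forward pass
            op, args = self.ops[node]
            if op is None:
                continue
            inputs = [self.values[a] for a in args]
            for a, local in zip(args, LOCAL[op](*inputs)):
                grads[a] += grads[node] * local   # "+=": a node may be used twice
        return grads

# y = tanh(w * x + b) + x, where x is used twice.
tape = Tape()
w, x, b = tape.leaf(0.5), tape.leaf(-1.2), tape.leaf(0.3)
z1 = tape.apply("mul", w, x)
z2 = tape.apply("add", z1, b)
z3 = tape.apply("tanh", z2)
y = tape.apply("add", z3, x)
grads = tape.backward(y)
```

Here `grads[x]` collects two contributions, one through the product and one through the final sum, which is the addition rule applied automatically.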
