Computation Graphs
Philipp Koehn, Machine Translation
29 September 2020


  1. Computation Graphs — Philipp Koehn, 29 September 2020 (title slide)

  2. Neural Network Cartoon — A common way to illustrate a neural network: input x, hidden layer h, output y.

  3. Neural Network Math — Hidden layer: h = sigmoid(W1 x + b1). Final layer: y = sigmoid(W2 h + b2).
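These two equations translate directly into code. A minimal numpy sketch (the function and variable names are mine, not from the slides):

```python
import numpy as np

def sigmoid(z):
    # Elementwise logistic function: 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    # Hidden layer: h = sigmoid(W1 x + b1)
    h = sigmoid(W1 @ x + b1)
    # Final layer: y = sigmoid(W2 h + b2)
    y = sigmoid(W2 @ h + b2)
    return h, y
```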

  4. Computation Graph — The same network as a graph of operations: W1 and x feed a prod node; its output and b1 feed a sum node; this feeds a sigmoid node, computing h. Then h and W2 feed a prod node, its output and b2 feed a sum node, and a final sigmoid computes y.

  5. Simple Neural Network — A worked example with concrete parameter values: hidden weights 3.7, 3.7 and 2.9, 2.9; hidden biases −1.5 and −4.5; output weights 4.5 and −5.2; output bias −2.0.

  6. Computation Graph — The same graph annotated with the parameter values: W1 = [3.7 3.7; 2.9 2.9], b1 = [−1.5; −4.5], W2 = [4.5 −5.2], b2 = [−2.0].

  7. Processing Input — The input x = [1.0; 0.0] is attached to the graph.

  8. Processing Input — The first prod node computes W1 x = [3.7; 2.9].

  9. Processing Input — The first sum node adds b1, giving [2.2; −1.6].

  10. Processing Input — The first sigmoid node gives the hidden layer values h = [.900; .168].

  11. Processing Input — The second layer follows: prod gives W2 h = [3.18], sum adds b2 to give [1.18], and the final sigmoid gives the output y = [.765].
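This forward trace can be replayed node by node. A numpy sketch using the example's parameter values (with the second hidden bias taken as −4.5, which is what the intermediate values −1.6, .168, and .765 imply):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x  = np.array([1.0, 0.0])
W1 = np.array([[3.7, 3.7], [2.9, 2.9]])
b1 = np.array([-1.5, -4.5])
W2 = np.array([4.5, -5.2])
b2 = -2.0

prod1 = W1 @ x         # [3.7, 2.9]
sum1  = prod1 + b1     # [2.2, -1.6]
h     = sigmoid(sum1)  # [.900, .168]
prod2 = W2 @ h         # 3.18
sum2  = prod2 + b2     # 1.18
y     = sigmoid(sum2)  # .765
```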

  12. Error Function — For training, we need a measure of how well we are doing ⇒ a cost function (also known as objective function, loss, gain, cost, ...). For instance the L2 norm: error = 1/2 (t − y)².

  13. Gradient Descent — We view the error as a function of the trainable parameters: error(λ). We want to optimize error(λ) by moving λ towards its optimum. [Figure: three error(λ) curves, with gradient = 2, gradient = 1, and gradient = 0.2, each marking the current λ and the optimal λ.] Why not just set λ to its optimum? We are updating based on one training example and do not want to overfit to it; we are also changing all the other parameters, so the curve will look different.
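The picture suggests the standard update rule: move λ a small step against the gradient, λ ← λ − μ · d error/dλ, with a learning rate μ. A toy sketch on a one-dimensional quadratic (the error function and names here are illustrative, not from the slides):

```python
def gradient_descent_step(lam, gradient, mu=0.1):
    # Move the parameter against the gradient, scaled by learning rate mu
    return lam - mu * gradient

# Toy error(lam) = (lam - 3)^2, so the gradient is 2 * (lam - 3)
lam = 0.0
for _ in range(100):
    lam = gradient_descent_step(lam, 2 * (lam - 3))
```

Repeated small steps converge to the optimum λ = 3 here; in real training we keep the steps small for exactly the reasons the slide gives.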

  14. Calculus Refresher: Chain Rule — Formula for computing the derivative of a composition of two or more functions: for functions f and g, the composition f ∘ g maps x to f(g(x)). Chain rule: (f ∘ g)′ = (f′ ∘ g) · g′, or F′(x) = f′(g(x)) g′(x). In Leibniz's notation: if z = f(y) and y = g(x), then dz/dx = dz/dy · dy/dx = f′(y) g′(x) = f′(g(x)) g′(x).
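The chain rule can be sanity-checked numerically by comparing the analytic derivative of a composition against a central finite difference (f and g here are arbitrary example functions, not from the slides):

```python
import math

def g(x): return math.sin(x)   # inner function, g'(x) = cos(x)
def f(y): return y * y         # outer function, f'(y) = 2y
def F(x): return f(g(x))       # composition F = f ∘ g

x = 0.7
# Chain rule: F'(x) = f'(g(x)) * g'(x)
analytic = 2 * g(x) * math.cos(x)
# Central finite difference approximation of F'(x)
eps = 1e-6
numeric = (F(x + eps) - F(x - eps)) / (2 * eps)
```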

  15. Final Layer Update — Linear combination of weights: s = Σ_k w_k h_k. Activation function: y = sigmoid(s). Error (L2 norm): E = 1/2 (t − y)². Derivative of the error with regard to one weight w_k: dE/dw_k = dE/dy · dy/ds · ds/dw_k.
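The three factors can be written out one by one. A sketch (the names are mine; note that strictly dE/dy = −(t − y) for E = 1/2 (t − y)², while later slides track the negated value t − y):

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def dE_dw(w, h, t, k):
    # s = sum_k w_k h_k, y = sigmoid(s), E = 1/2 (t - y)^2
    s = sum(w_j * h_j for w_j, h_j in zip(w, h))
    y = sigmoid(s)
    dE_dy = -(t - y)       # derivative of 1/2 (t - y)^2 with respect to y
    dy_ds = y * (1 - y)    # derivative of the sigmoid
    ds_dw = h[k]           # derivative of the linear combination w.r.t. w_k
    return dE_dy * dy_ds * ds_dw
```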

  16. Error Computation in Computation Graph — The graph is extended with the target value t and an L2 node that computes the error from y and t.

  17. Error Propagation in Computation Graph — Consider a node A that feeds into node B, which eventually leads to the error E. Compute the derivative at node A: dE/dA = dE/dB · dB/dA. Assume that we already computed dE/dB (backward pass through the graph), so we only have to get the formula for dB/dA. For instance, if B is a square node: forward computation B = A²; backward computation dB/dA = dA²/dA = 2A.
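The square-node example can be sketched as a small object that remembers its input during the forward pass and applies the chain rule in the backward pass (the class is illustrative, not from the slides):

```python
class Square:
    # Node computing B = A^2
    def forward(self, A):
        self.A = A     # remember the input for the backward pass
        return A * A

    def backward(self, dE_dB):
        # Chain rule: dE/dA = dE/dB * dB/dA, with dB/dA = 2A
        return dE_dB * 2 * self.A
```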

  18. Derivatives for Each Node — L2 node: forward o = 1/2 (t − i)², backward d L2/di = t − i.

  19. Derivatives for Each Node — Adding the sigmoid node: do/di = d σ(i)/di = σ(i)(1 − σ(i)).

  20. Derivatives for Each Node — Adding the sum node: o = i1 + i2, so do/di1 = do/di2 = 1.

  21. Derivatives for Each Node — Adding the prod node: o = i1 i2, so do/di1 = i2 and do/di2 = i1.
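The four node types and their local derivatives can be collected in one place. A sketch where each function returns the forward output together with the derivative(s) with respect to its input(s) (the L2 node follows the deck's convention of tracking t − i):

```python
import numpy as np

def sum_node(i1, i2):
    o = i1 + i2
    return o, (1.0, 1.0)        # do/di1 = do/di2 = 1

def prod_node(i1, i2):
    o = i1 * i2
    return o, (i2, i1)          # do/di1 = i2, do/di2 = i1

def sigmoid_node(i):
    o = 1.0 / (1.0 + np.exp(-i))
    return o, o * (1.0 - o)     # do/di = sigmoid(i) (1 - sigmoid(i))

def l2_node(t, i):
    o = 0.5 * (t - i) ** 2
    return o, t - i             # tracked as t - i, as on the slides
```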

  22. Backward Pass: Derivative Computation — Each node is annotated with its local derivative: prod nodes with i2, i1; sum nodes with 1, 1; sigmoid nodes with σ′(i); the L2 node with i2 − i1. With target t = [1.0], the L2 node's forward value is [.0277] and its backward value is t − y = [.235].

  23. Backward Pass: Derivative Computation — The final sigmoid node multiplies the incoming derivative [.235] by σ′(i) = [.180], giving [.0424].

  24. Backward Pass: Derivative Computation — The sum node passes [.0424] on unchanged to both of its inputs (the prod node and the bias b2).

  25. Backward Pass: Derivative Computation — The remaining nodes follow: the second prod node yields [.0382 .00712] with respect to W2 and [.191; −.220] with respect to the hidden layer; the first sigmoid node yields [.0171; −.0308]; the first sum node passes these on; the first prod node yields [.0171 0; −.0308 0] with respect to W1 and [−.0260; −.0260] with respect to x.
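The whole backward pass can be replayed in a few lines, reusing the forward values. A numpy sketch (again taking the second hidden bias as −4.5, consistent with the forward values on the earlier slides):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Forward pass with the example's values
x  = np.array([1.0, 0.0]);  t = 1.0
W1 = np.array([[3.7, 3.7], [2.9, 2.9]]);  b1 = np.array([-1.5, -4.5])
W2 = np.array([4.5, -5.2]);               b2 = -2.0
h  = sigmoid(W1 @ x + b1)     # [.900, .168]
y  = sigmoid(W2 @ h + b2)     # .765

# Backward pass (slide convention: start from t - y)
d_y  = t - y                  # .235  (L2 node)
d_z2 = d_y * y * (1 - y)      # .0424 (sigmoid node)
d_b2 = d_z2                   # .0424 (sum node passes it on)
d_W2 = d_z2 * h               # [.0382, .00712] (prod node, w.r.t. W2)
d_h  = d_z2 * W2              # [.191, -.220]   (prod node, w.r.t. h)
d_z1 = d_h * h * (1 - h)      # [.0171, -.0308] (sigmoid node)
d_b1 = d_z1                   # (sum node passes it on)
d_W1 = np.outer(d_z1, x)      # [[.0171, 0], [-.0308, 0]] (prod node)
```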

  26. Gradients for Parameter Update — Collected gradients for the trainable parameters: W1: [.0171 0; −.0308 0]; b1: [.0171; −.0308]; W2: [.0382 .00712]; b2: [.0424].
