
Machine Learning: Chenhao Tan, University of Colorado Boulder



  1. Machine Learning: Chenhao Tan, University of Colorado Boulder. Lecture 7. Slides adapted from Jordan Boyd-Graber and Chris Ketelsen. Machine Learning: Chenhao Tan | Boulder | 1 of 39

  2. Final projects
     • WSDM Cup
     • SemEval 2018

  3. Overview
     • Forward propagation recap
     • Back propagation
       – Chain rule
       – Back propagation
       – Full algorithm

  4. Forward propagation recap | Outline
     • Forward propagation recap
     • Back propagation
       – Chain rule
       – Back propagation
       – Full algorithm

  5. Forward propagation recap | Forward propagation algorithm
     Store the biases for layer l in b^l, and the weight matrix in W^l.
     [Figure: feed-forward network with inputs x_1, ..., x_d, layer parameters (W^1, b^1), (W^2, b^2), (W^3, b^3), and outputs o_1, o_2]

  6. Forward propagation recap | Forward propagation algorithm
     Suppose your network has L layers. Make a prediction based on test point x:
     1: Initialize a^0 = x
     2: for l = 1 to L do
     3:   z^l = W^l a^{l-1} + b^l
     4:   a^l = g(z^l)
     5: end for
     6: The prediction ŷ is simply a^L
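The loop on this slide can be sketched in a few lines of NumPy. The sigmoid choice for g, the layer sizes, and the random initialization are illustrative assumptions, not part of the slides:

```python
import numpy as np

def g(z):
    # sigmoid activation (one common choice for g; an assumption here)
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Forward propagation: a^0 = x; for l = 1..L,
    z^l = W^l a^{l-1} + b^l and a^l = g(z^l); return a^L."""
    a = x
    for W, b in zip(weights, biases):
        z = W @ a + b
        a = g(z)
    return a  # the prediction y-hat = a^L

# Tiny illustrative network: 3 inputs -> 4 hidden units -> 2 outputs
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
biases = [np.zeros(4), np.zeros(2)]
y_hat = forward(np.array([1.0, 2.0, 3.0]), weights, biases)
print(y_hat.shape)  # (2,)
```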

  7. Forward propagation recap | Neural networks in a nutshell
     • Training data: S_train = {(x, y)}
     • Network architecture (model): ŷ = f_w(x), with parameters W^l, b^l, l = 1, ..., L
     • Loss function (objective function): L(y, ŷ)
     • How do we learn the parameters?

  8. Forward propagation recap | Neural networks in a nutshell
     • Training data: S_train = {(x, y)}
     • Network architecture (model): ŷ = f_w(x), with parameters W^l, b^l, l = 1, ..., L
     • Loss function (objective function): L(y, ŷ)
     • How do we learn the parameters? Stochastic gradient descent:
       W^l ← W^l − η ∂L(y, ŷ)/∂W^l
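A minimal sketch of the SGD update on this slide, assuming the gradients ∂L/∂W^l and ∂L/∂b^l have already been computed (producing them is exactly what back propagation provides); the learning rate η = 0.1 and the made-up gradients are placeholders:

```python
import numpy as np

def sgd_step(weights, biases, grad_W, grad_b, eta=0.1):
    """One SGD update: W^l <- W^l - eta * dL/dW^l, and likewise for b^l."""
    for l in range(len(weights)):
        weights[l] -= eta * grad_W[l]
        biases[l] -= eta * grad_b[l]

# Illustrative single 2x2 layer with made-up gradients
weights = [np.ones((2, 2))]
biases = [np.zeros(2)]
grad_W = [np.full((2, 2), 0.5)]
grad_b = [np.array([1.0, -1.0])]
sgd_step(weights, biases, grad_W, grad_b, eta=0.1)
print(weights[0])  # every entry is 1 - 0.1 * 0.5 = 0.95
```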

  9. Forward propagation recap | Challenge
     • Challenge: How the heck do we compute derivatives of the loss function with respect to the weights and biases?
     • Solution: Back propagation

  10. Back propagation | Outline
     • Forward propagation recap
     • Back propagation
       – Chain rule
       – Back propagation
       – Full algorithm

  11. Back propagation | Chain rule: The Chain Rule
     The chain rule allows us to take derivatives of nested functions. There are two forms of the Chain Rule.

  12. Back propagation | Chain rule: The Chain Rule
     The chain rule allows us to take derivatives of nested functions. There are two forms of the Chain Rule.
     Baby Chain Rule: (d/dx) f(g(x)) = f′(g(x)) g′(x) = (df/dg)(dg/dx)

  13. Back propagation | Chain rule: The Chain Rule
     The chain rule allows us to take derivatives of nested functions. There are two forms of the Chain Rule.
     Baby Chain Rule: (d/dx) f(g(x)) = f′(g(x)) g′(x) = (df/dg)(dg/dx)
     Example: (d/dx) sin(x²) = cos(x²) · 2x
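The baby chain rule is easy to sanity-check numerically; this sketch compares the analytic derivative of sin(x²) against a central finite difference (the step size h and the test point are arbitrary choices):

```python
import math

def f_of_g(x):
    # the nested function sin(x^2)
    return math.sin(x ** 2)

def analytic(x):
    # chain rule: d/dx sin(x^2) = cos(x^2) * 2x
    return math.cos(x ** 2) * 2 * x

def numeric(x, h=1e-6):
    # central finite-difference approximation of the derivative
    return (f_of_g(x + h) - f_of_g(x - h)) / (2 * h)

x = 0.7
print(abs(analytic(x) - numeric(x)) < 1e-6)  # True
```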

  14. Back propagation | Chain rule: The Chain Rule
     Full-Grown Adult Chain Rule:
     [Figure: dependency diagram — f depends on x, y, z; each of x(r, s), y(r, s), z(r, s) depends on r and s; each of r(u, v), s(u, v) depends on u and v]

  15. Back propagation | Chain rule: The Chain Rule
     Full-Grown Adult Chain Rule:
     [Figure: dependency diagram — f depends on x, y, z; each of x(r, s), y(r, s), z(r, s) depends on r and s; each of r(u, v), s(u, v) depends on u and v]
     Derivative of f with respect to x: ∂f/∂x. Similarly, ∂f/∂y and ∂f/∂z.

  16. Back propagation | Chain rule: The Chain Rule
     What is the derivative of f with respect to r?
     [Figure: dependency diagram — f depends on x(r, s), y(r, s), z(r, s), which depend on r(u, v) and s(u, v)]

  17. Back propagation | Chain rule: The Chain Rule
     What is the derivative of f with respect to r?
     [Figure: dependency diagram — f depends on x(r, s), y(r, s), z(r, s), which depend on r(u, v) and s(u, v)]
     ∂f/∂r = (∂f/∂x)(∂x/∂r) + (∂f/∂y)(∂y/∂r) + (∂f/∂z)(∂z/∂r)

  18. Back propagation | Chain rule: The Chain Rule
     What is the derivative of f with respect to s?
     [Figure: dependency diagram — f depends on x(r, s), y(r, s), z(r, s), which depend on r(u, v) and s(u, v)]

  19. Back propagation | Chain rule: The Chain Rule
     What is the derivative of f with respect to s?
     [Figure: dependency diagram — f depends on x(r, s), y(r, s), z(r, s), which depend on r(u, v) and s(u, v)]
     ∂f/∂s = (∂f/∂x)(∂x/∂s) + (∂f/∂y)(∂y/∂s) + (∂f/∂z)(∂z/∂s)

  20. Back propagation | Chain rule: The Chain Rule
     Example: Let f = xyz, x = r, y = rs, and z = s. Find ∂f/∂s.
     ∂f/∂s = (∂f/∂x)(∂x/∂s) + (∂f/∂y)(∂y/∂s) + (∂f/∂z)(∂z/∂s)

  21. Back propagation | Chain rule: The Chain Rule
     Example: Let f = xyz, x = r, y = rs, and z = s. Find ∂f/∂s.
     ∂f/∂s = yz · 0 + xz · r + xy · 1

  22. Back propagation | Chain rule: The Chain Rule
     Example: Let f = xyz, x = r, y = rs, and z = s. Find ∂f/∂s.
     ∂f/∂s = rs² · 0 + rs · r + r²s · 1

  23. Back propagation | Chain rule: The Chain Rule
     Example: Let f = xyz, x = r, y = rs, and z = s. Find ∂f/∂s.
     ∂f/∂s = 2r²s

  24. Back propagation | Chain rule: The Chain Rule
     Example: Let f = xyz, x = r, y = rs, and z = s. Find ∂f/∂s.
     Sanity check by direct substitution: f(r, s) = r · rs · s = r²s² ⇒ ∂f/∂s = 2r²s
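The worked example can also be verified numerically; this sketch differentiates f = xyz with x = r, y = rs, z = s by a central finite difference in s and compares against the closed form 2r²s (the test point and step size are arbitrary):

```python
def f(x, y, z):
    return x * y * z

def df_ds(r, s, h=1e-6):
    # numerical df/ds with x = r, y = r*s, z = s substituted in
    def f_rs(r_, s_):
        return f(r_, r_ * s_, s_)
    return (f_rs(r, s + h) - f_rs(r, s - h)) / (2 * h)

r, s = 1.5, 2.0
print(round(df_ds(r, s), 6), 2 * r**2 * s)  # both are 9.0
```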

  25. Back propagation | Chain rule: The Chain Rule
     What is the derivative of f with respect to u?
     [Figure: dependency diagram — f depends on x(r, s), y(r, s), z(r, s), which depend on r(u, v) and s(u, v)]

  26. Back propagation | Chain rule: The Chain Rule
     What is the derivative of f with respect to u?
     [Figure: dependency diagram — f depends on x(r, s), y(r, s), z(r, s), which depend on r(u, v) and s(u, v)]
     ∂f/∂u = (∂f/∂r)(∂r/∂u) + (∂f/∂s)(∂s/∂u)
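Making this formula concrete requires specific inner functions; the slides leave r(u, v) and s(u, v) abstract, so r = uv and s = u + v below are purely hypothetical choices, combined with the f = xyz example (where f(r, s) = r²s²). The sketch checks the two-hop chain rule against a finite difference:

```python
def f_rs(r, s):
    # f = xyz with x = r, y = r*s, z = s (earlier example) => r^2 * s^2
    return (r * s) ** 2

# Hypothetical inner maps, chosen only for this sketch:
def r_of(u, v): return u * v
def s_of(u, v): return u + v

def df_du_chain(u, v):
    # df/du = (df/dr)(dr/du) + (df/ds)(ds/du)
    r, s = r_of(u, v), s_of(u, v)
    df_dr = 2 * r * s ** 2   # partial of r^2 s^2 in r
    df_ds = 2 * r ** 2 * s   # partial of r^2 s^2 in s
    dr_du, ds_du = v, 1.0    # partials of the inner maps in u
    return df_dr * dr_du + df_ds * ds_du

def df_du_numeric(u, v, h=1e-6):
    # central finite difference through the full composition
    g = lambda u_: f_rs(r_of(u_, v), s_of(u_, v))
    return (g(u + h) - g(u - h)) / (2 * h)

u, v = 1.2, 0.8
print(abs(df_du_chain(u, v) - df_du_numeric(u, v)) < 1e-4)  # True
```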

  27. Back propagation | Chain rule: The Chain Rule
     What is the derivative of f with respect to u?
     [Figure: dependency diagram — f depends on x(r, s), y(r, s), z(r, s), which depend on r(u, v) and s(u, v)]
     Crux: If you know the derivative of the objective with respect to an intermediate value in the chain, you can eliminate everything in between.

  28. Back propagation | Chain rule: The Chain Rule
     What is the derivative of f with respect to u?
     [Figure: dependency diagram — f depends on x(r, s), y(r, s), z(r, s), which depend on r(u, v) and s(u, v)]
     Crux: If you know the derivative of the objective with respect to an intermediate value in the chain, you can eliminate everything in between. This is the cornerstone of the Back Propagation algorithm.

  29. Back propagation | Back propagation: Back Propagation
     [Figure: deeper feed-forward network with inputs x_1, ..., x_d, layer parameters (W^1, b^1) through (W^4, b^4), and outputs o_1, o_2]

  30. Back propagation | Back propagation: Back Propagation
     For the derivation, we'll consider a simplified network:
     [Figure: a^0 → (W^1) → z^1 | a^1 → (W^2) → z^2 | a^2 → L(y, a^2)]
     We want to use back propagation to compute the partial derivatives of L with respect to the weights and biases: ∂L/∂w^l_ij, for l = 1, 2.

  31. Back propagation | Back propagation: Back Propagation
     For the derivation, we'll consider a simplified network:
     [Figure: a^0 → (W^1) → z^1 | a^1 → (W^2) → z^2 | a^2 → L(y, a^2)]
     We need to choose an intermediate term that lives on the nodes and that we can easily compute derivatives with respect to. We could choose the a's, but we'll choose the z's because the math is easier.

  32. Back propagation | Back propagation: Back Propagation
     For the derivation, we'll consider a simplified network:
     [Figure: a^0 → (W^1) → z^1 | a^1 → (W^2) → z^2 | a^2 → L(y, a^2)]
     Define the derivative w.r.t. the z's by δ:
     δ^l_j = ∂L/∂z^l_j
     Note that δ^l has the same size as z^l and a^l.

  33. Back propagation | Back propagation: Back Propagation
     For the derivation, we'll consider a simplified network:
     [Figure: a^0 → (W^1) → z^1 | a^1 → (W^2) → z^2 | a^2 → L(y, a^2)]
     Let's compute δ^L for the output layer L:
     δ^L_j = ∂L/∂z^L_j = (∂L/∂a^L_j)(da^L_j/dz^L_j)
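Computing δ^L concretely requires picking a loss and an activation; this sketch assumes squared loss L(y, a) = 0.5‖a − y‖² and a sigmoid g, both illustrative choices (the slides keep L and g generic):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def output_delta(z_L, y):
    """delta^L_j = (dL/da^L_j) * (da^L_j/dz^L_j), assuming
    L(y, a) = 0.5 * ||a - y||^2 and a = sigmoid(z)."""
    a_L = sigmoid(z_L)
    dL_da = a_L - y              # dL/da^L for squared loss
    da_dz = a_L * (1.0 - a_L)    # sigmoid'(z^L)
    return dL_da * da_dz         # elementwise product, same size as z^L

z_L = np.array([0.5, -1.0])
y = np.array([1.0, 0.0])
delta = output_delta(z_L, y)
print(delta.shape)  # (2,) — same size as z^L and a^L, as the slide notes
```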
