About me: ENS -> MVA -> FAIR (engineer) -> FAIR (PhD, 3rd year). About my PhD: interested in sign matrices and tensors (graphs / multi-graphs); observe a few entries, predict the remaining edges; factorization.


  1. Learning: Gradient Descent. Back to minimizing. Problem: how do we compute the gradient $\nabla_\theta \frac{1}{n} \sum_{i=1}^{n} \ell(f(x_i; \theta), y_i)$? Large $n$ -> Stochastic Gradient Descent. Complicated function (neural net) -> BackProp.

  2. Learning: Gradient Descent. Back to minimizing. Problem: how do we compute the gradient $\nabla_\theta \frac{1}{n} \sum_{i=1}^{n} \ell(f(x_i; \theta), y_i)$? Large $n$ -> Stochastic Gradient Descent.
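As a point of comparison before the next slides kill the sum over $n$, here is a minimal numpy sketch of the full-batch gradient, assuming a hypothetical linear model $f(x;\theta) = x^\top \theta$ with squared loss (my choice, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5
X, y = rng.normal(size=(n, d)), rng.normal(size=n)   # n examples (x_i, y_i)
theta = np.zeros(d)

# Full-batch gradient of (1/n) * sum_i 0.5 * (x_i @ theta - y_i)^2:
# every update touches all n examples, which is what hurts for large n.
grad = X.T @ (X @ theta - y) / n
```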

  3. Learning: Stochastic Gradient Descent. Killing $n$: $\nabla_\theta \frac{1}{n} \sum_{i=1}^{n} \ell(f(x_i; \theta), y_i) = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta \ell(f(x_i; \theta), y_i) \approx \nabla_\theta \ell(f(x_j; \theta), y_j)$, a single function evaluation.

  4. Learning: Stochastic Gradient Descent. Killing $n$: the gradient of the average, $\nabla_\theta \frac{1}{n} \sum_{i=1}^{n} \ell(f(x_i; \theta), y_i) = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta \ell(f(x_i; \theta), y_i) \approx \nabla_\theta \ell(f(x_j; \theta), y_j)$.

  5. Learning: Stochastic Gradient Descent. Killing $n$: the gradient of the average is the average of the gradients, $\nabla_\theta \frac{1}{n} \sum_{i=1}^{n} \ell(f(x_i; \theta), y_i) = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta \ell(f(x_i; \theta), y_i) \approx \nabla_\theta \ell(f(x_j; \theta), y_j)$.

  6. Learning: Stochastic Gradient Descent. Killing $n$: the gradient of the average is the average of the gradients, $\nabla_\theta \frac{1}{n} \sum_{i=1}^{n} \ell(f(x_i; \theta), y_i) = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta \ell(f(x_i; \theta), y_i) \approx \nabla_\theta \ell(f(x_j; \theta), y_j)$, where the approximation holds in expectation, for uniform $j$.
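A minimal numpy sketch of this identity under the same hypothetical linear-model/squared-loss setup as above (an assumption, not from the slides): the gradient of the average equals the average of the per-example gradients, and the single-sample gradient at a uniform $j$ matches it in expectation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5
X, y = rng.normal(size=(n, d)), rng.normal(size=n)
theta = rng.normal(size=d)

def grad_one(theta, x, y):
    # Gradient of the single-example loss 0.5 * (x @ theta - y)^2.
    return (x @ theta - y) * x

# The gradient of the average equals the average of the gradients.
grad_avg = np.mean([grad_one(theta, X[i], y[i]) for i in range(n)], axis=0)

# A single-sample gradient at uniform j matches it in expectation
# (estimated here by averaging many independent draws of j).
js = rng.integers(0, n, size=20000)
est = np.mean([grad_one(theta, X[j], y[j]) for j in js], axis=0)
print(np.allclose(est, grad_avg, atol=0.05))   # True (up to sampling noise)
```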

  7. Learning: Stochastic Gradient Descent. Killing $n$. For some number of iterations: pick a random example $(x_j, y_j)$ and take a gradient step $\theta_{n+1} \leftarrow \theta_n - \eta \, \nabla_\theta \ell(f(x_j; \theta_n), y_j)$, where $\eta$ is the learning rate.
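A sketch of the SGD loop this slide describes, again assuming a linear model with squared loss; the learning rate and iteration count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5
X, y = rng.normal(size=(n, d)), rng.normal(size=n)
theta = np.zeros(d)
eta = 0.05                                    # learning rate (eta)

for _ in range(1000):                         # "for some number of iterations"
    j = rng.integers(0, n)                    # pick a random example (x_j, y_j)
    grad = (X[j] @ theta - y[j]) * X[j]       # grad of 0.5*(x_j @ theta - y_j)^2
    theta = theta - eta * grad                # theta_{n+1} <- theta_n - eta * grad
```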

  8. Learning: Back Propagation. Computing the gradient. Problem: how do we compute the gradient $\nabla_\theta \frac{1}{n} \sum_{i=1}^{n} \ell(f(x_i; \theta), y_i)$? Complicated function (neural net) -> BackProp.

  9. Learning: BackProp. Computing the gradient. Hidden layer $i$: $f_i(x) = \sigma(A_i x + b_i)$.

  10. Learning: BackProp. Computing the gradient. Hidden layer $i$: $f_i(x) = \sigma(A_i x + b_i)$. Complete neural network: $f = f_h(f_{h-1}(f_{h-2}(\dots))) = (f_h \circ f_{h-1} \circ \dots \circ f_1)(x)$.

  11. Learning: BackProp. Computing the gradient. Hidden layer $i$: $f_i(x) = \sigma(A_i x + b_i)$. Complete neural network: $f = (f_h \circ f_{h-1} \circ \dots \circ f_1)(x)$. So how do we get $\nabla_\theta \ell(f(x_i; \theta), y_i)$?
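A sketch of the composition above, with the sigmoid as a concrete stand-in for $\sigma$ and arbitrary layer widths (both assumptions):

```python
import numpy as np

def sigma(z):
    # Sigmoid as a concrete stand-in for the nonlinearity sigma.
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 2]                  # layer widths (arbitrary choices)
params = [(rng.normal(size=(m, k)), rng.normal(size=m))
          for k, m in zip(sizes[:-1], sizes[1:])]   # (A_i, b_i) per layer

def f(x):
    # f = f_h o f_{h-1} o ... o f_1, with f_i(x) = sigma(A_i @ x + b_i).
    for A, b in params:
        x = sigma(A @ x + b)
    return x

print(f(rng.normal(size=4)))          # a 2-dimensional output
```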

  12. Learning: BackProp. Computing the gradient. Chain rule: $\frac{\partial f}{\partial x} = \frac{\partial f}{\partial y} \frac{\partial y}{\partial x}$.
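A quick numerical sanity check of the chain rule, for the arbitrary choice $f(y) = \sin(y)$, $y(x) = x^2$:

```python
import numpy as np

# Chain rule check for f(y) = sin(y), y(x) = x**2 at x = 1.3.
x = 1.3
analytic = np.cos(x**2) * 2 * x       # (df/dy) * (dy/dx)
eps = 1e-6
numeric = (np.sin((x + eps)**2) - np.sin((x - eps)**2)) / (2 * eps)
print(np.isclose(analytic, numeric))  # True
```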

  13. Learning: BackProp. Computing the gradient. The network as a chain: $x \to y_1(x) \to y_2(y_1) \to \dots \to y_{h-1}(y_{h-2}) \to y_h(y_{h-1}) \to \ell(y_h)$, with parameters $\theta_1, \theta_2, \dots, \theta_{h-1}, \theta_h$ attached to the corresponding layers.

  14. Learning: BackProp. Computing the gradient $\nabla_{\theta_h} \ell(y_h)$: the loss at the end of the chain, $\ell(y_h)$, depends on the last layer's parameters $\theta_h$.

  15. Learning: BackProp. Computing the gradient $\nabla_{\theta_h} \ell(y_h)$ with respect to the last layer's parameters $\theta_h$.

  16. Learning: BackProp. Computing the gradient $\nabla_{\theta_h} \ell(y_h)$. Chain rule: $\frac{\partial \ell(y_h)}{\partial \theta_{h,i}} = \frac{\partial \ell(y_h)}{\partial y_h} \frac{\partial y_h}{\partial \theta_{h,i}}$.

  17. Learning: BackProp. Computing the gradient $\nabla_{\theta_h} \ell(y_h)$: in $\frac{\partial \ell(y_h)}{\partial \theta_{h,i}} = \frac{\partial \ell(y_h)}{\partial y_h} \frac{\partial y_h}{\partial \theta_{h,i}}$, the first factor doesn't depend on the current layer; it only depends on $\ell$.

  18. Learning: BackProp. Computing the gradient $\nabla_{\theta_h} \ell(y_h)$: in $\frac{\partial \ell(y_h)}{\partial \theta_{h,i}} = \frac{\partial \ell(y_h)}{\partial y_h} \frac{\partial y_h}{\partial \theta_{h,i}}$, the second factor only depends on the current layer.
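To make the two factors concrete, a sketch for a single sigmoid layer $y_h = \sigma(A_h y_{h-1} + b_h)$ with a squared-error loss (both concrete choices of mine, not from the slides): the first factor comes from the loss alone, the second from the layer alone:

```python
import numpy as np

rng = np.random.default_rng(0)
y_prev = rng.normal(size=3)                          # y_{h-1}, from the layer below
A, b = rng.normal(size=(2, 3)), rng.normal(size=2)   # theta_h = (A_h, b_h)
target = rng.normal(size=2)

y_h = 1.0 / (1.0 + np.exp(-(A @ y_prev + b)))        # y_h = sigma(A_h y_{h-1} + b_h)

# Factor 1: dl/dy_h -- depends only on the loss, here l(y_h) = 0.5*||y_h - target||^2.
dl_dy = y_h - target

# Factor 2: dy_h/d(theta_h) -- depends only on the current layer.
dy_dz = y_h * (1.0 - y_h)                            # sigmoid derivative w.r.t. z

# Combine via the chain rule.
delta = dl_dy * dy_dz
dl_dA = np.outer(delta, y_prev)                      # dl/dA_h
dl_db = delta                                        # dl/db_h
```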

  19. Learning: BackProp. Computing the gradient $\nabla_{\theta_{h-1}} \ell(y_h) = \Phi_{h-1}(\theta_{h-1}, \nabla_{y_{h-1}} \ell(y_h))$.

  20. Learning: BackProp. Computing the gradient $\nabla_{\theta_{h-1}} \ell(y_h) = \Phi_{h-1}(\theta_{h-1}, \nabla_{y_{h-1}} \ell(y_h))$, where $\Phi_{h-1}$ depends on the current layer's structure.

  21. Learning: BackProp. Computing the gradient $\nabla_{\theta_{h-1}} \ell(y_h) = \Phi_{h-1}(\theta_{h-1}, \nabla_{y_{h-1}} \ell(y_h))$, where $\Phi_{h-1}$ depends on the current layer's structure and $\theta_{h-1}$ is known.

  22. Learning: BackProp. Computing the gradient $\nabla_{\theta_{h-1}} \ell(y_h) = \Phi_{h-1}(\theta_{h-1}, \nabla_{y_{h-1}} \ell(y_h))$, where $\Phi_{h-1}$ depends on the current layer's structure, $\theta_{h-1}$ is known, and $\nabla_{y_{h-1}} \ell(y_h)$ has already been computed.
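A sketch of one such $\Phi$ step for the sigmoid layer used in the earlier sketches (my concrete choice): it takes the layer's known parameters and the already-computed gradient with respect to the layer's output, and returns the parameter gradients plus the gradient to pass further down:

```python
import numpy as np

def phi(A, y_in, y_out, grad_out):
    """One backward step Phi for a sigmoid layer y_out = sigma(A @ y_in + b):
    A is known; grad_out (= grad of the loss w.r.t. y_out) has already been
    computed by the layer above."""
    delta = grad_out * y_out * (1.0 - y_out)   # back through the sigmoid
    grad_A = np.outer(delta, y_in)             # grad w.r.t. the A part of theta
    grad_b = delta                             # grad w.r.t. the b part of theta
    grad_in = A.T @ delta                      # grad w.r.t. y_in, passed downward
    return grad_A, grad_b, grad_in
```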

  23. Learning: BackProp. It's backwards! First compute $\nabla_{y_h}(\ell(y_h))$.

  24. Learning: BackProp. It's backwards! First compute $\nabla_{y_h}(\ell(y_h))$, then $\nabla_{\theta_h}(\ell(y_h))$, and continue down the chain.
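Putting the pieces together, a sketch of the full backward sweep over the sigmoid network from the earlier sketches (same assumed setup): the loop literally runs backwards, starting from $\nabla_{y_h} \ell$ and reusing each layer's cached activations:

```python
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 2]
params = [(rng.normal(size=(m, k)), rng.normal(size=m))
          for k, m in zip(sizes[:-1], sizes[1:])]
x, target = rng.normal(size=4), rng.normal(size=2)

# Forward pass, caching every intermediate y_i for the backward pass.
ys = [x]
for A, b in params:
    ys.append(sigma(A @ ys[-1] + b))

# Backward pass: start from nabla_{y_h} l for l(y_h) = 0.5*||y_h - target||^2,
# then walk the layers in reverse, reusing the gradient from the layer above.
grad_y = ys[-1] - target
grads = []
for (A, b), y_in, y_out in zip(reversed(params), reversed(ys[:-1]), reversed(ys[1:])):
    delta = grad_y * y_out * (1.0 - y_out)
    grads.append((np.outer(delta, y_in), delta))   # nabla_{theta_i} l
    grad_y = A.T @ delta                           # nabla_{y_{i-1}} l, passed down
grads.reverse()                                    # reorder as layer 1 ... h
```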
