

  1. Backpropagation and Gradient Descent Brian Carignan, Dec 5 2016

  2. Overview ▪ Notation/background | Neural networks | Activation functions | Vectorization | Cost functions ▪ Introduction ▪ Algorithm Overview ▪ Four fundamental equations | Definitions (all 4) and proofs (1 and 2) ▪ Example from thesis related work

  3. Neural Networks 1

  4. Neural Networks 2 ▪ a – activation of a neuron, related to the activations in the previous layer ▪ b – bias of a neuron
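Written out in the notation of Nielsen (2015), which the deck follows (an assumption, since the slide images are not included): a^l_j = \sigma\big(\sum_k w^l_{jk} a^{l-1}_k + b^l_j\big), where w^l_{jk} is the weight from neuron k in layer l-1 to neuron j in layer l, and b^l_j is the bias of neuron j.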

  5. Activation Functions ▪ Similar to an ON/OFF switch ▪ Required properties | Nonlinear | Continuously differentiable
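One activation with both properties, and the one assumed by the sigmoid-based slides later in the deck, is the logistic sigmoid: \sigma(z) = 1 / (1 + e^{-z}), with derivative \sigma'(z) = \sigma(z)\,(1 - \sigma(z)).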

  6. Vectorization ▪ Represent each layer as a vector | Simplifies notation | Leads to faster computation by exploiting vector math ▪ z – weighted input vector
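In vectorized form, layer l computes the weighted input z^l = w^l a^{l-1} + b^l and the activation a^l = \sigma(z^l), with \sigma applied elementwise. A minimal NumPy sketch of this forward pass (function and variable names are illustrative, not from the slides):

    import numpy as np

    def sigmoid(z):
        # elementwise logistic sigmoid
        return 1.0 / (1.0 + np.exp(-z))

    def feedforward(a, weights, biases):
        # weights[i] has shape (n_out, n_in); biases[i] and a are column vectors
        for W, b in zip(weights, biases):
            z = W @ a + b      # weighted input vector z^l
            a = sigmoid(z)     # activation vector a^l
        return a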

  7. Cost Function ▪ Objective Function ▪ Example: ▪ Optimization Problem ▪ Assumptions | The cost can be written as an average over per-example costs C_x | The cost is a function of the network's outputs ▪ x – individual training examples (fixed)
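The example cost is presumably the quadratic cost from Nielsen (2015) (the slide image is not included, so this is an assumption): C = \frac{1}{2n} \sum_x \|y(x) - a^L(x)\|^2. The two assumptions then read: C = \frac{1}{n} \sum_x C_x with C_x = \frac{1}{2}\|y(x) - a^L(x)\|^2, i.e. the cost is an average of per-example costs, and each C_x depends on the network only through the output activations a^L.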

  8. Introduction ▪ Backpropagation | Backward propagation of errors | Calculates gradients | One way to train neural networks ▪ Gradient Descent | Optimization method | Finds a local minimum | Takes steps proportional to the negative gradient at the current point
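Written out, one gradient-descent step with learning rate \eta > 0 moves every weight and bias in the direction of the negative gradient: w^l_{jk} \rightarrow w^l_{jk} - \eta \, \partial C / \partial w^l_{jk} and b^l_j \rightarrow b^l_j - \eta \, \partial C / \partial b^l_j. Backpropagation supplies exactly these partial derivatives.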

  9. Algorithm Overview
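A minimal sketch of the algorithm for a single training example and the quadratic cost, assuming the sigmoid activation and the BP1–BP4 equations defined on the following slides (function names are illustrative, not from the slides):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def sigmoid_prime(z):
        s = sigmoid(z)
        return s * (1.0 - s)

    def backprop(x, y, weights, biases):
        # x, y are column vectors; weights and biases are lists, one entry per layer after the input
        # Forward pass: store each layer's weighted input z and activation a
        a, activations, zs = x, [x], []
        for W, b in zip(weights, biases):
            z = W @ a + b
            zs.append(z)
            a = sigmoid(z)
            activations.append(a)
        # Output error (BP1) for the quadratic cost: delta^L = (a^L - y) * sigma'(z^L)
        delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
        grad_b = [None] * len(biases)
        grad_W = [None] * len(weights)
        grad_b[-1] = delta                       # BP3
        grad_W[-1] = delta @ activations[-2].T   # BP4
        # Backward pass (BP2): push the error through the transposed weight matrices
        for l in range(2, len(weights) + 1):
            delta = (weights[-l + 1].T @ delta) * sigmoid_prime(zs[-l])
            grad_b[-l] = delta
            grad_W[-l] = delta @ activations[-l - 1].T
        return grad_W, grad_b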

  10. Equation 1 ▪ Definition of error (written out below):
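With the error of neuron j in layer l defined as \delta^l_j \equiv \partial C / \partial z^l_j, the first fundamental equation (BP1, from Nielsen 2015) gives the error in the output layer: \delta^L = \nabla_a C \odot \sigma'(z^L), where \odot is the elementwise (Hadamard) product.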

  11. Equation 2 ▪ Key difference | Transpose of weight matrix ▪ Pushes error backwards
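BP2 expresses the error of a layer in terms of the error of the next layer: \delta^l = \big((w^{l+1})^T \delta^{l+1}\big) \odot \sigma'(z^l); the transposed weight matrix (w^{l+1})^T is what moves the error backwards through the network.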

  12. Equation 3 ▪ Note that the previous equations computed the error
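BP3 relates the error to the bias gradient: \partial C / \partial b^l_j = \delta^l_j, so the error computed by BP1 and BP2 is already the gradient with respect to the biases.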

  13. Equation 4 ▪ Describes the rate at which the weights learn ▪ General insights | Slow learning when: | Input activation approaches 0 | Output activation approaches 0 or 1 (from derivative of sigmoid)
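BP4 gives the weight gradient: \partial C / \partial w^l_{jk} = a^{l-1}_k \, \delta^l_j. The product form explains the slow-learning cases above: the gradient is small when the input activation a^{l-1}_k is near 0, or when \sigma'(z^l_j) is near 0, which for the sigmoid happens when the output activation is near 0 or 1.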

  14. Proof – Equation 1 ▪ Steps 1. Definition of error 2. Chain rule 3. Only the k = j term is nonzero 4. BP1 (components)
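Written out, the steps are: \delta^L_j = \frac{\partial C}{\partial z^L_j} = \sum_k \frac{\partial C}{\partial a^L_k} \frac{\partial a^L_k}{\partial z^L_j} = \frac{\partial C}{\partial a^L_j} \sigma'(z^L_j), since a^L_k = \sigma(z^L_k) depends only on z^L_k, so every term with k \neq j vanishes; collecting the components gives BP1.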

  15. Proof – Equation 2 ▪ Steps 1. Definition of error 2. Chain rule 3. Substitute definition of error 4. Derivative of weighted input vector 5. BP2 (components) ▪ Recall:
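Written out: \delta^l_j = \frac{\partial C}{\partial z^l_j} = \sum_k \frac{\partial C}{\partial z^{l+1}_k} \frac{\partial z^{l+1}_k}{\partial z^l_j} = \sum_k \delta^{l+1}_k \frac{\partial z^{l+1}_k}{\partial z^l_j}. Recalling that z^{l+1}_k = \sum_j w^{l+1}_{kj} \sigma(z^l_j) + b^{l+1}_k, the derivative is \partial z^{l+1}_k / \partial z^l_j = w^{l+1}_{kj} \sigma'(z^l_j), giving \delta^l_j = \sum_k w^{l+1}_{kj} \delta^{l+1}_k \, \sigma'(z^l_j), which is BP2 in components.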

  16. Example – Thesis Related Work

  17. References ▪ Michael A. Nielsen, "Neural Networks and Deep Learning", Determination Press (2015) ▪ Bordes et al., "Translating Embeddings for Modeling Multi-Relational Data", NIPS'13 (2013)
