
Introduction to Neural Networks (I2DL: Prof. Niessner, Prof. Leal-Taixé)



  1. Introduction to Neural Networks

  2. Lecture 2 Recap

  3. Linear Regression = a supervised learning method to find a linear model of the form
     $\hat{y}_i = \theta_0 + \sum_{k=1}^{d} x_{ik}\,\theta_k = \theta_0 + x_{i1}\theta_1 + x_{i2}\theta_2 + \dots + x_{id}\theta_d$
     Goal: find a model that explains a target $y$ given the input $x$.
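
To make this model concrete, here is a minimal NumPy sketch of the prediction above; the names `linear_predict`, `X`, `theta`, and `theta_0` are illustrative, not from the lecture.

```python
import numpy as np

def linear_predict(X, theta, theta_0):
    """Linear model: y_hat_i = theta_0 + sum_k x_ik * theta_k, for each row of X."""
    # X: (n, d) inputs, theta: (d,) weights, theta_0: scalar bias
    return theta_0 + X @ theta

# toy usage: 5 samples, 3 features
X = np.random.randn(5, 3)
y_hat = linear_predict(X, theta=np.array([0.5, -1.0, 2.0]), theta_0=0.1)
```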

  4. Logistic Regression
     • Loss function: $\mathcal{L}(y_i, \hat{y}_i) = -[\,y_i \cdot \log \hat{y}_i + (1 - y_i) \cdot \log(1 - \hat{y}_i)\,]$
     • Cost function: $C(\theta) = -\sum_{i=1}^{n} [\,y_i \cdot \log \hat{y}_i + (1 - y_i) \cdot \log(1 - \hat{y}_i)\,]$, with $\hat{y}_i = \sigma(x_i\,\theta)$
     • Minimization
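
A minimal NumPy sketch of the loss and cost above; the `eps` term inside the logarithms is an added numerical-stability guard, not something on the slide.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(X, y, theta, eps=1e-12):
    """C(theta) = -sum_i [y_i log y_hat_i + (1 - y_i) log(1 - y_hat_i)]."""
    y_hat = sigmoid(X @ theta)  # y_hat_i = sigma(x_i theta)
    return -np.sum(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))
```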

  5. Linear vs. Logistic Regression
     • Linear regression: predictions can exceed the range of the training samples; in the case of classification into [0, 1] this becomes a real issue.
     • Logistic regression: predictions are guaranteed to be within [0, 1].

  6. How to Obtain the Model? Data points $x$ and labels (ground truth) $y$ enter a loss function that compares the estimation $\hat{y}$ against $y$; optimization of that loss yields the model parameters $\theta$.

  7. Linear Score Functions
     • Linear score function, as seen in linear regression: $s_j = \sum_k w_{j,k}\,x_k$, or in matrix notation $s = Wx$.

  8. Linear Score Functions on Images
     • Linear score function $s = Wx$, shown on CIFAR-10 and on ImageNet. Source: Li/Karpathy/Johnson

  9. Linear Score Functions? When the classes are not linearly separable, linear separation is impossible, and logistic regression fails.

  10. Linear Score Functions?
      • Can we make linear regression better? Multiply with another weight matrix $W_2$: $\hat{s} = W_2 \cdot s = W_2 \cdot W \cdot x$
      • The operation is still linear: with $\widetilde{W} = W_2 \cdot W$, we get $\hat{s} = \widetilde{W}\,x$
      • Solution: add non-linearity!
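
The claim that stacked linear layers collapse into a single one can be checked numerically; a small sketch, with all shapes chosen arbitrarily:

```python
import numpy as np

x = np.random.randn(16)
W1 = np.random.randn(8, 16)
W2 = np.random.randn(4, 8)

# two stacked linear layers...
s_stacked = W2 @ (W1 @ x)
# ...equal one linear layer with W_tilde = W2 @ W1
W_tilde = W2 @ W1
assert np.allclose(s_stacked, W_tilde @ x)
```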

  11. Neural Network
      • Linear score function: $s = Wx$
      • A neural network is a nesting of 'functions':
        - 2 layers: $s = W_2 \max(0, W_1 x)$
        - 3 layers: $s = W_3 \max(0, W_2 \max(0, W_1 x))$
        - 4 layers: $s = W_4 \tanh(W_3 \max(0, W_2 \max(0, W_1 x)))$
        - 5 layers: $s = W_5\,\sigma(W_4 \tanh(W_3 \max(0, W_2 \max(0, W_1 x))))$
        - ... up to hundreds of layers
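
A minimal sketch of the 2- and 3-layer scores above, taking $\max(0, \cdot)$ as the element-wise ReLU; function and variable names are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def two_layer(x, W1, W2):
    """s = W2 max(0, W1 x)"""
    return W2 @ relu(W1 @ x)

def three_layer(x, W1, W2, W3):
    """s = W3 max(0, W2 max(0, W1 x))"""
    return W3 @ relu(W2 @ relu(W1 @ x))
```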

  12. Introduction to Neural Networks

  13. History of Neural Networks. Source: http://beamlab.org/deeplearning/2017/02/23/deep_learning_101_part1.html

  14. Neural Network (figure comparing neural networks with logistic regression)

  15. Neural Network
      • Non-linear score function $s = \dots(\max(0, W_1 x))$, on CIFAR-10, visualizing the activations of the first layer. Source: ConvNetJS

  16. Neural Network
      • 1-layer network: $s = Wx$, with input $x$ of size $128 \times 128 = 16384$ and output $s$ of size 10
      • 2-layer network: $s = W_2 \max(0, W_1 x)$, with input size $128 \times 128 = 16384$, hidden size 1000, output size 10
      Why is this structure useful?
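
With the slide's sizes (16384-dimensional input, 1000 hidden units, 10 outputs), the shapes and the rough parameter count work out as in this sketch (biases omitted, as on the slide):

```python
import numpy as np

d_in, d_hidden, d_out = 128 * 128, 1000, 10   # 16384 -> 1000 -> 10
W1 = np.zeros((d_hidden, d_in))               # shape (1000, 16384)
W2 = np.zeros((d_out, d_hidden))              # shape (10, 1000)
x = np.zeros(d_in)

s = W2 @ np.maximum(0, W1 @ x)                # s = W2 max(0, W1 x), shape (10,)
print(W1.size + W2.size)                      # 16,394,000 weights in total
```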

  17. Neural Network
      • 2-layer network: $s = W_2 \max(0, W_1 x)$, with input layer ($128 \times 128 = 16384$), hidden layer (1000), and output layer (10).

  18. Net of Artificial Neurons
      (Figure: inputs $x_1, x_2, x_3$ feed a first layer of neurons computing $f(w_{0,k}\,x + b_{0,k})$, whose outputs feed further layers of neurons $f(w_{i,k}\,x + b_{i,k})$.)
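
A single neuron from the figure, as a hedged sketch: it computes $f(w \cdot x + b)$; the sigmoid is chosen here for illustration, since the slide does not fix $f$.

```python
import numpy as np

def neuron(x, w, b, f=lambda z: 1.0 / (1.0 + np.exp(-z))):
    """One artificial neuron: f(w . x + b), with a sigmoid activation by default."""
    return f(np.dot(w, x) + b)

# toy usage: three inputs, one output value
out = neuron(np.array([0.5, -1.2, 3.0]), w=np.array([0.1, 0.4, -0.2]), b=0.05)
```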

  19. Neural Network. Source: https://towardsdatascience.com/training-deep-neural-networks-9fdb1964b964

  20. Activation Functions
      • Sigmoid: $\sigma(x) = \frac{1}{1 + e^{-x}}$
      • tanh: $\tanh(x)$
      • ReLU: $\max(0, x)$
      • Leaky ReLU: $\max(0.1x, x)$
      • Parametric ReLU: $\max(\alpha x, x)$
      • Maxout: $\max(w_1^T x + b_1,\; w_2^T x + b_2)$
      • ELU: $f(x) = x$ if $x > 0$, $\alpha(e^x - 1)$ if $x \le 0$
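
All of the listed activations in one NumPy sketch; the PReLU and ELU $\alpha$ defaults are common choices assumed here, not taken from the slide:

```python
import numpy as np

sigmoid    = lambda x: 1.0 / (1.0 + np.exp(-x))
tanh       = np.tanh
relu       = lambda x: np.maximum(0, x)
leaky_relu = lambda x: np.maximum(0.1 * x, x)
prelu      = lambda x, alpha=0.25: np.maximum(alpha * x, x)  # alpha is learned in practice
elu        = lambda x, alpha=1.0: np.where(x > 0, x, alpha * (np.exp(x) - 1))

def maxout(x, w1, b1, w2, b2):
    """Maxout unit: max(w1^T x + b1, w2^T x + b2)."""
    return np.maximum(w1 @ x + b1, w2 @ x + b2)
```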

  21. Neural Network
      $s = W_3 \cdot (W_2 \cdot (W_1 \cdot x))$
      Why activation functions? Simply concatenating linear layers would be so much cheaper...

  22. Neural Network. Why organize a neural network into layers?

  23. Biological Neurons. Credit: Stanford CS 231n

  24. Biological Neurons. Credit: Stanford CS 231n

  25. Artificial Neural Networks vs. Brain
      Artificial neural networks are inspired by the brain, but not even close in terms of complexity! The comparison is great for the media and news articles, however...

  26. Artificial Neural Network
      (Figure: the same net of artificial neurons $f(w\,x + b)$ as on slide 18.)

  27. Neural Network
      • Summary:
        - Given a dataset with ground-truth training pairs $[x_i; y_i]$,
        - find the optimal weights $W$ using stochastic gradient descent, such that the loss function is minimized.
      • Compute gradients with backpropagation (use batch mode; more later)
      • Iterate many times over the training set (SGD; more later)
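
A hedged sketch of the recipe in this summary: mini-batch SGD iterating over the training set. `compute_loss_and_grads` is a hypothetical placeholder for the backpropagation step the lecture defers to later.

```python
import numpy as np

def sgd_train(W, X, y, compute_loss_and_grads, lr=1e-2, epochs=10, batch_size=32):
    """Iterate many times over the training set, stepping W against the gradient."""
    n = X.shape[0]
    for epoch in range(epochs):
        idx = np.random.permutation(n)            # reshuffle each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            # gradients come from backpropagation (placeholder callable here)
            loss, grads = compute_loss_and_grads(W, X[batch], y[batch])
            W = W - lr * grads                    # gradient descent step
    return W
```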

  28. Computational Graphs

  29. Computational Graphs
      • A directional graph
      • Matrix operations are represented as compute nodes
      • Vertex nodes are variables or operators like +, -, *, /, log(), exp(), ...
      • Directional edges show the flow of inputs to the vertices

  30. Computational Graphs
      • $f(x, y, z) = (x + y) \cdot z$
      (Figure: a sum node combines $x$ and $y$; its output feeds a mult node together with $z$, producing $f(x, y, z)$.)

  31. Evaluation: Forward Pass
      • $f(x, y, z) = (x + y) \cdot z$
      Initialization: $x = 1$, $y = -3$, $z = 4$
      Sum node: $d = x + y = -2$; mult node: $f = d \cdot z = -8$
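
The same forward pass written out node by node; `d` names the intermediate sum, as in the figure.

```python
# f(x, y, z) = (x + y) * z, evaluated one compute node at a time
x, y, z = 1, -3, 4

d = x + y        # sum node:  d = -2
f = d * z        # mult node: f = -8
print(d, f)      # -2 -8
```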

  32. Computational Graphs
      • Why discuss compute graphs?
      • Neural networks have complicated architectures: $s = W_5\,\sigma(W_4 \tanh(W_3 \max(0, W_2 \max(0, W_1 x))))$
      • Lots of matrix operations!
      • Represent NNs as computational graphs!

  33. Computational Graphs
      A neural network can be represented as a computational graph:
      - it has compute nodes (operations)
      - it has edges that connect nodes (data flow)
      - it is directional
      - it can be organized into 'layers'

  34. Computational Graphs
      (Figure: a two-layer network drawn as a computational graph.) Each layer computes a weighted sum followed by an activation:
      $z_i^{(2)} = \sum_j w_{ij}^{(2)} x_j + b_i^{(2)}$, $\quad a_i^{(2)} = f(z_i^{(2)})$, $\quad z_i^{(3)} = \sum_j w_{ij}^{(3)} a_j^{(2)} + b_i^{(3)}$
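
The per-layer equations above, vectorized: each superscript becomes its own weight matrix and bias vector. A sketch with illustrative shapes; tanh stands in for the unspecified activation $f$.

```python
import numpy as np

def layer(a_prev, W, b, f):
    """One layer of the graph: z = W a_prev + b, then a = f(z)."""
    z = W @ a_prev + b
    return f(z)

x = np.random.randn(3)
W2, b2 = np.random.randn(3, 3), np.random.randn(3)   # layer 2 parameters
W3, b3 = np.random.randn(2, 3), np.random.randn(2)   # layer 3 parameters

a2 = layer(x, W2, b2, np.tanh)   # a^(2) = f(W^(2) x + b^(2))
z3 = W3 @ a2 + b3                # z^(3) = W^(3) a^(2) + b^(3)
```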

  35. Computational Graphs
      • From a set of neurons to a structured compute pipeline. [Szegedy et al., CVPR'15] Going Deeper with Convolutions

  36. Computational Graphs
      • The computations of a neural network have further meaning:
        - The multiplication of $W_i$ and $x$: encodes the input information
        - The activation function: selects the key features
      Source: https://www.zybuluo.com/liuhui0803/note/981434

  37. Computational Graphs
      • The computations of neural networks have further meaning:
        - The convolutional layers: extract useful features with shared weights
      Source: https://www.zcfy.cc/original/understanding-convolutions-colah-s-blog

  38. Computational Graphs
      • The computations of neural networks have further meaning:
        - The convolutional layers: extract useful features with shared weights
      Source: https://www.zybuluo.com/liuhui0803/note/981434
