Deep Learning for Classification, CS293S, Yang, 2017 (PowerPoint presentation transcript)
  1. Deep Learning for Classification CS293S, Yang, 2017

  2. Computational graph for classification: features f_1, f_2, f_3, weighted by w_1, w_2, w_3, feed a sum node (Σ) followed by a threshold node (>0?).
• Objective: classification accuracy
  l_acc(w) = Σ_{i=1}^{m} 1( sign(w^T f(x^(i))) == y^(i) )
– Issue: how do we find these parameters?
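The accuracy objective counts how often the thresholded score agrees with the label. A minimal sketch in plain Python (the function name and data layout are illustrative, not from the slides):

```python
def accuracy(w, features, labels):
    # l_acc(w): fraction of examples where sign(w . f(x_i)) == y_i, y_i in {-1, +1}
    correct = 0
    for f, y in zip(features, labels):
        score = sum(wj * fj for wj, fj in zip(w, f))
        pred = 1 if score > 0 else -1
        correct += (pred == y)
    return correct / len(labels)
```

Accuracy is a step function of w, which is exactly why it is hard to optimize directly and motivates the smooth softmax objective on the next slide.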

  3. Neural Net with Soft-Max
• Score for y=1: w^T f(x); score for y=-1: -w^T f(x)
• Probability of label:
  p(y=1 | f(x); w) = e^{w^T f(x)} / (e^{w^T f(x)} + e^{-w^T f(x)})
  p(y=-1 | f(x); w) = e^{-w^T f(x)} / (e^{w^T f(x)} + e^{-w^T f(x)})
• Objective: l(w) = Π_{i=1}^{m} p(y = y^(i) | f(x^(i)); w)
• Log: ll(w) = Σ_{i=1}^{m} log p(y = y^(i) | f(x^(i)); w)
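A sketch of these formulas in Python (names are illustrative). Since the two scores are +s and -s with s = w^T f(x), both label probabilities reduce to one expression:

```python
import math

def p_label(w, f, y):
    # p(y | f(x); w) for y in {+1, -1}: e^{y s} / (e^{s} + e^{-s}), s = w . f(x)
    s = sum(wj * fj for wj, fj in zip(w, f))
    return math.exp(y * s) / (math.exp(s) + math.exp(-s))

def log_likelihood(w, features, labels):
    # ll(w) = sum_i log p(y^(i) | f(x^(i)); w)
    return sum(math.log(p_label(w, f, y)) for f, y in zip(features, labels))
```

The two probabilities sum to 1 by construction, and maximizing ll(w) is a smooth surrogate for maximizing accuracy.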

  4. Two-Layer Neural Network: features f_1, f_2, f_3 feed three hidden units; hidden unit j computes tanh of its weighted sum Σ_k w_k^j f_k, and the output unit thresholds (>0?) a weighted sum of the hidden activations.
  z → tanh(z) = (e^z − e^{−z}) / (e^z + e^{−z})
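The two-layer forward pass can be sketched in a few lines (W1 holds one weight row per hidden unit, w2 the output weights; these names are assumptions for illustration):

```python
import math

def two_layer_score(W1, w2, f):
    # hidden layer: each unit computes tanh of its weighted sum of the features
    hidden = [math.tanh(sum(w * x for w, x in zip(row, f))) for row in W1]
    # output: weighted sum of the hidden activations (thresholded at 0 for the class)
    return sum(w * h for w, h in zip(w2, hidden))
```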

  5. N-Layer Neural Network: the features f_1, f_2, f_3 pass through a sequence of layers, each a set of weighted-sum units (Σ) followed by nonlinearities (>0?), repeated N times.

  6. Convolutional Network (AlexNet): input image → stack of convolutional layers (weights) → loss. (Slide from Lecture 5, 20 Jan 2016, Fei-Fei Li & Andrej Karpathy & Justin Johnson.)

  7. Activation Functions: Sigmoid; tanh(x); ReLU max(0, x); Leaky ReLU max(0.1x, x); Maxout; ELU.
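The listed activations, written out as one-liners (the Maxout signature is an illustrative choice: it takes a list of linear pieces):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

def leaky_relu(x, a=0.1):
    return max(a * x, x)

def elu(x, a=1.0):
    return x if x > 0 else a * (math.exp(x) - 1.0)

def maxout(x, pieces):
    # max over several linear pieces (w, b); generalizes ReLU and Leaky ReLU
    return max(w * x + b for w, b in pieces)

# tanh is math.tanh
```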

  8. Multi-class Softmax
• 3-class softmax: classes A, B, C, with 3 weight vectors w_A, w_B, w_C
• Probability of label A (similar for B, C):
  p(y=A | f(x); w) = e^{w_A^T f(x)} / (e^{w_A^T f(x)} + e^{w_B^T f(x)} + e^{w_C^T f(x)})
• Objective: l(w) = Π_{i=1}^{m} p(y = y^(i) | f(x^(i)); w)
• Log: ll(w) = Σ_{i=1}^{m} log p(y = y^(i) | f(x^(i)); w)
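A sketch of the multi-class softmax, with weight vectors keyed by class name (the dict layout is an illustrative choice):

```python
import math

def softmax_probs(W, f):
    # W: class name -> weight vector; returns p(y = c | f(x); W) for each class
    scores = {c: sum(wj * fj for wj, fj in zip(w, f)) for c, w in W.items()}
    Z = sum(math.exp(s) for s in scores.values())
    return {c: math.exp(s) / Z for c, s in scores.items()}
```

For real use one would subtract the max score before exponentiating, to avoid overflow; the sketch keeps the formula as written on the slide.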

  9. Multi-class Two-Layer Neural Network: features f_1, f_2, f_3 feed three tanh hidden units; each class c ∈ {A, B, C} has its own output weights w^c, producing a score for A, a score for B, and a score for C.
  z → tanh(z) = (e^z − e^{−z}) / (e^z + e^{−z})

  10. Gradient Descent Method for Optimization
• How to find parameters that minimize an objective function?
• Idea: start somewhere; then repeat: take a step in the steepest descent direction. (Figure source: MathWorks)

  11. Generally, Steepest Direction
• Steepest direction = direction of the gradient:
  ∇g = [∂g/∂w_1, ∂g/∂w_2, …, ∂g/∂w_n]^T
• Gradient Descent:
  Init: w
  For i = 1, 2, …:  w ← w − α ∇g(w)
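The update rule above, as a minimal loop (step size and iteration count are illustrative defaults):

```python
def gradient_descent(grad, w0, alpha=0.1, steps=100):
    # repeat: w <- w - alpha * grad(w)
    w = list(w0)
    for _ in range(steps):
        g = grad(w)
        w = [wi - alpha * gi for wi, gi in zip(w, g)]
    return w
```

For example, minimizing g(w) = w_1^2 + w_2^2, whose gradient is (2w_1, 2w_2), drives w toward the minimum at the origin.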

  12. What is the Steepest Descent Direction?
  min_{Δ: Δ_1^2 + Δ_2^2 ≤ ε} g(w + Δ)
• First-order Taylor expansion: g(w + Δ) ≈ g(w) + (∂g/∂w_1) Δ_1 + (∂g/∂w_2) Δ_2
• Steepest descent direction: min_{Δ: Δ_1^2 + Δ_2^2 ≤ ε} (∂g/∂w_1) Δ_1 + (∂g/∂w_2) Δ_2
• Recall: min_{a: ‖a‖ ≤ ε} a^T b  →  a = −ε b/‖b‖
• Hence, solution: Δ = −ε ∇g/‖∇g‖, where ∇g = [∂g/∂w_1, ∂g/∂w_2]^T
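A quick numeric sanity check of the conclusion: among unit-norm directions, −∇g/‖∇g‖ gives the most negative value of the linearized objective (the gradient values here are arbitrary):

```python
import math, random

grad = [3.0, 4.0]                    # an arbitrary example gradient
norm = math.hypot(grad[0], grad[1])  # ||grad|| = 5
best = [-g / norm for g in grad]     # claimed steepest-descent direction

def linear_drop(d):
    # the first-order change (dg/dw1) d1 + (dg/dw2) d2
    return sum(di * gi for di, gi in zip(d, grad))

random.seed(0)
for _ in range(1000):
    theta = random.uniform(0.0, 2.0 * math.pi)
    d = [math.cos(theta), math.sin(theta)]  # a random unit-norm direction
    assert linear_drop(best) <= linear_drop(d) + 1e-9
```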

  13. How to Calculate a Partial Derivative in a Computational Graph
Given a function f(x, y, z) = (x + y)z, what is the partial derivative of f with respect to x, y, and z?

  14.–25. Worked example: x = −2, y = 5, z = −4 (the x, y, z values are from a training example). The slides step through the computational graph for f(x, y, z) = (x + y)z node by node, computing the wanted gradients ∂f/∂x, ∂f/∂y, ∂f/∂z via the chain rule, from the output back to the inputs. (Only the per-step figures appeared on these slides.)
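The walkthrough on slides 14–25 can be reproduced numerically; the intermediate name q for x + y is the standard one for this example:

```python
# forward pass for f(x, y, z) = (x + y) * z
x, y, z = -2.0, 5.0, -4.0
q = x + y            # q = 3
f = q * z            # f = -12

# backward pass (chain rule), starting from df/df = 1
df_dz = q            # f = q*z, so df/dz = q = 3
df_dq = z            # df/dq = z = -4
df_dx = df_dq * 1.0  # dq/dx = 1, so df/dx = -4
df_dy = df_dq * 1.0  # dq/dy = 1, so df/dy = -4
```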

  26.–31. A generic gate f in the graph: during the forward pass it receives its input activations; the gate knows its “local gradient” (the derivative of its output with respect to each input); during the backward pass, the gradient arriving from above is multiplied by the local gradient and passed back to each input.
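The pattern on slides 26–31 in miniature, for a single multiply gate (the class name is illustrative):

```python
class MulGate:
    # forward: cache the input activations; backward: multiply the upstream
    # gradient by the local gradients (d(ab)/da = b, d(ab)/db = a)
    def forward(self, a, b):
        self.a, self.b = a, b
        return a * b

    def backward(self, upstream):
        return upstream * self.b, upstream * self.a
```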

  32.–45. Another example: a sigmoid neuron f(w, x) = 1/(1 + e^{−(w_0 x_0 + w_1 x_1 + w_2)}), backpropagated gate by gate. Recoverable steps from the slides: at the ×(−1) gate, (−1) × (−0.20) = 0.20; the + gate routes the gradient [1] × [0.2] = 0.2 to both of its inputs; at the final multiply gates, [local gradient] × [its gradient] gives x_0: [2] × [0.2] = 0.4 and w_0: [−1] × [0.2] = −0.2.
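The circuit on slides 32–45 is a sigmoid neuron f(w, x) = 1/(1 + e^{−(w_0 x_0 + w_1 x_1 + w_2)}). A sketch reproducing the numbers: w_0 = 2 and x_0 = −1 follow from the slide's gradient arithmetic; w_1 = −3, x_1 = −2, w_2 = −3 are assumed values, chosen so the forward pass gives σ(1) ≈ 0.73:

```python
import math

w0, x0 = 2.0, -1.0             # recoverable from the slide's gradient arithmetic
w1, x1, w2 = -3.0, -2.0, -3.0  # assumed values (not recoverable from the transcript)

# forward pass
s = w0 * x0 + w1 * x1 + w2        # -2 + 6 - 3 = 1
out = 1.0 / (1.0 + math.exp(-s))  # sigmoid(1), about 0.73

# backward pass: the sigmoid gate's local gradient is out * (1 - out), about 0.2
ds = out * (1.0 - out)
dx0 = w0 * ds                     # [2] x [0.2], about 0.4
dw0 = x0 * ds                     # [-1] x [0.2], about -0.2
dx1 = w1 * ds
dw1 = x1 * ds
dw2 = 1.0 * ds
```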

  46.–47. Sigmoid function σ(x) = 1/(1 + e^{−x}); a sigmoid gate can backpropagate in one step using dσ/dx = σ(x)(1 − σ(x)): with a forward output of 0.73, the local gradient is (0.73) × (1 − 0.73) ≈ 0.2.
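The identity dσ/dx = σ(x)(1 − σ(x)) is easy to verify against a finite-difference estimate, here at the input x = 1 that produced the 0.73 on the slide:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

x = 1.0
analytic = sigmoid(x) * (1.0 - sigmoid(x))               # sigma'(x) via the identity
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)  # finite-difference check
```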

  48. Gradients add at branches: when a value feeds more than one downstream node (+), the gradients flowing back along each branch are summed.
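A tiny example of branch summation: f(x) = x · x reads x along two branches, and each branch sends back a gradient of x, so df/dx = x + x = 2x:

```python
x = 3.0
# each use of x is one branch; the multiply gate's local gradient for one
# factor is the other factor, which here is x in both cases
grad_from_branch1 = x
grad_from_branch2 = x
dx = grad_from_branch1 + grad_from_branch2  # df/dx = 2x
```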

  49. Summary
• Deep learning
– A new direction for text processing, given its success in image/audio processing
– Frameworks and software:
  • TensorFlow (Google)
  • Others: Theano, Torch, Caffe, Computation Graph Toolkit (CGT)
