  1. ECE 5984: Introduction to Machine Learning Topics: – Neural Networks – Backprop Readings: Murphy 16.5 Dhruv Batra Virginia Tech

  2. Administrativia • HW3 – Due: in 2 weeks – You will implement primal & dual SVMs – Kaggle competition: Higgs Boson Signal vs Background classification – https://inclass.kaggle.com/c/2015-Spring-vt-ece-machine-learning-hw3 – https://www.kaggle.com/c/higgs-boson (C) Dhruv Batra 2

  3. Administrativia • Project Mid-Sem Spotlight Presentations – Friday: 5-7pm, 3-5pm Whittemore 654 – 5 slides (recommended) – 4 minute time (STRICT) + 1-2 min Q&A – Tell the class what you’re working on – Any results yet? – Problems faced? – Upload slides on Scholar (C) Dhruv Batra 3

  4. Recap of Last Time (C) Dhruv Batra 4

  5. Not linearly separable data • Some datasets are not linearly separable! – http://www.eee.metu.edu.tr/~alatan/Courses/Demo/AppletSVM.html

  6. Addressing non-linearly separable data – Option 1, non-linear features • Choose non-linear features, e.g., – Typical linear features: w_0 + Σ_i w_i x_i – Example of non-linear features: • Degree 2 polynomials, w_0 + Σ_i w_i x_i + Σ_ij w_ij x_i x_j • Classifier h_w(x) still linear in parameters w – As easy to learn – Data is linearly separable in higher dimensional spaces – Express via kernels (C) Dhruv Batra Slide Credit: Carlos Guestrin 6
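A minimal sketch (not from the slides) of the degree-2 feature expansion above, in NumPy; the function name degree2_features is illustrative. A classifier that is linear in its weights, applied to these features, realizes exactly w_0 + Σ_i w_i x_i + Σ_ij w_ij x_i x_j.

```python
import numpy as np

def degree2_features(x):
    """Map x = (x_1, ..., x_d) to [1, x_i, x_i * x_j for i <= j], so a classifier
    that is linear in its weights realizes w_0 + sum_i w_i x_i + sum_ij w_ij x_i x_j."""
    d = len(x)
    quad = [x[i] * x[j] for i in range(d) for j in range(i, d)]
    return np.concatenate(([1.0], x, quad))

# Example: a 2-D point maps to a 6-D feature vector.
print(degree2_features(np.array([0.5, -1.0])))  # [1.  0.5  -1.  0.25  -0.5  1.]
```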

  7. Addressing non-linearly separable data – Option 2, non-linear classifier • Choose a classifier h_w(x) that is non-linear in parameters w, e.g., – Decision trees, neural networks, … • More general than linear classifiers • But, can often be harder to learn (non-convex optimization required) • Often very useful (outperforms linear classifiers) • In a way, both ideas are related (C) Dhruv Batra Slide Credit: Carlos Guestrin 7

  8. Biological Neuron (C) Dhruv Batra 8

  9. Recall: The Neuron Metaphor • Neurons – accept information from multiple inputs, – transmit information to other neurons. • Multiply inputs by weights along edges • Apply some function to the set of inputs at each node Slide Credit: HKUST 9

  10. Types of Neurons [Diagram: three single-neuron models, each computing an output f(x, θ) from a constant input 1 and inputs x_1, …, x_D weighted by θ_0, θ_1, …, θ_D: a Linear Neuron, a Logistic Neuron, and a Perceptron.] Potentially more. Require a convex loss function for gradient descent training. Slide Credit: HKUST 10
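A minimal NumPy sketch (not in the slides) of the three neuron types in the diagram: each forms the weighted sum θ_0 + Σ_i θ_i x_i and differs only in the output function applied to it.

```python
import numpy as np

def pre_activation(x, theta):
    # theta[0] is the bias weight on the constant input 1; theta[1:] weight x_1 ... x_D
    return theta[0] + np.dot(theta[1:], x)

def linear_neuron(x, theta):     # identity output: f(x, theta) = theta . [1, x]
    return pre_activation(x, theta)

def logistic_neuron(x, theta):   # sigmoid output in (0, 1)
    return 1.0 / (1.0 + np.exp(-pre_activation(x, theta)))

def perceptron(x, theta):        # hard-threshold output in {0, 1}
    return float(pre_activation(x, theta) > 0)
```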

  11. Limitation • A single “neuron” still gives only a linear decision boundary • What to do? • Idea: Stack a bunch of them together! (C) Dhruv Batra 11

  12. Multilayer Networks • Cascade neurons together • The output from one layer is the input to the next • Each layer has its own set of weights [Network diagram: inputs x_0, x_1, x_2, …, x_P feed hidden units with weight vectors θ_{0,i} and θ_{1,i}; output weights θ_{2,0}, θ_{2,1}, θ_{2,2} produce f(x, θ).] Slide Credit: HKUST 12

  13. Universal Function Approximators • Theorem – 3-layer network with linear outputs can uniformly approximate any continuous function to arbitrary accuracy, given enough hidden units [Funahashi ’89] (C) Dhruv Batra 13

  14. Plan for Today • Neural Networks – Parameter learning – Backpropagation (C) Dhruv Batra 14

  15. Forward Propagation • On board (C) Dhruv Batra 15
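The derivation is done on the board; as a hedged sketch (my notation, assuming sigmoid units throughout), forward propagation through one hidden layer looks like this:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Theta0, Theta1):
    """Forward pass for a network with one hidden layer.
    Theta0: (H, D+1) hidden-layer weights, column 0 is the bias
    Theta1: (1, H+1) output-layer weights, column 0 is the bias"""
    a0 = np.concatenate(([1.0], x))   # input with bias unit
    h = sigmoid(Theta0 @ a0)          # hidden-layer activations
    a1 = np.concatenate(([1.0], h))   # hidden activations with bias unit
    y = sigmoid(Theta1 @ a1)          # network output f(x, theta)
    return y, h
```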

  16–21. Feed-Forward Networks • Predictions are fed forward through the network to classify [Network diagram, repeated across slides 16–21 to animate the forward pass layer by layer: inputs x_0, x_1, x_2, …, x_P, hidden-layer weight vectors θ_{0,i} and θ_{1,i}, output weights θ_{2,0}, θ_{2,1}, θ_{2,2}.] Slide Credit: HKUST 16–21

  22. Gradient Computation • First let’s try: – Single Neuron for Linear Regression – Single Neuron for Logistic Regression (C) Dhruv Batra 22

  23. Logistic regression • Learning rule – MLE: (C) Dhruv Batra Slide Credit: Carlos Guestrin 23
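The learning rule referred to above is the standard gradient-ascent step on the conditional log-likelihood; a hedged batch-gradient sketch (my notation, labels in {0, 1}):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_mle_step(w, X, y, lr=0.1):
    """One gradient-ascent step on the log-likelihood of logistic regression.
    X: (N, D) inputs (include a constant column for the bias), y: (N,) labels.
    The gradient is X^T (y - sigma(Xw))."""
    return w + lr * (X.T @ (y - sigmoid(X @ w)))
```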

  24. Gradient Computation • First let’s try: – Single Neuron for Linear Regression – Single Neuron for Logistic Regression • Now let’s try the general case • Backpropagation! – Really efficient (C) Dhruv Batra 24
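A minimal sketch of backprop for the single-hidden-layer network above (my notation; it assumes sigmoid units and a squared-error loss, which is one common choice, not necessarily the one derived in class):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, Theta0, Theta1, lr=0.1):
    """One SGD step on the loss 0.5 * (y - t)^2 for a 1-hidden-layer network."""
    # Forward pass
    a0 = np.concatenate(([1.0], x))
    h = sigmoid(Theta0 @ a0)               # hidden activations
    a1 = np.concatenate(([1.0], h))
    y = sigmoid(Theta1 @ a1)               # output (Theta1 has shape (1, H+1))

    # Backward pass: errors at the pre-activations of each layer
    delta_out = (y - t) * y * (1 - y)
    delta_hid = (Theta1[:, 1:].T @ delta_out) * h * (1 - h)

    # Gradient step: each gradient is an outer product of an error and an activation
    Theta1 -= lr * np.outer(delta_out, a1)
    Theta0 -= lr * np.outer(delta_hid, a0)
    return Theta0, Theta1
```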

  25. Neural Nets • Best performers on OCR – http://yann.lecun.com/exdb/lenet/index.html • NetTalk – Text to Speech system from 1987 – http://youtu.be/tXMaFhO6dIY?t=45m15s • Rick Rashid speaks Mandarin – http://youtu.be/Nu-nlQqFCKg?t=7m30s (C) Dhruv Batra 25

  26. Neural Networks • Demo – http://neuron.eng.wayne.edu/bpFunctionApprox/ bpFunctionApprox.html (C) Dhruv Batra 26

  27. Historical Perspective (C) Dhruv Batra 27

  28. Convergence of backprop • Perceptron leads to convex optimization – Gradient descent reaches the global minimum • Multilayer neural nets are not convex – Gradient descent gets stuck in local minima – Hard to set the learning rate – Selecting the number of hidden units and layers is a fuzzy process – NNs had fallen out of fashion in the 90s and early 2000s – Back with a new name and significantly improved performance! • Deep networks – Dropout and training on much larger corpora (C) Dhruv Batra Slide Credit: Carlos Guestrin 28

  29. Overfitting • Many many many parameters • Avoiding overfitting? – More training data – Regularization – Early stopping (C) Dhruv Batra 29
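A hedged sketch of early stopping, the last item above; the helpers train_epoch and validation_loss are hypothetical placeholders, not part of the course code:

```python
def fit_with_early_stopping(params, train_epoch, validation_loss,
                            patience=5, max_epochs=200):
    """Stop training once the held-out loss has not improved for `patience` epochs."""
    best_loss, best_params, since_best = float("inf"), params, 0
    for _ in range(max_epochs):
        params = train_epoch(params)      # one pass of gradient descent over the training set
        loss = validation_loss(params)    # loss on held-out data
        if loss < best_loss:
            best_loss, best_params, since_best = loss, params, 0
        else:
            since_best += 1
            if since_best >= patience:
                break                     # validation loss stopped improving
    return best_params
```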

  30. A quick note (C) Dhruv Batra Image Credit: LeCun et al. ‘98 30

  31. Rectified Linear Units (ReLU) (C) Dhruv Batra 31
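The slide content is a figure; for reference, the ReLU activation and the (sub)gradient used in training are simply:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)         # max(0, z), applied elementwise

def relu_grad(z):
    return (z > 0).astype(float)      # 1 where z > 0, 0 elsewhere
```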

  32. Convolutional Nets • Basic Idea – On board – Assumptions: • Local Receptive Fields • Weight Sharing / Translational Invariance / Stationarity – Each layer is just a convolution! [Figure: input image → convolutional layer → sub-sampling layer] (C) Dhruv Batra Image Credit: Chris Bishop 32
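A minimal sketch (not from the slides) of the "each layer is just a convolution" idea: one shared kernel (weight sharing) is applied to every local receptive field, followed by sub-sampling.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation: the same kernel is slid over every
    local receptive field of the image (weight sharing)."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

def subsample(fmap, s=2):
    """Sub-sampling layer: average-pool non-overlapping s x s blocks."""
    H, W = fmap.shape
    fmap = fmap[:H - H % s, :W - W % s]
    return fmap.reshape(fmap.shape[0] // s, s, fmap.shape[1] // s, s).mean(axis=(1, 3))
```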

  33. (C) Dhruv Batra Slide Credit: Marc'Aurelio Ranzato 33

  34. (C) Dhruv Batra Slide Credit: Marc'Aurelio Ranzato 34

  35. (C) Dhruv Batra Slide Credit: Marc'Aurelio Ranzato 35

  36. (C) Dhruv Batra Slide Credit: Marc'Aurelio Ranzato 36

  37. (C) Dhruv Batra Slide Credit: Marc'Aurelio Ranzato 37

  38. (C) Dhruv Batra Slide Credit: Marc'Aurelio Ranzato 38

  39. (C) Dhruv Batra Slide Credit: Marc'Aurelio Ranzato 39

  40. (C) Dhruv Batra Slide Credit: Marc'Aurelio Ranzato 40

  41. (C) Dhruv Batra Slide Credit: Marc'Aurelio Ranzato 41

  42. Convolutional Nets • Example: – http://yann.lecun.com/exdb/lenet/index.html [LeNet-5 architecture: INPUT 32x32 → C1: feature maps 6@28x28 (convolutions) → S2: feature maps 6@14x14 (subsampling) → C3: feature maps 16@10x10 (convolutions) → S4: feature maps 16@5x5 (subsampling) → C5: layer of 120 units (full connection) → F6: layer of 84 units (full connection) → OUTPUT: 10 (Gaussian connections)] (C) Dhruv Batra Image Credit: Yann LeCun, Kevin Murphy 42
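A quick arithmetic check of the layer sizes in the diagram (assuming, as in LeNet-5, 5x5 'valid' convolutions and 2x2 sub-sampling):

```python
size = 32            # INPUT: 32x32
size = size - 5 + 1  # C1: 5x5 convolutions -> 28x28 (6 feature maps)
size = size // 2     # S2: 2x2 sub-sampling -> 14x14
size = size - 5 + 1  # C3: 5x5 convolutions -> 10x10 (16 feature maps)
size = size // 2     # S4: 2x2 sub-sampling -> 5x5
print(size)          # 5; C5 (120), F6 (84) and OUTPUT (10) are then fully connected
```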

  43. (C) Dhruv Batra Slide Credit: Marc'Aurelio Ranzato 43

  44. (C) Dhruv Batra Slide Credit: Marc'Aurelio Ranzato 44

  45. (C) Dhruv Batra Slide Credit: Marc'Aurelio Ranzato 45

  46. Visualizing Learned Filters (C) Dhruv Batra Figure Credit: [Zeiler & Fergus ECCV14] 46

  47. Visualizing Learned Filters (C) Dhruv Batra Figure Credit: [Zeiler & Fergus ECCV14] 47

  48. Visualizing Learned Filters (C) Dhruv Batra Figure Credit: [Zeiler & Fergus ECCV14] 48

  49. Autoencoders • Goal – Compression: Output tries to predict input (C) Dhruv Batra Image Credit: http://ufldl.stanford.edu/wiki/index.php/Stacked_Autoencoders 49

  50. Autoencoders • Goal – Learns a low-dimensional “basis” for the data (C) Dhruv Batra Image Credit: Andrew Ng 50
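A minimal sketch (my notation, sigmoid units) of the autoencoder idea on these two slides: encode the input into a low-dimensional code, decode it back, and score the reconstruction against the input itself.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def autoencoder_loss(x, W_enc, b_enc, W_dec, b_dec):
    """Reconstruction loss: the output layer's target is the input x itself."""
    code = sigmoid(W_enc @ x + b_enc)      # low-dimensional code (the learned "basis" coefficients)
    x_hat = sigmoid(W_dec @ code + b_dec)  # reconstruction of x
    return 0.5 * np.sum((x_hat - x) ** 2), code
```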

  51. Stacked Autoencoders • How about we compress the low-dim features more? (C) Dhruv Batra Image Credit: http://ufldl.stanford.edu/wiki/index.php/Stacked_Autoencoders 51

  52. Sparse DBNs [Lee et al. ICML ‘09] Figure courtesy: Quoc Le (C) Dhruv Batra 52

  53. Stacked Autoencoders • Finally perform classification with these low-dim features. (C) Dhruv Batra Image Credit: http://ufldl.stanford.edu/wiki/index.php/Stacked_Autoencoders 53

  54. What you need to know about neural networks • Perceptron: – Representation – Derivation • Multilayer neural nets – Representation – Derivation of backprop – Learning rule – Expressive power
