ECE 5984: Introduction to Machine Learning

Topics: Neural Networks (Backprop)


SLIDE 1

ECE 5984: Introduction to Machine Learning

Dhruv Batra, Virginia Tech

Topics:

– Neural Networks
– Backprop

Readings: Murphy 16.5

SLIDE 2

Administrativia

  • HW3

– Due: in 2 weeks
– You will implement primal & dual SVMs
– Kaggle competition: Higgs Boson Signal vs Background classification
– https://inclass.kaggle.com/c/2015-Spring-vt-ece-machine-learning-hw3
– https://www.kaggle.com/c/higgs-boson

SLIDE 3

Administrativia

  • Project Mid-Sem Spotlight Presentations

– Friday: 5-7pm, 3-5pm Whittemore 654
– 5 slides (recommended)
– 4 minute time (STRICT) + 1-2 min Q&A
– Tell the class what you're working on
– Any results yet?
– Problems faced?
– Upload slides on Scholar

SLIDE 4

Recap of Last Time

SLIDE 5

Not linearly separable data

  • Some datasets are not linearly separable!

– http://www.eee.metu.edu.tr/~alatan/Courses/Demo/AppletSVM.html

SLIDE 6

Addressing non-linearly separable data – Option 1, non-linear features

Slide Credit: Carlos Guestrin

  • Choose non-linear features, e.g.,

– Typical linear features: w_0 + ∑_i w_i x_i
– Example of non-linear features:

  • Degree 2 polynomials: w_0 + ∑_i w_i x_i + ∑_{ij} w_{ij} x_i x_j
  • The classifier h_w(x) is still linear in the parameters w

– As easy to learn
– Data is linearly separable in higher-dimensional spaces
– Express via kernels
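Not from the slides, but a minimal NumPy sketch of this option (data and names are illustrative): points labeled by a circle are not linearly separable in (x1, x2), yet a fixed degree-2 feature map makes them separable by a linear rule.

```python
import numpy as np

def degree2_features(X):
    """Map (x1, x2) -> (x1, x2, x1^2, x1*x2, x2^2); the bias w0 stays with the classifier."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x1 * x2, x2**2])

# Label +1 inside the unit circle, -1 outside: not linearly separable in 2D.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = np.where((X**2).sum(axis=1) < 1.0, 1, -1)

# In the lifted space the fixed weights w = (0, 0, -1, 0, -1), w0 = 1 separate
# the classes perfectly, since sign(1 - x1^2 - x2^2) is exactly the circle rule.
Phi = degree2_features(X)
w, w0 = np.array([0.0, 0.0, -1.0, 0.0, -1.0]), 1.0
assert np.all(np.sign(Phi @ w + w0) == y)
```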

SLIDE 7

Addressing non-linearly separable data – Option 2, non-linear classifier

Slide Credit: Carlos Guestrin

  • Choose a classifier h_w(x) that is non-linear in the parameters w, e.g.,

– Decision trees, neural networks, …

  • More general than linear classifiers
  • But can often be harder to learn (non-convex optimization required)
  • Often very useful (outperforms linear classifiers)
  • In a way, both ideas are related

SLIDE 8

Biological Neuron

SLIDE 9

Recall: The Neuron Metaphor

  • Neurons

– accept information from multiple inputs,
– transmit information to other neurons.

  • Multiply inputs by weights along edges
  • Apply some function to the set of inputs at each node

Slide Credit: HKUST

SLIDE 10

Types of Neurons

[Diagram, repeated for each neuron type: inputs x_1 … x_D with weights θ_1 … θ_D, plus a bias weight θ_0 on a constant input 1, producing the output f(x⃗, θ)]

  • Linear Neuron
  • Logistic Neuron
  • Perceptron
  • Potentially more; gradient-descent training requires a convex loss function.

Slide Credit: HKUST
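Read as code, the three diagrams differ only in the output nonlinearity; a sketch (function names are mine):

```python
import numpy as np

def pre_activation(x, theta, theta0):
    """Shared structure of every neuron type: theta0 + sum_i theta_i * x_i."""
    return theta0 + x @ theta

def linear_neuron(x, theta, theta0):
    return pre_activation(x, theta, theta0)                         # identity output

def logistic_neuron(x, theta, theta0):
    return 1.0 / (1.0 + np.exp(-pre_activation(x, theta, theta0)))  # squashes to (0, 1)

def perceptron(x, theta, theta0):
    return 1 if pre_activation(x, theta, theta0) >= 0 else 0        # hard threshold
```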

SLIDE 11

Limitation

  • A single “neuron” is still a linear decision boundary
  • What to do?
  • Idea: Stack a bunch of them together!

SLIDE 12

Multilayer Networks

  • Cascade Neurons together
  • The output from one layer is the input to the next
  • Each layer has its own set of weights

[Diagram: inputs x0, x1, x2, …, xP feed two hidden layers with weight vectors θ⃗0,0, θ⃗0,1, θ⃗0,2 and θ⃗1,0, θ⃗1,1, θ⃗1,2; output weights θ2,0, θ2,1, θ2,2 produce f(x, θ)]

Slide Credit: HKUST

SLIDE 13

Universal Function Approximators

  • Theorem

– A 3-layer network with linear outputs can uniformly approximate any continuous function to arbitrary accuracy, given enough hidden units [Funahashi '89]

SLIDE 14

Plan for Today

  • Neural Networks

– Parameter learning
– Backpropagation

SLIDE 15

Forward Propagation

  • On board

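The derivation itself was done on the board; below is a sketch of the standard forward pass for the cascaded network of the previous slide, assuming logistic activations at every layer (shapes are illustrative).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Cascade the layers: the output of one layer is the input to the next.

    `layers` is a list of (Theta, theta0) pairs, one per layer; all
    intermediate activations are kept so backprop can reuse them later.
    """
    activations = [x]
    for Theta, theta0 in layers:
        x = sigmoid(Theta @ x + theta0)   # each unit: f(theta0 + sum_i theta_i * x_i)
        activations.append(x)
    return activations

# Example: 3 inputs -> 3 hidden units -> 1 output, random weights.
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(3, 3)), rng.normal(size=3)),
          (rng.normal(size=(1, 3)), rng.normal(size=1))]
print(forward(np.array([0.5, -1.0, 2.0]), layers)[-1])
```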

SLIDES 16-21

Feed-Forward Networks

  • Predictions are fed forward through the network to classify (the deck repeats this slide across several frames as an animation of the forward pass)

[Diagram: the same two-hidden-layer network as before; inputs x0 … xP, hidden-layer weight vectors θ⃗0,· and θ⃗1,·, output weights θ2,·]

Slide Credit: HKUST


SLIDE 22

Gradient Computation

  • First let’s try:

– Single Neuron for Linear Regression
– Single Neuron for Logistic Regression

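These derivations were done on the board. For reference, the standard result for the first case (a single linear neuron trained with squared loss, i.e. linear regression) is:

```latex
\ell(w) = \frac{1}{2} \sum_j \Big( y^{(j)} - w^\top x^{(j)} \Big)^2
\qquad\Rightarrow\qquad
\frac{\partial \ell}{\partial w_i} = -\sum_j \Big( y^{(j)} - w^\top x^{(j)} \Big)\, x_i^{(j)}
```

The gradient is (prediction error) times (input); the logistic case on the next slide has exactly the same shape, with the sigmoid prediction in place of w^T x.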

SLIDE 23

Logistic regression

  • Learning rule – MLE:

Slide Credit: Carlos Guestrin
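The update rule shown on this slide did not survive extraction; the standard MLE gradient-ascent rule for logistic regression, which is presumably what was pictured, is:

```latex
w_i^{(t+1)} \;\leftarrow\; w_i^{(t)}
  + \eta \sum_j x_i^{(j)} \Big( y^{(j)} - \hat{P}\big(Y = 1 \mid x^{(j)}, w^{(t)}\big) \Big),
\qquad
\hat{P}(Y = 1 \mid x, w) = \frac{1}{1 + e^{-w_0 - \sum_i w_i x_i}}
```

where η is the step size; note the same (error times input) form as the linear case.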

SLIDE 24

Gradient Computation

  • First let’s try:

– Single Neuron for Linear Regression
– Single Neuron for Logistic Regression

  • Now let’s try the general case
  • Backpropagation!

– Really efficient

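To make "really efficient" concrete, here is a sketch (mine; the shapes and the squared loss are illustrative) of backprop on a one-hidden-layer sigmoid network: one forward pass caches the activations, one backward pass reuses them, so the cost per example is linear in the number of weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y, W1, b1, W2, b2, lr=0.5):
    """One gradient-descent step on 0.5 * (y_hat - y)^2 for a 1-hidden-layer net."""
    # Forward pass: cache every activation.
    h = sigmoid(W1 @ x + b1)                      # hidden layer
    y_hat = sigmoid(W2 @ h + b2)                  # output layer

    # Backward pass: chain rule, reusing the cached values.
    delta2 = (y_hat - y) * y_hat * (1 - y_hat)    # error signal at the output
    delta1 = (W2.T @ delta2) * h * (1 - h)        # error pushed back through W2

    W2 -= lr * np.outer(delta2, h);  b2 -= lr * delta2
    W1 -= lr * np.outer(delta1, x);  b1 -= lr * delta1
    return 0.5 * float(np.sum((y_hat - y) ** 2))

# Fit a single training point just to show the loss decreasing.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)
for _ in range(500):
    loss = backprop_step(np.array([1.0, 0.0]), np.array([1.0]), W1, b1, W2, b2)
print(loss)   # small after 500 steps
```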

SLIDE 25

Neural Nets

  • Best performers on OCR

– http://yann.lecun.com/exdb/lenet/index.html

  • NetTalk

– Text-to-speech system from 1987
– http://youtu.be/tXMaFhO6dIY?t=45m15s

  • Rick Rashid speaks Mandarin

– http://youtu.be/Nu-nlQqFCKg?t=7m30s

SLIDE 26

Neural Networks

  • Demo

– http://neuron.eng.wayne.edu/bpFunctionApprox/bpFunctionApprox.html

SLIDE 27

Historical Perspective

SLIDE 28

Convergence of backprop

  • Perceptron leads to convex optimization

– Gradient descent reaches the global minimum

  • Multilayer neural nets not convex

– Gradient descent gets stuck in local minima
– Hard to set the learning rate
– Selecting the number of hidden units and layers is a fuzzy process
– NNs had fallen out of fashion in the 90s and early 2000s
– Back with a new name and significantly improved performance!

  • Deep networks

– Dropout, and training on much larger corpora

Slide Credit: Carlos Guestrin

SLIDE 29

Overfitting

  • Many many many parameters
  • Avoiding overfitting?

– More training data
– Regularization
– Early stopping
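A sketch of how the last two bullets are typically combined in one training loop (all names here, e.g. grad_fn and loss_fn, are hypothetical placeholders): L2 regularization adds a weight-decay term to each gradient step, and early stopping keeps the weights that did best on held-out data.

```python
import numpy as np

def train(grad_fn, loss_fn, w, train_data, val_data,
          lam=1e-3, lr=0.1, patience=10, max_steps=10000):
    """Gradient descent with L2 regularization (lam) and early stopping (patience)."""
    best_w, best_val, bad = w.copy(), np.inf, 0
    for _ in range(max_steps):
        w -= lr * (grad_fn(w, train_data) + lam * w)  # lam * w is the L2 penalty term
        val = loss_fn(w, val_data)
        if val < best_val:
            best_w, best_val, bad = w.copy(), val, 0  # new best on held-out data
        else:
            bad += 1
            if bad >= patience:                       # validation loss stopped improving
                break
    return best_w
```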

SLIDE 30

A quick note

Image Credit: LeCun et al. '98

SLIDE 31

Rectified Linear Units (ReLU)

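This slide is image-only; the function it plots is the standard rectifier:

```latex
\mathrm{ReLU}(z) = \max(0, z),
\qquad
\frac{d}{dz}\,\mathrm{ReLU}(z) =
\begin{cases}
  1 & z > 0 \\
  0 & z < 0
\end{cases}
```

Unlike the sigmoid, the gradient does not saturate for z > 0, which helps gradient-based training of deep networks.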

SLIDE 32

Convolutional Nets

  • Basic Idea

– On board
– Assumptions:

  • Local Receptive Fields
  • Weight Sharing / Translational Invariance / Stationarity

– Each layer is just a convolution!


[Figure: input image → convolutional layer → sub-sampling layer]

Image Credit: Chris Bishop
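A sketch (mine) of the two assumptions in code: each output value depends only on a small patch of the input (local receptive field), and the very same kernel weights are applied at every location (weight sharing).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 2D 'valid' convolution: one shared kernel slid over the image.

    (Strictly this is cross-correlation; conv nets conventionally skip the
    kernel flip.)
    """
    H, W = image.shape
    k = kernel.shape[0]
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + k, j:j + k]      # local receptive field
            out[i, j] = np.sum(patch * kernel)   # same weights at every location
    return out

# A 3x3 averaging kernel over a 6x6 image gives a 4x4 feature map.
print(conv2d_valid(np.arange(36.0).reshape(6, 6), np.ones((3, 3)) / 9).shape)
```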

SLIDES 33-41

[Image-only slides]

Slide Credit: Marc'Aurelio Ranzato

SLIDE 42

Convolutional Nets

  • Example:

– http://yann.lecun.com/exdb/lenet/index.html


[Figure: LeNet-5 architecture. INPUT 32x32 → (convolutions) C1: feature maps 6@28x28 → (subsampling) S2: f. maps 6@14x14 → (convolutions) C3: f. maps 16@10x10 → (subsampling) S4: f. maps 16@5x5 → (full connection) C5: layer 120 → (full connection) F6: layer 84 → (Gaussian connections) OUTPUT 10]

Image Credit: Yann LeCun, Kevin Murphy
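The stage sizes in the figure follow from the usual rule out = (in - kernel) / stride + 1; a quick check, assuming 5x5 convolution kernels at stride 1 and 2x2 subsampling (my reading of the figure):

```python
# LeNet-5 stage sizes under the assumptions above:
size = 32            # INPUT: 32x32
size = size - 5 + 1  # C1: (32 - 5) + 1 = 28  -> 6@28x28
size = size // 2     # S2: 28 / 2 = 14        -> 6@14x14
size = size - 5 + 1  # C3: (14 - 5) + 1 = 10  -> 16@10x10
size = size // 2     # S4: 10 / 2 = 5         -> 16@5x5
size = size - 5 + 1  # C5: a 5x5 kernel covers the whole map -> 120 1x1 units
print(size)          # 1
```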

SLIDES 43-45

[Image-only slides]

Slide Credit: Marc'Aurelio Ranzato

SLIDES 46-48

Visualizing Learned Filters

Figure Credit: [Zeiler & Fergus ECCV14]

SLIDE 49

Autoencoders

  • Goal

– Compression: Output tries to predict input

Image Credit: http://ufldl.stanford.edu/wiki/index.php/Stacked_Autoencoders
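A sketch of that goal (mine; the layer sizes are illustrative): the encoder maps the input down to a narrow code, the decoder maps the code back, and training minimizes the reconstruction error between output and input.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def autoencode(x, W_enc, b_enc, W_dec, b_dec):
    """Compress x to a low-dim code, then try to reproduce x from the code."""
    code = sigmoid(W_enc @ x + b_enc)      # 10 -> 3 dims: the learned low-dim "basis"
    x_hat = sigmoid(W_dec @ code + b_dec)  # 3 -> 10 dims: the reconstruction
    return code, x_hat

rng = np.random.default_rng(0)
W_enc, b_enc = rng.normal(size=(3, 10)), np.zeros(3)
W_dec, b_dec = rng.normal(size=(10, 3)), np.zeros(10)
x = rng.uniform(size=10)
code, x_hat = autoencode(x, W_enc, b_enc, W_dec, b_dec)
loss = 0.5 * np.sum((x_hat - x) ** 2)      # training would minimize this
```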

SLIDE 50

Autoencoders

  • Goal

– Learns a low-dimensional “basis” for the data

Image Credit: Andrew Ng

SLIDE 51

Stacked Autoencoders

  • How about we compress the low-dim features more?

Image Credit: http://ufldl.stanford.edu/wiki/index.php/Stacked_Autoencoders

SLIDE 52


Sparse DBNs [Lee et al. ICML ‘09] Figure courtesy: Quoc Le

SLIDE 53

Stacked Autoencoders

  • Finally, perform classification with these low-dim features.

Image Credit: http://ufldl.stanford.edu/wiki/index.php/Stacked_Autoencoders

SLIDE 54

What you need to know about neural networks

  • Perceptron:

– Representation
– Derivation

  • Multilayer neural nets

– Representation
– Derivation of backprop
– Learning rule
– Expressive power