Lecture 7: Neural Nets
Mark Hasegawa-Johnson
ECE 417: Multimedia Signal Processing, Fall 2020
Outline
1. Intro
2. Example #1: Neural Net as Universal Approximator
3. Example #2: Semicircle → Parabola
4. Learning: Gradient Descent and Back-Propagation
5. Backprop Example: Semicircle → Parabola
6. Summary
What is a Neural Network?
Computation in biological neural networks is performed by trillions of simple cells (neurons), each of which performs one very simple computation. Biological neural networks learn by strengthening the connections between some pairs of neurons, and weakening other connections.
What is an Artificial Neural Network?
Computation in an artificial neural network is performed by thousands of simple cells (nodes), each of which performs one very simple computation. Artificial neural networks learn by strengthening the connections between some pairs of nodes, and weakening other connections.
Two-Layer Feedforward Neural Network
The network maps an input vector x = [x_1, ..., x_D]^T through a hidden layer h_1, ..., h_N to outputs ŷ_1, ..., ŷ_K (the constant node "1" in each layer supplies the bias):
\[ e^{(1)}_k = b^{(1)}_k + \sum_{j=1}^{D} w^{(1)}_{kj} x_j, \qquad h_k = \sigma\big(e^{(1)}_k\big) \]
\[ e^{(2)}_k = b^{(2)}_k + \sum_{j=1}^{N} w^{(2)}_{kj} h_j, \qquad \hat{y}_k = e^{(2)}_k \]
so that, overall, ŷ = h(x, W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}).
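For reference, here is a minimal NumPy sketch of this forward pass; the function and variable names are mine, not from the lecture.

```python
import numpy as np

def forward(x, W1, b1, W2, b2, sigma=np.tanh):
    """Two-layer feedforward net: x is (D,), W1 is (N, D), b1 is (N,), W2 is (K, N), b2 is (K,)."""
    e1 = b1 + W1 @ x   # e^(1)_k = b^(1)_k + sum_j w^(1)_kj x_j
    h = sigma(e1)      # h_k = sigma(e^(1)_k)
    e2 = b2 + W2 @ h   # e^(2)_k = b^(2)_k + sum_j w^(2)_kj h_j
    yhat = e2          # linear output nodes: yhat_k = e^(2)_k
    return e1, h, e2, yhat

# Example with random weights: D = 3 inputs, N = 4 hidden nodes, K = 2 outputs.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)
e1, h, e2, yhat = forward(np.array([1.0, -0.5, 2.0]), W1, b1, W2, b2)
```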
Neural Network = Universal Approximator
Assume:
- Linear output nodes: ŷ_k = e^{(2)}_k.
- Smoothly nonlinear hidden nodes: dσ/de is finite.
- Smooth target function: ŷ = h(x, W, b) approximates y = h*(x) ∈ H, where H is some class of sufficiently smooth functions of x (functions whose Fourier transform has a first moment less than some finite number C).
- There are N hidden nodes, h_k, 1 ≤ k ≤ N.
- The input vectors are distributed with some probability density function, p(x), over which we can compute expected values.
Then (Barron, 1993) showed that
\[ \max_{h^*(x)\in H}\; \min_{W,b}\; E\big[\, \| h(x, W, b) - h^*(x) \|^2 \,\big] \le O\!\left(\frac{1}{N}\right) \]
Target: Can we get the neural net to compute this function?
Suppose our goal is to find some weights and biases, W^{(1)}, b^{(1)}, W^{(2)}, and b^{(2)}, so that ŷ(x) is the nonlinear function shown in the slide's figure.
Excitation, First Layer: e^{(1)}_k = b^{(1)}_k + \sum_{j=1}^{2} w^{(1)}_{kj} x_j
The first layer of the neural net just computes a linear function of x; the figure shows an example.
Activation, First Layer: h_k = tanh(e^{(1)}_k)
The activation nonlinearity then "squashes" the linear function.
Second Layer: ŷ_k = b^{(2)}_k + \sum_{j=1}^{2} w^{(2)}_{kj} h_j
The second layer then computes a linear combination of the first-layer activations, which is sufficient to match our desired function.
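To make the construction concrete, here is a simplified one-input analogue in NumPy; the weights are invented purely for illustration and are not the ones that reproduce the lecture's figure.

```python
import numpy as np

x = np.linspace(-2, 2, 9)

# First layer: two hidden nodes, each a "squashed" linear function of x.
h1 = np.tanh(5.0 * (x + 0.5))   # turns on (≈ +1) for x > -0.5
h2 = np.tanh(5.0 * (x - 0.5))   # turns on (≈ +1) for x > +0.5

# Second layer: a linear combination of the activations forms a bump.
yhat = 0.5 * h1 - 0.5 * h2
print(np.round(yhat, 2))        # ≈ 1 between -0.5 and 0.5, ≈ 0 far outside
```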
Example #2: Semicircle → Parabola
Can we design a neural net that converts a semicircle (x_0^2 + x_1^2 = 1) to a parabola (y_1 = y_0^2)?
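Here is one way this training set might be generated; I am reading the task as mapping each point x = [x_0, x_1] on the upper semicircle to the point y = [y_0, y_1] = [x_0, x_0^2] on the parabola, which is an assumption, since the slides define the pairing only through a figure.

```python
import numpy as np

n = 100
theta = np.linspace(0, np.pi, n)                            # angles along the upper semicircle
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)        # inputs:  x_0^2 + x_1^2 = 1
Y = np.stack([np.cos(theta), np.cos(theta) ** 2], axis=1)   # targets: y_1 = y_0^2
```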
Example #2: Semicircle → Parabola
Let's define some vector notation:
- Second Layer: Define w^{(2)}_j = [w^{(2)}_{0j}, w^{(2)}_{1j}]^T, the jth column of the W^{(2)} matrix, so that ŷ = b + \sum_j w^{(2)}_j h_j means ŷ_k = b_k + \sum_j w^{(2)}_{kj} h_j for all k.
- First Layer Activation Function: h_k = σ(e^{(1)}_k).
- First Layer Excitation: Define w̄^{(1)}_k = [w^{(1)}_{k0}, w^{(1)}_{k1}], the kth row of the W^{(1)} matrix, so that e^{(1)}_k = w̄^{(1)}_k x means e^{(1)}_k = \sum_j w^{(1)}_{kj} x_j for all k.
Second Layer = Piece-Wise Approximation
The second layer of the network approximates ŷ using a bias term b, plus correction vectors w^{(2)}_j, each scaled by its activation h_j:
\[ \hat{y} = b^{(2)} + \sum_j w^{(2)}_j h_j \]
The activation, h_j, is a number between 0 and 1. For example, we could use the logistic sigmoid function:
\[ h_k = \sigma\big(e^{(1)}_k\big) = \frac{1}{1 + \exp\big(-e^{(1)}_k\big)} \in (0, 1) \]
The logistic sigmoid is a differentiable approximation to a unit step function.
[Figure: step and logistic nonlinearities; signum and tanh nonlinearities]
First Layer = A Series of Decisions
The first layer of the network decides whether or not to "turn on" each of the h_j's. It does this by comparing x to a series of linear threshold vectors:
\[ h_k = \sigma\big(\bar{w}^{(1)}_k x\big) \approx \begin{cases} 1 & \bar{w}^{(1)}_k x > 0 \\ 0 & \bar{w}^{(1)}_k x < 0 \end{cases} \]
[Figure: Example #2: Semicircle → Parabola]
How to train a neural network
1. Find a training dataset that contains n examples showing the desired output, y_i, that the NN should compute in response to input vector x_i: D = {(x_1, y_1), ..., (x_n, y_n)}.
2. Randomly initialize the weights and biases, W^{(1)}, b^{(1)}, W^{(2)}, and b^{(2)}.
3. Perform forward propagation: find out what the neural net computes as ŷ_i for each x_i.
4. Define a loss function that measures how badly ŷ differs from y.
5. Perform back propagation to improve W^{(1)}, b^{(1)}, W^{(2)}, and b^{(2)}.
6. Repeat steps 3-5 until convergence.
Loss Function: How should h(x) be "similar to" h*(x)?
Minimum Mean Squared Error (MMSE):
\[ W^*, b^* = \arg\min L, \qquad L = \frac{1}{2n} \sum_{i=1}^{n} \big\| y_i - \hat{y}(x_i) \big\|^2 \]
MMSE Solution: ŷ → E[y|x]. If the training samples (x_i, y_i) are i.i.d., then
\[ \lim_{n\to\infty} L = \frac{1}{2} E\big[ \| y - \hat{y} \|^2 \big], \]
which is minimized by
\[ \hat{y}_{\text{MMSE}}(x) = E[\, y \mid x \,]. \]
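As a minimal sketch of this loss (one training example per row of Yhat and Y):

```python
import numpy as np

def mse_loss(Yhat, Y):
    """L = (1/2n) * sum_i ||y_i - yhat_i||^2."""
    n = Y.shape[0]
    return np.sum((Y - Yhat) ** 2) / (2 * n)
```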
Gradient Descent: How do we improve W and b?
Given some initial neural net parameter (called u_{kj} in the figure), we want to find a better value of the same parameter. We do that using gradient descent:
\[ u_{kj} \leftarrow u_{kj} - \eta \frac{dL}{du_{kj}}, \]
where η is a learning rate (some small constant, e.g., η = 0.02 or so).
Gradient Descent = Local Optimization
Given an initial W, b, find new values of W, b with lower error:
\[ w^{(1)}_{kj} \leftarrow w^{(1)}_{kj} - \eta \frac{dL}{dw^{(1)}_{kj}}, \qquad w^{(2)}_{kj} \leftarrow w^{(2)}_{kj} - \eta \frac{dL}{dw^{(2)}_{kj}} \]
Here η is the learning rate. If η is too large, gradient descent won't converge; if it is too small, convergence is slow. Quasi-Newton and adaptive-step methods such as L-BFGS and Adam choose the step size automatically at each update, so in practice they converge much faster.
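The effect of the learning rate is easy to see on a toy problem; this sketch (not from the lecture) runs gradient descent on L(u) = u^2, a stand-in for a single network weight.

```python
def descend(eta, steps=20, u=1.0):
    """Gradient descent on L(u) = u^2, whose gradient is dL/du = 2u."""
    for _ in range(steps):
        u = u - eta * 2 * u
    return u

print(descend(eta=0.02))   # too small: after 20 steps, u is still far from the minimum at 0
print(descend(eta=0.45))   # reasonable: u shrinks toward 0 very quickly
print(descend(eta=1.10))   # too large: |u| grows at every step, so the iteration diverges
```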
Computing the Gradient: Notation
- x_i = [x_{1i}, ..., x_{Di}]^T is the ith input vector.
- y_i = [y_{1i}, ..., y_{Ki}]^T is the ith target vector (desired output).
- ŷ_i = [ŷ_{1i}, ..., ŷ_{Ki}]^T is the ith hypothesis vector (computed output).
- e^{(l)}_i = [e^{(l)}_{1i}, ..., e^{(l)}_{Ni}]^T is the excitation vector after the lth layer, in response to the ith input.
- h_i = [h_{1i}, ..., h_{Ni}]^T is the hidden-node activation vector in response to the ith input. (No superscript is necessary if there is only one hidden layer.)
- The weight matrix for the lth layer is
\[ W^{(l)} = \big[\, w^{(l)}_1, \ldots, w^{(l)}_j, \ldots \,\big] = \begin{bmatrix} w^{(l)}_{11} & \cdots & w^{(l)}_{1j} & \cdots \\ \vdots & \ddots & \vdots & \\ w^{(l)}_{k1} & \cdots & w^{(l)}_{kj} & \cdots \\ \vdots & & \vdots & \ddots \end{bmatrix} \]
Computing the Derivative
OK, let's compute the derivative of L with respect to the W^{(2)} matrix. Remember that W^{(2)} enters the neural net computation as e^{(2)}_{ki} = b^{(2)}_k + \sum_j w^{(2)}_{kj} h_{ji}. So...
\[ \frac{dL}{dw^{(2)}_{kj}} = \sum_{i=1}^{n} \frac{dL}{de^{(2)}_{ki}} \frac{\partial e^{(2)}_{ki}}{\partial w^{(2)}_{kj}} = \sum_{i=1}^{n} \epsilon_{ki} h_{ji}, \]
where the last step only works if we define ε_{ki} in a useful way:
\[ \epsilon_i = [\epsilon_{1i}, \ldots, \epsilon_{Ki}]^T = \nabla_{e^{(2)}_i} L, \qquad \text{meaning that} \qquad \epsilon_{ki} = \frac{\partial L}{\partial e^{(2)}_{ki}} = \frac{1}{n}\big(\hat{y}_{ki} - y_{ki}\big). \]
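A quick sanity check of this formula is to compare it with a finite-difference derivative. The sketch below does that for the layer-2 weights, using tanh hidden units and the MSE loss defined earlier; the code and names are mine.

```python
import numpy as np

rng = np.random.default_rng(1)
n, D, N, K = 5, 2, 3, 2
X, Y = rng.normal(size=(n, D)), rng.normal(size=(n, K))
W1, b1 = rng.normal(size=(N, D)), np.zeros(N)
W2, b2 = rng.normal(size=(K, N)), np.zeros(K)

def loss(W2):
    H = np.tanh(X @ W1.T + b1)         # hidden activations, one row per example
    Yhat = H @ W2.T + b2               # linear output layer
    return np.sum((Y - Yhat) ** 2) / (2 * n)

# Analytic gradient: (dL/dW2)_{kj} = sum_i eps_{ki} h_{ji}, with eps_{ki} = (yhat_{ki} - y_{ki}) / n.
H = np.tanh(X @ W1.T + b1)
Eps = ((H @ W2.T + b2) - Y) / n
grad_analytic = Eps.T @ H

# Finite-difference check on one entry, (k, j) = (0, 1).
d = 1e-6
W2_perturbed = W2.copy()
W2_perturbed[0, 1] += d
print(grad_analytic[0, 1], (loss(W2_perturbed) - loss(W2)) / d)   # the two numbers should agree closely
```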
Digression: Total Derivative vs. Partial Derivative
The notation dL/dw^{(2)}_{kj} means "the total derivative of L with respect to w^{(2)}_{kj}." It implies that we have to add up several different ways in which L depends on w^{(2)}_{kj}, for example,
\[ \frac{dL}{dw^{(2)}_{kj}} = \sum_{i=1}^{n} \frac{dL}{d\hat{y}_{ki}} \frac{\partial \hat{y}_{ki}}{\partial w^{(2)}_{kj}}. \]
The notation ∂L/∂ŷ_{ki} means "partial derivative": hold other variables constant while calculating this derivative.
For some variables, the total derivative and the partial derivative are the same; it doesn't matter whether we hold other variables constant or not. In fact, ŷ_{ki} is one of those, so we could write dL/dŷ_{ki} = ∂L/∂ŷ_{ki} for this particular variable.
On the other hand, the difference starts to matter when we try to compute dL/dw^{(1)}_{kj}.
Back-Propagating to the First Layer
\[ \frac{dL}{dw^{(1)}_{kj}} = \sum_{i=1}^{n} \frac{dL}{de^{(1)}_{ki}} \frac{\partial e^{(1)}_{ki}}{\partial w^{(1)}_{kj}} = \sum_{i=1}^{n} \delta_{ki} x_{ji}, \qquad \text{where} \qquad \delta_{ki} = \frac{dL}{de^{(1)}_{ki}}. \]
Applying the chain rule through the second-layer excitations and the hidden activations,
\[ \delta_{ki} = \frac{dL}{de^{(1)}_{ki}} = \sum_{\ell=1}^{K} \frac{dL}{de^{(2)}_{\ell i}} \frac{\partial e^{(2)}_{\ell i}}{\partial h_{ki}} \frac{\partial h_{ki}}{\partial e^{(1)}_{ki}}. \]
Evaluating each of the three factors gives
\[ \delta_{ki} = \sum_{\ell=1}^{K} \epsilon_{\ell i}\, w^{(2)}_{\ell k}\, \sigma'\big(e^{(1)}_{ki}\big). \]
Putting the pieces together,
\[ \frac{dL}{dw^{(1)}_{kj}} = \sum_{i=1}^{n} \delta_{ki} x_{ji}, \qquad \delta_{ki} = \frac{dL}{de^{(1)}_{ki}} = \sum_{\ell=1}^{K} \epsilon_{\ell i}\, w^{(2)}_{\ell k}\, \sigma'\big(e^{(1)}_{ki}\big). \]
The Back-Propagation Algorithm
\[ W^{(2)} \leftarrow W^{(2)} - \eta \nabla_{W^{(2)}} L, \qquad W^{(1)} \leftarrow W^{(1)} - \eta \nabla_{W^{(1)}} L \]
\[ \nabla_{W^{(2)}} L = \sum_{i=1}^{n} \epsilon_i h_i^T, \qquad \nabla_{W^{(1)}} L = \sum_{i=1}^{n} \delta_i x_i^T \]
\[ \epsilon_{ki} = \frac{1}{n}\big(\hat{y}_{ki} - y_{ki}\big), \qquad \delta_{ki} = \sum_{\ell=1}^{K} \epsilon_{\ell i}\, w^{(2)}_{\ell k}\, \sigma'\big(e^{(1)}_{ki}\big) \]
or, in vector form,
\[ \epsilon_i = \frac{1}{n}\big(\hat{y}_i - y_i\big), \qquad \delta_i = \sigma'\big(e^{(1)}_i\big) \odot \big(W^{(2),T} \epsilon_i\big), \]
where ⊙ means element-wise multiplication of two vectors, and σ'(e) is the element-wise derivative of σ(e).
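In code, one pass of these gradient computations might look like the following NumPy sketch (my own naming; it assumes tanh hidden units, so that σ'(e) = 1 - tanh^2(e)).

```python
import numpy as np

def backprop_gradients(X, Y, W1, b1, W2, b2):
    """Gradients of L = (1/2n) sum_i ||y_i - yhat_i||^2 for a two-layer net with tanh hidden units.
    X is (n, D), Y is (n, K); returns dL/dW2 with shape (K, N) and dL/dW1 with shape (N, D)."""
    n = X.shape[0]
    E1 = X @ W1.T + b1                   # first-layer excitations, one row per example
    H = np.tanh(E1)                      # hidden activations
    Yhat = H @ W2.T + b2                 # linear outputs
    Eps = (Yhat - Y) / n                 # eps_i = (yhat_i - y_i) / n, one row per example
    Delta = (1.0 - H ** 2) * (Eps @ W2)  # delta_i = sigma'(e1_i) ⊙ (W2^T eps_i)
    gradW2 = Eps.T @ H                   # sum_i eps_i h_i^T
    gradW1 = Delta.T @ X                 # sum_i delta_i x_i^T
    # Bias gradients (not shown on the slide) follow the same pattern: sum_i eps_i and sum_i delta_i.
    return gradW2, gradW1

# One gradient-descent step would then be:
#   W2 -= eta * gradW2;  W1 -= eta * gradW1
```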
Derivatives of the Nonlinearities
[Figure: the logistic and tanh nonlinearities and their derivatives]
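Both derivatives plotted there have simple closed forms, which is part of what makes back-propagation cheap; these are standard identities, sketched below.

```python
import numpy as np

def logistic(e):
    return 1.0 / (1.0 + np.exp(-e))

def logistic_prime(e):
    s = logistic(e)
    return s * (1.0 - s)              # sigma'(e) = sigma(e) (1 - sigma(e))

def tanh_prime(e):
    return 1.0 - np.tanh(e) ** 2      # d/de tanh(e) = 1 - tanh^2(e)
```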
Backprop Example: Semicircle → Parabola
Remember, we are going to try to approximate the parabola using
\[ \hat{y} = b + \sum_j w^{(2)}_j\, \sigma\big(\bar{w}^{(1)}_j x\big). \]
Randomly Initialized Weights
Here's what we get if we randomly initialize w̄^{(1)}_k, b, and w^{(2)}_j. The red vector on the right of the figure is the estimation error for this training token, ε = ŷ - y. It's huge!
Back-Prop: Layer 2
Remember
\[ W^{(2)} \leftarrow W^{(2)} - \eta \nabla_{W^{(2)}} L = W^{(2)} - \eta \sum_{i=1}^{n} \epsilon_i h_i^T = W^{(2)} - \frac{\eta}{n} \sum_{i=1}^{n} \big(\hat{y}_i - y_i\big) h_i^T. \]
Thinking in terms of the columns of W^{(2)}, we have
\[ w^{(2)}_j \leftarrow w^{(2)}_j - \frac{\eta}{n} \sum_{i=1}^{n} \big(\hat{y}_i - y_i\big) h_{ji}. \]
So, in words, layer-2 backprop means:
- Each column, w^{(2)}_j, gets updated in the direction y - ŷ.
- The update for the jth column, in response to the ith training token, is scaled by its activation h_{ji}.
Back-Prop: Layer 1
Remember
\[ W^{(1)} \leftarrow W^{(1)} - \eta \nabla_{W^{(1)}} L = W^{(1)} - \eta \sum_{i=1}^{n} \delta_i x_i^T = W^{(1)} - \eta \sum_{i=1}^{n} \Big( \sigma'\big(e^{(1)}_i\big) \odot W^{(2),T} \epsilon_i \Big) x_i^T. \]
Thinking in terms of the rows of W^{(1)}, we have
\[ \bar{w}^{(1)}_k \leftarrow \bar{w}^{(1)}_k - \eta \sum_{i=1}^{n} \delta_{ki} x_i^T. \]
In words, layer-1 backprop means:
- Each row, w̄^{(1)}_k, gets updated in the direction -x.
- The update for the kth row, in response to the ith training token, is scaled by its back-propagated error term δ_{ki}.
Back-Prop Example: Semicircle → Parabola
For each column w^{(2)}_j and the corresponding row w̄^{(1)}_k,
\[ w^{(2)}_j \leftarrow w^{(2)}_j - \frac{\eta}{n} \sum_{i=1}^{n} \big(\hat{y}_i - y_i\big) h_{ji}, \qquad \bar{w}^{(1)}_k \leftarrow \bar{w}^{(1)}_k - \eta \sum_{i=1}^{n} \delta_{ki} x_i^T. \]
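Tying everything together, here is a small end-to-end sketch that trains a two-layer net on the semicircle → parabola data with plain gradient descent. The hidden-layer size, learning rate, and number of iterations are arbitrary choices of mine, and the input-to-target pairing is the same assumption as in the earlier data sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data: points on the upper semicircle mapped to points on the parabola y_1 = y_0^2.
n = 100
theta = np.linspace(0, np.pi, n)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)         # (n, 2) inputs
Y = np.stack([np.cos(theta), np.cos(theta) ** 2], axis=1)    # (n, 2) targets

# Randomly initialized weights: D = 2 inputs, N hidden nodes, K = 2 outputs.
N_hidden, eta = 8, 0.5
W1, b1 = rng.normal(size=(N_hidden, 2)), np.zeros(N_hidden)
W2, b2 = rng.normal(size=(2, N_hidden)), np.zeros(2)

for step in range(20000):
    # Forward propagation.
    H = np.tanh(X @ W1.T + b1)
    Yhat = H @ W2.T + b2
    # Back propagation (gradients derived above, for the MSE loss).
    Eps = (Yhat - Y) / n
    Delta = (1.0 - H ** 2) * (Eps @ W2)
    # Gradient-descent updates; bias updates follow the same pattern as the weights.
    W2 -= eta * (Eps.T @ H)
    b2 -= eta * Eps.sum(axis=0)
    W1 -= eta * (Delta.T @ X)
    b1 -= eta * Delta.sum(axis=0)

Yhat = np.tanh(X @ W1.T + b1) @ W2.T + b2
print("final loss:", np.sum((Y - Yhat) ** 2) / (2 * n))
```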