From Logistic Regression to Neural Networks
CMSC 470, Marine Carpuat
Logistic Regression: What you should know
- How to make a prediction with a logistic regression classifier
- How to train a logistic regression classifier
- Machine learning concepts: loss function, gradient descent algorithm
SGD hyperparameter: the learning rate
- The hyperparameter η that controls the size of the step down the gradient is called the learning rate
- If η is too large, training might not converge; if η is too small, training might be very slow
- How to set the learning rate? Common strategies:
  - Decay over time: η = 1/(C + t), where C is a constant hyperparameter set by the user and t is the number of samples seen so far (sketched in code after this list)
  - Use a held-out set and increase the learning rate when held-out likelihood increases
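A minimal sketch of how the 1/(C + t) decay schedule above can be used when training binary logistic regression with SGD; the toy data, feature dimensionality, and value of C are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_logistic(X, y, C=10.0, epochs=5):
    """SGD for binary logistic regression with a decaying learning rate.

    Schedule: eta = 1 / (C + t), where C is a constant hyperparameter
    set by the user and t counts the samples seen so far.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    t = 0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            eta = 1.0 / (C + t)        # decay over time
            p = sigmoid(w @ x_i + b)   # predicted P(y=1|x)
            g = p - y_i                # gradient of the log loss wrt the score
            w -= eta * g * x_i
            b -= eta * g
            t += 1
    return w, b

# Toy data (hypothetical): 4 examples, 2 features
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, 0, 0])
w, b = sgd_logistic(X, y)
print(w, b)
```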
Multiclass Logistic Regression
Formalizing classification

Task definition
- Given inputs:
  - an example x
    - often x is a D-dimensional vector of binary or real values
  - a fixed set of classes Y = {y1, y2, …, yJ}
    - e.g. word senses from WordNet
- Output: a predicted class y ∈ Y

Classifier definition
- A function g: x → g(x) = y
- Many different types of functions/classifiers can be defined
  - We'll talk about perceptrons, logistic regression, neural networks.
- So far we've only worked with binary classification problems, i.e. J = 2
A multiclass logistic regression classifier
- aka multinomial logistic regression, softmax logistic regression, maximum entropy (maxent) classifier
- Goal: predict the probability P(y = c | x), where c is one of k classes in the set C
The softmax function
- A generalization of the sigmoid
- Input: a vector z of dimensionality k
- Output: a vector of dimensionality k
- Looks like a probability distribution!
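The formula itself was an image on the original slide; the standard definition is:

$$\mathrm{softmax}(\mathbf{z})_i \;=\; \frac{e^{z_i}}{\sum_{j=1}^{k} e^{z_j}}, \qquad i = 1, \dots, k$$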
The softmax function: Example
- All values are in [0, 1] and sum up to 1: they can be interpreted as probabilities!
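To make this concrete (the slide's own numbers were in an image, so the input vector below is made up), a small sketch:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: shift by max(z) before exponentiating."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([0.6, 1.1, -1.5, 1.2, 3.2, -1.1])  # made-up scores
p = softmax(z)
print(p)        # ~ [0.055 0.090 0.007 0.100 0.738 0.010]
print(p.sum())  # 1.0
```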
A multiclass logistic regression classifier
- aka multinomial logistic regression, softmax logistic regression, maximum entropy (maxent) classifier
- Goal: predict the probability P(y = c | x), where c is one of k classes in the set C
- Model definition: we now have one weight vector and one bias PER CLASS
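Written out (the slide showed the model definition as an image), with weight vector w_c and bias b_c for each class c, this is:

$$P(y = c \mid x) \;=\; \frac{\exp(w_c \cdot x + b_c)}{\sum_{c' \in C} \exp(w_{c'} \cdot x + b_{c'})} \;=\; \mathrm{softmax}(Wx + b)_c$$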
Features in multiclass logistic regression
- Features are a function of the input example and of a candidate output class c
- f_i(c, x) represents feature i for a particular class c for a given example x
- Example: sentiment analysis with 3 classes {positive (+), negative (-), neutral (0)}
  - Starting from the features for binary classification
  - We create one copy of each feature per class (see the sketch below)
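One way to picture the per-class copies (the function and names here are hypothetical, not from the slide): the joint feature vector f(c, x) stacks one block per class, and only the block belonging to class c is non-zero.

```python
import numpy as np

def multiclass_features(f_x, num_classes, c):
    """Copy the base feature vector f(x) into the block for class c.

    The joint vector f(c, x) has num_classes * len(f_x) entries;
    only the block for class c is non-zero.
    """
    f_cx = np.zeros(num_classes * len(f_x))
    f_cx[c * len(f_x):(c + 1) * len(f_x)] = f_x
    return f_cx

f_x = np.array([1.0, 0.0, 3.0])        # base features for some example x
print(multiclass_features(f_x, 3, 0))  # features paired with class "+"
print(multiclass_features(f_x, 3, 1))  # features paired with class "-"
```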
Learning in Multiclass Logistic Regression
- Loss function for a single example
- 1{ } is an indicator function that evaluates to 1 if the condition in the brackets is true, and to 0 otherwise
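The loss formula was an image on the slide; with the indicator notation above, the standard per-example cross-entropy loss is:

$$L(\hat{y}, y) \;=\; -\sum_{c=1}^{k} \mathbf{1}\{y = c\}\, \log P(y = c \mid x)$$

which is simply the negative log probability the model assigns to the correct class.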
Logistic Regression: What you should know
- How to make a prediction with a logistic regression classifier
- How to train a logistic regression classifier
- For both binary and multiclass problems
- Machine learning concepts: loss function, gradient descent algorithm, learning rate
Neural Networks
From logistic regression to a neural network unit
Limitation of the perceptron
- It can only find linear separations between positive and negative examples
[Figure: four examples in an XOR arrangement (X O on top, O X below); no single line separates the X's from the O's]
Example: binary classification with a neural network
- Create two classifiers over the four XOR-arranged examples: φ0(x1) = {-1, 1}, φ0(x2) = {1, 1}, φ0(x3) = {-1, -1}, φ0(x4) = {1, -1}
[Figure: two perceptron units, each taking φ0[0], φ0[1] and a bias input of 1; the first has weights w0,0 and bias b0,0 and outputs φ1[0], the second has weights w0,1 and bias b0,1 and outputs φ1[1]; both apply a sign non-linearity]
Example: binary classification with a neural network
- These classifiers map the examples to a new space:
  φ1(x1) = {-1, -1}, φ1(x2) = {1, -1}, φ1(x3) = {-1, 1}, φ1(x4) = {-1, -1}
- Both X examples now land on the same point, {-1, -1}, so the classes become linearly separable; a final unit over φ1[0], φ1[1] (plus a bias of 1) computes the output φ2[0] = y
[Figure: the original space (X O / O X) alongside the transformed φ1 space, where a single line separates the X's from the O's]
Example: binary classification with a neural network
- The final network can correctly classify the examples that the perceptron could not
- Replace “sign” with a smoother non-linear function (e.g. tanh, sigmoid)
[Figure: the full network with tanh units: inputs φ0[0], φ0[1] (plus bias) feed two tanh units producing φ1[0], φ1[1], which (plus bias) feed a final tanh unit producing φ2[0]]
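The actual weight values were part of the figure; as a sketch, here is one hand-picked choice of weights (an assumption, not taken from the slide) under which this two-layer tanh network reproduces the mapping above and solves the XOR-style problem:

```python
import numpy as np

def forward(x):
    """Two hidden tanh units followed by one output tanh unit.

    Weights are one hand-picked solution (assumed for illustration):
    hidden unit 0 fires for the upper-right point, hidden unit 1 for
    the lower-left point, and the output unit ORs them together.
    """
    W1 = np.array([[ 1.0,  1.0],    # weights of hidden unit 0
                   [-1.0, -1.0]])   # weights of hidden unit 1
    b1 = np.array([-1.0, -1.0])
    w2 = np.array([1.0, 1.0])       # output unit weights
    b2 = 1.0
    phi1 = np.tanh(W1 @ x + b1)     # hidden layer: phi1[0], phi1[1]
    return np.tanh(w2 @ phi1 + b2)  # output: phi2[0]

# The four XOR-style examples and their labels (O = +1, X = -1)
examples = {(-1, 1): -1, (1, 1): +1, (-1, -1): +1, (1, -1): -1}
for x, label in examples.items():
    y = forward(np.array(x, dtype=float))
    print(x, label, round(float(y), 2))  # sign(y) matches the label
```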
Feedforward Neural Networks
Components:
- an input layer
- an output layer
- one or more hidden layers
In a fully connected network, each hidden unit takes as input all the units in the previous layer. No loops!
[Figure: a 2-layer feedforward neural network]
Designing Neural Networks: Activation functions
- The hidden layer can be viewed as a set of hidden features
- The output of the hidden layer indicates the extent to which each hidden feature is “activated” by a given input
- The activation function is a non-linear function that determines the range of the hidden feature values (examples below)
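For instance (standard definitions, not specific to this slide deck), three common activation functions and the ranges they impose on hidden values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # range (0, 1)

def tanh(z):
    return np.tanh(z)                # range (-1, 1)

def relu(z):
    return np.maximum(0.0, z)        # range [0, inf)

z = np.linspace(-3, 3, 7)
print(sigmoid(z), tanh(z), relu(z), sep="\n")
```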
Designing Neural Networks: Network structure
- 2 key decisions:
  - Width (number of nodes per layer)
  - Depth (number of hidden layers)
- More parameters mean that the network can learn more complex functions of the input
Forward Propagation: For a given network, and some input values, compute output
Given input (1,0) (and sigmoid non-linearities), we can calculate the output by processing one layer at a time:
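The layer-by-layer computation on the slide was an image; as a stand-in, here is a minimal sketch with hypothetical weights (the slide's actual network parameters are not recoverable from the text):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters: 2 inputs -> 2 hidden units -> 1 output
W1 = np.array([[ 2.0, -2.0],
               [-2.0,  2.0]])
b1 = np.array([-1.0, -1.0])
w2 = np.array([3.0, 3.0])
b2 = -2.0

x = np.array([1.0, 0.0])   # the given input (1, 0)
h = sigmoid(W1 @ x + b1)   # layer 1: h = sigmoid(W1 x + b1)
y = sigmoid(w2 @ h + b2)   # layer 2: y = sigmoid(w2 . h + b2)
print(h, y)
```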
Forward Propagation: For a given network, and some input values, compute output
Output table for all possible inputs:
Neural Networks as Computation Graphs
Computation graphs make prediction easy: forward propagation consists of traversing the graph in topological order
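A sketch of this idea (the node set and graph structure below are made up for illustration): list the nodes in topological order, so every node appears after all of its inputs, and a single left-to-right pass computes the output.

```python
import numpy as np

# Each node: (name, function, list of input node names), in topological order.
graph = [
    ("x", None,                      []),               # input
    ("W", None,                      []),               # parameter
    ("b", None,                      []),               # parameter
    ("z", lambda W, x, b: W @ x + b, ["W", "x", "b"]),  # affine step
    ("h", lambda z: np.tanh(z),      ["z"]),            # non-linearity
    ("y", lambda h: h.sum(),         ["h"]),            # output
]

def forward(graph, inputs):
    values = dict(inputs)
    for name, fn, args in graph:
        if fn is not None:  # skip leaves (inputs and parameters)
            values[name] = fn(*(values[a] for a in args))
    return values

vals = forward(graph, {"x": np.array([1.0, 0.0]),
                       "W": np.eye(2), "b": np.zeros(2)})
print(vals["y"])  # tanh(1) + tanh(0) = 0.7616
```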
Neural Networks so far
- Powerful non-linear models for classification
- Predictions are made as a sequence of simple operations
  - matrix-vector operations
  - non-linear activation functions
- Choices in network structure
  - Width and depth
  - Choice of activation function
- Feedforward networks
  - no loops
- Next: how to train them