  1. From Logistic Regression to Neural Networks CMSC 470 Marine Carpuat

  2. Logistic Regression: what you should know
  • How to make a prediction with a logistic regression classifier
  • How to train a logistic regression classifier
  • Machine learning concepts: loss function, gradient descent algorithm

  3. SGD hyperparameter: the learning rate
  • The hyperparameter η that controls the size of the step down the gradient is called the learning rate
  • If η is too large, training might not converge; if η is too small, training might be very slow
  • How to set the learning rate? Common strategies:
    • Decay over time: η = 1 / (C + t), where C is a constant hyperparameter set by the user and t is the number of samples seen so far
    • Use a held-out set, and increase the learning rate when the held-out likelihood increases
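To make the decay strategy concrete, here is a minimal sketch (not from the slides) of SGD for binary logistic regression with an η = 1/(C + t) schedule; the function and variable names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_logistic_regression(X, y, C=1.0, epochs=10):
    """Train binary logistic regression with SGD and a 1/(C + t) learning-rate decay."""
    w = np.zeros(X.shape[1])
    b = 0.0
    t = 0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            t += 1
            eta = 1.0 / (C + t)                 # learning rate decays over time
            error = sigmoid(w @ x_i + b) - y_i  # prediction minus gold label (y in {0, 1})
            w -= eta * error * x_i              # gradient step on the weights
            b -= eta * error                    # gradient step on the bias
    return w, b
```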

  4. Multiclass Logistic Regression

  5. Formalizing classification
  Task definition:
  • Given inputs: an example x (often x is a D-dimensional vector of binary or real values) and a fixed set of classes Y = {y1, y2, …, yJ} (e.g. word senses from WordNet)
  • Output: a predicted class y ∈ Y
  Classifier definition: a function g: x → g(x) = y. Many different types of functions/classifiers can be defined; we'll talk about the perceptron, logistic regression, and neural networks.
  So far we've only worked with binary classification problems, i.e. J = 2.

  6. A multiclass logistic regression classifier aka multinomial logistic regression, softmax logistic regression, maximum entropy (or maxent) classifier Goal: predict probability P(y=c|x), where c is one of k classes in set C

  7. The softmax function
  • A generalization of the sigmoid
  • Input: a vector z of dimensionality k
  • Output: a vector of dimensionality k, with i-th component softmax(z)_i = exp(z_i) / Σ_{j=1}^{k} exp(z_j)
  • Looks like a probability distribution!

  8. The softmax function: example
  All output values are in [0, 1] and sum up to 1: they can be interpreted as probabilities!
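A minimal numpy sketch of the softmax; the example scores below are illustrative, not the values shown on the slide.

```python
import numpy as np

def softmax(z):
    """Map a length-k vector of scores to a length-k probability distribution."""
    z = z - np.max(z)          # subtract the max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

scores = np.array([2.0, 1.0, 0.1])   # hypothetical scores for 3 classes
probs = softmax(scores)
print(probs, probs.sum())            # all values in [0, 1], summing to 1
```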

  9. A multiclass logistic regression classifier
  aka multinomial logistic regression, softmax logistic regression, maximum entropy (or maxent) classifier
  Goal: predict the probability P(y=c|x), where c is one of k classes in set C
  Model definition: a softmax over per-class scores, P(y=c|x) = exp(w_c · x + b_c) / Σ_{c' ∈ C} exp(w_{c'} · x + b_{c'})
  We now have one weight vector and one bias PER CLASS
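A minimal sketch of prediction with a multiclass logistic regression classifier, with one weight vector and one bias per class; the names and dimensions below are illustrative assumptions.

```python
import numpy as np

def predict_proba(x, W, b):
    """W has shape (k, D): one weight vector per class; b has shape (k,): one bias per class."""
    scores = W @ x + b                    # one score per class
    scores = scores - np.max(scores)      # numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()  # P(y=c|x) for each class c

def predict(x, W, b):
    """Return the index of the most probable class."""
    return int(np.argmax(predict_proba(x, W, b)))
```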

  10. Features in multiclass logistic regression
  • Features are a function of the input example and of a candidate output class c
  • f_i(c, x) represents feature i for a particular class c for a given example x

  11. Example: sentiment analysis with 3 classes {positive (+), negative (-), neutral (0)}
  • Starting from the features for binary classification
  • We create one copy of each feature per class (see the sketch below)
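A hypothetical illustration of copying features per class; the feature names are made up for this example, not taken from the slide.

```python
# Binary-classification features for a review x, e.g. indicator and count features.
binary_features = {"contains_great": 1, "contains_awful": 0, "word_count": 12}

classes = ["+", "-", "0"]   # positive, negative, neutral

def features(c, binary_features, classes):
    """One copy of each feature per candidate class: f_i(c, x) is non-zero only for the class being scored."""
    feats = {}
    for name, value in binary_features.items():
        for c_prime in classes:
            feats[(name, c_prime)] = value if c_prime == c else 0
    return feats

print(features("+", binary_features, classes))
```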

  12. Learning in Multiclass Logistic Regression
  • Loss function for a single example: the cross-entropy loss L = − Σ_{c ∈ C} 1{y = c} log P(y = c|x)
  • 1{ } is an indicator function that evaluates to 1 if the condition in the brackets is true, and to 0 otherwise

  13. Learning in Multiclass Logistic Regression • Loss function for a single example

  14. Learning in Multiclass Logistic Regression
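To make the training step concrete, here is a minimal sketch (not the slides' exact derivation) of the per-example cross-entropy loss and the corresponding SGD update for multiclass logistic regression; variable names are illustrative.

```python
import numpy as np

def softmax_scores(x, W, b):
    """Class probabilities P(y=c|x) for a multiclass logistic regression model."""
    scores = W @ x + b
    scores -= np.max(scores)
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

def cross_entropy_loss(x, y, W, b):
    """Negative log probability of the gold class y (an integer class index)."""
    return -np.log(softmax_scores(x, W, b)[y])

def sgd_step(x, y, W, b, eta=0.1):
    """One SGD update using the gradient of the per-example cross-entropy loss."""
    grad_scores = softmax_scores(x, W, b)
    grad_scores[y] -= 1.0                # dL/dscore_c = P(c|x) - 1{y = c}
    W -= eta * np.outer(grad_scores, x)  # one weight-vector update per class
    b -= eta * grad_scores               # one bias update per class
    return W, b
```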

  15. Logistic Regression: what you should know
  • How to make a prediction with a logistic regression classifier
  • How to train a logistic regression classifier
  • For both binary and multiclass problems
  • Machine learning concepts: loss function, gradient descent algorithm, learning rate

  16. Neural Networks

  17. From logistic regression to a neural network unit

  18. Limitation of the perceptron
  • It can only find linear separations between positive and negative examples
  [Figure: four points labeled X and O arranged so that no single line separates the two classes]

  19. Example: binary classification with a neural network
  • Create two classifiers
  [Figure: two perceptron-like units over the input features φ0[0] and φ0[1], each computing sign of a weighted sum plus a bias (weights {1, 1} with bias -1, and weights {-1, -1} with bias -1), producing the new features φ1[0] and φ1[1]; the examples are φ0(x1) = {-1, 1}, φ0(x2) = {1, 1}, φ0(x3) = {-1, -1}, φ0(x4) = {1, -1}]

  20. Example: binary classification with a neural network
  • These classifiers map the examples to a new space
  [Figure: in the new space, φ1(x1) = {-1, -1}, φ1(x2) = {1, -1}, φ1(x3) = {-1, 1}, φ1(x4) = {-1, -1}; the X and O examples, which were not linearly separable in the original space, become linearly separable]

  21. Example: binary classification with a neural network
  [Figure: a third unit combines the new features φ1[0] and φ1[1] (weights {1, 1}, bias 1) to produce the output φ2[0] = y, which separates the mapped examples]

  22. Example: the final network can correctly classify the examples that the perceptron could not
  • Replace "sign" with a smoother non-linear function (e.g. tanh, sigmoid)
  [Figure: the full two-layer network with tanh units; see the sketch below]
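A minimal numpy sketch of this example network, using the weights and biases that appear in the slides' figures (reconstructed here, so treat the exact values as an assumption); sign is replaced by tanh as the slide suggests.

```python
import numpy as np

# Hidden layer: two units over the input features phi0.
W1 = np.array([[ 1.0,  1.0],    # unit computing phi1[0]
               [-1.0, -1.0]])   # unit computing phi1[1]
b1 = np.array([-1.0, -1.0])

# Output layer: one unit over the hidden features phi1.
W2 = np.array([[1.0, 1.0]])
b2 = np.array([1.0])

def forward(phi0):
    phi1 = np.tanh(W1 @ phi0 + b1)   # hidden features
    phi2 = np.tanh(W2 @ phi1 + b2)   # output y
    return phi2[0]

# The four examples from the slides: a single perceptron cannot separate them,
# but this two-layer network gives opposite signs to the O and X points.
for point in [(-1, 1), (1, 1), (-1, -1), (1, -1)]:
    print(point, forward(np.array(point, dtype=float)))
```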

  23. Feedforward Neural Networks
  • Components: an input layer, an output layer, one or more hidden layers
  • In a fully connected network: each hidden unit takes as input all the units in the previous layer
  • No loops!
  [Figure: a 2-layer feedforward neural network]

  24. Designing Neural Networks: Activation functions
  • The hidden layer can be viewed as a set of hidden features
  • The output of the hidden layer indicates the extent to which each hidden feature is "activated" by a given input
  • The activation function is a non-linear function that determines the range of hidden feature values

  25. Designing Neural Networks: Activation functions
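The slide's figure is not reproduced here; as a stand-in, here is a sketch of three commonly used activation functions (which specific functions the slide plots is an assumption).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes values into (0, 1)

def tanh(z):
    return np.tanh(z)                 # squashes values into (-1, 1)

def relu(z):
    return np.maximum(0.0, z)         # zero for negative inputs, identity otherwise

z = np.linspace(-3, 3, 7)
print(sigmoid(z), tanh(z), relu(z))
```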

  26. Designing Neural Networks: Network structure
  • 2 key decisions: width (number of nodes per layer) and depth (number of hidden layers)
  • More parameters mean that the network can learn more complex functions of the input

  27. Forward Propagation: For a given network, and some input values, compute output

  28. Forward Propagation: For a given network, and some input values, compute output Given input (1,0) (and sigmoid non-linearities), we can calculate the output by processing one layer at a time:
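The slide's worked calculation lives in the original figure; as a hedged stand-in, here is a generic layer-by-layer forward pass with sigmoid non-linearities. The weight values below are placeholders, not the slide's.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Placeholder parameters for a network with 2 inputs, 2 hidden units, 1 output.
W1 = np.array([[0.5, -0.5],
               [1.0,  1.0]])
b1 = np.array([0.0, -1.0])
W2 = np.array([[1.0, -1.0]])
b2 = np.array([0.5])

x = np.array([1.0, 0.0])      # the input (1, 0) from the slide
h = sigmoid(W1 @ x + b1)      # hidden layer: one matrix-vector product + non-linearity
y = sigmoid(W2 @ h + b2)      # output layer: same pattern, one layer at a time
print(h, y)
```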

  29. Forward Propagation: For a given network, and some input values, compute output Output table for all possible inputs:

  30. Neural Networks as Computation Graphs

  31. Computation Graphs Make Prediction Easy: forward propagation consists of traversing the graph in topological order (see the sketch below)
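A minimal sketch (not from the slides) of evaluating a small computation graph by visiting nodes in topological order; the graph, node names, and operations are illustrative.

```python
import numpy as np

# Each node: (operation, list of parent node names). Input nodes have no parents.
graph = {
    "x": ("input",  []),
    "W": ("input",  []),
    "b": ("input",  []),
    "z": ("affine", ["W", "x", "b"]),   # z = W @ x + b
    "y": ("tanh",   ["z"]),             # y = tanh(z)
}
topo_order = ["x", "W", "b", "z", "y"]  # parents always come before children

def forward(values):
    """values holds the input nodes; fill in the remaining nodes in topological order."""
    for name in topo_order:
        op, parents = graph[name]
        if op == "affine":
            W, x, b = (values[p] for p in parents)
            values[name] = W @ x + b
        elif op == "tanh":
            values[name] = np.tanh(values[parents[0]])
    return values["y"]

print(forward({"x": np.array([1.0, 0.0]),
               "W": np.array([[1.0, -1.0], [0.5, 0.5]]),
               "b": np.array([0.0, 1.0])}))
```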

  32. Neural Networks so far
  • Powerful non-linear models for classification
  • Predictions are made as a sequence of simple operations: matrix-vector operations and non-linear activation functions
  • Choices in network structure: width and depth, choice of activation function
  • Feedforward networks: no loops
  • Next: how to train
