

SLIDE 1

Neural Networks

Representations

SLIDE 2

Learning in the net

  • Problem: Given a collection of input-output pairs, learn the function

SLIDE 3

Learning for classification

  • When the net must learn to classify..

– Learn the classification boundaries that separate the training instances

[Figure: classes in the (x1, x2) plane]

SLIDE 4

Learning for classification

  • In reality

– In general, the classes are not cleanly separated

  • So what is the function we learn?

[Figure: classes in the (x1, x2) plane]

SLIDE 5

In reality: Trivial linear example

  • Two-dimensional example

– Blue dots (on the floor) on the “red” side
– Red dots (suspended at Y=1) on the “blue” side
– No line will cleanly separate the two colors

[Figure: data over the (x1, x2) plane]

SLIDE 6

Non-linearly separable data: 1-D example

  • One-dimensional example for visualization

– All (red) dots at Y=1 represent instances of class Y=1
– All (blue) dots at Y=0 are from class Y=0
– The data are not linearly separable

  • In this 1-D example, a linear separator is a threshold
  • No threshold will cleanly separate red and blue dots

[Figure: y vs x]

SLIDE 7

Undesired Function

  • One-dimensional example for visualization

– All (red) dots at Y=1 represent instances of class Y=1
– All (blue) dots at Y=0 are from class Y=0
– The data are not linearly separable

  • In this 1-D example, a linear separator is a threshold
  • No threshold will cleanly separate red and blue dots

[Figure: y vs x]

SLIDE 8

What if?

  • One-dimensional example for visualization

– All (red) dots at Y=1 represent instances of class Y=1
– All (blue) dots at Y=0 are from class Y=0
– The data are not linearly separable

  • In this 1-D example, a linear separator is a threshold
  • No threshold will cleanly separate red and blue dots

[Figure: y vs x]

SLIDE 9

What if?

  • What must the value of the function be at this X?

– 1 because red dominates?
– 0.9: The average?

[Figure: y vs x; 10 instances of one class and 90 of the other at this X]

SLIDE 10

What if?

  • What must the value of the function be at this X?

– 1 because red dominates?
– 0.9: The average?

[Figure: y vs x; 10 instances of one class and 90 of the other at this X]

Estimate:

Potentially much more useful than a simple 1/0 decision. Also, potentially more realistic.

SLIDE 11

What if?

  • What must the value of the function be at this X?

– 1 because red dominates?
– 0.9: The average?

[Figure: y vs x; 10 instances of one class and 90 of the other at this X]

Estimate:

Potentially much more useful than a simple 1/0 decision. Also, potentially more realistic.

  • Should an infinitesimal nudge of the red dot change the function estimate entirely? If not, how do we estimate P(1|X)? (since the positions of the red and blue X values are different)

SLIDE 12

The probability of y=1

  • Consider this differently: at each point look at a small window around that point
  • Plot the average value within the window

– This is an approximation of the probability of Y=1 at that point

[Figure: y vs x]
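The windowed-average estimate described above can be sketched as follows; the toy data, window width, and the `window_prob` helper are illustrative inventions, not from the lecture:

```python
import numpy as np

def window_prob(x, y, centers, width):
    """Average the 0/1 labels of all points falling in a window around each center.

    The windowed average approximates P(y=1 | x) at the window center."""
    probs = []
    for c in centers:
        mask = np.abs(x - c) <= width / 2
        probs.append(y[mask].mean() if mask.any() else np.nan)
    return np.array(probs)

# Toy 1-D data: class 1 becomes more frequent as x grows
x = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9])
y = np.array([0,   0,   0,   1,   0,   1,   1,   1])
print(window_prob(x, y, centers=[0.25, 0.75], width=0.5))  # [0.25 0.75]
```

Sliding the window across x traces out the smooth probability curve the following slides plot.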

SLIDES 13–24

(Verbatim repeats of SLIDE 12’s content.)
SLIDE 25

The logistic regression model

P(y=1|x) = 1 / (1 + e^(-(w0 + w1 x)))

[Figure: logistic curve rising from y=0 to y=1 along x]

  • Class 1 becomes increasingly probable going left to right

– Very typical in many problems
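A direct transcription of the model (same w0, w1 as in the formula; the test inputs are made up):

```python
import numpy as np

def p_y1(x, w0, w1):
    """Logistic regression model: P(y=1 | x) = 1 / (1 + exp(-(w0 + w1*x)))."""
    return 1.0 / (1.0 + np.exp(-(w0 + w1 * x)))

# With w1 > 0, class 1 becomes increasingly probable going left to right
print(p_y1(np.array([-5.0, 0.0, 5.0]), w0=0.0, w1=1.0))
```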

SLIDE 26

The logistic perceptron

  • A sigmoid perceptron with a single input models the a posteriori probability of the class given the input

P(y=1|x) = 1 / (1 + e^(-(w0 + w1 x)))

SLIDE 27

Non-linearly separable data

  • Two-dimensional example

– Blue dots (on the floor) on the “red” side
– Red dots (suspended at Y=1) on the “blue” side
– No line will cleanly separate the two colors

[Figure: data over the (x1, x2) plane]

SLIDE 28

Logistic regression

  • This is the perceptron with a sigmoid activation

– It actually computes the probability that the input belongs to class 1
– Decision boundaries may be obtained by comparing the probability to a threshold

  • These boundaries will be lines (hyperplanes in higher dimensions)
  • The sigmoid perceptron is a linear classifier

When X is a 2-D variable:

[Figure: sigmoid over the (x1, x2) plane; Decision: y > 0.5?]

SLIDE 29

Estimating the model

  • Given the training data (many (x, y) pairs represented by the dots), estimate w0 and w1 for the curve

[Figure: y vs x]

P(y=1|x) = f(x) = 1 / (1 + e^(-(w0 + w1 x)))

SLIDE 30

Estimating the model

[Figure: y vs x]

P(y=1|x) = 1 / (1 + e^(-(w0 + w1 x)))

P(y=0|x) = 1 / (1 + e^(w0 + w1 x))

P(y|x) = 1 / (1 + e^(-y(w0 + w1 x)))
  • Easier to represent using a y = +1/-1 notation
SLIDE 31

Estimating the model

  • Given: Training data of (x, y) pairs
  • The x's are vectors, the y's are binary (0/1) class values
  • Total probability of data

SLIDE 32

Estimating the model

  • Likelihood
  • Log likelihood

SLIDE 33

Maximum Likelihood Estimate

  • Equals (note argmin rather than argmax)
  • Identical to minimizing the KL divergence between the desired output and actual output
  • Cannot be solved directly, needs gradient descent
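A minimal sketch of that gradient-descent solution for the one-input logistic model; the toy data, step size, and iteration count are assumptions for illustration:

```python
import numpy as np

def fit_logistic(x, y, lr=0.5, steps=2000):
    """Gradient ascent on the log likelihood sum_i [y_i log p_i + (1-y_i) log(1-p_i)].

    For the logistic model, the gradient w.r.t. (w0, w1) reduces to
    mean(y - p) and mean((y - p) * x)."""
    w0 = w1 = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w0 + w1 * x)))  # current model output
        w0 += lr * np.mean(y - p)
        w1 += lr * np.mean((y - p) * x)
    return w0, w1

x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
w0, w1 = fit_logistic(x, y)
```

Maximizing the likelihood this way is the same computation as minimizing the cross-entropy between desired and actual outputs, matching the KL-divergence remark above.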

SLIDE 34

So what about this one?

  • Non-linear classifiers..

[Figure: non-linearly separable classes in the (x1, x2) plane]

SLIDE 35

First consider the separable case..

  • When the net must learn to classify..

[Figure: classes in the (x1, x2) plane]

SLIDE 36

First consider the separable case..

  • For a “sufficient” net

[Figure: network with inputs x1, x2]

SLIDE 37

First consider the separable case..

  • For a “sufficient” net
  • This final perceptron is a linear classifier

[Figure: network with inputs x1, x2]

SLIDE 38

First consider the separable case..

  • For a “sufficient” net
  • This final perceptron is a linear classifier over the output of the penultimate layer

[Figure: network with inputs x1, x2]

SLIDE 39

First consider the separable case..

  • For perfect classification the output of the penultimate layer must be linearly separable

[Figure: network with inputs x1, x2 and penultimate outputs y1, y2]

SLIDE 40

First consider the separable case..

  • The rest of the network may be viewed as a transformation that transforms data from non-linear classes to linearly separable features

– We can now attach any linear classifier above it for perfect classification
– Need not be a perceptron
– In fact, slapping an SVM on top of the features may be more generalizable!

[Figure: network with inputs x1, x2 and penultimate outputs y1, y2]

SLIDE 41

First consider the separable case..

  • The rest of the network may be viewed as a transformation that transforms data from non-linear classes to linearly separable features

– We can now attach any linear classifier above it for perfect classification
– Need not be a perceptron
– In fact, for binary classifiers an SVM on top of the features may be more generalizable!

[Figure: network with inputs x1, x2 and penultimate outputs y1, y2]

SLIDE 42

First consider the separable case..

  • This is true of any sufficient structure

– Not just the optimal one

  • For insufficient structures, the network may attempt to transform the inputs to linearly separable features

– Will fail to separate
– Still, for binary problems, using an SVM with slack may be more effective than a final perceptron!

[Figure: network with inputs x1, x2 and penultimate outputs y1, y2]

SLIDE 43

Mathematically..

  • The data are (almost) linearly separable in the space of the penultimate layer’s outputs
  • The network until the second-to-last layer is a non-linear function that converts the input space into the feature space where the classes are maximally linearly separable

[Figure: network with inputs x1, x2]

SLIDE 44

Story so far

  • A classification MLP actually comprises two components

– A “feature extraction network” that converts the inputs into linearly separable features

  • Or nearly linearly separable features

– A final linear classifier that operates on the linearly separable features

SLIDE 45

An SVM at the output?

  • For binary problems, using an SVM with slack may be more effective than a final perceptron!
  • How does that work??

– Option 1: First train the MLP with a perceptron at the output, then detach the feature extraction, compute features, and train an SVM
– Option 2: Directly employ a max-margin rule at the output, and optimize the entire network

  • Left as an exercise for the curious

[Figure: network with inputs x1, x2 and penultimate outputs y1, y2]
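Option 1 can be sketched with scikit-learn; the dataset, layer sizes, and hyperparameters below are illustrative assumptions, not from the lecture:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.1, random_state=0)

# Step 1: train the MLP with its usual logistic output layer
mlp = MLPClassifier(hidden_layer_sizes=(16, 16), activation="relu",
                    max_iter=3000, random_state=0).fit(X, y)

def penultimate_features(X):
    """Detached feature extractor: run X through all hidden layers of the trained MLP."""
    h = X
    for W, b in zip(mlp.coefs_[:-1], mlp.intercepts_[:-1]):
        h = np.maximum(h @ W + b, 0.0)   # ReLU hidden activations
    return h

# Step 2: train an SVM on the (nearly) linearly separable features
svm = SVC(kernel="linear").fit(penultimate_features(X), y)
print(svm.score(penultimate_features(X), y))
```

A linear kernel suffices here precisely because the feature extractor has already (nearly) linearized the classes.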

SLIDE 46

How about the lower layers?

  • How do the lower layers respond?

– They too compute features
– But what do they look like?

  • Manifold hypothesis: For separable classes, the classes are linearly separable on a non-linear manifold
  • Layers sequentially “straighten” the data manifold

– Until the final hidden layer, which fully linearizes it

[Figure: classes in the (x1, x2) plane]

SLIDE 47

The behavior of the layers

  • Synthetic example: Feature space

SLIDE 48

The behavior of the layers

  • CIFAR

SLIDE 49

The behavior of the layers

  • CIFAR
SLIDE 50

When the data are not separable and boundaries are not linear..

  • More typical setting for classification problems

[Figure: classes in the (x1, x2) plane]

SLIDE 51

Inseparable classes with an output logistic perceptron

  • The “feature extraction” layer transforms the data such that the posterior probability may now be modelled by a logistic

[Figure: network with inputs x1, x2 and penultimate outputs y1, y2]

SLIDE 52

Inseparable classes with an output logistic perceptron

  • The “feature extraction” layer transforms the data such that the posterior probability may now be modelled by a logistic

– The output logistic computes the posterior probability of the class given the input

[Figure: network with inputs x1, x2; output plotted as y vs x]

P(y=1|x) = f(x) = 1 / (1 + e^(-(w0 + w^T x)))

SLIDE 53

When the data are not separable and boundaries are not linear..

  • The output of the network is the a posteriori class probability

– For multi-class networks, it will be the vector of a posteriori class probabilities

[Figure: classes in the (x1, x2) plane]

SLIDE 54

Everything in this book may be wrong!

  • Richard Bach (Illusions)
SLIDE 55

There’s no such thing as inseparable classes

  • A sufficiently detailed architecture can separate nearly any arrangement of points

– “Correctness” of the suggested intuitions is subject to various parameters, such as regularization, detail of the network, training paradigm, convergence, etc.

[Figure: two class arrangements in the (x1, x2) plane]

SLIDE 56

Changing gears..

SLIDE 57

Intermediate layers

[Figure: network with inputs x1, x2]

We’ve seen what the network learns here. But what about here?

SLIDE 58

Recall: The basic perceptron

  • What do the weights tell us?

– The neuron fires if the inner product between the weights and the inputs exceeds a threshold

[Figure: perceptron with inputs x1, x2, x3, …, xN]

SLIDE 59

Recall: The weight as a “template”

  • The perceptron fires if the input is within a specified angle of the weight

– Represents a convex region on the surface of the sphere!
– The network is a Boolean function over these regions.

  • The overall decision region can be arbitrarily nonconvex
  • Neuron fires if the input vector is close enough to the weight vector.

– If the input pattern matches the weight pattern closely enough

[Figure: weight vector w on the unit sphere; perceptron with inputs x1, x2, x3, …, xN]

SLIDE 60

Recall: The weight as a template

  • If the correlation between the weight pattern and the inputs exceeds a threshold, fire
  • The perceptron is a correlation filter!

[Figure: weight template W against two inputs X: Correlation = 0.57, Correlation = 0.82]

z = 1 if w^T x ≥ T, else 0
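The firing rule z = 1 if w^T x ≥ T, else 0 is a one-liner; the weight "template", inputs, and threshold below are made-up values:

```python
import numpy as np

def fires(w, x, T):
    """Perceptron as a correlation filter: fire iff the inner product w.x reaches T."""
    return 1 if np.dot(w, x) >= T else 0

w = np.array([1.0, 1.0, 0.0])                      # weight "template"
print(fires(w, np.array([0.9, 0.8, 0.1]), T=1.5))  # close match to the template -> 1
print(fires(w, np.array([0.1, 0.2, 0.9]), T=1.5))  # poor match -> 0
```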
SLIDE 61

Recall: MLP features

  • The lowest layers of a network detect significant features in the signal
  • The signal could be (partially) reconstructed using these features

– Will retain all the significant components of the signal

[Figure: network asking “DIGIT OR NOT?”]

SLIDE 62

Making it explicit

  • The signal could be (partially) reconstructed using these features

– Will retain all the significant components of the signal

  • Simply recompose the detected features

– Will this work?

SLIDE 63

(Verbatim repeat of SLIDE 62.)
SLIDE 64

Making it explicit: an autoencoder

  • A neural network can be trained to predict the input itself
  • This is an autoencoder
  • An encoder learns to detect all the most significant patterns in the signals
  • A decoder recomposes the signal from the patterns
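A minimal encoder/decoder trained to reproduce its input; all sizes, the tanh nonlinearity, the toy data, and the plain-gradient training loop are illustrative choices, not the lecture's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                    # toy "signals", 4-dimensional

W_enc = rng.normal(scale=0.1, size=(4, 2))       # encoder: 4 inputs -> 2 patterns
W_dec = rng.normal(scale=0.1, size=(2, 4))       # decoder: 2 patterns -> 4 outputs

def reconstruction_error():
    return np.mean((np.tanh(X @ W_enc) @ W_dec - X) ** 2)

err_before = reconstruction_error()
lr = 0.05
for _ in range(500):                             # minimize L2 reconstruction error
    H = np.tanh(X @ W_enc)                       # encoder output (detected patterns)
    E = H @ W_dec - X                            # reconstruction residual
    g_dec = H.T @ E / len(X)                     # gradient through the decoder
    g_enc = X.T @ ((E @ W_dec.T) * (1 - H ** 2)) / len(X)  # ...and the encoder
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc
print(err_before, ">", reconstruction_error())
```

Training the decoder to recompose the signal from the encoder's patterns is exactly the "predict the input itself" objective above.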

SLIDE 65

The Simplest Autoencoder

  • A single hidden unit
  • Hidden unit has linear activation
  • What will this learn?

SLIDE 66

The Simplest Autoencoder

  • This is just PCA!

[Figure: single-hidden-unit autoencoder; input x, encoder weights w^T, decoder weights w]

Training: Learning by minimizing L2 divergence

SLIDE 67

The Simplest Autoencoder

  • The autoencoder finds the direction of maximum energy

– Variance if the input is a zero-mean RV

  • All input vectors are mapped onto a point on the principal axis

[Figure: single-hidden-unit autoencoder; input x, encoder weights w^T, decoder weights w]
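This claim is easy to check numerically. The sketch below uses Oja's rule, a classic online update closely related to gradient descent on the one-unit tied-weight autoencoder, and compares the learned weight with the top eigenvector of the data covariance; the data and learning rate are made up:

```python
import numpy as np

rng = np.random.default_rng(1)
C = np.array([[3.0, 2.0], [2.0, 3.0]])           # covariance with principal axis (1, 1)
X = rng.normal(size=(2000, 2)) @ np.linalg.cholesky(C).T

w = np.array([0.3, -0.1])                        # single linear hidden unit, tied weights
for _ in range(3):                               # a few passes of Oja's rule
    for x in X:
        h = w @ x                                # hidden value: projection on w
        w += 0.01 * h * (x - h * w)              # correct the reconstruction toward x

principal = np.linalg.eigh(np.cov(X.T))[1][:, -1]  # top empirical eigenvector
alignment = abs(w @ principal) / np.linalg.norm(w)
print(alignment)                                 # close to 1: w found the principal axis
```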

SLIDE 68

The Simplest Autoencoder

  • Simply varying the hidden representation will result in an output that lies along the major axis

[Figure: decoded outputs tracing the major axis as the hidden value varies]

SLIDE 69

The Simplest Autoencoder

  • Simply varying the hidden representation will result in an output that lies along the major axis
  • This will happen even if the learned output weight is separate from the input weight

– The minimum-error direction is the principal eigenvector

[Figure: autoencoder with decoder weights separate from the encoder weights]

SLIDE 70

For more detailed AEs without a nonlinearity

  • This is still just PCA

– The output of the hidden layer will be in the principal subspace

  • Even if the recomposition weights are different from the “analysis” weights

Find W to minimize Avg[E]

SLIDE 71

Terminology

– Encoder: The “Analysis” net which computes the hidden representation
– Decoder: The “Synthesis” net which recomposes the data from the hidden representation

[Figure: ENCODER → DECODER]

SLIDE 72

Introducing nonlinearity

  • When the hidden layer has a linear activation, the decoder represents the best linear manifold to fit the data

– Varying the hidden value will move along this linear manifold

  • When the hidden layer has non-linear activation, the net performs nonlinear PCA

– The decoder represents the best non-linear manifold to fit the data
– Varying the hidden value will move along this non-linear manifold

[Figure: ENCODER → DECODER]

SLIDE 73

The AE

  • With non-linearity

– “Non-linear” PCA
– Deeper networks can capture more complicated manifolds

  • “Deep” autoencoders

[Figure: ENCODER → DECODER]

SLIDE 74

Some examples

  • 2-D input
  • Encoder and decoder have 2 hidden layers of 100 neurons, but the hidden representation is unidimensional
  • Model seems to learn the underlying helix structure
SLIDE 75

The learned manifold

  • Not a “clean” function even in the range of the training points (Red)

– Color shows the value of the hidden representation; it does not vary smoothly along the curve, but bounces back and forth
– Learns manifold structure (bar) that is not represented in the training data

  • Does not generalize outside the range of the training points (Blue)

– Extending the range towards the center of the spiral resulted in decoded values outside the page!

SLIDE 76

(Verbatim repeat of SLIDE 75.)
SLIDE 77

Another example

  • Learning to reconstruct a sinusoid

– Input (left): data on a spiral manifold
– Output (right): Decoded data

  • The AE seems to “learn” the underlying curved manifold
SLIDE 78

Some examples

  • The model is specific to the training data..

– Varying the hidden layer value only generates data along the learned manifold

  • May be poorly learned

– Any input will result in an output along the learned manifold

SLIDE 79

The AE

  • When the hidden representation is of lower dimensionality than the input, often called a “bottleneck” network

– Nonlinear PCA
– Learns the manifold for the data

  • If properly trained

[Figure: ENCODER → DECODER]

SLIDE 80

The AE

  • The decoder can only generate data on the manifold that the training data lie on
  • This also makes it an excellent “generator” of the distribution of the training data

– Any values applied to the (hidden) input to the decoder will produce data similar to the training data

[Figure: DECODER]

SLIDE 81

The Decoder:

  • The decoder represents a source-specific generative dictionary
  • Exciting it will produce typical data from the source!

[Figure: DECODER]

SLIDE 82

The Decoder:

  • The decoder represents a source-specific generative dictionary
  • Exciting it will produce typical data from the source!

[Figure: DECODER as a Sax dictionary]

SLIDE 83

The Decoder:

  • The decoder represents a source-specific generative dictionary
  • Exciting it will produce typical data from the source!

[Figure: DECODER as a Clarinet dictionary]

SLIDE 84

A cute application..

  • Signal separation…
  • Given a mixed sound from multiple sources, separate out the sources

SLIDE 85

Dictionary-based techniques

  • Basic idea: Learn a dictionary of “building blocks” for each sound source
  • All signals by the source are composed from entries from the dictionary for the source

[Figure: signal composed from dictionary entries]

SLIDE 86

Dictionary-based techniques

  • Learn a similar dictionary for all sources expected in the signal

[Figure: signals composed from each source’s dictionary]

SLIDE 87

Dictionary-based techniques

  • A mixed signal is the linear combination of signals from the individual sources

– Which are in turn composed of entries from their dictionaries

[Figure: Guitar music and Drum music, each composed from its dictionary, summed (+) into the mixture]

SLIDE 88

Dictionary-based techniques

  • Separation: Identify the combination of entries from both dictionaries that compose the mixed signal

[Figure: mixture as a sum (+) of dictionary compositions]

SLIDE 89

Dictionary-based techniques

  • Separation: Identify the combination of entries from both dictionaries that compose the mixed signal
  • The composition from the identified dictionary entries gives you the separated signals

[Figure: separated Guitar music and Drum music recomposed from the identified entries]

SLIDE 90

Learning Dictionaries

  • Autoencoder dictionaries for each source

– Operating on (magnitude) spectrograms

  • For a well-trained network, the “decoder” dictionary is highly specialized to creating sounds for that source

[Figure: autoencoders trained on spectrogram frames D(0,t) … D(F,t) for each source]
SLIDE 91

Model for mixed signal

  • The sum of the outputs of both neural dictionaries

– For some unknown input

[Figure: two decoder dictionaries driven by unknown excitations; their outputs are summed to give the estimate Y(f,t). The excitations are adjusted to minimize the cost J = Σ |X(f,t) − Y(f,t)|² against the mixed test signal X(f,t)]

SLIDE 92

Separation

  • Given mixed signal and source dictionaries, find the excitation that best recreates the mixed signal

– Simple backpropagation

  • Intermediate results are separated signals

Test Process

[Figure: same model as SLIDE 91, with H the hidden layer size; estimate the excitations to minimize J = Σ |X(f,t) − Y(f,t)|²]
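The test process can be sketched with linear dictionaries standing in for the trained neural decoders; all shapes, values, and the learning rate are invented for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)
D_guitar = rng.random((8, 3))                   # stand-in decoder "dictionaries":
D_drums = rng.random((8, 3))                    # 8-bin spectra, 3 atoms per source

# A mixture built from known excitations (ground truth, only for this demo)
Y = D_guitar @ rng.random(3) + D_drums @ rng.random(3)

# Separation: gradient descent on the two excitation vectors (backprop to the input)
h_g, h_d = np.zeros(3), np.zeros(3)
lr = 0.05
for _ in range(5000):
    resid = D_guitar @ h_g + D_drums @ h_d - Y  # current recreation error
    h_g -= lr * D_guitar.T @ resid
    h_d -= lr * D_drums.T @ resid

separated_guitar = D_guitar @ h_g               # intermediate result = separated signal
separated_drums = D_drums @ h_d
print(np.linalg.norm(separated_guitar + separated_drums - Y))  # small residual
```

Only the excitations are updated; the dictionaries stay fixed, exactly as in the slide's test process.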

SLIDE 93

Example Results

  • Separating music

[Audio demo: 5-layer dictionary, 600 units wide; Mixture, then Separated/Original pairs]

SLIDE 94

Story for the day

  • Classification networks learn to predict the a posteriori probabilities of classes

– The network until the final layer is a feature extractor that converts the input data to be (almost) linearly separable
– The final layer is a classifier/predictor that operates on linearly separable data

  • Neural networks can be used to perform linear or non-linear PCA

– “Autoencoders”
– Can also be used to compose constructive dictionaries for data

  • Which, in turn, can be used to model data distributions