
Statistical Natural Language Processing Recap: the perceptron algorithm



Statistical Natural Language Processing: Artificial Neural Networks & Deep Learning
Çağrı Çöltekin, Seminar für Sprachwissenschaft, University of Tübingen, Summer Semester 2018
Outline: Preliminaries · ANNs · Deep ANNs · CNNs · RNNs · Autoencoders

Artificial neural networks
• Artificial neural networks (ANNs) are machine learning models inspired by biological neural networks
• ANNs are powerful non-linear models
• Power comes with a price: there are no guarantees of finding a global minimum of the error function
• ANNs have been used in ML, AI and cognitive science since the 1950's – with some ups and downs
• Currently they are the driving force behind the popular 'deep learning' methods

The biological neuron
(showing a picture of a real neuron is mandatory in every ANN lecture)
[Figure: a biological neuron, labelling the dendrites, soma, axon and axon terminals]

Artificial and biological neural networks
• ANNs are inspired by biological neural networks
• Similar to biological networks, ANNs are made of many simple processing units
• Despite the similarities, there are many differences: ANNs do not mimic biological networks
• ANNs are a practical statistical machine learning method

Recap: the perceptron
• The perceptron computes y = f(∑_j w_j x_j) = f(wx), where x_0 = 1 is a fixed bias input
• f(x) = +1 if wx > 0, and −1 otherwise
• In ANN-speak, f(·) is called an activation function
• The perceptron algorithm minimizes J(w) = ∑_i max(0, −w x_i y_i)
• The online version picks a misclassified example and sets w ← w + x_i y_i
• The algorithm is guaranteed to converge if the classes are linearly separable

Recap: logistic regression
• Logistic regression computes P(y) = f(∑_j w_j x_j), where f is the logistic sigmoid f(z) = 1 / (1 + e^(−z)), so P(y) = 1 / (1 + e^(−wx))
• Logistic regression is also a linear classifier
• Note: the decision boundary is wx = 0
[Figure: a two-class data set with the fitted model p = 1 / (1 + e^(−(0.1 − 2.53 x_1 + 2.58 x_2))) and its linear decision boundary]
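To make the perceptron recap above concrete, here is a minimal NumPy sketch of the online update rule w ← w + x_i y_i. It is not code from the slides; the function name perceptron_train and the toy data are illustrative.

```python
import numpy as np

def perceptron_train(X, y, epochs=10):
    """Online perceptron: pick a misclassified example and set w <- w + x_i * y_i.

    X is an (n, m) matrix of inputs, y a vector of labels in {-1, +1}.
    A bias input x_0 = 1 is prepended so w_0 acts as the intercept.
    """
    X = np.hstack([np.ones((X.shape[0], 1)), X])    # x_0 = 1
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            if y_i * (w @ x_i) <= 0:                # misclassified: wx has the wrong sign
                w += x_i * y_i                      # the perceptron update
    return w

# Toy linearly separable data (illustrative): the algorithm converges here.
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([+1, +1, -1, -1])
w = perceptron_train(X, y)
print(w, np.sign(np.hstack([np.ones((4, 1)), X]) @ w))
```

On linearly separable data like this toy set the inner loop stops changing w after a few passes, which is exactly the convergence guarantee stated above.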

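The logistic-regression recap can be sketched the same way, reusing the example weights shown on the slide (0.1, −2.53, 2.58); the helper predict_proba is an illustrative name, not part of the lecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weights from the slide's example: p = 1 / (1 + exp(-(0.1 - 2.53*x1 + 2.58*x2)))
w = np.array([0.1, -2.53, 2.58])     # w_0 (bias), w_1, w_2

def predict_proba(x1, x2):
    x = np.array([1.0, x1, x2])      # x_0 = 1
    return sigmoid(w @ x)            # P(y = 1 | x)

# The decision boundary is wx = 0, i.e. where P(y = 1 | x) = 0.5: points on one
# side of the line get probability above 0.5, points on the other side below.
print(predict_proba(3.0, 3.5), predict_proba(3.5, 3.0))
```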
Linear separability
• A classification problem is said to be linearly separable if one can find a linear discriminator
• A well-known counter-example is the logical XOR problem:
    x_1  x_2  x_1 XOR x_2
     0    0        0
     0    1        1
     1    0        1
     1    1        0
• There is no line that can separate the positive and negative classes.

Can a linear classifier learn the XOR problem?
• We can use non-linear basis functions: w_0 + w_1 x_1 + w_2 x_2 + w_3 φ(x_1, x_2) is still linear in w for any choice of φ(·)
• For example, adding the product x_1 x_2 as an additional feature allows a solution like x_1 + x_2 − 2 x_1 x_2 (a small numeric check follows at the end of this section)
• Choosing proper basis functions like x_1 x_2 is called feature engineering

Multi-layer perceptron
• The simplest modern ANN architecture is called the multi-layer perceptron (MLP)
• The MLP is a fully connected, feed-forward network consisting of perceptron-like units
• Unlike the perceptron, the units in an MLP use a continuous activation function
• It can be used for both regression and classification
• The MLP can be trained using gradient-based methods
• The MLP can represent many interesting machine learning problems

Multi-layer perceptron: the picture
[Figure: a network with input units x_1 … x_4, a hidden layer and an output unit y]
Each unit takes a weighted sum of its input and applies a (non-linear) activation function.

An artificial neuron
• The unit calculates a weighted sum of the inputs, ∑_j w_j x_j = wx, with x_0 = 1
• Then the unit applies a (non-linear) activation function f(·)
• The output of the unit is y = f(wx)
• A common activation function is the logistic sigmoid, f(x) = 1 / (1 + e^(−x)); the output of the unit then becomes y = 1 / (1 + e^(−wx))

Activation functions in ANNs: hidden units
• The activation functions in an MLP are typically continuous (differentiable) functions
• For hidden units common choices are
  – Sigmoid (logistic): 1 / (1 + e^(−x))
  – Hyperbolic tangent (tanh): (e^(2x) − 1) / (e^(2x) + 1)
  – Rectified linear unit (relu): max(0, x)

Activation functions in ANNs: output units
• The activation function of the output units depends on the task (see the forward-pass sketch at the end of this section)
  – For regression, the identity function
  – For binary classification, the logistic sigmoid: P(y = 1 | x) = 1 / (1 + e^(−wx))
  – For multi-class classification, the softmax: P(y = k | x) = e^(w_k x) / ∑_j e^(w_j x)
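The XOR solution mentioned above, x_1 + x_2 − 2 x_1 x_2, can be checked in a few lines of NumPy. This is a sketch of the feature-engineering trick, not code from the lecture.

```python
import numpy as np

# XOR truth table: no line in the (x1, x2) plane separates the two classes.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# Add the product x1*x2 as an extra basis function (simple feature engineering).
phi = np.column_stack([X, X[:, 0] * X[:, 1]])

# Weights of the slide's solution x1 + x2 - 2*x1*x2 (no bias needed): the model
# is still linear in w, only the features are non-linear.
w = np.array([1.0, 1.0, -2.0])
scores = phi @ w
print(scores)                              # [0, 1, 1, 0] -- matches XOR
print((scores > 0.5).astype(int) == y)     # all True
```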

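Finally, a minimal sketch of an MLP forward pass using the hidden and output activations listed above. The layer sizes and random weights are assumptions made for illustration; in the lecture's setting the weights would be learned with gradient-based training.

```python
import numpy as np

# Hidden-unit activation functions from the slides.
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))
def tanh(x):    return (np.exp(2 * x) - 1) / (np.exp(2 * x) + 1)
def relu(x):    return np.maximum(0.0, x)

# Output activation for multi-class classification: softmax.
def softmax(z):
    e = np.exp(z - z.max())                 # shift for numerical stability
    return e / e.sum()

# A tiny fully connected, feed-forward MLP: 4 inputs -> 3 hidden units -> 2 outputs.
# The weight values here are arbitrary, purely to show the computation.
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(3, 5))          # 4 inputs + bias x_0 = 1
W_out = rng.normal(size=(2, 4))             # 3 hidden units + bias

x = np.array([1.0, 0.5, -1.2, 2.0, 0.3])    # input with x_0 = 1 prepended
h = relu(W_hidden @ x)                      # each hidden unit computes f(wx)
h = np.concatenate([[1.0], h])              # bias for the output layer
p = softmax(W_out @ h)                      # P(y = k | x)
print(p, p.sum())                           # class probabilities summing to 1
```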