  1. Traitement automatique des langues : Fondements et applications Cours 11 : Neural networks (2) Tim Van de Cruys & Philippe Muller 2016-2017

  2. Introduction Machine learning for NLP • Standard approach: linear model trained over high-dimensional but very sparse feature vectors • Recently: non-linear neural networks over dense input vectors

  3. Neural Network Architectures Feed-forward neural networks • Best known, standard neural network approach • Fully connected layers • Can be used as drop-in replacement for typical NLP classifiers
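
As an illustration only (not taken from the slides), a minimal NumPy sketch of such a feed-forward classifier over a fixed-size input vector; the layer sizes, the tanh activation and the softmax output are assumptions.

    import numpy as np

    def feed_forward(x, W1, b1, W2, b2):
        # fully connected hidden layer: linear transformation + non-linear activation
        h = np.tanh(W1 @ x + b1)
        # output layer: one score per class
        scores = W2 @ h + b2
        # softmax turns the scores into a probability distribution
        e = np.exp(scores - scores.max())
        return e / e.sum()

    # toy dimensions (assumed): 100-dim input, 50 hidden units, 3 classes
    rng = np.random.default_rng(0)
    x = rng.normal(size=100)
    W1, b1 = rng.normal(size=(50, 100)), np.zeros(50)
    W2, b2 = rng.normal(size=(3, 50)), np.zeros(3)
    print(feed_forward(x, W1, b1, W2, b2))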

  4. Convolutional neural network Introduction • Type of feed-forward neural network • Certain layers are not fully connected but locally connected (convolutional layers, pooling layers) • The same local cues appear in different places in the input (cf. vision)

  5. Convolutional neural network Intuition

  6. Convolutional neural network Intuition

  7. Convolutional neural network Intuition

  8. Convolutional neural network Architecture

  9. Convolutional neural network Encoding sentences How to represent a variable number of features, e.g. the words in a sentence or document? • Continuous Bag of Words (CBOW): sum the embedding vectors of the corresponding features • no ordering information ("not good quite bad" = "not bad quite good") • Convolutional layer • 'sliding window' approach that takes local structure into account • Combine individual windows to create a vector of fixed size

  10. Continuous bag of words Variable number of features • A feed-forward network assumes fixed-dimensional input • How to represent a variable number of features, e.g. the words in a sentence or document? • Continuous Bag of Words (CBOW): sum the embedding vectors of the corresponding features (see the sketch below)
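
A minimal sketch of the CBOW encoding (summing the embedding vectors of the words); the toy vocabulary, dimensions and random embedding values are made up for illustration, and the final check reproduces the ordering problem from slide 9.

    import numpy as np

    # toy embedding table: one row per vocabulary word (values are illustrative)
    vocab = {"not": 0, "good": 1, "quite": 2, "bad": 3}
    E = np.random.default_rng(0).normal(size=(len(vocab), 4))

    def cbow(sentence):
        # sum the embeddings of the words; word order is lost
        return sum(E[vocab[w]] for w in sentence.split())

    # both phrases get the same fixed-size vector, since ordering is ignored
    print(np.allclose(cbow("not good quite bad"), cbow("not bad quite good")))  # True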

  11. Convolutional neural network Convolutional layer for NLP • Goal: identify indicative local features (n-grams) in a large structure and combine them into a fixed-size vector • Convolution: apply a filter to each window (linear transformation + non-linear activation) • Pooling: combine the windows by taking the maximum
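
A minimal sketch of the convolution-and-pooling step described above, assuming a window of 3 words, tanh as the non-linearity and max pooling; shapes and values are illustrative.

    import numpy as np

    def conv_and_pool(X, W, b, k=3):
        # X: (sentence_length, emb_dim) word embeddings
        # W: (num_filters, k * emb_dim) filter matrix, b: (num_filters,) bias
        n, d = X.shape
        windows = [X[i:i + k].reshape(-1) for i in range(n - k + 1)]  # sliding windows
        # convolution: the same linear transformation + non-linearity for every window
        conv = np.tanh(np.stack(windows) @ W.T + b)   # (num_windows, num_filters)
        # max pooling: keep the strongest response of each filter -> fixed-size vector
        return conv.max(axis=0)                       # (num_filters,)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(7, 5))        # 7 words with 5-dim embeddings (toy values)
    W = rng.normal(size=(10, 3 * 5))   # 10 filters over windows of 3 words
    print(conv_and_pool(X, W, np.zeros(10)).shape)    # (10,)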

  12. Convolutional neural networks Architecture for NLP

  13. Neural Network Architectures Recurrent (+ recursive) neural networks • Handle structured data of arbitrary sizes • Recurrent networks for sequences • Recursive networks for trees

  14. Recurrent neural network Introduction • CBOW: no ordering, no structure • CNN: an improvement, but mostly local patterns • RNN: represent arbitrarily sized structured input as fixed-size vectors while taking structural properties into account

  15. Recurrent neural network Model • x_1 : input layer (the current word) • a_1 : hidden layer at the current timestep • a_0 : hidden layer at the previous timestep • U, W and V : weight matrices • f(·) : element-wise activation function (sigmoid) • g(·) : softmax function to ensure a probability distribution
      a_1 = f(U x_1 + W a_0)    (1)
      y_1 = g(V a_1)    (2)
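
Following equations (1) and (2), a minimal NumPy sketch of a single recurrent step; the dimensions and random weights are purely illustrative.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def rnn_step(x1, a0, U, W, V):
        a1 = sigmoid(U @ x1 + W @ a0)   # equation (1): new hidden state
        y1 = softmax(V @ a1)            # equation (2): output distribution
        return a1, y1

    rng = np.random.default_rng(0)
    x1, a0 = rng.normal(size=4), np.zeros(3)   # toy input vector and initial hidden state
    U, W, V = rng.normal(size=(3, 4)), rng.normal(size=(3, 3)), rng.normal(size=(5, 3))
    a1, y1 = rnn_step(x1, a0, U, W, V)
    print(y1.sum())   # 1.0: a probability distribution over 5 output classes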

  16. Recurrent neural network Graphical representation

  17. Recurrent neural network Training • Consider the recurrent neural network as a very deep neural network with parameters shared across the computation • Backpropagation through time • What kind of supervision? • Acceptor: based on the final state • Transducer: an output for each input (e.g. language modeling) • Encoder-decoder: one RNN encodes the sequence into a vector representation, another RNN decodes it into a sequence (e.g. machine translation)
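
A sketch of the transducer and acceptor settings, reusing rnn_step from the slide 15 sketch above: unrolling the recurrence makes explicit that the same parameters U, W, V are applied at every timestep, which is what backpropagation through time exploits.

    def rnn_unroll(xs, a0, U, W, V):
        # unroll the recurrence over the whole input sequence
        outputs, a = [], a0
        for x in xs:
            a, y = rnn_step(x, a, U, W, V)   # same shared U, W, V at every step
            outputs.append(y)
        # transducer supervision uses the per-timestep outputs,
        # acceptor supervision uses only the final state a
        return outputs, a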

  18. Recurrent neural network Training: graphical representation

  19. Recurrent neural network Multi-layer RNN • multiple layers of RNNs • input of next layer is output of RNN layer below it • Empirically shown to work better
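
A sketch of the stacking described above, again reusing rnn_step: the hidden states of each layer serve as the input sequence of the layer above (the usual choice); making the weight shapes of consecutive layers compatible is left to the caller.

    def multilayer_rnn(xs, initial_states, layer_params):
        # layer_params: one (U, W, V) triple per layer; initial_states: one a0 per layer
        layer_input = xs
        for a, (U, W, V) in zip(initial_states, layer_params):
            states = []
            for x in layer_input:
                a, _ = rnn_step(x, a, U, W, V)
                states.append(a)          # hidden states are fed to the next layer
            layer_input = states
        return layer_input                # sequence of top-layer hidden states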

  20. Recurrent neural network Bi-directional RNN • The input sequence is fed both forward and backward to two different RNNs • The representation is the concatenation of the forward and backward states (A & A') • Represents both history and future
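
A sketch of the bi-directional construction, reusing rnn_step and NumPy from above: one RNN reads the sequence from left to right, a second one from right to left, and position i is represented by the concatenation of the two states (A and A').

    def birnn_states(xs, a0_f, a0_b, fwd_params, bwd_params):
        def run(seq, a, U, W, V):
            states = []
            for x in seq:
                a, _ = rnn_step(x, a, U, W, V)
                states.append(a)
            return states
        forward = run(xs, a0_f, *fwd_params)                  # history up to position i
        backward = run(xs[::-1], a0_b, *bwd_params)[::-1]     # future from position i
        # representation of position i = concatenation of forward and backward states
        return [np.concatenate([f, b]) for f, b in zip(forward, backward)]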

  21. Concrete RNN architectures Simple RNN

  22. Concrete RNN architectures LSTM • Long short-term memory networks • In practice, simple RNNs are only able to remember a narrow context (vanishing gradients) • LSTM: a more complex architecture able to capture long-term dependencies

  23. Concrete RNN architectures LSTM

  24. Concrete RNN architectures LSTM

  25. Concrete RNN architectures LSTM

  26. Concrete RNN architectures LSTM

  27. Concrete RNN architectures LSTM

  28. Concrete RNN architectures LSTM
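
Slides 23-28 walk through the LSTM cell diagram; as a textual reference, here is a minimal NumPy sketch of one LSTM step in the standard formulation (the gate equations and parameter names below are not taken from the slides).

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x, h_prev, c_prev, params):
        # params holds one (W, U, b) triple per gate: input, forget, output, candidate
        (Wi, Ui, bi), (Wf, Uf, bf), (Wo, Uo, bo), (Wg, Ug, bg) = params
        i = sigmoid(Wi @ x + Ui @ h_prev + bi)   # input gate
        f = sigmoid(Wf @ x + Uf @ h_prev + bf)   # forget gate
        o = sigmoid(Wo @ x + Uo @ h_prev + bo)   # output gate
        g = np.tanh(Wg @ x + Ug @ h_prev + bg)   # candidate cell values
        c = f * c_prev + i * g                   # memory cell carries long-term information
        h = o * np.tanh(c)                       # hidden state exposed to the rest of the network
        return h, c

    # toy usage (assumed sizes): hidden size 3, input size 4
    rng = np.random.default_rng(0)
    gate = lambda: (rng.normal(size=(3, 4)), rng.normal(size=(3, 3)), np.zeros(3))
    h, c = lstm_step(rng.normal(size=4), np.zeros(3), np.zeros(3), [gate() for _ in range(4)])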

  29. Concrete RNN architectures GRU • LSTM: effective, but complex and computationally expensive • GRU: a cheaper alternative that works well in practice

  30. Concrete RNN architectures GRU • reset gate (r): how much information from the previous hidden state should be included (reset with the current information?) • update gate (z): controls updates to the hidden state (how much does the hidden state need to be updated with the current information?)
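
A minimal sketch of one GRU step with the reset gate r and update gate z described above, reusing the sigmoid helper and NumPy import from the LSTM sketch; the exact equations and the interpolation convention follow one common formulation and are not taken from the slides.

    def gru_step(x, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
        z = sigmoid(Wz @ x + Uz @ h_prev)        # update gate: how much to update the state
        r = sigmoid(Wr @ x + Ur @ h_prev)        # reset gate: how much past information to keep
        h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev))   # candidate state from current input
        # interpolate between the old state and the candidate (one common convention)
        return (1 - z) * h_prev + z * h_tilde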

  31. Recursive neural networks Introduction • Generalization of RNNs from sequences to (binary) trees • Linear transformation + non-linear activation function applied recursively throughout a tree • Useful for parsing
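
A minimal sketch of the recursive composition over a binary tree: the same linear transformation plus non-linearity combines two child representations into a parent representation; the toy tree, dimensions and random weights are assumptions.

    import numpy as np

    def compose(left, right, W, b):
        # combine two child vectors into one parent vector of the same size
        return np.tanh(W @ np.concatenate([left, right]) + b)

    def encode_tree(tree, E, W, b):
        # tree: either a word (string) or a pair (left_subtree, right_subtree)
        if isinstance(tree, str):
            return E[tree]                      # leaf: look up the word embedding
        left, right = tree
        return compose(encode_tree(left, E, W, b), encode_tree(right, E, W, b), W, b)

    # toy usage: 3-dim representations, a tiny made-up tree
    rng = np.random.default_rng(0)
    E = {w: rng.normal(size=3) for w in ["the", "dog", "barks"]}
    W, b = rng.normal(size=(3, 6)), np.zeros(3)
    print(encode_tree((("the", "dog"), "barks"), E, W, b))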

  32. Application Image to caption generation

  33. Application Image to caption generation

  34. Application Neural machine translation

  35. Application Neural machine translation

  36. Application Neural dialogue generation (chatbot)

  37. Software • TensorFlow (Python, C++): http://www.tensorflow.org • Theano (Python): http://deeplearning.net/software/theano/ • Keras: Theano/TensorFlow-based modular deep learning library • Lasagne: Theano-based deep learning library
