Attention-based Encoder-Decoder Networks NLP challenges Methods - PowerPoint PPT Presentation

Sina Ahmadi Introduction Problem definition Background Probabilistic techniques Neural Networks Attention-based Encoder-Decoder Networks NLP challenges Methods for Spelling and Grammatical Error Correction RNN BRNN Seq2seq Attention mechanism Experiments Sina Ahmadi Qualitative comparison Conclusion Paris Descartes University and future work sina.ahmadi@etu.parisdescartes.fr Conclusion Future studies References September 6, 2017 Questions

Overview Sina Ahmadi 1 Introduction Introduction Problem definition Problem definition 2 Background Background Probabilistic techniques Probabilistic techniques Neural Networks Neural Networks NLP challenges NLP challenges Methods 3 Methods RNN BRNN RNN Seq2seq Attention mechanism BRNN Experiments Seq2seq Qualitative comparison Attention mechanism Conclusion and future 4 Experiments work Qualitative comparison Conclusion Future studies 5 Conclusion and future work References Conclusion Questions Future studies 6 References 7 Questions

Introduction Sina Ahmadi Introduction Problem Automatic spelling and grammar correction is the task of auto- definition Background matically correcting errors in written text. Probabilistic techniques Neural Networks • This cake is basicly sugar, butter, and flour. [ → basically] NLP challenges Methods • We went to the store and bought new stove . [ → a new stove] RNN BRNN • i ’m entirely awake. [ → { I, wide } ] Seq2seq Attention mechanism The ability to correct errors accurately will improve Experiments Qualitative • the reliability of the underlying applications comparison Conclusion • the construction of software to help foreign language and future work learning Conclusion Future studies • to reduce noise in the entry of NLP tools References • better processing of unedited texts on the Web. Questions

Problem definition Sina Ahmadi Introduction Problem definition Background Given a N -character source sentence S = s 1 , s 2 , ..., s N with its Probabilistic techniques reference sentence T = t 1 , t 2 , ..., t M , we define an error correc- Neural Networks NLP challenges tion system as: Methods RNN Definition BRNN Seq2seq Attention mechanism � T = MC ( S ) (1) Experiments Qualitative where � comparison T is a correction hypothesis. Conclusion and future work Conclusion Future studies Question : How can the MC function can be modeled? References Questions

Background Sina Ahmadi Introduction Problem definition Various algorithms propose different approaches: Background Probabilistic techniques • Error detection : involves determining whether an input Neural Networks NLP challenges word has an equivalence relation with a word in the Methods dictionary. RNN BRNN • Dictionary lookup Seq2seq • n-gram analysis Attention mechanism • Error correction : refers to the attempt to endow spell Experiments Qualitative checkers with the ability to correct detected errors. comparison Conclusion • Minimum edit distance technique and future • Similarity key technique work Conclusion • Rule-based techniques Future studies • Probabilistic techniques References Questions

Probabilistic techniques Sina Ahmadi Introduction Problem definition We assume the task of error correction as a type of monolingual Background Probabilistic machine translation where the source sentence is potentially er- techniques Neural Networks roneous and the target sentence should be the corrected form of NLP challenges the input. Methods RNN BRNN Aim Seq2seq Attention To create a probabilistic model in such a way that: mechanism Experiments Qualitative comparison � T = argmax P ( T | S ; θ ) (2) Conclusion T and future work where θ is the parameters of the model. Conclusion Future studies This is called the Fundamental Equation of Machine Translation References [Smith, 2012]. Questions

Neural networks as a probabilistic model Sina Ahmadi • Mathematical model of the biological Introduction neural networks Problem definition inputs weights Background • Computes a single output from Probabilistic w 0 1 techniques multiple real-valued inputs: Neural Networks Activation x 1 w 1 NLP challenges function � � n Methods x 2 w 2 w i x i + b = W T x + b RNN z = (3) . . BRNN . . . . Seq2seq i =1 x n w n Attention mechanism • Putting the output into a non-linear Experiments x 1 Qualitative function: comparison x 2 Conclusion tanh ( z ) = e 2 z − 1 Output and future work x 3 (4) e 2 z + 1 Conclusion Future studies x 4 References • Back-propagates in order to minimize Questions Input Hidden Output the loss function H : layer layer layer θ ∗ = argmin H ( � y − y ) (5) θ

NLP challenges in Machine Translation Sina Ahmadi Introduction Problem definition Background Large input state spaces → word embedding Probabilistic techniques No upper limit on the number of words. Neural Networks NLP challenges Methods Long-term dependencies RNN BRNN Seq2seq • Constraints: He did not even think about himself . Attention mechanism • Selectional preferences: I ate salad with fork NOT rake. Experiments Qualitative comparison Conclusion Variable-length output sizes and future work Conclusion • This strucutre have anormality → 30 characters Future studies • This structure has an abnormality. → 34 characters References Questions

Recurrent Neural Network Sina Ahmadi Introduction Unlike a simple MLP, can make use of all the previous inputs. Problem definition Thus, it provides a memory-like functionality. Background Probabilistic techniques Neural Networks NLP challenges x 0 � x 1 � · · · x t − 1 � � x t · · · x n − 1 � x n � Inputs Methods U RNN BRNN W � � � � � � Seq2seq h 0 h 1 · · · h t − 1 h t · · · h n − 1 h n Internal States Attention mechanism V Experiments y 0 � � y 1 · · · � y t − 1 y t � · · · y n − 1 � y n � Qualitative Outputs comparison Conclusion and future work Conclusion h t = tanh ( Wx t + Uh t − 1 + b ) (6) Future studies References y t = softmax ( Vh t ) = � (7) Questions W , U an V are the parameters of our network we want to learn.

Bidirectional Recurrent Neural Network Sina Ahmadi We can use two RNN models; one that reads through the in- Introduction Problem put sequence forwards and the other backwards, both with two definition Background different hidden units but connected to the same output. Probabilistic techniques Neural Networks NLP challenges � y t − 1 y t +1 � y t +2 � � y t Methods RNN BRNN ← − ← − ← − ← − . . . s t − 1 s t s t +1 s t +2 Backward states . . . Seq2seq Attention mechanism Experiments . . . � s t − 1 s t � s t +1 � � s t +2 . . . Forward states Qualitative comparison Conclusion and future . . . � x t − 1 � x t x t +1 � x t +2 � . . . work Conclusion Future studies References h t = tanh ( � � W x t + � U � h t − 1 + � b ) (8) Questions ← h t = tanh ( ← − W x t + ← − U ← − − h t − 1 + ← − b ) (9)

Sequence-to-sequence models Sina Ahmadi The sequence-to-sequence model is composed of two processes Introduction Problem : encoding and decoding . definition Background o 1 o 2 o 3 Probabilistic techniques softmax softmax softmax Neural Networks NLP challenges Methods RNN s 1 s 2 s 3 BRNN Seq2seq h 1 h 2 Attention mechanism h 3 = tanh ( Wx 2 + U h 2 + b ) embedding x 1 embedding x 2 embedding x 3 Experiments Qualitative comparison Conclusion and future x 1 x 2 x 3 work Conclusion Future studies References h t = RNN ( x t , h t − 1 ) (10) Questions c = tanh ( h T ) (11) where h t is a hidden state at time t , and c is the context vector of the hidden layers of the encoder.

Attention mechanism Sina Ahmadi The attention mechanism calculates a new vector c t for the out- Introduction put word y t at the decoding step t . Problem definition Background T exp ( e ij ) � Probabilistic techniques α ij = (13) � T c t = a tj h j 2mm Neural Networks k =1 exp ( e ik ) NLP challenges j =1 Methods e ij = attentionScore ( s i − 1 , h j ) (14) (12) RNN BRNN o 1 o 2 o 3 Seq2seq Attention softmax softmax softmax mechanism Experiments Qualitative comparison attention s 1 s 2 s 3 Conclusion and future h 1 h 2 work Conclusion h 3 = tanh ( Wx 2 + U h 2 + b ) embedding x 1 embedding x 2 embedding x 3 Future studies References Questions x 1 x 2 x 3 where h j is the hidden state of the word x j , and a tj is the weight of h j for predicting y t . This vector is also called attention vector .

Experiments Sina Ahmadi Introduction Various metrics are used to evaluate the correction models, in- Problem definition cluding MaxMatch M 2 [Dahlmeier, 2012], I-measure [Felice, 2015], Background Probabilistic BLEU and GLEU [Napoles, 2015]. techniques Neural Networks NLP challenges M 2 scorer Methods Model RNN P R F 0 . 5 BRNN Seq2seq Baseline 1.0000 0.0000 0.0000 Attention mechanism RNN 0.5397 0.2487 0.4373 Experiments Qualitative BiRNN 0.5544 0.2943 0.4711 comparison Conclusion Encoder-decoder 0.5835 0.3249 0.5034 and future work Attention 0.5132 0.2132 0.4155 Conclusion Future studies Table: Evaluation results of the models using MaxMatch M 2 metric. References Bold numbers indicate the scores of the best model. Questions

Attention-based Encoder-Decoder Networks NLP challenges Methods - PowerPoint PPT Presentation

Sina Ahmadi Introduction Problem definition Background Probabilistic techniques Neural Networks Attention-based Encoder-Decoder Networks NLP challenges Methods for Spelling and Grammatical Error Correction RNN BRNN Seq2seq Attention

Attention in NLP CS 6956: Deep Learning for NLP Overview What is attention Attention in

Exercise 2: Encoder / Decoder Framework Goals : Implement basic framework for encoder and decoder

UN13750 Programmable Encoder/Decoder Single chip contains both Encoder and Decoder. Schmitt

The Attention Mechanism & Encoder-Decoder Variants CMSC 470 Marine Carpuat Introduction to

Attention Graham Neubig Site https://phontron.com/class/nn4nlp2017/ Encoder-decoder Models

Attention Graham Neubig Site https://phontron.com/class/nn4nlp2020/ Encoder-decoder Models

Image and Video Coding: Introduction bitstream encoder decoder Motivation Image and Video

Image and Video Coding: Representation, Acquisition, Display ... 10011 ... encoder decoder

A Hierarchical Encoder-Decoder for Paragraph Summarization Farzaneh Mahdisoltani Department of

Contents PRO-Decoder Function Methods Results Abstract Experiment Computer RBS-Decoder

CS7015 (Deep Learning) : Lecture 16 Encoder Decoder Models, Attention Mechanism Mitesh M. Khapra

Exemplar Encoder Decoder for Neural Conversation Generation By Gaurav Pandey, Danish

Adaptive Multi-pass Decoder for Neural Machine Translation EMNLP 2018

Digital Design Disc: RTL Combinatorial Components 2-to-4 Decoder 4-to-16 Decoder 8-bit Shifter

An Efficient GPU-based An Efficient GPU-based LDPC Decoder for Long LDPC Decoder for Long

Modeling language as a sequence of tokens CMSC 470 Marine Carpuat Beyond MT: Encoder-Decoder

and Sequential Programs 01204111 Computers and Programmin ing Inti In tira raporn rn Mula

Prescriptive versus Descriptive Prescriptive (largely proscriptive): old-school grammar; mostly

(3) (1) (2) 2 TENSE STUDY VOCABULARY TENSE STUDY IMMEDIATE FUTURE USAGE

constructing aspect-based sentiment lexicons with topic modeling . 1 Kazan (Volga Region) Federal

Governance, Principles, & Protocols AfriSIG 12 October, 2016 Durban.za Avri Doria 12

Sums of Money MDM4U: Mathematics of Data Management How many different sums of money can be made

Announcements Celebrating Earth Day Consider how the wildflowers grow. (Luke 12:27) Check In

D imensionality reduction facilitates the finds the directions of greatest variance in the

Attention-based Encoder-Decoder Networks NLP challenges Methods - PowerPoint PPT Presentation

Sina Ahmadi Introduction Problem definition Background Probabilistic techniques Neural Networks Attention-based Encoder-Decoder Networks NLP challenges Methods for Spelling and Grammatical Error Correction RNN BRNN Seq2seq Attention

Attention in NLP CS 6956: Deep Learning for NLP Overview What is attention Attention in

Exercise 2: Encoder / Decoder Framework Goals : Implement basic framework for encoder and decoder

UN13750 Programmable Encoder/Decoder Single chip contains both Encoder and Decoder. Schmitt

The Attention Mechanism &amp; Encoder-Decoder Variants CMSC 470 Marine Carpuat Introduction to

Attention Graham Neubig Site https://phontron.com/class/nn4nlp2017/ Encoder-decoder Models

Attention Graham Neubig Site https://phontron.com/class/nn4nlp2020/ Encoder-decoder Models

Image and Video Coding: Introduction bitstream encoder decoder Motivation Image and Video

Image and Video Coding: Representation, Acquisition, Display ... 10011 ... encoder decoder

A Hierarchical Encoder-Decoder for Paragraph Summarization Farzaneh Mahdisoltani Department of

Contents PRO-Decoder Function Methods Results Abstract Experiment Computer RBS-Decoder

CS7015 (Deep Learning) : Lecture 16 Encoder Decoder Models, Attention Mechanism Mitesh M. Khapra

Exemplar Encoder Decoder for Neural Conversation Generation By Gaurav Pandey, Danish

Adaptive Multi-pass Decoder for Neural Machine Translation EMNLP 2018

Digital Design Disc: RTL Combinatorial Components 2-to-4 Decoder 4-to-16 Decoder 8-bit Shifter

An Efficient GPU-based An Efficient GPU-based LDPC Decoder for Long LDPC Decoder for Long

Modeling language as a sequence of tokens CMSC 470 Marine Carpuat Beyond MT: Encoder-Decoder

and Sequential Programs 01204111 Computers and Programmin ing Inti In tira raporn rn Mula

Prescriptive versus Descriptive Prescriptive (largely proscriptive): old-school grammar; mostly

(3) (1) (2) 2 TENSE STUDY VOCABULARY TENSE STUDY IMMEDIATE FUTURE USAGE

constructing aspect-based sentiment lexicons with topic modeling . 1 Kazan (Volga Region) Federal

Governance, Principles, &amp; Protocols AfriSIG 12 October, 2016 Durban.za Avri Doria 12

Sums of Money MDM4U: Mathematics of Data Management How many different sums of money can be made

Announcements Celebrating Earth Day Consider how the wildflowers grow. (Luke 12:27) Check In

D imensionality reduction facilitates the finds the directions of greatest variance in the

The Attention Mechanism & Encoder-Decoder Variants CMSC 470 Marine Carpuat Introduction to

Governance, Principles, & Protocols AfriSIG 12 October, 2016 Durban.za Avri Doria 12