Attention-based Encoder-Decoder Networks for Spelling and Grammatical Error Correction




1. Attention-based Encoder-Decoder Networks for Spelling and Grammatical Error Correction
   Sina Ahmadi
   Paris Descartes University
   sina.ahmadi@etu.parisdescartes.fr
   September 6, 2017

2. Overview
   1 Introduction
     • Problem definition
   2 Background
     • Probabilistic techniques
     • Neural Networks
     • NLP challenges
   3 Methods
     • RNN
     • BRNN
     • Seq2seq
     • Attention mechanism
   4 Experiments
     • Qualitative comparison
   5 Conclusion and future work
     • Conclusion
     • Future studies
   6 References
   7 Questions

3. Introduction
   Automatic spelling and grammar correction is the task of automatically correcting errors in written text.
   • This cake is basicly sugar, butter, and flour. [→ basically]
   • We went to the store and bought new stove. [→ a new stove]
   • i 'm entirely awake. [→ I; → wide]
   The ability to correct errors accurately will
   • improve the reliability of the underlying applications,
   • support the construction of software to help foreign language learning,
   • reduce noise in the input to NLP tools,
   • enable better processing of unedited texts on the Web.

4. Problem definition
   Given an N-character source sentence S = s_1, s_2, ..., s_N with its reference sentence T = t_1, t_2, ..., t_M, we define an error correction system as:

   Definition
   \hat{T} = MC(S)   (1)
   where \hat{T} is a correction hypothesis.

   Question: How can the MC function be modeled?

5. Background
   Various algorithms propose different approaches:
   • Error detection: involves determining whether an input word has an equivalence relation with a word in the dictionary.
     • Dictionary lookup
     • n-gram analysis
   • Error correction: refers to the attempt to endow spell checkers with the ability to correct detected errors.
     • Minimum edit distance technique
     • Similarity key technique
     • Rule-based techniques
     • Probabilistic techniques
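To make the minimum edit distance technique concrete, here is a minimal Python sketch (not part of the slides): a standard Levenshtein distance used to rank candidates from a small, hypothetical word list.

    def edit_distance(a, b):
        # Classic dynamic-programming Levenshtein distance between two strings.
        dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i in range(len(a) + 1):
            dp[i][0] = i
        for j in range(len(b) + 1):
            dp[0][j] = j
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                               dp[i][j - 1] + 1,          # insertion
                               dp[i - 1][j - 1] + cost)   # substitution
        return dp[len(a)][len(b)]

    dictionary = ["basically", "because", "basket"]   # hypothetical word list
    print(min(dictionary, key=lambda w: edit_distance("basicly", w)))   # -> "basically"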

6. Probabilistic techniques
   We treat the task of error correction as a type of monolingual machine translation, where the source sentence is potentially erroneous and the target sentence should be the corrected form of the input.

   Aim
   To create a probabilistic model in such a way that:
   \hat{T} = \arg\max_T P(T \mid S; \theta)   (2)
   where \theta denotes the parameters of the model.

   This is called the Fundamental Equation of Machine Translation [Smith, 2012].
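The following toy sketch only illustrates the argmax in Equation (2); the scoring function, vocabulary, and candidate list are invented stand-ins, whereas a real system would use a trained translation or language model to estimate P(T | S; θ).

    VOCAB = {"this", "cake", "is", "basically", "sugar"}   # toy vocabulary (assumption)

    def log_score(candidate, source):
        # Hypothetical stand-in for log P(T | S; theta): reward in-vocabulary words,
        # lightly penalise drifting far from the source length.
        words = candidate.lower().replace(".", "").split()
        return sum(w in VOCAB for w in words) - 0.1 * abs(len(candidate) - len(source))

    def correct(source, candidates):
        # Equation (2): T_hat = argmax_T P(T | S; theta), restricted to a candidate set.
        return max(candidates, key=lambda t: log_score(t, source))

    print(correct("This cake is basicly sugar.",
                  ["This cake is basicly sugar.",
                   "This cake is basically sugar."]))   # -> the corrected candidate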

7. Neural networks as a probabilistic model
   • A mathematical model of biological neural networks.
   • Computes a single output from multiple real-valued inputs:
     z = \sum_{i=1}^{n} w_i x_i + b = W^T x + b   (3)
   • Passes the output through a non-linear activation function:
     \tanh(z) = (e^{2z} - 1) / (e^{2z} + 1)   (4)
   • Back-propagates in order to minimize the loss function H:
     \theta^* = \arg\min_\theta H(\hat{y} - y)   (5)
   [Figures: a single neuron with inputs x_1, ..., x_n, weights w_0, ..., w_n, and an activation function; a feed-forward network with input, hidden, and output layers]
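As a small illustration of Equations (3) and (4), the sketch below computes a single neuron's output in NumPy; the input values and weights are arbitrary.

    import numpy as np

    def neuron(x, w, b):
        z = np.dot(w, x) + b     # Eq. (3): z = w^T x + b
        return np.tanh(z)        # Eq. (4): tanh squashes z into (-1, 1)

    x = np.array([0.5, -1.0, 0.25])   # illustrative inputs
    w = np.array([0.1, 0.4, -0.2])    # illustrative weights
    print(neuron(x, w, b=0.05))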

8. NLP challenges in Machine Translation
   Large input state spaces → word embedding
   • No upper limit on the number of words.
   Long-term dependencies
   • Constraints: He did not even think about himself.
   • Selectional preferences: I ate salad with fork, NOT rake.
   Variable-length output sizes
   • This strucutre have anormality → 30 characters
   • This structure has an abnormality. → 34 characters

9. Recurrent Neural Network
   Unlike a simple MLP, an RNN can make use of all the previous inputs; thus, it provides a memory-like functionality.
   [Figure: an unrolled RNN with inputs x_0, ..., x_n, internal states h_0, ..., h_n, and outputs y_0, ..., y_n, connected through the weight matrices W, U, and V]
   h_t = \tanh(W x_t + U h_{t-1} + b)   (6)
   \hat{y}_t = \mathrm{softmax}(V h_t)   (7)
   W, U, and V are the parameters of our network that we want to learn.
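A minimal NumPy sketch of the forward pass in Equations (6) and (7); the dimensions are arbitrary and the weights random, so this shows only the recurrence, not a trained correction model.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def rnn_forward(xs, W, U, V, b):
        h = np.zeros(U.shape[0])
        ys = []
        for x_t in xs:
            h = np.tanh(W @ x_t + U @ h + b)   # Eq. (6): new state from input and previous state
            ys.append(softmax(V @ h))          # Eq. (7): output distribution at step t
        return ys

    rng = np.random.default_rng(0)
    d_in, d_h, d_out = 4, 8, 5                       # illustrative sizes
    W = rng.normal(size=(d_h, d_in))
    U = rng.normal(size=(d_h, d_h))
    V = rng.normal(size=(d_out, d_h))
    xs = [rng.normal(size=d_in) for _ in range(3)]   # a three-step input sequence
    print(rnn_forward(xs, W, U, V, b=np.zeros(d_h))[-1])   # distribution at the last step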

10. Bidirectional Recurrent Neural Network
   We can use two RNN models: one reads through the input sequence forwards and the other backwards, each with its own hidden units but both connected to the same output.
   [Figure: backward states and forward states over the inputs x_{t-1}, ..., x_{t+2}, jointly feeding the outputs y_{t-1}, ..., y_{t+2}]
   \overrightarrow{h}_t = \tanh(\overrightarrow{W} x_t + \overrightarrow{U} \overrightarrow{h}_{t-1} + \overrightarrow{b})   (8)
   \overleftarrow{h}_t = \tanh(\overleftarrow{W} x_t + \overleftarrow{U} \overleftarrow{h}_{t+1} + \overleftarrow{b})   (9)
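A sketch of Equations (8) and (9), assuming the common convention of concatenating the forward and backward states at each time step; the parameters and sizes are again arbitrary placeholders.

    import numpy as np

    def birnn_states(xs, Wf, Uf, bf, Wb, Ub, bb):
        d_h = Uf.shape[0]
        fwd, bwd = [], [None] * len(xs)
        h = np.zeros(d_h)
        for x_t in xs:                                 # Eq. (8): left-to-right pass
            h = np.tanh(Wf @ x_t + Uf @ h + bf)
            fwd.append(h)
        h = np.zeros(d_h)
        for t in reversed(range(len(xs))):             # Eq. (9): right-to-left pass
            h = np.tanh(Wb @ xs[t] + Ub @ h + bb)
            bwd[t] = h
        return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

    rng = np.random.default_rng(1)
    d_in, d_h = 4, 6
    Wf, Wb = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_in))
    Uf, Ub = rng.normal(size=(d_h, d_h)), rng.normal(size=(d_h, d_h))
    xs = [rng.normal(size=d_in) for _ in range(5)]
    print(birnn_states(xs, Wf, Uf, np.zeros(d_h), Wb, Ub, np.zeros(d_h))[0].shape)   # (12,)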

11. Sequence-to-sequence models
   The sequence-to-sequence model is composed of two processes: encoding and decoding.
   [Figure: an encoder-decoder network; the embedded inputs x_1, x_2, x_3 feed the encoder states h_1, h_2, h_3, e.g. h_3 = tanh(W x_3 + U h_2 + b), and the decoder states s_1, s_2, s_3 produce the outputs o_1, o_2, o_3 through softmax layers]
   h_t = \mathrm{RNN}(x_t, h_{t-1})   (10)
   c = \tanh(h_T)   (11)
   where h_t is a hidden state at time t, and c is the context vector of the hidden layers of the encoder.
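The encoding half of the model, Equations (10) and (11), in a few lines of NumPy; the RNN cell is the same plain tanh cell as before, and the weights are random placeholders. The decoder would then condition its own recurrence on the context vector c to produce the output words.

    import numpy as np

    def encode(xs, W, U, b):
        h = np.zeros(U.shape[0])
        states = []
        for x_t in xs:
            h = np.tanh(W @ x_t + U @ h + b)   # Eq. (10): h_t = RNN(x_t, h_{t-1})
            states.append(h)
        c = np.tanh(states[-1])                # Eq. (11): context vector from the last state
        return states, c

    rng = np.random.default_rng(2)
    W, U = rng.normal(size=(6, 4)), rng.normal(size=(6, 6))
    xs = [rng.normal(size=4) for _ in range(3)]    # embedded source tokens
    states, c = encode(xs, W, U, b=np.zeros(6))
    print(c.shape)   # (6,): a fixed-size summary of the whole source sentence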

12. Attention mechanism
   The attention mechanism calculates a new vector c_t for the output word y_t at decoding step t:
   c_t = \sum_{j=1}^{T} \alpha_{tj} h_j   (12)
   \alpha_{ij} = \exp(e_{ij}) / \sum_{k=1}^{T} \exp(e_{ik})   (13)
   e_{ij} = \mathrm{attentionScore}(s_{i-1}, h_j)   (14)
   [Figure: the encoder-decoder network of the previous slide with an attention layer inserted between the encoder states h_j and the decoder states s_t]
   where h_j is the hidden state of the word x_j, and \alpha_{tj} is the weight of h_j for predicting y_t. The vector c_t is also called the attention vector.
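A sketch of Equations (12)-(14), assuming a simple dot product for attentionScore (the slide leaves the scoring function abstract); the decoder state and encoder states are random placeholders.

    import numpy as np

    def attention_context(s_prev, encoder_states):
        scores = np.array([s_prev @ h_j for h_j in encoder_states])   # Eq. (14): dot-product score
        alphas = np.exp(scores - scores.max())
        alphas /= alphas.sum()                                        # Eq. (13): softmax weights
        c_t = sum(a * h for a, h in zip(alphas, encoder_states))      # Eq. (12): weighted sum
        return c_t, alphas

    rng = np.random.default_rng(3)
    encoder_states = [rng.normal(size=6) for _ in range(4)]   # h_1, ..., h_4 (illustrative)
    s_prev = rng.normal(size=6)                               # previous decoder state s_{i-1}
    c_t, alphas = attention_context(s_prev, encoder_states)
    print(alphas.round(3), alphas.sum())                      # weights over the source, summing to 1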

13. Experiments
   Various metrics are used to evaluate the correction models, including MaxMatch M^2 [Dahlmeier, 2012], I-measure [Felice, 2015], BLEU and GLEU [Napoles, 2015].

   M^2 scorer
   Model             P        R        F_0.5
   Baseline          1.0000   0.0000   0.0000
   RNN               0.5397   0.2487   0.4373
   BiRNN             0.5544   0.2943   0.4711
   Encoder-decoder   0.5835   0.3249   0.5034
   Attention         0.5132   0.2132   0.4155

   Table: Evaluation results of the models using the MaxMatch M^2 metric. The encoder-decoder model obtains the best F_0.5 score.
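For orientation only: the P, R, and F_0.5 columns follow from counts of proposed edits that do or do not match the gold annotation. The sketch below shows that arithmetic with made-up counts; the actual MaxMatch scorer additionally computes an optimal phrase-level alignment between hypothesis and reference.

    def f_beta(tp, fp, fn, beta=0.5):
        # Precision: share of proposed edits that match the gold edits.
        p = tp / (tp + fp) if tp + fp else 1.0
        # Recall: share of gold edits that the system proposed.
        r = tp / (tp + fn) if tp + fn else 0.0
        f = (1 + beta**2) * p * r / (beta**2 * p + r) if p + r else 0.0
        return p, r, f

    print(f_beta(tp=42, fp=30, fn=95))   # made-up counts, not the paper's data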
