

SLIDE 1

Attention-based Encoder-Decoder Networks for Spelling and Grammatical Error Correction

Sina Ahmadi

Paris Descartes University sina.ahmadi@etu.parisdescartes.fr

September 6, 2017

SLIDE 2

Overview

1 Introduction
    • Problem definition
2 Background
    • Probabilistic techniques
    • Neural Networks
    • NLP challenges
3 Methods
    • RNN
    • BRNN
    • Seq2seq
    • Attention mechanism
4 Experiments
    • Qualitative comparison
5 Conclusion and future work
    • Conclusion
    • Future studies
6 References
7 Questions

SLIDE 3

Introduction

Automatic spelling and grammar correction is the task of automatically correcting errors in written text.

  • This cake is basicly sugar, butter, and flour. [→ basically]
  • We went to the store and bought new stove. [→ a new stove]
  • i’m entirely awake. [→ {I, wide}]

The ability to correct errors accurately will improve

  • the reliability of the underlying applications,
  • the construction of software to help foreign language learning,
  • noise reduction in the input to NLP tools,
  • the processing of unedited texts on the Web.
SLIDE 4

Problem definition

Given an N-character source sentence S = s1, s2, ..., sN and its M-character reference sentence T = t1, t2, ..., tM, we define an error correction system as:

Definition

  T̂ = MC(S)    (1)

where T̂ is a correction hypothesis.

Question: how can the MC function be modeled?

SLIDE 5

Background

Various approaches have been proposed:

  • Error detection: involves determining whether an input word has an equivalence relation with a word in the dictionary.
      • Dictionary lookup
      • n-gram analysis
  • Error correction: refers to the attempt to endow spell checkers with the ability to correct detected errors.
      • Minimum edit distance technique
      • Similarity key technique
      • Rule-based techniques
      • Probabilistic techniques
SLIDE 6

Probabilistic techniques

We treat the task of error correction as a type of monolingual machine translation, where the source sentence is potentially erroneous and the target sentence should be the corrected form of the input.

Aim

To create a probabilistic model such that:

  T̂ = argmax_T P(T | S; θ)    (2)

where θ denotes the parameters of the model. This is called the Fundamental Equation of Machine Translation [Brown et al., 1993].
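As a minimal sketch of equation (2), assuming a hypothetical hypothesis generator and a trained scoring function for P(T | S; θ) (neither is specified on the slides), decoding amounts to an argmax over correction hypotheses:

    # Minimal sketch of equation (2): pick the hypothesis T that maximizes
    # P(T | S; theta). `candidates` and `score` are hypothetical stand-ins
    # for a real hypothesis generator and a trained probabilistic model.
    def correct(source, candidates, score):
        return max(candidates(source), key=lambda t: score(t, source))

    # Toy usage: a generator producing two hypotheses and a scorer that
    # prefers the agreeing verb form.
    hypotheses = lambda s: [s, s.replace("have", "has")]
    toy_score = lambda t, s: 1.0 if "has" in t else 0.5
    print(correct("She have a cat.", hypotheses, toy_score))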

SLIDE 7

Neural networks as a probabilistic model

[Figure: a single artificial neuron with inputs x1, ..., xn, weights w0, w1, ..., wn and an activation function, alongside a feed-forward network with an input layer, a hidden layer and an output layer.]

  • Mathematical model of the biological neural networks.
  • Computes a single output from multiple real-valued inputs:

      z = Σ_{i=1}^{n} wi xi + b = W^T x + b    (3)

  • Puts the output into a non-linear function:

      tanh(z) = (e^{2z} − 1) / (e^{2z} + 1)    (4)

  • Back-propagates in order to minimize the loss function H:

      θ* = argmin_θ H(ŷ − y)    (5)
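A minimal numerical sketch of equations (3) and (4); the weights, inputs and bias below are illustrative values, not taken from the slides:

    import numpy as np

    def neuron(x, w, b):
        z = np.dot(w, x) + b      # z = W^T x + b                        (equation 3)
        return np.tanh(z)         # tanh(z) = (e^{2z} - 1) / (e^{2z} + 1) (equation 4)

    # Illustrative values only.
    x = np.array([1.0, 0.5, -0.2])
    w = np.array([0.1, -0.3, 0.8])
    print(neuron(x, w, b=0.05))   # a single real-valued output in (-1, 1)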

SLIDE 8

NLP challenges in Machine Translation

Large input state spaces → word embedding

No upper limit on the number of words.

Long-term dependencies

  • Constraints: He did not even think about himself.
  • Selectional preferences: I ate salad with fork NOT rake.

Variable-length output sizes

  • This strucutre have anormality → 30 characters
  • This structure has an abnormality. → 34 characters
SLIDE 9

Recurrent Neural Network

Unlike a simple MLP, an RNN can make use of all the previous inputs, thus providing a memory-like functionality.

[Figure: an unrolled RNN mapping inputs x0, ..., xn to hidden states h0, ..., hn and outputs y0, ..., yn, with shared parameters W (input-to-hidden), U (hidden-to-hidden) and V (hidden-to-output).]

  ht = tanh(W xt + U ht−1 + b)    (6)

  yt = softmax(V ht)    (7)

W, U and V are the parameters of our network we want to learn.
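A minimal sketch of one recurrent step implementing equations (6) and (7); the dimensions and the random parameters are illustrative only:

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def rnn_step(x_t, h_prev, W, U, V, b):
        h_t = np.tanh(W @ x_t + U @ h_prev + b)   # equation (6)
        y_t = softmax(V @ h_t)                    # equation (7)
        return h_t, y_t

    # Toy dimensions: 10-dimensional inputs, 20 hidden units, 5 output classes.
    rng = np.random.default_rng(0)
    W, U, V = rng.normal(size=(20, 10)), rng.normal(size=(20, 20)), rng.normal(size=(5, 20))
    b, h = np.zeros(20), np.zeros(20)
    for x_t in rng.normal(size=(4, 10)):          # a toy input sequence of length 4
        h, y = rnn_step(x_t, h, W, U, V, b)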

SLIDE 10

Bidirectional Recurrent Neural Network

We can use two RNN models: one reads the input sequence forwards and the other backwards, each with its own hidden units but connected to the same output.

[Figure: a bidirectional RNN over inputs xt−1, ..., xt+2, with forward hidden states →st−1, ..., →st+2 and backward hidden states ←st+2, ..., ←st−1, both connected to the outputs yt−1, ..., yt+2.]

  →ht = tanh(→W xt + →U →ht−1 + →b)    (8)

  ←ht = tanh(←W xt + ←U ←ht−1 + ←b)    (9)
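A minimal sketch of the bidirectional pass described by equations (8) and (9), assuming two hypothetical step functions standing in for the trained forward and backward RNNs; concatenating the two states per position is one common way of connecting them to the same output:

    import numpy as np

    def birnn(xs, step_fwd, step_bwd, h0_fwd, h0_bwd):
        hs_fwd, h = [], h0_fwd
        for x in xs:                       # forward states, equation (8)
            h = step_fwd(x, h)
            hs_fwd.append(h)
        hs_bwd, h = [], h0_bwd
        for x in reversed(xs):             # backward states, equation (9)
            h = step_bwd(x, h)
            hs_bwd.append(h)
        hs_bwd.reverse()
        # Both states at each position feed the same output layer.
        return [np.concatenate([f, b]) for f, b in zip(hs_fwd, hs_bwd)]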

SLIDE 11

Sequence-to-sequence models

The sequence-to-sequence model is composed of two processes: encoding and decoding.

[Figure: the encoder embeds the inputs x1, x2, x3 and passes them through recurrent hidden states h1, h2, h3; each decoder state s1, s2, s3 produces an output through a softmax layer.]

  ht = RNN(xt, ht−1)    (10)

  c = tanh(hT)    (11)

where ht is the hidden state at time t, and c is the context vector of the hidden layers of the encoder.
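A minimal sketch of the encoding process, equations (10) and (11); embed and rnn_cell are hypothetical stand-ins for the trained embedding table and recurrent cell:

    import numpy as np

    def encode(tokens, embed, rnn_cell, h0):
        h = h0
        for tok in tokens:
            h = rnn_cell(embed(tok), h)    # h_t = RNN(x_t, h_{t-1})   (equation 10)
        return np.tanh(h)                  # c = tanh(h_T)             (equation 11)

The decoder is then conditioned on the context vector c and emits the output sequence symbol by symbol through its softmax layers.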
SLIDE 12

Attention mechanism

The attention mechanism calculates a new vector ct for the output word yt at decoding step t:

  ct = Σ_{j=1}^{T} αtj hj    (12)

  αij = exp(eij) / Σ_{k=1}^{T} exp(eik)    (13)

  eij = attentionScore(si−1, hj)    (14)

[Figure: the same encoder-decoder network as before, with an attention layer connecting the encoder hidden states h1, h2, h3 to the decoder states s1, s2, s3.]

where hj is the hidden state of the word xj, and αtj is the weight of hj for predicting yt. The vector ct is also called the attention vector.
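A minimal sketch of equations (12)-(14); attention_score is a stand-in for whichever scoring function is used, since the slides do not fix its form:

    import numpy as np

    def attention_vector(s_prev, encoder_states, attention_score):
        # e_ij = attentionScore(s_{i-1}, h_j)                 (equation 14)
        e = np.array([attention_score(s_prev, h_j) for h_j in encoder_states])
        # alpha_ij = exp(e_ij) / sum_k exp(e_ik)              (equation 13)
        alpha = np.exp(e - e.max())
        alpha /= alpha.sum()
        # c_t = sum_j alpha_tj * h_j                          (equation 12)
        c_t = np.sum(alpha[:, None] * np.stack(encoder_states), axis=0)
        return c_t, alpha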
SLIDE 13

Experiments

Various metrics are used to evaluate the correction models, including MaxMatch M2 [Dahlmeier, 2012], I-measure [Felice, 2015], BLEU and GLEU [Napoles, 2015].

                      M2 scorer
  Model               P        R        F0.5
  Baseline            1.0000   0.0000   0.0000
  RNN                 0.5397   0.2487   0.4373
  BiRNN               0.5544   0.2943   0.4711
  Encoder-decoder     0.5835   0.3249   0.5034
  Attention           0.5132   0.2132   0.4155

Table: Evaluation results of the models using the MaxMatch M2 metric. The best F0.5 score is obtained by the encoder-decoder model.
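For reference, F0.5 in the table is the F-beta measure with beta = 0.5, which weights precision twice as heavily as recall; a minimal sketch of that formula (the exact table values are produced by the M2 scorer itself):

    def f_beta(precision, recall, beta=0.5):
        """Standard F-beta measure; beta = 0.5 weights precision twice as much as recall."""
        if precision == 0.0 and recall == 0.0:
            return 0.0
        return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

    # Illustrative call (values not taken from the table):
    print(round(f_beta(0.60, 0.30), 4))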

SLIDE 14

Qualitative comparison

[Example: qualitative comparison on an Arabic sentence. Shown are the source sentence (36 characters), the gold standard reference (39 characters), and the predictions of the recurrent neural network model (37 characters), the bidirectional recurrent neural network model (37 characters), the sequence-to-sequence model (36 characters) and the attention-based sequence-to-sequence model (39 characters).]

SLIDE 15

Conclusion and future work

Conclusion

  • Modeling error correction for any language.
  • Results vary depending on the evaluation metric.
  • Precision decreases when correcting long sentences.

Future studies

  • Models to be explored at more levels, e.g., word level, phrase level.
  • Limiting the length of the sequences in training models.
  • Using deeper networks with a larger embedding size.
  • Preventing over-learning of the models by not training them over correct input tokens (action = "OK").
SLIDE 16

References

Brown, P. F., Della Pietra, V. J., Della Pietra, S. A., and Mercer, R. L. (1993). The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2), 263-311.

Dahlmeier, D. and Ng, H. T. (2012). Better evaluation for grammatical error correction. Association for Computational Linguistics, 568-572.

Felice, M. and Briscoe, T. (2015). Towards a standard evaluation method for grammatical error detection and correction. HLT-NAACL, 578-587.

Napoles, C., Sakaguchi, K., Post, M., and Tetreault, J. (2015). Ground truth for grammatical error correction metrics. Association for Computational Linguistics, 588-593.

SLIDE 17

Questions?