What do Neural Machine Translation Models Learn About Morphology?
Yonatan Belinkov Nadir Durrani Fahim Dalvi Hassan Sajjad James Glass
- Presented by Raghav Gurbaxani
FROM ACL 2017
Motivation
In recent times, Neural Machine Translation has achieved state-of-the-art results, yet little is known about what these models actually learn about language.
- What is the right granularity of representation (word vs. character)?
- How well do NMT models handle morphologically rich languages?
- What are the models actually learning?
- Do the learned representations capture the structure of a language?
1. Train a Neural Machine Translation system.
2. Extract feature representations from the trained model.
3. Train a classifier on the extracted features and evaluate it on an extrinsic task (e.g., POS or morphological tagging).
Classifier accuracy serves as a proxy for the quality of the representations for a given task.
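The three steps above amount to a probing pipeline, which can be sketched as follows. Everything here is a self-contained toy: the `encode` function stands in for a trained NMT encoder, and a dependency-free nearest-centroid classifier stands in for the paper's neural classifier.

```python
import random

random.seed(0)

# --- Step 1: a stand-in for a trained NMT encoder (hypothetical). ---
# In the paper, hidden states come from a trained seq2seq model; here we
# fake them with fixed random vectors so the sketch is self-contained.
DIM = 16
VOCAB = ["the", "cat", "sat", "dog", "ran", "a"]
_embed = {w: [random.gauss(0, 1) for _ in range(DIM)] for w in VOCAB}

def encode(sentence):
    """Return one frozen 'hidden state' per word."""
    return [_embed[w] for w in sentence]

# --- Step 2: extract features for each word in a labeled corpus. ---
corpus = [
    (["the", "cat", "sat"], ["DT", "NN", "VB"]),
    (["a", "dog", "ran"], ["DT", "NN", "VB"]),
    (["the", "dog", "sat"], ["DT", "NN", "VB"]),
]
features, labels = [], []
for words, tags in corpus:
    features.extend(encode(words))
    labels.extend(tags)

# --- Step 3: train a classifier on the frozen features. ---
# Nearest-centroid replaces the paper's one-hidden-layer network.
centroids = {}
for vec, tag in zip(features, labels):
    acc, n = centroids.get(tag, ([0.0] * DIM, 0))
    centroids[tag] = ([a + v for a, v in zip(acc, vec)], n + 1)
centroids = {t: [a / n for a in acc] for t, (acc, n) in centroids.items()}

def predict(vec):
    return min(centroids,
               key=lambda t: sum((a - v) ** 2
                                 for a, v in zip(centroids[t], vec)))

# Tagging accuracy is the proxy for representation quality.
accuracy = sum(predict(v) == t for v, t in zip(features, labels)) / len(labels)
```

The encoder stays frozen throughout: only the small classifier is trained, so its accuracy reflects what the representations already contain rather than what the classifier can learn on its own.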
- Language pairs: Arabic-, German-, French-, and Czech-English (morphologically rich languages, analyzed on both encoder and decoder sides).
- Data: TED talks made available for IWSLT 2016.
- Annotations: gold tags come from annotated datasets; predicted tags were obtained with freely available taggers.
Statistics for annotated corpora in Arabic (Ar), German (De), French (Fr), and Czech (Cz)
1. Effect of word representation
2. Impact of word frequency
3. Effect of encoder depth
4. Effect of target language
5. Analyzing specific tags
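The first two analyses hinge on how a word's vector is built. A minimal, self-contained sketch of the word-level vs. character-level difference (the sum-of-character-vectors composition is only a stand-in for the charCNN used in the paper; all tables here are toy data):

```python
import random

random.seed(1)

DIM = 8

# Word-level model: one vector per word seen in training; rare or
# unseen inflected forms get no representation at all.
word_table = {
    w: [random.gauss(0, 1) for _ in range(DIM)]
    for w in ["walk", "walked", "cat"]
}

# Character-level model: one vector per character, composed into a
# word vector. Composition here is a plain sum (a stand-in for the
# paper's charCNN); any word over the alphabet gets a representation.
char_table = {c: [random.gauss(0, 1) for _ in range(DIM)]
              for c in "abcdefghijklmnopqrstuvwxyz"}

def char_repr(word):
    vec = [0.0] * DIM
    for c in word:
        vec = [a + b for a, b in zip(vec, char_table[c])]
    return vec

# "walking" was never seen as a whole word, so the word-level model
# has nothing for it, while the character model still produces a
# vector built from the same pieces as "walk" and "walked".
assert "walking" not in word_table
oov_vec = char_repr("walking")
```

This is why character-based models are expected to help most on infrequent words: related inflected forms share characters, so a rare form inherits structure from frequent relatives instead of being out-of-vocabulary.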
POS and morphological tagging accuracy of word-based and character-based models per word frequency in the training data
Representation quality varies across layers: lower encoder layers capture morphology better than higher ones.
Translating from morphologically rich languages is challenging, and translating into such languages is even harder.
Arabic source-side representations learned when translating into English are better than those learned when translating into German, which are in turn better than those learned when translating into Hebrew.
Effect of target language on representation quality of the Arabic source.
- The word-based and character-based models share similar misclassified tags (especially nouns: NN, NNP).
- The character-based model does better on compound determiner tags (DT+NNP, DT+NNPS, DT+NNS, DT+VBG).
- Its gains are concentrated on nouns and infrequent words.
- It also improves on other tags (DT+JJ, VBP, and even PUNC).
Increase in POS accuracy with char- vs. word-based representations per tag frequency in the training set; larger bubbles reflect greater gaps.
Decoder analysis: train the NMT system on the parallel corpus, then use the decoder's feature representations to train a classifier.
1. Effect of attention
2. Effect of word representation
The decoder (with attention) achieves low POS tagging accuracy.
In contrast to the encoder-side findings regarding word-based versus character-based representations: while character-based representations improve the encoder, they do not help the decoder.
POS tagging accuracy with word- and char-based encoder/decoder representations.