Why You Should Care About Byte-Level Seq2Seq Models in NLP



SLIDE 1

Why You Should Care About Byte-Level Seq2Seq Models in NLP

South England Natural Language Processing Meetup Alan Turing Institute Monday March 4, 2019 Tom Kenter TTS Research Google UK, London

SLIDE 2

Based on an internship at Google Research in Mountain View.

Byte-level Machine Reading across Morphologically Varied Languages
Tom Kenter, Llion Jones, Daniel Hewlett
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), 2018
https://ai.google/research/pubs/pub47437

Medium blog post: Why You Should Care About Byte-Level Sequence-to-Sequence Models in NLP

SLIDE 3

Is it advantageous, when processing morphologically rich languages, to use bytes rather than words as input and output to RNNs in a machine reading task?

SLIDE 4

Machine Reading

A computer reads a text and has to answer questions about it.

WikiReading datasets:

  • English WikiReading dataset (Hewlett et al., ACL 2016)
  • Two extra datasets, Russian and Turkish (Kenter et al., AAAI 2018)

https://github.com/google-research-datasets/wiki-reading

SLIDE 5

Byte-level Machine Reading

[Figure] Example: question "Where is Amsterdam", answer "In the Netherlands"

word-level: one input unit per word
byte-level: one input unit per byte
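The contrast in the figure can be sketched in a few lines of Python. This is an illustrative sketch of the two segmentations, not the paper's code:

```python
# Word-level vs byte-level units for the example from the slide.

def word_tokens(text):
    """Word-level units: one token per whitespace-separated word."""
    return text.split()

def byte_tokens(text):
    """Byte-level units: one token per UTF-8 byte."""
    return list(text.encode("utf-8"))

question = "Where is Amsterdam"
print(word_tokens(question))       # ['Where', 'is', 'Amsterdam']
print(len(byte_tokens(question)))  # 18 units: 16 letters + 2 spaces
```

The byte-level sequence is several times longer, which is exactly the trade-off discussed on slide 7.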

SLIDE 6

Morphologically rich languages

Turkish:
  kolay → easy
  kolaylaştırabiliriz → we can make it easier
  kolaylaştıramıyoruz → we cannot make it easier

Russian:
  В прошлом году Дмитрий переехал в Москву. (Last year, Dmitry moved to Moscow.)
  Где теперь живет Дмитрий? (Where does Dmitry live now?)
  В Москве. (In Moscow.)

Note the case inflection: the text has Москву (accusative), while the answer is Москве (prepositional), so the answer string never appears verbatim in the text.
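Byte-level input handles such non-ASCII morphology without any special casing, at the cost of slightly longer sequences: UTF-8 encodes the Turkish letters ş and ı as two bytes each. A small sketch:

```python
# Character count vs UTF-8 byte count for a Turkish word from the slide.
word = "kolaylaştırabiliriz"   # "we can make it easier"

print(len(word))                  # 19 characters
print(len(word.encode("utf-8")))  # 21 bytes (ş and ı take 2 bytes each)
```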

SLIDE 7

Why should you care about byte-level seq2seq models in NLP?

  • Small input vocabulary → small model size
  • Longer unroll length for the RNN ⟷ read less input
  • Allows for an apples-to-apples comparison between models
  • Universal encoding scheme across languages
  • No out-of-vocabulary problem
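The first point is easy to quantify for the embedding table alone. The 100k word vocabulary and 256-dimensional embeddings below are illustrative numbers, not the paper's exact configuration:

```python
# Embedding-table size: byte vocabulary (256 entries) vs a typical
# word vocabulary (here assumed to be 100k entries).

def embedding_params(vocab_size, dim):
    """Parameters in a vocab_size x dim embedding table."""
    return vocab_size * dim

word_level = embedding_params(100_000, 256)  # 25,600,000 params
byte_level = embedding_params(256, 256)      # 65,536 params
print(word_level // byte_level)              # ~390x fewer parameters
```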

SLIDE 8

Models

  • Bidirectional RNN
  • Multi-level RNN

SLIDE 9

Models

Hybrid word-byte model
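The slide gives no implementation detail for the hybrid model; one common way such hybrids work is to use word units for in-vocabulary words and back off to bytes for everything else. The tiny vocabulary and the ("word", id)/("byte", id) tagging scheme below are hypothetical illustrations, not the paper's exact model:

```python
# Hypothetical hybrid word/byte tokenization: known words become word
# ids, out-of-vocabulary words fall back to their UTF-8 bytes.

VOCAB = {"where": 0, "is": 1}  # toy word vocabulary

def hybrid_tokens(text):
    tokens = []
    for word in text.lower().split():
        if word in VOCAB:
            tokens.append(("word", VOCAB[word]))
        else:  # OOV: back off to byte-level units
            tokens.extend(("byte", b) for b in word.encode("utf-8"))
    return tokens

toks = hybrid_tokens("Where is Amsterdam")
print(toks[:3])   # two word units, then the first byte of "amsterdam"
print(len(toks))  # 2 word units + 9 bytes = 11 units
```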

SLIDE 10

Models

Convolutional recurrent
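The idea behind a convolutional-recurrent encoder is that a strided convolution over byte embeddings shortens the sequence before the RNN reads it. As a toy stand-in for the convolution, the sketch below just averages non-overlapping windows of raw byte values; the window size of 4 is an illustrative assumption, not the paper's configuration:

```python
# Toy "convolution": pool non-overlapping windows of bytes so the
# downstream RNN sees a much shorter sequence.

def pool_bytes(text, window=4):
    data = text.encode("utf-8")
    return [sum(data[i:i + window]) / len(data[i:i + window])
            for i in range(0, len(data), window)]

seq = pool_bytes("Where is Amsterdam")
print(len("Where is Amsterdam".encode("utf-8")))  # 18 byte-level steps
print(len(seq))                                   # 5 pooled steps
```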

SLIDE 11

Models

  • Memory network
  • Encoder-transformer-decoder

SLIDE 12

Results

SLIDE 13

Results

SLIDE 14

Conclusions

Reading and outputting bytes, instead of words, works.

Byte-level models provide an elegant way of dealing with the out-of-vocabulary problem.

Byte-level models perform on par with the state-of-the-art word-level model on English, and better on morphologically more involved languages. This is good news, as byte-level models have far fewer parameters.

SLIDE 15

Are you interested in machine reading, question answering, or NLU, and looking for a new challenge? Try your approach on three languages at once!

WikiReading: English, Russian & Turkish
https://github.com/google-research-datasets/wiki-reading

SLIDE 16

Thank you

Byte-level Machine Reading across Morphologically Varied Languages
Tom Kenter, Llion Jones, Daniel Hewlett
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), 2018
https://ai.google/research/pubs/pub47437

Medium blog post: Why You Should Care About Byte-Level Sequence-to-Sequence Models in NLP