

SLIDE 1

Deep Learning Applications in Natural Language Processing

Jindřich Libovický

December 5, 2018

B4M36NLP Introduction to Natural Language Processing

Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics
SLIDE 2

Outline

  • Information Search
  • Unsupervised Dictionary Induction
  • Image Captioning

SLIDE 3

Information Search

SLIDE 4

Answer Span Selection

Task: Given a question and a coherent text, find the span of the text that answers the question.

http://demo.allennlp.org/machine-comprehension

SLIDE 5

Standard Dataset: SQuAD

  • best Wikipedia articles of reasonable size (23k paragraphs, 500 articles)
  • more than 100k crowd-sourced question-answer pairs
  • thorough quality testing (including an estimate of single-human performance on the task)

https://rajpurkar.github.io/SQuAD-explorer/explore/1.1/dev/

Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2383–2392, Austin, Texas, November 2016. Association for Computational Linguistics. URL https://aclweb.org/anthology/D16-1264
SLIDE 6

Method Overview

  • 1. Get text and question representations from
    • pre-trained word embeddings
    • a character-level CNN
  …using your favourite architecture.
  • 2. Compute a similarity between all pairs of words in the text and in the question.
  • 3. Collect all the information we have for each token.
  • 4. Classify where the span is.
Min Joon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. Bidirectional attention flow for machine comprehension. CoRR, abs/1611.01603, 2016. URL http://arxiv.org/abs/1611.01603
SLIDE 7

Method Overview: Image

Min Joon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. Bidirectional attention flow for machine comprehension. CoRR, abs/1611.01603, 2016. URL http://arxiv.org/abs/1611.01603
SLIDE 8

Representing Words

  • pre-trained word embeddings
  • concatenate with trained character-level representations
  • character-level representations allow handling out-of-vocabulary structured information (numbers, addresses)
  • character embeddings of size 16 → 1D convolution to 100 dimensions → max-pooling over the character sequence (a minimal sketch follows below)
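A minimal sketch of such a character-level encoder, assuming PyTorch; the sizes 16 and 100 come from the slide, while the character-vocabulary size, kernel width, and example words are illustrative:

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Character-level word representation: embed characters, run a 1D
    convolution over the character sequence, and max-pool over time."""

    def __init__(self, n_chars=256, char_dim=16, out_dim=100, kernel=5):
        super().__init__()
        self.embed = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_dim, out_dim, kernel_size=kernel, padding=kernel // 2)

    def forward(self, char_ids):
        # char_ids: (number of words, max word length) integer character ids
        x = self.embed(char_ids)           # (words, chars, 16)
        x = x.transpose(1, 2)              # (words, 16, chars) as expected by Conv1d
        x = torch.relu(self.conv(x))       # (words, 100, chars)
        return x.max(dim=2).values         # max-pool over characters -> (words, 100)

# Example: 3 words, each padded to 12 characters
words = torch.randint(1, 100, (3, 12))
print(CharCNN()(words).shape)  # torch.Size([3, 100])
```

The resulting 100-dimensional vector is then concatenated with the pre-trained embedding of the same word.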

SLIDE 9

Contextual Embeddings Layer

  • process both the question and the context with a bidirectional LSTM layer → one state per word
  • parameters are shared → the representations share the same space (see the sketch below)
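A small sketch of the shared layer, assuming PyTorch; the dimensions are illustrative, the point being that a single set of LSTM parameters encodes both sequences:

```python
import torch
import torch.nn as nn

emb_dim, hidden = 400, 100
shared_lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)

question = torch.randn(1, 5, emb_dim)   # (batch, question length, embedding dim)
context = torch.randn(1, 120, emb_dim)  # (batch, context length, embedding dim)

# The same parameters process both inputs, so the resulting states
# live in the same representation space.
u, _ = shared_lstm(question)  # (1, 5, 200), one state per question word
h, _ = shared_lstm(context)   # (1, 120, 200), one state per context word
```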
SLIDE 10

Attention Flow

[Figure: query states $u_1, \ldots, u_5$ (matrix $U$) and context states $h_1, \ldots, h_{12}$ (matrix $H$).]

$$S_{jk} = \mathbf{w}^\top \left[\, \mathbf{h}_j;\ \mathbf{u}_k;\ \mathbf{h}_j \odot \mathbf{u}_k \,\right]$$

Captures the affinity/similarity between every pair of a question word and a context word.

SLIDE 11

Context-to-query Attention

[Figure: for each context word $h_j$, a softmax over its similarities $S_{j,:}$ to all query words gives attention weights; the attended query vector $\tilde{u}_j$ is the corresponding weighted sum of the query states.]

SLIDE 12

Query-to-Context Attention

[Figure: take the maximum of $S_{j,:}$ over the query words for each context word, apply a softmax over the context positions, and compute a single weighted sum of the context states, which is then copied to every position.]
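A minimal sketch of the similarity matrix and both attention directions from the last three slides, assuming PyTorch; `h` are context states, `u` query states, and all sizes are illustrative:

```python
import torch

d = 200                 # state size
h = torch.randn(12, d)  # context states h_1 ... h_12
u = torch.randn(5, d)   # query states  u_1 ... u_5
w = torch.randn(3 * d)  # similarity weights

# Similarity S[j, k] = w^T [h_j; u_k; h_j * u_k]
feats = torch.cat(
    [h.unsqueeze(1).expand(-1, u.size(0), -1),
     u.unsqueeze(0).expand(h.size(0), -1, -1),
     h.unsqueeze(1) * u.unsqueeze(0)], dim=-1)     # (12, 5, 3d)
S = feats @ w                                      # (12, 5)

# Context-to-query: softmax over query words for each context word
c2q = torch.softmax(S, dim=1) @ u                  # (12, d)

# Query-to-context: max over query words, softmax over the context,
# one weighted sum of context states, copied to every position
b = torch.softmax(S.max(dim=1).values, dim=0)      # (12,)
q2c = (b.unsqueeze(0) @ h).expand(h.size(0), -1)   # (12, d)
```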

SLIDE 13

Modeling Layer

  • concatenate the LSTM output for each context word with its context-to-query vector
  • copy the query-to-context vector to each of them
  • apply one non-linear layer and a bidirectional LSTM
SLIDE 14

Output Layer

  • 1. Start-token probabilities: project each state to a scalar → apply softmax over the context
  • 2. End-token probabilities:
    • compute a weighted average of the states using the start-token probabilities → a single vector
    • concatenate this vector to each state
    • project the states to scalars, renormalize with softmax
  • 3. Finally, select the most probable span (see the sketch below)
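A sketch of the final step, assuming the two probability vectors are available; the most probable valid span maximizes the product of start and end probabilities with start ≤ end:

```python
import torch

def best_span(p_start, p_end):
    """Return (i, j), i <= j, maximizing p_start[i] * p_end[j]."""
    # Outer product of probabilities; mask out spans that end before they start.
    scores = torch.triu(p_start.unsqueeze(1) * p_end.unsqueeze(0))
    flat = int(scores.argmax())
    return divmod(flat, p_end.size(0))   # (start index, end index)

p_start = torch.softmax(torch.randn(12), dim=0)
p_end = torch.softmax(torch.randn(12), dim=0)
print(best_span(p_start, p_end))
```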
SLIDE 15

Method Overview: Recap

SLIDE 16

Attention Analysis (1)

SLIDE 17

Attention Analysis (2)

SLIDE 18

Make it 100× Faster!

Replace LSTMs by dilated convolutions.
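A sketch of what replaces the recurrence, assuming PyTorch; stacking 1D convolutions with growing dilation covers a wide context in a few layers, and all sizes here are illustrative:

```python
import torch
import torch.nn as nn

d = 128
# Dilations 1, 2, 4, 8 with kernel size 3 give a receptive field of 31 tokens.
layers = []
for dilation in (1, 2, 4, 8):
    layers += [nn.Conv1d(d, d, kernel_size=3, dilation=dilation, padding=dilation),
               nn.ReLU()]
encoder = nn.Sequential(*layers)

tokens = torch.randn(1, d, 50)   # (batch, channels, sequence length)
out = encoder(tokens)            # same length: (1, 128, 50)
```

Unlike an LSTM, every position can be computed in parallel, which is where the speed-up comes from.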

SLIDE 19

Convolutional Blocks

SLIDE 20

Using Pre-Trained Representations

Just replace the contextual embeddings with ELMo or BERT…

SLIDE 21

SQuAD Leaderboard

method                        Exact Match   F1 score
Human performance             82.304        91.221
BiDAF with BERT               87.433        93.160
BiDAF with ELMo               81.003        87.432
BiDAF trained from scratch    73.744        81.525

SLIDE 22

Unsupervised Dictionary Induction

SLIDE 23

Unsupervised Bilingual Dictionary

Task: Get a translation dictionary between two languages using monolingual data only.

  • makes NLP accessible for low-resource languages
  • the basis for unsupervised machine translation
  • hot research topic (at least 10 research papers on this topic this year)

The approach we will follow: Mikel Artetxe, Gorka Labaka, and Eneko Agirre. A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 789–798, Melbourne, Australia, July 2018. Association for Computational Linguistics. URL http://www.aclweb.org/anthology/P18-1073
SLIDE 24

How it is done

  • 1. Train word embeddings on large monolingual corpora.
  • 2. Find a mapping between the two languages.

So far it looks simple…

SLIDE 25

Dictionary and Common Projection

$X$, $Y$: embedding matrices of the two languages. Dictionary matrix $E$: $E_{jk} = 1$ if the $j$-th word of $X$ is a translation of the $k$-th word of $Y$.

Supervised projection between embeddings

Given an existing dictionary $E$ (a small seed dictionary), find mappings $W_X$, $W_Y$:

$$\underset{W_X, W_Y}{\arg\max}\ \sum_j \sum_k E_{jk} \cdot \operatorname{similarity}\!\left(X_{j:} W_X,\; Y_{k:} W_Y\right) \;=\; \underset{W_X, W_Y}{\arg\max}\ \sum_j \sum_k E_{jk} \left( (X_{j:} W_X)(Y_{k:} W_Y)^\top \right)$$

…but we need to find all of $E$, $W_X$, and $W_Y$.

SLIDE 26

A Tiny Observation

$$X X^\top$$

Question: How would you interpret this matrix? It is a table of similarities between pairs of words.

SLIDE 27

If the Vocabularies were Isometric…

  • $N_X = X X^\top$ and $N_Y = Y Y^\top$ would only have permuted rows and columns
  • if we sorted the values in each row of $N_X$ and $N_Y$, corresponding words would have the same vectors

Let's assume it is true (at least approximately):

$$E_{j,:} \leftarrow \mathbb{1}\!\left[\, \underset{k}{\arg\max}\ \operatorname{sorted}(N_X)_{j,:}\ \operatorname{sorted}(N_Y)_{k,:}^\top \,\right]$$

Assign the nearest neighbor from the other language (a rough sketch follows below). …in practice tragically bad, but at least a good initialization.
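A rough NumPy sketch of this initialization idea (not the exact published procedure), assuming length-normalized embedding matrices and vocabularies cut to the same size:

```python
import numpy as np

def init_dictionary(X, Y):
    """X, Y: length-normalized embedding matrices of equal vocabulary size."""
    nx = np.sort(X @ X.T, axis=1)   # sorted similarity profile of every word in language 1
    ny = np.sort(Y @ Y.T, axis=1)   # sorted similarity profile of every word in language 2
    sims = nx @ ny.T                # how similar the profiles are across languages
    E = np.zeros_like(sims)
    E[np.arange(sims.shape[0]), sims.argmax(axis=1)] = 1.0  # one-hot nearest neighbors
    return E
```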

SLIDE 28

Self-Learning

Iterate until convergence:

  • 1. Optimize $W_X$ and $W_Y$ w.r.t. the current dictionary:

$$\underset{W_X, W_Y}{\arg\max}\ \sum_j \sum_k E_{jk} \left( (X_{j:} W_X)(Y_{k:} W_Y)^\top \right)$$

  • 2. Update the dictionary matrix $E$:

$$E_{jk} = \begin{cases} 1, & \text{if } j \text{ is the nearest neighbor of } k \text{ or vice versa} \\ 0, & \text{otherwise} \end{cases}$$

A compact sketch of this loop follows below.
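A compact sketch, assuming NumPy, orthogonal mappings obtained in closed form by SVD (orthogonal Procrustes), and a simple forward nearest-neighbor dictionary update; the published method adds several robustness tricks on top of this:

```python
import numpy as np

def self_learning(X, Y, E, n_iter=20):
    """X, Y: embedding matrices; E: initial binary dictionary matrix."""
    for _ in range(n_iter):
        # 1. Mappings maximizing sum_jk E_jk (X_j Wx)(Y_k Wy)^T with Wx, Wy
        #    orthogonal: orthogonal Procrustes, solved by the SVD of X^T E Y.
        u, _, vt = np.linalg.svd(X.T @ E @ Y)
        Wx, Wy = u, vt.T                       # map both languages into a shared space
        # 2. New dictionary: nearest neighbor of every word of language 1
        sims = (X @ Wx) @ (Y @ Wy).T
        E = np.zeros_like(sims)
        E[np.arange(sims.shape[0]), sims.argmax(axis=1)] = 1.0
    return Wx, Wy, E
```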
SLIDE 29

Accuracy on Large Dictionary

SLIDE 30

Try it yourself!

  • Pre-train monolingual word embeddings using FastText / Word2Vec
  • Install VecMap

https://github.com/artetxem/vecmap

python3 map_embeddings.py --unsupervised SRC.EMB TRG.EMB SRC_MAPPED.EMB TRG_MAPPED.EMB

SLIDE 31

Image Captioning

SLIDE 32

Image Captioning

Task: Generate a caption in natural language given an image. Example:

A group of people wearing snowshoes, and dressed for winter hiking, is standing in front of a building that looks like it's made of blocks of ice. The people are quietly listening while the story of the ice cabin was explained to them.

A group of people standing in front of an igloo.

Several students waiting outside an igloo.

SLIDE 33

Deep Learning Solution

  • 1. Obtain a pre-trained image representation (a sketch of this step follows below).
  • 2. Use an autoregressive decoder to generate the caption from the image representation.
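For step 1, a sketch assuming torchvision and a ResNet-50 backbone (the slides do not fix a particular network): take an ImageNet-pretrained CNN and drop its classification head, keeping the spatial feature map.

```python
import torch
import torch.nn as nn
from torchvision import models

resnet = models.resnet50(weights="IMAGENET1K_V1")        # ImageNet-pretrained backbone
backbone = nn.Sequential(*list(resnet.children())[:-2])  # drop average pooling and classifier
backbone.eval()

with torch.no_grad():
    features = backbone(torch.randn(1, 3, 224, 224))     # (1, 2048, 7, 7) spatial feature map
```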
SLIDE 34

2D Convolution over an Image

Basic method in deep learning for computer vision.

[Figure: RGB image 9 × 9 × 3 → convolutional map 4 × 4 × 6; stride 2, 6 filters, kernel size 3.]
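The arithmetic from the figure can be verified directly, assuming PyTorch: a 3-channel 9 × 9 input with six 3 × 3 filters and stride 2 gives ⌊(9 − 3)/2⌋ + 1 = 4, i.e. a 6-channel 4 × 4 map.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=2)
image = torch.randn(1, 3, 9, 9)   # batch of one 9x9 RGB image
print(conv(image).shape)          # torch.Size([1, 6, 4, 4])
```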

SLIDE 35

Convolutional Network for Image Classification

[Figure: image-classification CNN: RGB image 3 @ 224 × 224 → stacked convolution + max-pooling layers (24 @ 48 × 48, 48 @ 27 × 27, 60 @ 13 × 13, 60 @ 13 × 13, 50 @ 13 × 13) → flatten + dense layers (1 × 2048, 1 × 2048) → 1 × 1000 class scores.]

  • Trained for 1000-class classification on millions of training examples
  • Architecture: convolutions, max-pooling, residual connections, batch normalization, 50–150 layers

SLIDE 36

Reminder: Autoregressive Decoder

[Figure: autoregressive decoder unrolled in time: starting from <s>, each step receives the previous token (the ground-truth token y_k at training time) and predicts ~y_{k+1}; the predictions are scored against the reference with a cross-entropy loss.]
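A minimal sketch of the decoding loop at inference time, assuming a hypothetical `decoder_step(prev_token, state)` that returns output logits and the updated recurrent state:

```python
import torch

def greedy_decode(decoder_step, state, bos_id, eos_id, max_len=30):
    """Feed the previously generated token back in at every step."""
    tokens = [bos_id]
    for _ in range(max_len):
        logits, state = decoder_step(tokens[-1], state)
        next_id = int(torch.argmax(logits))
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens[1:]
```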

SLIDE 37

Attention Model in Equations (1)

Inputs: the previous decoder state $t_{j-1}$ and encoder states $h_k = [\overrightarrow{h_k}; \overleftarrow{h_k}]$, $k = 1 \ldots T_x$.

Attention energies:
$$f_{jk} = w_b^\top \tanh\!\left( X_b\, t_{j-1} + V_b\, h_k + c_b \right)$$

Attention distribution:
$$\beta_{jk} = \frac{\exp(f_{jk})}{\sum_{l=1}^{T_x} \exp(f_{jl})}$$

Context vector:
$$d_j = \sum_{k=1}^{T_x} \beta_{jk}\, h_k$$
SLIDE 38

Attention Model in Equations (2)

Output projection:
$$u_j = \operatorname{MLP}\!\left( V_p\, t_{j-1} + W_p\, F z_{j-1} + D_p\, d_j + c_p \right)$$
…the attention context is mixed with the decoder hidden state.

Output distribution:
$$q\!\left( z_j = l \mid t_j, z_{j-1}, d_j \right) \propto \exp\!\left( (X_p\, u_j)_l + c_l \right)$$

SLIDE 39

Example Outputs: Correct

SLIDE 40

Example Outputs: Incorrect

SLIDE 41

Employing Transformer Decoder

[Figure: Transformer decoder block, repeated $O\times$: input embeddings ⊕ position encoding → self-attentive sublayer (multi-head attention over previous positions, with future positions masked out by −∞ energies, ⊕ layer normalization) → cross-attention sublayer (multi-head attention with queries from the decoder and keys & values from the encoder, ⊕ layer normalization) → feed-forward sublayer (non-linear layer + linear layer, ⊕ layer normalization) → linear projection + softmax → output symbol probabilities.]
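In PyTorch, such a decoder stack could be instantiated roughly as follows (a sketch, not the configuration behind the results on the next slide); for captioning, the encoder states are simply the positions of the CNN feature map:

```python
import torch
import torch.nn as nn

d_model = 512
layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=8, batch_first=True)
decoder = nn.TransformerDecoder(layer, num_layers=6)

image_features = torch.randn(1, 196, d_model)   # e.g. a 14x14 CNN grid as encoder states
caption_embeds = torch.randn(1, 10, d_model)    # embedded caption prefix + position encoding

# The "-inf" mask from the figure: each position may only attend to itself and the past.
causal_mask = torch.triu(torch.full((10, 10), float("-inf")), diagonal=1)
out = decoder(caption_embeds, image_features, tgt_mask=causal_mask)  # (1, 10, 512)
```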

SLIDE 42

Quantitative Results

model                                                  BLEU score
RNN + attention (original)                             24.3
RNN + attention (with better image representation)     32.6
Transformer (with better image representation)         33.3
