Deep Learning for Natural Language Processing (in 2 hours)
Eneko Agirre (http://ixa2.si.ehu.eus/eneko)
IXA NLP group (http://ixa.eus) @eagirre

Contents: Introduction to Deep Learning, Deep Learning ~ Learning Representations, Text as …


  1. Representation learning
     MLP: what are we learning in W_0 when we backpropagate?
     (Diagram: one-hot input x, weight matrix W_0, hidden layers h_0 and h_1, output y)
       y   = softmax(W_2 h_1 + b_2)
       h_1 = f(W_1 h_0 + b_1)
       h_0 = f(W_0 x + b_0)
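A minimal numpy sketch (not part of the slides) of this forward pass: with a one-hot x, W_0 x selects exactly one column of W_0, so backpropagation only updates that column, i.e. the representation of the input word. The vocabulary size, dimensions and the tanh non-linearity are assumptions chosen for illustration.

```python
import numpy as np

V, D = 10_000, 300             # assumed vocabulary size and embedding dimension
rng = np.random.default_rng(0)
W0 = rng.normal(scale=0.01, size=(D, V))   # first-layer weights: one column per word
b0 = np.zeros(D)

word_id = 42                   # index of some word in the vocabulary
x = np.zeros(V)
x[word_id] = 1.0               # one-hot input

h0 = np.tanh(W0 @ x + b0)      # h_0 = f(W_0 x + b_0)

# With a one-hot x, W0 @ x is the word_id-th column of W0, so the gradient of
# the loss with respect to W0 is non-zero only in that column: the word's embedding.
assert np.allclose(W0 @ x, W0[:, word_id])
```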

  2. Representation learning
     Back-propagation allows neural networks to learn representations for words while training: word embeddings!
     (Same MLP diagram and equations as the previous slide.)

  3. Representation learning
     ● Back-propagation allows neural networks to learn representations for words while training!
       – Word embeddings! A continuous vector space instead of one-hot vectors
     ● Are these word embeddings useful?
     ● Which task would be the best to learn embeddings that can be used in other tasks?
     ● Can we transfer this representation from one task to the other?
     ● Can we have all languages in one embedding space?

  4. Representation learning and word embeddings (section title slide)

  5. Word embeddings
     ● Let's represent words as vectors: similar words should have vectors which are close to each other.
       (Diagram: judge, driver, policeman, cop clustered together; Tallinn, Cambridge, Paris, London clustered together)
     ● WHY? If an AI has seen these two sequences
         "I live in Cambridge"
         "I live in Paris"
     ● … then which one should be more plausible?
         "I live in Tallinn"
         "I live in policeman"

  6. Word embeddings (same diagram and example as the previous slide, with the answer)
     ● "I live in Tallinn" is marked OK as the more plausible sequence: Tallinn lies close to Cambridge and Paris in the vector space, while policeman does not.
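"Close to each other" is usually measured with cosine similarity. A tiny sketch with made-up 3-dimensional vectors (real embeddings typically have hundreds of dimensions):

```python
import numpy as np

def cosine(a, b):
    # cosine similarity: near 1 for similar directions, near 0 for unrelated ones
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# toy vectors, purely illustrative
vec = {
    "Tallinn":   np.array([0.9, 0.1, 0.0]),
    "Paris":     np.array([0.8, 0.2, 0.1]),
    "policeman": np.array([0.1, 0.9, 0.3]),
}

print(cosine(vec["Tallinn"], vec["Paris"]))      # high: plausible substitution
print(cosine(vec["Tallinn"], vec["policeman"]))  # low: implausible substitution
```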

  7. Word embeddings
     Distributional vector spaces:
     Option 1: use a co-occurrence (PPMI) matrix
       ● 1A: large sparse matrix
       ● 1B: factorize it and use a low-rank dense matrix (SVD)
     Option 2: learn a low-rank dense matrix directly
       ● 2A: MLP on a particular classification task
       ● 2B: find a general task
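A minimal sketch of Option 1 (not from the slides): build a co-occurrence matrix, weight it with positive PMI, and factorize it with SVD to obtain low-rank dense vectors. The toy corpus, window size and embedding dimension are assumptions.

```python
import numpy as np

# toy corpus and parameters, purely illustrative
corpus = [["i", "live", "in", "paris"], ["i", "live", "in", "tallinn"]]
window = 2
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# 1A: build the co-occurrence matrix (kept dense here for simplicity)
C = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                C[idx[w], idx[sent[j]]] += 1

# positive PMI: max(0, log( p(w, c) / (p(w) p(c)) ))
total = C.sum()
pw = C.sum(axis=1, keepdims=True) / total
pc = C.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log((C / total) / (pw * pc))
ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)

# 1B: factorize with SVD and keep a low-rank dense matrix as word embeddings
U, S, Vt = np.linalg.svd(ppmi)
k = 2                                 # assumed embedding dimension
embeddings = U[:, :k] * S[:k]         # one k-dimensional vector per word
```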

  8. Word embeddings
     A general task with large quantities of data: guess the missing word (language models).
     CBOW: given the context, guess the middle word
       "… people who keep pet dogs or cats exhibit better mental and physical health …"
     SKIP-GRAM: given the middle word, guess the context
       "… people who keep pet dogs or cats exhibit better mental and physical health …"
     Proposed by Mikolov et al. (2013): word2vec
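A small sketch of how skip-gram training pairs (center word, context word) can be generated from the example sentence; the window size and the whitespace tokenization are assumptions. CBOW would instead pair the whole context window with the center word.

```python
# Generate (center, context) pairs for skip-gram training.
sentence = ("people who keep pet dogs or cats exhibit "
            "better mental and physical health").split()
window = 2   # assumed context window size

pairs = []
for i, center in enumerate(sentence):
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            pairs.append((center, sentence[j]))

print(pairs[:5])   # e.g. ('people', 'who'), ('people', 'keep'), ...
```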

  9. CBOW (source: Mikolov et al. 2013)
     Like an MLP with one layer, but a LARGE vocabulary means a LARGE number of classes (the output ranges over the whole vocabulary)!
     "… people who keep pet dogs or cats exhibit better mental and physical health …"
     Cross-entropy loss / softmax is expensive!
     Negative sampling:
       J_NEG^{w(t)} = log σ(h·o_i) − Σ_{k=1..K} E_{w_n ∼ P_noise} [ log σ(h·o_n) ]
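A minimal numpy sketch of the negative-sampling objective for a single training pair (not the slide's exact formulation): it uses the common variant log σ(h·o_pos) + Σ_k log σ(−h·o_neg_k), scoring K sampled noise words instead of computing a softmax over the whole vocabulary. Dimensions and random vectors are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(h, o_pos, o_negs):
    """Negative-sampling loss for one (hidden vector, observed word) pair.

    h      : hidden/center representation, shape (D,)
    o_pos  : output vector of the observed word, shape (D,)
    o_negs : output vectors of K words sampled from P_noise, shape (K, D)
    """
    pos_term = np.log(sigmoid(h @ o_pos))              # pull the true pair together
    neg_term = np.sum(np.log(sigmoid(-(o_negs @ h))))  # push the noise words away
    return -(pos_term + neg_term)                      # negate to get a loss to minimize

# toy example with assumed sizes
rng = np.random.default_rng(0)
D, K = 50, 5
loss = neg_sampling_loss(rng.normal(size=D),
                         rng.normal(size=D),
                         rng.normal(size=(K, D)))
print(loss)
```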

  10. Word embeddings
      Pre-trained word embeddings can leverage texts with BILLIONS of words!!
      Pre-trained word embeddings are useful for:
      ● Word similarity
      ● Word analogy
      ● Other tasks like PoS tagging, NERC, sentiment analysis, etc.
      ● Initializing the embedding layer in deep learning models

  11. Word embeddings: word similarity (examples from Collobert et al. 2011)

  12. Word embeddings
      Word analogy: a is to b as c is to ?
      man is to king as woman is to ?
      (Diagram: vectors for man, king, woman and the unknown "?")

  13. Word embeddings
      Word analogy: a is to b as c is to ?
      man is to king as woman is to ? → queen
        a − b ≈ c − d
        d ≈ c − a + b
        d = argmax_{d ∈ V} cos(d, c − a + b)
        king − man + woman ≈ queen
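A minimal numpy sketch of the argmax-cosine analogy rule above; the tiny hand-made vectors are assumptions, chosen only so that the expected answer falls out.

```python
import numpy as np

def analogy(a, b, c, emb):
    """Return argmax_{d in V} cos(d, c - a + b), excluding the query words."""
    target = emb[c] - emb[a] + emb[b]
    best, best_sim = None, -np.inf
    for word, vec in emb.items():
        if word in (a, b, c):
            continue
        sim = vec @ target / (np.linalg.norm(vec) * np.linalg.norm(target))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

# tiny hand-made embedding space, purely illustrative
emb = {
    "man":   np.array([1.0, 0.0, 0.0]),
    "woman": np.array([1.0, 1.0, 0.0]),
    "king":  np.array([1.0, 0.0, 1.0]),
    "queen": np.array([1.0, 1.0, 1.0]),
}
print(analogy("man", "king", "woman", emb))  # expected: "queen"
```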

  14. Word embeddings
      ● How to use embeddings in a given task (e.g. an MLP for sentiment analysis):
        – Learn them from scratch (random initialization)
        – Initialize using pre-trained embeddings from some other task (e.g. word2vec)
      ● Other embeddings:
        – GloVe (Pennington et al. 2014)
        – fastText (Mikolov et al. 2017)
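A minimal PyTorch-style sketch of the two options (not from the slides): learning the embedding layer from scratch versus initializing it from pre-trained vectors. The sizes are assumptions, and the "pre-trained" matrix is faked with random numbers; in practice it would be loaded from word2vec/GloVe/fastText files and aligned with the task vocabulary.

```python
import torch
import torch.nn as nn

V, D = 20_000, 300          # assumed vocabulary size and embedding dimension

# Option 1: learn embeddings from scratch (random initialization)
emb_scratch = nn.Embedding(V, D)

# Option 2: initialize from pre-trained vectors (e.g. word2vec); here a random
# matrix stands in for the pre-trained weights.
pretrained = torch.randn(V, D)
emb_pretrained = nn.Embedding.from_pretrained(pretrained, freeze=False)

# Either layer maps word indices to the vectors the rest of the model consumes.
word_ids = torch.tensor([3, 17, 42])
vectors = emb_pretrained(word_ids)   # shape: (3, 300)
```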

  15. Recap
      ● Deep learning: learn representations for words
      ● Are they useful for anything?
      ● Which task would be the best to learn embeddings that can be used in other tasks?
      ● Can we transfer this representation from one task to the other?
      ● Can we have all languages in one embedding space?

  16. Superhuman abilities: cross-lingual word embeddings
      http://aclweb.org/anthology/P18-1073

  17. Contents
      ● Introduction to NLP
        – Deep Learning ~ Learning Representations
      ● Text as a bag of words
        – Text classification
        – Representation learning and word embeddings
        – Superhuman: cross-lingual word embeddings
      ● Text as a sequence: RNN
        – Sentence encoders
        – Machine translation
        – Superhuman: unsupervised MT

  18. From words to sequences
      ● Representation for words: one vector for each word (word embeddings)
      ● Representation for sequences of words: one vector for each sequence (?!)
        – Is it possible to represent a sentence in one vector at all?
        – Let's go back to the MLP

  19. From words to sequences
      MLP: what is h_0 with respect to the words in the input?
      ● Add the vectors of the words in the context (the 1's in x), plus the bias, and apply a non-linearity:
          h_0 = f( Σ_i w_i + b_0 )   (a sentence representation)

  20. Sentence encoder
      A function whose
      ● input is a sequence of word embeddings w_i ∈ ℝ^D
      ● output is a sentence representation s ∈ ℝ^{D'}
      (Diagram: word embeddings w_1, w_2, w_3 → hidden layers of the sentence encoder → sentence representation s)

  21. Sentence encoder
      Baseline: continuous bag of words (with pre-trained embeddings)
        h_0 = s = Σ_i w_i        (the sum of the word embeddings is the sentence representation)
        h_1 = f(W_1 h_0 + b_1)
      (Diagram: word embeddings w_1, w_2, w_3 → Σ → sentence representation s)
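A minimal numpy sketch of this continuous bag-of-words baseline: sum the (pre-trained) word embeddings into s = h_0, then apply one dense layer. The dimensions, the tanh non-linearity and the random stand-in vectors are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 300, 100                      # assumed embedding and hidden sizes

def cbow_encode(word_vectors):
    """Continuous bag-of-words sentence encoder: s = sum of the word embeddings."""
    return np.sum(word_vectors, axis=0)          # h_0 = s

# one dense layer on top of the sentence representation
W1 = rng.normal(scale=0.01, size=(H, D))
b1 = np.zeros(H)

sentence = rng.normal(size=(3, D))   # stand-in for the pre-trained vectors of 3 words
h0 = cbow_encode(sentence)
h1 = np.tanh(W1 @ h0 + b1)           # h_1 = f(W_1 h_0 + b_1)
print(h1.shape)                      # (100,)
```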
