Deep Learning Research for NLP - Graham Neubig


  1. LTI Orientation: Deep Learning Research for NLP (Graham Neubig)

  2. Language Processing. Example: "Mary prevents Peter from scoring a goal. John passes the ball upfield to Peter, who shoots for the goal. The shot is deflected by Mary and the ball goes out of bounds."

  3. Structured Prediction [diagram: mapping an input X to a structured output Y]

  4. Supervised Learning [diagram: learning from pairs of inputs X and outputs Y]

  5. Supervised Learning w/ Neural Nets [diagram: learning neural network parameters θ from pairs of inputs X and outputs Y]

  6. Structured Prediction w/ Neural Nets

  7. Neural Structured Prediction [diagram: a model with parameters w maps input X to output Y and is trained with a loss]

  8. Neural Structured Prediction [diagram: as above, with a search step between the model and the loss]

  9. The Problem of Discrete Decisions [plot: as the parameter w varies, the predicted word ŷ_i(w) jumps discretely between 'the', 'cat', and 'dog', so the objective g(w) is a discontinuous step function of w]

  10. Soft Search [Goyal+18] (Faculty: Neubig) • Replace the hard argmax over word scores s_i(y) with an α-soft argmax: instead of feeding the embedding e(ŷ_i) of the single best word into the next hidden state h_{i+1}, feed the peaked-softmax mixture ẽ_i = (1/Z) Σ_y e(y) · exp[α · s_i(y)], where Z = Σ_y exp[α · s_i(y)] • As α → ∞ this recovers the hard argmax

  11. Smoothed Surface [plot: with the soft relaxation, g(w) becomes a smooth function of w; α = 10 gives a sharper, more argmax-like surface than α = 1]
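A minimal sketch of the peaked softmax from slide 10, assuming a toy three-word vocabulary with random embeddings; the function name and sizes are illustrative, not the paper's code:

    import torch

    def soft_argmax_embedding(scores, emb, alpha=1.0):
        """Continuous relaxation of an embedding lookup: rather than the
        embedding of the single argmax word, return the peaked-softmax
        mixture of all embeddings. Larger alpha -> closer to hard argmax."""
        weights = torch.softmax(alpha * scores, dim=-1)   # exp[alpha*s(y)] / Z
        return weights @ emb                              # sum_y weight(y) * e(y)

    # Toy vocabulary {the, cat, dog} with 4-dimensional embeddings.
    emb = torch.randn(3, 4)
    scores = torch.tensor([2.0, 0.5, 1.0])

    for alpha in (1.0, 10.0, 100.0):
        print(alpha, soft_argmax_embedding(scores, emb, alpha))
    # At alpha = 100 the result is essentially emb[0], the argmax embedding.

At small α the next-step input blends the embeddings of all candidate words, which is why the loss surface in slide 11 is smoother for α = 1 than for α = 10.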

  12. Prediction over Word Embedding Space [Kumar+18] (Faculty: Tsvetkov)
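A hedged sketch of predicting in embedding space rather than over a softmax: regress the decoder state onto the gold word's embedding and decode by nearest neighbour. The cosine loss here is a simplification standing in for the von Mises-Fisher loss of [Kumar+18]; all sizes and names are assumptions:

    import torch
    import torch.nn as nn

    vocab_size, dim, hid = 1000, 64, 128
    target_emb = nn.Embedding(vocab_size, dim)   # pretrained embeddings (frozen)
    target_emb.weight.requires_grad_(False)
    proj = nn.Linear(hid, dim)                   # decoder state -> predicted vector

    def continuous_output_loss(hidden, gold_ids):
        """Regress onto the gold word's embedding instead of a softmax;
        cosine distance stands in for the paper's von Mises-Fisher loss."""
        pred = proj(hidden)
        gold = target_emb(gold_ids)
        return (1 - torch.cosine_similarity(pred, gold, dim=-1)).mean()

    def predict(hidden):
        """Decode by nearest neighbour in embedding space."""
        pred = proj(hidden)
        sims = torch.cosine_similarity(
            pred.unsqueeze(1), target_emb.weight.unsqueeze(0), dim=-1)
        return sims.argmax(dim=-1)

    hidden = torch.randn(8, hid)
    gold = torch.randint(0, vocab_size, (8,))
    print(continuous_output_loss(hidden, gold).item())
    print(predict(hidden)[:3])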

  13. Structured Modeling w/ Neural Nets

  14. Why Structured Neural Nets • In pre-neural NLP we did feature engineering to capture the salient features of text • Now, neural nets capture features for us • But given too much freedom they may fail to learn, or may overfit • So we do architecture engineering to add inductive bias

  15. Structure in Language • Words • Phrases • Sentences [parse tree: S, NP, VP, PP over "Alice gave a message to Bob"] • Documents: "This film was completely unbelievable. The characters were wooden and the plot was absurd. That being said, I liked it."

  16. BiLSTM Conditional Random Fields [Ma+15] (Faculty: Hovy) • Add an additional layer that ensures consistency between tags, e.g. tagging "I hate this movie" as PRP VBP DT NN • Training and prediction use dynamic programming (see the sketch below)
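The dynamic program can be made concrete with Viterbi decoding: combine per-word emission scores (e.g. from a BiLSTM) with a tag-transition matrix. A generic sketch, not the [Ma+15] implementation; shapes are toy-sized:

    import torch

    def viterbi(emissions, transitions):
        """Viterbi decoding for a linear-chain CRF.
        emissions:   (seq_len, num_tags) per-word tag scores, e.g. from a BiLSTM
        transitions: (num_tags, num_tags) score of moving from tag i to tag j"""
        seq_len, num_tags = emissions.shape
        score = emissions[0].clone()          # best score ending in each tag
        backpointers = []
        for t in range(1, seq_len):
            # total[i, j] = best score ending in i, then i -> j, then emit word t
            total = score.unsqueeze(1) + transitions + emissions[t].unsqueeze(0)
            score, best_prev = total.max(dim=0)
            backpointers.append(best_prev)
        # Trace back from the best final tag.
        best_tag = int(score.argmax())
        path = [best_tag]
        for bp in reversed(backpointers):
            best_tag = int(bp[best_tag])
            path.append(best_tag)
        return list(reversed(path))

    # Toy example sized like "I hate this movie" with 4 candidate tags.
    emissions = torch.randn(4, 4)
    transitions = torch.randn(4, 4)
    print(viterbi(emissions, transitions))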

  17. Neural Factor Graph Models [Malaviya+18] (Faculty: Gormley, Neubig) • Problem: Neural CRFs can only handle a single tag per word • Idea: Expand to multiple tags per word using graphical models

  18. Stack LSTM [Dyer+15] (Faculty: Dyer (now DeepMind)) [diagram: transition-based parser with a stack S, a buffer B, and an action history; actions such as SHIFT, REDUCE-LEFT(amod), and REDUCE-RIGHT build a compositional representation of "an overhasty decision was made"]
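A sketch of the core stack-LSTM data structure, reduced to its essentials: push runs the LSTM cell forward, pop rewinds a pointer into the state history, and the stack summary is the hidden state at the current top. Class and method names are mine, not the authors' code:

    import torch
    import torch.nn as nn

    class StackLSTM(nn.Module):
        """Sketch of a stack LSTM: an LSTM whose top pointer can rewind.
        push runs the cell forward; pop returns to the previous state, so
        the stack summary is always the hidden state at the current top."""
        def __init__(self, input_dim, hidden_dim):
            super().__init__()
            self.cell = nn.LSTMCell(input_dim, hidden_dim)
            zero = torch.zeros(1, hidden_dim)
            self.states = [(zero, zero.clone())]   # index 0 = empty stack

        def push(self, x):
            h, c = self.cell(x.unsqueeze(0), self.states[-1])
            self.states.append((h, c))

        def pop(self):
            if len(self.states) > 1:
                self.states.pop()

        def summary(self):
            return self.states[-1][0]   # hidden state at the top of the stack

    stack = StackLSTM(8, 16)
    stack.push(torch.randn(8))    # e.g. SHIFT pushes a word representation
    stack.push(torch.randn(8))
    stack.pop()                   # e.g. a REDUCE action pops the top
    print(stack.summary().shape)  # torch.Size([1, 16])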

  19. Morphological Language Models [Matthews+18] (Faculty: Neubig, Dyer) • Problem: Language modeling for morphologically rich languages is hard • Idea: Explicitly decompose the input and output using morphological structure

  20. Neural-Symbolic Integration

  21. Neural-Symbolic Hybrids • Neural and symbolic models are better at different things • Neural: smoothing over differences using similarity • Symbolic: remembering individual single-shot events • How can we combine the two?

  22. Discrete Lexicons in Neural Seq2seq [Arthur+15] (Faculty: Neubig)
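One hedged sketch of combining a discrete lexicon with a neural decoder in the spirit of [Arthur+15]: linearly interpolate the decoder's softmax with a lexicon-derived distribution (the paper also explores biasing the pre-softmax scores). The uniform lexicon distribution here is a toy stand-in:

    import torch

    def interpolate_with_lexicon(nmt_logits, lex_probs, lam=0.1):
        """Mix the decoder softmax with a distribution derived from a
        discrete translation lexicon; lam is the lexicon weight."""
        nmt_probs = torch.softmax(nmt_logits, dim=-1)
        return (1 - lam) * nmt_probs + lam * lex_probs

    vocab = 100
    nmt_logits = torch.randn(vocab)
    lex_probs = torch.full((vocab,), 1.0 / vocab)   # toy stand-in lexicon
    p = interpolate_with_lexicon(nmt_logits, lex_probs)
    print(p.sum())   # still a valid distribution (~1.0)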

  23. NNs + Logic Rules [Hu+16] (Faculty: Hovy, Xing) • Problem: It is difficult to explicitly incorporate knowledge into neural-net-based models • Idea: Use logical rules to constrain space of predicted probabilities
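A minimal sketch of constraining predictions with a rule, following the teacher-distribution form q(y) ∝ p(y) · exp(λ · rule(y)) used in this line of work; the toy "but"-clause sentiment rule and all names are illustrative:

    import torch

    def rule_constrained(probs, rule_scores, lam=1.0):
        """Teacher distribution that upweights rule-consistent labels:
        q(y) is proportional to p(y) * exp(lam * rule(y))."""
        q = probs * torch.exp(lam * rule_scores)
        return q / q.sum(dim=-1, keepdim=True)

    # Toy sentiment rule: an A-but-B sentence should follow the sentiment
    # of the B clause, which here favours class 1 (positive).
    p = torch.tensor([0.6, 0.4])        # network's (neg, pos) prediction
    rule = torch.tensor([0.0, 1.0])     # rule satisfaction per class
    print(rule_constrained(p, rule))    # mass shifts toward the positive class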

  24. Latent Variable Models

  25. Latent Variable Models [diagram: observed input X and output Y]

  26. Latent Variable Models [diagram: a latent variable Z is introduced between X and Y; Z is unobserved and must be inferred]

  27. Neural Latent Variable Models [diagram: latent variable Z generates observation X]

  28. Generating Text from Latent Space [diagram: encode text X into a latent Z, then decode Z back into text X]
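A minimal VAE sketch of the Z -> X picture above, assuming arbitrary vectors as a stand-in for encoded sentences; the reparameterization trick makes sampling differentiable:

    import torch
    import torch.nn as nn

    class TextVAE(nn.Module):
        """Minimal VAE skeleton: encode x into q(z|x), sample z with the
        reparameterization trick, decode z back toward x."""
        def __init__(self, x_dim=64, z_dim=16):
            super().__init__()
            self.enc = nn.Linear(x_dim, 2 * z_dim)   # outputs (mu, log-variance)
            self.dec = nn.Linear(z_dim, x_dim)

        def forward(self, x):
            mu, logvar = self.enc(x).chunk(2, dim=-1)
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
            recon = self.dec(z)
            # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian
            kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
            recon_loss = (recon - x).pow(2).sum(-1).mean()  # stand-in for log-likelihood
            return recon_loss + kl

    vae = TextVAE()
    x = torch.randn(32, 64)    # stand-in for vector-encoded sentences
    print(vae(x).item())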

  29. Example: Discourse-Level Modeling with VAE [Zhao+17] (Faculty: Eskenazi) • Use a latent variable to represent the entire discourse in a dialog

  30. Handling Discrete Latent Variables [Zhou+17] (Faculty: Neubig)
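One common way to backpropagate through discrete latent variables is the Gumbel-softmax relaxation, sketched below as a general technique (not necessarily the mechanism used in [Zhou+17]):

    import torch

    def gumbel_softmax_sample(logits, tau=1.0):
        """Differentiable approximate sample from a categorical: add Gumbel
        noise to the logits, then take a temperature softmax."""
        gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-9) + 1e-9)
        return torch.softmax((logits + gumbel) / tau, dim=-1)

    logits = torch.randn(4, 10)        # 4 examples, 10 discrete latent values
    soft_z = gumbel_softmax_sample(logits, tau=0.5)
    print(soft_z.argmax(dim=-1))       # which value each sample favours
    print(soft_z.sum(dim=-1))          # rows are near one-hot and sum to 1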

  31. Structured Latent Variables [Yin+18] (Faculty: Neubig) • Problem: Paucity of training data for structured prediction problems • Idea: Treat the structure as a latent variable in a VAE model

  32. Better Learning Algorithms for Latent Variable Models [He+19] (Faculty: Neubig) • Problem: When learning latent variable models, predicting the latent variables can be difficult • Solution: Aggressively update the part of the model that predicts these variables (see the sketch below)
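A hedged sketch of the aggressive schedule: several encoder-only updates per full-model update, so the inference network keeps up with the generator. [He+19] iterate the inner loop to convergence; a fixed count and a tiny stand-in VAE keep this short:

    import torch
    import torch.nn as nn

    # Tiny stand-in VAE so the loop below runs; all sizes are arbitrary.
    enc = nn.Linear(8, 4)    # inference network: x -> (mu, logvar)
    dec = nn.Linear(2, 8)    # generator: z -> x

    def elbo_loss(x):
        mu, logvar = enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return (dec(z) - x).pow(2).sum(-1).mean() + kl

    enc_opt = torch.optim.Adam(enc.parameters(), lr=1e-3)   # inference net only
    full_opt = torch.optim.Adam(
        list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

    x = torch.randn(16, 8)
    for _ in range(5):
        # Aggressive phase: several encoder-only updates so q(z|x)
        # keeps up with the generator (the paper iterates to convergence).
        for _ in range(10):
            enc_opt.zero_grad()
            elbo_loss(x).backward()
            enc_opt.step()
        # Then one update of the full model.
        full_opt.zero_grad()
        elbo_loss(x).backward()
        full_opt.step()
    print(elbo_loss(x).item())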

  33. Any Questions?
