
Semi-supervised Question Retrieval with Gated Convolutions, Tao Lei

Semi-supervised Question Retrieval with Gated Convolutions. Tao Lei, joint work with Hrishikesh Joshi, Regina Barzilay, Tommi Jaakkola, Kateryna Tymoshenko, Alessandro Moschitti and Lluís Màrquez. NAACL 2016. QCRI/MIT-CSAIL Annual Meeting.


  1. Semi-supervised Question Retrieval with Gated Convolutions Tao Lei joint work with Hrishikesh Joshi, Regina Barzilay, Tommi Jaakkola, 
 Kateryna Tymoshenko, Alessandro Moschitti and Lluís Màrquez NAACL 2016 QCRI/MIT-CSAIL Annual Meeting – March 2015 QCRI/MIT-CSAIL Annual Meeting – March 2014 ‹#› ‹#›

  2. Our Task: find similar questions given the user’s input question. [Figure: a question from Stack Exchange AskUbuntu, with its title and body labeled]

  3. Our Task: find similar questions given the user’s input question. [Figure: a question from Stack Exchange AskUbuntu with a user-marked similar question] Our goal: automate this process as a solution for QA.

  4. Challenges
     • Multi-sentence text contains irrelevant details
       Title: How can I boot Ubuntu from a USB ?
       Body: I bought a Compaq pc with Windows 8 a few months ago and now I want to install Ubuntu but still keep Windows 8. I tried Webi but when my pc restarts it read ERROR 0x000007b. I know that Windows 8 has a thing about not letting you have Ubuntu ...
       Title: When I want to install Ubuntu on my laptop I’ll have to erase all my data. “Alonge side windows” doesnt appear
       Body: I want to install Ubuntu from a Usb drive. It says I have to erase all my data but I want to install it along side Windows 8. The “Install alongside windows” option doesn’t appear …
     • Forum user annotation is limited and noisy (more on this later)

  5. Solution: (1) a model to better represent the question text; (2) semi-supervised training to leverage raw text data.

  6. Model
     Model architecture*: question 1 and question 2 each pass through an encoder, then pooling; the two pooled vectors are compared by cosine similarity.
     Choice of encoder: LSTM, GRU, CNN … or:
       $c^{(3)}_t = \lambda_t \odot c^{(3)}_{t-1} + (1 - \lambda_t) \odot (c^{(2)}_{t-1} + W_3 x_t)$
       $c^{(2)}_t = \lambda_t \odot c^{(2)}_{t-1} + (1 - \lambda_t) \odot (c^{(1)}_{t-1} + W_2 x_t)$
       $c^{(1)}_t = \lambda_t \odot c^{(1)}_{t-1} + (1 - \lambda_t) \odot (W_1 x_t)$
       $h_t = \tanh(c^{(3)}_t + b)$
     Why this encoder (or equations)? How to understand it?
     *Other architectures possible: (Feng et al. 2015), (Tan et al. 2015) etc.
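
The scoring path on this slide (encoder, pooling, cosine similarity) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slide does not say which pooling is used, so mean pooling is an assumption here, and the random arrays stand in for encoder outputs. The names `mean_pool`, `cosine_similarity` and `score_pair` are hypothetical.

```python
import numpy as np

def mean_pool(hidden_states):
    """Average encoder states of shape [length, dim] into one vector (assumed pooling)."""
    return hidden_states.mean(axis=0)

def cosine_similarity(u, v, eps=1e-8):
    """Cosine of the angle between u and v; eps guards against division by zero."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + eps))

def score_pair(states_q1, states_q2):
    """Similarity score for two questions given their encoder states."""
    return cosine_similarity(mean_pool(states_q1), mean_pool(states_q2))

# Stand-ins for encoder outputs of two questions with different lengths.
rng = np.random.default_rng(0)
states_a = rng.normal(size=(5, 8))
states_b = rng.normal(size=(7, 8))

self_score = score_pair(states_a, states_a)   # a question compared with itself
cross_score = score_pair(states_a, states_b)
```

Because the score depends only on the pooled vectors, questions of different lengths can be compared directly.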

  7. Sentence: “the movie is not that good”. Bag of words, TF-IDF versus Neural Bag-of-words (average embedding). [Figure: the embeddings of the, movie, is, not, that, good are added together into one vector]
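
A small sketch of the neural bag-of-words idea named on this slide, assuming random toy embeddings in place of trained ones (the table `embeddings` and the function `neural_bow` are hypothetical). It also shows the known limitation that averaging ignores word order.

```python
import numpy as np

sentence = "the movie is not that good".split()

# Toy embedding table; real embeddings would be learned or pre-trained.
rng = np.random.default_rng(0)
embeddings = {word: rng.normal(size=4) for word in sentence}

def neural_bow(tokens):
    """Represent a sentence as the average of its word embeddings."""
    return np.mean([embeddings[w] for w in tokens], axis=0)

v_original = neural_bow(sentence)
v_reversed = neural_bow(list(reversed(sentence)))

# Averaging is order-invariant, so both orderings give the same vector.
order_is_ignored = bool(np.allclose(v_original, v_reversed))
```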

  8. Sentence: “the movie is not that good”. Ngram Kernel versus CNNs (N=2). Bigram features: the movie, movie is, is not, not that, that good, … Neural methods as a dimension-reduction of traditional methods.

  9. Sentence: “the movie is not that good”. String Kernel. [Figure: a feature vector indexed by possibly non-contiguous n-grams such as “the movie”, “is not”, “is _ that”, “movie _ not”, “not _ good”, “is _ _ good”; contiguous n-grams such as “the movie” get weight $\lambda^0$, one skip as in “not _ good” gets $\lambda^1$, two skips as in “is _ _ good” get $\lambda^2$, and n-grams absent from the sentence get 0] Penalize skips, with $\lambda \in (0, 1)$; bigger feature space. Neural model inspired by this kernel method?
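
The decayed skip-gram weighting can be illustrated with a short sketch. It is a simplification under stated assumptions: only two-word features are enumerated, each skipped word is written as `_` in the feature name, and `lam=0.5` and `max_skip=2` are arbitrary choices. The function `skip_bigram_features` is hypothetical, not part of any library.

```python
from collections import defaultdict

def skip_bigram_features(tokens, lam=0.5, max_skip=2):
    """Map each (possibly non-contiguous) word pair to a weight lam ** skips."""
    feats = defaultdict(float)
    for i in range(len(tokens)):
        # j ranges over the next word and up to max_skip words beyond it
        for j in range(i + 1, min(i + 2 + max_skip, len(tokens))):
            skips = j - i - 1
            key = " ".join([tokens[i]] + ["_"] * skips + [tokens[j]])
            feats[key] += lam ** skips
    return dict(feats)

feats = skip_bigram_features("the movie is not that good".split())
```

With these settings `feats["the movie"]` is 1.0, `feats["not _ good"]` is 0.5 and `feats["is _ _ good"]` is 0.25, so longer skips contribute less.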

  10. “string” convolution [Figure: convolution over the sentence “the movie is not that good”]

  11. “string” convolution [Figure: convolution over the sentence “the movie is not that good”]

  12. “string” convolution [Figure: convolution over the sentence “the movie is not that good”]

  13. Formulas in the case of 3gram
      $c^{(3)}_t = \lambda \cdot c^{(3)}_{t-1} + (1 - \lambda) \cdot (c^{(2)}_{t-1} + W_3 x_t)$
      $c^{(2)}_t = \lambda \cdot c^{(2)}_{t-1} + (1 - \lambda) \cdot (c^{(1)}_{t-1} + W_2 x_t)$
      $c^{(1)}_t = \lambda \cdot c^{(1)}_{t-1} + (1 - \lambda) \cdot (W_1 x_t)$
      $h_t = \tanh(c^{(3)}_t + b)$

  14. Formulas in the case of 3gram
      $c^{(3)}_t = \lambda \cdot c^{(3)}_{t-1} + (1 - \lambda) \cdot (c^{(2)}_{t-1} + W_3 x_t)$
      $c^{(2)}_t = \lambda \cdot c^{(2)}_{t-1} + (1 - \lambda) \cdot (c^{(1)}_{t-1} + W_2 x_t)$
      $c^{(1)}_t = \lambda \cdot c^{(1)}_{t-1} + (1 - \lambda) \cdot (W_1 x_t)$
      $h_t = \tanh(c^{(3)}_t + b)$
      Annotations on the slide: penalize skip grams; weighted average of 1grams (to 3grams) up to position t.

  15. Formulas
      $c^{(3)}_t = \lambda \cdot c^{(3)}_{t-1} + (1 - \lambda) \cdot (c^{(2)}_{t-1} + W_3 x_t)$
      $c^{(2)}_t = \lambda \cdot c^{(2)}_{t-1} + (1 - \lambda) \cdot (c^{(1)}_{t-1} + W_2 x_t)$
      $c^{(1)}_t = \lambda \cdot c^{(1)}_{t-1} + (1 - \lambda) \cdot (W_1 x_t)$
      $h_t = \tanh(c^{(3)}_t + b)$
      $\lambda = 0$: $c^{(3)}_t = W_1 x_{t-2} + W_2 x_{t-1} + W_3 x_t$ (one-layer CNN)
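
The reduction to a one-layer CNN at $\lambda = 0$ can be checked numerically. The sketch below implements the constant-decay recurrence from this slide in NumPy and compares it with a width-3 convolution; zero initial states and zero padding on the left are assumptions made for the comparison, and the function name `string_conv_3gram` is hypothetical.

```python
import numpy as np

def string_conv_3gram(X, W1, W2, W3, b, lam):
    """Run the 3gram recurrence with a constant scalar decay lam over X of shape [T, d_in]."""
    d_out = W1.shape[0]
    c1, c2, c3 = (np.zeros(d_out) for _ in range(3))  # assumed zero initial states
    states = []
    for x in X:
        # Update the highest order first so each line reads the previous-step
        # value of the lower-order state, as in the equations.
        c3 = lam * c3 + (1 - lam) * (c2 + W3 @ x)
        c2 = lam * c2 + (1 - lam) * (c1 + W2 @ x)
        c1 = lam * c1 + (1 - lam) * (W1 @ x)
        states.append(np.tanh(c3 + b))
    return np.array(states)

rng = np.random.default_rng(0)
T, d_in, d_out = 6, 4, 3
X = rng.normal(size=(T, d_in))
W1, W2, W3 = (rng.normal(size=(d_out, d_in)) for _ in range(3))
b = rng.normal(size=d_out)

H = string_conv_3gram(X, W1, W2, W3, b, lam=0.0)

# Width-3 convolution with two zero vectors padded on the left:
# X_pad[t], X_pad[t+1], X_pad[t+2] are x_{t-2}, x_{t-1}, x_t.
X_pad = np.vstack([np.zeros((2, d_in)), X])
H_cnn = np.array([
    np.tanh(W1 @ X_pad[t] + W2 @ X_pad[t + 1] + W3 @ X_pad[t + 2] + b)
    for t in range(T)
])

matches_cnn = bool(np.allclose(H, H_cnn))
```

For any decay strictly between 0 and 1 the outputs differ from the CNN, because earlier and non-contiguous n-grams then contribute with decayed weights.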

  16. Gated version
      $c^{(3)}_t = \lambda_t \odot c^{(3)}_{t-1} + (1 - \lambda_t) \odot (c^{(2)}_{t-1} + W_3 x_t)$
      $c^{(2)}_t = \lambda_t \odot c^{(2)}_{t-1} + (1 - \lambda_t) \odot (c^{(1)}_{t-1} + W_2 x_t)$
      $c^{(1)}_t = \lambda_t \odot c^{(1)}_{t-1} + (1 - \lambda_t) \odot (W_1 x_t)$
      $h_t = \tanh(c^{(3)}_t + b)$
      $\lambda_t = \sigma(W x_t + U h_{t-1} + b')$
      Adaptive decay controlled by gate.
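
A minimal NumPy sketch of the gated recurrence, written directly from the equations on this slide. It is an illustration under assumptions rather than the released code: initial states and $h_0$ are taken to be zero, weights are random, and the parameter names `Wg`, `Ug`, `bg` (for $W$, $U$, $b'$ in the gate) and the function `gated_string_conv` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_string_conv(X, W1, W2, W3, b, Wg, Ug, bg):
    """Gated 3gram recurrence over X of shape [T, d_in]; returns hidden states and gates."""
    d_out = W1.shape[0]
    c1, c2, c3, h = (np.zeros(d_out) for _ in range(4))  # assumed zero initial states
    states, gates = [], []
    for x in X:
        # The decay vector depends on the current input and the previous hidden state.
        lam = sigmoid(Wg @ x + Ug @ h + bg)
        # Highest order first, so lower-order states are still the previous-step values.
        c3 = lam * c3 + (1 - lam) * (c2 + W3 @ x)
        c2 = lam * c2 + (1 - lam) * (c1 + W2 @ x)
        c1 = lam * c1 + (1 - lam) * (W1 @ x)
        h = np.tanh(c3 + b)
        states.append(h)
        gates.append(lam)
    return np.array(states), np.array(gates)

rng = np.random.default_rng(0)
T, d_in, d_out = 6, 4, 3
X = rng.normal(size=(T, d_in))
W1, W2, W3, Wg = (rng.normal(size=(d_out, d_in)) for _ in range(4))
Ug = rng.normal(size=(d_out, d_out))
b, bg = rng.normal(size=d_out), rng.normal(size=d_out)

H, gates = gated_string_conv(X, W1, W2, W3, b, Wg, Ug, bg)
```

Each row of `gates` is the decay vector at one position, so inspecting it over time shows where the model keeps or discards earlier context.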

  17. Training
      • Amount of annotation is scarce
        # of unique questions   167,765
        # of marked questions    12,584
        # of marked pairs        16,391
      Forum users only identify a few similar pairs: only 10% of the number of unique questions.
      Ideally, want to use all questions available.

  18. Pre-training: Encoder-Decoder Network
      Encoder trained to pull out important (summarized) information.
      [Figure: the encoder encodes the question body/title; the decoder re-generates the question title, from <s> to </s>]
      Pre-training recently applied to classification task:
      • Semi-supervised Sequence Learning. Dai and Le. 2015

  19. Evaluation Set-up
      Dataset: AskUbuntu 2014 dump; pre-train on 167k, fine-tune on 16k, evaluate using 8k pairs (50/50 split for dev/test)
      Baselines: TF-IDF, BM25 and SVM reranker; CNNs, LSTMs and GRUs
      Grid-search: learning rate, dropout, pooling, filter size, pre-training, …
      5 independent runs for each config; > 500 runs in total

  20. Overall Results
      Model   MAP    MRR
      BM25    56.0   68.0
      LSTM    56.8   70.1
      CNN     57.6   71.4
      GRU     59.3   71.3
      Ours    62.3   75.6
      Our improvement is significant.

  21. Analysis
      Variant            MAP    MRR    P@1
      full model         62.3   75.6   62.0
      w/o pretraining    60.7   72.9   59.1
      w/o body           58.2   70.7   56.6
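
For reference, the three metrics reported on the results slides (MAP, MRR, P@1) can be computed from ranked relevance labels as sketched below. These are the standard definitions, not the authors' evaluation script; the toy rankings and the function names are made up for illustration.

```python
def average_precision(ranked_labels):
    """Mean of precision values at the ranks where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, relevant in enumerate(ranked_labels, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def reciprocal_rank(ranked_labels):
    """1 / rank of the first relevant item, or 0 if there is none."""
    for rank, relevant in enumerate(ranked_labels, start=1):
        if relevant:
            return 1.0 / rank
    return 0.0

def precision_at_k(ranked_labels, k=1):
    """Fraction of the top-k items that are relevant."""
    return sum(ranked_labels[:k]) / k

def evaluate(queries):
    """Average each metric over all queries."""
    n = len(queries)
    return {
        "MAP": sum(average_precision(q) for q in queries) / n,
        "MRR": sum(reciprocal_rank(q) for q in queries) / n,
        "P@1": sum(precision_at_k(q, 1) for q in queries) / n,
    }

# Two toy queries; 1 marks a candidate that users flagged as similar.
toy_rankings = [[1, 0, 1, 0], [0, 1, 0, 0]]
scores = evaluate(toy_rankings)
```

On the toy data the first query has average precision (1/1 + 2/3) / 2 and the second 1/2, so MAP is about 0.667, MRR is 0.75 and P@1 is 0.5.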

  22. Pre-training [Figure: MRR on the dev set versus perplexity on a held-out corpus] PPLs are close; MRRs are quite different.

  23. Decay Factor (Neural Gate)
      $c^{(3)}_t = \lambda \odot c^{(3)}_{t-1} + (1 - \lambda) \odot (c^{(2)}_{t-1} + W_3 x_t)$
      Analyze the weight vector over time.

  24. Case Study (using a scalar decay)

  25. Case Study (using a scalar decay)

  26. Case Study (using a scalar decay)

  27. Conclusions
      • AskUbuntu data as a natural benchmark for retrieval and summarization tasks
      • Neural model with good intuition and understanding (e.g. attention) can potentially lead to good performance
      https://github.com/taolei87/askubuntu
      https://github.com/taolei87/rcnn

