Semi-supervised Question Retrieval with Gated Convolutions

Tao Lei
joint work with Hrishikesh Joshi, Regina Barzilay, Tommi Jaakkola, Kateryna Tymoshenko, Alessandro Moschitti and Lluís Màrquez

NAACL 2016
QCRI/MIT-CSAIL Annual Meeting
Our Task

Find similar questions given the user's input question

[Figure: an example question (title and body) from Stack Exchange AskUbuntu]
[Figure: a user-marked similar question for the example above]

Our goal: automate this process as a solution for QA
Challenges

- Multi-sentence text contains irrelevant details

  Title: How can I boot Ubuntu from a USB?
  Body: I bought a Compaq pc with Windows 8 a few months ago and now I want to install Ubuntu but still keep Windows 8. I tried Webi but when my pc restarts it read ERROR 0x000007b. I know that Windows 8 has a thing about not letting you have Ubuntu ...

  Title: When I want to install Ubuntu on my laptop I'll have to erase all my data. “Alonge side windows” doesnt appear
  Body: I want to install Ubuntu from a Usb drive. It says I have to erase all my data but I want to install it along side Windows 8. The “Install alongside windows” option doesn't appear ...

- Forum user annotation is limited and noisy (more on this later)
Solution

(1) a model to better represent the question text
(2) semi-supervised training to leverage raw text data
Model

Model Architecture*:

[Figure: question 1 and question 2 are each passed through a shared encoder, followed by pooling; the pair is scored by the cosine similarity of the two pooled representations]

*Other architectures possible: (Feng et al. 2015), (Tan et al. 2015), etc.
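To make the architecture concrete, here is a minimal numpy sketch of the scoring pipeline (an illustration, not the authors' code; `q1_states`/`q2_states` stand for the per-token outputs of any encoder, and mean pooling is one of the pooling choices explored in the experiments):

```python
import numpy as np

def score(q1_states, q2_states):
    """Cosine similarity between pooled encoder outputs.

    q1_states, q2_states: arrays of shape (num_tokens, hidden_dim),
    one row per token, produced by any encoder (LSTM, GRU, CNN, or
    the gated convolution defined below).
    """
    h1 = q1_states.mean(axis=0)   # mean pooling over time
    h2 = q2_states.mean(axis=0)
    return float(h1 @ h2 / (np.linalg.norm(h1) * np.linalg.norm(h2)))
```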
Choice of encoder: LSTM, GRU, CNN ... or:

c^{(1)}_t = \lambda_t \odot c^{(1)}_{t-1} + (1 - \lambda_t) \odot (W_1 x_t)
c^{(2)}_t = \lambda_t \odot c^{(2)}_{t-1} + (1 - \lambda_t) \odot (c^{(1)}_{t-1} + W_2 x_t)
c^{(3)}_t = \lambda_t \odot c^{(3)}_{t-1} + (1 - \lambda_t) \odot (c^{(2)}_{t-1} + W_3 x_t)
h_t = \tanh(c^{(3)}_t + b)

Why this encoder (these equations)? How should we understand it?
Sentence: “the movie is not that good”

Bag of words, TF-IDF: features are the individual words (movie, is, not, that, good, ...)

Neural bag-of-words (average embedding): v(the) + v(movie) + v(is) + ... averaged into one sentence vector
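A minimal sketch of neural bag-of-words (the names `vocab` and `E` are illustrative):

```python
import numpy as np

def neural_bow(tokens, vocab, E):
    """Sentence vector = average of word embeddings.
    vocab maps word -> row index; E is the (V x d) embedding table."""
    rows = [E[vocab[w]] for w in tokens if w in vocab]
    return np.mean(rows, axis=0)
```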
Sentence: “the movie is not that good”

N-gram kernel (N = 2): features are consecutive bigrams (“the movie”, “movie is”, “is not”, “not that”, “that good”)

CNNs: neural methods act as a dimensionality reduction of traditional kernel methods
Sentence: “the movie is not that good”

String kernel: a bigger feature space that also includes non-consecutive bigrams, each weighted \lambda^k where k is the number of skipped words; \lambda \in (0, 1) penalizes skips:

“the movie” (\lambda^0), “is not” (\lambda^0), ..., “movie _ not” (\lambda^1), “is _ that” (\lambda^1), “not _ good” (\lambda^1), “is _ _ good” (\lambda^2), ...

Can a neural model be inspired by this kernel method?
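One simple reading of this feature map, sketched in Python (illustrative, not the exact kernel formulation from the literature): every possibly non-consecutive bigram contributes weight \lambda^k for k skipped words.

```python
from collections import defaultdict

def skip_bigram_features(tokens, lam=0.5):
    """Map each (possibly non-consecutive) bigram to a weight lam**k,
    where k is the number of words skipped between its two tokens."""
    feats = defaultdict(float)
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens)):
            feats[(tokens[i], tokens[j])] += lam ** (j - i - 1)
    return feats

# "not ... good" gets weight lam**1: one word ("that") is skipped
print(skip_bigram_features("the movie is not that good".split())[("not", "good")])
```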
“string” convolution

[Figure (animated over three slides): a convolution sliding over “the movie is not that good”, aggregating both consecutive and non-consecutive n-grams]
Formulas

In the case of 3-grams:

c^{(1)}_t = \lambda \cdot c^{(1)}_{t-1} + (1 - \lambda) \cdot (W_1 x_t)
c^{(2)}_t = \lambda \cdot c^{(2)}_{t-1} + (1 - \lambda) \cdot (c^{(1)}_{t-1} + W_2 x_t)
c^{(3)}_t = \lambda \cdot c^{(3)}_{t-1} + (1 - \lambda) \cdot (c^{(2)}_{t-1} + W_3 x_t)
h_t = \tanh(c^{(3)}_t + b)
Interpretation: c^{(3)}_t is a weighted average of the 1-gram (up to 3-gram) features seen up to position t, and the decay \lambda penalizes skip-grams.
Special case \lambda = 0:

c^{(3)}_t = W_1 x_{t-2} + W_2 x_{t-1} + W_3 x_t    (a one-layer CNN)
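A numpy sketch of this recurrence (illustrative dimensions, not the authors' implementation); the final assertion checks the \lambda = 0 special case against the one-layer CNN form above:

```python
import numpy as np

def rcnn_states(X, W1, W2, W3, lam, b=0.0):
    """Scalar-decay recurrence for 3-grams over token embeddings X (T x d_in).
    Note c2_t and c3_t use the *previous* c1 and c2 (i.e. c1_{t-1}, c2_{t-1})."""
    d = W1.shape[0]
    c1 = c2 = c3 = np.zeros(d)
    hs = []
    for x in X:
        c1_prev, c2_prev = c1, c2
        c1 = lam * c1 + (1 - lam) * (W1 @ x)
        c2 = lam * c2 + (1 - lam) * (c1_prev + W2 @ x)
        c3 = lam * c3 + (1 - lam) * (c2_prev + W3 @ x)
        hs.append(np.tanh(c3 + b))
    return np.array(hs)

# lam = 0 reduces c3_t to W1 x_{t-2} + W2 x_{t-1} + W3 x_t (one-layer CNN)
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
W1, W2, W3 = rng.normal(size=(3, 6, 4))
h = rcnn_states(X, W1, W2, W3, lam=0.0)
cnn = W1 @ X[2] + W2 @ X[3] + W3 @ X[4]
assert np.allclose(h[-1], np.tanh(cnn))
```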
Gated version

c^{(1)}_t = \lambda_t \odot c^{(1)}_{t-1} + (1 - \lambda_t) \odot (W_1 x_t)
c^{(2)}_t = \lambda_t \odot c^{(2)}_{t-1} + (1 - \lambda_t) \odot (c^{(1)}_{t-1} + W_2 x_t)
c^{(3)}_t = \lambda_t \odot c^{(3)}_{t-1} + (1 - \lambda_t) \odot (c^{(2)}_{t-1} + W_3 x_t)
h_t = \tanh(c^{(3)}_t + b)

Adaptive decay controlled by a gate:

\lambda_t = \sigma(W x_t + U h_{t-1} + b')
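Extending the scalar sketch above, a hedged illustration of the gated version (the gate parameters `Wg`, `Ug`, `bg` are just names for the W, U, b' above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_rcnn_states(X, W1, W2, W3, Wg, Ug, bg, b=0.0):
    """Same recurrence, but the decay lambda_t is a per-dimension gate
    computed from the current input and the previous hidden state."""
    d = W1.shape[0]
    c1 = c2 = c3 = np.zeros(d)
    h = np.zeros(d)
    hs = []
    for x in X:
        lam = sigmoid(Wg @ x + Ug @ h + bg)   # adaptive, vector-valued decay
        c1_prev, c2_prev = c1, c2
        c1 = lam * c1 + (1 - lam) * (W1 @ x)
        c2 = lam * c2 + (1 - lam) * (c1_prev + W2 @ x)
        c3 = lam * c3 + (1 - lam) * (c2_prev + W3 @ x)
        h = np.tanh(c3 + b)
        hs.append(h)
    return np.array(hs)
```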
Training

- Amount of annotation is scarce: forum users only identify a few similar pairs

  # of unique questions   167,765
  # of marked questions    12,584
  # of marked pairs        16,391

  Marked questions are only about 10% of the unique questions.

- Ideally, we want to use all available questions.
Pre-training Encoder-Decoder Network

[Figure: an encoder reads the question body/title; a decoder re-generates the question title]

The encoder is trained to pull out the important (summarized) information.

Pre-training has recently been applied to classification tasks:
- Semi-supervised Sequence Learning. Dai and Le. 2015
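As a rough illustration of this pre-training objective, here is a minimal forward-pass sketch (hypothetical parameter names `Wd`, `Ud`, `Wo`, `E`; a simple RNN decoder standing in, not the authors' architecture) that scores how well the decoder re-generates the title from the encoder's summary:

```python
import numpy as np

def title_nll(body_tokens, title_ids, encode, Wd, Ud, Wo, E):
    """Pre-training loss sketch: negative log-likelihood of the title
    given the encoder's summary of the body (teacher forcing).
    Wd: (d x e), Ud: (d x d), Wo: (V x d), E: (V x e) embeddings."""
    s = encode(body_tokens)[-1]          # summary = last encoder state
    prev = np.zeros(E.shape[1])          # embedding of the previous gold token
    nll = 0.0
    for y in title_ids:
        s = np.tanh(Wd @ prev + Ud @ s)  # decoder state update
        logits = Wo @ s
        p = np.exp(logits - logits.max())
        p /= p.sum()                     # softmax over the vocabulary
        nll -= np.log(p[y])
        prev = E[y]                      # feed the gold token (teacher forcing)
    return nll
```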
Evaluation Set-up

- Dataset: AskUbuntu 2014 dump; pre-train on 167k questions, fine-tune on 16k pairs
- Baselines: TF-IDF, BM25 and an SVM reranker; CNNs, LSTMs and GRUs
- Grid-search: learning rate, dropout, pooling, filter size, pre-training, …
- 5 independent runs for each configuration, > 500 runs in total
- Evaluate using 8k pairs (50/50 split for dev/test)
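For reference, a compact sketch of the ranking metrics reported below (standard definitions, not the authors' evaluation code; each query's candidates are given as a 0/1 relevance list in ranked order, and each query is assumed to have at least one relevant candidate):

```python
import numpy as np

def mrr(rankings):
    """Mean reciprocal rank of the first relevant candidate."""
    return np.mean([1.0 / (r.index(1) + 1) for r in rankings])

def p_at_k(rankings, k):
    """Mean precision among the top-k candidates."""
    return np.mean([sum(r[:k]) / k for r in rankings])

def mean_ap(rankings):
    """Mean average precision over all queries."""
    aps = []
    for r in rankings:
        hits, total = 0, 0.0
        for i, rel in enumerate(r, start=1):
            if rel:
                hits += 1
                total += hits / i
        aps.append(total / max(hits, 1))
    return np.mean(aps)

# e.g. one query whose 2nd and 4th candidates are relevant:
print(mrr([[0, 1, 0, 1]]), p_at_k([[0, 1, 0, 1]], 1))  # 0.5 0.0
```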
Overall Results

Test-set results (MAP / MRR):

BM25: 56.0 / 68.0
LSTM: 56.8 / 70.1
CNN:  57.6 / 71.4
GRU:  59.3 / 71.3
Ours: 62.3 / 75.6

Our improvement is significant.
Analysis

Ablations (MAP / MRR / P@1):

w/o body:         58.2 / 70.7 / 56.6
w/o pre-training: 60.7 / 72.9 / 59.1
full model:       62.3 / 75.6 / 62.0
Pre-training

[Figure: MRR on the dev set versus perplexity on a held-out corpus]

The perplexities are close, but the resulting MRRs are quite different.
Decay Factor (Neural Gate)

c^{(3)}_t = \lambda_t \odot c^{(3)}_{t-1} + (1 - \lambda_t) \odot (c^{(2)}_{t-1} + W_3 x_t)

Analyze the weight vector \lambda_t over time.
Case Study (using a scalar decay)

[Figures shown over three slides; content not recovered in this extraction]
Conclusions

- AskUbuntu data serves as a natural benchmark for retrieval and summarization tasks
- Neural models built with good intuition and understanding (e.g. attention) can potentially lead to good performance

Code and data:
https://github.com/taolei87/rcnn
https://github.com/taolei87/askubuntu
Method            | Pooling | Dev MAP | Dev MRR | Dev P@1 | Dev P@5 | Test MAP | Test MRR | Test P@1 | Test P@5
BM25              | -       | 52.0    | 66.0    | 51.9    | 42.1    | 56.0     | 68.0     | 53.8     | 42.5
TF-IDF            | -       | 54.1    | 68.2    | 55.6    | 45.1    | 53.2     | 67.1     | 53.8     | 39.7
SVM               | -       | 53.5    | 66.1    | 50.8    | 43.8    | 57.7     | 71.3     | 57.0     | 43.3
CNNs              | mean    | 58.5    | 71.1    | 58.4    | 46.4    | 57.6     | 71.4     | 57.6     | 43.2
LSTMs             | mean    | 58.4    | 72.3    | 60.0    | 46.4    | 56.8     | 70.1     | 55.8     | 43.2
GRUs              | mean    | 59.1    | 74.0    | 62.6    | 47.3    | 57.1     | 71.4     | 57.3     | 43.6
RCNNs             | last    | 59.9    | 74.2    | 63.2    | 48.0    | 60.7     | 72.9     | 59.1     | 45.0
LSTMs + pre-train | mean    | 58.3    | 71.5    | 59.3    | 47.4    | 55.5     | 67.0     | 51.1     | 43.4
GRUs + pre-train  | last    | 59.3    | 72.2    | 59.8    | 48.3    | 59.3     | 71.3     | 57.2     | 44.3
RCNNs + pre-train | last    | 61.3∗   | 75.2    | 64.2    | 50.3∗   | 62.3∗    | 75.6∗    | 62.0     | 47.1∗

Table 2: Full results on the dev and test sets (∗ marks statistically significant improvements).
Classification Result

Model                      | Fine | Binary
(Kalchbrenner et al. 2014) | 48.5 | 86.9
(Kim 2014)                 | 47.4 | 88.1
(Tai et al. 2015)          | 51.0 | 88.0
(Kumar et al. 2016)        | 52.1 | 88.6
Constant, scalar decay     | 52.7 | 88.6
Gated decay                | 52.9 | 89.2

Table 1: Results on Stanford Sentiment Treebank.
Analysis

Does it help to model non-consecutive patterns?

[Chart: accuracy of models trained with decay = 0.0, 0.3 and 0.5]