Convolutional Neural Network Architectures for Matching - PowerPoint PPT Presentation

SLIDE 1

Outline

  • Convolutional Neural Network Architectures for Matching Natural Language Sentences. NIPS’14
      • Convolutional Sentence Model
      • Convolutional Matching Models
      • Experiments
  • Deep Recursive Neural Networks for Compositionality in Language. NIPS’14
      • Deep Recursive Neural Networks
      • Experiments

LU Yangyang, luyy11@sei.pku.edu.cn
Jan. 14, 2015

SLIDE 3

Authors

  • Convolutional Neural Network Architectures for Matching Natural Language Sentences
  • NIPS’14
  • Baotian Hu¹, Zhengdong Lu², Hang Li², and Qingcai Chen¹

¹ Harbin Institute of Technology, Shenzhen Graduate School
² Noah’s Ark Lab, Huawei Technologies Co. Ltd.

SLIDES 4-6

Introduction

Matching two potentially heterogeneous language objects:

  • to model the correspondence between “linguistic objects” of different nature at different levels of abstraction
  • generalizes the conventional notion of similarity or relevance
  • related tasks: top-k re-ranking in machine translation, dialogue, paraphrase identification

Natural language sentences:

  • complicated structures: sequential & hierarchical

Sentence matching must capture:

  • the internal structures of sentences
  • the rich patterns in their interactions

→ adapting the convolutional strategy to natural language:

  • the hierarchical composition for sentences
  • the simple-to-comprehensive fusion of matching patterns
SLIDE 7

Convolutional Sentence Model

Convolution: Given sentence input x, the convolution unit for feature map of type-f (among F_l of them) on Layer-l is

    z_i^(l,f)(x) = σ(W^(l,f) ẑ_i^(l−1) + b^(l,f))

where

  • z_i^(l,f)(x): the output of feature map of type-f for location i in Layer-l
  • W^(l,f): the parameters for f on Layer-l
  • σ(·): the activation function (sigmoid or ReLU)
  • ẑ_i^(l−1): the segment of Layer-(l − 1) for the convolution at location i
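To make the sliding-window mechanics concrete, here is a minimal numpy sketch of one such convolution layer; the function and parameter names (and the choice of ReLU for σ) are our own illustration, not the paper’s code:

```python
import numpy as np

def conv_unit(z_prev, W, b, k=3):
    """One convolution layer of the sentence model (minimal sketch).

    z_prev: (length, d) outputs of Layer-(l-1), one row per location.
    W:      (F_l, k * d) filter parameters, one row per feature map f.
    b:      (F_l,) bias.
    Returns (length - k + 1, F_l): z_i^(l,f) for every location i.
    """
    length, d = z_prev.shape
    out = np.empty((length - k + 1, W.shape[0]))
    for i in range(length - k + 1):
        z_hat = z_prev[i:i + k].reshape(-1)      # ẑ_i^(l−1): the k-word window, concatenated
        out[i] = np.maximum(0.0, W @ z_hat + b)  # σ = ReLU
    return out
```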

SLIDE 8

Convolutional Sentence Model (cont.)

Max-Pooling: in every two-unit window, for every feature map f

  • shrinks the size of the representation by half → quickly absorbs the differences in length for sentence representation
  • filters out undesirable compositions of words

Length Variability:

  • put all-zero padding vectors after the last word of the sentence, up to the maximum length
  • to eliminate the boundary effect caused by the great variability of sentence lengths: add a gate g(v) to the convolutional unit, which sets the output vector to all zeros if the input is all zeros

A minimal sketch of the pooling and the gate follows.
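Extending the convolution sketch above, under the same assumptions about names and shapes:

```python
import numpy as np

def max_pool_2unit(z):
    """Max-pooling in non-overlapping two-unit windows: halves the length."""
    length = z.shape[0] - (z.shape[0] % 2)       # drop a trailing odd unit
    return z[:length].reshape(-1, 2, z.shape[1]).max(axis=1)

def gated_conv_unit(z_prev, W, b, k=3):
    """Convolution unit with the gate g(v): windows that are all zeros
    (padding past the end of the sentence) produce all-zero outputs."""
    length, d = z_prev.shape
    out = np.zeros((length - k + 1, W.shape[0]))
    for i in range(length - k + 1):
        window = z_prev[i:i + k]
        if np.any(window):                       # g(v) = 0 iff the input is all zeros
            out[i] = np.maximum(0.0, W @ window.reshape(-1) + b)
    return out
```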

SLIDES 9-10

Some Analysis on the Convolutional Architecture

The convolutional unit + max-pooling: a compositional operator with a local selection mechanism, as in the recursive autoencoder.

Compared to Recursive Models:

  • does not take a single path of word/phrase composition (determined by a separate gating function, an external parser, or just natural sequential order)
  • takes multiple choices of composition via a large feature map, and leaves it to the pooling afterwards to pick the more appropriate segments for each composition
  • limitation of the convolutional architecture: a fixed depth, bounding the level of composition it can do

Relation to “Shallow” Convolutional Models:

  • SENNA-type architecture: a convolution layer (local) and a max-pooling layer (global) → loses the sentence-level sequential order
  • this model is a superset of SENNA-type architectures
SLIDES 11-12

Architecture-I (ARC-I)¹

Convolutional Matching Models

  • finding the representation of each sentence
  • comparing the representations of the two sentences with a multi-layer perceptron (MLP)

The drawback of ARC-I:

  • it defers the interaction between the two sentences until their individual representations mature → it runs the risk of losing details important for the matching task while representing the sentences
  • the representation of each sentence is formed without knowledge of the other, and this cannot be adequately circumvented in the backward phase (learning)

¹ the Siamese architecture: A. Bordes et al. A semantic matching energy function for learning with multi-relational data. Machine Learning, 94(2):233–259, 2014.
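Putting the pieces together, a rough sketch of the ARC-I scoring path, reusing gated_conv_unit and max_pool_2unit from the sketches above; all parameter names and shapes here are assumptions for illustration:

```python
import numpy as np

def arc_i_score(sent_x, sent_y, conv_params, mlp_W1, mlp_b1, mlp_w2, k=3):
    """ARC-I (sketch): encode each sentence separately with the convolutional
    sentence model, then score the concatenated pair with an MLP."""
    def encode(sent):
        z = sent                                  # (max_length, d), zero-padded
        for W, b in conv_params:                  # alternate convolution + pooling
            z = gated_conv_unit(z, W, b, k)
            z = max_pool_2unit(z)
        return z.reshape(-1)                      # fixed-size sentence vector
    pair = np.concatenate([encode(sent_x), encode(sent_y)])
    hidden = np.tanh(mlp_W1 @ pair + mlp_b1)
    return float(mlp_w2 @ hidden)                 # matching score s(x, y)
```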

SLIDE 13

Architecture-II (ARC-II)

Convolutional Matching Models

ARC-II: built directly on the interaction space between two sentences

  • letting the two sentences meet before their own high-level representations mature
  • still retaining the space for the individual development of abstraction of each sentence

Layer-1: “one-dimensional” (1D) convolutions; for segment i on SX and segment j on SY, the unit sees the concatenation of the two segments
Layer-2: a 2D max-pooling in non-overlapping 2 × 2 windows
Layer-3: a 2D convolution on k3 × k3 windows of the output from Layer-2

A minimal sketch of Layer-1 and Layer-2 follows.
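A minimal numpy sketch of ARC-II’s first two layers; names and shapes are our own assumptions:

```python
import numpy as np

def arc_ii_layer1(sent_x, sent_y, W, b, k=3):
    """ARC-II Layer-1 (sketch): a '1D' convolution on the interaction space.
    For each pair (i, j), the unit sees segment i of S_X concatenated with
    segment j of S_Y and emits one vector of feature-map activations."""
    nx, d = sent_x.shape
    ny, _ = sent_y.shape
    out = np.zeros((nx - k + 1, ny - k + 1, W.shape[0]))
    for i in range(nx - k + 1):
        for j in range(ny - k + 1):
            window = np.concatenate([sent_x[i:i + k].reshape(-1),
                                     sent_y[j:j + k].reshape(-1)])
            out[i, j] = np.maximum(0.0, W @ window + b)
    return out                                   # 2D map fed to 2 × 2 max-pooling

def max_pool_2x2(z):
    """ARC-II Layer-2 (sketch): 2D max-pooling in non-overlapping 2 × 2 windows."""
    nx, ny = z.shape[0] // 2 * 2, z.shape[1] // 2 * 2
    return z[:nx, :ny].reshape(nx // 2, 2, ny // 2, 2, -1).max(axis=(1, 3))
```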

SLIDE 14

Some Analysis on ARC-II

Convolutional Matching Models

Order Preservation:

  • Both the convolution and the pooling operation in ARC-II have this order-preserving property.
  • Generally, z_{i,j}^(l) contains information about words in SX before those in z_{i+1,j}^(l), although the two may be generated with slightly different segments of SY, due to the 2D pooling.

Model Generality:

  • ARC-II actually subsumes ARC-I as a special case
SLIDE 15

Training

Objective: negative sampling + a large-margin ranking loss (a sketch follows the list)

  • Stochastic gradient descent with mini-batches (100 ∼ 200 in size)
  • Regularization:
      • early stopping: enough for models with medium size and large training sets (with over 500K instances)
      • early stopping + dropout: for small datasets (less than 10K training instances)
  • Initialized input: 50-dimensional word embeddings trained with Word2Vec
      • English: learnt on Wikipedia (∼ 1B words)
      • Chinese: learnt on Weibo data (∼ 300M words)
  • Convolution:
      • 3-word window throughout all experiments
      • various numbers of feature maps tested (typically from 200 to 500)
  • Architecture:
      • ARC-II for all tasks: 8 layers (3 convolutions + 3 poolings + 2 MLPs)
      • ARC-I: fewer layers (2 convolutions + 2 poolings + 2 MLPs) and more hidden nodes
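A sketch of the large-margin objective over sampled triples (x, y⁺, y⁻); the margin value of 1.0 is our assumption for illustration:

```python
def margin_ranking_loss(score_pos, score_neg, margin=1.0):
    """Large-margin objective with negative sampling (sketch): for a triple
    (x, y+, y-), require s(x, y+) to exceed s(x, y-) by at least the margin."""
    return max(0.0, margin + score_neg - score_pos)

# accumulated over a mini-batch of sampled triples, e.g.:
# loss = sum(margin_ranking_loss(s(x, yp), s(x, yn)) for x, yp, yn in batch)
```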

SLIDE 16

Tasks & Competitor Methods

Three tasks:

  • Matching language objects of heterogeneous natures
      • I. Sentence Completion
      • II. Tweet-Response Matching
  • Matching homogeneous objects
      • III. Paraphrase Identification

Competitor Methods:

  • WordEmbed: represent each short text as the sum of the embeddings of the words it contains, and match two documents by an MLP
  • DeepMatch²: 3 hidden layers and 1,000 hidden nodes in the first hidden layer
  • uRAE+MLP³: unfolding RAE, each sentence represented as a 100-dimensional vector
  • SENNA+MLP/sim: the SENNA-type sentence model
  • SenMLP: take the whole sentence as input and use an MLP to obtain the score of coherence

² Z. Lu and H. Li. A deep architecture for matching short texts. In Advances in NIPS, 2013.
³ R. Socher, E. H. Huang, and A. Y. Ng. Dynamic pooling and unfolding recursive autoencoders for paraphrase detection. In Advances in NIPS, 2011.

SLIDE 17

Experiment I: Sentence Completion

  • take a sentence from Reuters with two “balanced” clauses (with 8 ∼ 28 words) divided by one comma
  • use the first clause as SX and the second as SY
  • TASK: to recover the original second clause for any given first clause
  • to make the task harder: use negative second clauses similar to the original ones, both in training and testing
  • training: 3 million triples (from 600K positive pairs)
  • test: 50K positive pairs × 4 negatives

An example:
SX:  Although the state has only four votes in the Electoral College,
S+Y: its loss would be a symbolic blow to Republican presidential candidate Bob Dole.
S−Y: but it failed to garner enough votes to override an expected veto by President Clinton.

SLIDE 18

Experiment II: Matching A Response to A Tweet

  • training: 4.5 million original (tweet, response) pairs collected from Weibo
  • training: each positive pair × 10 random responses as negative examples → 45 million triples
  • the writing style is obviously more free and informal (than in Experiment I)
  • test: 300K original (tweet, response) pairs × 4 random negatives

An example from Weibo (translated to English):
SX:  Damn, I have to work overtime this weekend!
S+Y: Try to have some rest, buddy.
S−Y: It is hard to find a job, better start polishing your resume.

SLIDES 19-20

Experiment III: Paraphrase Identification

  • MSRP dataset: training/test – 4,076/1,725 pairs
  • the state of the art: Acc./F1 – 76.8%/83.6%

Discussions:

  • ARC-II outperforms the others significantly when training instances are relatively abundant
  • convolutional models (ARC-I & II, SENNA+MLP) perform favorably over bag-of-words models
  • a simple sum of embeddings learned via Word2Vec yields reasonably good results on all three tasks

SLIDE 21

Summary

A successful sentence-matching algorithm needs to capture not only the internal structures of sentences but also the rich patterns in their interactions.

Convolutional Sentence Model:

  • convolution layers (with a gate on convolutional units) + max-pooling layers

Convolutional Matching Models:

  • ARC-I: separate convolutional models for the two sentences + MLP
  • ARC-II: convolutional models on an interaction matrix + MLP

Three Tasks:

  • Matching language objects of heterogeneous natures
      • I. Sentence Completion: Reuters
      • II. Tweet-Response Matching: Weibo
  • Matching homogeneous objects
      • III. Paraphrase Identification: MSRP dataset
SLIDE 22

Outline

  • Convolutional Neural Network Architectures for Matching Natural Language Sentences. NIPS’14
  • Deep Recursive Neural Networks for Compositionality in Language. NIPS’14
      • Deep Recursive Neural Networks
      • Experiments

SLIDE 23

Authors

  • Deep Recursive Neural Networks for Compositionality in Language
  • NIPS’14
  • Ozan Irsoy and Claire Cardie

Department of Computer Science, Cornell University

SLIDES 24-25

Introduction

Recursive neural networks:

  • Given the structural representation of a sentence, e.g. a parse tree, they recursively generate parent representations in a bottom-up fashion, combining tokens to produce representations for phrases and eventually the whole sentence.
  • A recursive neural network can be seen as a generalization of the recurrent neural network, which has a specific type of skewed tree structure (depth in time vs. depth in space).
  • Deep recurrent networks are constructed by stacking multiple recurrent layers on top of each other.

→ deep recursive neural network: stacking multiple recursive layers

  • A layer can learn some parts of the composition to apply, and pass this intermediate representation to the next layer for further processing of the remaining parts of the overall composition.

SLIDE 26

Recursive Neural Networks

  • Given a positional directed acyclic graph, it visits the nodes in topological order, and recursively applies transformations to generate further representations from previously computed representations of children.
  • Given a binary tree structure with leaves holding the initial representations, each parent is computed as

        h_η = f(W_L h_{l(η)} + W_R h_{r(η)} + b)

    where l(η) and r(η) are the left and right children of node η (a minimal sketch follows).

The aforementioned definition treats the leaf nodes and the internal nodes the same
→ Untying Leaves and Internals: distinguish between a leaf and an internal node
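A minimal sketch of this bottom-up composition over a binary tree; the tree representation and names are our own assumptions:

```python
import numpy as np

def recursive_rep(node, W_L, W_R, b):
    """Bottom-up composition over a binary tree (sketch). A node is either a
    leaf word vector (np.ndarray) or a (left, right) pair of subtrees."""
    if isinstance(node, np.ndarray):               # leaf: its initial representation
        return node
    h_l = recursive_rep(node[0], W_L, W_R, b)
    h_r = recursive_rep(node[1], W_L, W_R, b)
    return np.tanh(W_L @ h_l + W_R @ h_r + b)      # h_η = f(W_L h_l + W_R h_r + b)
```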

SLIDE 27

Untying Leaves and Internals

  • A simple parametrization of the weights W with respect to whether the incoming edge emanates from a leaf or an internal node:

        h_η = x_η ∈ X if η is a leaf, and h_η ∈ H otherwise
        W_η = W^{xh} if η is a leaf, and W_η = W^{hh} otherwise

    where X and H are the vector spaces of words and phrases, respectively.

  • With this untying, a recursive network becomes a generalization of the Elman-type recurrent neural network, with h analogous to the hidden layer of the recurrent network (memory) and x analogous to the input layer. A sketch of the untied composition follows.
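A minimal sketch of the untied composition, with the same tree representation as the previous sketch (parameter names are our own assumptions):

```python
import numpy as np

def untied_recursive_rep(node, W_xh_L, W_xh_R, W_hh_L, W_hh_R, b):
    """Untied recursive composition (sketch): a child enters through W^{xh}
    when it is a leaf (a word vector in X) and through W^{hh} when it is an
    internal node (a phrase vector in H). Call this on an internal node."""
    def side(child, W_xh, W_hh):
        if isinstance(child, np.ndarray):          # leaf: word-space weights
            return W_xh @ child
        return W_hh @ untied_recursive_rep(child, W_xh_L, W_xh_R, W_hh_L, W_hh_R, b)
    left, right = node
    return np.tanh(side(left, W_xh_L, W_hh_L) + side(right, W_xh_R, W_hh_R) + b)
```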

SLIDE 28

Deep Recursive Neural Networks

  • Recursive neural networks are deep in structure: as deep as the depth of the tree → but this notion of depth is unlikely to involve a hierarchical interpretation of the data.
  • In the more conventional stacked deep learners, an important benefit of depth is the hierarchy among hidden representations: every hidden layer conceptually lies in a different representation space, and is potentially a more abstract representation of the input than the previous layer.

SLIDE 29

Deep Recursive Neural Networks (cont.)

  • Stacking multiple layers of individual recursive nets:

        h_η^(i) = f(W_L^(i) h_{l(η)}^(i) + W_R^(i) h_{r(η)}^(i) + V^(i) h_η^(i−1) + b^(i))

    i: the index of the stacked layer
    V^(i): the weight matrix connecting hidden layer-(i − 1) to layer-i

  • Connecting the output layer to only the final hidden layer:

        y_η = g(U h_η^(l) + c)

    l: the total number of layers

A minimal sketch of the stacked composition follows.
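A minimal sketch of the stacked recursion; how the leaves enter the deeper layers is our own simplifying assumption (the equation above only constrains internal nodes), and all names and dimensions are for illustration:

```python
import numpy as np

def deep_recursive_rep(node, layers):
    """Deep recursive network (sketch): layer i composes its children's
    layer-i vectors and also receives this node's layer-(i-1) vector
    through V^(i). `layers` is a list of dicts with keys W_L, W_R, V, b."""
    if isinstance(node, np.ndarray):
        # Simplifying assumption: a leaf contributes its word vector at every layer.
        return [node for _ in layers]
    h_left = deep_recursive_rep(node[0], layers)
    h_right = deep_recursive_rep(node[1], layers)
    hs, below = [], None
    for i, p in enumerate(layers):
        pre = p["W_L"] @ h_left[i] + p["W_R"] @ h_right[i] + p["b"]
        if below is not None:
            pre = pre + p["V"] @ below             # V^(i) h_η^(i−1)
        below = np.tanh(pre)
        hs.append(below)
    return hs                                      # an output layer would read hs[-1]
```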

SLIDE 30

Experiments

  • Dataset: Stanford Sentiment Treebank (SST)
      • sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences
      • average sentence length: 19.1 tokens
      • classification: 5-class (−−/−/0/+/++), 2-class (pos/neg)
  • Competitor Methods:
      • BiNB: a naive Bayes classifier that operates on bigram counts
      • shallow RNN: learns vectors via a binary parse tree
      • MV-RNN: every word is assigned a matrix-vector pair instead of a vector
      • RNTN: the recursive neural tensor network, in which the composition is defined as a bilinear tensor product
      • Paragraph Vectors: learns representations for larger pieces of text using models similar to Word2Vec
      • DCNN: the dynamic convolutional neural network

SLIDE 31

Experiments (cont.)

  • Word Vectors:
      • fixed in all the experiments, without fine-tuning
      • 300-dimensional word vectors trained by Word2Vec on part of the Google News dataset (∼ 100B words)
  • Regularizer: dropout
  • Training:
      • stochastic gradient descent with a fixed learning rate (.01)
      • a diagonal variant of AdaGrad for parameter updates
      • weights updated after minibatches of 20 sentences
      • 200 epochs for training
      • keep the overall number of parameters constant: deeper → narrower
      • no pre-training step
SLIDE 32

Experiment Results

Quantitative Evaluation

  • Results on RNNs of various depths and sizes show that deep RNNs outperform single-layer RNNs with approximately the same number of parameters.
  • The 2-layer RNN for the smaller networks and the 4-layer RNN for the larger networks give the best performance with respect to the fine-grained score.
  • The best deep RNN outperforms previous work on both the fine-grained and binary prediction tasks, and outperforms Paragraph Vectors on the fine-grained score.

SLIDE 33

Experiment Results

Input Perturbation

Investigate the response of all layers to a perturbation in the input:

  • pick a word from the sentence that carries positive sentiment
  • alter it to a set of words whose sentiment values shift towards the negative direction
  • all other leaves keep the same representations: a node is completely determined by its subtree
  • for each node, the response is measured as the change of its hidden representation in one-norm, at each of the three layers in the network, with respect to the hidden representations computed with the original word

A minimal sketch of this probe follows.
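A minimal sketch of the probe, reusing deep_recursive_rep from the earlier sketch; unlike the paper's per-node measurement, this version reads only the root, as a simplification:

```python
import numpy as np

def perturbation_response(tree_original, tree_perturbed, layers):
    """One-norm change of the root's hidden representation at every layer
    after swapping a sentiment-carrying word (simplified to the root node)."""
    h_orig = deep_recursive_rep(tree_original, layers)
    h_pert = deep_recursive_rep(tree_perturbed, layers)
    return [float(np.abs(a - b).sum()) for a, b in zip(h_orig, h_pert)]
```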

SLIDE 34

Experiment Results

Nearest Neighbor Phrases

  • for a three-layer deep recursive neural network: compute hidden representations for all phrases in the data
  • for a given phrase: find its nearest neighbor phrases at each layer, with the one-norm distance measure
  • at the 1st layer: similarity is dominated by one of the composed words
      • this effect is so strong that it even discards the negation in the second case
  • at the 2nd layer: a semantically more diverse set of phrases
      • this layer seems to take syntactic similarity more into account
  • at the 3rd and final layer: a higher level of semantic similarity; phrases are mostly related to one another in terms of sentiment

SLIDE 35

Summary

Deep recursive neural network:

  • stacking multiple recursive layers on top of each other
  • using binary parse trees as the structure
  • task: fine-grained sentiment classification
  • competitor methods: shallow RNN, MV-RNN, Tensor RNN (RNTN), DCNN, Paragraph Vectors
  • investigating the models qualitatively by performing input perturbation
  • examining nearest-neighbor phrases of given examples