A Study In Hebrew Paraphrase Identification
Thesis Presentation
Submitted by Gabriel Stanovsky
Advised by Prof. Michael Elhadad
Outline
1. Definitions and Motivation: What is a Paraphrase? Linguistic Background
2. Previous Work: Overview, Deep Learning Method, Recursive Auto Encoding, State of the Art English Paraphrase Identification
3. Contribution of this Work: Algorithms Developed, Generated Resources, Results
Textual Entailment
Text fragment A textually entails text fragment B if a human who accepts A, in all its parts, must consequently infer that B is also true.
Example
A - [player name], who joined Panathinaikos in 2000, won the European championship with the team that same year.
B - Panathinaikos won the European championship in 2000.
Paraphrase
Text fragments (A, B) are said to be in a paraphrase relationship if A entails B and vice versa. Paraphrase identification is the task of determining whether two given texts stand in a relation of paraphrasing.
Simple Example
A - Stav Shaffir was very lightly injured at a demonstration in front of the Finance Minister's house.
B - Stav Shaffir was very lightly hurt at a demonstration near the Finance Minister's house.
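The relation between entailment and paraphrase can be stated directly; in the sketch below, `entails` is a hypothetical predicate standing in for a full textual entailment system and is not part of this work.

def is_paraphrase(entails, text_a, text_b):
    """Two texts are paraphrases iff each textually entails the other.
    `entails` is a hypothetical callable (A, B) -> bool supplied by an
    entailment system; it is assumed here only for illustration."""
    return entails(text_a, text_b) and entails(text_b, text_a)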
Paraphrase?
The first example was a very simple one - what about the following pairs?
Paraphrase?
A - China reached an oil trade agreement with Russia.
B - The most populous country in the world signed an oil trade agreement with Russia.
Paraphrase?
Paraphrase?
A - The USA closed its embassy in Syria.
B - The USA recalled its ambassador from Syria.
Paraphrase?
Paraphrase?
A - The network covered the latest events in Syria.
B - The Al Jazeera network reported on the recent developments in Syria.
Paraphrase?
Paraphrase?
A - Benjamin Netanyahu met with his Russian counterpart and afterwards returned to Israel.
B - Benjamin Netanyahu met with his Russian counterpart and raised before him the issue of the new immigrants.
We are in need of rigorous definitions! These were produced for Hebrew in the course of this work, following similar English definitions.
Why Paraphrase?
1. Automatic Summarization: while scanning a document, paraphrases found in the text body can be detected and omitted in order to produce a shorter version of the document.
Why Paraphrase?
2. Automatic Construction of a Thesaurus: identifying paraphrases in freely occurring text, in conjunction with knowledge of sentence structure, can be used to yield a bank of Hebrew words which are, with high probability, synonyms.
Why Paraphrase?
3. Automatic Filtering of a News Stream: paraphrase identification can be applied to parallel news streams to detect the first occurrence of a news item (a task recently known as "first story detection").
Why Paraphrase?
4. In addition, it is a challenging task: automating a process which humans carry out naturally and with no apparent effort.
What’s Interesting in Hebrew Paraphrasing?
1. Word Agglutination: function words (prepositions, conjunctions, and articles) in Hebrew can be agglutinated onto other words, giving speakers more ways to articulate the same meaning.
Example
A - He sat in the pleasant shade of the trees.
B - He rested under the calming shade in the grove.
What’s Interesting in Hebrew Paraphrasing?
2. Syntactic Variations Exploiting Free Word Order in Hebrew: sentences in Hebrew may be expressed with different word orderings, as a tool to emphasize different notions within the same utterance.
Example
A - I made the cake.
B - The cake, I made. (the same words, with the object fronted)
What’s Interesting in Hebrew Paraphrasing?
3. Lexical Replacement: replacing a Hebrew word with a transliterated word derived from another language, possibly changing its part of speech.
Example
A - The car was completely destroyed in an accident.
B - The car was a "total loss" following the accident.
Parsing
Parsing (also referred to as syntax analysis) is the process that maps an input sentence to the more abstract representation of a syntactic tree. This tree represents the relations among the words of the sentence. The parse tree of a sentence is not naturally embedded in the text itself; language-specific information (such as the language's grammar and knowledge of specific relations between words) is often needed.
Parsing Conventions
Parsing is commonly a basic component of NLP systems, and it will play a prominent role in the systems described henceforth. Several conventions exist for the construction of parse trees; the dominant ones are constituency parsing and dependency parsing. We use pre-trained Hebrew parsers for both of these conventions.
Phrase Structure Grammar
Phrase structure grammar (constituent grammar) was originally defined by Chomsky (1956) as part of the generative school. A phrase structure grammar is formally defined as a 4-tuple G = (N, T, S, P):
1. N ∩ T = ∅, where N is the non-terminal set and T is the terminal set
2. S ∈ N, S being the start symbol
3. P = {(u, v) : u, v ∈ (N ∪ T)*}, where P is finite and is called the set of production rules
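As a concrete instance of this definition, here is a toy phrase-structure grammar written as plain Python data; the symbols and rules are invented for the example and are not the grammar used in this work.

# A toy phrase-structure grammar G = (N, T, S, P); illustrative only.
N = {"S", "NP", "VP", "NN", "VB"}      # non-terminal symbols
T = {"dog", "cat", "saw"}              # terminal symbols, disjoint from N
S = "S"                                # start symbol, a member of N
P = [                                  # finite set of production rules (u, v)
    ("S", ("NP", "VP")),
    ("VP", ("VB", "NP")),
    ("NP", ("NN",)),
    ("NN", ("dog",)),
    ("NN", ("cat",)),
    ("VB", ("saw",)),
]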
Phrase Structure Grammar
According to these grammars, the leaves (terminals) of the parse tree are the words of the original sentence, appearing in their original order. The rules by which an input sentence is parsed into a parse tree are thus a model of a human language.
Phrase Structure Example
[Constituency parse tree of a short Hebrew sentence; node labels include NP, NN, CC, MOD, FRAG, and yyDOT.]
Dependency Grammar
Dependency grammar dates back to the work of Tesnière (1959). Dependency parsing views the syntactic analysis of a sentence as consisting of binary asymmetric relations between words. According to this linguistic theory, the speaker of a language analyzes syntax by perceiving connections between words; the dependency relation aims at modeling this connection.
Dependency Rules
Various definitions exist for determining when two words appear in a dependency relation. The following are a few of these criteria (where H marks the head and D marks the dependent):
H is obligatory, while D may be optional. The form of D depends on H. The linear position of D is specified with reference to H.
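For illustration, a dependency analysis can be stored as a list of (head, dependent, relation) triples; the English sentence and the relation labels below are invented for the example and do not come from the Hebrew parser used in this work.

# Dependency parse of "the committee approved the law" as
# (head, dependent, relation) triples; labels are illustrative.
dependencies = [
    ("approved", "committee", "subject"),
    ("approved", "law", "object"),
    ("committee", "the", "determiner"),
    ("law", "the", "determiner"),
]
# Every word except the root ("approved") depends on exactly one head.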
Dependency Grammar Example
Parsing as a Step Towards Detecting Paraphrases
Parsing seems a necessary step in assessing whether two sentences are syntactic variants. With parsing, one can align paraphrase candidates so that each part of the sentence can be further analyzed in terms of lexical similarity. This is exemplified in the next slide.
Dependency Transliteration Paraphrasing Example
Overview
Recent years have seen great research interest in paraphrase-related tasks, mainly in the English language. We survey the efforts on which this work builds, beginning with the Hebrew research carried out in this field.
Hebrew Research
In comparison to the vast research efforts invested in English paraphrasing, very little work has been done in the field of Hebrew paraphrase. Ordan (2007) developed a medium-scale WordNet for Hebrew: English and Hebrew expressions were aligned, and relations were inferred from the available English WordNet onto the newly created Hebrew WordNet.
Foreign Languages Research
The research in this field can be divided into three main categories. We mention some of the interesting projects in each category:
1. Generation of paraphrases: the Microsoft NLP team (2004) created a statistical machine translation based technique for generating paraphrases of a given sentence.
2. Extraction of paraphrases from large texts: Hashimoto (2011) created a system which scans large unannotated texts looking for "definition sentences", deeming definitions of the same term to be paraphrases.
Foreign Languages Research
3. Identification of paraphrases: Socher et al. (2011) created a paraphrase identification algorithm which is considered the current state of the art. We elaborate on the components of this algorithm in the following sections.
Introduction to Deep Learning
Deep Learning is a recent approach which aims at modeling the human perception of complicated notions at several levels of representation. It is implemented using several neural networks connected in such a way that one network's output is transferred to another's input. Backpropagation is carried across the networks, starting from the topmost network down to the bottom one.
Deep Learning Illustration
Multi Task Learning
As can be seen in the prior illustration, a word representation (also known as word embeddings) is trained during the process, and it serves several NLP tasks. Collobert & Weston (2011) trained such a system on four NLP tasks (POS tagging, semantic role labeling, chunking, and named entity recognition) and achieved near state-of-the-art results on all of them.
Word Embeddings
As a by-product of this process, the word embeddings were published for English dictionaries. Turian (2010) showed that they enhance the performance of systems which otherwise treat words simply as indices into a finite dictionary.
The Problem with the Connectionist Approach
A major drawback of the neural network model is the constant size of the network in all its parts. This constant size of inputs and inner representations (known as the connectionist approach) does not fit the nature of most AI tasks. In the field of this work, it is easy to observe that paraphrases need not be of the same length, although by definition they convey the same information.
RAAM
This property seems to limit, or even preclude, the use of neural networks on instances of arbitrary length. To attack this problem, several connectionist systems were devised to cope with unbounded input size. One of these is the Recursive Auto-Associative Memory (RAAM), due to Pollack (1990). RAAM was later extended and became known as auto encoders.
Auto Encoders
The system Pollack devised was composed of two concatenated neural networks:
Encoder: a neural network composed of an input layer of 2K elements fully connected to an output layer of K elements.
Decoder: a neural network composed of an input layer of K elements fully connected to an output layer of 2K elements.
These two networks are trained concurrently as a single network: a network with 2K input elements, K hidden units in one hidden layer, and 2K elements in the output layer.
Auto Encoders Illustration
Auto Encoders
During training on an input element, the element itself is presented as the network's desired output, thus training the network to reproduce the input it received after passing it through the hidden layer. The hidden layer (recall that this is actually the output layer of the encoder) now holds a compact representation of the input.
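A minimal numpy sketch of such a 2K-K-2K auto encoder; the dimension K, the random initialization, and the tanh activation are assumptions made for the illustration, not the exact configuration used by Pollack or in this thesis.

import numpy as np

K = 50                                     # size of one element's representation
W_enc = np.random.randn(K, 2 * K) * 0.01   # encoder: 2K inputs -> K hidden units
b_enc = np.zeros(K)
W_dec = np.random.randn(2 * K, K) * 0.01   # decoder: K hidden units -> 2K outputs
b_dec = np.zeros(2 * K)

def encode(left, right):
    """Compress two K-dimensional children into one K-dimensional parent."""
    return np.tanh(W_enc @ np.concatenate([left, right]) + b_enc)

def decode(parent):
    """Reconstruct the two children from the parent representation."""
    out = np.tanh(W_dec @ parent + b_dec)
    return out[:K], out[K:]

def reconstruction_error(left, right):
    """Training objective: the input pair also serves as the desired output."""
    left_hat, right_hat = decode(encode(left, right))
    target = np.concatenate([left, right])
    prediction = np.concatenate([left_hat, right_hat])
    return float(np.sum((target - prediction) ** 2))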
How to Use Auto Encoders?
The input is assumed to be a variable-length list of fixed-size elements (a character string, for example); mark each element's representation size as K. Go over the input instance of size K · n and "encode" every pair of consecutive input elements into one compressed representation, thus obtaining a second level of representation of size K · n/2. This process is repeated until the last level contains one element of fixed size which represents the entire variable-size input in a fixed-size representation.
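A sketch of this recursive reduction; `encode` is any pairwise encoder such as the one sketched above, and carrying an unpaired last element to the next level is an assumed convention, not necessarily the one used in this work.

def encode_sequence(elements, encode):
    """Repeatedly encode adjacent pairs of K-dimensional vectors until a
    single fixed-size vector remains, representing the whole input."""
    level = list(elements)
    while len(level) > 1:
        next_level = [encode(level[i], level[i + 1])
                      for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:        # carry an unpaired last element upward
            next_level.append(level[-1])
        level = next_level
    return level[0]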
Encoding of a Parse Tree
Framework Overview
Socher et al. (2011) devised a framework for identifying paraphrases:
1. Replace each of the words in the two texts with its word embedding.
2. Autoencode both texts to obtain embeddings of the sentences.
3. On the resulting representations, apply a method they call "dynamic pooling", which should discriminate between paraphrases and non-paraphrases.
Socher’s Results
They tested against the English reference corpus, the Microsoft Research Paraphrase Corpus (MSRP). Results are given in the following table, along with other systems' performance for comparison:

Model                     ACC     F1
Wan et al. (2006)         75.6    83.0
Das and Smith (2009)      76.1    82.7
Socher et al. (2011)      76.8    83.6
Paraphrase Identification System
The main body of this work aims at constructing a Hebrew paraphrase identification system. The path chosen begins similarly to the English state of the art: auto encoding over feature embeddings of words is exploited in order to create compact representations of the embedded parse trees. The use of these autoencoded parse trees, however, differs in this work: a new concept was defined - Tree Matching.
Framework Overview
Embedding Computation
Each word w ∈ D (the dictionary) is mapped onto a vector r_w ∈ R^d. The structure of this vector space should reflect word similarity, so that if two words are "similar" (along multiple linguistic dimensions: meaning, spelling, morphology, part of speech, etc.), their vector encodings will be "close" in R^d. These embeddings were computed for a Hebrew dictionary of 5K words, by training a language model in a deep learning architecture.
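The notion of "closeness" in R^d is commonly measured with cosine similarity; the sketch below uses toy vectors invented for the example, not the actual Hebrew embeddings computed in this work.

import numpy as np

def cosine_similarity(u, v):
    """Closeness of two word embeddings in R^d."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 4-dimensional embeddings; the real vectors are d-dimensional.
embeddings = {
    "house":    np.array([0.8, 0.1, 0.3, 0.0]),
    "building": np.array([0.7, 0.2, 0.4, 0.1]),
    "ran":      np.array([0.0, 0.9, 0.1, 0.6]),
}
# cosine_similarity of "house" and "building" is high; "house" and "ran" is lower.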
Language Model
The approach taken for defining a language model is that it should separate valid occurrences of text from invalid ones. Thus, a large segmented corpus of 131M tokens was sampled for overlapping 5-grams to obtain valid examples, and invalid (or corrupt) 5-grams were obtained by replacing one word in a valid n-gram. The error backpropagated was max(0, 1 − f(s) + f(s_w)), where s is a valid 5-gram, s_w is its corrupted version, and w is a word chosen uniformly at random from the dictionary.
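A sketch of this ranking objective; the scorer f is a placeholder for the trained network, and replacing the middle word of the 5-gram is an assumption made for the illustration (the slide only states that one word is replaced).

import random

def corrupt(ngram, dictionary, position=2):
    """Produce a corrupt 5-gram s_w by replacing one word (here the middle
    one) with a word drawn uniformly at random from the dictionary."""
    corrupted = list(ngram)
    corrupted[position] = random.choice(dictionary)
    return corrupted

def ranking_loss(f, valid_ngram, corrupt_ngram):
    """Hinge loss max(0, 1 - f(s) + f(s_w)): zero once the valid 5-gram
    outscores the corrupted one by a margin of at least 1."""
    return max(0.0, 1.0 - f(valid_ngram) + f(corrupt_ngram))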
Language Model Overview
Auto Encoding
As described in the previous slides, an autoencoder was developed and trained, both on dependency and on constituency parse trees. The 131M-token corpus was sampled for 150K sentences, which were used for training.
Binarization
As can be seen, auto encoding needs the tree to have a fixed number of child nodes; this is not the case in parse trees. Although binarization of constituency parse trees is quite common, no such algorithm was found for dependency trees, so an algorithm for dependency parse tree binarization was developed.
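To make the notion concrete, here is one simple way to binarize an n-ary tree by folding children into a chain of auxiliary nodes; this is only an illustration and is not the binarization algorithm developed in this thesis.

class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

def binarize(node):
    """Return an equivalent tree in which every node has at most two
    children, introducing auxiliary nodes (marked with '*') as needed."""
    children = [binarize(child) for child in node.children]
    while len(children) > 2:
        merged = Node(node.label + "*", children[-2:])  # auxiliary node
        children = children[:-2] + [merged]
    return Node(node.label, children)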
Paraphrase Pair After Binarization
Tree Matching
Definition: consider two binary trees t1, t2 (corresponding to sentences s1, s2 and obtained from them by auto encoding). Define a "Tree Match" M to be any set of tuples (n1, n2), where n1, n2 are nodes of t1, t2 respectively, such that for every word w in s1 (s2), M contains exactly one tuple containing a node on the path from w to the root of t1 (t2).
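The "exactly one covering pair per word" condition can be checked directly once the path from each word to the root is precomputed; the data layout below (node ids and per-word path sets) is an assumption made for the illustration.

def is_valid_match(match, paths_1, paths_2):
    """match: a set of (n1, n2) node-id pairs.
    paths_1[w]: set of node ids on the path from word w to the root of t1
    (likewise paths_2 for t2). The match is valid iff every word of each
    sentence is covered by exactly one pair of the match."""
    for paths, side in ((paths_1, 0), (paths_2, 1)):
        for word, path_nodes in paths.items():
            covering = [pair for pair in match if pair[side] in path_nodes]
            if len(covering) != 1:
                return False
    return True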
Motivation for Tree Matching
This definition captures the idea that a paraphrase pair consists of sentences whose parts are interchangeable, from the sentence level down to the word level (including word reordering).
Tree Match Score
Following this definition, the score of a match can be defined:
Definition
S(M) = Σ_{(n1, n2) ∈ M} ||n1 − n2||² · (number of leaves spanned by n1 and n2)
A Minimal Match for a sentence pair (s1, s2) is the match which gives the minimal score for these sentences.
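Assuming each node carries its autoencoded vector and the number of leaves it spans, the score is a direct sum; treating the distance as squared Euclidean and taking the spanned-leaves term as the total over both nodes are assumptions about the exact convention.

import numpy as np

def match_score(match, vec, leaf_count):
    """S(M): sum over (n1, n2) in M of the squared distance between the
    nodes' autoencoded vectors, weighted by the number of leaves they span.
    vec[n] is a node's vector, leaf_count[n] the number of leaves under it."""
    return sum(
        float(np.sum((vec[n1] - vec[n2]) ** 2)) * (leaf_count[n1] + leaf_count[n2])
        for n1, n2 in match
    )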
Training
During training, a simple classifier can learn the threshold below which minimal matches represent pairs which are paraphrases. After training, the classifier yields not only a binary decision but also a meaningful matching between the pair, "explaining" why they are a paraphrase.
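A sketch of such a threshold classifier; the function names and the accuracy-maximizing selection rule are assumptions standing in for whatever simple classifier was actually used in this work.

def choose_threshold(scored_pairs):
    """scored_pairs: (minimal_match_score, is_paraphrase) tuples from a
    training set. Pick the score threshold that maximizes training accuracy."""
    best_threshold, best_correct = 0.0, -1
    for threshold, _ in scored_pairs:
        correct = sum((score <= threshold) == label
                      for score, label in scored_pairs)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold

def classify(minimal_match_score, threshold):
    """Pairs whose minimal match score falls below the threshold are
    classified as paraphrases."""
    return minimal_match_score <= threshold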
Tree Matching Example
Tree Matching is NP Complete
It can be shown that Tree Matching, as defined above (in its decision form: does a match with score below a given bound exist?), is an NP-complete problem, by showing:
Positive SubsetSum ≤p Tree Matching
Paraphrase Corpus for Algorithm Evaluation
In order to test the framework, a Hebrew paraphrase corpus had to be collected. An algorithm was developed to acquire news articles from leading news sites and align them based on their publication time and syntactic similarity. A very large unannotated corpus (about 1.4M headlines) of possible paraphrase pairs was collected.
Generated Resources
As a by-product of developing the framework, several resources were compiled which can be reused in future research:
Annotated Paraphrase Corpus: 1K of the possible pairs were tagged by human judges to obtain an annotated reference corpus for future research comparison.
Word Embeddings: an embedding dictionary of 5K common Hebrew words was computed, and shown to be useful as a plug-in enhancer for supervised NLP tasks.
Embeddings as plugin enhancer
The produced embeddings show improvement when added to a CRF POS tagger (values are ACC / F1):

Corpus   Without embeddings   With embeddings
TB1      0.879 / 0.735        0.900 / 0.804
A7       0.910 / 0.701        0.940 / 0.821
All      0.866 / 0.662        0.880 / 0.723
Paraphrasing Framework Evaluation
The proposed system was shown to achieve results comparable to the state-of-the-art results obtained for the English task (about 2% lower):

Parse Type     Performance (ACC / F1)
Dependency     74.38 / 80.35
Constituency   69.20 / 74.83
Interesting Results
A - The Air Force thwarted rocket fire toward Israel.
B - The Air Force thwarted the launching of rockets from the Gaza Strip.
Interesting Results
A - Russia will supply fighter jets and anti-aircraft systems to Syria.
B - Russia will sell fighter jets and air defense systems to Syria.
Interesting Results
A - The law was approved: a bone marrow registry will be established in Israel.
B - The Knesset approved: a bone marrow registry will be established in Israel.