A Study In Hebrew Paraphrase Identification - Thesis Presentation (PowerPoint PPT Presentation)



SLIDE 1

Definitions and Motivation Previous Work Contribution of this Work

A Study In Hebrew Paraphrase Identification

Thesis Presentation

Submitted by Gabriel Stanovsky Advised by Prof. Michael Elhadad

SLIDE 2

Outline

1

Definitions and Motivation What is a Paraphrase? Linguistic Background

2

Previous Work Overview Deep Learning Method Recursive Auto Encoding State of the Art English Paraphrasing Identification

3

Contribution of this Work Algorithms Developed Generated Resources Results

SLIDE 3

Outline

1

Definitions and Motivation What is a Paraphrase? Linguistic Background

2

Previous Work Overview Deep Learning Method Recursive Auto Encoding State of the Art English Paraphrasing Identification

3

Contribution of this Work Algorithms Developed Generated Resources Results

SLIDE 4

Textual Entailment

Text fragment A is said to Textually Entail text fragment B if a human being who trusts A, in all its parts, would consequently have to infer that B is also true.

Example:
A - דחיהכזוסוקיינתאנפתצובקלייפלאתנשבעיגהשטקורוד הנשהתואבהפוריאתופילאבהמיע
B - ייפלאתנשבהפוריאתופילאבהתכזסוקיינתאנפתצובק

SLIDE 5

Paraphrase

Text fragments (A, B) are said to stand in a Paraphrase Relationship if A entails B and vice versa. Paraphrase Identification is the task of determining whether two given texts stand in a relation of paraphrasing.

Simple Example:
A - רצואהרשתיבלומהנגפהבדאמלקהעצפנריפשויתס
B - רצואהרשתיבדילהנגפהבדאמלקהעגפנריפשויתס

SLIDE 6

Paraphrase?

The first example was a very simple one. What about the following pair?

Paraphrase?
היסורעטפנרחסכסהלהעיגהיס
היסורעטפנרחסמכסהלעהמתחלועבתסלכואמההנידמה

SLIDE 7

Paraphrase?

The first example was a very simple one. What about the following pair?

Paraphrase?
הירוסבתורירגשההרגסבהרא
הירוסמהרירגשתאהריזחהבהרא

SLIDE 8

Paraphrase?

The first example was a very simple one. What about the following pair?

Paraphrase?
הירוסבינורחאהיעוראהתאהרקיסתשרה
הירוסבתונורחאהתויושחרתההלעהחווידהריזגלאתשר

SLIDE 9

Paraphrase?

The first example was a very simple one. What about the following pair?

Paraphrase?
ראלרזחכמרחאלויסורהותימעעשגפנוהינתנימינב
ישדחהילועהתייגוסתאוינפבייצויסורהוליבקמעדעונוהינתנימינב

We are in need of rigorous definitions! These were produced for Hebrew in the course of this work, following similar English definitions.

SLIDE 10

Why Paraphrase?

1

Automatic Summarization: while scanning a document, paraphrases found in the text body can be detected and omitted, producing a shorter version of the document.

SLIDE 11

Why Paraphrase?

2

Automatic Construction of a Thesaurus: identifying paraphrases in freely occurring text, in conjunction with knowledge of the sentence structure, can be used to yield a bank of Hebrew words which are, with high probability, synonyms.

SLIDE 12

Why Paraphrase?

3

Automatic Filtering of a News Stream: paraphrase identification can be applied to a parallel news stream to detect the first occurrence of a news item (a task known as "first story detection").

SLIDE 13

Why Paraphrase?

4

In addition, it is a challenging task: automating a process which humans carry out naturally and with no apparent effort.

SLIDE 14

What’s Interesting in Hebrew Paraphrasing?

1

Word Agglutination: function words (prepositions, conjunctions and articles) in Hebrew can be agglutinated onto other words, giving speakers more ways to articulate the same meaning.

Example:
יצעהלשיענהלצבבשיאוה
השרוחבעיגרמהלצהתחתחנאוה

SLIDE 15

What’s Interesting in Hebrew Paraphrasing?

2

Syntactic Variation Exploiting Free Word Order in Hebrew: sentences in Hebrew may be expressed in different word orderings, as a tool to emphasize different notions within the same utterance.

Example:
הגועהתאיתנכהינא
יתנכההגועהתא

SLIDE 16

What’s Interesting in Hebrew Paraphrasing?

3

Lexical Replacement: replacing a Hebrew word with one derived from another language via transliteration, possibly changing its part of speech.

Example:
הנואתביטולחלהסרהנתינוכמה
הנואתהתובקעבסוללאטוטהרבעתינוכמה

SLIDE 17

Outline

1

Definitions and Motivation What is a Paraphrase? Linguistic Background

2

Previous Work Overview Deep Learning Method Recursive Auto Encoding State of the Art English Paraphrasing Identification

3

Contribution of this Work Algorithms Developed Generated Resources Results

SLIDE 18

Parsing

Parsing (also referred to as syntax analysis) is the process that maps an input sentence to the more abstract representation of a syntactic tree. This tree represents the relations among the words of the sentence. The parse tree of a sentence is not naturally embedded in the text itself; language-specific information (such as the language's grammar and knowledge of specific relations between words) is often needed.

SLIDE 19

Parsing Conventions

Parsing is commonly a basic component of NLP systems, and it will play a prominent role in the systems described henceforth. Several conventions exist for the construction of parse trees; the dominant ones are constituency parsing and dependency parsing. We will use pre-trained Hebrew parsing systems for both of these conventions.

SLIDE 20

Phrase Structure Grammar

Phrase structure grammar (constituency grammar) was originally defined by Chomsky (1956) as part of the generative school. A phrase structure grammar is formally defined as a 4-tuple G = (N, T, S, P):

1

N ∩ T = ∅, where N is the non-terminal set and T the terminal set

2

S ∈ N, S being the start symbol

3

P = {(u, v) : u, v ∈ (N ∪ T)∗}; P is finite and is called the set of production rules
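The 4-tuple definition above can be sketched as a toy data structure. The tiny grammar below is purely illustrative (it is not from the thesis); the assertions mirror the three formal conditions.

```python
# A toy phrase structure grammar G = (N, T, S, P), following the
# 4-tuple definition above. The grammar itself is illustrative only.
N = {"S", "NP", "VP"}          # non-terminal set
T = {"the", "dog", "barks"}    # terminal set, disjoint from N
S = "S"                        # start symbol
P = [                          # finite set of production rules (u, v)
    (("S",), ("NP", "VP")),
    (("NP",), ("the", "dog")),
    (("VP",), ("barks",)),
]

# Sanity checks matching the formal definition:
assert N & T == set()                               # N ∩ T = ∅
assert S in N                                       # S ∈ N
assert all(set(u) | set(v) <= N | T for u, v in P)  # u, v ∈ (N ∪ T)*
```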

SLIDE 21

Phrase Structure Grammar

According to these grammars, the leaves (terminals) of the parse tree are the words of the original sentence, appearing in their original order. The rules by which an input sentence is mapped onto a parse tree are thus a model of a human language.

SLIDE 22

Phrase Structure Example

[Constituency parse tree figure for the Hebrew sentence ". אל רעי ו יבוד אל", with node labels NN, NP, CC, MOD, FRAG and yyDOT.]

SLIDE 23

Dependency Grammar

Dependency grammar dates to the work of Tesnière (1959). Dependency parsing views the syntactic analysis of a sentence as consisting of binary asymmetric relations between words. According to this linguistic theory, a speaker of a language analyzes syntax by perceiving connections between words; the dependency relation aims at modeling this connection.

SLIDE 24

Dependency Rules

Various definitions exist for determining when two words appear in a dependency relation. The following are a few of these criteria (where H marks the head and D the dependent):

H is obligatory; D may be optional. The form of D depends on H. The linear position of D is specified with reference to H.

SLIDE 25

Dependency Grammar Example

SLIDE 26

Parsing as a Step Towards Detecting Paraphrases

Parsing seems a necessary step in assessing whether two sentences are syntactic variants of one another. With parsing, one can align paraphrase candidates so that each part of the sentence can be further analyzed in terms of lexical similarity. This is exemplified in the next slide.

SLIDE 27

Dependency Transliteration Paraphrasing Example

SLIDE 28

Outline

1

Definitions and Motivation What is a Paraphrase? Linguistic Background

2

Previous Work Overview Deep Learning Method Recursive Auto Encoding State of the Art English Paraphrasing Identification

3

Contribution of this Work Algorithms Developed Generated Resources Results

SLIDE 29

Overview

Recent years have seen great research interest in paraphrase-related tasks, mainly in the English language. We will survey the efforts on which this work builds, beginning with the Hebrew research carried out in this field.

SLIDE 30

Hebrew Research

In comparison with the vast research efforts invested in English paraphrasing, very little work has been done on Hebrew paraphrase. Ordan (2007) developed a medium-scale WordNet for Hebrew by aligning English and Hebrew expressions and inferring relations from the available English WordNet onto the created Hebrew WordNet.

SLIDE 31

Foreign Languages Research

The research in the field can be divided into three main categories. We will mention some of the interesting projects in each category:

1

Generation of paraphrases: the Microsoft NLP team (2004) created a statistical-machine-translation-based technique for generating paraphrases of a given sentence.

2

Extraction of paraphrases from large texts: Hashimoto (2011) created a system which scans large unannotated texts looking for "definition sentences", deeming definitions of the same term to be paraphrases.

SLIDE 32

Foreign Languages Research

3

Identification of paraphrases: Socher et al. (2011) created a paraphrase identification algorithm which is considered the current state of the art. We will elaborate on the components of this algorithm in the following sections.

SLIDE 33

Outline

1

Definitions and Motivation What is a Paraphrase? Linguistic Background

2

Previous Work Overview Deep Learning Method Recursive Auto Encoding State of the Art English Paraphrasing Identification

3

Contribution of this Work Algorithms Developed Generated Resources Results

SLIDE 34

Introduction to Deep Learning

Deep Learning is a recent approach which aims at modeling the human perception of complicated notions at several levels of representation. It is implemented using several neural networks connected in such a way that one network's output is transferred to another's input. Backpropagation is carried out across the networks, from the topmost network down to the bottom one.

SLIDE 35

Deep Learning Illustration

SLIDE 36

Multi Task Learning

As can be seen in the prior illustration, a word representation (also known as word embeddings) is trained during the process, and it serves several NLP tasks. Collobert & Weston (2011) trained such a system on four NLP tasks (POS tagging, semantic role labeling, chunking, named entity recognition) and achieved near state-of-the-art results on all of them.

SLIDE 37

Word Embeddings

As a by-product of this process, the word embeddings were published for English dictionaries. Turian et al. (2010) showed that they enhance performance in systems which otherwise treat words simply as indices into a finite dictionary.

SLIDE 38

Outline

1

Definitions and Motivation What is a Paraphrase? Linguistic Background

2

Previous Work Overview Deep Learning Method Recursive Auto Encoding State of the Art English Paraphrasing Identification

3

Contribution of this Work Algorithms Developed Generated Resources Results

SLIDE 39

The Problem with the Connectionist Approach

A major drawback of the neural network model is the constant size of the network, in all its parts. This constant size of input and inner representations (known as the connectionist approach) does not fit the nature of most AI tasks. In the field of this work, it is easy to observe that paraphrases need not be of the same length, although by definition they convey the same information.

SLIDE 40

RAAM

This property seems to limit, or even preclude, the use of neural networks on arbitrary-length instances. To attack this problem, several connectionist systems were devised to cope with unbounded input size. One of these is the Recursive Auto-Associative Memory (RAAM), due to Pollack (1990). RAAM was later extended and became known as auto encoders.

SLIDE 41

Auto Encoders

The system Pollack devised was composed of two concatenated neural networks:

Encoder: a neural network composed of an input layer of 2K elements fully connected to an output layer of K elements.

Decoder: a neural network composed of an input layer of K elements fully connected to an output layer of 2K elements.

These two networks are trained concurrently as a single network: a network with 2K input elements, K hidden units in one hidden layer, and 2K elements in the output layer.
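The 2K-K-2K architecture above can be sketched with toy numpy matrices. The random weights below stand in for parameters that a real system would learn by backpropagation; sizes and activations are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of a Pollack-style autoencoder: 2K inputs -> K hidden -> 2K outputs.
K = 4
rng = np.random.default_rng(0)
W_enc = rng.normal(scale=0.1, size=(K, 2 * K))   # encoder: 2K -> K
W_dec = rng.normal(scale=0.1, size=(2 * K, K))   # decoder: K -> 2K

def encode(left, right):
    """Compress two K-dim children into one K-dim parent representation."""
    return np.tanh(W_enc @ np.concatenate([left, right]))

def decode(parent):
    """Reconstruct the two K-dim children from the parent code."""
    out = np.tanh(W_dec @ parent)
    return out[:K], out[K:]

left, right = rng.normal(size=K), rng.normal(size=K)
parent = encode(left, right)                     # fixed-size code
rec_left, rec_right = decode(parent)
# Training would minimize this reconstruction error:
loss = float(np.sum((rec_left - left) ** 2 + (rec_right - right) ** 2))
```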
SLIDE 42

Auto Encoders Illustration

SLIDE 43

Auto Encoders

During training on an input element, the element itself is presented as the network's desired output, thus training the network to reproduce the input it received after passing through the hidden layer. The hidden layer (recall that this is actually the output layer of the encoder) now holds a compact representation of the input.

SLIDE 44

How to Use Auto Encoders?

The input is assumed to be a variable-length list of fixed-size elements (a character string, for example); mark the representation size of each element as K. Go over the input instance of size K · n and "encode" every pair of consecutive input elements into one compressed representation, thus obtaining a second level of representation of size K · n/2. This process is repeated until the last level contains one element of fixed size, which represents the entire variable-size input in a fixed-size representation.
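The level-by-level reduction just described can be sketched as follows. The `encode` placeholder (a simple mean) stands in for a trained autoencoder's encoder, and carrying an odd leftover element up to the next level is an assumption about how non-power-of-two inputs are handled.

```python
import numpy as np

def encode(left, right):
    """Placeholder for a trained encoder: merges two K-dim vectors into one."""
    return (left + right) / 2.0

def fold(vectors):
    """Fold a list of K-dim vectors pairwise, level by level, to one vector."""
    while len(vectors) > 1:
        nxt = [encode(vectors[i], vectors[i + 1])
               for i in range(0, len(vectors) - 1, 2)]
        if len(vectors) % 2:       # odd element carries over to the next level
            nxt.append(vectors[-1])
        vectors = nxt
    return vectors[0]              # fixed-size code for the whole input

K = 3
seq = [np.ones(K) * i for i in range(5)]   # a length-5 input, K dims each
root = fold(seq)                           # single K-dim representation
```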

SLIDE 45

Encoding of a Parse Tree

SLIDE 46

Outline

1

Definitions and Motivation What is a Paraphrase? Linguistic Background

2

Previous Work Overview Deep Learning Method Recursive Auto Encoding State of the Art English Paraphrasing Identification

3

Contribution of this Work Algorithms Developed Generated Resources Results

SLIDE 47

Framework Overview

Socher et al. (2011) devised a framework for identifying paraphrases:

1

Replace each of the words in the two texts with its word embedding.

2

Autoencode both texts to receive embeddings of sentences.

3

On the resulting representations they performed a method they called "dynamic pooling", which discriminates between paraphrases and non-paraphrases.

SLIDE 48

Socher’s Results

They tested against the English reference corpus, the Microsoft Research Paraphrase Corpus (MSRP). Results are given in the following table (along with other systems' performance for comparison):

Model                  ACC    F1
Wan et al. (2006)      75.6   83.0
Das and Smith (2009)   76.1   82.7
Socher et al. (2011)   76.8   83.6

SLIDE 49

Outline

1

Definitions and Motivation What is a Paraphrase? Linguistic Background

2

Previous Work Overview Deep Learning Method Recursive Auto Encoding State of the Art English Paraphrasing Identification

3

Contribution of this Work Algorithms Developed Generated Resources Results

SLIDE 50

Paraphrase Identification System

The main body of this work aims at constructing a Hebrew paraphrase identification system. The chosen path begins similarly to the English state of the art: exploiting auto encoding over feature embeddings of words in order to create compact representations of the embedded parse trees. The use of these autoencoded parse trees, however, differs in this work: a new concept was defined - Tree Matching.

SLIDE 51

Framework Overview

SLIDE 52

Embedding Computation

Each word w ∈ D (the dictionary) is mapped onto a vector rw ∈ Rd. The structure of this vector space should reflect word similarity, so that if two words are "similar" (along multiple linguistic dimensions: meaning, spelling, morphology, part of speech, etc.), their vector encodings will be "close" in Rd. These embeddings were computed for a Hebrew dictionary of 5K words, by training a language model in a deep learning architecture.
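A minimal sketch of what "close" in Rd means in practice, using cosine similarity over toy vectors (these are illustrative values, not the thesis's 5K-word Hebrew embeddings):

```python
import numpy as np

# Toy embedding table; real embeddings would come from the trained model.
emb = {
    "dog":   np.array([0.9, 0.1, 0.0]),
    "puppy": np.array([0.8, 0.2, 0.1]),
    "piano": np.array([0.0, 0.1, 0.9]),
}

def cosine(u, v):
    """Cosine similarity: 1.0 for identical directions, ~0 for unrelated."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Similar words should score higher than unrelated ones:
assert cosine(emb["dog"], emb["puppy"]) > cosine(emb["dog"], emb["piano"])
```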

SLIDE 53

Language Model

The approach taken for defining a language model is that it should be able to separate valid occurrences of text from non-valid ones. Thus, a large segmented corpus of 131M tokens was sampled for overlapping 5-grams to obtain valid examples, and non-valid (corrupt) 5-grams were obtained by replacing one word in a valid 5-gram. The backpropagated error was max(0, 1 − f(s) + f(sw)), where sw is the corrupt 5-gram and w is a word chosen uniformly at random from the dictionary.
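The ranking loss above can be sketched as follows. The scorer `f` is a toy stand-in (in the thesis it is a neural network over concatenated word embeddings), and corrupting the middle word in particular is an assumption, since the slide only says one word is replaced.

```python
import numpy as np

def hinge_loss(f, valid, corrupt):
    """max(0, 1 - f(s) + f(s_w)): push valid n-grams above corrupt ones."""
    return max(0.0, 1.0 - f(valid) + f(corrupt))

def corrupt_ngram(ngram, dictionary, rng):
    """Replace one word (here: the middle one) with a uniformly random word."""
    out = list(ngram)
    out[len(out) // 2] = dictionary[rng.integers(len(dictionary))]
    return tuple(out)

rng = np.random.default_rng(0)
dictionary = ["a", "b", "c", "d"]
s = ("the", "cat", "sat", "on", "mat")          # a "valid" 5-gram
s_w = corrupt_ngram(s, dictionary, rng)          # a corrupt copy
f = lambda ngram: float(len(set(ngram)))         # toy scorer, not a real model
loss = hinge_loss(f, s, s_w)
```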

SLIDE 54

Language Model Overview

SLIDE 55

Auto Encoding

As described in the previous slides, an autoencoder was developed and trained, on both dependency and constituency parse trees. The 131M-token corpus was sampled for 150K sentences, which were used for training.

SLIDE 56

Binarization

As can be seen, auto encoding requires the tree to have a fixed number of child nodes, which is not the case in parse trees. Although binarization of constituency parse trees is quite common, no such algorithm was found for dependency trees. An algorithm for dependency parse tree binarization was therefore developed.
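The thesis's binarization algorithm is not spelled out on this slide; the sketch below shows one generic way to binarize an n-ary node by folding children left to right under intermediate nodes. This is an illustrative assumption, not necessarily the thesis's method.

```python
def binarize(label, children):
    """Binarize a node with an arbitrary child list.

    children: list of already-binarized subtrees (tuples) or word strings.
    Intermediate nodes get a hypothetical "@"-prefixed label.
    """
    if len(children) <= 2:
        return (label, children)
    # Fold the first two children under an intermediate @label node, recurse.
    merged = ("@" + label, children[:2])
    return binarize(label, [merged] + children[2:])

tree = binarize("ROOT", ["a", "b", "c", "d"])
# Every internal node of `tree` now has at most two children.
```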

SLIDE 57

Paraphrase Pair After Binarization

SLIDE 58

Tree Matching

Definition: Consider two binary trees t1, t2 (corresponding to sentences s1, s2, and obtained from them by auto encoding). Define a "Tree Match" M to be any set of tuples (n1, n2), where n1, n2 are nodes of t1, t2 respectively, such that for every word w in s1 (s2), M contains exactly one tuple containing a node on the path from w to the root of t1 (t2).

SLIDE 59

Motivation for Tree Matching

This definition captures the idea that a paraphrase pair consists of sentences whose parts are interchangeable, from the sentence level down to the word level (including word reordering).

SLIDE 60

Tree Match Score

Following this definition, the score of a match can be defined:

Definition: S(M) = Σ_{(n1,n2)∈M} ‖n1 − n2‖₂ · (number of leaves spanned by n1 and n2)

A Minimal Match for a sentence pair (s1, s2) is the match which yields the minimal score for these sentences.
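The match score can be computed directly from a candidate match. The node encodings and leaf counts below are toy values; a real match would use the autoencoded node vectors and the actual spans.

```python
import numpy as np

def match_score(match):
    """Sum over matched pairs of ||n1 - n2||_2 weighted by spanned leaves.

    match: list of (vec1, vec2, spanned_leaves) tuples.
    """
    return sum(float(np.linalg.norm(v1 - v2)) * leaves
               for v1, v2, leaves in match)

match = [
    (np.array([1.0, 0.0]), np.array([1.0, 0.0]), 2),  # identical nodes: 0
    (np.array([0.0, 1.0]), np.array([0.0, 0.5]), 1),  # distance 0.5
]
score = match_score(match)   # 0.0 * 2 + 0.5 * 1
```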

SLIDE 61

Training

During training, a simple classifier can learn the threshold below which a minimal match represents a pair which is a paraphrase. After training, the classifier yields not only a binary decision, but also a meaningful matching between the pair, "explaining" why they are a paraphrase.
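A minimal sketch of such a threshold classifier, assuming labeled minimal-match scores are available; the error-minimizing search below is an illustration, not the thesis's exact classifier.

```python
def learn_threshold(scores, labels):
    """Pick the threshold minimizing training errors.

    scores: minimal-match scores; labels: True = paraphrase pair.
    Pairs scoring at or below the threshold are called paraphrases.
    """
    best_t, best_err = 0.0, len(labels) + 1
    for t in sorted(scores):
        err = sum((s <= t) != y for s, y in zip(scores, labels))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

scores = [0.2, 0.3, 0.9, 1.5]          # toy minimal-match scores
labels = [True, True, False, False]    # toy paraphrase annotations
t = learn_threshold(scores, labels)
```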

SLIDE 62

Tree Matching Example

SLIDE 63

Tree Matching is NP-Complete

It can be shown that Tree Matching, as defined above, is an NP-complete problem, via the reduction:

Positive SubsetSum ≤p Tree Matching

SLIDE 64

Paraphrase Corpus for Algorithm Evaluation

In order to test the framework, a Hebrew paraphrase corpus had to be collected. An algorithm was developed to acquire news articles from leading news sites and align them based on publication time and syntactic similarity. A very large unannotated corpus (about 1.4M headlines) of possible paraphrase pairs was collected.
SLIDE 65

Outline

1

Definitions and Motivation What is a Paraphrase? Linguistic Background

2

Previous Work Overview Deep Learning Method Recursive Auto Encoding State of the Art English Paraphrasing Identification

3

Contribution of this Work Algorithms Developed Generated Resources Results

SLIDE 66

As a by-product of developing the framework, several resources were compiled which can be re-used in future research:

Annotated Paraphrase Corpus: 1K of the possible pairs were tagged by human judges to obtain an annotated reference corpus for future research comparison.

Word Embeddings: an embedding dictionary of 5K common Hebrew words was computed, and shown to be useful as a plug-in enhancer for supervised NLP tasks.

SLIDE 67

Outline

1

Definitions and Motivation What is a Paraphrase? Linguistic Background

2

Previous Work Overview Deep Learning Method Recursive Auto Encoding State of the Art English Paraphrasing Identification

3

Contribution of this Work Algorithms Developed Generated Resources Results

SLIDE 68

Embeddings as plugin enhancer

The produced embeddings show an improvement when added to a CRF POS tagger (values are ACC / F1):

       without embeddings   with embeddings
TB1    0.879 / 0.735        0.900 / 0.804
A7     0.910 / 0.701        0.940 / 0.821
All    0.866 / 0.662        0.880 / 0.723

SLIDE 69

Paraphrasing Framework Evaluation

The proposed system was shown to achieve results comparable to the state-of-the-art results obtained for the English task (about 2% lower):

Parse Type     ACC / F1
Dependency     74.38 / 80.35
Constituency   69.20 / 74.83

SLIDE 70

Interesting Results

לארשירבעלתוטקרירילכיסריוואהליח העוצרהמתוטקררוגישלכיסריוואהליח

SLIDE 71

Interesting Results

הירוסליסוטמדגנתוכרעמוברקיסוטמקפסתהיסור הירוסלתיריוואהנגהתוכרעמוברקיסוטמרוכמתהיסור

SLIDE 72

Interesting Results

צעחמרגאמלארשיבקויקוחהרשוא לארשיבקויצעחמרגאמהרשיאתסנכה

SLIDE 73

The End

THANK YOU!