Paraphrase Recognition Using Machine Learning to Combine Similarity Measures
Prodromos Malakasiotis
Department of Informatics, Athens University of Economics and Business
Paraphrase Recognition
- Given a pair of phrases, sentences, or patterns [S1 , S2]
decide if they are paraphrases, i.e., if they have (almost) the same meaning.
– “X is the writer of Y” ≈ “X wrote Y” ≈ “X is the author of Y”
- Related to, but not the same as textual entailment.
– “Athens is the capital of Greece” ╞ “Athens is located in Greece”, but not the reverse.
- Paraphrasing can be seen as bidirectional textual entailment.
Paraphrase recognition with Machine Learning
- Experiments with 3 configurations:
– INIT, INIT+WN, INIT+WN+DEP
Training stage
- Input: labeled pairs (S1, S2, YES)1, (S1, S2, NO)2, …, (S1, S2, YES)n
- Preprocessing → Vector Creation → feature vectors <f1, f2, …, fm, 1>1, <f1, f2, …, fm, 0>2, …, <f1, f2, …, fm, 1>n → Classifier → Trained Classifier
Classification stage
- Input: unlabeled pairs (S1, S2, ?)1, (S1, S2, ?)2
- Preprocessing → Vector Creation → <f1, f2, …, fm, ?>1, <f1, f2, …, fm, ?>2 → Trained Classifier → <f1, f2, …, fm, 0>1, <f1, f2, …, fm, 1>2 → (S1, S2, NO)1, (S1, S2, YES)2
INIT Configuration
- The input pairs [S1, S2] are represented as vectors of similarity scores measured on 4 forms of [S1, S2]:
– (1) words, (2) stems, (3) POS-tags, (4) soundex codes
- 9 similarity measures, applied to the 4 forms:
– Levenshtein (edit distance), Jaro-Winkler, Manhattan and Euclidean distance, cosine similarity, n-gram, matching coefficient, Dice, and Jaccard coefficient (see paper).
– Similarities are measured in terms of tokens.
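A few of these token-level measures can be sketched in Python (a minimal illustration, not the paper's implementation; Jaro-Winkler and the distance-based measures are omitted for brevity):

```python
def levenshtein(a, b):
    """Edit distance between two token sequences (classic DP)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution
        prev = cur
    return prev[-1]

def matching_coefficient(a, b):
    """Number of shared token types."""
    return len(set(a) & set(b))

def dice(a, b):
    sa, sb = set(a), set(b)
    return 2 * len(sa & sb) / (len(sa) + len(sb))

def jaccard(a, b):
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

s1 = "x is the writer of y".split()
s2 = "x wrote y".split()
```

On this pair, `levenshtein` counts 3 deletions plus 1 substitution (writer → wrote), while `jaccard` sees 2 shared tokens (x, y) out of 7 distinct ones.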
Partial Matching Features
S1: While Bolton apparently fell and was immobilized, Selenski used the mattress to scale a 10-foot, razor-wire fence, Fischi said.
S2: After the other inmate fell, Selenski used the mattress to scale a 10-foot, razor-wire fence, Fischi said.
(S1 = W1 W2 … W10, S2 = W′1 W′2 … W′8)
- Find the part of S1 (the longer sentence) that is most similar to S2 (the shorter sentence) using a sliding window:
– At each step, calculate the average of the 9 similarity scores.
- Use the highest average (Avg) and the 9 scores it was computed from as additional features in INIT.
- Do this for words, stems, POS-tags, and soundex codes.
(Avg at successive window positions: 0.64 → 0.71 → 0.82)
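The sliding-window search above can be sketched as follows (a toy version using only two of the nine measures; the function names are illustrative, not from the paper):

```python
def jaccard(a, b):
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def dice(a, b):
    sa, sb = set(a), set(b)
    return 2 * len(sa & sb) / (len(sa) + len(sb)) if sa or sb else 1.0

def best_window(long_toks, short_toks, measures):
    """Slide a window of len(short_toks) over the longer sentence; return
    the highest average similarity and the per-measure scores behind it."""
    n = len(short_toks)
    best_avg, best_scores = -1.0, None
    for i in range(len(long_toks) - n + 1):
        window = long_toks[i:i + n]
        scores = [m(window, short_toks) for m in measures]
        avg = sum(scores) / len(scores)
        if avg > best_avg:
            best_avg, best_scores = avg, scores
    return best_avg, best_scores
```

Both the best average and the individual scores behind it would then be appended to the INIT feature vector.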
A Partial Matching Example
S1: While Bolton apparently fell and was immobilized, Selenski used the mattress to scale a 10-foot, razor-wire fence, Fischi said.
S2: After the other inmate fell, Selenski used the mattress to scale a 10-foot, razor-wire fence, Fischi said.
Initial Avg (whole sentences): 0.76; Avg as the window slides: 0.68 → 0.72 → 0.82.
INIT+WN Configuration
- The same as INIT, but:
– It treats words from S1 and S2 that are synonyms (in WordNet) as identical.
S1: Fewer than a dozen FBI agents were dispatched to secure and analyze evidence.
S2: Fewer than a dozen FBI agents will be sent to Iraq to secure and analyze evidence of the bombing.
INIT’s Avg: 0.73 INIT+WN’s Avg: 0.78
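The synonym treatment can be sketched as a normalization step applied before the similarity measures. Here a tiny hand-made synonym table stands in for the WordNet lookup (the table entries are hypothetical):

```python
# Hypothetical synonym classes standing in for WordNet synsets.
SYN_CLASS = {"dispatched": "send", "sent": "send"}

def normalize(tokens, syn_class=SYN_CLASS):
    """Map each token to its synonym-class id, so synonyms compare as identical."""
    return [syn_class.get(t, t) for t in tokens]

def jaccard(a, b):
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

s1 = "agents were dispatched to secure evidence".split()
s2 = "agents will be sent to secure evidence".split()
# Normalizing synonyms raises the measured similarity,
# mirroring the Avg increase in the INIT+WN example.
```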
INIT+WN+DEP Configuration
- Same as INIT+WN, but:
– 3 additional features that measure dependency grammar similarity between S1 and S2:
R1 = |common dependencies| / |dependencies of S1|
R2 = |common dependencies| / |dependencies of S2|
F(R1, R2) = 2 · R1 · R2 / (R1 + R2)   (the harmonic mean of R1 and R2)
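These three dependency-overlap features (R1, R2, and their harmonic mean F) can be sketched directly from sets of dependency triples (a minimal sketch; the triple format shown is illustrative):

```python
def dep_similarity(deps1, deps2):
    """R1, R2, and their harmonic mean F, computed over sets of
    dependency triples such as ("det", "dollar-2", "The-1")."""
    common = deps1 & deps2
    r1 = len(common) / len(deps1)
    r2 = len(common) / len(deps2)
    f = 2 * r1 * r2 / (r1 + r2) if r1 + r2 else 0.0
    return r1, r2, f
```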
INIT+WN+DEP: negative example
S1: The dollar was at 116.92 yen against the yen, flat on the session, and at 1.2891 against the Swiss franc, also flat.
S2: The dollar was at 116.78 yen JPY, virtually flat on the session, and at 1.2871 against the Swiss franc CHF, down 0.1 percent.
S1 dependencies: det(dollar-2, The-1), nsubj(flat-25, dollar-2), …, dep(at-4, against-7), det(yen-9, the-8), …, det(session-14, the-13), pobj(on-12, session-14), …
S2 dependencies: det(dollar-2, The-1), nsubj(was-3, dollar-2), …, det(session-14, the-13), prep_on(flat-11, session-14), …, dep(at-17, against-19), det(CHF-23, the-20), …
Avg = 0.72, R1 = 0.14, R2 = 0.16, F(R1, R2) = 0.15
INIT+WN+DEP: positive example
S1: Last week the power station’s US owners, AES Corp, walked away from the plant after banks and bondholders refused to accept its financial restructuring offer.
S2: The news comes after Drax's American owner, AES Corp. AES.N, last week walked away from the plant after banks and bondholders refused to accept its restructuring offer.
S1 dependencies: amod(week-2, Last-1), tmod(walked-13, week-2), …, prt(walked-13, away-14), …
S2 dependencies: det(news-2, The-1), …, amod(week-18, last-17), dep(walked-19, week-18), …, prt(walked-19, away-20), …
Avg = 0.71, R1 = 0.52, R2 = 0.59, F(R1, R2) = 0.55
Feature Selection
- Start with an empty feature set.
- Gradually add features:
– Form new feature sets by adding one feature.
– Measure the predictive power of the new sets.
– Keep the best new feature set(s).
– Tried both hill-climbing and beam search.
- A lot of redundancy in the full feature set.
– Feature selection leads to competitive results with far fewer features (10 instead of 136).
- But the full feature set leads to better results.
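The greedy forward selection described above can be sketched as follows (a minimal skeleton: beam=1 gives hill-climbing, larger beams give beam search; the `score` function standing in for the measured predictive power is supplied by the caller):

```python
def forward_select(features, score, beam=1):
    """Grow feature sets one feature at a time, keeping the best-scoring
    candidate sets at each step; stop when no candidate improves."""
    frontier = [frozenset()]
    best_set, best_score = frozenset(), score(frozenset())
    while True:
        candidates = {fs | {f} for fs in frontier for f in features if f not in fs}
        if not candidates:
            return best_set, best_score
        ranked = sorted(candidates, key=score, reverse=True)
        frontier = ranked[:beam]
        top = ranked[0]
        if score(top) <= best_score:
            return best_set, best_score
        best_set, best_score = top, score(top)
```

With a toy score that rewards features "a" and "b" and penalizes "c", the search stops at {"a", "b"} rather than adding the redundant feature.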
Experiments
- Microsoft Research (MSR) Paraphrase Corpus:
– 5,801 pairs of sentences evaluated by judges.
– 4,076 training pairs.
– 1,725 test pairs.
- Baseline (BASE):
– Use a threshold on edit distance to decide whether a pair is positive (paraphrases) or negative.
– The threshold is tuned on the training pairs.
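The baseline's threshold tuning can be sketched as follows (a minimal version; the toy distance function is illustrative, not the edit distance used in the paper):

```python
def tune_threshold(train_pairs, distance):
    """Choose the distance threshold that maximizes training accuracy;
    pairs at or below the threshold are predicted positive (paraphrases)."""
    best_t, best_acc = 0.0, 0.0
    for t in sorted({distance(a, b) for a, b, _ in train_pairs}):
        acc = sum((distance(a, b) <= t) == bool(y)
                  for a, b, y in train_pairs) / len(train_pairs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

# Toy distance: size of the symmetric difference of the token sets.
def toy_distance(a, b):
    return len(set(a.split()) ^ set(b.split()))
```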
Results on MSR corpus
[Bar chart: Accuracy, Precision, Recall, and F-measure (60%–90%) for BASE, INIT, INIT+WN, and INIT+WN+DEP on the MSR corpus]
Best known results
[Bar chart: Accuracy, Precision, Recall, and F-measure (60%–95%) for INIT+WN+DEP compared with FINCH, QIU, WAN, and ZHANG]
INIT performs well too!
[Bar chart: Accuracy, Precision, Recall, and F-measure (60%–90%) for INIT, INIT+WN+DEP, and WAN]
Conclusions
- INIT is competitive with the best known systems, while using fewer resources.
– Useful for languages where WordNet, reliable dependency parsers, etc. are unavailable.
- INIT+WN and INIT+WN+DEP perform even better.