Paraphrase Recognition Using Machine Learning to Combine Similarity Measures (PowerPoint Presentation)


  1. Paraphrase Recognition Using Machine Learning to Combine Similarity Measures
Prodromos Malakasiotis, Department of Informatics, Athens University of Economics and Business

  2. Paraphrase Recognition
• Given a pair of phrases, sentences, or patterns [S1, S2], decide if they are paraphrases, i.e., if they have (almost) the same meaning.
  – “X is the writer of Y” ≈ “X wrote Y” ≈ “X is the author of Y”
• Related to, but not the same as, textual entailment.
  – “Athens is the capital of Greece” ⊨ “Athens is located in Greece”, but not the reverse.
• Paraphrasing can be seen as bidirectional textual entailment.

  3. Paraphrase Recognition with Machine Learning
• Training stage: labeled pairs (S1, S2, YES/NO) pass through preprocessing and vector creation, producing feature vectors <f1, f2, …, fm, 1/0> on which the classifier is trained.
• Classification stage: unlabeled pairs (S1, S2, ?) are converted to vectors <f1, f2, …, fm, ?> in the same way, and the trained classifier fills in the missing label (YES/NO).
• Experiments with 3 configurations: INIT, INIT+WN, INIT+WN+DEP.
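
To make the two stages concrete, here is a minimal sketch in Python. The slides do not name the learner or the exact preprocessing, so scikit-learn's SVC is used here as a stand-in classifier, and vector creation is reduced to two toy features; the real system uses the INIT/INIT+WN/INIT+WN+DEP feature sets described next.

```python
# A minimal sketch of the two stages. The slides do not name the learner, so
# scikit-learn's SVC is a stand-in, and to_vector() is a deliberately tiny
# two-feature version of the vector-creation step (the real features follow).
from sklearn.svm import SVC

def to_vector(s1, s2):
    """Toy vector creation: word-overlap (Jaccard) and length-ratio features."""
    a, b = set(s1.lower().split()), set(s2.lower().split())
    jaccard = len(a & b) / (len(a | b) or 1)
    length_ratio = min(len(a), len(b)) / (max(len(a), len(b)) or 1)
    return [jaccard, length_ratio]

def train(labeled_pairs):
    """Training stage: labeled (S1, S2, 1/0) triples -> trained classifier."""
    X = [to_vector(s1, s2) for s1, s2, _ in labeled_pairs]
    y = [label for _, _, label in labeled_pairs]
    return SVC().fit(X, y)

def classify(clf, pairs):
    """Classification stage: unlabeled (S1, S2) pairs -> predicted 1/0 labels."""
    return clf.predict([to_vector(s1, s2) for s1, s2 in pairs])
```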

  4. INIT Configuration
• The input pairs [S1, S2] are represented as vectors of similarity scores measured on 4 forms of [S1, S2]:
  – (1) words, (2) stems, (3) POS tags, (4) Soundex codes
• 9 similarity measures, applied to the 4 forms:
  – Levenshtein (edit distance), Jaro-Winkler, Manhattan distance, Euclidean distance, cosine similarity, n-gram, matching coefficient, Dice coefficient, and Jaccard coefficient (see paper).
  – Similarities are measured in terms of tokens.
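
As an illustration, here are approximate versions of four of the nine measures, computed over token sequences as the slide specifies; the exact definitions (normalization, n-gram order, etc.) are in the paper, so treat these as sketches.

```python
# Approximate versions of four of the nine measures, over token sequences as
# the slide specifies; the same functions are applied to words, stems, POS
# tags, and Soundex codes. Exact definitions are in the paper.
from collections import Counter
from math import sqrt

def levenshtein(a, b):
    """Edit distance over tokens rather than characters."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute x -> y
        prev = curr
    return prev[-1]

def jaccard(a, b):
    A, B = set(a), set(b)
    return len(A & B) / (len(A | B) or 1)

def dice(a, b):
    A, B = set(a), set(b)
    return 2 * len(A & B) / ((len(A) + len(B)) or 1)

def cosine(a, b):
    """Cosine similarity of token-count vectors."""
    A, B = Counter(a), Counter(b)
    dot = sum(A[t] * B[t] for t in A)
    norm = sqrt(sum(v * v for v in A.values())) * sqrt(sum(v * v for v in B.values()))
    return dot / (norm or 1)
```

Each such function is evaluated on the word, stem, POS-tag, and Soundex forms of [S1, S2], giving the 4 × 9 grid of base INIT features.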

  5–7. Partial Matching Features (shown in three animation steps)
S1: While Bolton apparently fell and was immobilized, Selenski used the mattress to scale a 10-foot, razor-wire fence, Fischi said.
S2: After the other inmate fell, Selenski used the mattress to scale a 10-foot, razor-wire fence, Fischi said.
[Diagram: S1's tokens W1 … W10 aligned with S2's tokens W′1 … W′8 under a sliding window; the window average rises from 0.64 to 0.71 to 0.82 as the window slides.]
• Find S1's (longer sentence) part that is most similar to S2 (shorter sentence) using a sliding window:
  – At each step, calculate the average of the 9 similarity scores.
• Use the highest average (Avg) and the 9 scores it was computed from as additional features in INIT.
• Do this for words, stems, POS tags, and Soundex codes.


  8. A Partial Matching Example
S1: While Bolton apparently fell and was immobilized, Selenski used the mattress to scale a 10-foot, razor-wire fence, Fischi said.
S2: After the other inmate fell, Selenski used the mattress to scale a 10-foot, razor-wire fence, Fischi said.
• As the window moves over S1, the average similarity to S2 changes: initial Avg 0.76, then 0.68, then 0.72, and finally 0.82 at the best-matching position.
• The highest average (0.82) and the 9 scores behind it become the partial-matching features.
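
A sketch of the sliding-window step, under two assumptions not spelled out on the slides: the window has the length of the shorter sentence and moves one token at a time, and each of the nine measures is already normalized to a [0, 1] similarity before averaging.

```python
# Sketch of the partial-matching step. Assumptions (not stated on the slides):
# the window has the length of the shorter sentence, moves one token at a
# time, and every measure in `measures` returns a similarity in [0, 1].

def best_window_features(s1_tokens, s2_tokens, measures):
    """Return the best window average plus the nine scores behind it."""
    longer, shorter = ((s1_tokens, s2_tokens)
                       if len(s1_tokens) >= len(s2_tokens)
                       else (s2_tokens, s1_tokens))
    n = len(shorter)
    best_avg, best_scores = -1.0, [0.0] * len(measures)
    for start in range(len(longer) - n + 1):
        window = longer[start:start + n]
        scores = [m(window, shorter) for m in measures]
        avg = sum(scores) / len(scores)
        if avg > best_avg:                 # keep the highest window average
            best_avg, best_scores = avg, scores
    return [best_avg] + best_scores        # 10 extra features per form
```

Run once per form (words, stems, POS tags, Soundex codes), this adds the highest average and its nine scores to the INIT vector, matching the slide's description.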

  9. INIT+WN Configuration
• The same as INIT, but:
  – It treats words from S1 and S2 that are synonyms (in WordNet) as identical.
Example:
S1: Fewer than a dozen FBI agents were dispatched to secure and analyze evidence.
S2: Fewer than a dozen FBI agents will be sent to Iraq to secure and analyze evidence of the bombing.
INIT's Avg: 0.73; INIT+WN's Avg: 0.78.
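
A sketch of the synonym step, assuming NLTK's WordNet interface and a simple shared-synset test; the slides only say that synonyms are treated as identical, so the exact matching rule here is a guess.

```python
# Sketch of the WN step, assuming NLTK's WordNet and a shared-synset test:
# each word of S1 that has a WordNet synonym in S2 is rewritten to that S2
# word before the similarity measures are applied, so synonyms match exactly.
from nltk.corpus import wordnet as wn

def unify_synonyms(s1_tokens, s2_tokens):
    """Rewrite S1's words so WordNet synonyms of S2 words match exactly."""
    out = []
    for w in s1_tokens:
        synonym = next((v for v in s2_tokens
                        if w != v and set(wn.synsets(w)) & set(wn.synsets(v))),
                       None)
        out.append(synonym if synonym is not None else w)
    return out
# Pairs like "dispatched"/"sent" from the slide's example can then match
# exactly, which is what raises Avg from 0.73 (INIT) to 0.78 (INIT+WN).
```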

  10. INIT+WN+DEP Configuration
• Same as INIT+WN, but:
  – 3 additional features that measure dependency-grammar similarity between S1 and S2:
    R1 = |common dependencies| / |S1's dependencies|
    R2 = |common dependencies| / |S2's dependencies|
    F(R1, R2) = 2 · R1 · R2 / (R1 + R2)
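
The three features are straightforward to compute once each sentence's dependencies are available. A sketch, representing dependencies as strings like those printed on the next two slides; whether word positions are kept or stripped before matching is left open by the slides and is an assumption here.

```python
# Sketch of the three DEP features. Dependencies are strings such as
# "det(dollar-2, The-1)"; keeping or stripping the word positions before
# matching is an assumption, as the slides leave that detail open.

def dependency_features(s1_deps, s2_deps):
    """R1, R2, and their harmonic mean F, as defined above."""
    common = set(s1_deps) & set(s2_deps)
    r1 = len(common) / (len(s1_deps) or 1)
    r2 = len(common) / (len(s2_deps) or 1)
    f = 2 * r1 * r2 / ((r1 + r2) or 1)
    return r1, r2, f
```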

  11. INIT+WN+DEP: Negative Example
S1: The dollar was at 116.92 yen against the yen, flat on the session, and at 1.2891 against the Swiss franc, also flat.
S2: The dollar was at 116.78 yen JPY, virtually flat on the session, and at 1.2871 against the Swiss franc CHF, down 0.1 percent.
S1 dependencies (sample): det(dollar-2, The-1), nsubj(flat-25, dollar-2), dep(at-4, against-7), det(yen-9, the-8), det(session-14, the-13), pobj(on-12, session-14), …
S2 dependencies (sample): det(dollar-2, The-1), nsubj(was-3, dollar-2), det(session-14, the-13), prep_on(flat-11, session-14), dep(at-17, against-19), det(CHF-23, the-20), …
Avg = 0.72, R1 = 0.14, R2 = 0.16, F(R1, R2) = 0.15

  12. INIT+WN+DEP: Positive Example
S1: Last week the power station’s US owners, AES Corp, walked away from the plant after banks and bondholders refused to accept its financial restructuring offer.
S2: The news comes after Drax's American owner, AES Corp. AES.N, last week walked away from the plant after banks and bondholders refused to accept its restructuring offer.
S1 dependencies (sample): amod(week-2, Last-1), tmod(walked-13, week-2), prt(walked-13, away-14), …
S2 dependencies (sample): det(news-2, The-1), amod(week-18, last-17), dep(walked-19, week-18), prt(walked-19, away-20), …
Avg = 0.71, R1 = 0.52, R2 = 0.59, F(R1, R2) = 0.55

  13. Feature Selection
• Start with an empty feature set.
• Gradually add features:
  – Form new feature sets by adding one feature.
  – Measure the predictive power of the new sets.
  – Keep the best new feature set(s).
  – Tried both hill climbing and beam search.
• There is a lot of redundancy in the full feature set.
  – Feature selection leads to competitive results with far fewer features (10 instead of 136).
• But the full feature set leads to better results.
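
A sketch of the hill-climbing variant (beam search would keep the best k sets per step instead of one). `evaluate` is an assumed callback returning the predictive power of a feature subset, e.g. cross-validated accuracy on the training pairs.

```python
# Sketch of the greedy (hill-climbing) feature selection described above.
# `evaluate` is assumed to return the predictive power of a feature subset
# (e.g., cross-validated training accuracy); features are names or indices.

def hill_climb(all_features, evaluate):
    selected, best_score = [], evaluate([])
    while True:
        remaining = [f for f in all_features if f not in selected]
        if not remaining:
            break
        # Form new sets by adding one feature and keep the best of them.
        score, feat = max(((evaluate(selected + [f]), f) for f in remaining),
                          key=lambda pair: pair[0])
        if score <= best_score:   # no single addition helps: stop
            break
        selected.append(feat)
        best_score = score
    return selected, best_score
```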

  14. Experiments
• Microsoft Research (MSR) Paraphrase Corpus:
  – 5,801 pairs of sentences evaluated by judges.
  – 4,076 training pairs.
  – 1,725 testing pairs.
• Baseline (BASE):
  – Use a threshold on edit distance to decide if a pair is positive (paraphrases) or negative.
  – The threshold is tuned on the training pairs.
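
A sketch of BASE: try each observed training distance as a threshold, keep the one with the highest training accuracy, and apply it to test pairs. Whether the edit distance is over characters or tokens, and whether it is length-normalized, is an assumption left open by the slides.

```python
# Sketch of the BASE system: tune an edit-distance threshold on the training
# pairs, then label test pairs against it. The exact distance (characters vs.
# tokens, normalized or not) is an assumption here.

def tune_threshold(train_pairs, distance):
    """train_pairs: (S1, S2, 1/0) triples; returns the best threshold."""
    dists = [(distance(s1, s2), label) for s1, s2, label in train_pairs]
    best_t, best_acc = 0.0, -1.0
    for t, _ in dists:  # candidate thresholds: the observed distances
        acc = sum((d <= t) == bool(label) for d, label in dists) / len(dists)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def base_predict(s1, s2, distance, threshold):
    """Positive (paraphrases) iff the pair's distance is within the threshold."""
    return 1 if distance(s1, s2) <= threshold else 0
```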

  15. Results on MSR Corpus
[Bar chart: Accuracy, Precision, Recall, and F-measure (roughly 60–90%) for BASE, INIT, INIT+WN, and INIT+WN+DEP.]

  16. Best Known Results
[Bar chart: Accuracy, Precision, Recall, and F-measure (roughly 60–95%) comparing INIT+WN+DEP with the best known systems: FINCH, ZHANG, QIU, and WAN.]

  17. INIT Performs Well Too!
[Bar chart: Accuracy, Precision, Recall, and F-measure (roughly 60–90%) comparing INIT with INIT+WN+DEP and WAN.]

  18. Conclusions
• INIT is competitive with the best known systems, while using fewer resources.
  – Useful for languages where WordNet, reliable dependency parsers, etc. are unavailable.
• INIT+WN and INIT+WN+DEP perform even better, but they require more resources and the improvement is small.
  – The differences may be small because of the high lexical overlap of the paraphrases in the MSR corpus.

  19. Thank you! Questions?
