
Social Media & Text Analysis, Lecture 7: Paraphrase



  1. Social Media & Text Analysis, Lecture 7: Paraphrase Identification and Linear Regression. CSE 5539-0010, Ohio State University. Instructor: Alan Ritter. Website: socialmedia-class.org

  2. (Recap) What is a paraphrase? “sentences or phrases that convey approximately the same meaning using different words” — (Bhagat & Hovy, 2012)

  3. (Recap) What is a paraphrase? Word-level example: rich ↔ wealthy.

  4. (Recap) What is a paraphrase? Phrase-level example: His Majesty’s address ↔ the king’s speech.

  5. (Recap) What is a paraphrase? Sentence-level example: “… after Boeing Co. Chief Executive Harry Stonecipher was ousted from …” ↔ “… the forced resignation of the CEO of Boeing, Harry Stonecipher, for …”

  6. The Ideal

  7. (Recap) Paraphrase Research: [timeline figure, 1980s to 2016, of paraphrase resources (WordNet, Novels, News, Bi-Text, Video, Style, Web, Twitter, Simple) with work by Callison-Burch, Dolan, Grishman, Cherry, Napoles, Ji, Ritter, and Xu]

  8. Distributional Similarity: Lin and Pantel (2001) operationalize the Distributional Hypothesis, using dependency relationships to define similar environments. “Duty” and “responsibility” share a similar set of dependency contexts in large volumes of text. Modified by adjectives: additional, administrative, assigned, assumed, collective, congressional, constitutional, ... Objects of verbs: assert, assign, assume, attend to, avoid, become, breach, ... Dekang Lin and Patrick Pantel. “DIRT - Discovery of Inference Rules from Text” In KDD (2001)
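The idea on this slide can be sketched in a few lines: represent each word by counts of the dependency contexts it appears in, and compare words by cosine similarity over those counts. A minimal sketch; the context tuples and counts below are invented for illustration, not taken from the DIRT data:

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two sparse context-count vectors (dicts).
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical dependency-context counts: context = (relation, other word).
duty = {("amod", "administrative"): 12, ("amod", "constitutional"): 7,
        ("dobj", "assume"): 9, ("dobj", "breach"): 4}
responsibility = {("amod", "administrative"): 10, ("amod", "collective"): 6,
                  ("dobj", "assume"): 11, ("dobj", "avoid"): 3}
banana = {("amod", "ripe"): 8, ("dobj", "peel"): 5}

# Words sharing many contexts score high; unrelated words score 0 here.
print(cosine(duty, responsibility), cosine(duty, banana))
```

Words that occur in similar dependency environments get high similarity, which is exactly the Distributional Hypothesis made operational.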

  9. Bilingual Pivoting: word alignment links an English phrase to a foreign phrase, and that foreign phrase back to other English phrases. “... 5 farmers were thrown into jail in Ireland ...” aligns “thrown into jail” with German “festgenommen” (“... fünf Landwirte festgenommen , weil ...”), and “... or have been imprisoned , tortured ...” aligns “imprisoned” with the same “festgenommen” (“... oder wurden festgenommen , gefoltert ...”), so “thrown into jail” and “imprisoned” are linked as paraphrases. Source: Chris Callison-Burch

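Bilingual pivoting can be made concrete with the paraphrase-probability formulation of Bannard and Callison-Burch (2005): p(e2 | e1) = Σ_f p(e2 | f) · p(f | e1), summing over foreign pivot phrases f. A minimal sketch; the alignment counts below are invented, not from a real bitext:

```python
from collections import defaultdict

# Invented aligned-phrase counts from a hypothetical English-German bitext:
# counts[(english, german)] = number of times the phrase pair was aligned.
counts = {("thrown into jail", "festgenommen"): 3,
          ("imprisoned", "festgenommen"): 5,
          ("arrested", "festgenommen"): 7,
          ("thrown into jail", "ins gefaengnis geworfen"): 2}

e_total = defaultdict(int)
f_total = defaultdict(int)
for (e, f), c in counts.items():
    e_total[e] += c
    f_total[f] += c

def p_f_given_e(f, e):
    return counts.get((e, f), 0) / e_total[e]

def p_e_given_f(e, f):
    return counts.get((e, f), 0) / f_total[f]

def paraphrase_prob(e2, e1):
    # p(e2 | e1) = sum over pivot phrases f of p(e2 | f) * p(f | e1)
    pivots = {f for (e, f) in counts if e == e1}
    return sum(p_e_given_f(e2, f) * p_f_given_e(f, e1) for f in pivots)

print(paraphrase_prob("imprisoned", "thrown into jail"))  # ≈ 0.2
```

Pivoting through the shared German phrase assigns “imprisoned” a nonzero paraphrase probability for “thrown into jail” even though the two English phrases share no words.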

  14. Key Limitations of PPDB?

  15. Key Limitations of PPDB? Word sense: paraphrases of “bug” mix several senses, e.g. insect (insect, beetle, microbe, virus, pest, mosquito, bacterium, fly, germ, parasite), listening device (microphone, tracker, mic, wire, earpiece, cookie), annoy (bother, annoy, pester), informant (squealer, snitch, rat, mole), and software error (glitch, error, malfunction, fault, failure). Source: Chris Callison-Burch

  16. Another Key Limitation: [same paraphrase research timeline figure] these resources contain only paraphrases, no non-paraphrases.

  17. Paraphrase Identification: obtain sentential paraphrases automatically. Yes: “Mancini has been sacked by Manchester City” / “Mancini gets the boot from Man City”. No: “WORLD OF JENKS IS ON AT 11” / “World of Jenks is my favorite show on tv”. (Meaningful) non-paraphrases are needed to train classifiers! Wei Xu, Alan Ritter, Chris Callison-Burch, Bill Dolan, Yangfeng Ji. “Extracting Lexically Divergent Paraphrases from Twitter” In TACL (2014)

  18. Also Non-Paraphrases: [same paraphrase research timeline figure] (meaningful) non-paraphrases are needed to train classifiers!

  19. News Paraphrase Corpus Microsoft Research Paraphrase Corpus also contains some non-paraphrases (Dolan, Quirk and Brockett, 2004; Dolan and Brockett, 2005; Brockett and Dolan, 2005)

  20. Twitter Paraphrase Corpus also contains a lot of non-paraphrases Wei Xu , Alan Ritter, Chris Callison-Burch, Bill Dolan, Yangfeng Ji. “Extracting Lexically Divergent Paraphrases from Twitter” In TACL (2014)

  21. Paraphrase Identification: A Binary Classification Problem. Input: a sentence pair x; a fixed set of binary classes Y = {0, 1}. Output: a predicted class y ∈ Y (y = 0 or y = 1).

  22. Paraphrase Identification: A Binary Classification Problem. Input: a sentence pair x; a fixed set of binary classes Y = {0, 1}, where 0 is the negative class (non-paraphrases). Output: a predicted class y ∈ Y (y = 0 or y = 1).

  23. Paraphrase Identification: A Binary Classification Problem. Input: a sentence pair x; a fixed set of binary classes Y = {0, 1}, where 1 is the positive class (paraphrases) and 0 is the negative class (non-paraphrases). Output: a predicted class y ∈ Y (y = 0 or y = 1).


  25. Classification Method: Supervised Machine Learning. Input: a sentence pair x; a fixed set of binary classes Y = {0, 1}; a training set of m hand-labeled sentence pairs (x⁽¹⁾, y⁽¹⁾), …, (x⁽ᵐ⁾, y⁽ᵐ⁾). Output: a learned classifier δ : x → y ∈ Y (y = 0 or y = 1).

  26. Classification Method: Supervised Machine Learning. Input: a sentence pair x (represented by features); a fixed set of binary classes Y = {0, 1}; a training set of m hand-labeled sentence pairs (x⁽¹⁾, y⁽¹⁾), …, (x⁽ᵐ⁾, y⁽ᵐ⁾). Output: a learned classifier δ : x → y ∈ Y (y = 0 or y = 1).

  27. (Recap) Classification Method: Supervised Machine Learning. Naïve Bayes, Logistic Regression, Support Vector Machines (SVM), …

  28. (Recap) Naïve Bayes. Cons: features tᵢ are assumed independent given the class y: P(t₁, t₂, ..., tₙ | y) = P(t₁ | y) · P(t₂ | y) · ... · P(tₙ | y). This causes problems: correlated features lead to double-counted evidence, because the parameters are estimated independently, which hurts the classifier’s accuracy.
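The double-counting problem is easy to see with a toy calculation (the probabilities below are invented): if two binary features are perfectly correlated, Naïve Bayes still multiplies in both likelihoods, so the same evidence is counted twice and the posterior becomes overconfident:

```python
# Two binary features that are perfectly correlated carry the SAME evidence,
# but Naive Bayes multiplies in each likelihood independently.
p_t_given_y1 = 0.9   # assumed P(t=1 | y=1), same for both features
p_t_given_y0 = 0.2   # assumed P(t=1 | y=0)
prior = 0.5          # P(y=1) = P(y=0)

def posterior(n_copies):
    # P(y=1 | the same feature value multiplied in n_copies times)
    like1 = (p_t_given_y1 ** n_copies) * prior
    like0 = (p_t_given_y0 ** n_copies) * prior
    return like1 / (like1 + like0)

print(posterior(1))  # counting the evidence once:  ≈ 0.818
print(posterior(2))  # double-counted evidence:     ≈ 0.953
```

The correct posterior is the n=1 value; multiplying in the duplicated feature inflates it, which is exactly the accuracy problem the slide describes.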

  29. Classification Method: Supervised Machine Learning. Naïve Bayes, Logistic Regression, Support Vector Machines (SVM), …

  30. Logistic Regression: one of the most useful supervised machine learning algorithms for classification! Generally high performance on a wide range of problems, and much more robust than Naïve Bayes (better performance on various datasets).
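As a concrete sketch of the model (not the lecture’s own code), here is logistic regression trained from scratch with stochastic gradient descent on the log loss, using a single invented feature (number of words two sentences share) and made-up labels:

```python
import math

# Hypothetical training set: x = #words in common, y = 1 (paraphrase) / 0 (not).
data = [(1, 0), (2, 0), (3, 0), (8, 1), (10, 1), (12, 1)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):                 # stochastic gradient descent on log loss
    for x, y in data:
        p = sigmoid(w * x + b)       # predicted P(y=1 | x)
        w -= lr * (p - y) * x        # gradient of the log loss w.r.t. w
        b -= lr * (p - y)            # gradient of the log loss w.r.t. b

# Low overlap gets a probability near 0, high overlap a probability near 1.
print(sigmoid(w * 1 + b), sigmoid(w * 10 + b))
```

Unlike Naïve Bayes, the weights are fit jointly rather than estimated independently per feature, which is one source of its robustness to correlated evidence.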

  31. Before Logistic Regression: let’s start with something simpler!

  32. Paraphrase Identification: Simplified Features. We use only one feature: the number of words that the two sentences share in common.
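This single feature is easy to compute. A minimal sketch, assuming case-folded whitespace tokenization (the lecture does not specify a tokenizer):

```python
def word_overlap(s1, s2):
    # Number of distinct words the two sentences share (case-insensitive).
    return len(set(s1.lower().split()) & set(s2.lower().split()))

# The paraphrase pair from the earlier slide shares "mancini" and "city".
print(word_overlap("Mancini has been sacked by Manchester City",
                   "Mancini gets the boot from Man City"))  # 2
```

Note how low this count is even for a true paraphrase pair: lexically divergent paraphrases are exactly why a single overlap feature is only a starting point.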

  33. A closely related problem to Paraphrase Identification: Semantic Textual Similarity. How similar (close in meaning) are two sentences? 5: completely equivalent in meaning. 4: mostly equivalent, but some unimportant details differ. 3: roughly equivalent, but some important information differs or is missing. 2: not equivalent, but share some details. 1: not equivalent, but on the same topic. 0: completely dissimilar.

  34. A Simpler Model: Linear Regression. [Scatter plot: human-rated sentence similarity (0–5) on the y-axis vs. number of words in common (the feature, 0–20) on the x-axis.]

  35. A Simpler Model: Linear Regression. [Same scatter plot.] Also supervised learning (learn from annotated data), but for regression: predict a real-valued output (classification: predict a discrete-valued output).

  36. A Simpler Model: Linear Regression. [Same scatter plot, with a threshold line: thresholding the regression output turns it into classification.] Also supervised learning (learn from labeled data), but for regression: predict a real-valued output (classification: predict a discrete-valued output).
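The fitted line and the threshold can be sketched with ordinary least squares, which has a closed form for a single feature. The (overlap, similarity) pairs below are invented to mimic the scatter plot, not real annotations:

```python
# Invented (word-overlap, human-similarity) training pairs.
data = [(1, 0.5), (3, 1.0), (5, 2.0), (8, 3.0), (12, 4.0), (15, 4.5)]

# Ordinary least squares for y = w*x + b, closed form for one feature.
n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n
w = (sum((x - mean_x) * (y - mean_y) for x, y in data)
     / sum((x - mean_x) ** 2 for x, _ in data))
b = mean_y - w * mean_x

def predict(x):
    return w * x + b                 # regression: real-valued similarity

def classify(x, threshold=2.5):
    return 1 if predict(x) >= threshold else 0   # threshold -> classification

print(classify(2), classify(14))  # 0 1
```

This is the slide’s point in code: the same fitted model does regression (predict) until a threshold converts its real-valued output into a binary paraphrase decision (classify).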
