Exploiting and Expanding Corpus Resources for Frame-Semantic Parsing



  1. Exploiting and Expanding Corpus Resources for Frame-Semantic Parsing. Nathan Schneider, CMU (with Chris Dyer & Noah A. Smith). April 26, 2013 ■ IFNW’13

  2. FrameNet + NLP = <3 • We want to develop systems that understand text • Frame semantics and FrameNet offer a linguistically & computationally satisfying theory/representation for semantic relations

  3. Roadmap • A frame-semantic parser • Multiword expressions • Simplifying annotation for syntax + semantics

  4–9. Frame-semantic parsing: SemEval Task 19 [Baker, Ellsworth, & Erk 2007] • Given a text sentence, analyze its frame semantics. Mark: ‣ words/phrases that are lexical units ‣ the frame evoked by each LU ‣ frame elements (role–argument pairings) • Analysis is in terms of groups of tokens. No assumption that we know the syntax. (A toy illustration of the output follows below.)
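
To make the task output concrete, here is a toy sketch of what one sentence's analysis might look like. The representation (a list of dicts with token-offset spans) is invented for illustration and is not SEMAFOR's actual format; the frame and role names follow standard FrameNet usage for give.v.

```python
# Hypothetical representation of a frame-semantic analysis (not
# SEMAFOR's internal format). Spans are [start, end) token offsets.

sentence = "Kim gave the book to Pat".split()

analysis = [
    {
        "target": (1, 2),         # "gave": the frame-evoking word
        "lu": "give.v",           # its lexical unit
        "frame": "Giving",        # the frame evoked by the LU
        "frame_elements": {       # role -> argument span pairings
            "Donor": (0, 1),      # "Kim"
            "Theme": (2, 4),      # "the book"
            "Recipient": (4, 6),  # "to Pat"
        },
    },
]
```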

  10–19. SEMAFOR [Das, Schneider, Chen, & Smith 2010] (figure-only build slides; no extractable text)

  20–24. SEMAFOR [Das, Schneider, Chen, & Smith 2010] • SEMAFOR consists of a pipeline: preprocessing → target identification → frame identification → argument identification (see the skeleton sketched below) • Preprocessing: syntactic parsing • Heuristics + 2 statistical models • Trained/tuned on English FrameNet’s full-text annotations
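
A minimal runnable skeleton of the pipeline, assuming nothing beyond the stage ordering stated on the slide; every function body below is a placeholder stub, and none of the names are SEMAFOR's own.

```python
# Skeleton of the pipeline described above; all bodies are stubs.

def preprocess(tokens):
    """Syntactic preprocessing (in SEMAFOR: dependency parsing)."""
    return {"tokens": tokens}  # stub: a real system returns a parse

def identify_targets(syntax):
    """Mark spans likely to be frame-evoking lexical units."""
    return [(i, i + 1) for i, t in enumerate(syntax["tokens"]) if t.isalpha()]

def identify_frame(target, syntax):
    """Choose a frame for a target (a statistical model in SEMAFOR)."""
    return "SOME_FRAME"  # stub

def identify_arguments(target, frame, syntax):
    """Fill the frame's roles with argument spans (also statistical)."""
    return {}  # stub

def parse_frames(tokens):
    syntax = preprocess(tokens)
    analyses = []
    for target in identify_targets(syntax):
        frame = identify_frame(target, syntax)
        args = identify_arguments(target, frame, syntax)
        analyses.append((target, frame, args))
    return analyses

print(parse_frames("Kim gave the book to Pat".split()))
```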

  25. Full-text Annotations: https://framenet.icsi.berkeley.edu/fndrupal/index.php?q=fulltextIndex

  26. Full-text annotations (figure slide)

  27–30. SEMAFOR [Das, Schneider, Chen, & Smith 2010] • SEMAFOR’s models consist of features over observable parts of the sentence (words, lemmas, POS tags, dependency edges & paths) that may be predictive of frame/role labels (illustrated below) • Full-text annotations serve as training data for (semi)supervised learning • Extensive body of work on semantic role labeling [starting with Gildea & Jurafsky 2002 for FrameNet; also much work for PropBank]
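
As an illustration of the kind of feature the slide describes, here is a sketch of indicator features conjoining observable properties of a target with a candidate frame. The templates are invented for exposition; SEMAFOR's actual feature set is far larger and more elaborate.

```python
# Illustrative indicator features for frame identification: observable
# properties of the target (word, lemma, POS, dependency edge to its
# head) conjoined with a candidate frame label.

def frame_id_features(word, lemma, pos, head_dep, frame):
    return {
        f"word={word}|frame={frame}": 1.0,
        f"lemma={lemma}|frame={frame}": 1.0,
        f"pos={pos}|frame={frame}": 1.0,
        f"head_dep={head_dep}|frame={frame}": 1.0,
    }

# e.g., features for scoring the frame Giving for the target "gave":
print(frame_id_features("gave", "give", "VBD", "root", "Giving"))
```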

  31–34. SEMAFOR [Das, Schneider, Chen, & Smith 2010; Das et al. 2013 to appear] • State-of-the-art performance on the SemEval’07 evaluation (outperforms the best system from the task, Johansson & Nugues 2007) • F1 scores, where [F] = frame identification, [A] = argument identification, [F → A] = full pipeline: on SE’07, [F] 74%, [A] 68%, [F → A] 46%; on FN 1.5, [F] 91%, [A] 80%, [F → A] 69% • BUT: this task is really hard. Room for improvement at all stages.
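
For readers unfamiliar with the scoring: the bracketed numbers are F1 measures at each stage. The official SemEval'07 scorer is more elaborate (it gives partial credit via frame relations), but the bare exact-match precision/recall/F1 skeleton looks like this:

```python
# Exact-match F1 over predicted vs. gold (span, label) pairs. The real
# SemEval'07 scorer additionally gives partial credit using frame
# relations; this is only the bare-bones version.

def f1(predicted, gold):
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)                     # true positives
    p = tp / len(predicted) if predicted else 0.0  # precision
    r = tp / len(gold) if gold else 0.0            # recall
    return 2 * p * r / (p + r) if p + r else 0.0

gold = {((1, 2), "Giving"), ((4, 5), "Self_motion")}
pred = {((1, 2), "Giving"), ((4, 5), "Placing")}
print(f1(pred, gold))  # 0.5
```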

  35. SEMAFOR Demo: http://demo.ark.cs.cmu.edu/parse

  36–37. How to improve? • Better modeling with current resources? • Ways to use non-FrameNet resources? • Create new resources? (pictured: Dipanjan Das, Sam Thomson)

  38. Better Modeling? • We already have over a million features. • Better use of syntactic parsers (e.g., better argument span heuristics, considering alternative parses, constituent parsers) • Recall-oriented learning? [Mohit et al. 2012 for NER] • Better search in decoding [Das, Martins, & Smith 2012] • Joint frame ID & argument ID?

  39. Use Other Resources? • FN 1.5 has just 3k sentences / 20k targets in full-text annotations ⇒ data sparseness • Semisupervised learning: reasoning about unseen predicates with distributional similarity [Das & Smith 2011] (a simplified sketch follows below) • NER? Supersense tagging? • Use PropBank → FrameNet mappings to get more training data?
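
A simplified stand-in for the distributional-similarity idea: guess a frame for an unseen predicate from its most similar known predicate. Das & Smith (2011) actually use graph-based label propagation over a predicate similarity graph; the toy vectors and nearest-neighbor rule below are only meant to convey the intuition.

```python
import math

# Toy context-count vectors for predicates (invented data).
vectors = {
    "give":   {"to": 5, "money": 3, "gift": 2},
    "donate": {"to": 4, "money": 4, "charity": 3},
    "walk":   {"dog": 4, "park": 3},
}
known_frames = {"give": "Giving", "walk": "Self_motion"}

def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm = math.sqrt(sum(x * x for x in u.values())) \
         * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def guess_frame(unseen):
    """Nearest-neighbor frame guess for a predicate with no annotations."""
    best = max(known_frames, key=lambda p: cosine(vectors[unseen], vectors[p]))
    return known_frames[best]

print(guess_frame("donate"))  # -> "Giving" (distributionally closest to "give")
```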

  40–42. Roadmap (reprise) • A frame-semantic parser • Multiword expressions • Simplifying annotation for new resources: syntax + semantics

  43. Multiword Expressions. Examples of multiword LUs: Christmas Day.n, German measles.n, along with.prep, also_known_as.a, armed forces.n, bear arms.v, beat up.v, double-check.v; and, in the frame Losing_it: lose it.v, go ballistic.v, flip out.v, blow cool.v, freak out.v

  44. Multiword Expressions • 926 unique multiword LUs in the FrameNet lexicon ‣ 545 w/ space, 222 w/ underscore, 177 w/ hyphen ‣ 361 frames have an LU containing a space, underscore, or hyphen • Support constructions like ‘take a walk’: only the N should be frame-evoking [Calzolari et al. 2002] (a baseline lookup is sketched below)
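
One obvious baseline for spotting such LUs is greedy longest-match lookup against the lexicon's multiword entries. The sketch below uses a toy lexicon and token-exact matching; real MWEs can be inflected or gappy ("take a short break"), which is exactly what makes the problem hard.

```python
# Greedy longest-match against a toy lexicon of contiguous MWEs.
# Token-exact matching only: inflection and gaps are ignored here.

MWE_LEXICON = {("freak", "out"), ("armed", "forces"), ("along", "with")}
MAX_LEN = max(len(m) for m in MWE_LEXICON)

def find_mwes(tokens):
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in MWE_LEXICON:
                found.append((i, i + n))  # [start, end) span
                i += n - 1                # skip past the matched unit
                break
        i += 1
    return found

print(find_mwes("the armed forces freak out".split()))  # [(1, 3), (3, 5)]
```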

  45–49. (figure slides: example parses involving MWEs, annotated ✓/✗)

  50–52. ✗ ...even though take break.v is listed as an LU! (probably not in training data)

  53–57. • There has been a lot of work on specific kinds of MWEs (e.g., noun-noun compounds, phrasal verbs) [Baldwin & Kim 2010] ‣ Special datasets, tasks, tools • Can MWE identification be formulated in an open-ended annotate-and-model fashion? (one common reduction is sketched below) ‣ Linguistic challenge: understanding and guiding annotators’ intuitions
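
One common way to cast open-ended MWE identification as a modeling problem is sequence tagging with a BIO-style encoding of multiword units. This is a generic illustration, not necessarily the scheme used in this project (gappy MWEs, for instance, would require a richer tag set).

```python
# BIO encoding of contiguous multiword units for sequence tagging.

def spans_to_bio(n_tokens, mwe_spans):
    tags = ["O"] * n_tokens          # O = outside any MWE
    for start, end in mwe_spans:
        tags[start] = "B"            # B = begins an MWE
        for i in range(start + 1, end):
            tags[i] = "I"            # I = continues an MWE
    return tags

tokens = "the armed forces freak out".split()
print(list(zip(tokens, spans_to_bio(len(tokens), [(1, 3), (3, 5)]))))
# [('the','O'), ('armed','B'), ('forces','I'), ('freak','B'), ('out','I')]
```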

  58. MWE Annotation • We are annotating the 50k-word Reviews portion of the English Web Treebank with multiword units (MWEs + NEs)

  59. MWE Annotation (figure slide)
