Improving Information Extraction by Acquiring External Evidence - PowerPoint PPT Presentation


  1. Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning. Karthik Narasimhan, Adam Yala, Regina Barzilay (CSAIL, MIT)

  2. Information Extraction: State of the Art • Dependence on large training sets (ACE: 300K words; Freebase: 24M relations), which are not available for many domains (e.g., medicine, crime) • Even large corpora do not guarantee high performance: ~75% F1 on relation extraction (ACE), ~58% F1 on event extraction (ACE)

  3. A hard reading task for you • Task: Identify food carcinogens • Text: "Coffee significantly reduced ER and cyclin D1 abundance in ER(+) cells … Coffee reduced the pAkt levels in both ER(+) and ER(-) cells."

  4. A hard reading task for you • Task: Identify food carcinogens • Text: "Coffee significantly reduced ER and cyclin D1 abundance in ER(+) cells … Coffee reduced the pAkt levels in both ER(+) and ER(-) cells." • Is coffee a carcinogen?

  5. A hard reading task for machines: IE • Text: "A 2 year old girl and four other people were wounded in a shooting in West Englewood Thursday night, police said" • Extraction (NumWounded): four

  6. A hard reading task: IE (not always!) • Text: "A 2 year old girl and four other people were wounded in a shooting in West Englewood Thursday night, police said" • Extraction (NumWounded): four • Text: "The last shooting left five people wounded." • Extraction (NumWounded): five

  7. Incorporate External Evidence • Traditional formulation: extract + reason • Our approach: extract from extra articles, then aggregate (agg.)

  8. Challenges 1. Event Coreference: several irrelevant articles! 2. Reconciling Predictions: inconsistent extractions, e.g. (Shooter: Scott Westerhuis, NumKilled: 4, Location: S.D) vs. (Shooter: Scott Westerhuis, NumKilled: 6, Location: Platte)

  9. Learning through Reinforcement • Start with traditional extraction system: Original → extract → (Shooter: Scott Westerhuis, NumKilled: 4, Location: S.D)

  10. Learning through Reinforcement • Original → extract → (Shooter: Scott Westerhuis, NumKilled: 4, Location: S.D) • Perform a query and extract from a new article: query → extract → (Shooter: Scott Westerhuis, NumKilled: 6, Location: Platte)

  11. Learning through Reinforcement • State = Current + New • Current: Original → extract → (Shooter: Scott Westerhuis, NumKilled: 4, Location: S.D) • New: search → extract → (Shooter: Scott Westerhuis, NumKilled: 6, Location: Platte)

  12. RL: State • Curr (value, Conf): Shooter: Scott Westerhuis (0.3), NumKilled: 4 (0.2), Location: S.D (0.1) • New (value, Conf): Shooter: Scott Westerhuis (0.4), NumKilled: 6 (0.6), Location: Platte (0.3)

  13. RL: State (build) • State features: currentConf = (0.3, 0.2, 0.1)

  14. RL: State (build) • State features: currentConf = (0.3, 0.2, 0.1); newConf = (0.4, 0.6, 0.3)

  15. RL: State (build) • State features: currentConf = (0.3, 0.2, 0.1); newConf = (0.4, 0.6, 0.3); matches = (1, 0, 0)

  16. RL: State (build) • State features: currentConf = (0.3, 0.2, 0.1); newConf = (0.4, 0.6, 0.3); matches = (1, 0, 0); docSim = 0.65

  17. RL: State • Curr (value, Conf): Shooter: Scott Westerhuis (0.3), NumKilled: 4 (0.2), Location: S.D (0.1) • New (value, Conf): Shooter: Scott Westerhuis (0.4), NumKilled: 6 (0.6), Location: Platte (0.3) • State features: currentConf = (0.3, 0.2, 0.1); newConf = (0.4, 0.6, 0.3); matches = (1, 0, 0); docSim = 0.65; context = (1, 0, .., 0, 0)
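The state features named on the slides above (currentConf, newConf, matches, docSim, context) can be pictured as one flat vector. The following is a minimal sketch, not the authors' implementation: the function name `build_state`, the entity order, and the four-element context placeholder are assumptions made for illustration; the numeric values are the ones shown on the slide.

```python
from typing import Dict, List

# Entity order is an assumption; the slide lists Shooter, NumKilled, Location.
ENTITIES = ["Shooter", "NumKilled", "Location"]


def build_state(
    cur_vals: Dict[str, str],
    cur_conf: Dict[str, float],
    new_vals: Dict[str, str],
    new_conf: Dict[str, float],
    doc_sim: float,
    context: List[float],
) -> List[float]:
    """Concatenate the slide's feature groups into one flat state vector."""
    state: List[float] = []
    state += [cur_conf[e] for e in ENTITIES]   # currentConf
    state += [new_conf[e] for e in ENTITIES]   # newConf
    # matches: 1 if current and new values agree for an entity, else 0
    state += [1.0 if cur_vals[e] == new_vals[e] else 0.0 for e in ENTITIES]
    state.append(doc_sim)                      # docSim
    state += context                           # context features
    return state


cur_vals = {"Shooter": "Scott Westerhuis", "NumKilled": "4", "Location": "S.D"}
new_vals = {"Shooter": "Scott Westerhuis", "NumKilled": "6", "Location": "Platte"}
cur_conf = {"Shooter": 0.3, "NumKilled": 0.2, "Location": 0.1}
new_conf = {"Shooter": 0.4, "NumKilled": 0.6, "Location": 0.3}

# The context vector here is a hypothetical placeholder; the slide elides it.
state = build_state(cur_vals, cur_conf, new_vals, new_conf, 0.65, [1.0, 0.0, 0.0, 0.0])
```

With these inputs the matches group comes out as (1, 0, 0), agreeing with the slide: only the Shooter value is the same in both extractions.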

  18. RL: Actions 1. Reconcile (d) old values and new values: pick a single value, all values or no value from the new set • Example (State 1): Curr (Shooter: Scott Westerhuis, NumKilled: 4, Location: S.D) and New (Shooter: Scott Westerhuis, NumKilled: 6, Location: Platte) → reconcile → (Shooter: Scott Westerhuis, NumKilled: 6, Location: S.D)

  19. RL: Actions 2. Decide how to proceed: Stop • Example (State 1): the reconciled values (Shooter: Scott Westerhuis, NumKilled: 6, Location: S.D) become Final

  20. RL: Actions 2. Decide how to proceed: Select next query (q) • Example: State 1 → reconcile → select q → search → extract → State 2, whose New extraction is (Shooter: Westerhuis, NumKilled: 4, Location: Platte)
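The reconcile decision described on the slides above (pick a single value, all values, or no value from the new set) can be sketched as a small function. This is an illustrative sketch only: the function name `reconcile` and the string encoding of decisions ("all", "none", or an entity name) are assumptions, not the authors' interface.

```python
from typing import Dict

Extraction = Dict[str, str]


def reconcile(current: Extraction, new: Extraction, decision: str) -> Extraction:
    """Apply one reconcile decision.

    decision is "none" (keep all current values), "all" (take every new
    value), or an entity name (take only that entity's new value).
    """
    if decision == "none":
        return dict(current)
    if decision == "all":
        return dict(new)
    if decision in new:
        merged = dict(current)
        merged[decision] = new[decision]
        return merged
    raise ValueError(f"unknown reconcile decision: {decision!r}")


current = {"Shooter": "Scott Westerhuis", "NumKilled": "4", "Location": "S.D"}
new = {"Shooter": "Scott Westerhuis", "NumKilled": "6", "Location": "Platte"}

# Accepting only NumKilled reproduces the reconciled values on the slide.
reconciled = reconcile(current, new, "NumKilled")
```

With three entities this encoding yields five reconcile decisions (three single-entity choices plus "all" and "none"); the second action, stopping or choosing the next query, is a separate decision.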

  21. Queries • Query templates are induced automatically from: the title of the original article; content words having high mutual information with gold values • Templates: <title>; <title> + ( suspect | shooter | said | men | arrested | … ); <title> + ( injured | wounded | victims | shot | … )
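The query templates above can be expanded mechanically once a title and keyword groups are available. A minimal sketch follows; the function name `build_queries` and the example title are hypothetical, and the keyword groups contain only the words visible on the slide (the elided "…" entries are left out rather than guessed).

```python
from typing import List, Sequence


def build_queries(title: str, keyword_groups: Sequence[Sequence[str]]) -> List[str]:
    """Return the bare title plus one query per keyword group."""
    queries = [title]
    for group in keyword_groups:
        # Join alternatives with "|" as written in the slide's templates.
        queries.append(f"{title} ({' | '.join(group)})")
    return queries


# Keyword groups as far as the slide shows them.
SHOOTER_WORDS = ["suspect", "shooter", "said", "men", "arrested"]
VICTIM_WORDS = ["injured", "wounded", "victims", "shot"]

# Hypothetical article title, used only for illustration.
queries = build_queries("Family dies in house fire", [SHOOTER_WORDS, VICTIM_WORDS])
```

How the resulting string is passed to a search engine (for example, whether the alternatives are sent as an OR query) is not stated on the slide.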

  22. Rewards • Change in accuracy: $R(s, a) = \sum_{\text{entity } j} \left[ \mathrm{Acc}(e^j_{\text{cur}}) - \mathrm{Acc}(e^j_{\text{prev}}) \right]$ • Example: Previous Values (Shooter: Scott Westerhuis, NumKilled: 6, NumWounded: 0, Location: Platte) → Current Values (Shooter: Scott Westerhuis, NumKilled: 6, NumWounded: 1, Location: Platte) gives a reward of 1 • Small penalty for each transition
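The reward on the slide above (change in accuracy summed over entities, minus a small per-transition penalty) can be sketched as follows. This is an illustration under stated assumptions: accuracy is taken as exact match against a gold value, the gold values are assumed for the example, and the penalty size `step_penalty` is a placeholder because the slide does not give it.

```python
from typing import Dict

Extraction = Dict[str, str]


def accuracy(value: str, gold: str) -> float:
    """Exact-match accuracy for one entity (an assumption of this sketch)."""
    return 1.0 if value == gold else 0.0


def reward(prev: Extraction, cur: Extraction, gold: Extraction,
           step_penalty: float = 0.001) -> float:
    """Sum over entities of Acc(cur) - Acc(prev), minus a per-step penalty."""
    gain = sum(accuracy(cur[e], gold[e]) - accuracy(prev[e], gold[e]) for e in gold)
    return gain - step_penalty


prev = {"Shooter": "Scott Westerhuis", "NumKilled": "6",
        "NumWounded": "0", "Location": "Platte"}
cur = {"Shooter": "Scott Westerhuis", "NumKilled": "6",
       "NumWounded": "1", "Location": "Platte"}
# Assumed gold values: the slide's example implies NumWounded = 1 is correct.
gold = dict(cur)

r = reward(prev, cur, gold, step_penalty=0.0)
```

With the penalty set to zero, only NumWounded changes from incorrect to correct, so the reward equals the accuracy gain of one entity.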

  23. Deep Q-Network • State space is continuous: requires function approximation, Q(s, a) ≈ Q(s, a; θ), with outputs for the reconcile and query actions • Trained to maximize cumulative reward
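To make Q(s, a) ≈ Q(s, a; θ) concrete, the sketch below uses a linear approximator with a standard one-step Q-learning update. It is deliberately not a deep network and is not the authors' model: the class name `LinearQ`, the learning rate, and the discount factor are assumptions chosen so the example stays small and dependency-free.

```python
from typing import List


class LinearQ:
    """Q(s, a; theta) = theta[a] . s, one weight row per action."""

    def __init__(self, n_features: int, n_actions: int) -> None:
        self.theta = [[0.0] * n_features for _ in range(n_actions)]

    def q_values(self, state: List[float]) -> List[float]:
        return [sum(w * x for w, x in zip(row, state)) for row in self.theta]

    def update(self, state: List[float], action: int, reward: float,
               next_state: List[float], done: bool,
               lr: float = 0.1, gamma: float = 0.9) -> float:
        """One Q-learning step; returns the temporal-difference error."""
        target = reward
        if not done:
            # Bootstrap from the best action value in the next state.
            target += gamma * max(self.q_values(next_state))
        td_error = target - self.q_values(state)[action]
        for i, x in enumerate(state):
            self.theta[action][i] += lr * td_error * x
        return td_error


q = LinearQ(n_features=2, n_actions=2)
td = q.update([1.0, 0.0], action=0, reward=1.0, next_state=[0.0, 1.0], done=True)
```

A deep Q-network replaces the linear map with a neural network but is trained toward the same kind of target, the observed reward plus the discounted best next-state value.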

  24. Acquiring External Evidence 1. Select a query to search for articles on the same event 2. Use base extractor to obtain values for entities of interest, e.g. (Shooter: Scott Westerhuis, NumKilled: 6, Location: Platte) 3. Reconcile old and new extractions, e.g. old (Shooter: Scott Westerhuis, NumKilled: 4, Location: S.D) with new (Shooter: Scott Westerhuis, NumKilled: 6, Location: Platte)
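The three steps above form a loop that can be sketched with pluggable components. Everything here is a hypothetical skeleton: `acquire_evidence` and its callback parameters are names introduced for illustration, and the search, extractor, and policies are stubbed with lambdas rather than real systems.

```python
from typing import Callable, Dict, Optional

Extraction = Dict[str, str]


def acquire_evidence(
    current: Extraction,
    select_query: Callable[[Extraction], Optional[str]],
    search: Callable[[str], str],
    extract: Callable[[str], Extraction],
    reconcile: Callable[[Extraction, Extraction], Extraction],
    max_steps: int = 10,
) -> Extraction:
    """Repeat select-query, extract, reconcile until the policy stops."""
    for _ in range(max_steps):
        query = select_query(current)      # step 1: choose a query, or None to stop
        if query is None:
            break
        article = search(query)            # retrieve an article on the same event
        new = extract(article)             # step 2: run the base extractor
        current = reconcile(current, new)  # step 3: merge old and new values
    return current


original = {"Shooter": "Scott Westerhuis", "NumKilled": "4", "Location": "S.D"}
new_extraction = {"Shooter": "Scott Westerhuis", "NumKilled": "6", "Location": "Platte"}

# Stub policy: issue one query, then stop.
planned = iter(["<title>", None])
result = acquire_evidence(
    original,
    select_query=lambda cur: next(planned),
    search=lambda q: "text of a retrieved article",
    extract=lambda article: new_extraction,
    reconcile=lambda cur, new: dict(new),  # stub: accept all new values
)
```

In the approach the slides describe, the query selection and reconcile choices would come from the learned agent instead of these fixed stubs.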

  25. Related Work • Open Information Extraction (Etzioni et al., 2011; Fader et al., 2011; Wu and Weld, 2010)

  26. Related Work • Open Information Extraction (Etzioni et al., 2011; Fader et al., 2011; Wu and Weld, 2010) • Slot filling (Surdeanu et al., 2010; Ji and Grishman, 2011)

  27. Related Work • Open Information Extraction (Etzioni et al., 2011; Fader et al., 2011; Wu and Weld, 2010) • Slot filling (Surdeanu et al., 2010; Ji and Grishman, 2011) • Searching for additional sources on the web (Banko et al., 2002; West et al., 2014; Kanani and McCallum, 2012)

  28. Datasets 1. Mass shootings in the United States • Entities: Shooter Name, Num Killed, Num Wounded, City • Source articles: 306 (Train), 292 (Test), 66 (Dev) • Downloaded articles: 8k (Train), 7.9k (Test), 1.6k (Dev)

  29. Datasets 2. Adulteration events from Foodshield EMA • Entities: Food, Adulterant, Location • Source articles: 292 (Train), 148 (Test), 42 (Dev) • Downloaded articles: 7.6k (Train), 5.3k (Test), 1.5k (Dev)

  30. Base Extraction Model • Maximum entropy model with contextual features (Chieu and Ng, 2002; Bunescu et al., 2005) • Indirect supervision: Project database values onto articles

  31. Baselines (1) Simple Aggregation systems: • Confidence-based: Choose entity value with highest confidence (Skounakis and Craven, 2003) • Example: Original (Shooter: Scott Westerhuis 0.3, NumKilled: 4 0.2, Location: S.D 0.1); Extra (Shooter: Scott Westerhuis 0.4, NumKilled: 6 0.6, Location: Platte 0.3) and (Shooter: Scott Westerhuis 0.7, NumKilled: 6 0.2, Location: S.D 0.1) → Final (Shooter: Scott Westerhuis, NumKilled: 6, Location: Platte)
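Confidence-based aggregation, as described on the slide above, keeps for each entity the value with the highest confidence across all articles. A minimal sketch, with the function name `confidence_aggregate` and the (value, confidence) tuple layout chosen for illustration; the pairing of confidences with the third extraction follows one reading of the slide's example and should be treated as an assumption.

```python
from typing import Dict, List, Tuple

Scored = Dict[str, Tuple[str, float]]  # entity -> (value, confidence)


def confidence_aggregate(extractions: List[Scored]) -> Dict[str, str]:
    """For each entity keep the value with the highest confidence."""
    best: Scored = {}
    for extraction in extractions:
        for entity, (value, conf) in extraction.items():
            # Strict ">" means the earliest extraction wins ties.
            if entity not in best or conf > best[entity][1]:
                best[entity] = (value, conf)
    return {entity: value for entity, (value, _) in best.items()}


extractions = [
    {"Shooter": ("Scott Westerhuis", 0.3), "NumKilled": ("4", 0.2), "Location": ("S.D", 0.1)},
    {"Shooter": ("Scott Westerhuis", 0.4), "NumKilled": ("6", 0.6), "Location": ("Platte", 0.3)},
    {"Shooter": ("Scott Westerhuis", 0.7), "NumKilled": ("6", 0.2), "Location": ("S.D", 0.1)},
]
final = confidence_aggregate(extractions)
```

Here Location resolves to Platte because its single 0.3 score beats both 0.1 scores for S.D, even though S.D appears more often.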

  32. Baselines (1) Simple Aggregation systems: • Majority-based: Choose entity value extracted the most from all articles on the event (Skounakis and Craven, 2003) • Example: Original (Shooter: Scott Westerhuis, NumKilled: 4, Location: S.D); Extra (Shooter: Scott Westerhuis, NumKilled: 6, Location: Platte) and (Shooter: Scott Westerhuis, NumKilled: 6, Location: S.D) → Final (Shooter: Scott Westerhuis, NumKilled: 6, Location: S.D)
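Majority-based aggregation, as described on the slide above, picks the value extracted most often for each entity. A minimal sketch using `collections.Counter`; the function name `majority_aggregate` is hypothetical, and the assignment of locations to the two extra articles follows one reading of the slide's example.

```python
from collections import Counter
from typing import Dict, List

Extraction = Dict[str, str]


def majority_aggregate(extractions: List[Extraction]) -> Extraction:
    """For each entity return the most frequently extracted value."""
    result: Extraction = {}
    for entity in extractions[0]:
        counts = Counter(ext[entity] for ext in extractions if entity in ext)
        # most_common(1) yields the top (value, count) pair.
        result[entity] = counts.most_common(1)[0][0]
    return result


extractions = [
    {"Shooter": "Scott Westerhuis", "NumKilled": "4", "Location": "S.D"},     # original
    {"Shooter": "Scott Westerhuis", "NumKilled": "6", "Location": "Platte"},  # extra
    {"Shooter": "Scott Westerhuis", "NumKilled": "6", "Location": "S.D"},     # extra
]
final = majority_aggregate(extractions)
```

On the same three extractions, this rule keeps Location as S.D (two votes against one), whereas a confidence-based rule can prefer a value that appears only once.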

  33. Baselines (2) Meta-classifier: • Same input space S and set of reconciliation decisions as RL agent • Examples (Original; Extra → Reconciled): (Shooter: Scott Westerhuis, NumKilled: 4, Location: S.D); (Shooter: Scott Westerhuis, NumKilled: 6, Location: Platte) → (Shooter: Scott Westerhuis, NumKilled: 6, Location: Platte) • (Shooter: Westerhuis, NumKilled: 0, Location: S.D); (Shooter: Westerhuis, NumKilled: 4, Location: Platte) → (Shooter: Scott Westerhuis, NumKilled: 4, Location: Platte) • (Shooter: Scott, NumKilled: 2, Location: S.D); (Shooter: Scott Westerhuis, NumKilled: 4, Location: S.D) → (Shooter: Scott Westerhuis, NumKilled: 2, Location: S.D)

  34. Baselines (2) Meta-classifier: • Same input space S and set of reconciliation decisions as RL agent • Examples (Original; Extra → Reconciled): (Shooter: Scott Westerhuis, NumKilled: 4, Location: S.D); (Shooter: Scott Westerhuis, NumKilled: 6, Location: Platte) → (Shooter: Scott Westerhuis, NumKilled: 6, Location: Platte) • (Shooter: Westerhuis, NumKilled: 0, Location: S.D); (Shooter: Westerhuis, NumKilled: 4, Location: Platte) → (Shooter: Scott Westerhuis, NumKilled: 4, Location: Platte) • (Shooter: Scott, NumKilled: 2, Location: S.D); (Shooter: Scott Westerhuis, NumKilled: 4, Location: S.D) → (Shooter: Scott Westerhuis, NumKilled: 2, Location: S.D) • Confidence agg. over the reconciled values → Final (Shooter: Scott Westerhuis, NumKilled: 6, Location: Platte)

  35. Accuracy (Shootings) • [Bar chart: accuracy on NumKilled, y-axis from 60 to 80; systems: Maxent, Confidence Agg., Meta-Classifier, RL-Extract; bar values shown: 77.6, 70.7, 70.3, 69.7]

