
Hybrid Information Extraction • PD Dr. Günter Neumann, DFKI GmbH



  1. Hybrid Information Extraction • PD Dr. Günter Neumann, DFKI GmbH • Tuesday, 8 February 2011

  2. Hybrid • A system is hybrid if it consists of different technologies, where • the technologies can be combined • each one is a solution in its own right • their integration adds innovative value to the whole system

  3. Examples

  4. Examples • hybrid engine

  5. Examples • hybrid engine • Human-Machine

  6. Examples • hybrid engine • Hybrid Language Processing • Human-Machine

  7. Information Extraction • The aim of information extraction (IE) is to identify and structure domain-specific information in free text while skipping irrelevant information. • What counts as relevant is given to the system in the form of predefined domain-specific annotations, lexicon entries, or rules.

  8. Example: news about turnover

  9. Example: news about turnover • turnover(Company, Year, Manner, Amount, Tendency, Difference)

  10. Example: news about turnover • turnover(Company, Year, Manner, Amount, Tendency, Difference) • News text (translated from the German original): "A mixture of growing services business, cost cuts, and successful acquisitions brought competitor IBM markedly improved results in the second quarter. Between April and June, revenue rose by 10% to $21.6 billion and net profit to $1.7 billion. Special charges of $1.4 billion had depressed the previous year's profit to $56 million."
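The turnover template can be read as a record whose slots are filled from the news text. The following is a minimal sketch of that idea, not the system shown in the talk: the `Turnover` class, the one-entry company gazetteer, and the single regular-expression rule are all hypothetical and written only for this one sentence of the German original. Slots the rule cannot fill (Year, Manner) are left empty rather than guessed.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Turnover:
    """Slots of turnover(Company, Year, Manner, Amount, Tendency, Difference)."""
    company: Optional[str] = None
    year: Optional[str] = None
    manner: Optional[str] = None
    amount: Optional[str] = None
    tendency: Optional[str] = None
    difference: Optional[str] = None


# Hypothetical one-entry gazetteer (an assumption, for illustration only).
COMPANIES = {"IBM"}

# Hypothetical hand-written rule: "<verb> der Umsatz um <difference> auf <amount>".
RULE = re.compile(
    r"(?P<verb>stieg\w*|sank\w*)\s+der\s+Umsatz\s+um\s+"
    r"(?P<difference>\d+(?:,\d+)?\s*%)\s+auf\s+"
    r"(?P<amount>\d+(?:,\d+)?\s*(?:Mrd|Mill)\.\s*\$)"
)


def extract_turnover(text: str) -> Optional[Turnover]:
    """Fill the template from text; return None if the rule does not match."""
    match = RULE.search(text)
    if match is None:
        return None
    # First token that appears in the gazetteer, if any.
    company = next(
        (tok for tok in re.findall(r"\w+", text) if tok in COMPANIES), None
    )
    # "stieg..." (rose) maps to "+", "sank..." (fell) maps to "-".
    tendency = "+" if match.group("verb").startswith("stieg") else "-"
    return Turnover(
        company=company,
        amount=match.group("amount"),
        tendency=tendency,
        difference=match.group("difference"),
    )


# First two sentences of the German news text from the slide.
TEXT = (
    "Eine Mixtur aus wachsendem Dienstleistungsgeschäft, Kostensenkungen und "
    "erfolgreichen Akquisitionen brachte Wettbewerber IBM im zweiten Quartal "
    "deutlich verbesserte Ergebnisse. Zwischen April und Juni stiegen der "
    "Umsatz um 10% auf 21,6 Mrd.$ und der Reingewinn auf 1,7 Mrd.$."
)

result = extract_turnover(TEXT)
print(result)
```

Everything outside the matched pattern is skipped, which mirrors the definition of IE given earlier: relevant information is structured, irrelevant text is ignored.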

  13. IE - History • Early IE systems were mainly rule-based (with rules written manually or learned), and the underlying methodology was specialized for specific applications; cf. the MUC systems of the 1990s. • One result of the MUC challenges was a systematic division of labor into IE subtasks: • Named-Entity Extraction (NER) • Relation Entity Extraction (REE) • Event Entity Extraction (EEE) • Coreferential analysis

  14. IE - History • Example sentence: "The founder of Microsoft, Bill Gates, lives in Seattle, Washington, which is also the place of the company's headquarters."

  15. IE - History • Annotation on the example sentence: Bill Gates is a Person

  16. IE - History • Annotation on the example sentence: Microsoft is a Company

  17. IE - History • Annotation on the example sentence: Seattle is a Location

  18. IE - History • Annotation on the example sentence: founder_of

  19. IE - History • Annotation on the example sentence: lives_in

  20. IE - History • Annotation on the example sentence: hq_located_in
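The annotations built up on the example sentence can be represented as typed entity spans plus named relations between them. The sketch below is one possible representation, not the format used by any system in the talk: the `Entity` and `Relation` classes and the `mark` helper are hypothetical, and the relation arguments are read off the example sentence, since the slides show only the relation labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A typed span in the sentence (output of the NER subtask)."""
    text: str
    label: str
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive


@dataclass(frozen=True)
class Relation:
    """A named, directed relation between two entities (REE subtask)."""
    name: str
    head: Entity
    tail: Entity


SENTENCE = (
    "The founder of Microsoft, Bill Gates, lives in Seattle, Washington, "
    "which is also the place of the company's headquarters."
)


def mark(text: str, label: str) -> Entity:
    """Hypothetical helper: annotate the first occurrence of text by hand."""
    start = SENTENCE.index(text)
    return Entity(text, label, start, start + len(text))


# Entity annotations from the slides.
gates = mark("Bill Gates", "Person")
microsoft = mark("Microsoft", "Company")
seattle = mark("Seattle", "Location")

# Relation labels from the slides; the argument pairs are an assumption
# based on what the sentence says.
relations = [
    Relation("founder_of", gates, microsoft),
    Relation("lives_in", gates, seattle),
    Relation("hq_located_in", microsoft, seattle),
]

for rel in relations:
    print(f"{rel.name}({rel.head.text}, {rel.tail.text})")
```

Note that hq_located_in depends on resolving "the company" to Microsoft, which is where the coreferential analysis subtask comes in.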

  21. IE - the Present • There are knowledge-based IE (KIE) and statistical IE (SIE). • SIE is the state of the art in research, KIE in industry. • A number of different strategies exist for the various IE subtasks: • from simple gazetteers to complex ontologies • from supervised through minimally supervised to unsupervised machine learning algorithms • Recently, the research focus has been on NER, REE, Web-based IE, scalability, domain adaptivity, ... • Open question: Which method is actually better suited to which text source, domain, and application?

  22. Hybrid IE • Methods and strategies for combining different IE components and analyzing their plausibility. • What are possible combinations?

  23. Multi-Strategy

  24. Multi-Strategy • Diagram: four IE components

  25. Multi-Strategy • Diagram: a Combiner and four IE components

  27. Example: NER • Diagram: a Combiner and four NER components: LingPipe, OpenNLP, BiQue, Sprout

  28. Example: NER • Tokens: Word1 Word2 Word3 Word4 Word5 • Annotations proposed over the tokens: LOC 2, PER 3, LOC 3, ORG 4, LOC 5 • Problem: - Ambiguities - Bracketing • Combiner and components as before: LingPipe, OpenNLP, BiQue, Sprout

  29. Example: NER • Tokens, annotations, and problem as before • Solutions: - meta-learning - consider each IE i as an independent black box

  30. Example: NER • Tokens, annotations, and problem as before • Good news:* Hybrid NER systems are better than the single NER systems with respect to recall and precision. • * G. Sigletos et al., Combining Information Extraction Systems Using Voting and Stacked Generalization, J. Mach. Learn. Res., Vol. 6 (2005), pp. 1751-1782.
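One simple way to combine NER components treated as independent black boxes is per-token majority voting. The sketch below illustrates only that idea, under stated assumptions: the function name, the system names, and the label sequences are hypothetical, ties fall back to the "outside" label "O", and it is neither the combiner from the talk nor the stacked-generalization method of the cited paper. It also works token by token, so it does not address the bracketing problem of overlapping multi-token spans.

```python
from collections import Counter
from typing import Dict, List


def majority_vote(predictions: Dict[str, List[str]], default: str = "O") -> List[str]:
    """Combine per-token labels from several NER systems by majority vote.

    predictions maps a system name to its label sequence; all sequences
    must cover the same tokens. A tie for first place yields `default`.
    """
    if not predictions:
        raise ValueError("need at least one system")
    sequences = list(predictions.values())
    n_tokens = len(sequences[0])
    if any(len(seq) != n_tokens for seq in sequences):
        raise ValueError("all systems must label the same tokens")

    combined = []
    for i in range(n_tokens):
        votes = Counter(seq[i] for seq in sequences)
        ranked = votes.most_common()  # sorted by count, highest first
        top_label, top_count = ranked[0]
        if len(ranked) > 1 and ranked[1][1] == top_count:
            combined.append(default)  # ambiguous: no clear winner
        else:
            combined.append(top_label)
    return combined


# Hypothetical outputs of three black-box systems over five tokens.
predictions = {
    "system_a": ["O", "LOC", "LOC", "O", "O"],
    "system_b": ["O", "LOC", "PER", "O", "O"],
    "system_c": ["O", "O", "PER", "ORG", "O"],
}

combined = majority_vote(predictions)
print(combined)
```

A meta-learning combiner would replace the fixed voting rule with a classifier trained on the component outputs; the black-box assumption stays the same, since only the labels of each component are used.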
