Weakly-Supervised Acquisition of Labeled Class Instances for Open-Domain Information Extraction

  1. Weakly-Supervised Acquisition of Labeled Class Instances for Open-Domain Information Extraction. Partha Pratim Talukdar (UPenn), Joseph Reisinger (UT Austin), Marius Paşca (Google), Deepak Ravichandran (Google), Rahul Bhagat (USC), Fernando Pereira (Google). Work done at Google during Summer 2008.

  7. Motivation
     • (Class, Instance) pairs, e.g. (pain killer, aspirin), can be useful in many applications, e.g. web search.
     • Given an entity/instance, it is often desirable to know its type.
     • A limited number of classes is not enough:
       • Web search queries include active volcanoes like Kilauea, zoonotic diseases like monkeypox, etc., demonstrating general user interest in them.
       • Covering one class at a time (as in standard Named Entity Extraction) is resource-intensive and not sufficient.
     • Open-domain extraction involving a large number of classes and a large number of instances is needed.

  11. Previous Work
      • Named Entity Extraction: small number of classes, extensive supervision.
      • (Van Durme and Pasca, AAAI 08): open-domain extraction with high precision but low recall; precision drops fast with increasing recall.
      • Our starting point: extractions from (Van Durme and Pasca, 2008).

      | Class | Size | Examples of Instances |
      | --- | --- | --- |
      | Book Publishers | 70 | Crown Publishing, Kluwer Academic, Prentice Hall, Puffin, . . . |

  18. Objectives
      Starting with such automatically extracted (class, instance) pairs:
      • Extract additional instances for existing classes.
      • Identify additional class labels for existing instances.
      • Handle initial pairs from diverse sources and methods.
      • Require minimal human supervision.
      • Do all of this in a scalable manner.
      • Increase coverage (recall) at comparable quality (precision)!

  24. Where do we get instances from?
      • A8: extractions from unstructured text by (Van Durme and Pasca, AAAI 08).
      • WebTables (Cafarella et al., VLDB 2008):
        • 154M HTML tables extracted from the web.
        • Rich source of instances, already segmented by webpage creators.
        • Structured text.

  25. Assigning class labels to WebTable instances
      [Figure: a WebTable with columns Year, Artist, Albums, whose cells include Johnny Cash and Bob Dylan, shown alongside the A8 class musician, whose instances include Bob Dylan and Johnny Cash. Resulting score: Score(musician, Johnny Cash) = 0.87.]

  27. Putting together tuples from first phase extractors
      • A graph-based representation is used: each tuple from A8 and WebTable is a weighted edge, with nodes representing classes and instances.
      [Figure: example graph with instance nodes Bob Dylan, Johnny Cash, and Billy Joel, class nodes musician and singer, and edge weights 0.95, 0.87, 0.82, 0.73, 0.75.]
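The graph representation on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_graph`, the `("class", ...)` / `("instance", ...)` node keys, and the rule of keeping the higher score when both extractors emit the same pair are hypothetical and not taken from the slides. The pairing of weights to edges is also assumed, since the extracted slide text does not preserve it (only the musician and Johnny Cash score of 0.87 is stated explicitly on slide 25).

```python
from collections import defaultdict


def build_graph(tuples):
    """Build an undirected weighted graph from (class, instance, score) tuples.

    Nodes are tagged as ("class", name) or ("instance", name) so that a class
    and an instance with the same surface string stay distinct.
    """
    graph = defaultdict(dict)
    for cls, inst, score in tuples:
        c_node, i_node = ("class", cls), ("instance", inst)
        # Assumption: if the same pair arrives from both extractors,
        # keep the higher of the two scores.
        weight = max(score, graph[c_node].get(i_node, 0.0))
        graph[c_node][i_node] = weight
        graph[i_node][c_node] = weight
    return dict(graph)


# Edge assignments below are assumed for illustration.
tuples = [
    ("musician", "Bob Dylan", 0.95),
    ("musician", "Johnny Cash", 0.87),
    ("musician", "Billy Joel", 0.82),
    ("singer", "Johnny Cash", 0.73),
    ("singer", "Billy Joel", 0.75),
]
graph = build_graph(tuples)
```

Storing each edge in both directions keeps neighbor lookups cheap for any later propagation step, at the cost of doubling the edge storage.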

  28. Initialization: Seed Labels Marked
      [Figure: the same graph as on the previous slide, with seed labels marked: musician 1.0 and singer 1.0.]
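A minimal sketch of the seed initialization step follows. The helper `init_labels` and its data layout are hypothetical. Which nodes carry the two seed labels is also an assumption here (Bob Dylan for musician, Billy Joel for singer), because the extracted slide text does not make the attachment unambiguous.

```python
def init_labels(nodes, seeds):
    """Return a label distribution per node.

    Seed nodes start with their injected label and score; every other node
    starts with an empty distribution.
    """
    labels = {node: {} for node in nodes}
    for node, (label, score) in seeds.items():
        if node not in labels:
            raise KeyError(f"seed node not in graph: {node}")
        labels[node][label] = score
    return labels


nodes = ["Bob Dylan", "Johnny Cash", "Billy Joel", "musician", "singer"]
# Assumed seed placement, for illustration only.
seeds = {
    "Bob Dylan": ("musician", 1.0),
    "Billy Joel": ("singer", 1.0),
}
labels = init_labels(nodes, seeds)
```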

  29. Label Propagation: Adsorption (Baluja et al., 2008)
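To give a feel for what label propagation over such a graph does, here is a deliberately simplified sketch. It is not the Adsorption algorithm of Baluja et al. (2008), whose details are not given on this slide. It is a plain neighbor-averaging scheme with clamped seeds, used as a stand-in. The function names, the edge assignments, and the seed placement are all assumptions made for illustration.

```python
def make_graph(edges):
    """Build a symmetric adjacency map from (node_a, node_b, weight) triples."""
    graph = {}
    for a, b, w in edges:
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    return graph


def propagate(graph, seeds, iterations=10):
    """Simplified label propagation (NOT full Adsorption).

    Each non-seed node repeatedly takes the weight-normalized average of its
    neighbors' label scores; seed nodes are clamped to their seed labels.
    """
    labels = {node: dict(seeds.get(node, {})) for node in graph}
    for _ in range(iterations):
        updated = {}
        for node, neighbors in graph.items():
            if node in seeds:
                updated[node] = dict(seeds[node])  # keep seeds fixed
                continue
            total = sum(neighbors.values())
            scores = {}
            for nbr, weight in neighbors.items():
                for label, score in labels[nbr].items():
                    scores[label] = scores.get(label, 0.0) + weight * score / total
            updated[node] = scores
        labels = updated
    return labels


# Assumed edges and seeds, for illustration only.
edges = [
    ("musician", "Bob Dylan", 0.95),
    ("musician", "Johnny Cash", 0.87),
    ("musician", "Billy Joel", 0.82),
    ("singer", "Johnny Cash", 0.73),
    ("singer", "Billy Joel", 0.75),
]
seeds = {"Bob Dylan": {"musician": 1.0}, "Billy Joel": {"singer": 1.0}}
result = propagate(make_graph(edges), seeds)
```

In this toy run, the unlabeled instance Johnny Cash ends up with nonzero scores for both musician and singer, which is the kind of additional (class, instance) pair the objectives call for.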
