
Towards an Efficient Combination of Similarity Measures for Semantic Relation Extraction


  1. Introduction Methodology Evaluation Results Conclusion and Further Research Towards an Efficient Combination of Similarity Measures for Semantic Relation Extraction Alexander Panchenko alexander.panchenko@student.uclouvain.be Université catholique de Louvain & Bauman Moscow State Technical University 5th December 2011 / CLAIM Seminar, BMSTU Alexander Panchenko 1/30

  2. Plan
     1. Introduction
     2. Methodology
     3. Evaluation
     4. Results
     5. Conclusion and Further Research

  3. Reference Papers
     - Panchenko A. Method for Automatic Construction of Semantic Relations Between Concepts of an Information Retrieval Thesaurus. In: Herald of the Voronezh State University, Series "Systems Analysis and Information Technologies", vol. 2, pages 131–139, 2011. http://www.vestnik.vsu.ru/program/view/view.asp?sec=analiz&year=2010&num=02&f_name=2010-02-26
     - Panchenko A. Comparison of the Knowledge-, Corpus-, and Web-based Similarity Measures for Semantic Relations Extraction. In: Proceedings of the GEMS 2011 Workshop on Geometrical Models of Natural Language Semantics, EMNLP 2011, pages 11–21, 2011. http://aclweb.org/anthology/W/W11/W11-2502.pdf
     - Panchenko A. Towards an Efficient Combination of Similarity Measures for Semantic Relation Extraction. Submitted to the Student Workshop of EACL 2012.

  4. Semantic Relations
     - $r = \langle c_i, t, c_j \rangle$ is a semantic relation, where $c_i, c_j \in C$ and $t \in T$.
     - $C$ is the set of terms, e.g. "radio" or "receiver operating characteristic".
     - $T$ is the set of semantic relation types, e.g. hyponymy or synonymy.
     - $R \subseteq C \times T \times C$ is the set of semantic relations.

  6. Semantic Relations Example: Thesaurus
     Figure: A part of the information retrieval thesaurus EuroVoc.
     $R = \{$
       $\langle$energy-generating product, NT, energy industry$\rangle$,
       $\langle$energy technology, NT, energy industry$\rangle$,
       $\langle$petroleum, RT, fossil fuel$\rangle$,
       $\langle$energy technology, RT, oil technology$\rangle$,
       ...
     $\}$
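One possible way to hold such a relation set in code is sketched below. It stores each relation as a (source, type, target) triple, following the definition $R \subseteq C \times T \times C$. The `Relation` class, its field names, and the `of_type` helper are hypothetical names chosen for illustration; the four triples are the ones listed on the slide.

```python
from typing import NamedTuple, Set


class Relation(NamedTuple):
    """A semantic relation <c_i, t, c_j> (field names are illustrative)."""
    source: str    # c_i
    rel_type: str  # t, e.g. "NT" (narrower term) or "RT" (related term)
    target: str    # c_j


# The EuroVoc fragment shown on the slide, as a set of triples.
R: Set[Relation] = {
    Relation("energy-generating product", "NT", "energy industry"),
    Relation("energy technology", "NT", "energy industry"),
    Relation("petroleum", "RT", "fossil fuel"),
    Relation("energy technology", "RT", "oil technology"),
}


def of_type(relations: Set[Relation], rel_type: str) -> Set[Relation]:
    """Select the relations of one type t from a relation set."""
    return {r for r in relations if r.rel_type == rel_type}


def terms(relations: Set[Relation]) -> Set[str]:
    """Collect the term set C that the relations mention."""
    return {r.source for r in relations} | {r.target for r in relations}
```

Using a set of immutable tuples mirrors the set-theoretic definition directly: duplicates collapse, and comparing an extracted set with a reference thesaurus reduces to set intersection.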

  7. General Problem: Automatic Thesaurus Construction
     Figure: A technology of automatic thesaurus construction.
     How is a thesaurus used?
     - Query expansion and query suggestion
     - Navigation and browsing of the corpus
     - Visualization of the corpus

  10. The Problem: Semantic Relations Extraction
      Input: terms $C$, semantic relation types $T$
      Output: lexico-semantic relations $\hat{R} \sim R$
      Pattern-based relation extraction, where patterns are built manually (Hearst, 1992) or semi-automatically (Snow, 2004):
      - (+) High precision
      - (–) Complexity and cost of pattern construction
      - (–) Patterns are highly task- and domain-dependent
      Similarity-based relation extraction (Philippovich and Prokhorov, 2002; Grefenstette, 1994; Curran and Moens, 2002):
      - (–) Less precise
      - (+) Little or no manual work
      - (+) More adaptive across domains

  15. Similarity-based Relation Extraction
      State of the art:
      - There exist many heterogeneous similarity measures based on corpora, knowledge, the web, definitions, etc.
      - Different measures provide complementary types of semantic information.
      - This suggests combining them.
      Research questions:
      - Which similarity measure is the best for relation extraction?
      - How can similarity measures be combined efficiently so as to improve relation extraction?

  16. The Key Contributions Up To Now
      - A protocol for evaluating similarity-based relation extraction
      - A comparison of 34 single measures
      - Two methods of combination: similarity fusion and relation fusion
      - Six best combinations that outperform the single measures

  20. Similarity-based Semantic Relations Extraction
      Semantic Relations Extraction Algorithm
      Input: terms $C$, similarity parameters $P$, threshold $k$, minimum similarity value $\gamma$
      Output: semantic relations $\hat{R}$ (unlabeled)
        1. $S \leftarrow \mathrm{sim}(C, P)$
        2. $S \leftarrow \mathrm{normalize}(S)$
        3. $\hat{R} \leftarrow \mathrm{threshold}(S, k, \gamma)$
        4. return $\hat{R}$
      where
      - sim: a similarity measure
      - normalize: similarity score normalization
      - threshold: kNN thresholding,
        $$\hat{R} = \bigcup_{i=1}^{|C|} \{ \langle c_i, t, c_j \rangle : c_j \in \text{top } k\% \text{ terms} \wedge s_{ij} \geq \gamma \}$$
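The four steps of the algorithm can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: cosine similarity over hypothetical term vectors stands in for the unspecified measure `sim`, min-max scaling stands in for `normalize`, and "top k% terms" is read as the ceil(k% of |C|) nearest neighbours of each term, excluding the term itself. The helper names and the toy vectors are invented for the example.

```python
import math

import numpy as np


def sim(vectors):
    """Pairwise cosine similarity (assumed stand-in for the slide's sim)."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


def normalize(s):
    """Min-max scale all scores into [0, 1] (assumed normalization)."""
    lo, hi = s.min(), s.max()
    if hi == lo:
        return np.zeros_like(s)
    return (s - lo) / (hi - lo)


def threshold(s, terms, k, gamma):
    """kNN thresholding: for each term keep its top k% neighbours with s_ij >= gamma."""
    n = len(terms)
    top = max(1, math.ceil(n * k / 100.0))  # assumption: at least one neighbour
    relations = set()
    for i in range(n):
        # Neighbours sorted by decreasing similarity, the term itself excluded.
        ranked = [j for j in np.argsort(-s[i], kind="stable") if j != i]
        for j in ranked[:top]:
            if s[i, j] >= gamma:
                relations.add((terms[i], terms[j]))  # unlabeled pair <c_i, c_j>
    return relations


def extract_relations(terms, vectors, k, gamma):
    """Steps 1-4 of the algorithm: sim, normalize, threshold, return."""
    s = sim(np.asarray(vectors, dtype=float))
    s = normalize(s)
    return threshold(s, terms, k, gamma)


# Hypothetical toy input: three terms with two-dimensional feature vectors.
toy_terms = ["car", "automobile", "banana"]
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
```

With these toy vectors, `extract_relations(toy_terms, toy_vectors, k=50, gamma=0.5)` keeps only the car/automobile pair in both directions, because every pair involving "banana" falls below $\gamma$. Lowering $\gamma$ admits weaker neighbours, while lowering $k$ caps how many neighbours each term may contribute.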

  22. Knowledge-based Measures (6)
      Data: semantic network WordNet 3.0, corpus SemCor.
      Variables:
      - $\mathrm{len}(c_i, c_j)$: length of the shortest path between terms $c_i$ and $c_j$
      - $\mathrm{len}(c_i, \mathrm{lcs}(c_i, c_j))$: length of the shortest path from $c_i$ to the lowest common subsumer (LCS) of $c_i$ and $c_j$
      - $\mathrm{len}(c_{root}, \mathrm{lcs}(c_i, c_j))$: length of the shortest path from the root term $c_{root}$ to the LCS of $c_i$ and $c_j$
      - $P(c)$: probability of the term $c$, estimated from a corpus
      - $P(\mathrm{lcs}(c_i, c_j))$: probability of the LCS of $c_i$ and $c_j$
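To make these variables concrete, the sketch below computes them on a tiny hand-made taxonomy instead of WordNet and SemCor. Everything in it is an assumption for illustration: the taxonomy is a tree in which each term has a single parent, path lengths are counted in edges, and $P(c)$ is estimated as the share of corpus occurrences that fall on $c$ or any of its descendants. The term names and counts are invented.

```python
# Hypothetical toy taxonomy: child -> parent (the root has parent None).
PARENT = {
    "entity": None,
    "vehicle": "entity",
    "car": "vehicle",
    "bicycle": "vehicle",
    "fruit": "entity",
    "banana": "fruit",
}

# Hypothetical corpus frequencies of the leaf terms.
COUNTS = {"car": 3, "bicycle": 1, "banana": 4}


def path_to_root(c):
    """Terms on the path from c up to the root, starting with c itself."""
    path = [c]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def lcs(ci, cj):
    """Lowest common subsumer: the deepest term that subsumes both ci and cj."""
    ancestors = set(path_to_root(ci))
    for node in path_to_root(cj):  # walks upward, so the first hit is the lowest
        if node in ancestors:
            return node
    raise ValueError("terms do not share a root")


def length(ci, cj):
    """len(ci, cj): number of edges on the shortest path, which passes through the LCS."""
    common = lcs(ci, cj)
    return path_to_root(ci).index(common) + path_to_root(cj).index(common)


def depth(c):
    """len(c_root, c): number of edges from the root down to c."""
    return len(path_to_root(c)) - 1


def probability(c, counts=COUNTS):
    """P(c): share of occurrences that fall on c or one of its descendants."""
    total = sum(counts.values())
    mass = sum(n for term, n in counts.items() if c in path_to_root(term))
    return mass / total
```

For "car" and "bicycle", the LCS is "vehicle", so `length("car", "bicycle")` is 2, `length("car", lcs("car", "bicycle"))` is 1, `depth(lcs("car", "bicycle"))` is 1, and `probability("vehicle")` is 0.5 with the counts above. In WordNet a term can have several senses and parents, so a real implementation would need to search over all of them.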
