Towards an Efficient Combination of Similarity Measures for Semantic Relation Extraction

Introduction Methodology Evaluation Results Conclusion and Further Research

Towards an Efficient Combination of Similarity Measures for Semantic Relation Extraction
Alexander Panchenko
alexander.panchenko@student.uclouvain.be
Université catholique de Louvain & Bauman Moscow State Technical University
5th December 2011 / CLAIM Seminar, BMSTU

Plan
1. Introduction
2. Methodology
3. Evaluation
4. Results
5. Conclusion and Further Research

Reference Papers
- Panchenko A. Method for Automatic Construction of Semantic Relations Between Concepts of an Information Retrieval Thesaurus // In Herald of the Voronezh State University. Series “Systems Analysis and Information Technologies”, vol. 2, pages 131–139, 2011. http://www.vestnik.vsu.ru/program/view/view.asp?sec=analiz&year=2010&num=02&f_name=2010-02-26
- Panchenko A. Comparison of the Knowledge-, Corpus-, and Web-based Similarity Measures for Semantic Relations Extraction // Proceedings of the GEMS 2011 Workshop on Geometrical Models of Natural Language Semantics, EMNLP 2011, pages 11–21, 2011. http://aclweb.org/anthology/W/W11/W11-2502.pdf
- Panchenko A. Towards an Efficient Combination of Similarity Measures for Semantic Relation Extraction // Submitted to the Student Workshop of EACL 2012.

Semantic Relations
- $r = \langle c_i, t, c_j \rangle$ – a semantic relation, where $c_i, c_j \in C$ and $t \in T$
- $C$ – terms, e.g. radio or receiver operating characteristic
- $T$ – semantic relation types, e.g. hyponymy or synonymy
- $R \subseteq C \times T \times C$ – set of semantic relations

Semantic Relations Example: Thesaurus
Figure: A part of the information retrieval thesaurus EuroVoc.
$R = \{$
- $\langle \text{energy-generating product}, \text{NT}, \text{energy industry} \rangle$
- $\langle \text{energy technology}, \text{NT}, \text{energy industry} \rangle$
- $\langle \text{petroleum}, \text{RT}, \text{fossil fuel} \rangle$
- $\langle \text{energy technology}, \text{RT}, \text{oil technology} \rangle$
- ... $\}$
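As an illustration only, the set of relations from the EuroVoc example can be held as a set of typed triples. The class name `Relation`, its field names, and the helper `related_terms` are hypothetical names introduced for this sketch; they do not come from the slides.

```python
from typing import NamedTuple, Optional, Set


class Relation(NamedTuple):
    """A semantic relation r = <c_i, t, c_j> (field names are illustrative)."""
    source: str    # c_i
    rel_type: str  # t, written as in the EuroVoc example ("NT", "RT")
    target: str    # c_j


# The four relations listed on the slide.
R: Set[Relation] = {
    Relation("energy-generating product", "NT", "energy industry"),
    Relation("energy technology", "NT", "energy industry"),
    Relation("petroleum", "RT", "fossil fuel"),
    Relation("energy technology", "RT", "oil technology"),
}


def related_terms(relations: Set[Relation], term: str,
                  rel_type: Optional[str] = None) -> Set[str]:
    """Return the targets of relations that start at `term`,
    optionally restricted to one relation type."""
    return {
        r.target
        for r in relations
        if r.source == term and (rel_type is None or r.rel_type == rel_type)
    }
```

For example, `related_terms(R, "energy technology", "RT")` selects only the RT-typed neighbour of that term.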

General Problem: Automatic Thesaurus Construction
Figure: A technology of automatic thesaurus construction.
How is a thesaurus used?
- Query expansion and query suggestion
- Navigation and browsing of the corpus
- Visualization of the corpus

The Problem
Semantic Relations Extraction
Input: terms $C$, semantic relation types $T$
Output: lexico-semantic relations $\hat{R} \sim R$

Pattern-based relation extraction, where patterns are built manually (Hearst, 1992) or semi-automatically (Snow, 2004)
(+) High precision
(–) Complexity and cost of pattern construction
(–) Patterns are highly task- and domain-dependent

Similarity-based relation extraction (Philippovich and Prokhorov, 2002; Grefenstette, 1994; Curran and Moens, 2002)
(–) Less precise
(+) Little or no manual work
(+) More adaptive across domains

Similarity-based Relation Extraction
State of the Art:
- There exist many heterogeneous similarity measures based on corpora, knowledge, the web, definitions, etc.
- Various measures provide complementary types of semantic information.
- This suggests combining them.
Research Questions:
- Which similarity measure is the best for relation extraction?
- How can similarity measures be combined efficiently so as to improve relation extraction?

The Key Contributions Up To Now
- A protocol for evaluating similarity-based relation extraction
- Comparison of 34 single measures
- Two methods of combination: similarity fusion and relation fusion
- Six best combinations that outperform single measures are found

Similarity-based Semantic Relations Extraction
Semantic Relations Extraction Algorithm
Input: terms $C$, similarity parameters $P$, threshold $k$, minimum similarity value $\gamma$
Output: semantic relations $\hat{R}$ (unlabeled)
1. $S \leftarrow \mathrm{sim}(C, P)$
2. $S \leftarrow \mathrm{normalize}(S)$
3. $\hat{R} \leftarrow \mathrm{threshold}(S, k, \gamma)$
4. return $\hat{R}$
where
- sim – a similarity measure
- normalize – similarity score normalization
- threshold – kNN thresholding:
$$\hat{R} = \bigcup_{i=1}^{|C|} \left\{ \langle c_i, t, c_j \rangle : c_j \in \text{top } k\% \text{ terms} \wedge s_{ij} \geq \gamma \right\}$$
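The three steps of the algorithm can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's implementation: the slides do not fix a particular `sim` or `normalize`, so cosine similarity over feature vectors and min-max scaling are used here as stand-ins, and "top k% terms" is read as the ceil(k% of the other terms) nearest neighbours of each term. The function names and the feature matrix `X` are hypothetical.

```python
import math

import numpy as np


def cosine_sim(X):
    """Stand-in for sim(C, P): pairwise cosine similarity of feature rows (assumption)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty vectors
    Xn = X / norms
    return Xn @ Xn.T


def normalize(S):
    """Stand-in for normalize(S): min-max scaling to [0, 1] (assumption)."""
    lo, hi = S.min(), S.max()
    if hi == lo:
        return np.zeros_like(S)
    return (S - lo) / (hi - lo)


def threshold(S, terms, k, gamma):
    """kNN thresholding: for each term c_i keep the top k% most similar
    terms c_j whose score s_ij is at least gamma. Relations are unlabeled pairs."""
    n = len(terms)
    top_n = max(1, math.ceil(k / 100 * (n - 1)))
    relations = set()
    for i in range(n):
        scores = S[i].copy()
        scores[i] = -np.inf  # a term is never its own neighbour
        nearest = np.argsort(-scores)[:top_n]
        for j in nearest:
            if scores[j] >= gamma:
                relations.add((terms[i], terms[int(j)]))
    return relations


def extract_relations(terms, X, k, gamma):
    """Steps 1-4 of the algorithm: sim, normalize, threshold, return."""
    S = cosine_sim(X)
    S = normalize(S)
    return threshold(S, terms, k, gamma)


# Toy data: three terms with hypothetical two-dimensional feature vectors.
terms = ["radio", "receiver", "banana"]
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
extracted = extract_relations(terms, X, k=50, gamma=0.5)
```

On this toy input only the pair radio/receiver passes both the top-k% filter and the minimum similarity $\gamma$, in both directions, because the thresholding is applied per term.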

Knowledge-based Measures (6)
Data: semantic network WordNet 3.0, corpus SemCor.
Variables:
- $len(c_i, c_j)$ – length of the shortest path between terms $c_i$ and $c_j$
- $len(c_i, lcs(c_i, c_j))$ – length of the shortest path from $c_i$ to the lowest common subsumer (LCS) of $c_i$ and $c_j$
- $len(c_{root}, lcs(c_i, c_j))$ – length of the shortest path from the root term $c_{root}$ to the LCS of $c_i$ and $c_j$
- $P(c)$ – probability of the term $c$, estimated from a corpus
- $P(lcs(c_i, c_j))$ – probability of the LCS of $c_i$ and $c_j$
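To make the variables above concrete, the sketch below computes them on a tiny hand-made taxonomy rather than on WordNet and SemCor. Everything in it is an assumption for illustration: the taxonomy, the term counts, the function names, the choice to measure path length in edges, and the estimate of $P(c)$ as the relative frequency of $c$ together with all terms it subsumes.

```python
# Hypothetical toy taxonomy: child -> parent; the root has parent None.
parent = {
    "entity": None,
    "device": "entity",
    "fuel": "entity",
    "radio": "device",
    "receiver": "device",
    "petroleum": "fuel",
}

# Hypothetical corpus counts of each term.
counts = {"radio": 6, "receiver": 2, "petroleum": 2}


def path_to_root(c):
    """List of terms from c up to the root, c included."""
    path = [c]
    while parent[c] is not None:
        c = parent[c]
        path.append(c)
    return path


def lcs(ci, cj):
    """Lowest common subsumer: the first ancestor of ci that also subsumes cj."""
    ancestors_j = set(path_to_root(cj))
    for a in path_to_root(ci):
        if a in ancestors_j:
            return a
    raise ValueError("terms do not share a root")


def length(ci, cj):
    """len(ci, cj) in edges; in a tree the shortest path passes through the LCS."""
    a = lcs(ci, cj)
    return path_to_root(ci).index(a) + path_to_root(cj).index(a)


def depth(c):
    """len(c_root, c): number of edges from the root to c."""
    return len(path_to_root(c)) - 1


def prob(c):
    """P(c): share of corpus occurrences of c or of any term it subsumes (assumption)."""
    total = sum(counts.values())
    subsumed = sum(n for term, n in counts.items() if c in path_to_root(term))
    return subsumed / total


common = lcs("radio", "receiver")               # "device"
len_ij = length("radio", "receiver")            # len(c_i, c_j)
len_i_lcs = length("radio", common)             # len(c_i, lcs(c_i, c_j))
len_root_lcs = depth(common)                    # len(c_root, lcs(c_i, c_j))
p_lcs = prob(common)                            # P(lcs(c_i, c_j))
```

With a real resource, the taxonomy would come from the WordNet hypernym hierarchy and the counts from SemCor, as stated on the slide.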
