Semantic Documents Relatedness using Concept Graph Representation
Date: 2016/07/12
Author: Yuan Ni et al., IBM Research, China
Source: ACM WSDM’16
Advisor: Jia-ling Koh
Speaker: Yi-hui Lee
WSDM 2016: The 9th ACM International Conference on Web Search and Data Mining
2
3
4
5
continuous vectors.
6
document through references to the entity in a knowledge base.
relationships among the concepts.
7
8
Spotlight or TagMe
9
and services in the economy.”
10
…
11
12
through the three kinds of association
13
in same contexts.
context association.
14
Concept
Sets of incoming links to the concept
Total number of concepts in the knowledge base
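The formula these labels annotated did not survive extraction. As a rough illustration only, the sketch below uses one well-known link-based relatedness measure (in the style of Milne and Witten) that takes exactly these inputs: the incoming-link sets of two concepts and the total number of concepts. Whether the slide used this exact form is an assumption, and the name `link_relatedness` is made up for this example.

```python
import math

def link_relatedness(in_a: set, in_b: set, total_concepts: int) -> float:
    """Relatedness of two concepts from their sets of incoming links.

    Assumed form (Milne-Witten style), not confirmed by the slide:
    1 - (log max(|A|,|B|) - log |A & B|) / (log N - log min(|A|,|B|)).
    """
    common = in_a & in_b
    if not common:
        return 0.0  # no shared incoming links: treat as unrelated
    big = max(len(in_a), len(in_b))
    small = min(len(in_a), len(in_b))
    if total_concepts <= small:
        raise ValueError("total_concepts must exceed the smaller link set")
    distance = (math.log(big) - math.log(len(common))) / (
        math.log(total_concepts) - math.log(small)
    )
    return max(0.0, 1.0 - distance)  # clamp so the score stays in [0, 1]

# Toy incoming-link sets, invented for illustration.
partial = link_relatedness({1, 2, 3, 4}, {3, 4, 5, 6}, total_concepts=1000)
```

Concepts that share most of their incoming links score close to 1, and concepts with no shared links score 0.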
taxonomy of categories.
associated with the same topic.
first finding the pairwise similarity between any two individual categories they belong to.
15
black politic president republic concept: Obama
Information content score for the taxonomy
Common ancestor of ci and cj with highest information content
information content of a node in the taxonomy.
the taxonomy backbone.
each category in the taxonomy.
higher is its information content.
16
I E
17
The depth of the category in the taxonomy
The maximum depth of the taxonomy
I
The number of descendants of a category in the taxonomy
The set of descendants
The set of all categories in the taxonomy
I
Also the number of instances that belong to a category
The number of instances that belong to the category
The total number of instances in DBpedia
E
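The information content formulas themselves are missing from the extracted text. The sketch below shows two common formulations that use the quantities listed above (descendant counts and instance counts), plus a similarity that takes the information content of the best common ancestor. The exact formulas, the function names, and the toy taxonomy are all assumptions made for illustration.

```python
import math

def intrinsic_ic(num_descendants: int, num_categories: int) -> float:
    """Structure-based IC (assumed form): fewer descendants means higher IC."""
    return 1.0 - math.log(num_descendants + 1) / math.log(num_categories)

def extrinsic_ic(num_instances: int, total_instances: int) -> float:
    """Instance-based IC (assumed form): -log of the category's share of instances."""
    return -math.log(num_instances / total_instances)

def common_ancestor_similarity(ci, cj, ancestors, ic):
    """IC of the common ancestor of ci and cj with the highest IC."""
    common = (ancestors[ci] | {ci}) & (ancestors[cj] | {cj})
    return max((ic[c] for c in common), default=0.0)

# Hypothetical five-category taxonomy: Thing > Person > Politician > President, Thing > Place.
ancestors = {
    "Thing": set(),
    "Person": {"Thing"},
    "Politician": {"Person", "Thing"},
    "President": {"Politician", "Person", "Thing"},
    "Place": {"Thing"},
}
descendants = {"Thing": 4, "Person": 2, "Politician": 1, "President": 0, "Place": 0}
ic = {c: intrinsic_ic(n, len(descendants)) for c, n in descendants.items()}

close = common_ancestor_similarity("President", "Politician", ancestors, ic)
far = common_ancestor_similarity("President", "Place", ancestors, ic)
```

In this toy taxonomy the root has an information content of 0 and the leaves have 1, so two categories that only share the root get a similarity of 0.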
defined as the average best similarity over all pairs.
and C2 = {c21, c22, … c2q} denote the respective sets of categories that m1 and m2 belong to.
18
The maximal pairwise similarity between c1i and any category in C2
The maximal pairwise similarity between c2j and any category in C1
The number of categories in C1, the set of categories that concept m1 belongs to
The number of categories in C2, the set of categories that concept m2 belongs to
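The "average best similarity" between two category sets can be sketched as follows. Dividing by the combined size of C1 and C2 is an assumption that fits the quantities listed above; the slide's exact normalization is not recoverable. `sim` stands for any pairwise category similarity, and the exact-match function used here is only a stand-in.

```python
def average_best_similarity(c1, c2, sim):
    """Average, over every category in C1 and C2, of its best match in the other set."""
    if not c1 or not c2:
        return 0.0
    best_for_c1 = [max(sim(a, b) for b in c2) for a in c1]
    best_for_c2 = [max(sim(b, a) for a in c1) for b in c2]
    return (sum(best_for_c1) + sum(best_for_c2)) / (len(c1) + len(c2))

def exact_match(a, b):
    """Stand-in pairwise similarity: 1.0 for identical categories, else 0.0."""
    return 1.0 if a == b else 0.0

# Invented category sets for two concepts m1 and m2.
score = average_best_similarity(["president", "politician"], ["president"], exact_match)
```

Here two of the three categories find a perfect match in the other set, so the score is 2/3.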
concepts and their various types of relationships.
labeled by predicates pred(e) that indicate the type of the relationship.
19
Frequent predicates represent a general, less significant relationship
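One way to turn this observation into edge weights is an inverse-frequency score, sketched below. The slide does not give the weighting formula, so the log form, the function name, and the example triples are all assumptions.

```python
import math
from collections import Counter

def predicate_weights(edges):
    """Map each predicate to log(total_edges / predicate_count).

    Assumed inverse-frequency weighting: rarer predicates get larger weights.
    """
    counts = Counter(pred for _source, pred, _target in edges)
    total = sum(counts.values())
    return {pred: math.log(total / n) for pred, n in counts.items()}

# Hypothetical (subject, predicate, object) triples, for illustration only.
edges = [
    ("Barack_Obama", "type", "Person"),
    ("Bill_Clinton", "type", "Person"),
    ("Hawaii", "type", "Place"),
    ("Barack_Obama", "birthPlace", "Hawaii"),
]
weights = predicate_weights(edges)
```

With these triples the rare predicate `birthPlace` receives a larger weight than the frequent predicate `type`.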
three relevance to the aspects of the document.
20
which suit different purposes, different graph properties of a node are considered for the evaluation of its centrality.
21
The set of concepts in the concept graph
Weight parameters
similarity between the Wikipedia page of the concept m and the given document d.
mention detection tool.
22
documents in terms of features of concept graphs.
23
neural network to represent concepts as continuous vectors.
Kappa Kappa Psi and Phi Beta Kappa and earned a Rhodes Scholarship to attend the Oxford University.
Kappa_Kappa_Psi and Phi_Beta_Kappa_Society and earned a Rhodes_Scholarship to attend the University_of_Oxford.
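The rewriting shown in the two example sentences above can be sketched as a lookup from detected mentions to concept identifiers. The dictionary below is a hand-written stand-in for the output of a mention detection tool, and `replace_mentions` is a name invented for this example.

```python
import re

# Hand-written mention-to-concept table (assumption: a real pipeline would
# obtain these links from a mention detection tool).
MENTION_TO_CONCEPT = {
    "Kappa Kappa Psi": "Kappa_Kappa_Psi",
    "Phi Beta Kappa": "Phi_Beta_Kappa_Society",
    "Rhodes Scholarship": "Rhodes_Scholarship",
    "Oxford University": "University_of_Oxford",
}

def replace_mentions(text: str, mapping: dict) -> str:
    """Replace every known mention with its concept identifier in one pass."""
    if not mapping:
        return text
    # Try longer mentions first so a short mention cannot split a longer one.
    mentions = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(m) for m in mentions))
    return pattern.sub(lambda match: mapping[match.group(0)], text)

sentence = (
    "Kappa Kappa Psi and Phi Beta Kappa and earned a Rhodes Scholarship "
    "to attend the Oxford University."
)
rewritten = replace_mentions(sentence, MENTION_TO_CONCEPT)
tokens = rewritten.split()  # each concept identifier is now a single token
```

After this step every concept is one token, so an embedding model trained on the rewritten text learns a vector per concept rather than per word.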
24
concept vectors.
25
similarity and cosine similarity.
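For reference, cosine similarity between two concept vectors can be computed as in this minimal sketch. The vectors are invented and much shorter than real embeddings.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy 2-dimensional "concept vectors", for illustration only.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```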
26
$\{m_{1i}\}_{i=1}^{p}$ are the concepts in the concept graph of D1
$\{w_{1i}\}_{i=1}^{p}$ are the weights of the concepts of D1
The best pairwise similarity for m2j (m1i respectively)
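Using these ingredients, a document-level score can be sketched as a weighted average of each concept's best match in the other document. The normalization by the sum of all weights is an assumption, since the slide's formula was lost in extraction, and `sim` stands for any concept-to-concept similarity.

```python
def concept_graph_similarity(concepts_1, weights_1, concepts_2, weights_2, sim):
    """Weighted average of best pairwise similarities between two documents' concepts."""
    if not concepts_1 or not concepts_2:
        return 0.0
    part_1 = sum(
        w * max(sim(m, n) for n in concepts_2)
        for m, w in zip(concepts_1, weights_1)
    )
    part_2 = sum(
        w * max(sim(n, m) for m in concepts_1)
        for n, w in zip(concepts_2, weights_2)
    )
    total_weight = sum(weights_1) + sum(weights_2)
    return (part_1 + part_2) / total_weight if total_weight else 0.0

def exact_match(a, b):
    """Stand-in concept similarity: 1.0 for identical concepts, else 0.0."""
    return 1.0 if a == b else 0.0

# Invented concepts and weights for two documents D1 and D2.
score = concept_graph_similarity(
    ["Barack_Obama", "Hawaii"], [0.75, 0.25],
    ["Barack_Obama"], [1.0],
    exact_match,
)
```

Because the unmatched concept carries a small weight (0.25), it lowers the score only slightly, to 0.875.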
from the Australian Broadcasting Corporation (ABC)
generate the Concept2Vector model
27
Method                   Pearson correlation
LSA                      0.60
GED                      0.63
ESA paper                0.720
ESA implemented          0.727
ConceptGraphSim          0.745
WikiWalk + ESA           0.772
ConceptGraphSim + ESA    0.786
ConceptsLearned          0.808
Might be overfitted to the LP50
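For readers unfamiliar with the metric, Pearson correlation between human relatedness ratings and system scores can be computed as below. The numbers are made up for illustration and are not taken from LP50 or from the paper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0.0 or spread_y == 0.0:
        return 0.0  # correlation is undefined for constant input; report 0.0
    return cov / (spread_x * spread_y)

# Invented ratings for four document pairs.
human_ratings = [1.0, 2.0, 3.0, 4.0]
system_scores = [0.1, 0.2, 0.3, 0.4]
r = pearson(human_ratings, system_scores)
```

A system whose scores rise and fall exactly with the human ratings reaches a correlation of 1, which is the upper bound for the values in the table.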
28
Not domain specific
29
30
31
document using its detected concepts.
ConceptGraphSim between two documents by comparing their concept graphs.
neural networks has a high potential that can be further exploited to achieve better results.
32