 
Hierarchical Link Analysis for Ranking Web Data
Renaud Delbru, Nickolai Toupikov, Michele Catasta, Giovanni Tummarello, and Stefan Decker
Digital Enterprise Research Institute, Galway
June 1, 2010

Outline: Introduction, Web Data Model, The DING Model, Experimental Results, Scalability, Conclusion

Introduction
  Web of Data
    The number of web data sources keeps growing:
    - Linked Open Data cloud;
    - Open Graph protocol;
    - e-commerce (GoodRelations), e-government, ...
  How to search and retrieve relevant information?
    - A single query can return millions of entities ...
    - ... and users expect only the most relevant ones.
  Web data search engines (e.g., Sindice) need an effective way to rank entities.
  Partial solution: popularity-based entity ranking.

Link Analysis on the Web
  Link Analysis
    - Given a directed graph, determine the popularity of its nodes using link information.
    - A link from node i to node j is considered evidence of the importance of node j.
  Link Analysis for Web Documents
    - PageRank considers exclusively the link structure.
    - Hierarchical link analysis considers both the link structure and the hierarchical structure.
  Link Analysis for Web Data
    - Current approaches consider exclusively the link structure.
    - Sindice: dataset/entity-centric view.

Outline: Web Data Model
  Web Data Model
    - Web Data Graph
    - Dataset Graph
    - Internal and External Node
    - Intra and Inter-Dataset Edge
    - Linkset
    - Two-Layer Model
    - Quantifying the Two-Layer Model

Web Data Graph
  Figure: Web data graph

Dataset Graph
  Figure: Dataset graph

Internal and External Node
  Figure: Internal (red) and external nodes (blue)

Intra and Inter-Dataset Edge
  Figure: Inter-dataset (orange) and intra-dataset (black) edges

Linkset
  Figure: Linkset

Two-Layer Model
  Figure: Two-layer model of the Web of Data

Quantifying the Two-Layer Model
  Datasets
    - DBpedia: 17.7 million entities
    - Citeseer (RKBExplorer): 2.48 million entities
    - Geonames: 13.8 million entities
    - Sindice: 60 million entities among 50,000 datasets

  Dataset     Intra            Inter
  DBpedia     88M (93.2%)      6.4M (6.8%)
  Citeseer    12.9M (77.7%)    3.7M (22.3%)
  Geonames    59M (98.3%)      1M (1.7%)
  Sindice     287M (78.8%)     77M (21.2%)

  Table: Ratio of intra-dataset to inter-dataset links

Outline: The DING Model
  The DING Model
    - Overview
    - Unsupervised Link Weighting
    - Computing DatasetRank
    - Computing Local EntityRank
    - Combining Dataset Rank and Entity Rank

The DING Model: Overview
  DING Principles
  DING performs entity ranking in three steps:
    1. dataset ranks are computed by performing link analysis on the top layer (i.e. the dataset graph);
    2. for each dataset, entity ranks are computed by performing link analysis on the local entity collection;
    3. the popularity of the dataset is propagated to its entities and combined with their local ranks to estimate a global entity rank.

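The third step can be illustrated with a minimal Python sketch. It assumes that dataset ranks and local entity ranks have already been computed, and it assumes a plain product as the combination function; this part of the slides does not state the combination formula, so the product is only a placeholder. The function name `combine_ranks`, the entity identifiers, and all scores are hypothetical.

```python
def combine_ranks(dataset_rank, local_entity_rank):
    """Propagate each dataset's rank to its entities (step 3).

    Assumption: the global rank is the product of the dataset rank and the
    local entity rank. This is an illustrative choice, not a formula taken
    from the slides.
    """
    global_rank = {}
    for dataset, entities in local_entity_rank.items():
        for entity, local in entities.items():
            global_rank[entity] = dataset_rank[dataset] * local
    return global_rank


# Made-up scores for two datasets and three entities.
dataset_rank = {"dbpedia": 0.6, "geonames": 0.4}
local_entity_rank = {
    "dbpedia": {"dbpedia:e1": 0.7, "dbpedia:e2": 0.3},
    "geonames": {"geonames:e1": 1.0},
}

global_rank = combine_ranks(dataset_rank, local_entity_rank)
ranking = sorted(global_rank, key=global_rank.get, reverse=True)
```

With these made-up numbers, an entity from a less popular dataset can still be ranked above a locally unimportant entity from a more popular dataset, which is the effect the two-layer combination is meant to allow.
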
Unsupervised Link Weighting
  Intuition
    - TF-IDF applied on link labels
  Link Frequency - Inverse Dataset Frequency (LF-IDF)
    - Link weighting factor $w_{\sigma,i,j}$
    - Assign low weight to very common links, such as rdfs:seeAlso

  \[
  w_{\sigma,i,j} = LF(L_{\sigma,i,j}) \times IDF(\sigma)
    = \frac{|L_{\sigma,i,j}|}{\sum_{L_{\tau,i,k}} |L_{\tau,i,k}|} \times \log \frac{N}{1 + freq(\sigma)}
  \]

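A minimal Python sketch of the LF-IDF weight follows. It assumes that a linkset is identified by a (label, source dataset, target dataset) triple and is represented only by its size, that N is the total number of datasets, and that freq(σ) is supplied as a precomputed count per label; the slide does not define N or freq, so both readings are assumptions. The function name `lf_idf` and the sample data are hypothetical.

```python
import math
from collections import defaultdict


def lf_idf(linksets, n_datasets, label_freq):
    """Compute an LF-IDF weight for every linkset.

    linksets:   {(label, source, target): number of links in the linkset}
    n_datasets: N, assumed to be the total number of datasets
    label_freq: {label: freq(label)}, assumed to be precomputed
    """
    # Denominator of LF: total size of all linksets leaving each source dataset.
    out_total = defaultdict(int)
    for (_, source, _), size in linksets.items():
        out_total[source] += size

    weights = {}
    for (label, source, target), size in linksets.items():
        lf = size / out_total[source]
        idf = math.log(n_datasets / (1 + label_freq[label]))
        weights[(label, source, target)] = lf * idf
    return weights


# Made-up example: dataset A has two equally large linksets.
linksets = {
    ("rdfs:seeAlso", "A", "B"): 50,
    ("foaf:knows", "A", "C"): 50,
}
label_freq = {"rdfs:seeAlso": 8, "foaf:knows": 1}
weights = lf_idf(linksets, n_datasets=10, label_freq=label_freq)
```

Both linksets have the same link frequency (0.5), so the difference in weight comes only from the IDF term: the very common rdfs:seeAlso label ends up with a much lower weight than foaf:knows. Note that the IDF term becomes negative when 1 + freq(σ) exceeds N.
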
Computing Dataset Rank
  Assumption
    - Dataset surfing behaviour is the same as the web page surfing behaviour in PageRank.

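Under this assumption, a dataset rank can be sketched as a weighted PageRank-style power iteration over the dataset graph. The Python sketch below is a generic weighted PageRank, not necessarily the exact DatasetRank formula: the uniform teleportation, the handling of datasets without outgoing links, the damping value, and the function name `dataset_rank` are all assumptions made for illustration. Edge weights could be link weights such as LF-IDF values aggregated per dataset pair.

```python
def dataset_rank(weights, damping=0.85, iterations=50):
    """Weighted PageRank-style iteration on a dataset graph.

    weights: {(source, target): non-negative weight of the links from
              dataset `source` to dataset `target`}
    Assumptions: uniform teleportation, and the rank of datasets with no
    outgoing weight is redistributed uniformly.
    """
    nodes = sorted({n for edge in weights for n in edge})
    n = len(nodes)

    # Total outgoing weight per dataset, used to normalise edge weights.
    out_sum = {node: 0.0 for node in nodes}
    for (source, _), w in weights.items():
        out_sum[source] += w

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if out_sum[node] == 0.0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for (source, target), w in weights.items():
            if out_sum[source] > 0.0:
                new_rank[target] += damping * rank[source] * w / out_sum[source]
        rank = new_rank
    return rank


# Made-up dataset graph: A and C link to B, B links back to A.
ranks = dataset_rank({("A", "B"): 1.0, ("C", "B"): 1.0, ("B", "A"): 1.0})
```

In this toy graph, dataset C receives no links and therefore keeps only the teleportation share, while B, which is linked from both other datasets, ranks above C.
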