
Knowledge Graph Embedding for Mining Cultural Heritage Data

Nada Mimouni and Jean-Claude Moissinac, Telecom ParisTech, Institut Mines Telecom, DIG - LTCI. January 24th, 2019.

  1. Knowledge Graph Embedding for Mining Cultural Heritage Data. Nada Mimouni and Jean-Claude Moissinac, Telecom ParisTech, Institut Mines Telecom.

  2. Outline
     1. Project presentation
     2. Data
     3. Knowledge Graph Embedding
        - Entities extraction
        - Context graph
        - Graph walks and kernel
        - Neural language model
        - Using the model
     4. Experiments and preliminary results
        - Entity similarity and relatedness
        - Entity matching
     5. Conclusion

  4. Project presentation

  5. Project presentation

  7. Data
     - Gather data from institutions:
       - Collect data while respecting privacy
       - Adopt homogeneous representations to make the data comparable
       - Choose a model able to represent links between data
     - Rely on external data:
       - DataTourism, tourist office data on places and events
       - OpenAgenda and other event calendars
       - Joconde database and other cultural data
       - General knowledge bases: DBpedia, Wikidata, ...
       - Geographical knowledge bases: GeoNames, data on data.gouv.fr, ...

  8. A simple example of link generation [figure]

  9. Objectives
     - Questions:
       - How to collect, integrate and enrich this large and complex body of data?
       - How to mine this type of data to extract useful information?
     - Hypotheses:
       - Integrating external data sources enhances the quality of the original data.
       - Limiting the analysis to a specified context helps boost performance.

  10. Approach
      - Represent instances as a set of n-dimensional numerical feature vectors
      - Use this representation with different ML tasks
      - Steps:
        1. Adapt a neural language model: Word2vec
        2. Transform the RDF graph into sequences of entities and relations (sentences)
        3. Train the model and generate entity vectors
      - Advantages:
        - Conserves the information in the original graph
        - Semantically similar/related entities have close vectors in the embedded space
        - Generates a reusable model that can be enriched with new entities
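
As an illustration of "close vectors in the embedded space", closeness between two entity vectors is commonly measured with cosine similarity. The sketch below is only an illustration: the three vectors and the `ex:` entity names are made-up toy values, not outputs of the authors' model, and the slides do not state which similarity measure was used.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical 3-dimensional entity vectors (toy values for illustration only).
vectors = {
    "ex:Louvre": [0.9, 0.1, 0.3],
    "ex:Orsay": [0.8, 0.2, 0.4],
    "ex:Seine": [0.1, 0.9, 0.2],
}

museum_pair = cosine_similarity(vectors["ex:Louvre"], vectors["ex:Orsay"])
mixed_pair = cosine_similarity(vectors["ex:Louvre"], vectors["ex:Seine"])
```

With these toy values the two museum vectors score higher than the museum/river pair, which is the behaviour the approach aims for on real embeddings.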

  12. Knowledge graph embedding process [figure: pipeline]
      1. Input data (Paris Musée, CMN)
      2. Extract entities
      3. Build context graph
      4. Generate walks (random, tf-idf, black-list, kernel)
      5. Train neural language model
      6. Entities feature vectors (V1, V2, V3, ..., Vn)
      7. Applications: KG completion, recommendation, similarity / relatedness, link prediction, community detection

  13. Extract entities (2)
      Identify entities' URIs from input data:
      1. The URI exists: read and identify the URI from the data files
      2. The URI does not exist: use the entity name to build the URI (dbpedia, frdbpedia, wikidata)
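
When no URI is present in the data files, one simple way to build a candidate from an entity name is to follow the DBpedia resource naming convention (spaces replaced by underscores). The sketch below assumes that convention; the function name `build_candidate_uris` and the `BASES` table are illustrative and not the authors' code. Wikidata is left out because its identifiers cannot be derived from a name without a lookup, and any candidate would still need to be checked against the knowledge base.

```python
# Assumed base URLs for the two DBpedia chapters named on the slide.
BASES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "frdbpedia": "http://fr.dbpedia.org/resource/",
}

def build_candidate_uris(name):
    """Return one candidate URI per knowledge base for an entity name."""
    # DBpedia resource names use underscores instead of spaces.
    local_name = "_".join(name.split())
    return {kb: base + local_name for kb, base in BASES.items()}

uris = build_candidate_uris("Tour Eiffel")
```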

  14. Build context graph (3)
      For each entity URI:
      - Build a context from a generalized data source, 'around' the entity
        - Data source: e.g. DBpedia
        - 'around': get neighbours in the graph within α hops
        - Consider the undirected graph
        - α = 1 or 2
      - Define a black-list to ignore predicates and objects that are:
        - very general, e.g. <http://www.w3.org/2002/07/owl#Thing>
        - non-informative, e.g. <http://fr.dbpedia.org/resource/Modèle:P.>
        - noisy, e.g. <http://www.w3.org/2000/01/rdf-schema#comment>
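
The context-building step can be sketched as a breadth-first expansion over an in-memory list of triples. This is a minimal sketch under assumptions: the slides query a source such as DBpedia, whereas here the triples are a small hand-written list with hypothetical `ex:` names, and `build_context` is an illustrative helper, not the authors' implementation.

```python
from collections import defaultdict

def build_context(triples, entity, alpha=2, blacklist=frozenset()):
    """Return the triples within `alpha` hops of `entity`, treating the
    graph as undirected and skipping black-listed predicates/objects."""
    # Index every kept triple under both of its endpoints (undirected view).
    adjacent = defaultdict(list)
    for s, p, o in triples:
        if p in blacklist or o in blacklist:
            continue
        adjacent[s].append((s, p, o))
        adjacent[o].append((s, p, o))

    context, frontier, seen = set(), {entity}, {entity}
    for _ in range(alpha):
        next_frontier = set()
        for node in frontier:
            for s, p, o in adjacent[node]:
                context.add((s, p, o))
                other = o if s == node else s
                if other not in seen:
                    seen.add(other)
                    next_frontier.add(other)
        frontier = next_frontier
    return context

OWL_THING = "http://www.w3.org/2002/07/owl#Thing"
triples = [
    ("ex:Louvre", "ex:locatedIn", "ex:Paris"),
    ("ex:Paris", "ex:country", "ex:France"),
    ("ex:Louvre", "rdf:type", OWL_THING),
    ("ex:MonaLisa", "ex:museum", "ex:Louvre"),
    ("ex:France", "ex:partOf", "ex:Europe"),
]
one_hop = build_context(triples, "ex:Louvre", alpha=1, blacklist={OWL_THING})
two_hops = build_context(triples, "ex:Louvre", alpha=2, blacklist={OWL_THING})
```

Merging the per-entity contexts into a global context graph then amounts to taking the union of the returned sets.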

  15. Merge context graphs (3) [figure: the context graph of entity e_x and the context graph of entity e_y, which share some nodes, are merged into a single global context graph]

  16. Generate walks (4) [figure: the knowledge graph embedding process pipeline, shown again at step 4, Generate walks (random, tf-idf, black-list, kernel)]

  17. Random walk (4)
      - Intuition: all neighbours are equally important for an entity.
      - Specify the walk parameters:
        - nb-walks: number of walks (example: 500 walks)
        - depth: number of hops in the graph (2, 4, 8); example: d = 4 ⇒ e → p1 → e1 → p2 → e2
      - Specify the list of entities (all entities in the global context graph, or a predefined list)
      - For each entity:
        1. get a random list of direct neighbours
        2. calculate the corresponding number of walks for each neighbour
        3. proceed recursively
      - Adjust the number of walks according to specific cases:
        - if (nb-neighbours < nb-walks): divide, keep the integer part of the division, sum up the remainder and add it to a randomly selected neighbour
        - if (nb-neighbours == 0): transfer its nb-walks to another randomly selected neighbour
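
The walk-budget rule described on this slide can be sketched as a recursive function. This is one possible reading, not the authors' code: `generate_walks` and the toy graph are hypothetical, `depth` counts predicate and entity tokens as in the d = 4 example, the case where there are more neighbours than walks is handled by sampling (an assumption), and a dead end simply ends the walk early instead of transferring its budget to a sibling, which simplifies the slide's last rule.

```python
import random

def generate_walks(graph, entity, nb_walks, depth, rng):
    """Return `nb_walks` walks starting at `entity`.

    `graph` maps an entity to a list of (predicate, neighbour) pairs.
    Each traversed edge adds two tokens and consumes 2 units of `depth`.
    """
    edges = graph.get(entity, [])
    if depth < 2 or not edges:
        # Depth exhausted or dead end: the remaining walks stop here.
        return [[entity] for _ in range(nb_walks)]
    if nb_walks < len(edges):
        # Fewer walks than neighbours: one walk each for a random subset.
        chosen = rng.sample(edges, nb_walks)
        budgets = [1] * nb_walks
    else:
        # Integer share for every neighbour, remainder to a random one.
        chosen = list(edges)
        share, rest = divmod(nb_walks, len(chosen))
        budgets = [share] * len(chosen)
        budgets[rng.randrange(len(chosen))] += rest
    walks = []
    for (predicate, neighbour), budget in zip(chosen, budgets):
        for tail in generate_walks(graph, neighbour, budget, depth - 2, rng):
            walks.append([entity, predicate] + tail)
    return walks

# Hypothetical toy graph: e2 is a dead end, e3 has no outgoing entry.
graph = {
    "e": [("p1", "e1"), ("p2", "e2")],
    "e1": [("p3", "e3")],
    "e2": [],
}
walks = generate_walks(graph, "e", nb_walks=5, depth=4, rng=random.Random(0))
```

Each returned walk is a list of alternating entity and predicate tokens, which is the "sentence" format the language model consumes later in the pipeline.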

  18. Tf-idf graph walk (4)
      - Intuition: some neighbours are more important for an entity. Prioritize important neighbours by weighting their predicates.
      - Calculate tf-idf weights for predicates:
        - tf: evaluates the importance of a predicate $p$ for an entity $e$
          - $t_o(p, e)$ = number of occurrences of $p$ for entity $e$
          - $t_p(e)$ = number of predicates associated with $e$
          - $\mathrm{tf}(p, e) = t_o(p, e) / t_p(e)$
        - idf: evaluates the importance of a predicate $p$ on the whole graph
          - $D$ = number of entities in the graph
          - $d(p)$ = number of entities using predicate $p$
          - $\mathrm{idf}(p) = \log(D / d(p))$
        - $\mathrm{tfidf}(p, e) = \mathrm{tf}(p, e) \times \mathrm{idf}(p)$
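
The tf-idf definitions on this slide translate almost directly into code. The sketch below makes a few assumptions the slide leaves open: only subjects count as entities, t_p(e) counts predicate occurrences (repeats included), and the logarithm is natural. The helper name `predicate_tfidf` and the toy triples are hypothetical.

```python
import math
from collections import Counter, defaultdict

def predicate_tfidf(triples):
    """Return {(predicate, entity): tf-idf weight} for every subject entity."""
    occurrences = defaultdict(Counter)  # entity -> predicate -> t_o(p, e)
    for s, p, _ in triples:
        occurrences[s][p] += 1
    nb_entities = len(occurrences)      # D
    # d(p): number of entities that use predicate p at least once.
    users = Counter(p for counts in occurrences.values() for p in counts)
    weights = {}
    for e, counts in occurrences.items():
        total = sum(counts.values())    # t_p(e)
        for p, n in counts.items():
            weights[(p, e)] = (n / total) * math.log(nb_entities / users[p])
    return weights

toy_triples = [
    ("a", "type", "Museum"),
    ("a", "link", "x"),
    ("a", "link", "y"),
    ("b", "type", "Museum"),
]
weights = predicate_tfidf(toy_triples)
```

In this toy graph `type` is used by every entity, so its idf and therefore its weight is zero, while `link` is specific to `a` and gets a positive weight.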

  19. Black-list walk (4)
      - Intuition: some predicates are noisy (less important) for an entity.
      - Put weights on predicates:
        - predicate in the black-list: weight = 0 (ignored)
        - other predicate: weight = 1 (considered in the walk)
      - Example: { http://dbpedia.org/ontology/wikiPageWikiLink }
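
The 0/1 weighting can be sketched as a weighted choice of the next edge in a walk. The black-list entry is the one given on the slide; the helper names `predicate_weight` and `pick_edge`, the `ex:` entities and the second predicate are hypothetical, and the slides do not say how the weights are consumed by the walk generator.

```python
import random

BLACK_LIST = {"http://dbpedia.org/ontology/wikiPageWikiLink"}

def predicate_weight(predicate):
    """Weight 0 for black-listed predicates, 1 for all others."""
    return 0 if predicate in BLACK_LIST else 1

def pick_edge(edges, rng):
    """Pick one (predicate, neighbour) edge, never a black-listed one.

    Returns None when every outgoing edge is black-listed.
    """
    weights = [predicate_weight(p) for p, _ in edges]
    if not any(weights):
        return None
    return rng.choices(edges, weights=weights, k=1)[0]

edges = [
    ("http://dbpedia.org/ontology/wikiPageWikiLink", "ex:SomePage"),
    ("http://dbpedia.org/ontology/museum", "ex:Louvre"),
]
rng = random.Random(0)
picked = {pick_edge(edges, rng) for _ in range(50)}
```

The same `pick_edge` shape would also accept non-binary weights, such as tf-idf scores, in place of the 0/1 values.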

  20. Weisfeiler-Lehman kernel (4)
      - Intuition: Weisfeiler-Lehman subtree RDF graph kernels capture (richer) information of an entire subtree in a single node.
      - de Vries, Gerben K. D., "A Fast Approximation of the Weisfeiler-Lehman Graph Kernel for RDF Data", ECML PKDD 2013.
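
To illustrate how a subtree ends up "in a single node", here is a sketch of one relabeling iteration of the plain Weisfeiler-Lehman procedure on an undirected, node-labelled graph. It is not the RDF-specific approximation from the cited paper, which also handles edge labels and direction; the function name `wl_relabel` and the three-node graph are hypothetical.

```python
def wl_relabel(labels, neighbours):
    """One Weisfeiler-Lehman iteration: fold each node's neighbourhood
    into a new, compressed label for that node."""
    # A node's signature is its own label plus the sorted neighbour labels.
    signatures = {
        node: (labels[node], tuple(sorted(labels[n] for n in neighbours[node])))
        for node in labels
    }
    # Compress every distinct signature into a short new label.
    compressed = {}
    for signature in sorted(set(signatures.values())):
        compressed[signature] = f"L{len(compressed)}"
    return {node: compressed[signature] for node, signature in signatures.items()}

# Path graph a - b - c, where a and c start with the same label.
labels = {"a": "x", "b": "y", "c": "x"}
neighbours = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
new_labels = wl_relabel(labels, neighbours)
```

After one iteration, `a` and `c` still share a label because their depth-1 subtrees are identical, while `b` gets a different one. Repeating the step lets each label summarize a deeper subtree.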

  21. Weisfeiler-Lehman kernel (4)
      - For each iteration, for each entity in the graph, get random walks of depth d.
      - After 1 iteration, the sequences for graph G are:
        - 1 -> 6 -> 11; 1 -> 6 -> 11 -> 13; 1 -> 6 -> 11 -> 10; ...
        - 4 -> 11 -> 6; 4 -> 11 -> 13; 4 -> 11 -> 10; 4 -> 11 -> 10 -> 8; ...
      - Ristoski, Paulheim, "RDF2Vec: RDF Graph Embeddings for Data Mining", ISWC 2016.

  22. Neural language model (5, 6)
      Word2vec:
      - A two-layer neural net that processes text
      - Input: a text corpus (sentences)
      - Output: a set of vectors (feature vectors for the words in that corpus)
      - Creates neural embeddings for any group of discrete and co-occurring states → RDF data
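
To make the link between graph walks and Word2vec concrete, the sketch below shows how a walk, treated as a sentence, yields the (target, context) pairs a skip-gram model is trained on. It covers only the pair extraction, not the network or its training, and it assumes a fixed window (real Word2vec implementations typically also shrink the window at random and subsample frequent tokens). The function name `skipgram_pairs` is hypothetical, and the slides do not say whether skip-gram or CBOW was used.

```python
def skipgram_pairs(sentences, window=2):
    """Return the (target, context) pairs within `window` tokens of each other."""
    pairs = []
    for sentence in sentences:
        for i, target in enumerate(sentence):
            start = max(0, i - window)
            stop = min(len(sentence), i + window + 1)
            pairs.extend((target, sentence[j]) for j in range(start, stop) if j != i)
    return pairs

# One walk of depth 4, in the e -> p1 -> e1 -> p2 -> e2 format of the slides.
walk_sentences = [["e", "p1", "e1", "p2", "e2"]]
pairs = skipgram_pairs(walk_sentences, window=1)
```

Entities that keep appearing next to the same predicates and neighbours across many walks share many context tokens, which is what pushes their vectors close together during training.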
