Knowledge Graph Embedding for Mining Cultural Heritage Data
Nada Mimouni and Jean-Claude Moissinac, Telecom ParisTech, Institut Mines Telecom
January 24th, 2019
DIG - LTCI

Outline
1. Project presentation
2. Data
3. Knowledge Graph Embedding
   - Entities extraction
   - Context graph
   - Graph walks and kernel
   - Neural language model
   - Using the model
4. Experiments and preliminary results
   - Entity similarity and relatedness
   - Entity matching
5. Conclusion

Project presentation

Data
Gather data from institutions:
- Collect data while respecting privacy
- Adopt homogeneous representations to make the data comparable
- Choose a model able to represent links between data
Rely on external data:
- DataTourism, tourist office data on places and events
- OpenAgenda and other event calendars
- Joconde database and other cultural data
- General knowledge bases: DBpedia, Wikidata, ...
- Geographical knowledge bases: GeoNames, data on data.gouv.fr, ...

[Figure: a simple example of link generation]

Objectives
Questions:
- How to collect, integrate and enrich this large and complex amount of data?
- How to mine this type of data to extract useful information?
Hypotheses:
- Integrating external data sources enhances the quality of the original data;
- Limiting the analysis to a specified context helps boost performance.

Approach
- Represent instances as a set of n-dimensional numerical feature vectors
- Use the representation with different ML tasks
Steps:
1. Adapt a neural language model: Word2vec
2. Transform the RDF graph into sequences of entities and relations (sentences)
3. Train the model and generate entity vectors
Advantages:
+ Conserves the information in the original graph
+ Semantically similar/related entities have close vectors in the embedded space
+ Generates a reusable model that could be enriched with new entities
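To make "close vectors in the embedded space" concrete, here is a minimal sketch of one common closeness measure, cosine similarity. The slides do not state which measure is used, so cosine is an assumption here, and the function name and toy vectors are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_u * norm_v)

# Toy 3-dimensional vectors, made up for illustration only.
museum_a = [0.9, 0.1, 0.0]
museum_b = [0.8, 0.2, 0.1]
river = [0.0, 0.1, 0.9]
print(cosine_similarity(museum_a, museum_b) > cosine_similarity(museum_a, river))
```

With such a measure, similarity and relatedness queries reduce to ranking entity vectors by their score against a query vector.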

Knowledge graph embedding process
[Figure: pipeline diagram with the following steps]
1. Input data (Paris Musée, CMN)
2. Extract entities
3. Build context graph
4. Generate walks (random, tf-idf, black-list, kernel)
5. Train neural language model
6. Entities feature vectors (V1, V2, V3, ..., Vn)
7. Applications: similarity / relatedness, link prediction, community detection, KG completion, recommendation

Extract entities (2)
Identify the entities' URIs from the input data:
1. URI exists: read and identify the URI from the data files
2. URI does not exist: use the entity name to build the URI (dbpedia, frdbpedia, wikidata)
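The two cases above can be sketched as follows. This is only an illustration under stated assumptions: the record layout (`uri` and `name` keys), the function names, and the name-to-URI rule (spaces become underscores, first letter is capitalised, other characters are percent-encoded) are hypothetical and not taken from the slides.

```python
from urllib.parse import quote

# Base namespaces of the two DBpedia editions named on the slide.
BASES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "frdbpedia": "http://fr.dbpedia.org/resource/",
}

def build_uri(name, source="dbpedia"):
    """Case 2: build a candidate URI from an entity name (assumed convention)."""
    local = "_".join(name.split())        # collapse whitespace into underscores
    local = local[:1].upper() + local[1:]  # capitalise the first letter
    return BASES[source] + quote(local, safe="_(),'")

def resolve_uri(record, source="dbpedia"):
    """Case 1 if the record already carries a URI, case 2 otherwise."""
    uri = record.get("uri")
    if uri:
        return uri
    return build_uri(record["name"], source)

print(resolve_uri({"name": "Musée du Louvre"}, "frdbpedia"))
```

A URI built this way is only a candidate and would still need to be checked against the knowledge base. Wikidata is left out of the sketch because its identifiers cannot be derived from a name and need a lookup.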

Build context graph (3)
For each entity URI:
- Build a context from a generalized data source, 'around' the entity
  - Data source: e.g. DBpedia
  - 'around': get the neighbours in the graph within α hops
  - Consider the undirected graph
  - α = 1 or 2
- Define a black-list to ignore predicates and objects that are:
  - very general, e.g. <http://www.w3.org/2002/07/owl#Thing>
  - non-informative, e.g. <http://fr.dbpedia.org/resource/Modèle:P.>
  - noisy, e.g. <http://www.w3.org/2000/01/rdf-schema#comment>
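A minimal sketch of this step, assuming the data source is already available as an in-memory list of (subject, predicate, object) triples. The function name and the breadth-first traversal are illustrative choices, not the authors' implementation.

```python
from collections import deque

def context_graph(triples, entity, alpha=2, blacklist=frozenset()):
    """Return the triples reachable from `entity` within `alpha` hops,
    ignoring edge direction and skipping black-listed predicates/objects."""
    incident = {}
    for s, p, o in triples:
        if p in blacklist or o in blacklist:
            continue  # very general, non-informative or noisy: ignore
        incident.setdefault(s, []).append((s, p, o))
        incident.setdefault(o, []).append((s, p, o))  # undirected view

    kept = set()
    seen = {entity}
    frontier = deque([(entity, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == alpha:
            continue  # do not expand beyond alpha hops
        for s, p, o in incident.get(node, []):
            kept.add((s, p, o))
            other = o if s == node else s
            if other not in seen:
                seen.add(other)
                frontier.append((other, dist + 1))
    return kept

# Toy triples with shortened, made-up identifiers.
toy = [("a", "p", "b"), ("b", "p", "c"), ("c", "p", "d"), ("a", "type", "Thing")]
print(sorted(context_graph(toy, "a", alpha=2, blacklist={"Thing"})))
```

In practice the neighbourhood would more likely be fetched from a SPARQL endpoint; the in-memory list keeps the sketch self-contained.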

Merge context graphs (3)
[Figure: the context graph of entity e_x and the context graph of entity e_y are merged into a global context graph]
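If each context graph is held as a set of triples, the merge can be read as a set union, so that nodes and edges shared by several entities appear once in the global graph. This reading of the figure is an assumption, and the function name is hypothetical.

```python
def merge_context_graphs(context_graphs):
    """Union of per-entity context graphs, each given as an iterable of triples."""
    merged = set()
    for graph in context_graphs:
        merged |= set(graph)
    return merged

# Toy context graphs sharing the triple ("e3", "q", "e4").
graph_x = {("e_x", "p", "e3"), ("e3", "q", "e4")}
graph_y = {("e_y", "p", "e3"), ("e3", "q", "e4")}
print(len(merge_context_graphs([graph_x, graph_y])))
```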

Generate walks (4)
[Figure: the embedding process diagram, highlighting step 4. Walk strategies: random, tf-idf, black-list, kernel]

Random walk (4)
Intuition: all neighbours are equally important for an entity.
- Specify the walk parameters:
  - nb-walks: number of walks (example: 500 walks)
  - depth: number of hops in the graph (2, 4, 8); example: d=4 ⇒ e → p1 → e1 → p2 → e2
- Specify the list of entities (all entities in the global context graph, or a predefined list)
- For each entity:
  1. get a random list of direct neighbours
  2. calculate the corresponding number of walks for each neighbour
  3. repeat recursively
- Adjust the number of walks according to specific cases:
  - if (nb-neighbours < nb-walks): divide nb-walks by nb-neighbours, take the integer part of the division, sum up the remainder and add it to a randomly selected neighbour
  - if (nb-neighbours == 0): transfer its nb-walks to another randomly selected neighbour
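The allocation scheme above can be sketched as follows, under several assumptions. The graph is a dictionary mapping an entity to a list of (predicate, neighbour) pairs, and `hops` counts traversed edges, so the slide's d=4 example corresponds to `hops=2`. The slide does not cover the case of fewer walks than neighbours, so the sketch samples that many neighbours. As a simplification, a walk that reaches a dead end stops early instead of being transferred to another neighbour. All names are hypothetical.

```python
import random

def split_walks(nb_walks, nb_neighbours, rng):
    """Integer share per neighbour; the remainder goes to one random neighbour."""
    base, rest = divmod(nb_walks, nb_neighbours)
    counts = [base] * nb_neighbours
    counts[rng.randrange(nb_neighbours)] += rest
    return counts

def random_walks(graph, entity, nb_walks, hops, rng):
    """Generate nb_walks walks starting at entity, each at most `hops` edges long."""
    edges = list(graph.get(entity, []))
    if hops == 0 or not edges:
        # Depth reached or dead end: the walk stops here (simplification).
        return [[entity] for _ in range(nb_walks)]
    if nb_walks <= len(edges):
        # Assumption: one walk through each of nb_walks randomly sampled neighbours.
        chosen = rng.sample(edges, nb_walks)
        counts = [1] * nb_walks
    else:
        rng.shuffle(edges)  # random list of direct neighbours
        chosen = edges
        counts = split_walks(nb_walks, len(edges), rng)
    walks = []
    for (predicate, neighbour), n in zip(chosen, counts):
        for tail in random_walks(graph, neighbour, n, hops - 1, rng):
            walks.append([entity, predicate] + tail)
    return walks

# Toy graph with made-up identifiers; "b" is a dead end.
toy_graph = {
    "e": [("p1", "a"), ("p2", "b"), ("p3", "c")],
    "a": [("q", "e")],
    "b": [],
    "c": [("r", "a")],
}
walks = random_walks(toy_graph, "e", nb_walks=10, hops=2, rng=random.Random(0))
print(len(walks))
```

Each walk alternates entities and predicates, which is the "sentence" form later fed to the language model.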

Tf-Idf graph walk (4)
Intuition: some neighbours are more important for an entity. Prioritize important neighbours by weighting their predicates.
Calculate tf-idf weights for predicates:
- tf: evaluates the importance of a predicate p for an entity e
  - $t_o(p, e)$ = number of occurrences of p for entity e
  - $t_p(e)$ = number of predicates associated with e
  - $tf(p, e) = t_o(p, e) / t_p(e)$
- idf: evaluates the importance of a predicate p on the whole graph
  - $D$ = number of entities in the graph
  - $d(p)$ = number of entities using predicate p
  - $idf(p) = \log(D / d(p))$
- $tfidf(p, e) = tf(p, e) \cdot idf(p)$
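A minimal sketch of these formulas over a list of (subject, predicate, object) triples. Several readings are assumptions: an entity "uses" a predicate when it is the subject of a triple with that predicate, $t_p(e)$ counts all predicate occurrences of e rather than distinct predicates, and D counts the entities that appear as subjects. The function name is hypothetical.

```python
import math
from collections import Counter, defaultdict

def predicate_tfidf(triples):
    """Return {(predicate, entity): tfidf weight} following the slide's formulas."""
    occurrences = defaultdict(Counter)  # occurrences[e][p] = t_o(p, e)
    for s, p, _ in triples:
        occurrences[s][p] += 1

    nb_entities = len(occurrences)      # D (assumption: subjects only)
    users = Counter()                   # users[p] = d(p)
    for counts in occurrences.values():
        users.update(counts.keys())

    weights = {}
    for entity, counts in occurrences.items():
        t_p = sum(counts.values())      # t_p(e) (assumption: total occurrences)
        for predicate, t_o in counts.items():
            tf = t_o / t_p
            idf = math.log(nb_entities / users[predicate])
            weights[(predicate, entity)] = tf * idf
    return weights

# Toy triples with made-up identifiers.
toy_triples = [
    ("a", "type", "X"),
    ("a", "author", "b"),
    ("a", "author", "c"),
    ("b", "type", "X"),
]
tfidf = predicate_tfidf(toy_triples)
print(tfidf[("author", "a")])
```

A predicate used by every entity, such as `type` in the toy data, gets weight 0, so a walk that samples neighbours in proportion to these weights would rarely follow it.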

Black-list walk (4)
Intuition: some predicates are noisy (less important) for an entity.
Put weights on predicates:
- predicate in the black-list: weight = 0 (to ignore)
- other predicate: weight = 1 (to consider in the walk)
Example black-list: { http://dbpedia.org/ontology/wikiPageWikiLink }
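The 0/1 weighting can be sketched as a weighted choice of the next edge in a walk. The helper names and the (predicate, neighbour) edge layout are hypothetical. Returning `None` when every outgoing predicate is black-listed is a choice made for this sketch.

```python
import random

BLACKLIST = {"http://dbpedia.org/ontology/wikiPageWikiLink"}

def predicate_weight(predicate, blacklist=BLACKLIST):
    """Weight 0 for black-listed predicates, 1 for all others."""
    return 0.0 if predicate in blacklist else 1.0

def pick_edge(edges, rng, blacklist=BLACKLIST):
    """Pick one (predicate, neighbour) edge at random among non-black-listed ones."""
    weights = [predicate_weight(p, blacklist) for p, _ in edges]
    if not any(weights):
        return None  # nothing left to follow from this entity
    return rng.choices(edges, weights=weights, k=1)[0]

# Toy edges; the second predicate is a made-up example.
toy_edges = [
    ("http://dbpedia.org/ontology/wikiPageWikiLink", "n1"),
    ("http://example.org/creator", "n2"),
]
print(pick_edge(toy_edges, random.Random(0)))
```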

Weisfeiler-Lehman kernel (4)
Intuition: Weisfeiler-Lehman subtree RDF graph kernels capture (richer) information of an entire subtree in a single node.
Reference: de Vries, Gerben K. D., "A Fast Approximation of the Weisfeiler-Lehman Graph Kernel for RDF Data", ECML PKDD 2013.

Weisfeiler-Lehman kernel (4)
For each iteration, for each entity in the graph, get random walks of depth d.
After 1 iteration, graph G sequences:
  1 -> 6 -> 11; 1 -> 6 -> 11 -> 13; 1 -> 6 -> 11 -> 10; ...
  4 -> 11 -> 6; 4 -> 11 -> 13; 4 -> 11 -> 10; 4 -> 11 -> 10 -> 8; ...
Reference: Ristoski, Paulheim, "RDF2Vec: RDF Graph Embeddings for Data Mining", ISWC 2016.
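To illustrate how a single node label can summarise a subtree, here is a sketch of one generic Weisfeiler-Lehman relabelling iteration on an undirected, node-labelled graph. This is the textbook relabelling step, not the RDF-specific approximation of de Vries, which also handles edge labels and depth. The function name and the integer label scheme are assumptions.

```python
def wl_iteration(labels, neighbours, next_label):
    """One WL relabelling step.

    labels: {node: int label}; neighbours: {node: iterable of nodes}.
    Each node's new label compresses (own label, sorted neighbour labels),
    so equal new labels mean equal depth-1 subtrees.
    Returns (new_labels, next unused label).
    """
    compressed = {}
    new_labels = {}
    for node, label in labels.items():
        signature = (label, tuple(sorted(labels[n] for n in neighbours.get(node, ()))))
        if signature not in compressed:
            compressed[signature] = next_label
            next_label += 1
        new_labels[node] = compressed[signature]
    return new_labels, next_label

# Toy path graph a - b - c, all nodes initially labelled 1.
toy_labels = {"a": 1, "b": 1, "c": 1}
toy_neighbours = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
relabelled, _ = wl_iteration(toy_labels, toy_neighbours, next_label=2)
print(relabelled)
```

Walks are then extracted over the relabelled graph at each iteration, which gives sequences of compressed labels like those listed above.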

Neural language model (5, 6)
Word2vec:
- A two-layer neural net that processes text
- Input: a text corpus (sentences)
- Output: a set of vectors (feature vectors for words in that corpus)
- Creates neural embeddings for any group of discrete and co-occurring states → RDF data
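To show why graph walks can stand in for sentences, the sketch below derives the (target, context) pairs that a skip-gram style Word2vec model is trained on, treating each entity or predicate in a walk as a "word". The slides do not say whether skip-gram or CBOW is used, so skip-gram is an assumption, and the function name and window size are hypothetical. Training itself is left to a Word2vec implementation.

```python
def skipgram_pairs(walks, window=2):
    """List (target, context) pairs for tokens at most `window` positions apart."""
    pairs = []
    for walk in walks:
        for i, target in enumerate(walk):
            start = max(0, i - window)
            stop = min(len(walk), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    pairs.append((target, walk[j]))
    return pairs

# One toy walk in the e -> p1 -> e1 form.
print(skipgram_pairs([["e", "p1", "e1"]], window=1))
```

Entities that appear in similar walk contexts produce similar training pairs, which is what pulls their vectors together in the embedded space.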
