
Extracting and Modeling Relations with Graph Convolutional Networks - PowerPoint PPT Presentation



  1. Extracting and Modeling Relations with Graph Convolutional Networks. Ivan Titov, with Diego Marcheggiani, Michael Schlichtkrull, Thomas Kipf, Max Welling, Rianne van den Berg and Peter Bloem

  2. Inferring missing facts in knowledge bases: link prediction [Graph: Mikhail Baryshnikov -studied_at-> Vaganova Academy -located_in-> St. Petersburg; Baryshnikov -lived_in-> St. Petersburg; one edge marked "?"]

  3. Relation Extraction [Graph: as before, plus Mikhail Baryshnikov -danced_for-> Mariinsky Theatre, extracted from text] Source sentence: "Baryshnikov danced for Mariinsky based in what was then Leningrad (now St. Petersburg)"

  4. Generalization of link prediction and relation extraction [Same graph as before; supporting sentence: "After a promising start in Mariinsky ballet, Baryshnikov defected to Canada in 1974 ..."] E.g., Universal Schema (Riedel et al., 2013)

  5. KBC: it is natural to represent both sentences and KBs with graphs [Same graph and sentence as before] For sentences, the graphs encode beliefs about their linguistic structure. How can we model (and exploit) these graphs with graph neural networks?

  6. Outline: (1) Graph Convolutional Networks (GCNs); (2) Link Prediction with Graph Neural Networks: Relational GCNs, denoising graph autoencoders for link prediction; (3) Extracting Semantic Relations (Semantic Role Labeling): syntactic GCNs, the semantic role labeling model

  7. Graph Convolutional Networks: Neural Message Passing

  8. Graph Convolutional Networks: message passing [Diagram: update for node v in an undirected graph] Kipf & Welling (2017); related ideas earlier, e.g., Scarselli et al. (2009).

  10. GCNs: multilayer convolution operation [Diagram: input X = H(0), hidden layers H(1), H(2), ..., output Z = H(N); initial feature representations of nodes; hidden-layer representations informed by node neighbourhoods] Parallelizable computation, can be made quite efficient (e.g., Hamilton, Ying and Leskovec (2017)).
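The layer-wise update on the slides above can be sketched as follows; the sizes, random weights, and the simple mean-based neighbourhood normalization are illustrative stand-ins for the Kipf & Welling formulation, not code from the talk:

```python
# Minimal NumPy sketch of a Kipf & Welling-style GCN layer.
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution step: H' = ReLU(D^-1 (A + I) H W).
    Adding the identity gives each node a self-loop; dividing by the
    degree averages messages from a node's neighbourhood."""
    A_hat = A + np.eye(A.shape[0])                   # add self-loops
    D_inv = 1.0 / A_hat.sum(axis=1, keepdims=True)   # degree normalization
    return np.maximum(0.0, D_inv * (A_hat @ H @ W))  # ReLU nonlinearity

# Toy undirected chain graph 0 - 1 - 2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.eye(3)                     # H(0): one-hot initial node features
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(3, 4))
W2 = rng.normal(scale=0.1, size=(4, 2))
Z = gcn_layer(gcn_layer(X, A, W1), A, W2)   # two stacked layers: Z = H(2)
print(Z.shape)  # (3, 2)
```

After two layers, each node's representation is informed by its 2-hop neighbourhood, which is the "hidden-layer representations informed by node neighbourhoods" point from the slide.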

  12. Graph Convolutional Networks: previous work. Shown very effective on a range of problems: citation graphs, chemistry, ... Mostly: unlabeled and undirected graphs; node labeling in a single large graph (transductive setting); classification of graphlets. How do we apply GCNs to the graphs we have in knowledge base completion / construction? See Bronstein et al. (Signal Processing, 2017) for an overview.

  13. Link Prediction with Graph Neural Networks

  14. Link Prediction [Graph: Mikhail Baryshnikov -studied_at-> Vaganova Academy, -danced_for-> Mariinsky Theatre, -lived_in-> St. Petersburg; Vaganova Academy and Mariinsky Theatre -located_in-> St. Petersburg; one edge marked "?"]

  17. KB Factorization [Same graph as before]

  18. KB Factorization [Same graph as before] RESCAL (Nickel et al., 2011): a scoring function is used to predict whether a relation holds, e.g., (Baryshnikov, lived_in, St. Petersburg).

  19. KB Factorization [Same graph as before] DistMult (Yang et al., 2014): a scoring function is used to predict whether a relation holds, e.g., (Baryshnikov, lived_in, St. Petersburg).

  20. KB Factorization [Same graph as before] DistMult (Yang et al., 2014): a scoring function is used to predict whether a relation holds, e.g., (Baryshnikov, lived_in, St. Petersburg). Relies on SGD to propagate information across the graph.
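The DistMult scoring function referenced on these slides is a trilinear product with a diagonal relation matrix: score(s, r, o) = e_s^T diag(w_r) e_o. A hedged sketch, with toy random embeddings standing in for learned ones:

```python
# Sketch of DistMult scoring (Yang et al., 2014) with illustrative embeddings.
import numpy as np

def distmult_score(e_s, w_r, e_o):
    """Trilinear score: equivalent to e_s^T diag(w_r) e_o."""
    return float(np.sum(e_s * w_r * e_o))

d = 4
rng = np.random.default_rng(0)
entity = {"Baryshnikov": rng.normal(size=d),
          "St_Petersburg": rng.normal(size=d)}
relation = {"lived_in": rng.normal(size=d)}   # diagonal of the relation matrix

s = distmult_score(entity["Baryshnikov"], relation["lived_in"],
                   entity["St_Petersburg"])
# A sigmoid of the score gives a probability that the triple holds.
p = 1.0 / (1.0 + np.exp(-s))
print(0.0 <= p <= 1.0)
```

In plain factorization, these vectors are free parameters trained by SGD; the slide's point is that information then travels across the graph only through shared parameters during training.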

  21. Relational GCNs [Same graph as before] DistMult (Yang et al., 2014): a scoring function is used to predict whether a relation holds, e.g., (Baryshnikov, lived_in, St. Petersburg). Use the same scoring function, but with GCN node representations rather than parameter vectors. Schlichtkrull et al., 2017

  22. Relational GCNs [Diagram annotation: "info about St. Petersburg reached here"] DistMult (Yang et al., 2014): a scoring function is used to predict whether a relation holds, e.g., (Baryshnikov, lived_in, St. Petersburg). Use the same scoring function, but with GCN node representations rather than parameter vectors. Schlichtkrull et al., 2017

  24. Relational GCNs

  25. Relational GCNs How do we train Relational GCNs? How do we compactly parameterize Relational GCNs?
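The relational update behind these slides replaces the single GCN weight matrix with a per-relation transform: h_v' = ReLU(W_0 h_v + sum over relations r and neighbours u of (1/c_{v,r}) W_r h_u), following Schlichtkrull et al. (2017). A minimal sketch, with toy sizes and dense per-relation matrices (the compact parameterizations come later in the talk):

```python
# Illustrative R-GCN node update with normalization c_{v,r} = |N_r(v)|.
import numpy as np

def rgcn_update(h, edges_by_rel, W_self, W_rel):
    """h: (N, d) node states; edges_by_rel[r] = list of (target, source)
    pairs of relation r; W_rel[r]: (d, d) per-relation transform."""
    msg = h @ W_self.T                        # self-loop term W_0 h_v
    for r, edges in edges_by_rel.items():
        counts = {}                           # |N_r(v)| per target node
        for v, u in edges:
            counts[v] = counts.get(v, 0) + 1
        for v, u in edges:
            msg[v] += (W_rel[r] @ h[u]) / counts[v]
    return np.maximum(0.0, msg)               # ReLU

N, d = 3, 4
rng = np.random.default_rng(0)
h = rng.normal(size=(N, d))
W_self = np.eye(d)
W_rel = {"studied_at": rng.normal(size=(d, d)),
         "located_in": rng.normal(size=(d, d))}
edges = {"studied_at": [(0, 1)], "located_in": [(1, 2)]}
h_new = rgcn_update(h, edges, W_self, W_rel)
print(h_new.shape)  # (3, 4)
```

Training (the first question on the slide) scores edges with DistMult on these node states; compact parameterization (the second question) is addressed by the diagonal, block-diagonal, and basis variants below.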

  26. GCN Denoising Autoencoders [Training graph: Mikhail Baryshnikov with edges citizen_of U.S.A., awarded Vilcek Prize, danced_for Mariinsky Theatre, studied_at Vaganova Academy, lived_in St. Petersburg; located_in edges into St. Petersburg] Take the training graph. Schlichtkrull et al., 2017

  27. GCN Denoising Autoencoders [Noisy graph: as before, with the citizen_of and lived_in edges dropped] Produce a noisy version: drop some random edges. Use this graph for encoding nodes with GCNs. Schlichtkrull et al., 2017

  28. GCN Denoising Autoencoders [Original graph, dropped edges restored] Force the model to reconstruct the original graph, including the dropped edges (a ranking loss on edges). Schlichtkrull et al., 2017
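The denoising step on slides 26-28 can be sketched as a corruption function over triples; the drop rate, seed, and function name are illustrative choices, not taken from the paper:

```python
# Sketch of the edge-dropout corruption used by the denoising autoencoder:
# encode on a corrupted graph, reconstruct the full edge set.
import random

def corrupt_graph(triples, drop_rate=0.2, seed=0):
    """Return (kept edges fed to the R-GCN encoder,
               full edge set that the decoder must reconstruct)."""
    rng = random.Random(seed)
    kept = [t for t in triples if rng.random() > drop_rate]
    return kept, triples

triples = [("Baryshnikov", "studied_at", "Vaganova_Academy"),
           ("Baryshnikov", "danced_for", "Mariinsky_Theatre"),
           ("Vaganova_Academy", "located_in", "St_Petersburg"),
           ("Baryshnikov", "lived_in", "St_Petersburg")]
kept, target = corrupt_graph(triples)
# The encoder sees only `kept`; the ranking loss scores every edge in
# `target`, so the model must infer dropped edges from the rest of the graph.
print(len(kept) <= len(target))
```

This is what makes the objective a denoising autoencoder rather than plain reconstruction: the encoder never sees exactly the edges it is asked to predict.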

  29. Training [Pipeline: our R-GCN encoder produces node embeddings; a classic DistMult decoder scores edges] Schlichtkrull et al., 2017

  30. GCN Autoencoders: Denoising vs Variational Instead of denoising AEs, we can use variational AEs to train R-GCNs VAE R-GCN can be regarded as an inference network performing amortized variational inference Intuition: R-GCN AEs are amortized versions of factorization models

  31. Relational GCNs [Update for node v] There are too many relations in realistic KBs; we cannot afford a full-rank matrix per relation. Schlichtkrull et al., 2017

  32. Relational GCNs. Naive approach: since we score with a diagonal matrix (DistMult), let's use a diagonal matrix in the GCN as well.

  33. Relational GCNs. Block-diagonal assumption: latent features can be grouped into sets of tightly inter-related features; modeling dependencies across the sets is less important.

  34. Relational GCNs. Basis / dictionary learning: represent every KB relation's transformation as a linear combination of shared basis transformations, with per-relation coefficients.
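The basis decomposition amounts to W_r = sum over b of a[r, b] * V[b], with B shared basis matrices V and per-relation coefficients a. A sketch with illustrative sizes:

```python
# Sketch of basis (dictionary) parameterization of per-relation transforms.
import numpy as np

num_rel, B, d = 100, 5, 8           # illustrative: many relations, few bases
rng = np.random.default_rng(0)
V = rng.normal(size=(B, d, d))      # shared basis transformations
a = rng.normal(size=(num_rel, B))   # per-relation mixing coefficients

def relation_matrix(r):
    """W_r = sum_b a[r, b] * V[b]; tensordot contracts over the basis axis."""
    return np.tensordot(a[r], V, axes=1)   # shape (d, d)

W_7 = relation_matrix(7)
print(W_7.shape)  # (8, 8)
# Parameter count drops from num_rel*d*d to B*d*d + num_rel*B,
# which is what makes R-GCNs feasible for KBs with many relations.
```

The block-diagonal variant from the previous slide makes the complementary trade: each W_r stays relation-specific but is constrained to be block-diagonal, so cross-block feature interactions are dropped instead of shared.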

  35. Results on FB15k-237 (Hits@10) [Bar chart: our R-GCN vs. the DistMult baseline] Our R-GCN relies on DistMult in the decoder, so DistMult is its natural baseline. See other results and metrics in the paper. Results for ComplEx and TransE from the code of Trouillon et al. (2016); results for HolE using code by Nickel et al. (2015).

  36. Relational GCNs: a fast and simple approach to link prediction; captures multiple paths without the need to explicitly marginalize over them; unlike factorizations, can be applied to subgraphs unseen in training. Future work: combine R-GCNs with more powerful factorizations / decoders; objectives favouring recovery of paths rather than edges; gates and memory may be effective.

  37. Extracting Semantic Relations

  38. Semantic Role Labeling. Closely related to the relation extraction task: discovering the predicate-argument structure of a sentence. Example: "Sequa makes and repairs jet engines."

  39. Semantic Role Labeling. Closely related to the relation extraction task: discovering the predicate-argument structure of a sentence. First step: discover predicates. Example: "Sequa makes and repairs jet engines."

