

  1. Distributional Embedding Approach for Relational Knowledge Representation. Dissertation Proposal. Supervisor: Dr. Tao. A. Aziz Altowayan, Pace University, March 2017

  2. Contents overview Part 1: Brief introduction ◮ Topic, Issue, and Solution idea Part 2: Details ◮ Methods, Related work, and Proposed work

  3. PART ONE

  4. Introduction Overview ◮ Relational Learning through Knowledge Base Representation ◮ Relational Knowledge Representation ◮ Knowledge ≈ entities + their relationships Motivation and importance ◮ Relationships between entities are a rich source of information

  5. Knowledge Bases (KBs) Web-scale extracted KBs provide a structured representation of world knowledge ◮ Large quantities of knowledge are publicly available in relational form, interlinked across different domains The ability to learn from relational data has a significant impact on many applications

  6. Applications and use cases World Wide Web and Semantic Web ◮ linkage of related documents, and semantically structured data Bioinformatics and Molecular Biology ◮ gene-disease association Social Networks ◮ relationships between persons Question Answering ◮ link prediction in knowledge base queries

  7. Example KB: FreeBase (info source: Deep Learning Lectures, Bordes, 2015)

  8. Example KB: WordNet (info source: Deep Learning Lectures, Bordes, 2015)

  9. Problem Collectively, KBs have over 60 billion published facts and growing (Nickel, 2013) KBs have large dimensions, and thus they are ... ◮ Hard to manipulate ◮ Sparse (with few valid links) ◮ Noisy and/or incomplete Tackling these issues is key to automatically understanding and utilizing the structure of large-scale knowledge bases

  10. Idea Modeling Knowledge Bases ◮ KB Embeddings (inspired by Word Embeddings) How? 1. Encode (embed) KBs into a low-dimensional vector space s.t. similar entities/relations are represented by similar “nearby” vectors 2. Use these representations: ◮ to complete/visualize KBs ◮ as KB data in text applications (Bordes et al., 2013)

  11. Example use case: link prediction ◮ Question Answering Systems ◮ Assess the validity of results from Information Retrieval Systems An example fragment of a KB.

  12. Word Embeddings The most successful approach for word meanings (now standard) Two main components . . . 1. Neural Language Modeling “NLM” ◮ a neural networks approach for text representations 2. Distributional Semantics Hypothesis ◮ words which are similar in meaning occur in similar contexts (Rubenstein and Goodenough, 1965); one of the most successful ideas of modern statistical NLP, as described in Deep Learning for NLP (Socher, 2016)

  13. Relation Embeddings The current state of the art in relation embeddings exploits only Neural Language Modeling, but not the Distributional Semantics Hypothesis As a result, the performance of current KB modeling methods is far from being useful to leverage in real-world applications

  14. Current Approach Current relation embedding approaches do not make use of distributional similarities over KB relations

  15. Proposed Approach The proposed approach (inspired by word embeddings) brings the entire experience of word representations to relation embeddings by incorporating Distributional Similarity

  16. PART TWO

  17. Knowledge Base What is a Knowledge Base? Knowledge bases (KBs) store factual information about the real world in the form of binary relations between entities (e.g. FreeBase, NELL, WordNet, YAGO). In KBs, facts are expressed as triplets (“binary relations”) between entities: a triplet of tokens (subject, verb, object) with the values (entity_i, relation_k, entity_j)

  18. Sample fragments of KBs
      Table 1: Sample KB triplets for the “molecule” entity from WordNet18 (Miller, 1995)
          head                     relation                   tail
          __radical_NN_1           _part_of                   __molecule_NN_1
          __physics_NN_1           _member_of_domain_topic    __molecule_NN_1
          __molecule_NN_1          _has_part                  __atom_NN_1
          __unit_NN_5              _hyponym                   __molecule_NN_1
          __chemical_chain_NN_1    _part_of                   __molecule_NN_1
          __molecule_NN_1          _hypernym                  __unit_NN_5
      Table 2: Sample triplets from the NELL KB (http://rtw.ml.cmu.edu/rtw/kbbrowser/)
          head                 relation                  tail
          action_movies        is_a                      movie
          action_movies        is_a_generalization_of    die_hard
          leonardo_dicaprio    is_an                     actor
          akiva_goldsman       directedMovie             batman_forever
          leonard_nimoy        StarredIn                 star_trek
          motorola             acquiredBy                google
          david_beckham        playSport                 soccer
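
  For illustration, the triplets above are just (head, relation, tail) tuples, so they can be loaded and indexed directly in code. A minimal Python sketch (not part of the proposal); the triples are copied from the NELL sample above:

      # KB facts as (head, relation, tail) triplets, indexed by relation.
      from collections import defaultdict

      triplets = [
          ("action_movies", "is_a", "movie"),
          ("leonardo_dicaprio", "is_an", "actor"),
          ("akiva_goldsman", "directedMovie", "batman_forever"),
          ("motorola", "acquiredBy", "google"),
          ("david_beckham", "playSport", "soccer"),
      ]

      by_relation = defaultdict(list)
      for head, relation, tail in triplets:
          by_relation[relation].append((head, tail))

      print(by_relation["acquiredBy"])   # [('motorola', 'google')]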

  19. Introduction: Knowledge Graphs Knowledge Graphs are graph-structured knowledge bases (KBs). ◮ The multi-relational data (of KBs) can form directed graphs (of knowledge) whose nodes correspond to entities and whose edges correspond to relations between entities. ◮ Multigraph structure: Entity = Node, Relation Type = Edge type, Fact = Edge
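
  This multigraph view is straightforward to materialize in code. A minimal sketch, assuming the networkx library (an assumption, not something the slides specify); entities become nodes and each fact becomes a typed edge:

      # A KB fragment as a directed multigraph: nodes = entities,
      # edge keys = relation types, edges = facts.
      import networkx as nx

      G = nx.MultiDiGraph()
      facts = [
          ("leonard_nimoy", "StarredIn", "star_trek"),
          ("motorola", "acquiredBy", "google"),
          ("action_movies", "is_a", "movie"),
      ]
      for head, relation, tail in facts:
          G.add_edge(head, tail, key=relation)

      # All outgoing facts of an entity, with their relation types.
      print(list(G.out_edges("motorola", keys=True)))
      # [('motorola', 'google', 'acquiredBy')]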

  20. Word Embeddings: Semantic Theory Distributional similarity representation ◮ Distributional Hypothesis “You shall know a word by the company it keeps.” (Firth, 1957) Examples (source: https://youtu.be/T1O3ikmTEdA?t=16m29s) ◮ It was found in the banks of the Amoy River .. ◮ I was seated in my office at the bank when a card . . . ◮ with a plunge, like the swimmer who leaves the bank . . . ◮ through the issue of bank notes, the money capital . . . ◮ settlements were on the north bank of the Ohio River . . .

  21. Word Embeddings: Neural Language Model Vector Space Models (VSMs) Distributed representations of words address the dimensionality problem. VSM approaches: 1. count-based: Latent Semantic Analysis “LSA” 2. prediction-based: Neural probabilistic language models (Bengio et al., 2003)
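
  To make the count-based route concrete, here is a minimal LSA-style sketch using a made-up word-context co-occurrence matrix (the vocabulary and counts are invented for illustration):

      # Count-based word vectors: build a co-occurrence matrix, then
      # reduce it with a truncated SVD (the core idea behind LSA).
      import numpy as np

      vocab = ["dog", "cat", "bank", "river"]
      cooc = np.array([            # rows: words, columns: toy context features
          [10.0, 8.0, 0.0, 1.0],
          [9.0, 10.0, 0.0, 0.0],
          [0.0, 0.0, 12.0, 3.0],
          [1.0, 0.0, 4.0, 11.0],
      ])

      U, S, Vt = np.linalg.svd(cooc, full_matrices=False)
      k = 2                                  # keep 2 latent dimensions
      word_vectors = U[:, :k] * S[:k]        # one low-dimensional vector per word

      for word, vec in zip(vocab, word_vectors):
          print(word, vec.round(2))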

  22. Word Embeddings: Sparse Representations The sparsity of symbolic representations makes them suffer from the “curse of dimensionality” ◮ Lose word order ◮ No account for semantics Hypothetical example: symbolic representations of the terms Dog and Cat

  23. Word Embeddings: Distributed Representation Distributed Representations can address the sparsity issue ◮ Low-dimensional ◮ Induce a rich similarity space Hypothetical example: Distributed Representations of the terms Dog and Cat Question: How can we generate such rich vector representations ?
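
  Before turning to how such vectors are learned, a minimal sketch makes the contrast between the two hypothetical examples concrete (all vectors below are made up for illustration):

      # One-hot (symbolic) vectors are orthogonal: Dog and Cat look unrelated.
      # Dense (distributed) vectors induce a similarity space: Dog and Cat are close.
      import numpy as np

      def cosine(a, b):
          return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

      dog_onehot = np.zeros(10000); dog_onehot[42] = 1.0
      cat_onehot = np.zeros(10000); cat_onehot[137] = 1.0
      print(cosine(dog_onehot, cat_onehot))          # 0.0 -- no similarity signal

      dog_dense = np.array([0.8, 0.1, 0.6, -0.2])
      cat_dense = np.array([0.7, 0.2, 0.5, -0.1])
      print(round(cosine(dog_dense, cat_dense), 3))  # close to 1.0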

  24. Word Embeddings: Neural Word Embedding word2vec (Mikolov et al., 2013): the most successful example for modeling semantic (and syntactic) similarities of words. It trains (generates) word vectors (embeddings) by leveraging the Distributional Hypothesis to predict the following/previous words in a given sequence.

  25. Word Embeddings: Word2vec In word2vec’s skip-gram model, the goal is to maximize the sum of log-likelihoods over all target words in the training corpus:

      \sum_{t=1}^{T} \sum_{c \in C_t} \log p(w_c \mid w_t)

      where

      p(w_c \mid w_t) = \frac{\exp(w_t^\top w_c)}{\sum_{w_i \in V} \exp(w_t^\top w_i)}
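
  The softmax above can be written out directly. A minimal sketch with toy random embeddings (real word2vec learns the vectors and replaces the full softmax with negative sampling or a hierarchical softmax):

      # p(w_c | w_t) = exp(w_t . w_c) / sum_{w_i in V} exp(w_t . w_i)
      import numpy as np

      rng = np.random.default_rng(0)
      V, dim = 5, 3                       # toy vocabulary size and embedding size
      W = rng.normal(size=(V, dim))       # one vector per vocabulary word

      def p_context_given_target(c, t):
          scores = W @ W[t]                            # dot products w_t . w_i
          exp_scores = np.exp(scores - scores.max())   # numerically stable softmax
          return exp_scores[c] / exp_scores.sum()

      # One skip-gram term: log-likelihood of context words {1, 3} for target 0.
      log_likelihood = sum(np.log(p_context_given_target(c, t=0)) for c in (1, 3))
      print(log_likelihood)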

  26. Word Embeddings: Example architecture (source: http://bit.ly/2eIMHR7) Word2vec leverages the Distributional Hypothesis (“contexts”) to estimate word embeddings The probability of a target word is estimated based on its context words

  27. Word Embeddings: Example VSM Words represented in vector space (http://projector.tensorflow.org/)

  28. Word Embeddings: Example Usage (source: https://www.tensorflow.org/tutorials/word2vec) Representing words as vectors allows easy computation of similarity Spain is to Madrid as Italy is to ? Arithmetic operations can be performed on word embeddings (e.g. to find similar words)
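
  The “Spain is to Madrid as Italy is to ?” analogy corresponds to simple vector arithmetic. A minimal sketch with made-up vectors standing in for real embeddings:

      # Analogy by arithmetic: vec(Madrid) - vec(Spain) + vec(Italy) ~ vec(Rome)
      import numpy as np

      vec = {
          "spain":  np.array([0.9, 0.1, 0.3]),
          "madrid": np.array([0.8, 0.6, 0.3]),
          "italy":  np.array([0.5, 0.1, 0.9]),
          "rome":   np.array([0.4, 0.6, 0.9]),
          "paris":  np.array([0.1, 0.6, 0.2]),
      }

      def cosine(a, b):
          return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

      query = vec["madrid"] - vec["spain"] + vec["italy"]
      best = max((w for w in vec if w not in {"spain", "madrid", "italy"}),
                 key=lambda w: cosine(vec[w], query))
      print(best)   # "rome" for these toy vectors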

  29. Relation Embeddings: Related Work TransE State-of-the-art: Translating Embeddings for Modeling Multi-relational Data “TransE” (Bordes et al., 2013) ◮ Learning objective: h + l ≈ t when (h, l, t) holds. In other words, score(R_l(h, t)) = -dist(h + l, t), where dist is the L1-norm or L2-norm and h, l, t ∈ \mathbb{R}^k
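
  The scoring function is easy to compute once embeddings are given. A minimal sketch with toy vectors (real TransE learns h, l, t with a margin-based ranking loss, which is not shown here):

      # TransE: score(R_l(h, t)) = -dist(h + l, t), dist = L1 or L2 norm.
      import numpy as np

      h = np.array([0.2, 0.5, -0.1, 0.3])    # head entity embedding
      l = np.array([0.1, -0.2, 0.4, 0.0])    # relation embedding
      t = np.array([0.3, 0.3, 0.3, 0.3])     # tail entity embedding

      def transe_score(h, l, t, norm=1):
          return -np.linalg.norm(h + l - t, ord=norm)

      print(transe_score(h, l, t, norm=1))   # scores closer to 0 = more plausible
      print(transe_score(h, l, t, norm=2))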

  30. Proposed approach: example scenario
      Table 3: An example training dataset.
          training examples (triplets)
          (e3, r1, e2)
          (e1, r2, e3)
          (e1, r3, e4)
          (e2, r2, e5)
          (e6, r2, e4)
          (e3, r1, e6)
          (e5, r3, e3)
      We have E = (e1, e2, e3, e4, e5, e6) and R = (r1, r2, r3). Assuming the current training target is (e1, r2, e3) and the window size is 1, the target’s context would be the triplets (e6, r2, e4) and (e2, r2, e5)

  31. Proposed approach: model With that being said, a triplet in our approach is treated just like a word in the word2vec model:

      \frac{1}{R} \sum_{i=1}^{R} \sum_{j \in C} \log p(t_r^j \mid t_r^i)

      where a triplet t is a compositional vector t_r = d(e_h, r_l, e_t), and

      p(t_r^j \mid t_r^i) = \frac{\exp(t_r^i \cdot t_r^j)}{\sum_{t_r^k \in \mathcal{R}} \exp(t_r^i \cdot t_r^k)}
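
  A minimal sketch of this triplet softmax, using triplets from the example scenario above. The composition d(e_h, r_l, e_t) is not specified on the slide, so the sketch simply concatenates the three vectors (an assumption for illustration only), with random toy embeddings:

      # p(t_r^j | t_r^i): softmax over dot products of composed triplet vectors.
      import numpy as np

      rng = np.random.default_rng(1)
      dim = 3
      entities = {e: rng.normal(size=dim) for e in ["e1", "e2", "e3", "e4", "e5", "e6"]}
      relations = {r: rng.normal(size=dim) for r in ["r1", "r2", "r3"]}

      def compose(h, r, t):
          # Assumed composition d(e_h, r_l, e_t): concatenation of the vectors.
          return np.concatenate([entities[h], relations[r], entities[t]])

      triplets = [("e1", "r2", "e3"), ("e2", "r2", "e5"), ("e6", "r2", "e4")]
      T = np.stack([compose(*tr) for tr in triplets])   # one vector per triplet

      def p_context_given_target(j, i):
          scores = T @ T[i]
          exp_scores = np.exp(scores - scores.max())
          return exp_scores[j] / exp_scores.sum()

      # Probability of context triplet (e2, r2, e5) given target (e1, r2, e3).
      print(p_context_given_target(j=1, i=0))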

  32. Plan of Action Rough estimate of the planned/implemented tasks throughout the entire PhD program:

  33. Timeline
      Date                     Work
      Summer 2014              Web-technology tools
      Fall 2014/Spring 2015    Semantic Web methods
      Fall 2015                Shift focus to future-proof solution
      Spring 2016              ML/AI and data collection
      Fall 2016                Re-produce/test related work models
      Spring/Summer 2017       Build and evaluate proposed model
      Summer/Fall 2017         Write up the dissertation
