SLIDE 1

Efficient Algorithm for Answering Fact-based Queries Using Relational Data Enriched by Context-Based Embeddings

Ph.D. Dissertation

Abdulaziz (Aziz) Altowayan
Computer Science, Pace University
12/12/2019

SLIDE 2

Outline

  • Motivation
  • Problem and Challenges
  • Solution, Our Approach, and Validation
  • Building Representation Models (Altowayan and Tao, 2015), (Altowayan and Tao, 2016), and (Altowayan and Elnagar, 2017)
  • Model Evaluation (Altowayan and Tao, 2019)
  • Application
  • Factoid KGE QA Algorithm
SLIDE 3

Demo

Web App available at:

http://bit.ly/KGE_QA

Source code available at:

https://github.com/iamaziz/kge_qa

These slides are available at:

http://aziz.nyc/phd/slides.pdf

SLIDE 4

Where does this come from?

SLIDE 5

The Knowledge Graph (KG)

image credit: https://bit.ly/2RvSC1A

SLIDE 6

Knowledge Graphs

A KG is a graph-structured knowledge base that describes the knowledge in a specific domain and the relationships among that domain's components.

SLIDE 7

Motivation

Leveraging KGs helps us understand and solve problems in various domains:

  • In biology: interactions between proteins and genes
  • In medicine: drugs and their effects
  • In social networks: who knows whom and where they belong
  • In search engines: finding relevant results



 
...and other domains.

SLIDE 8

Where do KGs come from?

image credit: https://bit.ly/2r9HJYV

SLIDE 9

Problem: KG Representation

How do we use, represent, and leverage Knowledge Graphs?

Issues with representing large-scale knowledge graphs:

  • KGs are hard to manipulate (Bordes et al., 2015)
  • Large dimensions: 100K to 100M entities, 10K to 1M relation types
  • Sparse: few valid links
  • Noisy/incomplete: missing or wrong relations and entities
SLIDE 10

Relations Representation

SLIDE 11

How it all started

Supporting part-of relations in ontologies.

SLIDE 12

OWL: Representation

SLIDE 13

OWL: Representation

(Altowayan and Tao, 2015)

Mapped to a simplified representation of part-whole relations

SLIDE 14

OWL: Representation

(Altowayan and Tao, 2015)

Inferred Ontology

SLIDE 15

From Ontology To KG

Ontology vs. KG

SLIDE 16

KG representation is an NLP problem

Which NLP modeling approach is better suited for representing KGs?

SLIDE 17

Distributed Representations

Neural Language Models, a.k.a. Word Embeddings. Why?

  • They mitigate the curse-of-dimensionality problem
  • They capture syntactic and semantic similarity in language
SLIDE 18

Representing KGs in a Vector Space Model (VSM): Embeddings

SLIDE 19

Embedding Idea

Learn dense, distributed representations for tokens (words). How?

  1. Use neural networks to learn the embeddings
  2. Assume distributional semantics

The intuition behind the Distributional Hypothesis: words that occur in similar contexts tend to have similar meanings.
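As a minimal sketch of this idea (assuming the gensim library; the toy corpus and parameters are illustrative, not the dissertation's setup):

    # Learn small word embeddings from a toy corpus and query similarity.
    from gensim.models import Word2Vec

    corpus = [
        ["titanic", "was", "written", "by", "james", "cameron"],
        ["avatar", "was", "directed", "by", "james", "cameron"],
    ]

    # Train a skip-gram model; every word gets a dense vector.
    model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, sg=1)

    # Words that occur in similar contexts end up with similar vectors.
    print(model.wv.most_similar("titanic", topn=3))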

SLIDE 20

Do Embeddings Really Work?

(Altowayan and Tao, 2016)


Feature representation: using embeddings (only) vs. hand-crafted features

SLIDE 21

Improving Embeddings

(Altowayan and Elnagar, 2017)

Task-specific models perform better than the generic ones


SLIDE 22

Building KG Embeddings for a real-world application: Question Answering

SLIDE 23

Example: Knowledge Graph

  • Triplet form: (Titanic, written by, James Cameron)
SLIDE 24

Answering Questions with KGs

We can ask questions in natural language about the knowledge contained within the knowledge graph, e.g.:

  • Who is the writer of Titanic?
  • Who wrote Titanic movie?
  • Titanic movie is written by whom?

All of which can be answered by finding the associated triplet: (Titanic, written by, James Cameron)

SLIDE 25

A simple Q/A example

  • Start from a simple fact: “James Cameron is the writer of the Titanic movie”
  • Assume the fact is captured in triplet form: (Titanic, written_by, James_Cameron)
  • Ask a question: “Who is the writer of Titanic?”
  • To find the answer (see the sketch after this list):
  • Detect the Head/Relation in the question, e.g. Head: Titanic, Relation: written_by
  • Complete the pair (Titanic, written_by, ?)
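A minimal sketch of this lookup over a toy knowledge graph (hypothetical names and data; the real system matches by embedding similarity rather than exact string equality):

    # Toy KG stored as a set of (head, relation, tail) triplets.
    triplets = {
        ("titanic", "written_by", "james_cameron"),
        ("avatar", "directed_by", "james_cameron"),
    }

    def complete(head, relation):
        # Answer by completing the pair (head, relation, ?).
        return [t for (h, r, t) in triplets if h == head and r == relation]

    # "Who is the writer of Titanic?" -> Head: titanic, Relation: written_by
    print(complete("titanic", "written_by"))  # ['james_cameron']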
SLIDE 26

KGE QA System

We present a new algorithm for answering factual questions from Knowledge Graphs. We build two embedding models from the knowledge graph:

  • one for Named Entity Recognition, and
  • the other for Relation Extraction.

Factoid Question-Answering System Based on Knowledge Graph Embeddings

SLIDE 27

Our approach: High-level

SLIDE 28

Dataset: Domain Knowledge

We use the benchmark FB15K dataset, filtered to keep four domains.
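One way to do this filtering (a sketch; the domain prefixes below are illustrative examples, not the dissertation's exact list, and FB15K column order varies across distributions):

    # Keep only triplets whose relation belongs to a few chosen domains.
    KEEP = ("/film/", "/people/", "/location/", "/sports/")

    with open("FB15K/train.txt") as src, open("filtered.txt", "w") as dst:
        for line in src:
            cols = line.rstrip("\n").split("\t")
            # The relation is the column that is not a Freebase /m/ entity ID.
            relation = next(c for c in cols if not c.startswith("/m/"))
            if relation.startswith(KEEP):
                dst.write(line)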

SLIDE 29

Data Conversion

Raw FB15K triplet (head, relation, tail):

  /m/0dr_4   /film/film/written_by   /m/03_gd

Cleaned triplet (IDs replaced using entity descriptions):

  titanic   written_by   james_cameron

Description data source: https://github.com/xrb92/DKRL
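A sketch of the conversion, assuming a two-column “MID <tab> name” mapping file derived from the entity descriptions above (the file name and helper are hypothetical):

    # Replace Freebase MIDs with readable, underscore-joined names.
    mid2name = {}
    with open("mid2name.tsv") as f:
        for line in f:
            mid, name = line.rstrip("\n").split("\t")
            mid2name[mid] = name.lower().replace(" ", "_")

    def clean(head, relation, tail):
        return (mid2name.get(head, head),
                relation.split("/")[-1],   # /film/film/written_by -> written_by
                mid2name.get(tail, tail))

    print(clean("/m/0dr_4", "/film/film/written_by", "/m/03_gd"))
    # -> ('titanic', 'written_by', 'james_cameron'), given the mapping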

SLIDE 30

Building ENT/REL models
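One possible realization of the two models (a sketch assuming gensim; the actual training corpora and parameters in the dissertation may differ):

    # Train one embedding model over entities (ENT) and one over
    # relations in context (REL), from the cleaned triplets.
    from gensim.models import Word2Vec

    triplets = [("titanic", "written_by", "james_cameron"),
                ("avatar", "directed_by", "james_cameron")]

    ent_sentences = [[h, t] for h, _, t in triplets]      # entity contexts
    rel_sentences = [[h, r, t] for h, r, t in triplets]   # relation contexts

    ENT = Word2Vec(ent_sentences, vector_size=50, min_count=1).wv
    REL = Word2Vec(rel_sentences, vector_size=50, min_count=1).wv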

slide-31
SLIDE 31

Answering Pipeline

SLIDE 32

Evaluating Pre-trained Models

(Altowayan and Tao, 2019)

SLIDE 33

KGE QA Algorithm

SLIDE 34

Determining Token Types

Given an input token, we decide its type based on its closest vector from ENT.vec and REL.vec as follows:

1) neighbors = closest_entities(token) + closest_relations(token)

2) token_type = type(closest_neighbor),   if similarity(closest_neighbor) ≥ MIN_CONFIDENCE
   token_type = OTHER,                    otherwise

3) closest_neighbor = max(similarity(neighbors)),                        if similarity ≥ MAX_CONFIDENCE
   closest_neighbor = max((SM(neighbors) + similarity(neighbors)) / 2),  otherwise

where similarity is CosineSimilarity and SM is SequenceMatcher.
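A minimal runnable sketch of this decision rule (assuming ENT and REL are gensim KeyedVectors as built earlier, and Python's difflib.SequenceMatcher; the threshold values are illustrative and out-of-vocabulary handling is omitted):

    from difflib import SequenceMatcher

    MAX_CONFIDENCE = 0.9   # illustrative values; both thresholds are tunable
    MIN_CONFIDENCE = 0.4

    def token_type(token, ENT, REL):
        # 1) Gather nearest neighbors from both models, tagged by type.
        neighbors = (
            [(w, sim, "ENT") for w, sim in ENT.most_similar(token, topn=3)]
            + [(w, sim, "REL") for w, sim in REL.most_similar(token, topn=3)]
        )

        # 3) Closest neighbor: cosine alone if confident enough, otherwise
        #    average cosine with string similarity (SequenceMatcher ratio).
        best = max(neighbors, key=lambda n: n[1])
        if best[1] < MAX_CONFIDENCE:
            def blend(n):
                return (SequenceMatcher(None, token, n[0]).ratio() + n[1]) / 2
            best = max(neighbors, key=blend)

        # 2) Accept the neighbor's type only above MIN_CONFIDENCE.
        return best[2] if best[1] >= MIN_CONFIDENCE else "OTHER"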

SLIDE 35

KGE QA: Main UI

SLIDE 36

Demo: example answer

INPUT: who is the writer of Troy movie?

SLIDE 37

Demo: visualizing similarities

INPUT: who is the writer of Troy movie?

SLIDE 38

Demo: visualizing similarities

INPUT: who is the writer of Troy movie?

SLIDE 39

Demo: under the hood

Logging output of the system while answering a question.

INPUT: who is the writer of Troy movie?

SLIDE 40

KGE QA is Customizable

Works with any customized KG “dataset”:

1) Create your own dataset
2) Build a KGE for the new domain knowledge
3) Ask questions
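For instance, a custom dataset could be a plain tab-separated file of triplets, loaded like this (the file name, format, and contents are hypothetical):

    # my_domain_triples.tsv: one fact per line as "head <tab> relation <tab> tail"
    def load_triplets(path):
        with open(path) as f:
            return [tuple(line.rstrip("\n").split("\t")) for line in f if line.strip()]

    kg = load_triplets("my_domain_triples.tsv")
    # e.g. [("acme_widget", "manufactured_by", "acme_corp"), ...]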

SLIDE 41

Assumption: How To Ask Questions

SLIDE 42

KGE QA: responses

1) Answer found
2) No Entity/Relation detected in the question
3) Entity and Relation detected, but no corresponding fact in the KG

SLIDE 43

Sample answers

SLIDE 44

KGE QA: strengths

  • Captures variations of word forms in ENT/REL (e.g. influences -> influenced)
  • Supports one-to-many results (e.g. “people born in NYC” has 209 results)
  • Handles typos and is letter-case agnostic
  • Supports semantic (meaning) similarity of words (e.g. belongs to -> located_in)

Also:

  • Works with any customized KG dataset
SLIDE 45

KGE QA: limitations

  • Answers one question (fact) at a time
  • Assumes the Head/Relation are present in the question
  • One-directional relations (no inverse inference, e.g. A has_part B does not imply B part_of A)
  • No nested answers (single relations only)
  • Sensitive to noisy data (e.g. when similar Heads/Relations appear in the question)
SLIDE 46

KGE QA: challenges

  • Acquiring clean data
  • Converting IDs to actual nouns in benchmark datasets (mapping data is hard to find)
  • Choosing threshold values for determining token types can be tricky (i.e. MAX_CONFIDENCE and MIN_CONFIDENCE)
SLIDE 47

Other Applications

The KGE QA approach applies to other applications as well. For example, a search engine could surface relevant results by linking a query to the most similar keywords on other webpages.

SLIDE 48

Fun stats

KGE QA source code:

SLIDE 49

Fun stats

Word count of the dissertation’s chapters:

SLIDE 50

Fun stats

The dissertation was written and maintained using Git Version Control with Markdown and LaTeX:

Total commits: 230
First commit: Sat Feb 18 16:51:21 2017
Last commit: Tue Dec 10 14:55:27 2019

SLIDE 51

Related Publications

Altowayan, A. Aziz, and Tao, Lixin (2015). “Simplified approach for representing part-whole relations in OWL-DL ontologies.” 2015 IEEE 12th International Conference on Embedded Software and Systems. IEEE, 2015.

Altowayan, A. Aziz, and Tao, Lixin (2016). “Word embeddings for Arabic sentiment analysis.” 2016 IEEE International Conference on Big Data (Big Data). IEEE, 2016.

Altowayan, A. Aziz, and Elnagar, Ashraf (2017). “Improving Arabic sentiment analysis with sentiment-specific embeddings.” 2017 IEEE International Conference on Big Data (Big Data). IEEE, 2017.

Altowayan, A. Aziz, and Tao, Lixin (2019). “Evaluating Word Similarity Measure of Embeddings Through Binary Classification.” Journal of Computer Science Research (JCSR), Nov. 2019.

SLIDE 52

Thank you. Q/A