Efficient Algorithm for Answering Fact-based Queries Using Relational Data Enriched by Context-Based Embeddings
Ph.D. Dissertation
Abdulaziz (Aziz) Altowayan Computer Science Pace University 12/12/2019
Web App available at:
http://bit.ly/KGE_QA
Source code available at:
https://github.com/iamaziz/kge_qa
These slides are available at:
http://aziz.nyc/phd/slides.pdf
image credit: https://bit.ly/2RvSC1A
A Knowledge Graph (KG) is a graph-structured knowledge base used in a specific domain to describe the knowledge contained in that domain and the relationships between the domain’s components.
Leveraging KGs helps us understand and solve problems in various domains:
and other domains.
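At its simplest, a KG can be pictured as a set of (head, relation, tail) triples. The snippet below is a minimal illustrative sketch of that data structure; the facts and the `tails` helper are made up for illustration, not taken from the dissertation's data.

```python
# Minimal sketch: a knowledge graph stored as (head, relation, tail) triples.
# The facts below are illustrative examples only.
KG = {
    ("titanic", "written_by", "james_cameron"),
    ("titanic", "directed_by", "james_cameron"),
    ("avatar", "directed_by", "james_cameron"),
}

def tails(head, relation):
    """All entities linked to `head` by `relation`."""
    return {t for (h, r, t) in KG if h == head and r == relation}

print(tails("titanic", "written_by"))  # {'james_cameron'}
```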
image credit: https://bit.ly/2r9HJYV
How to use, represent, and leverage Knowledge Graphs?
Issues with representing large-scale knowledge graphs
Supporting part-of relations in ontologies.
(Altowayan and Tao, 2015)
Mapped to a simplified representation of part-whole relations
(Altowayan and Tao, 2015)
Inferred Ontology
Which NLP-modeling approach is better suited for representing KGs?
Neural Language Models (a.k.a. Word Embeddings). Why?
They learn dense, distributed representations for tokens (words). How?
The intuition behind Distributional Hypothesis:
(Altowayan and Tao, 2016)
Feature representation: using embeddings (only) vs. hand-crafted features
(Altowayan and Elnagar, 2017)
Task-specific models perform better than the generic ones
We can ask questions in natural language about the knowledge contained within the knowledge graph, e.g. “Who wrote Titanic?” or “Who is the writer of the Titanic movie?”
All of which can be answered by finding the associated triplet: (Titanic, written_by, James Cameron)
We present a new algorithm for answering factual questions from Knowledge Graphs. We build two embedding models from the knowledge graph: one for entities (used for Entity Detection) and one for relations (used for Relation Extraction).
Factoid-Question Answering System Based on Knowledge Graph Embeddings
We use the benchmark FB15K dataset, filtered to keep four domains.
Raw FB15K triplet (head, relation, tail):
/m/0dr_4  /film/film/written_by  /m/03_gd

Cleaned FB15K triplet (IDs mapped to names using entity descriptions):
titanic  written_by  james_cameron

Entity descriptions data source: https://github.com/xrb92/DKRL
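The cleaning step above can be sketched as follows: map Freebase machine IDs to readable names via an id-to-name table (e.g. one derived from the DKRL entity descriptions), and keep only the last segment of the relation path. The mapping table and helper name here are illustrative stubs, not the actual preprocessing code.

```python
# Sketch of the FB15K cleaning step. MID_TO_NAME is a tiny illustrative stub;
# in practice it would be built from the entity-description data.
MID_TO_NAME = {"/m/0dr_4": "titanic", "/m/03_gd": "james_cameron"}

def clean_triplet(head, relation, tail):
    """('/m/0dr_4', '/film/film/written_by', '/m/03_gd') -> readable form."""
    rel_name = relation.rsplit("/", 1)[-1]  # '/film/film/written_by' -> 'written_by'
    return MID_TO_NAME.get(head, head), rel_name, MID_TO_NAME.get(tail, tail)

print(clean_triplet("/m/0dr_4", "/film/film/written_by", "/m/03_gd"))
# ('titanic', 'written_by', 'james_cameron')
```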
(Altowayan and Tao, 2019)
Given an input token, we decide its type based on its closest vector from ENT.vec and REL.vec as follows:

1) neighbors = closest_entities(token) + closest_relations(token)

2) closest_neighbor = max(similarity(neighbors)), if similarity ≥ MAX_CONFIDENCE; otherwise max((SM(neighbors) + similarity(neighbors)) / 2)

3) token_type = type(closest_neighbor), if similarity(closest_neighbor) ≥ MIN_CONFIDENCE; otherwise OTHER

where similarity is CosineSimilarity and SM is SequenceMatcher.
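A minimal sketch of this decision rule in Python, using `difflib.SequenceMatcher` for the string-match score. The embedding lookups are stubbed out as precomputed (name, cosine similarity) lists, and the threshold values are hypothetical placeholders, not the system's actual settings.

```python
from difflib import SequenceMatcher

# Illustrative thresholds; the real system's values may differ.
MIN_CONFIDENCE = 0.4
MAX_CONFIDENCE = 0.8

def sm(token, name):
    """String similarity (SequenceMatcher ratio) between token and a neighbor name."""
    return SequenceMatcher(None, token, name).ratio()

def decide_type(token, ent_neighbors, rel_neighbors):
    """ent_neighbors / rel_neighbors: lists of (name, cosine_similarity) pairs,
    standing in for lookups in the entity and relation embedding models."""
    # Step 1: pool candidate neighbors from both embedding spaces.
    neighbors = [(n, s, "ENTITY") for n, s in ent_neighbors] + \
                [(n, s, "RELATION") for n, s in rel_neighbors]
    # Step 2: pick by cosine alone if confident enough, else average
    # cosine similarity with the string-match score.
    best = max(neighbors, key=lambda x: x[1])
    if best[1] >= MAX_CONFIDENCE:
        score = best[1]
    else:
        best = max(neighbors, key=lambda x: (sm(token, x[0]) + x[1]) / 2)
        score = (sm(token, best[0]) + best[1]) / 2
    # Step 3: fall back to OTHER below the minimum confidence.
    return best[2] if score >= MIN_CONFIDENCE else "OTHER"

print(decide_type("writer", [("titanic", 0.2)], [("written_by", 0.7)]))  # RELATION
```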
Logging output of the system while answering a question. INPUT: who is the writer of Troy movie?
Works with any customized KG dataset:
1) Create your own dataset
2) Build KGE for the new domain knowledge
3) Ask questions
Three possible outcomes:
1) Answer found
2) No entity/relation detected in the question
3) Entity and relation detected, but no corresponding fact in the KG
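The three outcomes can be sketched as a simple dispatch, assuming entity detection and relation extraction return `None` when nothing matches. All names and data below are illustrative stubs, not the actual KGE QA implementation.

```python
# Illustrative fact store: (entity, relation) -> answer.
FACTS = {("titanic", "written_by"): "james_cameron"}

def answer_question(entity, relation):
    if entity is None or relation is None:
        return "No entity/relation detected in the question."  # case 2
    fact = FACTS.get((entity, relation))
    if fact is None:
        return "No corresponding fact in the KG."              # case 3
    return fact                                                # case 1

print(answer_question("titanic", "written_by"))  # james_cameron
print(answer_question(None, None))
print(answer_question("titanic", "directed_by"))
```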
Also, mapping entity IDs to readable names can be tricky (mapping data is hard to find).
The KGE QA approach applies to other applications as well; for example, search engines could find relevant results by linking queries to the most similar keywords in other webpages.
KGE QA source code:
Word count of the dissertation’s chapters:
The dissertation was written and maintained using Git Version Control with Markdown and LaTeX:
Total commits: 230
First commit: Sat Feb 18 16:51:21 2017
Last commit: Tue Dec 10 14:55:27 2019
Altowayan, A. Aziz, and Tao, Lixin (2015). “Simplified approach for representing part-whole relations in OWL-DL ontologies.” 2015 IEEE 12th International Conference on Embedded Software and Systems. IEEE, 2015.
Altowayan, A. Aziz, and Tao, Lixin (2016). “Word embeddings for Arabic sentiment analysis.” 2016 IEEE International Conference on Big Data (Big Data). IEEE, 2016.
Altowayan, A. Aziz, and Elnagar, Ashraf (2017). “Improving Arabic sentiment analysis with sentiment-specific embeddings.” 2017 IEEE International Conference on Big Data (Big Data). IEEE, 2017.
Altowayan, A. Aziz, and Tao, Lixin (2019). “Evaluating Word Similarity Measure of Embeddings Through Binary Classification.” Journal of Computer Science Research (JCSR), Nov. 2019.