SLIDE 1

Efficient Algorithm for Answering Fact-based Queries Using Relational Data Enriched by Context-Based Embeddings

Ph.D. Dissertation

Abdulaziz (Aziz) Altowayan
Computer Science, Pace University
12/12/2019

SLIDE 2

Outline

  • Motivation
  • Problem and Challenges
  • Solution, Our Approach, and Validation
  • Building Representation Models (Altowayan and Tao, 2015), (Altowayan and Tao, 2016), and (Altowayan and Elnagar, 2017)
  • Model Evaluation (Altowayan and Tao, 2019)
  • Application
  • Factoid KGE QA Algorithm
SLIDE 3

Demo

Web App available at:

http://bit.ly/KGE_QA

Source code available at:

https://github.com/iamaziz/kge_qa

These slides are available at:

http://aziz.nyc/phd/slides.pdf

SLIDE 4

Where does this come from?

SLIDE 5

The Knowledge Graph (KG)

image credit: https://bit.ly/2RvSC1A

SLIDE 6

Knowledge Graphs

A KG is a graph-structured knowledge base that describes the knowledge in a specific domain and the relationships among that domain's components.

SLIDE 7

Motivation

Leveraging KGs helps us understand and solve problems in various domains:

  • In biology: interactions between proteins and genes
  • In medicine: drugs and their effects
  • In social networks: who knows whom and where they belong
  • In search engines: finding relevant results



 
...and other domains.

SLIDE 8

Where do KGs come from?

image credit: https://bit.ly/2r9HJYV

SLIDE 9

Problem: KG Representation

How do we use, represent, and leverage Knowledge Graphs?

Issues with representing large-scale knowledge graphs:

  • KGs are hard to manipulate (Bordes et al., 2015)
  • Large dimensions: 100K to 100M entities, 10K to 1M relation types
  • Sparse: few valid links
  • Noisy/incomplete: missing or wrong relations and entities
SLIDE 10

Relations Representation

SLIDE 11

How it all started

Supporting part-of relations in ontologies.

SLIDE 12

OWL: Representation

SLIDE 13

OWL: Representation

(Altowayan and Tao, 2015)

Mapped to a simplified representation of part-whole relations

SLIDE 14

OWL: Representation

(Altowayan and Tao, 2015)

Inferred Ontology

SLIDE 15

From Ontology To KG

Ontology vs. KG

SLIDE 16

KG representation is an NLP problem

Which NLP modeling approach is better suited for representing KGs?

SLIDE 17

Distributed Representations

Neural Language Models, a.k.a. Word Embeddings. Why?

  • They mitigate the curse-of-dimensionality problem
  • They capture syntactic and semantic similarity in language
SLIDE 18

Representing KGs in a Vector Space Model (VSM): Embeddings

SLIDE 19

Embedding Idea

Learn dense, distributed representations for tokens (words). How?

  1. Use neural networks to learn the embeddings
  2. Assume distributional semantics

The intuition behind the Distributional Hypothesis: words that occur in similar contexts tend to have similar meanings.
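As a minimal sketch of this idea (assuming the gensim library; the toy corpus and parameters are illustrative, not the dissertation's setup):

    # Learn small word embeddings from a toy corpus and query similarity.
    from gensim.models import Word2Vec

    corpus = [
        ["titanic", "was", "written", "by", "james", "cameron"],
        ["avatar", "was", "directed", "by", "james", "cameron"],
    ]

    # Train a skip-gram model; every word gets a dense vector.
    model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, sg=1)

    # Words that occur in similar contexts end up with similar vectors.
    print(model.wv.most_similar("titanic", topn=3))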

SLIDE 20

Do Embeddings Really Work?

(Altowayan and Tao, 2016)


Feature representation: using embeddings (only) vs. hand-crafted features

SLIDE 21

Improving Embeddings

(Altowayan and Elnagar, 2017)

Task-specific models perform better than the generic ones


SLIDE 22

Building KG Embeddings for a real-world application: Question Answering

SLIDE 23

Example: Knowledge Graph

  • Triplet form: (Titanic, written by, James Cameron)
SLIDE 24

Answering Questions with KGs

We can ask questions in natural language about the knowledge contained within the knowledge graph, e.g.:

  • Who is the writer of Titanic?
  • Who wrote Titanic movie?
  • Titanic movie is written by whom?

All of which can be answered by finding the associated triplet: (Titanic, written by, James Cameron)

SLIDE 25

A simple Q/A example

  • Start from a simple fact: “James Cameron is the writer of the Titanic movie”
  • Assume the fact is captured in triplet form: (Titanic, written_by, James_Cameron)
  • Ask a question: “Who is the writer of Titanic?”
  • To find the answer (see the sketch after this list):
  • Detect the Head/Relation in the question, e.g. Head: Titanic, Relation: written_by
  • Complete the pair (Titanic, written_by, ?)
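A minimal sketch of this lookup over a toy knowledge graph (hypothetical names and data; the real system matches by embedding similarity rather than exact string equality):

    # Toy KG stored as a set of (head, relation, tail) triplets.
    triplets = {
        ("titanic", "written_by", "james_cameron"),
        ("avatar", "directed_by", "james_cameron"),
    }

    def complete(head, relation):
        # Answer by completing the pair (head, relation, ?).
        return [t for (h, r, t) in triplets if h == head and r == relation]

    # "Who is the writer of Titanic?" -> Head: titanic, Relation: written_by
    print(complete("titanic", "written_by"))  # ['james_cameron']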
SLIDE 26

KGE QA System

We present a new algorithm for answering factual questions from Knowledge Graphs. We build two embedding models from the knowledge graph:

  • one for Named Entity Recognition, and
  • the other for Relation Extraction.

Factoid Question-Answering System Based on Knowledge Graph Embeddings

SLIDE 27

Our approach: High-level

SLIDE 28

Dataset: Domain Knowledge

We use the benchmark FB15K dataset, filtered to keep four domains.
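One way to do this filtering (a sketch; the domain prefixes below are illustrative examples, not the dissertation's exact list, and FB15K column order varies across distributions):

    # Keep only triplets whose relation belongs to a few chosen domains.
    KEEP = ("/film/", "/people/", "/location/", "/sports/")

    with open("FB15K/train.txt") as src, open("filtered.txt", "w") as dst:
        for line in src:
            cols = line.rstrip("\n").split("\t")
            # The relation is the column that is not a Freebase /m/ entity ID.
            relation = next(c for c in cols if not c.startswith("/m/"))
            if relation.startswith(KEEP):
                dst.write(line)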

SLIDE 29

Data Conversion

Raw FB15K triplet (head, relation, tail):

  /m/0dr_4   /film/film/written_by   /m/03_gd

Cleaned triplet (IDs replaced using entity descriptions):

  titanic   written_by   james_cameron

Description data source: https://github.com/xrb92/DKRL
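A sketch of the conversion, assuming a two-column “MID <tab> name” mapping file derived from the entity descriptions above (the file name and helper are hypothetical):

    # Replace Freebase MIDs with readable, underscore-joined names.
    mid2name = {}
    with open("mid2name.tsv") as f:
        for line in f:
            mid, name = line.rstrip("\n").split("\t")
            mid2name[mid] = name.lower().replace(" ", "_")

    def clean(head, relation, tail):
        return (mid2name.get(head, head),
                relation.split("/")[-1],   # /film/film/written_by -> written_by
                mid2name.get(tail, tail))

    print(clean("/m/0dr_4", "/film/film/written_by", "/m/03_gd"))
    # -> ('titanic', 'written_by', 'james_cameron'), given the mapping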

SLIDE 30

Building ENT/REL models
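One possible realization of the two models (a sketch assuming gensim; the actual training corpora and parameters in the dissertation may differ):

    # Train one embedding model over entities (ENT) and one over
    # relations in context (REL), from the cleaned triplets.
    from gensim.models import Word2Vec

    triplets = [("titanic", "written_by", "james_cameron"),
                ("avatar", "directed_by", "james_cameron")]

    ent_sentences = [[h, t] for h, _, t in triplets]      # entity contexts
    rel_sentences = [[h, r, t] for h, r, t in triplets]   # relation contexts

    ENT = Word2Vec(ent_sentences, vector_size=50, min_count=1).wv
    REL = Word2Vec(rel_sentences, vector_size=50, min_count=1).wv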

slide-31
SLIDE 31

Answering Pipeline

SLIDE 32

Evaluating Pre-trained Models

(Altowayan and Tao, 2019)

SLIDE 33

KGE QA Algorithm

SLIDE 34

Determining Token Types

Given an input token, we decide its type based on its closest vector from ENT.vec and REL.vec as follows:

1) neighbors = closest_entities(token) + closest_relations(token)

2) token_type = type(closest_neighbor),   if similarity(closest_neighbor) ≥ MIN_CONFIDENCE
   token_type = OTHER,                    otherwise

3) closest_neighbor = max(similarity(neighbors)),                        if similarity ≥ MAX_CONFIDENCE
   closest_neighbor = max((SM(neighbors) + similarity(neighbors)) / 2),  otherwise

where similarity is CosineSimilarity and SM is SequenceMatcher.
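A minimal runnable sketch of this decision rule (assuming ENT and REL are gensim KeyedVectors as built earlier, and Python's difflib.SequenceMatcher; the threshold values are illustrative and out-of-vocabulary handling is omitted):

    from difflib import SequenceMatcher

    MAX_CONFIDENCE = 0.9   # illustrative values; both thresholds are tunable
    MIN_CONFIDENCE = 0.4

    def token_type(token, ENT, REL):
        # 1) Gather nearest neighbors from both models, tagged by type.
        neighbors = (
            [(w, sim, "ENT") for w, sim in ENT.most_similar(token, topn=3)]
            + [(w, sim, "REL") for w, sim in REL.most_similar(token, topn=3)]
        )

        # 3) Closest neighbor: cosine alone if confident enough, otherwise
        #    average cosine with string similarity (SequenceMatcher ratio).
        best = max(neighbors, key=lambda n: n[1])
        if best[1] < MAX_CONFIDENCE:
            def blend(n):
                return (SequenceMatcher(None, token, n[0]).ratio() + n[1]) / 2
            best = max(neighbors, key=blend)

        # 2) Accept the neighbor's type only above MIN_CONFIDENCE.
        return best[2] if best[1] >= MIN_CONFIDENCE else "OTHER"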

SLIDE 35

KGE QA: Main UI

SLIDE 36

Demo: example answer

INPUT: who is the writer of Troy movie?

SLIDE 37

Demo: visualizing similarities

INPUT: who is the writer of Troy movie?

SLIDE 38

Demo: visualizing similarities

INPUT: who is the writer of Troy movie?

SLIDE 39

Demo: under the hood

Logging output of the system while answering a question.

INPUT: who is the writer of Troy movie?

SLIDE 40

KGE QA is Customizable

Works with any customized KG “dataset”:

1) Create your own dataset
2) Build a KGE for the new domain knowledge
3) Ask questions
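For instance, a custom dataset could be a plain tab-separated file of triplets, loaded like this (the file name, format, and contents are hypothetical):

    # my_domain_triples.tsv: one fact per line as "head <tab> relation <tab> tail"
    def load_triplets(path):
        with open(path) as f:
            return [tuple(line.rstrip("\n").split("\t")) for line in f if line.strip()]

    kg = load_triplets("my_domain_triples.tsv")
    # e.g. [("acme_widget", "manufactured_by", "acme_corp"), ...]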

SLIDE 41

Assumption: How To Ask Questions

SLIDE 42

KGE QA: responses

1) Answer found
2) No Entity/Relation detected in the question
3) Entity and Relation detected, but no corresponding fact in the KG

SLIDE 43

Sample answers

SLIDE 44

KGE QA: strengths

  • Captures variations of word forms in ENT/REL (e.g. influences -> influenced)
  • Supports one-to-many results (e.g. “people born in NYC” has 209 results)
  • Handles typos and is letter-case agnostic
  • Supports semantic (meaning) similarity of words (e.g. belongs to -> located_in)

Also:

  • Works with any customized KG dataset
SLIDE 45

KGE QA: limitations

  • Answers one question (fact) at a time
  • Assumes the Head/Relation are present in the question
  • One-directional relations (no inverse inference, e.g. A has_part B does not imply B part_of A)
  • No nested answers (single relations only)
  • Sensitive to noisy data (e.g. when similar Heads/Relations appear in the question)
SLIDE 46

KGE QA: challenges

  • Acquiring clean data
  • Converting IDs to actual nouns in benchmark datasets (mapping data is hard to find)
  • Choosing threshold values for determining token types can be tricky (i.e. MAX_CONFIDENCE and MIN_CONFIDENCE)
SLIDE 47

Other Applications

The KGE QA approach applies to other applications as well. For example, a search engine could surface relevant results by linking a query to the most similar keywords on other webpages.

SLIDE 48

Fun stats

KGE QA source code:

SLIDE 49

Fun stats

Word count of the dissertation’s chapters:

SLIDE 50

Fun stats

The dissertation was written and maintained using Git Version Control with Markdown and LaTeX:

Total commits: 230
First commit: Sat Feb 18 16:51:21 2017
Last commit: Tue Dec 10 14:55:27 2019

SLIDE 51

Related Publications

Altowayan, A. Aziz, and Tao, Lixin (2015). “Simplified approach for representing part-whole relations in OWL-DL ontologies.” 2015 IEEE 12th International Conference on Embedded Software and Systems. IEEE, 2015.

Altowayan, A. Aziz, and Tao, Lixin (2016). “Word embeddings for Arabic sentiment analysis.” 2016 IEEE International Conference on Big Data (Big Data). IEEE, 2016.

Altowayan, A. Aziz, and Elnagar, Ashraf (2017). “Improving Arabic sentiment analysis with sentiment-specific embeddings.” 2017 IEEE International Conference on Big Data (Big Data). IEEE, 2017.

Altowayan, A. Aziz, and Tao, Lixin (2019). “Evaluating Word Similarity Measure of Embeddings Through Binary Classification.” Journal of Computer Science Research (JCSR), Nov. 2019.

SLIDE 52

Thank you. Q/A