
SLIDE 1

A school of IMT

Commonsense Properties from Query Logs and Question Answering Forums

Julien Romero, Simon Razniewski, Koninika Pal, Jeff Z. Pan, Archit Sakhadeo, Gerhard Weikum

SLIDE 2

Goal

■ Mine Commonsense Knowledge (CSK) about:

Object properties

Human behavior

General concepts

■ Focus on salient properties

■ Examples:

(bananas, are, edible)

(children, like, bananas)

■ Applications: chatbots, question answering, visual content understanding, search engine query interpretation, ...

QUASIMODO 2

2019/11/05

SLIDE 3

Challenges

■ Sparseness and bias

■ Rarely expressed

■ Non-encyclopedic (no Wikipedia)

■ Noise and high bias in online content

SLIDE 4

Previous Work

■ Traditional knowledge bases

No commonsense

■ ConceptNet

Manual, does not scale

■ WebChild

Focuses on possible properties, not salient ones

■ TupleKB

Domain-specific

SLIDE 5

General Pipeline

SLIDE 6

Candidate Gathering

■ Main idea: extract facts from questions

When asking a question, people make assumptions about the world

Harvest human curiosity and the "wisdom of the crowds"

Example: the question "Why are bananas yellow?" presupposes the statement "Bananas are yellow".
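
The presupposition idea can be illustrated with a toy rewrite rule. The sketch below is an illustration only, not the method used in the presented system: the `question_to_statement` helper and its single regular expression are hypothetical, handle only one-word subjects, and cover far fewer question forms than a real pipeline would need.

```python
import re

# Hypothetical toy pattern: "Why <is|are|do|can> <subject> <rest>?"
_WHY_PATTERN = re.compile(
    r"^why\s+(is|are|do|can)\s+(\w+)\s+(.+?)\??$", re.IGNORECASE
)


def question_to_statement(question):
    """Return the statement presupposed by a simple why-question, or None."""
    match = _WHY_PATTERN.match(question.strip())
    if match is None:
        return None
    aux, subject, rest = match.groups()
    aux = aux.lower()
    if aux == "do":
        # "Why do lions often hunt zebras?" -> "lions often hunt zebras"
        return f"{subject.lower()} {rest}"
    # "Why are bananas yellow?" -> "bananas are yellow"
    return f"{subject.lower()} {aux} {rest}"


print(question_to_statement("Why are bananas yellow?"))
```

Questions that do not fit the pattern return `None`, which mirrors the fact that not every question yields a usable candidate.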

SLIDE 7

Candidate Gathering – Query Logs

■ Indirect access to the query logs through autocompletion
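
One way to picture this indirect access is to probe an autocompletion service with question prefixes and collect what it suggests. The sketch below makes no network calls and assumes nothing about any real search engine API: `suggest` is a hypothetical callable, `fake_suggest` is an in-memory stand-in for it, and the question templates and letter-by-letter prefix extension are illustrative assumptions.

```python
import string

# Assumed question templates; "{}" is replaced by the subject.
TEMPLATES = ("why is {}", "why are {}", "why do {}", "how do {}")


def build_prefixes(subject):
    return [template.format(subject) for template in TEMPLATES]


def harvest(subject, suggest, max_depth=1):
    """Collect suggestions for each prefix, then extend every prefix by one
    letter per level to reach completions beyond the top suggestions."""
    seen = set()
    frontier = build_prefixes(subject)
    for _ in range(max_depth + 1):
        next_frontier = []
        for prefix in frontier:
            seen.update(suggest(prefix))
            next_frontier.extend(
                f"{prefix} {letter}" for letter in string.ascii_lowercase
            )
        frontier = next_frontier
    return sorted(seen)


# Stand-in for a real autocompletion service (made-up data).
_FAKE_LOG = [
    "why are bananas yellow",
    "why are bananas curved",
    "why are bananas good for you",
    "why do bananas turn brown",
]


def fake_suggest(prefix, limit=2):
    # Like real autocompletion, return only a few completions per prefix.
    return [query for query in _FAKE_LOG if query.startswith(prefix)][:limit]


for query in harvest("bananas", fake_suggest):
    print(query)
```

Because each prefix returns only a limited number of completions, the extra letter per level is what surfaces "why are bananas good for you" in this toy run.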

SLIDE 8

Candidate Gathering – QA Forums

Yahoo! Answers (research datasets)

Quora (semi-manually)

Answers.com (sitemap)

Reddit (dump)

why-how questions

SLIDE 9

Candidate Gathering – Statistics

SLIDE 10

Candidate Gathering – Results

■ Questions are transformed into statements, then into triples using OpenIE techniques

Question: Why do lions often hunt zebras?

Q2S: Lions often hunt zebras

OpenIE: (lions, often eat, zebras)

Modality: (lions, eat, zebras, often)

Positivity: (lions, eat, zebras, often, positive)

Source: (lions, eat, zebras, often, positive, Google, 0.4)
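
The enrichment steps after OpenIE can be sketched as a small normalization function. This is a minimal illustration under stated assumptions: the `Fact` record, the `normalize` helper, and the two word lists are hypothetical and much smaller than anything a real system would use.

```python
from dataclasses import dataclass

# Assumed, illustrative vocabularies.
MODALITY_WORDS = {"often", "always", "usually", "sometimes", "rarely"}
NEGATION_WORDS = {"not", "never"}


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    modality: str
    polarity: str
    source: str
    score: float


def normalize(triple, source, score):
    """Move modality and negation words out of the predicate into their own fields."""
    subject, predicate, obj = triple
    modality, polarity, kept = "", "positive", []
    for token in predicate.split():
        if token in MODALITY_WORDS:
            modality = token
        elif token in NEGATION_WORDS:
            polarity = "negative"
        else:
            kept.append(token)
    return Fact(subject, " ".join(kept), obj, modality, polarity, source, score)


print(normalize(("lions", "often eat", "zebras"), "Google", 0.4))
```

Keeping modality and polarity as separate fields lets later stages compare facts on the bare predicate while still retaining the qualifier.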

SLIDE 11

Corroboration

■ Reduce noise using additional signals from:

Wikipedia and Simple Wikipedia

Answer snippets from search engines

Google Books

Image tags from OpenImages and Flickr

Google’s Conceptual Captions dataset

■ Train a Naive Bayes classifier on all signals, using 700 manually annotated triples (TupleKB requires 70,000)

Precision of 61%
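
To show how several weak signals can be combined into one score, here is a minimal Bernoulli Naive Bayes with add-one smoothing in plain Python. It is a sketch under assumptions, not the authors' implementation: the three binary signals, the six training rows, and the function names are all made up for illustration.

```python
import math


def train_naive_bayes(rows, labels):
    """Estimate class priors and per-signal likelihoods with add-one smoothing."""
    n_features = len(rows[0])
    model = {}
    for cls in (0, 1):
        members = [row for row, label in zip(rows, labels) if label == cls]
        prior = (len(members) + 1) / (len(rows) + 2)
        likelihoods = [
            (sum(row[j] for row in members) + 1) / (len(members) + 2)
            for j in range(n_features)
        ]
        model[cls] = (prior, likelihoods)
    return model


def plausibility(model, row):
    """Posterior probability that a candidate triple is correct (class 1)."""
    log_scores = {}
    for cls, (prior, likelihoods) in model.items():
        score = math.log(prior)
        for value, p in zip(row, likelihoods):
            score += math.log(p if value else 1.0 - p)
        log_scores[cls] = score
    top = max(log_scores.values())
    weights = {cls: math.exp(s - top) for cls, s in log_scores.items()}
    return weights[1] / (weights[0] + weights[1])


# Hypothetical signals per triple: (in_wikipedia, in_answer_snippets, in_image_tags)
rows = [(1, 1, 0), (1, 1, 1), (0, 1, 1), (0, 0, 0), (0, 0, 1), (1, 0, 0)]
labels = [1, 1, 1, 0, 0, 0]  # made-up manual annotations

model = train_naive_bayes(rows, labels)
print(plausibility(model, (1, 1, 1)))
print(plausibility(model, (0, 0, 0)))
```

A triple supported by every signal scores close to 1, while one supported by none scores close to 0.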

SLIDE 12

Ranking

■ From corroboration, get a plausibility score π

■ Define a probability from it

■ Derive a typicality τ and a saliency σ
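
The exact formulas are not given in this text, so the sketch below is only one plausible reading and should not be taken as the authors' definitions. It assumes the probability is π normalized over all triples, typicality is the share of a subject's plausibility mass held by a property, and saliency is the share of a property's mass held by a subject. The function name and the toy scores are hypothetical.

```python
from collections import defaultdict


def rank_scores(plausibility):
    """plausibility maps (subject, predicate, object) to a score pi > 0."""
    total = sum(plausibility.values())
    by_subject = defaultdict(float)
    by_property = defaultdict(float)
    for (s, p, o), pi in plausibility.items():
        by_subject[s] += pi
        by_property[(p, o)] += pi
    scores = {}
    for (s, p, o), pi in plausibility.items():
        scores[(s, p, o)] = {
            "probability": pi / total,               # assumed: P(s, p, o)
            "typicality": pi / by_subject[s],        # assumed: P(p, o | s)
            "saliency": pi / by_property[(p, o)],    # assumed: P(s | p, o)
        }
    return scores


# Made-up plausibility scores.
toy = {
    ("banana", "is", "yellow"): 0.5,
    ("banana", "is", "edible"): 0.5,
    ("apple", "is", "edible"): 1.0,
}
for triple, values in rank_scores(toy).items():
    print(triple, values)
```

Under these assumptions "yellow" and "edible" are equally typical of bananas, but "yellow" is more salient because no other subject in the toy data shares it.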

SLIDE 13

Grouping

■ Reduce redundancy

■ Clustering method based on tri-factorization

■ Groups of (Subject, Object) and Predicate

SLIDE 14

Statistics

SLIDE 15

Examples of facts

■ Practical human knowledge, e.g.: (car, slip on, ice)

■ Problems linked to a subject, e.g.: (pen, can, leak)

■ Emotions linked to events, e.g.: (divorce, can, hurt)

■ Human behaviors, e.g.: (ghost, scare, people)

■ Negative knowledge, e.g.: Not (elephant, can, jump)

■ Salient modalities, e.g.: Always (doctor, have, unreadable handwriting)

■ Trivial facts, e.g.: (road, has_color, black)

■ Newest facts, e.g.: (trump, build, wall)

■ Cultural knowledge (here U.S.), e.g.: Always (school, have, locker)

■ Comparative knowledge, e.g.: (light, faster than, sound)

SLIDE 16

Precision – Entire CSKs

SLIDE 17

Precision – Same Subjects

SLIDE 18

Recall

SLIDE 19

Question Answering

SLIDE 20

Conclusion

■ We introduced a new methodology for acquiring CSK from non-standard sources

■ Improves on the state of the art with better coverage of typical and salient properties, as judged by MTurk workers

■ Extrinsic evaluations illustrate the advantages

■ Data and code available: github.com/Aunsiels/CSK

SLIDE 21

Additional slides

SLIDE 22

Future Work

■ Cultural knowledge

■ Study of stereotypes

■ Temporal evolution of the knowledge base

■ Improve ranking methods

■ Scale to the entire web

SLIDE 23

Literature

■ Data: https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/commonsense/quasimodo/

■ Code: https://github.com/Aunsiels/CSK

■ http://conceptnet.io/

■ http://data.allenai.org/tuple-kb/

■ https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/commonsense/webchild/
