Expanding the YAGO knowledge base



SLIDE 1

Expanding the YAGO knowledge base

Thomas Rebele
Télécom ParisTech
2018-07-05

Outline: The YAGO knowledge base; Using YAGO for the Humanities; Adding Words to Regexes; Answering Queries with Unix Shell; Conclusion

SLIDE 5

The YAGO knowledge base (What is a knowledge base? / What is YAGO? / Accuracy)

What is a knowledge base?

Example facts about Albert Einstein:
◮ Albert Einstein, married, Mileva Marić
◮ Albert Einstein, has advisor, Alfred Kleiner
◮ Albert Einstein, won prize, Nobel Prize in Physics

Applications of knowledge bases
◮ question answering
◮ semantic search
◮ text analysis
◮ machine translation
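The facts on this slide can be written down as (subject, relation, object) triples. The following is a minimal sketch of that idea; the relation strings are copied from the slide and are not YAGO's actual identifiers, and `index_by_subject` is a hypothetical helper.

```python
from collections import defaultdict

# Facts from the slide as (subject, relation, object) triples.
# The relation names are illustrative, not YAGO's real identifiers.
FACTS = [
    ("Albert Einstein", "married", "Mileva Marić"),
    ("Albert Einstein", "has advisor", "Alfred Kleiner"),
    ("Albert Einstein", "won prize", "Nobel Prize in Physics"),
]


def index_by_subject(facts):
    """Group triples as subject -> relation -> list of objects."""
    index = defaultdict(dict)
    for subject, relation, obj in facts:
        index[subject].setdefault(relation, []).append(obj)
    return index


kb = index_by_subject(FACTS)
print(kb["Albert Einstein"]["has advisor"])
```

Applications such as question answering then reduce to lookups or joins over such triples.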

SLIDE 6

What is YAGO?

◮ knowledge base with 10 million entities and more than 210 million facts
◮ automatically extracted from Wikipedia, WordNet, and GeoNames
◮ multilingual facts from 10 languages
◮ focus on precision
◮ developed by the Max Planck Institute for Informatics and Télécom ParisTech

SLIDE 7

What is YAGO?

◮ I joined the project in 2015
◮ coordinated / contributed to the evaluation
◮ maintenance, participating in open source release
◮ development

SLIDE 8

What is YAGO?

Figure: graph around <Albert_Einstein> with the edges "married" (Mileva), "advisor" (Alfred Kleiner), and "won" (Nobel Prize).

SLIDE 9

Accuracy

Figure: Screenshot of evaluation result

◮ 2 months of evaluation, 15 participants
◮ evaluated 4412 facts of 76 relations (with 60m total facts)
◮ 98% of the facts in the sample were correct
◮ Wilson center: 95%, interval width: 4.2%

Now that we have this knowledge base, what can we do with it?
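The "Wilson center" and "interval width" refer to the Wilson score interval for a proportion. The sketch below computes it for a single sample; how the talk aggregates over the 76 relations is not stated on the slide, so this code is not expected to reproduce the 95% / 4.2% figures. The function name and the default `z = 1.96` (a 95% confidence level) are assumptions.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Return (center, half_width) of the Wilson score interval.

    correct / total is the observed share of correct facts in one sample;
    z is the normal quantile (1.96 for a 95% confidence level).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / total + z * z / (4 * total * total)
    ) / denom
    return center, half_width


# Hypothetical sample: 49 of 50 evaluated facts judged correct.
center, half_width = wilson_interval(49, 50)
print(f"center={center:.3f}, interval=[{center - half_width:.3f}, {center + half_width:.3f}]")
```

The center is pulled from the observed share towards 50%, which is why a sample with 98% correct facts can have a Wilson center noticeably below 98%.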

SLIDE 11

Using YAGO for the Humanities (Related Work; Extensions; Birth and Death Dates; Place of residence; Gender; Evaluation; Life expectancy over time; Births per month; Relative population size; Summary)

Using YAGO for the Humanities: Related Work

Similar studies using the Semantic Web for Digital Humanities
◮ [Schich et al., 2014]: about 150,000 people
◮ [de la Croix et al., 2015]: about 300,000 people
◮ [Gergaud et al., 2017]: about 1,100,000 people

These studies cover relatively few people. Can we do better with YAGO?

YAGO has 2,200,000 people, but, e.g., locations for only 700,000 of them.
How can we make YAGO more complete?

SLIDE 13

Using YAGO for the Humanities: Birth and Death Dates

Previous algorithm:

Example article "Plato" (languages: English, German, French):
  Text: Plato was a philosopher. He founded the Academy in Athens. He laid the foundation for philosophy.
  Infobox: Birth 428 or 427 BC, Death 348 BC
  Categories: 420s BC births | 340s BC deaths | Greek philosoph | Greek male wrestler | Austrian writer

Extracted birth dates (bornOn):
  • 428
  • 427
  • 42#

SLIDE 17

Using YAGO for the Humanities: Birth and Death Dates

New algorithm: filtering with category dates

Example article "Plato" (languages: English, German, French):
  Text: Plato was a philosopher. He founded the Academy in Athens. He laid the foundation for philosophy.
  Infobox: Birth 428 or 427 BC, Death 348 BC
  Categories: 420s BC births | 340s BC deaths | Greek philosoph | Greek male wrestler | Austrian writer

Candidate birth dates:
  • 428, 427: bornOn (infobox)
  • 42#: bornOn (category)
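One way to read "filtering with category dates" is that infobox years are kept only if they agree with a decade category such as "420s BC births". The sketch below illustrates that reading; it is an assumption about the method, not the YAGO extractor itself. All function names are hypothetical, and the BC/AD era is ignored for brevity.

```python
import re


def decade_pattern(category):
    """Turn a category like '420s BC births' into the pattern '42#', else None."""
    m = re.fullmatch(r"(\d+)0s(?: BC)? births", category)
    return m.group(1) + "#" if m else None


def matches_pattern(year, pattern):
    """True if the year string fits the pattern, where '#' stands for any digit."""
    return len(year) == len(pattern) and all(
        p in ("#", y) for y, p in zip(year, pattern)
    )


def filter_birth_years(infobox_years, categories):
    """Keep only the infobox years that are compatible with a birth-decade category."""
    patterns = [p for p in map(decade_pattern, categories) if p]
    if not patterns:
        return list(infobox_years)  # nothing to filter with
    return [y for y in infobox_years
            if any(matches_pattern(y, p) for p in patterns)]


CATEGORIES = ["420s BC births", "340s BC deaths", "Greek philosoph",
              "Greek male wrestler", "Austrian writer"]
# "348" stands in for a year wrongly picked up from the death field.
print(filter_birth_years(["428", "427", "348"], CATEGORIES))
```

In this toy run the two candidates from the birth field survive, while a stray year that contradicts the category is dropped.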

SLIDE 18

Using YAGO for the Humanities: Place of residence

Extract a mapping from demonyms / adjectives to locations:
  • "Austrian" → Austria
  • "Greek" → Greece
Take the most frequent location as the place of residence.

Example article "Plato":
  Text: Plato was a philosopher. He founded the Academy in Athens. He laid the foundation for philosophy.
  Infobox: Birth 428 or 427 BC, Death 348 BC
  Categories: 420s BC births | 340s BC deaths | Greek philosoph | Greek male wrestler | Austrian writer

Counts: Greece: 2, Austria: 1
Result: Greece (location with max. occurrence)
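The counting step can be sketched as follows. The two-entry `DEMONYMS` table is taken from the slide; the choice to count over category names only, and the function name, are assumptions made for this illustration.

```python
from collections import Counter

# Tiny illustrative mapping from demonyms / adjectives to locations.
DEMONYMS = {"Austrian": "Austria", "Greek": "Greece"}


def place_of_residence(categories):
    """Return (most frequent location or None, per-location counts)."""
    counts = Counter()
    for category in categories:
        for word in category.split():
            if word in DEMONYMS:
                counts[DEMONYMS[word]] += 1
    if not counts:
        return None, counts
    return counts.most_common(1)[0][0], counts


CATEGORIES = ["420s BC births", "340s BC deaths", "Greek philosoph",
              "Greek male wrestler", "Austrian writer"]
place, counts = place_of_residence(CATEGORIES)
print(place, dict(counts))
```

The majority vote makes the extraction robust against a single misleading category, such as "Austrian writer" in the Plato example.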

SLIDE 19

Using YAGO for the Humanities: Place of residence

Caveat: only take outermost text spans, so that a longer name is not also counted for a shorter name it contains.
  • "Holy Roman Empire" → <Holy_Roman_Empire>
  • "Roman Empire" → <Roman_Empire>
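A minimal sketch of the "outermost spans" rule: find every occurrence of a known name, then discard spans that lie inside another span. The `NAMES` table and the function name are hypothetical; the two entries come from the slide.

```python
NAMES = {
    "Holy Roman Empire": "<Holy_Roman_Empire>",
    "Roman Empire": "<Roman_Empire>",
}


def outermost_entities(text, names=NAMES):
    """Return the entities of all name occurrences not contained in a longer one."""
    spans = []
    for name in names:
        start = text.find(name)
        while start != -1:
            spans.append((start, start + len(name), name))
            start = text.find(name, start + 1)
    # Keep a span only if no other span covers it completely.
    outer = [s for s in spans
             if not any(o != s and o[0] <= s[0] and s[1] <= o[1] for o in spans)]
    return [names[name] for _, _, name in sorted(outer)]


print(outermost_entities("People from the Holy Roman Empire"))
print(outermost_entities("People from the Roman Empire"))
```

Without the containment check, the first text would contribute a count to both entities.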

SLIDE 21

Using YAGO for the Humanities: Gender

Previous algorithm:

Example article "Plato" (languages: English, German, French):
  Text: Plato was a philosopher. He founded the Academy in Athens. He laid the foundation for philosophy.
  Infobox: Birth 428 or 427 BC, Death 348 BC
  Categories: 420s BC births | 340s BC deaths | Greek philosoph | Greek male wrestler | Austrian writer

Extracted gender: male

SLIDE 23

Using YAGO for the Humanities: Gender

New algorithm:

Example article "Albert Einstein" (languages: English, German, French):
  Text: Albert Einstein was a physicist. Einstein developed the theory of relativity.
  Categories: Male scientist | Swiss physicists

Extracted gender: male

SLIDE 25

Using YAGO for the Humanities: Gender

Example articles:
  • Albert Schweitzer: "Albert Schweitzer was a French-German writer, and philosopher." Categories: Male writer
  • Albert Camus: "Albert Camus was a French philosopher, author, and journalist." Categories: Male philosopher
  • Albert Einstein: "Albert Einstein was a physicist. Einstein developed the theory of relativity." Categories: Male scientist | Swiss physicists

Learned first names:
  • "Albert" → male
  • "Francesca" → female
  • "Kathleen" → female
In total: 1206 first names
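The slide suggests that first names are mapped to a gender using people whose gender is already known from categories. The sketch below shows one plausible way to build such a table; the `min_share` threshold and the function name are assumptions, since the talk does not state how the 1206 names were selected.

```python
from collections import Counter, defaultdict


def learn_first_name_genders(people, min_share=0.95):
    """Build a first name -> gender table from (full name, gender) pairs.

    A first name is kept only if at least `min_share` of its bearers share
    the same gender (hypothetical threshold, chosen for illustration).
    """
    votes = defaultdict(Counter)
    for full_name, gender in people:
        votes[full_name.split()[0]][gender] += 1
    table = {}
    for first_name, counter in votes.items():
        gender, hits = counter.most_common(1)[0]
        if hits / sum(counter.values()) >= min_share:
            table[first_name] = gender
    return table


# Genders as extracted from categories such as "Male writer".
labeled = [
    ("Albert Schweitzer", "male"),
    ("Albert Camus", "male"),
    ("Albert Einstein", "male"),
    ("Kim Example", "male"),      # made-up ambiguous name
    ("Kim Sample", "female"),
]
first_names = learn_first_name_genders(labeled)
print(first_names)
```

Ambiguous first names drop out of the table, which keeps precision high at the cost of some recall.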

SLIDE 26

Using YAGO for the Humanities: Gender

Prioritize extracted facts:
  1. extract gender by category (example: Albert Einstein, category "Male scientist")
  2. extract gender by first name (example: Albert Camus)
  3. extract gender by pronoun (example: Plato, "He founded the Academy in Athens.")
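The priority order can be sketched as a simple fallback chain. The three extractor functions below are simplified stand-ins written for this illustration (keyword match on categories, table lookup, pronoun counting); they are not the extractors used in YAGO.

```python
import re


def gender_by_category(categories):
    """Look for the whole words 'male' / 'female' in the category names."""
    for category in categories:
        words = category.lower().split()
        if "male" in words:
            return "male"
        if "female" in words:
            return "female"
    return None


def gender_by_pronoun(text):
    """Compare the number of 'he' and 'she' tokens in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    he, she = words.count("he"), words.count("she")
    if he == she:
        return None
    return "male" if he > she else "female"


def extract_gender(name, text, categories, first_names):
    """Try category, then first name, then pronoun; return the first hit."""
    return (gender_by_category(categories)
            or first_names.get(name.split()[0])
            or gender_by_pronoun(text))


print(extract_gender("Albert Einstein", "Albert Einstein was a physicist.",
                     ["Male scientist", "Swiss physicists"], {}))
print(extract_gender("Albert Camus", "Albert Camus was a French philosopher.",
                     [], {"Albert": "male"}))
print(extract_gender("Plato", "Plato was a philosopher. He founded the Academy in Athens.",
                     [], {}))
```

Ordering the extractors from most to least reliable means a weaker signal is only consulted when the stronger ones are silent.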

SLIDE 27

Using YAGO for the Humanities: Evaluation

◮ compare extraction process on Wikipedia dump from 2017-02-20
◮ extracted on 11 languages
◮ evaluate precision based on a sample of 100 people

Extraction          YAGO before  Recall  YAGO now  Recall        Precision  DBpedia (en)
Birth dates         1.6m         69%     1.7m      74% (+8%)     100%       0.8m
Death dates         0.7m         33%     0.8m      36% (+10%)    100%       0.3m
Place of residence  0.7m         30%     2.1m      91% (+201%)   97% (*)    0.7m
Gender              1.5m         64%     2.0m      87% (+35%)    98%        4k

Table: Coverage and precision of our methods. Recall relative to total number of people in YAGO (2.2m).

m = million, k = thousand
(*) 6% anachronistic residences (e.g., German Empire instead of Germany)

SLIDE 28

Using YAGO for the Humanities: Life expectancy over time

Figure: Median age over time, by year of birth, for male and female (with the Student's t confidence interval at 95%). x-axis: Year (100 to 1900); y-axis: Median age (45 to 85).

SLIDE 29

Using YAGO for the Humanities: Life expectancy over time

Figure: Median age over time, by year of birth, for Great Britain, Italy, India, and China. x-axis: Year (1100 to 1900); y-axis: Median age (50 to 80).

SLIDE 30

Using YAGO for the Humanities: Births per month

Figure: Births per month in the United States between 2003 and 2015 (with the Student's t confidence interval at α = 95%). Series: YAGO - all; National Center for Health Statistics. x-axis: Month (1 to 12); y-axis: Relative births (7.5% to 9%).

SLIDE 31

Using YAGO for the Humanities: Births per month

Possible explanation:

Example article "Relative age effect" (languages: English, Euskara):
  Text: The relative age effect describes a bias. People born early in the selection period of sports or academia are more likely to perform well.
  Categories: Ageism | Epidemiology

SLIDE 32

Using YAGO for the Humanities: Births per month

Figure: Births per month in the United States between 2003 and 2015 (with the Student's t confidence interval at α = 95%). Series: YAGO - no sportsmen; National Center for Health Statistics. x-axis: Month (1 to 12); y-axis: Relative births (7.5% to 9%).

SLIDE 33

Using YAGO for the Humanities: Relative population size

Figure: Relative population size, by century, for Egypt, Babylonian Empire, Syria, China, Greece, Ancient Rome, Britain, France, Italy, Germany, and United States. x-axis: Year (−3000 to 2000); y-axis: Relative population size (0.1 to 1). The y-axis is scaled by a quadratic function.

SLIDE 34

Using YAGO for the Humanities: Summary

◮ extension of YAGO
    ◮ more birth and death dates (+8% / +10%, 100% precision)
    ◮ more people with locations (+201%, 97% precision)
    ◮ more people with genders (+35%, 98% precision)
◮ case studies
    ◮ life expectancy
    ◮ births per month
    ◮ relative population size

Thomas Rebele, Arash Nekoei, Fabian Suchanek

Publication: ISWC 2017 (workshop paper)

We often had to repair regular expressions (e.g., for matching dates). Can we automate this step?

SLIDE 36

Adding Words to Regexes (Introduction; Problem statement; What is new in our approach; Approximate regex matching; Finding the gaps; Add missing parts; Feedback function; Experiments; Summary)

Adding Words to Regexes: Introduction

Why does YAGO not know the ISBN numbers of my books?

◮ we want to find ISBN numbers in Wikipedia to include them in YAGO
◮ we try the regex ISBN(978|979)?\d{10}
◮ why does the regex not find I978-2-1234-5680-3 ?
◮ how can we modify the regex automatically to match the word?
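The mismatch is easy to reproduce with Python's `re` module. This is a small sketch; the two matching sample strings are made up for illustration, and only the regex and the failing string come from the slide.

```python
import re

ISBN = re.compile(r"ISBN(978|979)?\d{10}")

candidates = [
    "ISBN9782123456803",     # made-up string in the regex's language
    "ISBN0612345678",        # made-up string without the optional prefix
    "I978-2-1234-5680-3",    # the string from the slide
]
for candidate in candidates:
    print(candidate, bool(ISBN.fullmatch(candidate)))
```

The slide's string fails for two reasons: the letters "SBN" are missing, and the digits are separated by hyphens that the regex does not allow.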

SLIDE 38

Adding Words to Regexes: Problem statement

Problem statement, first try:

Given
◮ a regular expression r, e.g., ISBN(978|979)?\d{10}
◮ a set of strings S, e.g., { I978-2-1234-5680-3 }
find a regular expression r′ such that
◮ L(r) ⊆ L(r′)
◮ S ⊆ L(r′)

Solution: r′ = .* (which matches every string)

SLIDE 40

Adding Words to Regexes: Problem statement

Problem statement:

Given
◮ a regular expression r, e.g., ISBN(978|979)?\d{10}
◮ a set of strings S, e.g., { I978-2-1234-5680-3 }
◮ a set of negative examples E−, e.g., { 0612345678 }
find a regular expression r′ such that
◮ L(r) ⊆ L(r′)
◮ S ⊆ L(r′)
◮ L(r′) ∩ E− is small

Additional goals:
◮ precision of r′ ≥ or ≈ precision of r
◮ recall of r′ ≥ recall of r
(w.r.t. the intended meaning of the regex)

SLIDE 41

Adding Words to Regexes: What is new in our approach

Previous approaches: regex + E+ + E− → regex
Our approach: regex + S + E− → regex

Rationale: creating a large set of positive examples is difficult

SLIDE 42

Adding Words to Regexes: Approximate regex matching

Step 1: match string and regex approximately [Myers et al. 1989]

Figure: syntax tree of the regex ISBN(978|979)?\d{10} (concatenation nodes ".", the leaves I, S, B, N, an optional alternative of 978 and 979, and ten \d leaves), aligned with the string I978-2-1234-5680-3.
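To make "matching approximately" concrete, the sketch below computes the minimum number of character edits (insert, delete, substitute) needed to turn a string into some word of a regex's language. It is a brute-force dynamic program over a tiny hand-built syntax tree, written for illustration; it is not the algorithm of the cited paper, it returns only the distance and not the alignment, and all names (`lit`, `cat`, `alt`, `opt`, `DIGIT`) are hypothetical.

```python
from functools import lru_cache

# Tiny regex syntax tree made of nested tuples.
DIGIT = ("digit",)


def lit(c):
    return ("lit", c)


def cat(*parts):
    node = parts[0]
    for part in parts[1:]:
        node = ("cat", node, part)
    return node


def alt(a, b):
    return ("alt", a, b)


def opt(a):
    return ("opt", a)


def word(w):
    return cat(*[lit(c) for c in w])


def regex_edit_distance(regex, s):
    """Minimum number of edits that turn s into a word matched by regex."""

    @lru_cache(maxsize=None)
    def dist(node, i, j):
        kind, n = node[0], j - i
        if kind == "lit":    # keep one matching character, delete the rest
            return max(n, 1) - (1 if node[1] in s[i:j] else 0)
        if kind == "digit":
            return max(n, 1) - (1 if any(c.isdigit() for c in s[i:j]) else 0)
        if kind == "cat":    # try every split point of s[i:j]
            return min(dist(node[1], i, k) + dist(node[2], k, j)
                       for k in range(i, j + 1))
        if kind == "alt":
            return min(dist(node[1], i, j), dist(node[2], i, j))
        if kind == "opt":    # either delete everything or match the child
            return min(n, dist(node[1], i, j))
        raise ValueError(kind)

    return dist(regex, 0, len(s))


# ISBN(978|979)?\d{10}
ISBN_TREE = cat(word("ISBN"), opt(alt(word("978"), word("979"))), *([DIGIT] * 10))

print(regex_edit_distance(ISBN_TREE, "ISBN9782123456803"))
print(regex_edit_distance(ISBN_TREE, "I978-2-1234-5680-3"))
```

For the slide's string, the seven edits correspond to the three missing letters S, B, N and the four hyphens, which are the gaps the next step looks for.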

SLIDE 44

Adding Words to Regexes: Finding the gaps

Step 2: find the gaps
◮ between regex leaves
◮ between characters of the string

Figure: syntax tree of ISBN(978|979)?\d{10} aligned with the string I978-2-1234-5680-3, with the gaps marked (e.g., the regex leaves S, B, N have no counterpart in the string).

SLIDE 48

Adding Words to Regexes: Add missing parts

Step 3 (simple approach): adapt the regex so that it includes the missing parts

Figure: syntax tree of ISBN(978|979)?\d{10} aligned with the string I978-2-1234-5680-3 (top), and the adapted tree (bottom). As far as the figure can be recovered, the leaves S, B, N are grouped under a new optional node "?", \d{10} is unrolled into separate \d leaves, and new optional nodes "?" are inserted before and between the \d leaves.
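The effect of the simple approach can be illustrated with a hand-written repaired regex: parts of the regex that the string lacks become optional, and characters of the string that the regex lacks are inserted as optional parts. The regex `REPAIRED` below was derived by hand for this example under that reading; it is an assumption, not output of the authors' tool.

```python
import re

ORIGINAL = re.compile(r"ISBN(978|979)?\d{10}")
# "SBN" made optional; an optional "-" inserted wherever the string has a hyphen.
REPAIRED = re.compile(r"I(SBN)?(978|979)?-?\d-?\d{4}-?\d{4}-?\d")

tests = [
    "ISBN9782123456803",    # made-up word of the original language
    "I978-2-1234-5680-3",   # string to add (from the slide)
    "0612345678",           # negative example (from the slide)
    "I0612345678",          # made-up string, shows over-generalization
]
for s in tests:
    print(s, bool(ORIGINAL.fullmatch(s)), bool(REPAIRED.fullmatch(s)))
```

The repaired regex accepts the new string and still rejects the negative example, but it also accepts strings such as I0612345678 that the original rejected, which is one reason to control how much the regex is generalized.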

slide-52
SLIDE 52


Adding Words to Regexes: Add missing parts 31/50

Step 3 (adaptive approach): adapt regex, so that it includes the missing parts

Cases (regex node with the gaps assigned to it, before → after):

◮ concatenation: . a b ... c d {g1, g2, g3} → . a ? b ... c d with {g1, g′2} and {g′′2, g3}
◮ disjunction: | a ... b → only recursive call
◮ quantifier: {n,m} r {g1, g2, g3} → . {...} r {...} r with {g1, g′2} and {g′′2, g3}
◮ star: * r {g1, g2} → * r {g1, g′2, g′′2}

slide-54
SLIDE 54


Adding Words to Regexes: Feedback function 32/50

Example
◮ now we want to find URLs
◮ we try regex r = http://[a-zA-Z\.]+
◮ it does not find s = wikipedia.org
◮ repaired regex r′ = (http://)?[a-zA-Z\.]+
◮ problem: r′ finds all words
◮ precision drops

Solution: use feedback on set of negative examples E−
◮ determine the parts of the regex that we can make optional
◮ we use the number of false positives, i.e., f(r′) = |E− ∩ L(r′)| ≤ α|E− ∩ L(r)|
◮ if f(r′) = false, add the word as disjunction instead: http://[a-zA-Z\.]+|wikipedia.org
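The feedback check can be sketched in Bash with `grep -E`. This is a minimal illustration under stated assumptions, not the code of the regex-repair project: the function names `count_fp` and `accept_repair`, the file of negative examples, and its contents are all hypothetical. The character class is written `[a-zA-Z.]` because a backslash inside a POSIX bracket expression is a literal backslash.

```bash
#!/usr/bin/env bash

# Count the negative examples (one per line in file $2) that regex $1 matches entirely.
count_fp() {
  grep -cxE -e "$1" "$2" || true   # grep exits with status 1 when the count is 0
}

# Succeed iff |E- ∩ L(r')| <= alpha * |E- ∩ L(r)|.
# Arguments: original regex r, repaired regex r', alpha, file with negative examples.
accept_repair() {
  local fp_old fp_new
  fp_old=$(count_fp "$1" "$4")
  fp_new=$(count_fp "$2" "$4")
  awk -v old="$fp_old" -v new="$fp_new" -v alpha="$3" \
    'BEGIN { exit !(new <= alpha * old) }'
}

# Hypothetical negative examples E-.
negatives=$(mktemp)
printf '%s\n' 'hello' 'world' 'http://spam.example' > "$negatives"

r='http://[a-zA-Z.]+'
candidate='(http://)?[a-zA-Z.]+'

if accept_repair "$r" "$candidate" 1.0 "$negatives"; then
  repaired=$candidate
else
  repaired="$r|wikipedia\.org"     # fall back to adding the word as a disjunction
fi
printf '%s\n' "$repaired"
```

With these negative examples the candidate matches three lines while the original regex matches one, so the candidate is rejected for α = 1.0 and the disjunction is used instead.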

slide-55
SLIDE 55


Adding Words to Regexes: Feedback function 33/50

Summary of the algorithm:

  1. match strings in S approximately to r
  2. find gaps in the regex or in the strings
  3. (adaptive:) find overlaps within the gaps
  4. (simple:) add missing parts for every missing word one after the other
     (adaptive:) add missing parts and check intermediate steps with the feedback
  5. (adaptive:) add a generalization of non-repaired words (similar to [Babbar et al. 2010])

slide-56
SLIDE 56


Adding Words to Regexes: Experiments 34/50

Input data
◮ datasets: ReLIE [Li et al., 2008], Enron [Babbar et al., 2010], and YAGO infobox attributes
◮ in total 8 tasks
◮ in total 52 regexes

Experimental approach
◮ 5 × 2 train/test split
◮ missing words S are selected randomly from E+ \ L(r), |S| ≤ 10
◮ we draw 10 different sets S

slide-57
SLIDE 57

Adding Words to Regexes: Experiments 35/50

Baselines
◮ dis: r|s1| · · · |sn
◮ star: .*

Competitors
◮ B&S: [Babbar et al., 2010] (reimplementation)
◮ simple
◮ adaptive

                      baseline                       adaptive
measure    original   dis   star   B&S    simple   α = 1.0   α = 1.1   α = 1.2
F1         55         55    21     40     56       60        60        60
recall     66         67    62     35     69       75        76        77
precision  64         64    14     71     64       63        63        63
length     56         270   2      3929   250      76        80        81

Table: Averaged measures for the different systems. Length is the number of characters of the regex.

slide-58
SLIDE 58


Adding Words to Regexes: Summary 36/50

Summary
◮ algorithm for adding missing words to regexes
◮ increases recall, while keeping precision stable
◮ Source code available at https://github.com/thomasrebele/regex-repair

Future work
◮ decrease dependency on E−
◮ add a generalization step as postprocessing

Thomas Rebele, Katerina Tzompanaki, Fabian Suchanek

publications: ISWC 2017 (demo), PAKDD 2018 (full paper)

Now that we have all this data, how can we process it efficiently?

slide-59
SLIDE 59

Expanding the YAGO knowledge base Rebele The YAGO knowledge base Using YAGO for the Humanities Adding Words to Regexes Answering Queries with Unix Shell

Motivation System Approach Optimization Experiments Experiments Summary

Conclusion

Answering Queries with Unix Shell: Motivation 37/50

How can I find all teachers?

[Figure: example knowledge graph. Albert Einstein teaches Relativity and Cosmology; Alfred Kleiner teaches Statistical physics; both have type Person; a successorOf edge links the two people.]

slide-62
SLIDE 62


Answering Queries with Unix Shell: Motivation 38/50

Observation:

[Figure: two ways to obtain a result. First: importing the data into a database, then querying it. Second: 'grep'ing the data directly.]

slide-63
SLIDE 63


Answering Queries with Unix Shell: System 39/50

[Figure: system overview. A transformation turns a query in SPARQL/OWL or Datalog into a Bash script, which runs on TSV/n-triples files and produces the result.]
slide-64
SLIDE 64


Answering Queries with Unix Shell: System 39/50

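[Figure: system overview with the transformation expanded. The SPARQL/OWL or Datalog input is brought into Datalog, then into relational algebra (σ, π), with the steps Optimize and Translate leading to the Bash script, which runs on TSV/n-triples files and produces the result.]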
slide-65
SLIDE 65

Answering Queries with Unix Shell: Approach 40/50

Query "Which people teach a course?" in SPARQL

    SELECT ?X WHERE {
      ?X <type> <Person>.
      ?X <teaches> ?Y.
    }

Translating the query to Datalog

    Person(X) :- facts(X, "type", "Person").
    teaches(X, Y) :- facts(X, "teaches", Y).
    Teacher(X) :- Person(X), teaches(X,Y).

Relational algebra plan

    $\pi_1\bigl(\sigma_{3=\text{"Person"}}(\sigma_{2=\text{"type"}}(\mathit{facts})) \bowtie_{1=1} \sigma_{2=\text{"teaches"}}(\mathit{facts})\bigr)$

slide-66
SLIDE 66


Answering Queries with Unix Shell: Approach 41/50

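Optimization:

Before:

    $\pi_1\bigl(\sigma_{3=\text{"Person"}}(\sigma_{2=\text{"type"}}(\mathit{facts})) \bowtie_{1=1} \sigma_{2=\text{"teaches"}}(\mathit{facts})\bigr)$

After (a projection π1 is placed on each join input):

    $\pi_1\bigl(\pi_1(\sigma_{3=\text{"Person"}}(\sigma_{2=\text{"type"}}(\mathit{facts}))) \bowtie_{1=1} \pi_1(\sigma_{2=\text{"teaches"}}(\mathit{facts}))\bigr)$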

slide-67
SLIDE 67


Answering Queries with Unix Shell: Approach 42/50

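Algebra plan

    $\pi_1\bigl(\pi_1(\sigma_{3=\text{"Person"}}(\sigma_{2=\text{"type"}}(\mathit{facts}))) \bowtie_{1=1} \pi_1(\sigma_{2=\text{"teaches"}}(\mathit{facts}))\bigr)$

Bash code

    # outer sort -u: final projection with duplicate removal
    sort -u \
      <(join -1 1 -2 1 -o 1.1 \
          <(sort -k 1 \
              <(awk '($3 == "Person" && $2 == "type") { print $1 };' facts)) \
          <(sort -k 1 \
              <(awk '($2 == "teaches") { print $1 };' facts)))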
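The plan above can be tried end to end on a small file. The following is a sketch under assumptions that are not on the slide: the sample triples are made up, the fields are tab-separated (so that names with spaces stay in one field), and `LC_ALL=C` is set so that `sort` and `join` use the same byte-wise order. The function name `teachers` is hypothetical.

```bash
#!/usr/bin/env bash
export LC_ALL=C   # byte-wise collation, so sort and join agree on the order

facts=$(mktemp)
# Sample triples: subject <TAB> predicate <TAB> object.
printf '%s\t%s\t%s\n' \
  'Albert Einstein' type    Person \
  'Albert Einstein' teaches Relativity \
  'Albert Einstein' teaches Cosmology \
  'Alfred Kleiner'  type    Person \
  'Alfred Kleiner'  teaches 'Statistical physics' \
  'Mileva Maric'    type    Person > "$facts"

# Selection and projection with awk, join on column 1, duplicate removal with sort -u.
teachers() {
  sort -u \
    <(join -t $'\t' -1 1 -2 1 -o 1.1 \
        <(sort -u \
            <(awk -F '\t' '($3 == "Person" && $2 == "type") { print $1 }' "$1")) \
        <(sort -u \
            <(awk -F '\t' '($2 == "teaches") { print $1 }' "$1")))
}

teachers "$facts"
```

On this sample the script prints Albert Einstein and Alfred Kleiner; Mileva Maric is a Person but has no teaches fact, so the join drops her.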

slide-72
SLIDE 72


Answering Queries with Unix Shell: Optimization 43/50

Optimizations
◮ algebraic, e.g., merge union / projects
◮ semi-naive evaluation
◮ join reordering
◮ remove superfluous recursive calls
◮ materialize repeated subplans
◮ read files only once
◮ tweak Unix commands, e.g., using LANG=C and mawk

slide-73
SLIDE 73


Answering Queries with Unix Shell: Optimization 44/50

How can I find all professors?

    Professor(X) :- Person(X), teachesCourse(X,Y).
    Professor(X) :- successorOf(X,Y), Professor(Y).
    Person(X) :- Employee(X).
    Person(X) :- Professor(X).

Combining the first and the last rule leads to

    Professor(X) :- Professor(X), teachesCourse(X,Y).
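A recursive rule such as the second one can be evaluated in Bash by iterating until no new facts appear. The sketch below uses semi-naive evaluation (each round joins only the facts derived in the previous round) and covers the first two rules only, with Person as a base relation. It illustrates the idea and is not the script that the system generates; the sample relations, the file names, and the function name `professors` are hypothetical, and successorOf is stored with swapped columns so that the join key is in column 1.

```bash
#!/usr/bin/env bash
export LC_ALL=C
work=$(mktemp -d)

# Hypothetical sample relations, tab-separated and sorted on the join column.
printf '%s\n' alice bob carol dave | sort > "$work/person"
printf '%s\t%s\n' alice logic | sort > "$work/teachesCourse"
# successorOf(X, Y) stored as Y <TAB> X: bob succeeds alice, carol succeeds bob.
printf '%s\t%s\n' alice bob  bob carol | sort > "$work/successorOf_by_y"

professors() {
  local dir=$1
  # Base rule: Professor(X) :- Person(X), teachesCourse(X, Y).
  join -t $'\t' -o 1.1 "$dir/person" "$dir/teachesCourse" | sort -u > "$dir/all"
  cp "$dir/all" "$dir/delta"
  # Recursive rule: Professor(X) :- successorOf(X, Y), Professor(Y).
  while [ -s "$dir/delta" ]; do
    join -t $'\t' -o 2.2 "$dir/delta" "$dir/successorOf_by_y" | sort -u > "$dir/derived"
    comm -13 "$dir/all" "$dir/derived" > "$dir/delta"   # keep only facts not seen before
    sort -u -m "$dir/all" "$dir/delta" > "$dir/next" && mv "$dir/next" "$dir/all"
  done
  cat "$dir/all"
}

professors "$work"
```

Because each round keeps only unseen facts in `delta`, the loop also terminates when successorOf contains a cycle.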

slide-80
SLIDE 80


Answering Queries with Unix Shell: Optimization 45/50

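[Figure: algebra plan of the professor query under the fixpoint operator µx. One branch is π1 of a join (1=1) of employee and x with teachesCourse; the other branch is π1 of a join (2=1) of successorOf with x. The nodes are annotated with column sets such as {c1}, {?}, {c1, ?, ?}, and {?, ?, c1}. The first annotation pass ends in "⇒ superfluous", the second in "⇒ necessary".]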

slide-81
SLIDE 81


Answering Queries with Unix Shell: Optimization 46/50

Assume that employee is an expensive subplan

[Figure: left, a recursive plan µx that contains the subplan employee; right, employee is materialized as y, and µx reads y instead.]
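The effect of materializing a subplan can be sketched in Bash: the subplan is evaluated once into a temporary file, and every later operator reads that file. This is an illustration of the idea only, with made-up data; the function `employee` is a stand-in for an expensive subplan, and the names `calls`, `y`, `teaching_staff`, and `other_staff` are hypothetical.

```bash
#!/usr/bin/env bash
export LC_ALL=C
calls=$(mktemp)                 # log of subplan evaluations

# Stand-in for an expensive subplan; each evaluation is logged.
employee() {
  echo run >> "$calls"
  printf '%s\n' carol alice bob
}

teaches=$(mktemp)
printf '%s\n' alice dave > "$teaches"   # already sorted

# Materialize: evaluate the subplan once and store the sorted result in a file.
y=$(mktemp)
employee | sort -u > "$y"

# Later operators read the file instead of running the subplan again.
teaching_staff=$(comm -12 "$y" "$teaches")   # employees who teach
other_staff=$(comm -23 "$y" "$teaches")      # employees who do not teach

printf 'teaching: %s\n' "$teaching_staff"
printf 'evaluations of employee: %s\n' "$(grep -c run "$calls")"
```

Replacing `"$y"` by `<(employee | sort -u)` in both `comm` calls would give the same results but evaluate the subplan twice.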

slide-82
SLIDE 82

Answering Queries with Unix Shell: Experiments 47/50

◮ Dataset: LUBM university benchmark
◮ 14 different queries
◮ competitors: Datalog-based (DLV, Souffle, RDFox), Triple store (Jena, Stardog, Virtuoso), Database management system (MonetDB, Postgres)

Number of finished queries

LUBM   Bash   DLV   Souffle   RDFox   Jena   Stardog   Virtuoso   MonetDB*   Postgres*
10     14     14    13        14      5      14        6          10         10

LUBM 500 (values in slide order, empty cells not preserved): 14 11 14 14 6 10
LUBM 1000 (values in slide order, empty cells not preserved): 14 4 14 14 10

Runtime in seconds

LUBM   Bash   DLV   Souffle   RDFox   Jena     Stardog   Virtuoso   MonetDB*   Postgres*
10     1.6    9.3   (21.9)    2.2     (78.7)   13.6      (11.8)     (5.2)      (20.6)

LUBM 500 (values in slide order, empty cells not preserved): 83 (310) 132 676 (1581) (600)
LUBM 1000 (values in slide order, empty cells not preserved): 258 (346) 278 2009 (1187)

* = we folded the TBox into the query

slide-83
SLIDE 83


Answering Queries with Unix Shell: Experiments 48/50

Figure: Screenshot of the web interface

slide-84
SLIDE 84


Answering Queries with Unix Shell: Summary 49/50

Summary
◮ Preprocess large datasets without installing software
◮ Supports OWL RL subset and Datalog as query language
◮ Try it online at https://www.thomasrebele.org/projects/bashlog
◮ Source code available at https://github.com/thomasrebele/bashlog

Future work
◮ numerical comparisons
◮ aggregations (e.g., max, count)

Thomas Rebele, Thomas P. Tanon, Fabian Suchanek

publication: ISWC 2018 (full paper)

slide-85
SLIDE 85


Conclusion 50/50

This thesis showed how to extend YAGO along several axes:
◮ Improving completeness w.r.t. people
◮ Automatically repairing its regular expressions
◮ Preprocessing queries using only a Bash shell
◮ Interdisciplinary project
◮ Source code of all contributions is available online
◮ Publications in ISWC 2016, ISWC 2017, ISWC 2018, PAKDD 2018 (other publication in TPDL 2016 (demo))