SLIDE 1

Extracting Descriptions of Location Relations from Implicit Textual Networks

Andreas Spitz, Gloria Feher, Michael Gertz

Heidelberg University, Institute of Computer Science, Database Systems Research Group
{spitz,gertz}@informatik.uni-heidelberg.de, {feher}@stud.uni-heidelberg.de

11th GIR Workshop, Heidelberg, November 30, 2017

SLIDE 2

Motivation Implicit Networks Location Relations Sentence Extraction Exploration and Discussion Summary

What are the relations between Berlin and Vienna?

Image sources: cdn.getyourguide.com, www.wien.info

Extracting Descriptions of Location Relations Andreas Spitz 1 of 26

SLIDE 3

Relations between Berlin and Vienna:

  • both are capitals
  • spoken language is German
  • located in Europe
  • population > 1,000,000

SLIDE 4

source: www.wikidata.org

SLIDE 6

How can we extract other non-trivial connections from texts?

SLIDE 7

Outline

(1) The what and why of implicit textual networks
(2) Identifying related locations and geo-entities
(3) Extracting descriptive sentences
(4) Exploratory results and discussion

SLIDE 8

What is an Implicit Network?

Spitz and Gertz, Terms over LOAD (2016)

SLIDE 10

Implicit Network Edge Weights

For edges $(x, y)$ in which $y$ is a page or sentence, count only (co-)occurrences:

$$\omega(x, y) = \begin{cases} 1 & \text{if } y \text{ contains } x \\ 0 & \text{otherwise} \end{cases}$$

For edges $(x, y)$ between entity types and terms, aggregate the co-occurrence instances $I$ by summing over similarities derived from the sentence distances $s$:

$$\omega(x, y) := \sum_{i \in I} \exp(-s(x, y, i))$$
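
A minimal Python sketch of the two edge-weight definitions, assuming that each co-occurrence instance is one pair of mentions and that s(x, y, i) is the number of sentences between the two mentions. The mention-list format, the helper names, and the `max_distance` cutoff are hypothetical choices for illustration; the slides do not state a window size.

```python
import math
from collections import defaultdict
from itertools import combinations


def containment_weight(x, y_contents):
    """Edge (x, y) where y is a page or sentence: 1 if y contains x, else 0."""
    return 1 if x in y_contents else 0


def cooccurrence_weight(distances):
    """Sum of exp(-s(x, y, i)) over all co-occurrence instances i."""
    return sum(math.exp(-d) for d in distances)


def build_edge_weights(mentions, max_distance=2):
    """Aggregate weights for all pairs of distinct entities/terms.

    mentions: list of (sentence_index, label) tuples for one document.
    max_distance: hypothetical cutoff; mention pairs further apart are ignored.
    """
    instances = defaultdict(list)
    # Each unordered pair of mentions is one potential co-occurrence instance.
    for (si, x), (sj, y) in combinations(mentions, 2):
        distance = abs(si - sj)
        if x != y and distance <= max_distance:
            instances[frozenset((x, y))].append(distance)
    return {pair: cooccurrence_weight(ds) for pair, ds in instances.items()}


mentions = [(0, "Berlin"), (0, "Vienna"), (1, "operetta"), (3, "Vienna")]
weights = build_edge_weights(mentions)
print(weights[frozenset(("Berlin", "Vienna"))])    # 1.0 (same sentence)
print(weights[frozenset(("Vienna", "operetta"))])  # exp(-1) + exp(-2), about 0.503
```

Keys are frozensets, so the aggregated weight is symmetric: ω(x, y) = ω(y, x).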

SLIDE 14

Why Use Implicit Networks?

Existing approaches

  • Knowledge extraction
    ⇒ Limited by identifiable patterns or predicates
  • Summarization
    ⇒ Severe scaling limitations for large input collections
  • Vector embeddings
    ⇒ Encode similarity of contexts, not relatedness of entities

Implicit networks

  • Scale well to large document collections
  • Collocation-based weights encode relatedness of entities
  • Work well with dynamic text data

SLIDE 15

Implicit Network Exploration Pipeline

Spitz, Almasian, Gertz, EVELIN (2017)

SLIDE 17

Overview: Location Relation Extraction

Extracting descriptive sentences for pairs of locations:

(1) Find closely related pairs of locations
(2) Filter relations that exist in knowledge bases
(3) Identify descriptive sentences for the remaining pairs
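
Step (2) amounts to a set difference. A small sketch, assuming that the knowledge base relations are available as pairs of Wikidata IDs and that a relation in either direction counts as known; the function name and the toy data are hypothetical.

```python
def filter_known_relations(ranked_pairs, kb_pairs):
    """Drop location pairs that are already connected in the knowledge base.

    Pairs are compared as unordered sets, so (a, b) and (b, a) match.
    The order of the input ranking is preserved.
    """
    known = {frozenset(pair) for pair in kb_pairs}
    return [pair for pair in ranked_pairs if frozenset(pair) not in known]


# Toy input: ranked pairs and knowledge base pairs (illustrative only).
ranked = [("Q64", "Q183"), ("Q64", "Q1741"), ("Q1741", "Q40")]
toy_kb = {("Q183", "Q64"), ("Q40", "Q1741")}
print(filter_known_relations(ranked, toy_kb))  # [('Q64', 'Q1741')]
```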

SLIDE 18

Identifying Closely Related Locations

Obtain a location ranking from the network by

(1) Creating weights for directed edges between nodes $x \in X$ and $y \in Y$ of entity sets $X$ and $Y$ in the implicit network:

$$\omega(x \mid y) = \omega(x, y) \log \frac{|Y|}{|N(x) \cap Y|}$$

(2) For a given query location $q \in L$, ranking all $l \in L$ by $\omega(l \mid q)$ (here $X = Y = L$, the set of locations)

Rousseau and Vazirgiannis, Graph-of-word (2013); Spitz and Gertz, Terms over LOAD (2016)
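
A minimal sketch of the location ranking, assuming the implicit network is stored as a symmetric adjacency dictionary that maps each node to its weighted neighbours ω(x, y). Scores are divided by the top score so that the best match gets 1.00, as in the ranking example; that normalization, the toy graph, and all function names are assumptions made for illustration.

```python
import math


def conditional_weight(graph, x, y, Y):
    """omega(x|y) = omega(x, y) * log(|Y| / |N(x) ∩ Y|).

    graph: dict mapping each node to a dict {neighbour: omega(node, neighbour)}.
    Y: set of nodes of the target entity type (must contain y).
    """
    edges = graph.get(x, {})
    if y not in edges:
        return 0.0
    neighbours_in_Y = len(edges.keys() & Y)  # at least 1, because y is in Y
    return edges[y] * math.log(len(Y) / neighbours_in_Y)


def rank_locations(graph, q, locations):
    """Rank all l in L (other than q) by omega(l|q), scaled so the top is 1."""
    scores = {l: conditional_weight(graph, l, q, locations)
              for l in locations if l != q}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    top = ranked[0][1] if ranked else 0.0
    return [(l, s / top if top > 0 else 0.0) for l, s in ranked]


# Toy symmetric network with made-up co-occurrence weights.
graph = {
    "Berlin": {"Germany": 5.0, "Vienna": 2.0, "Hamburg": 1.5},
    "Germany": {"Berlin": 5.0, "Austria": 1.0, "Hamburg": 3.0},
    "Vienna": {"Berlin": 2.0, "Austria": 4.0},
    "Austria": {"Germany": 1.0, "Vienna": 4.0},
    "Hamburg": {"Berlin": 1.5, "Germany": 3.0},
}
L = set(graph)
for location, score in rank_locations(graph, "Berlin", L):
    print(f"{location:10s} {score:.2f}")
```

For the query Berlin this prints Germany first with 1.00, then Vienna (about 0.72), Hamburg (about 0.54) and Austria (0.00). The log factor acts like an inverse document frequency: locations that co-occur with many other locations are damped.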

SLIDE 19

Location Ranking Example

Berlin (Q64)

location         wikiID  score
Germany          Q183    1.00
West Berlin      Q56036  0.42
East Germany     Q16957  0.32
Hamburg          Q1055   0.31
Munich           Q1726   0.29
Brandenburg      Q1208   0.29
Paris            Q90     0.27

Vienna (Q1741)

location         wikiID  score
Austria          Q40     1.00
Berlin           Q64     0.25
Prague           Q1085   0.23
Paris            Q90     0.19
Munich           Q1726   0.16
Austria-Hungary  Q28513  0.15
Graz             Q13298  0.14

SLIDE 20

Coverage Estimation Data

Input location data (Wikipedia):

  • List of largest German cities (79 locations)
  • List of international capitals (250 locations)

Knowledge Base:

  • Wikidata

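⇒ Inverse evaluation: How “poorly” does the ranking reflect Wikidata properties? Related pairs that are not connected in Wikidata remain as candidates for descriptive sentence extraction.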

SLIDE 21

Coverage of Location Relations

[Figure: precision and recall (metric@k) plotted against the position in the ranking (k = 1 to 100), for German cities and world capitals]

  • Precision: fraction of location pairs in the ranking that are connected by a property in Wikidata
  • Recall: fraction of Wikidata properties that are in the ranked list of location relations
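
Both metrics can be computed per query location from the top-k entries of its ranking. A sketch, assuming that the Wikidata side is given as the set of locations connected to the query by at least one property (so each connected pair counts once, even if several properties link it); how the results are averaged over query locations is not stated on the slide, so no averaging is shown.

```python
def precision_recall_at_k(ranking, kb_related, k):
    """Precision@k and recall@k of one location ranking against Wikidata.

    ranking: locations ordered by relatedness to the query location.
    kb_related: set of locations linked to the query by a Wikidata property.
    """
    top_k = ranking[:k]
    hits = sum(1 for location in top_k if location in kb_related)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(kb_related) if kb_related else 0.0
    return precision, recall


# Abstract toy example: A and C are linked to the query in the knowledge base,
# E is linked but missing from the ranking.
ranking = ["A", "B", "C", "D"]
kb_related = {"A", "C", "E"}
for k in (1, 2, 4):
    print(k, precision_recall_at_k(ranking, kb_related, k))
```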

SLIDE 22

Sentence Extraction: Intuition

SLIDE 25

Basic Sentence Ranking Methods

Rank a sentence $s$ by a set of query entities $Q$ (here: locations), based on its neighbourhood $N(s)$ and a number $n$ of relevant terms $T_n(Q)$.

M1 Entity count (baseline)

$$r_1(s, Q) := |N(s) \cap Q|$$

  • Rank by adjacent query entities

M2 Term influence

$$r_2(s, Q, n) := |N(s) \cap Q| + \frac{|N(s) \cap T_n(Q)|}{|T_n(Q)| + 1}$$

  • Rank first by entity count
  • Then rank by number of contained relevant terms
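
A sketch of M1 and M2, assuming that a sentence's neighbourhood N(s) is given as a set of entity and term labels and that the relevant terms T_n(Q) have already been selected (how they are chosen is not covered here, so they are passed in as a hypothetical input).

```python
def r1(neighbourhood, Q):
    """M1 entity count: |N(s) ∩ Q|."""
    return len(neighbourhood & Q)


def r2(neighbourhood, Q, relevant_terms):
    """M2 term influence: |N(s) ∩ Q| + |N(s) ∩ T_n(Q)| / (|T_n(Q)| + 1)."""
    term_part = len(neighbourhood & relevant_terms) / (len(relevant_terms) + 1)
    return r1(neighbourhood, Q) + term_part


def rank_sentences(sentences, score):
    """Return sentence ids ordered by descending score."""
    return sorted(sentences, key=lambda sid: score(sentences[sid]), reverse=True)


# Hypothetical query, relevant terms (n = 3) and sentence neighbourhoods.
Q = {"Berlin", "Vienna"}
T_n = {"operetta", "center", "capital"}
sentences = {
    "s_a": {"Berlin", "Vienna", "operetta", "center"},
    "s_b": {"Berlin", "Vienna"},
    "s_c": {"Berlin", "operetta", "center", "capital"},
}
print(rank_sentences(sentences, lambda n: r2(n, Q, T_n)))  # ['s_a', 's_b', 's_c']
```

Here s_c contains all three relevant terms but still ranks last: the term part of M2 is always below 1, so it only orders sentences that share the same entity count.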

SLIDE 28

Normalized Sentence Ranking Methods

Rank a sentence $s$ by a set of query entities $Q$ (here: locations), based on its neighbourhood $N(s)$ and a number $n$ of relevant terms $T_n(Q)$.

M3 Normalization by length

$$r_3(s, Q, n) := \frac{1}{\log \operatorname{len}(s)} \left( |N(s) \cap Q| + \frac{|N(s) \cap T_n(Q)|}{|T_n(Q)| + 1} \right)$$

  • Penalize term influence logarithmically with sentence length

M4 Normalization by count

$$r_4(s, Q, n) := \frac{|N(s) \cap Q|}{|N(s) \cap E|} + \frac{|N(s) \cap T_n(Q)|}{|T_n(Q)| \cdot (|N(s) \cap T| + 1)}$$

where $E$ and $T$ denote the sets of all entities and all terms in the network.

  • Normalize contained query entities by total entity count
  • Normalize relevant terms by total term count
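
The normalized variants can be sketched the same way. This assumes len(s) is the sentence length in tokens (at least 2, so that the logarithm is positive), uses the natural logarithm, and takes E and T as sets of entity and term labels; the logarithm base, the guards for empty sets, and the toy data are assumptions, not taken from the slides.

```python
import math


def r3(neighbourhood, Q, relevant_terms, length):
    """M3: the M2 score divided by log(len(s)); requires length >= 2."""
    if length < 2:
        raise ValueError("log(len(s)) must be positive, so len(s) >= 2")
    term_part = len(neighbourhood & relevant_terms) / (len(relevant_terms) + 1)
    return (len(neighbourhood & Q) + term_part) / math.log(length)


def r4(neighbourhood, Q, relevant_terms, E, T):
    """M4: query entities over all entities in s, plus relevant terms
    normalized by |T_n(Q)| * (|N(s) ∩ T| + 1)."""
    n_entities = len(neighbourhood & E)
    entity_part = len(neighbourhood & Q) / n_entities if n_entities else 0.0
    if not relevant_terms:  # guard: no relevant terms, no term part
        return entity_part
    term_part = len(neighbourhood & relevant_terms) / (
        len(relevant_terms) * (len(neighbourhood & T) + 1))
    return entity_part + term_part


# Hypothetical vocabularies and query (n = 2 relevant terms).
E = {"Berlin", "Vienna", "Austria", "Germany"}
T = {"operetta", "center", "capital"}
Q = {"Berlin", "Vienna"}
T_n = {"operetta", "center"}
s1 = {"Berlin", "Vienna", "operetta", "center"}
s2 = {"Berlin", "Vienna", "Austria", "Germany", "operetta"}
print(r4(s1, Q, T_n, E, T), r4(s2, Q, T_n, E, T))
```

Sentence s2 mentions the same two query locations as s1 but also two other entities, so M4 scores it lower (0.75 against roughly 1.33).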

SLIDE 29

Evaluation Data

Wikipedia glossary pages on

  • astronomy (18)
  • biology (167)
  • chemistry (177)
  • geology (225)

Example glossary entries (Geology):

entity       wikidata  description
archipelago  Q33837    a chain or cluster of islands
tectonics    Q193343   large-scale processes affecting the structure of the earth’s crust

SLIDE 30

Evaluation Results (1)

p: precision, r: recall, F1: F1 score

           M1                     M2
set        p      r      F1       p      r      F1
astronomy  0.069  0.207  0.099    0.064  0.248  0.096
biology    0.086  0.181  0.105    0.075  0.302  0.106
chemistry  0.039  0.180  0.062    0.044  0.316  0.074
geology    0.053  0.144  0.072    0.061  0.215  0.090
all        0.059  0.167  0.079    0.060  0.271  0.090

           M3                     M4
set        p      r      F1       p      r      F1
astronomy  0.078  0.184  0.097    0.084  0.199  0.109
biology    0.212  0.133  0.127    0.160  0.179  0.151
chemistry  0.082  0.149  0.093    0.084  0.187  0.107
geology    0.114  0.129  0.100    0.105  0.150  0.111
all        0.131  0.138  0.105    0.113  0.171  0.121

SLIDE 31

Evaluation Results (2)

[Figure: F-score, precision, and recall of methods M1, M2, M3, and M4 plotted against the number of relevant terms (n = 2 to 10)]

Performance of sentence extraction methods for varying numbers of relevant terms.

SLIDE 32

Example: Athens and Sparta

Athens (Q1524) – Sparta (Q5690)

(1) Although Thebes had traditionally been antagonistic to whichever state led the Greek world, siding with the Persians when they invaded against the Athenian-Spartan alliance, siding with Sparta when Athens seemed omnipotent, and famously derailing the Spartan invasion of Persia by Agesilaus.
(2) The Greek historian Thucydides wrote in his History of the Peloponnesian War of how, in 416 BC, Athens attacked Milos for refusing to submit tribute and refusing to join Athens’ alliance against Sparta.
(3) In the wake of this battle, Athens, Thebes, Corinth, and Argos joined together to form an anti-Spartan alliance, with its forces commanded by a council at Corinth.

SLIDE 33

Example: Rome and Milan

Rome (Q220) – Milan (Q490)

(1) It was set up in 1958 in Rome and now is settled in Milan and represents all the highest cultural values of Italian Fashion.
(2) Italian fashion is dominated by Milan, Rome, and to a lesser extent, Florence, with the former two being included in the top 30 fashion capitals of the world.
(3) Alberico Archinto (born November 8, 1698, Milan, died September 30, 1758, Rome) was an Italian cardinal and papal diplomat.

SLIDE 34

Issues and Challenges

  • Interactions between entity types in different domains
  • Extension to other entity types
  • Extension to data from the news domain

SLIDE 35

Berlin and Vienna

Berlin (Q64) – Vienna (Q1741)

(1) In the same way that Vienna was the center of Austrian operetta, Berlin was the center of German operetta.

Vienna’s Operetta Theater, www.theater-wien.at

SLIDE 36

Implicit network exploration online

  • Uses Wikipedia implicit entity network
  • Location ranking
  • Descriptive sentence extraction
  • Subgraph exploration

http://evelin.ifi.uni-heidelberg.de

SLIDE 38

Bibliography I

François Rousseau and Michalis Vazirgiannis. Graph-of-word and TW-IDF: New Approach to Ad Hoc IR. In CIKM, 2013.

Andreas Spitz, Satya Almasian, and Michael Gertz. EVELIN: Exploration of Event and Entity Links in Implicit Networks. In WWW, 2017.

Andreas Spitz and Michael Gertz. Terms over LOAD: Leveraging Named Entities for Cross-Document Extraction and Summarization of Events. In SIGIR, 2016.
