Exploring Entity-centric Networks in Entangled News Streams



SLIDE 1

Exploring Entity-centric Networks in Entangled News Streams

Andreas Spitz and Michael Gertz April 25, 2018 — WWW 2018, Lyon

Heidelberg University, Germany Database Systems Research Group

SLIDE 2

Parallel News Streams

SLIDE 3

Crossing Streams

SLIDE 6

Entangled News Streams

Core idea: entity cooccurrences characterize stitching points between news streams
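
One way to read the core idea is sketched below: entity pairs that co-occur in more than one stream are treated as candidate stitching points. The input shape (`(stream_name, entities)` tuples), the function names, and the `min_streams` parameter are hypothetical simplifications for illustration, not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_by_stream(articles):
    """Map each unordered entity pair to the set of streams in which it co-occurs.

    `articles` is an iterable of (stream_name, entities) tuples; this input
    shape is an assumption made for the sketch.
    """
    pair_streams = defaultdict(set)
    for stream, entities in articles:
        # Sort so that (a, b) and (b, a) map to the same key.
        for a, b in combinations(sorted(set(entities)), 2):
            pair_streams[(a, b)].add(stream)
    return pair_streams


def stitching_candidates(pair_streams, min_streams=2):
    """Keep entity pairs that co-occur in at least `min_streams` streams."""
    return {pair: s for pair, s in pair_streams.items() if len(s) >= min_streams}


# Toy input, invented for illustration only.
articles = [
    ("outlet_a", ["David Cameron", "UK", "Theresa May"]),
    ("outlet_b", ["UK", "David Cameron"]),
    ("outlet_b", ["UK", "Lyon"]),
]
candidates = stitching_candidates(cooccurrence_by_stream(articles))
```

In this toy input, only the pair (David Cameron, UK) appears in both outlets, so it is the only candidate returned.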

SLIDE 7

Implicit Entity Networks

SLIDE 8

Implicit Network Extraction

Andreas Spitz and Michael Gertz. “Terms over LOAD: Leveraging Named Entities for Cross-Document Extraction and Summarization of Events”. In: SIGIR. 2016

SLIDE 9

Implicit Network Aggregation

Andreas Spitz and Michael Gertz. “Terms over LOAD: Leveraging Named Entities for Cross-Document Extraction and Summarization of Events”. In: SIGIR. 2016

SLIDE 11

Implicit Networks of Text Streams

SLIDE 12

Edge Context Extraction

SLIDE 14

Context-based Aggregation of Edges

SLIDE 17

Edge Aggregation Approaches

Streaming aggregation:

◮ Compare similarity of new edge (v, w, ·) to existing edges (v, w, ·)
◮ If similarity threshold is exceeded: merge with existing edge
◮ Otherwise, insert as new parallel edge

Static aggregation / clustering:

◮ Collect all parallel edges
◮ Cluster parallel edges (density-based)
◮ Discard “noisy” edges
◮ Aggregate edges within clusters
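
Both approaches can be sketched in a few lines. The slides do not specify the context representation, the similarity measure, or the clustering algorithm, so the sketch assumes edge contexts are sets of terms, uses Jaccard similarity, and swaps in a small DBSCAN-style procedure for the density-based step. All names and parameters (`StreamingEdgeAggregator`, `static_aggregate`, `threshold`, `eps`, `min_pts`) are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two term sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class StreamingEdgeAggregator:
    """Merge a new edge into its most similar parallel edge, or insert it."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        # (v, w) -> list of parallel edges, each {"context": set, "count": int}
        self.edges = defaultdict(list)

    def add(self, v, w, context):
        key = tuple(sorted((v, w)))  # assumption: edges are undirected
        context = set(context)
        parallel = self.edges[key]
        best = max(parallel, key=lambda e: jaccard(e["context"], context), default=None)
        if best is not None and jaccard(best["context"], context) > self.threshold:
            best["context"] |= context  # merge with existing edge
            best["count"] += 1
        else:
            parallel.append({"context": context, "count": 1})  # new parallel edge


def static_aggregate(contexts, eps=0.5, min_pts=2):
    """Cluster parallel-edge contexts DBSCAN-style, drop noise, merge clusters."""
    n = len(contexts)
    neighbors = [
        [j for j in range(n) if 1.0 - jaccard(contexts[i], contexts[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None = unvisited, -1 = noise, >= 0 = cluster id
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1
            continue
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])
        cluster += 1
    merged = defaultdict(lambda: {"context": set(), "count": 0})
    for label, ctx in zip(labels, contexts):
        if label >= 0:  # discard "noisy" edges
            merged[label]["context"] |= ctx
            merged[label]["count"] += 1
    return list(merged.values())


# Toy usage with invented contexts.
agg = StreamingEdgeAggregator(threshold=0.5)
agg.add("David Cameron", "UK", {"brexit", "vote"})
agg.add("UK", "David Cameron", {"brexit", "vote", "referendum"})
agg.add("David Cameron", "UK", {"football"})

clusters = static_aggregate(
    [{"brexit", "vote"}, {"brexit", "vote", "referendum"},
     {"brexit", "referendum"}, {"football"}]
)
```

The streaming variant depends on arrival order and needs only the current parallel edges, while the static variant sees all parallel edges at once and can discard isolated contexts as noise.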

SLIDE 18

Application Examples

SLIDE 21

News Article Data

English news articles from RSS feeds:

◮ 14 news outlets (from US, UK, and AU)
◮ 6 months (Jun 1 - Nov 30, 2016)
◮ 127.5 thousand articles
◮ 5.4 million sentences

The resulting implicit network has

◮ 125 thousand entities
◮ 351 thousand terms
◮ 83.4 million edges

NLP processing pipeline:

◮ Part-of-speech and sentence tagging: Stanford POS tagger
◮ Temporal tagging: HeidelTime
◮ Entity classification: YAGO classes (LOC, ORG, PER)
◮ Named entity recognition and linking:
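
For a rough sense of scale, a few derived ratios can be computed from the rounded figures above. The edges-per-node ratio assumes that entities and terms are the only node types counted, which the slide does not state, so it should be read as an order-of-magnitude estimate.

```python
# Rounded figures as reported on the slide.
articles = 127_500
sentences = 5_400_000
entities = 125_000
terms = 351_000
edges = 83_400_000

# Average article length in sentences.
sentences_per_article = sentences / articles

# Assumption: nodes = entities + terms (other node types are not listed).
nodes = entities + terms
edges_per_node = edges / nodes

print(f"sentences per article: {sentences_per_article:.1f}")
print(f"edges per node:        {edges_per_node:.1f}")
```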

SLIDE 22

Context Sensitive Entity Search

A. Spitz, S. Almasian, and M. Gertz. “EVELIN: Exploration of Event and Entity Links in Implicit Networks”. In: WWW Companion. 2017. url: http://evelin.ifi.uni-heidelberg.de

SLIDE 23

Evolution of Entity Contexts

[Figure: Topics for David Cameron (Q192) - UK (Q145). x-axis: month (Jun to Oct); y-axis: relative frequency of mentions (0.00 to 1.00). Topic terms shown: brexit, nation, favour, demand, govern, referendum, ukip, vote, westminst, campaign, prime, minist, leader, resign, pro-brexit]
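
The y-axis quantity, relative frequency of mentions, can be sketched as a per-month normalization of term counts on a single entity pair. The input shape (`(month, term)` records) and the function name are assumptions, and the toy records are invented for illustration; they are not values from the plot.

```python
from collections import Counter, defaultdict


def relative_frequencies(mentions):
    """Per month, divide each term's mention count by that month's total."""
    counts = defaultdict(Counter)
    for month, term in mentions:
        counts[month][term] += 1
    return {
        month: {term: c / sum(counter.values()) for term, c in counter.items()}
        for month, counter in counts.items()
    }


# Toy records (month, context term) for one entity pair; invented data.
mentions = [
    ("Jun", "brexit"),
    ("Jun", "referendum"),
    ("Jun", "brexit"),
    ("Jul", "resign"),
]
shares = relative_frequencies(mentions)
```

Because each month is normalized separately, the shares within a month sum to 1, which matches a y-axis running from 0.00 to 1.00.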

SLIDE 24

Topic Subgraph Exploration

Andreas Spitz and Michael Gertz. “Entity-Centric Topic Extraction and Exploration: A Network-Based Approach”. In: ECIR. 2018

SLIDE 26

Further Applications

News analysis and exploration:

◮ Contrastive source comparison
◮ Coverage bias
◮ Evolution of news stories
◮ Event description
◮ ...

NLP and IR applications:

◮ Entity disambiguation
◮ (Extractive) summarization
◮ Relationship extraction
◮ ...

SLIDE 27

Resources

SLIDE 28

Resources

Data and implementation are available online:

◮ [data] Implicit news stream network
◮ [code] Implicit network extraction
◮ [code] Entity query and topic extraction

https://dbs.ifi.uni-heidelberg.de/resources/newsstream/
