Research of Event Detection Techniques for Twitter, Andreas Weiler (PowerPoint presentation)

SLIDE 1

Towards Reproducible Research of Event Detection Techniques for Twitter

Andreas Weiler, Harry Schilling, Lukas Kircher, Michael Grossniklaus June 14, 2019

SLIDE 2

What is an Event?

  • 1. Papal Election
    • habemus, papam, fumata
  • 2. Boston Marathon attack
    • boston, marathon, explosion

SLIDE 3

Motivation

  • Analysis of 48 event detection techniques
  • 1. Implementation issues
    • Approx. 20% provide source code
    • Approx. 20% provide pseudocode
  • 2. Lack of Twitter data
  • 3. Evaluation issues
    • Evaluation approaches differ: comparative, case study, stand-alone, user study

SLIDE 4

Approach

  • 1. Implementation Issues
    • Event detection modules based on a Data Stream Management System
  • 2. Lack of Twitter data
    • Twitter stream simulator: Twistor
  • 3. Evaluation Issues
    • Evaluation module

SLIDE 5

Approach

SLIDE 6

Twistor

  • 1. Simulation of the Twitter stream
  • 2. Embedding of events

SLIDE 7

Twistor

  • 1. Simulation of the Twitter stream

[Figure: 24 h of the original Twitter stream (10% Gardenhose sample), split into 1-minute windows; basis information per window: distribution of the number of terms per tweet and frequency of every term]
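
The per-window basis information in the figure can be sketched as follows. This is a minimal illustration under stated assumptions, not Twistor's code: the input format (pairs of a `datetime` and the tweet text), the hypothetical function name `basis_information`, lowercase whitespace tokenization, and the example tweets are all made up. For every 1-minute window it counts how many tweets have a given number of terms and how often each term occurs.

```python
from collections import Counter, defaultdict
from datetime import datetime


def basis_information(tweets):
    """Collect per-window basis information from (timestamp, text) pairs.

    Returns two dicts keyed by the start of each 1-minute window:
    - length_dist[window][k]: number of tweets containing k terms
    - term_freq[window][term]: number of occurrences of term
    Tokenization (lowercase + whitespace split) is an assumption.
    """
    length_dist = defaultdict(Counter)
    term_freq = defaultdict(Counter)
    for timestamp, text in tweets:
        # Truncate the timestamp to the start of its 1-minute window.
        window = timestamp.replace(second=0, microsecond=0)
        terms = text.lower().split()
        length_dist[window][len(terms)] += 1
        term_freq[window].update(terms)
    return length_dist, term_freq


# Made-up example tweets spanning two 1-minute windows.
tweets = [
    (datetime(2013, 4, 15, 18, 50, 5), "Boston marathon explosion"),
    (datetime(2013, 4, 15, 18, 50, 40), "explosion near finish line"),
    (datetime(2013, 4, 15, 18, 51, 2), "stay safe boston"),
]
lengths, freqs = basis_information(tweets)
```

In the 18:50 window, one tweet has three terms, one has four, and "explosion" occurs twice.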

SLIDE 8

Twistor

  • 1. Simulation of the Twitter stream
    • Map the term distribution of the real Twitter stream onto the simulated one (per 1-minute window)
    • Replace the terms of the real Twitter stream with random terms from the Leipzig Corpora Collection
    • Not simulated:
      • Hashtags
      • Users
      • Semantics

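
One way to replace terms while keeping the per-window distributions intact is a one-to-one substitution, where every distinct term of the real window is mapped to a distinct random word. The sketch below assumes tweets are already tokenized into term lists and that `vocabulary` is a list of distinct words standing in for the Leipzig Corpora Collection. `simulate_window` is a hypothetical name, and Twistor may generate tweets differently. As on the slide, hashtags, users, and semantics are not modeled.

```python
import random


def simulate_window(tweets, vocabulary, seed=None):
    """Replace every term in one 1-minute window with a random vocabulary word.

    tweets: list of tweets, each a list of terms (assumed pre-tokenized).
    vocabulary: list of distinct words standing in for the Leipzig Corpora
    Collection (assumption). Hashtags, users and semantics are not modeled.
    """
    rng = random.Random(seed)
    # Sort for a deterministic order, so a fixed seed yields a fixed mapping.
    real_terms = sorted({term for tweet in tweets for term in tweet})
    if len(real_terms) > len(vocabulary):
        raise ValueError("vocabulary is too small for a one-to-one mapping")
    # Distinct real terms get distinct random words, so term frequencies
    # and the number of terms per tweet are preserved.
    mapping = dict(zip(real_terms, rng.sample(vocabulary, len(real_terms))))
    return [[mapping[term] for term in tweet] for tweet in tweets]
```

The mapping is one-to-one on the window's terms, so tweet lengths and the set of term frequencies stay unchanged. Only the words themselves lose their meaning.
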
SLIDE 9

Twistor

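  • 2. Embedding of events
    • 10 events in total
    • Based on original data
    • Each event is represented by the IDF values of its terms
    • IDF value of a word x per second, where N is the number of tweets in that second and n_x the number of those tweets containing x:
      $\mathrm{idf}(x) = \log \frac{N}{n_x}$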

SLIDE 10

Twistor

  • 2. Embedding of events
    • $\mathrm{idf}(x) = \log \frac{N}{n_x}$

SLIDE 11

Twistor

  • 2. Embedding of events
    • $\mathrm{idf}(x) = \log \frac{N}{n_x} \iff n_x = \frac{N}{e^{\mathrm{idf}(x)}}$
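
The IDF formula and its inversion can be sketched as below. The sketch assumes the natural logarithm, which the inverse $e^{\mathrm{idf}(x)}$ implies, with N as the number of tweets in a one-second window and n_x as the number of those tweets containing x. Two further choices are assumptions of this sketch: the original per-second IDF is applied to the simulated stream's tweet count, and the result is rounded to a whole number of tweets. The function names are hypothetical.

```python
import math


def idf(n_total, n_with_term):
    """idf(x) = ln(N / n_x) for one 1-second window of the original stream."""
    return math.log(n_total / n_with_term)


def tweets_to_embed(idf_value, n_total_simulated):
    """Invert the IDF: n_x = N / e^{idf(x)}, rounded to a whole number of tweets.

    Using the simulated second's tweet count as N is an assumption.
    """
    return round(n_total_simulated / math.exp(idf_value))


# Original second: 500 tweets, 50 of which contain "explosion".
value = idf(500, 50)                  # ln(10), about 2.30
needed = tweets_to_embed(value, 420)  # tweets in a simulated second that must contain the term
```

Suppose 50 of 500 tweets in an original second contain "explosion". Then idf = ln 10 ≈ 2.30, and a simulated second with 420 tweets needs 420 / 10 = 42 tweets containing the term.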

SLIDE 12

Approach

SLIDE 13

Event Detection Modules

  • Data Stream Management System
  • Shifty
  • Log-Likelihood Ratio (LLH)
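
The slide names the LLH module but does not give its formula. As an illustration only, the sketch below uses Dunning's log-likelihood ratio (G²). This is a common way to test whether a term is over-represented in the current window relative to a reference window. The LLH module's exact formulation, windowing, and threshold may differ. The function names and the threshold of 10.83 (the chi-square critical value for p = 0.001 at one degree of freedom) are assumptions.

```python
import math


def _o_log_o_over_e(observed, expected):
    # Contribution O * ln(O / E); defined as 0 when O == 0.
    return observed * math.log(observed / expected) if observed > 0 else 0.0


def log_likelihood_ratio(k1, n1, k2, n2):
    """Dunning's G^2 for a term seen k1 times among n1 tokens in the current
    window and k2 times among n2 tokens in the reference window."""
    total = n1 + n2
    k = k1 + k2
    # 2x2 contingency table: (term, other tokens) x (current, reference).
    observed = [k1, n1 - k1, k2, n2 - k2]
    expected = [n1 * k / total, n1 * (total - k) / total,
                n2 * k / total, n2 * (total - k) / total]
    return 2.0 * sum(_o_log_o_over_e(o, e) for o, e in zip(observed, expected))


def bursty_terms(current, reference, threshold):
    """Terms whose rate rose and whose G^2 reaches the threshold, best first.

    current, reference: dict term -> count (both assumed non-empty).
    """
    n1 = sum(current.values())
    n2 = sum(reference.values())
    result = []
    for term, k1 in current.items():
        k2 = reference.get(term, 0)
        if k1 / n1 > k2 / n2:  # only terms that became more frequent
            score = log_likelihood_ratio(k1, n1, k2, n2)
            if score >= threshold:
                result.append((term, score))
    return sorted(result, key=lambda pair: pair[1], reverse=True)
```

Consider a made-up example in which "explosion" appears once in 1,000 reference terms and 50 times in 1,000 current terms. Here `bursty_terms` reports only "explosion", and skips terms whose rate is unchanged or falling.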

SLIDE 14

Approach

SLIDE 15

Evaluation Module

  • Analyzes the events reported by the event detection modules
    • Against the ground truth (the events embedded by Twistor)
  • Measures
    • 1. Quality (precision, recall, $F_1$)
    • 2. Throughput (tweets per second)
    • 3. Latency
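
Scoring detected events against the Twistor ground truth needs a rule for when a detected event matches an embedded one, and the slide does not state this rule. The sketch treats events as term sets and uses a hypothetical matching rule: a detected event matches a ground-truth event if the two share at least `min_shared` terms. Precision is the fraction of detected events that match some ground-truth event. Recall is the fraction of ground-truth events matched by some detection, and F1 is their harmonic mean. The function name `evaluate` is also hypothetical.

```python
def evaluate(detected, ground_truth, min_shared=1):
    """Precision, recall and F1 of detected events against ground-truth events.

    Each event is a set of terms. Hypothetical matching rule: a detected event
    matches a ground-truth event if they share at least `min_shared` terms.
    """
    def matches(a, b):
        return len(a & b) >= min_shared

    true_detections = sum(1 for d in detected if any(matches(d, g) for g in ground_truth))
    found_events = sum(1 for g in ground_truth if any(matches(d, g) for d in detected))
    precision = true_detections / len(detected) if detected else 0.0
    recall = found_events / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

As an example, take the two events of slide 2 as ground truth, with {papam, fumata} and {coffee, monday} as the detections. Precision, recall, and F1 are then all 0.5.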

SLIDE 16

Toolkit Evaluation

  • Generation of a 60-minute 10% Twitter stream
    • 1.5 million tweets
    • 25,000 tweets per minute
  • 10 events embedded into the artificial Twitter stream
  • Techniques: TopN (baseline), LLH, Shifty
    • Different parameter configurations → 61 result sets for each technique
  • Measures ($F_1$, throughput, latency)
    • Throughput and latency normalized between 0 and 1
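
The slide does not say how throughput and latency are scaled to the range 0 to 1. This sketch assumes min-max normalization, one common choice, which maps the smallest value of a measure across the result sets to 0 and the largest to 1. The function name and the handling of the all-equal case are assumptions.

```python
def min_max_normalize(values):
    """Scale values linearly to [0, 1] via (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values equal: there is no spread to scale (assumed convention).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

For made-up throughputs of 2,000, 5,000 and 3,500 tweets per second, the normalized scores are 0.0, 1.0 and 0.5.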

SLIDE 17

Results
