GeoBurst: Real-time Local Event Detection in Geo-Tagged Tweet Streams

SLIDE 1

GeoBurst: Real-time Local Event Detection in Geo-Tagged Tweet Streams

Chao Zhang¹, Guangyu Zhou¹, Quan Yuan¹, Honglei Zhuang¹, Yu Zheng², Lance Kaplan³, Shaowen Wang¹, Jiawei Han¹

¹UIUC ²Microsoft Research ³U.S. Army Research Lab

SLIDE 2

What is a Local Event?

  • A local event is an unusual activity that bursts within a local area and a specific duration while engaging a considerable number of participants.
  • E.g., parade, riot, sports game, concert, accident, disaster.

SLIDE 3

Local Event Detection

  • Real-time local event detection is important for various applications:
    • disaster monitoring
    • crime alarming
    • activity recommendation

SLIDE 4

Why Geo-Tagged Tweet Stream?

  • Real-time local event detection was nearly impossible years ago due to the lack of timely and reliable data sources.
  • The geo-tagged tweet stream brings new opportunities to this problem because of its (1) sheer size; (2) multi-dimensional information; and (3) real-time nature.

SLIDE 5

Our Goal

  • Given the geo-tagged tweet stream, we aim to
    • detect all local events in any query time window (batch mode);
    • update the result list in real time as the query window shifts continuously (online mode).

[Figure: the query window Q on the time axis.]

SLIDE 6

Challenges

1. Integrating multiple types of data.
  • Location, time, and text have totally different representations.
2. Extracting interpretable events from massive noise.
  • Raw tweets are extremely noisy and short.
3. Online and real-time detection.
  • To allow for timely actions, local events should be detected in real time.

SLIDE 7

Previous Studies

  • Most existing event detection methods are designed for detecting global events.
    • They can successfully detect events that are bursty in the entire stream;
    • But local events are “bursty” in a small region and involve a limited number of tweets.
  • A few methods for local event detection have been proposed.
    • They either do not model the correlations between keywords, or are incapable of detecting local events in real time.

  • 2011 ICWSM. Event detection in Twitter.
  • 2012 CIKM. Twevent: Segment-based event detection from tweets.
  • 2009 CIKM. Event detection from Flickr data through wavelet-based spatial analysis.
  • 2013 PVLDB. EvenTweet: Online localized event detection in the Twitter stream.

SLIDE 8

Our Insight

  • A local event usually leads to many related tweets around the location (a geo-topic cluster).
  • But a geo-topic cluster is not necessarily a local event:
    • It may be a routine activity in that region (e.g., shopping).
    • It may be a global event rather than a local one (e.g., TV show).

We define a local event as a geo-topic cluster that shows clear spatiotemporal burstiness.

SLIDE 9

Overview of GeoBurst

  • We propose GeoBurst, a reference-based method for local event detection. It consists of three key components:
    • a candidate generator that finds geo-topic clusters in the query time frame and regards them as candidate events;
    • a ranking module that summarizes the routine activities in different regions to filter non-event candidates;
    • an updater that updates local events in real time as the query window shifts.

SLIDE 10

Candidate Event Generation

  • The candidate generator finds geo-topic clusters in the query time frame as candidate events.
  • Geo-topic cluster: a group of tweets that are geographically close and semantically relevant.
  • Challenges for finding geo-topic clusters:
    • How to combine geographical and semantic similarities?
    • How to capture the correlations between different keywords?
    • How to cluster without knowing the number of clusters in advance?

SLIDE 11

Candidate Event Generation

  • Intuition: the spot where the event occurs acts as a pivot that produces relevant tweets around it.
  • Our clustering algorithm is based on:
    • a geo-topic authority score for each tweet
    • an authority ascent process to find authority maxima as pivots

SLIDE 12

Geo-topic Authority

  • A tweet gets an authority score from neighbor tweets, where
    • the geographical impact is captured by a kernel function;
    • the semantic impact is captured by a random walk on the keyword co-occurrence graph.

[Figure: example tweets A to E with keywords (music, show, band, shop), annotated with authority, geo-impact, and semantic impact.]

Authority can be interpreted as the total amount of energy received from the neighbors.
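The "energy received from the neighbors" idea can be sketched in a few lines. This is only an illustration under assumptions that the slides do not state: the kernel is taken to be an Epanechnikov kernel, the random walk is a random walk with restart, and the semantic impact between two tweets is the mean walk score over their keyword pairs. The names `kernel`, `cooccurrence_graph`, `random_walk`, and `authority_scores`, and the toy tweets, are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

def kernel(dist, bandwidth):
    """Epanechnikov kernel (an assumed choice): 0.75 at dist 0, 0 beyond the bandwidth."""
    u = dist / bandwidth
    return 0.75 * (1.0 - u * u) if u < 1.0 else 0.0

def cooccurrence_graph(tweets):
    """Keyword graph whose edge weights are row-normalized co-occurrence counts."""
    counts = defaultdict(lambda: defaultdict(float))
    for _, _, words in tweets:
        for a, b in combinations(sorted(set(words)), 2):
            counts[a][b] += 1.0
            counts[b][a] += 1.0
    return {a: {b: c / sum(nbrs.values()) for b, c in nbrs.items()}
            for a, nbrs in counts.items()}

def random_walk(graph, start, restart=0.2, iters=50):
    """Random walk with restart from `start`; returns keyword -> visiting probability."""
    prob = {start: 1.0}
    for _ in range(iters):
        nxt = defaultdict(float)
        nxt[start] += restart
        for node, mass in prob.items():
            out = graph.get(node)
            if not out:  # isolated keyword: send the walker back to the start
                nxt[start] += (1.0 - restart) * mass
                continue
            for nbr, w in out.items():
                nxt[nbr] += (1.0 - restart) * mass * w
        prob = dict(nxt)
    return prob

def authority_scores(tweets, bandwidth=1.0):
    """Authority of a tweet = sum over neighbors of geo-impact * semantic impact."""
    graph = cooccurrence_graph(tweets)
    vocab = {w for _, _, words in tweets for w in words}
    walk = {w: random_walk(graph, w) for w in vocab}
    scores = []
    for i, (xi, yi, words_i) in enumerate(tweets):
        total = 0.0
        for j, (xj, yj, words_j) in enumerate(tweets):
            if i == j:
                continue
            geo = kernel(math.hypot(xi - xj, yi - yj), bandwidth)
            if geo == 0.0:
                continue  # tweet j lies outside the geographical neighborhood
            # Semantic impact of tweet j on tweet i: mean walk score over keyword pairs.
            sem = sum(walk[a].get(b, 0.0) for a in words_j for b in words_i)
            total += geo * sem / (len(words_i) * len(words_j))
        scores.append(total)
    return scores

# Toy data: (x, y, keywords). The first four tweets are close together.
tweets = [
    (0.0, 0.0, ("music", "show")),
    (0.1, 0.0, ("music", "band")),
    (0.0, 0.1, ("music", "band")),
    (0.1, 0.1, ("music",)),
    (5.0, 5.0, ("shop",)),  # far away, unrelated keyword
]
scores = authority_scores(tweets)
```

In this toy setup the isolated "shop" tweet receives no energy at all, while the four nearby music-related tweets reinforce each other.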

SLIDE 13

Pivot

  • A pivot is an authority maximum: a prominent tweet that is surrounded by many relevant tweets.

[Figure: example tweets A to E with keywords (music, show, band, shop).]

SLIDE 14

Authority Ascent

  • Now the task is to find all the pivots in the geo-topic space.
  • We design an authority ascent process to find all pivots.
  • A pivot attracts similar tweets to form geo-topic clusters.

[Figure: authority ascent over tweets d1, d2, d3, with labels for neighborhood, neighbor, local pivot, and pivot.]
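One way to read the authority ascent process is as hill climbing: each tweet points to the highest-authority tweet in its neighborhood (its local pivot), and following these pointers ends at a pivot. The sketch below assumes that reading; the function name `authority_ascent`, the tie-breaking rule, and the toy numbers are hypothetical and not taken from the slides.

```python
def authority_ascent(authority, neighbors):
    """Return (local_pivot, clusters) for tweets indexed 0..n-1.

    authority: list of authority scores.
    neighbors: list of neighbor index lists, one per tweet.
    """
    n = len(authority)
    local_pivot = []
    for i in range(n):
        candidates = [i] + list(neighbors[i])
        # Highest authority wins; ties go to the lowest index so the
        # pointers can never form a cycle.
        local_pivot.append(max(candidates, key=lambda j: (authority[j], -j)))

    def climb(i):
        # Follow local pivots until reaching a tweet that is its own local pivot.
        while local_pivot[i] != i:
            i = local_pivot[i]
        return i

    clusters = {}
    for i in range(n):
        clusters.setdefault(climb(i), []).append(i)
    return local_pivot, clusters

# Tweets 0, 1, 2 play the roles of d1, d2, d3; tweet 3 has no neighbors.
authority = [1.0, 2.0, 3.0, 0.5]
neighbors = [[1], [0, 2], [1], []]
local_pivot, clusters = authority_ascent(authority, neighbors)
```

The number of clusters falls out of the data (one per pivot), which matches the requirement of clustering without knowing the number of clusters in advance.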

SLIDE 15

The Ranking Module

  • We design the activity timeline structure to summarize the activities in different spatial regions and time periods.
  • The summaries in the activity timeline serve as background knowledge to quantify the spatiotemporal burstiness of candidates.

[Figure: the activity timeline, with snapshots placed along the time axis.]

Each snapshot is a set of micro-clusters. Each cluster is an activity summary for a region.
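A micro-cluster can be pictured as a small additive summary of the tweets in one region. The sketch below is an assumption about what such a summary might hold (a tweet count, coordinate sums, and keyword counts); the slides do not specify the fields, and the class name `MicroCluster` and its methods are hypothetical. Because every field is a sum, subtracting an earlier snapshot from a later one recovers the activity in between.

```python
import copy
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class MicroCluster:
    """Additive activity summary for one region (assumed fields)."""
    n: int = 0                      # number of tweets absorbed
    sum_x: float = 0.0              # sum of x coordinates
    sum_y: float = 0.0              # sum of y coordinates
    keywords: Counter = field(default_factory=Counter)

    def add(self, x, y, words):
        """Absorb one tweet into the summary."""
        self.n += 1
        self.sum_x += x
        self.sum_y += y
        self.keywords.update(words)

    def centroid(self):
        """Mean location of the absorbed tweets."""
        return (self.sum_x / self.n, self.sum_y / self.n)

    def minus(self, earlier):
        """Summary of the activity that happened after the `earlier` snapshot."""
        remaining = Counter(self.keywords)
        remaining.subtract(earlier.keywords)
        # Unary plus drops keywords whose count fell to zero.
        return MicroCluster(self.n - earlier.n,
                            self.sum_x - earlier.sum_x,
                            self.sum_y - earlier.sum_y,
                            +remaining)

region = MicroCluster()
region.add(0.0, 0.0, ["music"])
snapshot_t1 = copy.deepcopy(region)      # snapshot stored on the timeline
region.add(2.0, 2.0, ["music", "band"])
recent = region.minus(snapshot_t1)       # activity since the snapshot
```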

SLIDE 16

The Ranking Module

  • Retrieve the snapshots in a reference window as background knowledge.
  • Compute the z-score for each candidate as its ranking score.
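As a rough illustration of z-score ranking, suppose each candidate is described by one number in the query window (for example, a tweet count) and by the same number measured in each reference snapshot. That representation, the names `z_score` and `rank_candidates`, the `eps` guard, and the toy values are assumptions made for this sketch; the slides do not say which quantity is standardized.

```python
from statistics import mean, pstdev

def z_score(current, history, eps=1e-9):
    """Standardize `current` against the values seen in the reference window."""
    mu = mean(history)
    sigma = max(pstdev(history), eps)  # guard against a perfectly flat history
    return (current - mu) / sigma

def rank_candidates(candidates):
    """candidates: name -> (current value, list of reference values)."""
    scored = [(name, z_score(cur, hist)) for name, (cur, hist) in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

candidates = {
    # Routine activity: busy now, but just as busy in the reference window.
    "shopping": (52, [50, 48, 52, 50]),
    # Unusual burst: far above what this region normally shows.
    "concert": (40, [2, 4, 3, 3]),
}
ranked = rank_candidates(candidates)
```

The routine "shopping" cluster has more tweets than the "concert" cluster, yet it ranks lower because its activity is normal for its region.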

SLIDE 17

The Update Module

  • In the entire process of GeoBurst, the most time-consuming step is pivot finding.
  • How to avoid finding pivots from scratch as the query window shifts?
  • The key is to maintain the local pivot for each tweet.

[Figure: tweets d1, d2, d3, with labels for neighborhood, neighbor, local pivot, and pivot.]

SLIDE 18

The Update Module

  • We design an updating strategy based on the additive property of the authority score:
    • subtracting the contributions of outdated tweets
    • emphasizing the contributions of new tweets.
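Because an authority score is a sum of per-neighbor contributions, a window shift only needs to remove the terms of expired tweets and add the terms of new ones. The sketch below illustrates that idea under assumptions: "emphasizing" is read as adding, tweets are reduced to 1-D positions, and `contribution` is a stand-in triangular kernel. The names `full_authority` and `shift_window` are hypothetical.

```python
def full_authority(tweets, contribution):
    """Recompute every authority score from scratch (the costly baseline)."""
    return {d: sum(contribution(tweets[s], tweets[d]) for s in tweets if s != d)
            for d in tweets}

def shift_window(tweets, authority, expired, arrived, contribution):
    """Update `tweets` and `authority` in place; return ids whose score changed."""
    touched = set()
    for tid in expired:
        gone = tweets.pop(tid)
        del authority[tid]
        for other_id, other in tweets.items():
            c = contribution(gone, other)
            if c:
                authority[other_id] -= c   # remove the outdated contribution
                touched.add(other_id)
    for tid, tweet in arrived.items():
        authority[tid] = 0.0
        for other_id, other in tweets.items():
            c = contribution(tweet, other)
            if c:
                authority[other_id] += c   # add the new tweet's contribution
                touched.add(other_id)
            authority[tid] += contribution(other, tweet)
        tweets[tid] = tweet
        touched.add(tid)
    # Only tweets still in the window need their local pivot re-checked.
    return {t for t in touched if t in tweets}

def contribution(src, dst):
    """Stand-in kernel on 1-D positions: linear decay, zero beyond distance 1."""
    return max(0.0, 1.0 - abs(src - dst))

tweets = {"a": 0.0, "b": 0.5, "c": 0.8}
authority = full_authority(tweets, contribution)
touched = shift_window(tweets, authority, ["a"], {"d": 1.0}, contribution)
expected = full_authority(tweets, contribution)  # same scores, computed from scratch
```

The returned set of touched tweets is where local pivots may have changed, so the remaining tweets can keep their existing local pivots.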

[Figure: tweets d1, d2, d3, with labels for neighborhood, neighbor, local pivot, and pivot.]

SLIDE 19

Experimental Settings

  • Data:
    • NY: 9M geo-tagged tweets in New York during 3 months.
    • LA: 8M geo-tagged tweets in Los Angeles during 3 months.
  • Task: 80 queries with different durations (3h, 4h, 5h, 6h); find the top-5 local events in each query window.
  • Compared methods: EvenTweet (PVLDB’13), Wavelet (CIKM’09)
  • Evaluation: the crowdsourcing platform CrowdFlower
    • Ask the workers to judge whether the result is a local event or not.
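Given yes/no judgments for the top-5 results of a query, precision is the fraction judged to be true local events. The sketch below assumes one boolean per returned result; how the judgments of several workers are aggregated is not described on the slides, and the function name `precision_at_k` and the sample judgments are hypothetical.

```python
def precision_at_k(judgments, k=5):
    """Fraction of the top-k results judged to be real local events."""
    top = judgments[:k]
    return sum(top) / len(top) if top else 0.0

# Hypothetical judgments for one query window, in ranked order.
judged = [True, True, False, True, False]
score = precision_at_k(judged)
```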

SLIDE 20

Illustrative Cases

SLIDE 21

Precision

SLIDE 22

Running Time

1. GeoBurst is more efficient than the compared methods even in batch mode.
2. The online mode of GeoBurst is even more efficient.

SLIDE 23

Summary

  • We study the problem of detecting local events from the geo-tagged tweet stream.
  • We propose the GeoBurst method.
    • It first detects candidate events based on authority ascent, and then ranks the candidates based on background knowledge.
    • It also features an updating module to continuously monitor the stream.
  • Experiments demonstrate the effectiveness and efficiency of GeoBurst.
  • For future work, we plan to extend GeoBurst to handle tweets that mention geo-location names but do not have GPS information.

SLIDE 24

Thanks!
