
GeoBurst: Real-time Local Event Detection in Geo-Tagged Tweet Streams

  1. GeoBurst: Real-time Local Event Detection in Geo-Tagged Tweet Streams
     Chao Zhang¹, Guangyu Zhou¹, Quan Yuan¹, Honglei Zhuang¹, Yu Zheng², Lance Kaplan³, Shaowen Wang¹, Jiawei Han¹
     ¹ UIUC, ² Microsoft Research, ³ U.S. Army Research Lab

  2. What is a Local Event?
     • A local event is an unusual activity that bursts within a local area and a specific duration while engaging a considerable number of participants.
       ‣ E.g., parade, riot, sports game, concert, accident, disaster.

  3. Local Event Detection
     • Real-time local event detection is important for various applications:
       ‣ disaster monitoring
       ‣ crime alarming
       ‣ activity recommendation
       ‣ …

  4. Why Geo-Tagged Tweet Stream?
     • Years ago, real-time local event detection was nearly impossible due to the lack of timely and reliable data sources.
     • The geo-tagged tweet stream brings new opportunities to this problem because of its (1) sheer size; (2) multi-dimensional information; and (3) real-time nature.

  5. Our Goal
     • Given the geo-tagged tweet stream, we aim to
       ‣ detect all local events in any query time window (batch mode);
       ‣ update the result list in real time as the query window shifts continuously (online mode).
     [Figure: query window Q on the time axis.]

  6. Challenges
     1. Integrating multiple types of data.
       ‣ Location, time and text have totally different representations.
     2. Extracting interpretable events from massive noise.
       ‣ Raw tweets are extremely noisy and short.
     3. Online and real-time detection.
       ‣ To allow for timely actions, local events should be detected in real time.

  7. Previous Studies
     • Most existing event detection methods are designed for detecting global events.
       ‣ They can successfully detect events that are bursty in the entire stream;
       ‣ But local events are “bursty” in a small region and involve a limited number of tweets.
     • A few methods for local event detection have been proposed.
       ‣ They either do not model the correlations between keywords; or
       ‣ are incapable of detecting local events in real time.
     References:
       2011 ICWSM. Event detection in Twitter.
       2012 CIKM. Twevent: segment-based event detection from tweets.
       2009 CIKM. Event detection from Flickr data through wavelet-based spatial analysis.
       2013 PVLDB. EvenTweet: Online localized event detection in the Twitter stream.

  8. Our Insight
     • A local event usually leads to many related tweets around the location (a geo-topic cluster).
     • But a geo-topic cluster is not necessarily a local event:
       ‣ It may be a routine activity in that region (e.g., shopping).
       ‣ It may be a global event rather than a local one (e.g., TV show).
     We define a local event as a geo-topic cluster that shows clear spatiotemporal burstiness.

  9. Overview of GeoBurst
     • We propose GeoBurst, a reference-based method for local event detection. It consists of three key components:
       ‣ a candidate generator that finds geo-topic clusters in the query time frame and regards them as candidate events;
       ‣ a ranking module that summarizes the routine activities in different regions to filter out non-event candidates;
       ‣ an updater that updates local events in real time as the query window shifts.

  10. Candidate Event Generation
     • The candidate generator finds geo-topic clusters in the query time frame as candidate events.
     • Geo-topic cluster: a group of tweets that are geographically close and semantically relevant.
     • Challenges for finding geo-topic clusters:
       ‣ How to combine geographical and semantic similarities?
       ‣ How to capture the correlations between different keywords?
       ‣ How to cluster without knowing the number of clusters in advance?

  11. Candidate Event Generation
     • Intuition: the spot where the event occurs acts as a pivot that produces relevant tweets around it.
     • Our clustering algorithm is based on:
       ‣ a geo-topic authority score for each tweet
       ‣ an authority ascent process to find authority maxima as pivots

  12. Geo-topic Authority
     • A tweet gets an authority score from neighbor tweets, where
       • the geographical impact is captured by a kernel function;
       • the semantic impact is captured by a random walk on the keyword co-occurrence graph.
     Authority can be interpreted as the total amount of energy received from the neighbors.
     [Figure: tweets A to E labeled with keywords (music, show, band, shop); annotations: authority, geo-impact, semantic impact.]
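
The slide does not give the formula, so the following is only a minimal sketch of the idea: each tweet sums, over its neighbors, a geographic kernel weight multiplied by a semantic weight. The kernel shape (an unnormalized Epanechnikov-style kernel), the `bandwidth` parameter, the exclusion of self-contribution, and all function names are assumptions made for illustration. In particular, `keyword_overlap` is a simple Jaccard stand-in, not the random walk on the keyword co-occurrence graph that the slide describes.

```python
import math

def kernel_weight(dist, bandwidth):
    """Epanechnikov-shaped weight (unnormalized): 1 at distance 0, 0 at or beyond the bandwidth."""
    u = dist / bandwidth
    return 1.0 - u * u if u < 1.0 else 0.0

def keyword_overlap(a, b):
    """Jaccard overlap of two keyword lists; a stand-in for the random-walk score."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def authority_scores(tweets, bandwidth=1.0):
    """tweets: list of (x, y, keywords). Returns one authority score per tweet."""
    scores = []
    for i, (xi, yi, ki) in enumerate(tweets):
        total = 0.0
        for j, (xj, yj, kj) in enumerate(tweets):
            if i == j:
                continue  # assumption: a tweet does not contribute to its own score
            dist = math.hypot(xi - xj, yi - yj)
            total += kernel_weight(dist, bandwidth) * keyword_overlap(ki, kj)
        scores.append(total)
    return scores

# Hypothetical toy data loosely modeled on the slide's figure.
tweets = [
    (0.0, 0.0, ["music", "show"]),
    (0.2, 0.1, ["music"]),
    (0.1, -0.2, ["music", "band"]),
    (0.3, 0.0, ["shop"]),   # nearby but semantically unrelated
    (5.0, 5.0, ["music"]),  # related but far away
]
scores = authority_scores(tweets)
```

In this toy data, the unrelated "shop" tweet and the distant "music" tweet both receive zero authority, while the three nearby music tweets reinforce one another.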

  13. Pivot
     • A pivot is an authority maximum: a prominent tweet that is surrounded by many relevant tweets.
     [Figure: tweets A to E labeled with keywords (music, show, band, shop).]

  14. Authority Ascent
     • Now the task is to find all the pivots in the geo-topic space.
     • We design an authority ascent process to find all pivots.
     • A pivot attracts similar tweets to form geo-topic clusters.
     [Figure: tweets d1, d2, d3; annotations: neighborhood, neighbor, local pivot, pivot.]
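
One plausible reading of the ascent process, sketched under assumptions: every tweet points to its local pivot (the highest-authority tweet among itself and its neighbors), and following these pointers uphill ends at a tweet that is its own local pivot, which is taken as the pivot of the cluster. The function names and the dictionary-based inputs below are hypothetical; authority values and neighborhoods are assumed to be precomputed.

```python
def local_pivots(authority, neighbors):
    """Map each tweet to the highest-authority tweet among itself and its neighbors."""
    return {
        t: max([t] + list(neighbors[t]), key=lambda u: authority[u])
        for t in authority
    }

def authority_ascent(authority, neighbors):
    """Follow local-pivot pointers uphill; group tweets by the pivot they reach."""
    local = local_pivots(authority, neighbors)
    clusters = {}
    for t in authority:
        p = t
        while local[p] != p:  # authority strictly increases, so this terminates
            p = local[p]
        clusters.setdefault(p, []).append(t)
    return clusters

# Hypothetical example: d1 -> d2 -> d3, and an isolated tweet x.
authority = {"d1": 1.0, "d2": 2.0, "d3": 3.0, "x": 0.5}
neighbors = {"d1": ["d2"], "d2": ["d1", "d3"], "d3": ["d2"], "x": []}
clusters = authority_ascent(authority, neighbors)
```

Here `d1` reaches `d3` through `d2` even though `d3` is not in its own neighborhood, so the number of clusters falls out of the data rather than being fixed in advance.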

  15. The Ranking Module
     • We design the activity timeline structure to summarize the activities in different spatial regions and time periods.
     • The summaries in the activity timeline serve as background knowledge to quantify the spatiotemporal burstiness of candidates.
     Each snapshot is a set of micro-clusters. Each cluster is an activity summary for a region.
     [Figure: activity timeline with snapshots along the time axis.]

  16. The Ranking Module
     • Retrieve the snapshots in a reference window as background knowledge.
     • Compute the z-score of each candidate as its ranking score.
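
As a rough illustration of z-score ranking, the sketch below compares a candidate's current activity level with the levels observed for the same region in the reference snapshots. What exactly is measured per snapshot is not stated on the slide, so the "activity level" numbers, the function names, and the `eps` guard against zero variance are all assumptions.

```python
from statistics import mean, pstdev

def z_score(current, history, eps=1e-9):
    """How many standard deviations `current` lies above the historical mean."""
    mu = mean(history)
    sigma = pstdev(history)
    return (current - mu) / max(sigma, eps)  # eps avoids division by zero

def rank_candidates(candidates, k=5):
    """candidates: {name: (current_level, reference_levels)}. Returns the top-k by z-score."""
    scored = [(name, z_score(cur, hist)) for name, (cur, hist) in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical reference levels: mean 5, population standard deviation 2.
reference = [2, 4, 4, 4, 5, 5, 7, 9]
ranked = rank_candidates({
    "shopping": (6, reference),   # routine activity, close to the usual level
    "concert": (30, reference),   # far above the usual level
})
```

A routine activity such as shopping scores close to zero because it matches the background, while a genuine burst stands many standard deviations above it.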

  17. The Update Module
     • In the entire process of GeoBurst, the most time-consuming step is pivot finding.
     • How to avoid finding pivots from scratch as the query window shifts?
       ‣ The key is to maintain the local pivot for each tweet.
     [Figure: tweets d1, d2, d3; annotations: neighborhood, neighbor, local pivot, pivot.]

  18. The Update Module
     • We design an updating strategy based on the additive property of the authority score:
       ‣ subtracting the contributions of outdated tweets
       ‣ emphasizing the contributions of new tweets.
     [Figure: tweets d1, d2, d3; annotations: neighborhood, neighbor, local pivot, pivot.]
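
Because an authority score is a sum of pairwise contributions, it can be maintained incrementally when the window shifts. The sketch below illustrates only that additive bookkeeping; the pairwise `contribution` function (distance kernel times keyword overlap), the data layout, and all names are hypothetical, and the maintenance of local pivots mentioned on the previous slide is not shown.

```python
import math

def contribution(a, b, bandwidth=1.0):
    """Hypothetical symmetric pairwise contribution: kernel weight times keyword overlap."""
    (xa, ya, ka), (xb, yb, kb) = a, b
    u = math.hypot(xa - xb, ya - yb) / bandwidth
    kernel = 1.0 - u * u if u < 1.0 else 0.0
    union = set(ka) | set(kb)
    overlap = len(set(ka) & set(kb)) / len(union) if union else 0.0
    return kernel * overlap

def scratch_authority(window):
    """Recompute every score from scratch (quadratic in the window size)."""
    return {
        i: sum(contribution(t, o) for j, o in window.items() if j != i)
        for i, t in window.items()
    }

def shift_window(window, scores, expired_ids, new_tweets):
    """Update scores by subtracting expired contributions and adding new ones."""
    window, scores = dict(window), dict(scores)
    for i in expired_ids:
        old = window.pop(i)
        scores.pop(i)
        for j, t in window.items():
            scores[j] -= contribution(t, old)
    for i, new in new_tweets.items():
        scores[i] = 0.0
        for j, t in window.items():
            c = contribution(t, new)
            scores[j] += c
            scores[i] += c
        window[i] = new
    return window, scores

# Hypothetical window: tweet 1 expires, tweet 4 arrives.
window = {
    1: (0.0, 0.0, ["music"]),
    2: (0.1, 0.0, ["music", "band"]),
    3: (0.2, 0.1, ["band"]),
}
scores = scratch_authority(window)
new_window, new_scores = shift_window(
    window, scores, expired_ids=[1], new_tweets={4: (0.15, 0.05, ["music", "band"])}
)
```

The incremental result matches a full recomputation on the shifted window up to floating-point error, while touching only the pairs that involve an expired or a new tweet.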

  19. Experimental Settings
     • Data:
       • NY: 9M geo-tagged tweets in New York over 3 months.
       • LA: 8M geo-tagged tweets in Los Angeles over 3 months.
     • Task: 80 queries with different durations (3h, 4h, 5h, 6h); find the top-5 local events in each query window.
     • Compared methods: EvenTweet (PVLDB’13), Wavelet (CIKM’09)
     • Evaluation: the crowdsourcing platform CrowdFlower
       ‣ Ask the workers to judge whether the result is a local event or not.

  20. Illustrative Cases

  21. Precision

  22. Running Time
     1. GeoBurst is more efficient than the compared methods, even in batch mode.
     2. The online mode of GeoBurst is more efficient still.

  23. Summary
     • We study the problem of detecting local events from the geo-tagged tweet stream.
     • We propose the GeoBurst method.
       ‣ It first detects candidate events based on authority ascent, and then ranks the candidates based on background knowledge.
       ‣ It also features an updating module to continuously monitor the stream.
     • Experiments demonstrate the effectiveness and efficiency of GeoBurst.
     • For future work, we plan to extend GeoBurst to handle tweets that mention geo-location names but do not have GPS information.

  24. Thanks!
