Improving Twitter Retrieval by Exploiting Structural Information

Improving Twitter Retrieval by Exploiting Structural Information Zhunchen Luo, Miles Osborne, Sasa Petrovic and Ting Wang Twitter Retrieval Most Twitter search systems treat a tweet as plain text. A tweet


  1. Improving Twitter Retrieval by Exploiting Structural Information Zhunchen Luo, Miles Osborne, Sasa Petrovic and Ting Wang

  2. Twitter Retrieval • Most Twitter search systems treat a tweet as plain text. • A tweet can be seen as structured text. • Goal: improve Twitter retrieval by exploiting structural information.

  9. Structured Tweets • Plain Text • Text+Link • Complex Structures (including hashtags, mentions, etc.)

  12. Our Work • We propose Twitter Building Blocks (TBBs) to capture the structural information of tweets. • Learning-to-rank for Twitter retrieval, using: • Structural information features (TBB features). • Social media features (e.g., author social network information).

  21. Twitter Building Blocks (TBBs) • A TBB is a sequence of tokens. • Six types of TBBs: • TAG: hashtag, e.g., #keywords. • MET: mention symbol, e.g., @username. • RWT: retweet symbol, e.g., RT @username, RT, via @username. • URL: link. • COM: comment. • MSG: content.
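The four surface-marked TBB types listed above (TAG, MET, RWT, URL) can be approximated with simple token patterns. The sketch below is a rule-based illustration, not the tagger described in the slides: the regular expressions and the `label_tokens` helper are assumptions made for this example, and every unmatched token falls back to MSG because separating COM from MSG needs context.

```python
import re

# Hypothetical surface patterns; the slides do not define TBBs this precisely.
URL_RE = re.compile(r"^https?://\S+$", re.IGNORECASE)
TAG_RE = re.compile(r"^#\w+$")
MET_RE = re.compile(r"^@\w+:?$")


def label_tokens(tweet: str) -> list[tuple[str, str]]:
    """Assign a coarse TBB type to each whitespace-separated token.

    Only TAG, MET, RWT and URL are detected from surface form.
    Everything else is labelled MSG; COM is never produced here.
    """
    tokens = tweet.split()
    labels = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        low = tok.lower()
        next_is_mention = (
            i + 1 < len(tokens) and MET_RE.match(tokens[i + 1]) is not None
        )
        # "RT" alone counts as a retweet symbol; "via" only before a mention.
        if low == "rt" or (low == "via" and next_is_mention):
            labels.append((tok, "RWT"))
            if next_is_mention:
                # In "RT @username" the mention belongs to the RWT block.
                labels.append((tokens[i + 1], "RWT"))
                i += 1
        elif URL_RE.match(tok):
            labels.append((tok, "URL"))
        elif TAG_RE.match(tok):
            labels.append((tok, "TAG"))
        elif MET_RE.match(tok):
            labels.append((tok, "MET"))
        else:
            labels.append((tok, "MSG"))
        i += 1
    return labels
```

Treating "via" as a retweet symbol only when a mention follows is a design choice of this sketch, meant to avoid mislabelling ordinary uses of the word.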

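  36. TBB Structures • A TBB structure is a combination of TBBs. • Example: U need an iphone lol ==> RT @UserB: @UserA i nearly dropped my blackberry in that poool :( • Blocks, left to right: COM (the comment before the retweet symbol), RWT (RT @UserB:), MET (@UserA), MSG (the retweeted content). • The TBB structure is “COM RWT MET MSG”.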

  48. TBB Structures • A TBB structure is a combination of TBBs. • Example: New IPhone in Semptember ---- http://buswk.co/jbyCo #iphone #apple • Blocks, left to right: MSG (the leading text), URL (the link), TAG (#iphone #apple). • The TBB structure is “MSG URL TAG”.
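Going from per-token labels to a TBB structure amounts to collapsing each run of identical labels into one block. A minimal sketch, assuming the tokens have already been labelled (the `tbb_structure` name and the hand-written label lists are illustrative, not taken from the slides):

```python
from itertools import groupby


def tbb_structure(token_labels: list[str]) -> str:
    """Collapse consecutive identical token labels into one block each."""
    return " ".join(label for label, _run in groupby(token_labels))


# Hand-labelled tokens for the two example tweets from the slides.
retweet_labels = ["COM"] * 6 + ["RWT"] * 2 + ["MET"] + ["MSG"] * 9
broadcast_labels = ["MSG"] * 5 + ["URL"] + ["TAG"] * 2

print(tbb_structure(retweet_labels))    # COM RWT MET MSG
print(tbb_structure(broadcast_labels))  # MSG URL TAG
```

Because only adjacent repeats are merged, a tweet whose labels read MSG, MET, MSG keeps both MSG blocks and yields “MSG MET MSG”.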

  52. TBB Structures Distribution • The 14 most frequent TBB structures in Twitter. • “OTHERS” accounts for all other TBB structures.

      TBB structure    (%)
      MSG              30.25
      MET MSG          20.70
      MSG URL          18.40
      OTHERS           13.20
      COM URL           4.10
      MSG TAG           2.65
      MSG URL TAG       2.10
      RWT MSG           1.75
      TAG MSG           1.55
      TAG MSG URL       1.20
      RWT MSG URL       0.95
      COM RWT MSG       0.85
      MET MSG URL       0.85
      MSG MET MSG       0.70
      RWT MSG TAG       0.70

      • People use simple and fixed structures to tweet.
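A distribution of this shape can be computed by counting structures, keeping the most frequent ones, and folding the remainder into an OTHERS bucket. The sketch below assumes each tweet has already been reduced to a structure string; the `structure_distribution` helper and the toy input are made up for illustration and do not reproduce the figures on the slide.

```python
from collections import Counter


def structure_distribution(structures: list[str], top_k: int = 14) -> dict[str, float]:
    """Return percentages for the top_k structures plus an OTHERS bucket."""
    counts = Counter(structures)
    total = sum(counts.values())
    if total == 0:
        return {}
    top = counts.most_common(top_k)
    dist = {structure: 100.0 * n / total for structure, n in top}
    rest = total - sum(n for _structure, n in top)
    if rest:
        dist["OTHERS"] = 100.0 * rest / total
    return dist


# Toy input: ten tweets, keeping only the two most frequent structures.
toy = ["MSG"] * 5 + ["MET MSG"] * 3 + ["MSG URL", "TAG MSG"]
print(structure_distribution(toy, top_k=2))
```

With `top_k=2`, the two singleton structures in the toy input end up together in OTHERS.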

  53. Automatic TBB Tagger • Sequence labeling approach (Conditional Random Field). • Features for the TBB tagger: token type; POS; length; prefix and suffix; Twitter orthography (e.g., the block preceding an “RWT” is more likely to be a “COM”). • TBB structure identification achieves an accuracy of 82.60%. • Dataset sizes: train = 1000; dev = 500; test = 500.
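To make the feature list concrete, here is a hedged sketch of per-token feature extraction of the kind a sequence labeler could consume. The feature names, the `token_type` categories and the two-character affix length are assumptions of this example, not the authors' feature set. POS is left out because it needs an external tagger, and neighbouring token types stand in loosely for the orthographic context the slide mentions.

```python
import re


def token_type(tok: str) -> str:
    """Coarse, hypothetical token categories based on surface form."""
    if re.match(r"^https?://", tok, re.IGNORECASE):
        return "url"
    if tok.startswith("#"):
        return "hashtag"
    if tok.startswith("@"):
        return "mention"
    if tok.lower() in {"rt", "via"}:
        return "retweet_marker"
    if tok.isalpha():
        return "word"
    if tok.isdigit():
        return "number"
    return "other"


def token_features(tokens: list[str], i: int) -> dict:
    """Features for token i: type, length, affixes, neighbouring types."""
    tok = tokens[i]
    return {
        "type": token_type(tok),
        "length": len(tok),
        "prefix2": tok[:2],
        "suffix2": tok[-2:],
        "prev_type": token_type(tokens[i - 1]) if i > 0 else "BOS",
        "next_type": token_type(tokens[i + 1]) if i + 1 < len(tokens) else "EOS",
    }


def tweet_features(tweet: str) -> list[dict]:
    """One feature dict per whitespace-separated token."""
    tokens = tweet.split()
    return [token_features(tokens, i) for i in range(len(tokens))]
```

A list of feature dicts per tweet, paired with a list of gold TBB labels, is the usual input shape for training a CRF; the training step itself is omitted here.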

  58. TBB Analysis • Clustering tweets by TBB structure. • Tweets in each cluster share similar characteristics: • Public Broadcast: MSG URL; MSG URL TAG. E.g., Apple brings new iPad to China http://bbc.in/Nl2HW9 • Subjective Text: COM RWT MSG (Opinion Retrieval in Twitter, Luo et al., ICWSM-12). E.g., I thought we were isolated and no one would want to invest here! RT @UserA: Honda announces 500 new jobs in Swindon
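Clustering by TBB structure can be read as grouping tweets that share the same structure string. The sketch below assumes structures are already known for each tweet; the helper names and the "Unlabelled" fallback are illustrative, while the structure-to-category mapping lists only the pairs named on the slide.

```python
from collections import defaultdict

# Only the structure/category pairs mentioned on the slide.
CATEGORY_BY_STRUCTURE = {
    "MSG URL": "Public Broadcast",
    "MSG URL TAG": "Public Broadcast",
    "COM RWT MSG": "Subjective Text",
}


def cluster_by_structure(tagged_tweets: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group tweet texts by their TBB structure string."""
    clusters = defaultdict(list)
    for text, structure in tagged_tweets:
        clusters[structure].append(text)
    return dict(clusters)


def category_of(structure: str) -> str:
    """Look up the cluster description; fall back for unknown structures."""
    return CATEGORY_BY_STRUCTURE.get(structure, "Unlabelled")


tweets = [
    ("Apple brings new iPad to China http://bbc.in/Nl2HW9", "MSG URL"),
    ("I thought we were isolated and no one would want to invest here! "
     "RT @UserA: Honda announces 500 new jobs in Swindon", "COM RWT MSG"),
]
for structure, texts in cluster_by_structure(tweets).items():
    print(structure, "->", category_of(structure), f"({len(texts)} tweet)")
```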
