Improving Twitter Retrieval by Exploiting Structural Information



slide-1
SLIDE 1

Improving Twitter Retrieval by Exploiting Structural Information

Zhunchen Luo, Miles Osborne, Sasa Petrovic and Ting Wang

slide-2
SLIDE 2

Twitter Retrieval

  • Most Twitter search systems treat a tweet as plain text.
  • A tweet can be seen as structured text.
  • Goal: Improve Twitter retrieval by exploiting structural information.

slide-9
SLIDE 9

Structured Tweets

  • Plain Text
  • Text+Link
  • Complex Structures (including hashtags, mentions, etc.)

slide-12
SLIDE 12

Our Work

  • We propose Twitter Building Blocks (TBBs) to capture the structural information of tweets.
  • Learning-to-rank for Twitter retrieval:
    • Structural information features (TBB features).
    • Social media features (e.g., author social network information).

slide-21
SLIDE 21

Twitter Building Blocks (TBBs)

  • A TBB is a sequence of tokens.
  • Six types of TBBs:
    • TAG: hashtag, e.g., #keywords.
    • MET: mention symbols, e.g., @username.
    • RWT: retweet symbols, e.g., RT @username, RT, via @username.
    • URL: links.
    • COM: comment.
    • MSG: content.
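
Four of the six block types (TAG, MET, RWT, URL) have recognisable surface forms, which a few regular expressions can approximate. The sketch below is only an illustration of those surface cues, not the tagger used in this work; the patterns and the helper name `surface_types` are assumptions. COM and MSG are both free text, so they are left as a generic TEXT label here.

```python
import re

# Illustrative surface patterns only (assumed, not taken from the slides).
URL_RE = re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)
TAG_RE = re.compile(r"^#\w+$")
MET_RE = re.compile(r"^@\w+:?$")
RWT_WORDS = {"rt", "via"}  # note: "via" can also be an ordinary word


def surface_types(tweet):
    """Return (token, type) pairs using surface cues alone.

    COM and MSG cannot be told apart this way, so every remaining
    token is labelled TEXT.
    """
    tokens = tweet.split()
    typed = []
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1].lower() if i > 0 else ""
        if URL_RE.match(tok):
            typed.append((tok, "URL"))
        elif TAG_RE.match(tok):
            typed.append((tok, "TAG"))
        elif tok.lower() in RWT_WORDS:
            typed.append((tok, "RWT"))
        elif MET_RE.match(tok):
            # A username right after RT/via belongs to the retweet symbol.
            typed.append((tok, "RWT" if prev in RWT_WORDS else "MET"))
        else:
            typed.append((tok, "TEXT"))
    return typed
```

Separating COM from MSG needs context (for example, what follows or precedes a retweet symbol), which is one reason a sequence model is a more natural fit than token-by-token rules.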

slide-36
SLIDE 36

TBB Structures

  • A TBB structure is a combination of TBBs.

U need an iphone lol ==> RT @UserB: @UserA i nearly dropped my blackberry in that poool :(

  • The TBB structure is “COM RWT MET MSG”.
slide-48
SLIDE 48

TBB Structures

  • A TBB structure is a combination of TBBs.

New IPhone in Semptember ---- http://buswk.co/jbyCo #iphone #apple

  • The TBB structure is “MSG URL TAG”.
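
Given one block label per token, the TBB structure can be read off by merging runs of identical labels. A minimal sketch, assuming the per-token labels are already available (the label sequences below are illustrative assignments, and `tbb_structure` is a hypothetical helper name):

```python
from itertools import groupby


def tbb_structure(labels):
    """Collapse consecutive identical token labels into a structure string."""
    return " ".join(label for label, _ in groupby(labels))


tokens = "New IPhone in Semptember ---- http://buswk.co/jbyCo #iphone #apple".split()
# Assumed token labels: five content tokens, one link, two hashtags.
labels = ["MSG"] * 5 + ["URL"] + ["TAG"] * 2
structure = tbb_structure(labels)
```

Only adjacent repeats are merged, so a sequence such as MSG, MET, MSG keeps both MSG blocks and yields “MSG MET MSG”.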
slide-52
SLIDE 52

TBB Structures Distribution

  • People use simple and fixed structures to tweet.

TBB Structure    (%)      TBB Structure    (%)
MSG              30.25    TAG MSG          1.55
MET MSG          20.70    TAG MSG URL      1.20
MSG URL          18.40    RWT MSG URL      0.95
OTHERS           13.20    COM RWT MSG      0.85
COM URL           4.10    MET MSG URL      0.85
MSG TAG           2.65    MSG MET MSG      0.70
MSG URL TAG       2.10    RWT MSG TAG      0.70
RWT MSG           1.75

  • The 14 most frequent TBB structures in Twitter.
  • “OTHERS” accounts for all other TBB structures.
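
A distribution of this shape can be computed by counting structure strings, keeping the most frequent ones, and pooling the remainder. A small sketch, assuming each tweet has already been mapped to its structure string (`structure_distribution` and `top_k` are hypothetical names):

```python
from collections import Counter


def structure_distribution(structures, top_k=14):
    """Percentage of tweets per structure; everything outside the top_k is pooled."""
    counts = Counter(structures)
    total = sum(counts.values())
    if total == 0:
        return {}
    top = counts.most_common(top_k)
    dist = {name: 100.0 * n / total for name, n in top}
    rest = total - sum(n for _, n in top)
    if rest:
        dist["OTHERS"] = 100.0 * rest / total
    return dist
```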
slide-53
SLIDE 53

Automatic TBB Tagger

  • Sequence labeling approach (Conditional Random Field).
  • Features for TBB tagger:
    • Token type; POS; Length; Prefix and suffix; Twitter orthography (e.g., the block preceding “RWT” is more likely to be “COM”).
  • TBB structure identification achieves an accuracy of 82.60%.
  • #(Train dataset)=1000; #(Dev dataset)=500; #(Test dataset)=500.
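
To make the feature list concrete, the sketch below builds a per-token feature dictionary of the kind many CRF toolkits accept. It is an assumption-laden illustration: the feature names, the `token_type` categories, and the three-character affix length are all invented here, and POS tags are omitted because they would need an external tagger.

```python
def token_type(tok):
    """Coarse token category based on surface form (assumed categories)."""
    if tok.startswith("#") and len(tok) > 1:
        return "hashtag"
    if tok.startswith("@") and len(tok) > 1:
        return "mention"
    if tok.lower().startswith(("http://", "https://")):
        return "url"
    if tok.lower() in {"rt", "via"}:
        return "retweet_marker"
    if tok.isalpha():
        return "word"
    if tok.isdigit():
        return "number"
    return "other"


def token_features(tokens, i, affix_len=3):
    """Features for token i, including the types of its neighbours."""
    tok = tokens[i]
    return {
        "type": token_type(tok),
        "length": len(tok),
        "prefix": tok[:affix_len].lower(),
        "suffix": tok[-affix_len:].lower(),
        "prev_type": token_type(tokens[i - 1]) if i > 0 else "BOS",
        "next_type": token_type(tokens[i + 1]) if i < len(tokens) - 1 else "EOS",
    }
```

The neighbour features give a sequence model the context it needs for cases like the one above, where text directly before a retweet marker tends to be a comment.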

slide-59
SLIDE 59

TBB Analysis

  • Clustering tweets by TBB structures.
  • Each cluster has similar characteristics:
    • Public Broadcast: MSG URL; MSG URL TAG
      E.g., Apple brings new iPad to China http://bbc.in/Nl2HW9
    • Subjective Text: COM RWT MSG (Opinion Retrieval in Twitter. Luo et al., ICWSM-12)
      E.g., I thought we were isolated and no one would want to invest here! RT @UserA: Honda announces 500 new jobs in Swindon
    • Messy: OTHERS
      E.g., RT @UserA: Forreal doeee? (Wanda voic) #Icant cut it out #Newark http://twipic.com/2u15xa...lmao!!WOW ... http://tmi.me/
slide-60
SLIDE 60

TBB Analysis: OOV

Out-of-Vocabulary (OOV) Value for TBB Structures:

TBB Structure    OOV (%)    TBB Structure    OOV (%)
OTHERS           4.30       MET MSG URL      1.42
TAG MSG URL      3.42       MSG              1.32
MSG URL          1.93       MSG TAG          1.31
MSG URL TAG      1.91       RWT MSG URL      1.30
COM RWT MSG      1.80       MET MSG          1.15
TAG MSG URL      1.78       RWT MSG          0.82
MSG MET MSG      1.64       RWT MSG TAG      0.58
TAG MSG          1.63

  • People retweet high quality text.
  • More blocks = More OOV words.
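
The slides do not say how the OOV value is computed. One plausible reading, shown here purely as an assumption, is the percentage of alphabetic tokens that do not appear in a reference vocabulary (`oov_rate` and the toy vocabulary are hypothetical):

```python
def oov_rate(tokens, vocabulary):
    """Percentage of alphabetic tokens missing from the vocabulary (assumed definition)."""
    words = [t.lower() for t in tokens if t.isalpha()]
    if not words:
        return 0.0
    missing = sum(1 for w in words if w not in vocabulary)
    return 100.0 * missing / len(words)


vocab = {"i", "nearly", "dropped", "my", "in", "that"}
rate = oov_rate("i nearly dropped my blackberry in that poool".split(), vocab)
```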

slide-61
SLIDE 61

TBB for Learning-to-Rank Tweets

  • TBB features (for example):
    • TBB structure of a tweet (TBB Structure Type).
    • The positional information of the query in the corresponding TBB.
    • The context information of the TBB containing the query.
    • The number of blocks in a tweet.
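
As a rough illustration of how such features might be derived, the sketch below takes a tweet already segmented into blocks and returns a small feature dictionary. The block representation (a list of `(type, text)` pairs), the feature names, and the choice of neighbouring block types as "context" are assumptions made for this example, not the feature set used in the experiments.

```python
def tbb_features(blocks, query_terms):
    """Features from a segmented tweet; blocks is a list of (block_type, text)."""
    query = {t.lower() for t in query_terms}
    features = {
        "structure": " ".join(btype for btype, _ in blocks),
        "num_blocks": len(blocks),
        "query_block_type": None,   # type of the first block containing a query term
        "query_block_index": None,  # its position in the tweet
        "prev_block_type": None,    # context: neighbouring block types
        "next_block_type": None,
    }
    for i, (btype, text) in enumerate(blocks):
        if query & {w.lower() for w in text.split()}:
            features["query_block_type"] = btype
            features["query_block_index"] = i
            if i > 0:
                features["prev_block_type"] = blocks[i - 1][0]
            if i < len(blocks) - 1:
                features["next_block_type"] = blocks[i + 1][0]
            break
    return features
```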
slide-62
SLIDE 62

Rank Approaches

  • Baseline (Duan et al., Coling-10):
    • Features: Length; BM25 score; Link (the most effective feature).
  • SM_Rank:
    • Features: More social media features (e.g., number of followers).
  • TBB_Rank:
    • Features: Our TBB features.
slide-63
SLIDE 63

Experiment

  • Dataset: 100 queries and 936 judged tweets
  • Learning-to-rank model: SVMRank
  • Evaluation: Mean Average Precision (MAP)
  • Ten-fold cross-validation
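
For reference, Mean Average Precision can be computed from per-query relevance lists as sketched below. This is a generic illustration rather than the evaluation script used here, and it assumes every judged relevant tweet for a query appears somewhere in that query's ranking.

```python
def average_precision(ranked_relevance):
    """AP for one query; ranked_relevance holds 1/0 (or True/False) in rank order."""
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs):
    """Mean of the per-query average precision values."""
    if not runs:
        return 0.0
    return sum(average_precision(r) for r in runs) / len(runs)
```

With ten-fold cross-validation, a value like this would typically be computed on each held-out fold and then averaged.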
slide-64
SLIDE 64

Experimental Result

Approach     MAP       Approach             MAP
Baseline     0.4197    Baseline+TBB_Rank    0.4326
SM_Rank      0.4338    SM+TBB_Rank          0.4710
TBB_Rank     0.4235    All                  0.4712

  • TBB is effective for Twitter Retrieval!
slide-65
SLIDE 65

The Most Important TBB Structure for Twitter Retrieval

  • Link feature is the most important feature in Baseline (Duan et al., Coling-10).
  • Replace the Link feature with TBB Structure Type features related to the “URL” block in Baseline.
  • “MSG URL” is the most important structure!

TBB Structure    MAP       TBB Structure    MAP
MSG URL          0.4019    TAG MSG URL      0.3245
MSG URL TAG      0.3327    COM URL          0.3191
RWT MSG URL      0.3289    MET MSG URL      0.1932

slide-66
SLIDE 66

Conclusion

  • We propose Twitter Building Blocks (TBBs) to capture the structural information of tweets.
  • The structural information of tweets can help Twitter retrieval.
  • “MSG URL” is the most important structure for Twitter retrieval.

slide-67
SLIDE 67

Thanks!