Uncovering News-Twitter Reciprocity via Interaction Patterns

slide-1
SLIDE 1

Uncovering News-Twitter Reciprocity via Interaction Patterns

Yue Ning¹, Sathappan Muthiah¹, Ravi Tandon², Naren Ramakrishnan¹

¹ Discovery Analytics Center, Department of Computer Science, Virginia Tech
² Now with Department of Electrical and Computer Engineering, The University of Arizona

slide-2
SLIDE 2

Outline

◮ Introduction
◮ Problem Definition
◮ Methodology
    ◮ Story Chaining
    ◮ Retrieval of Tweets
    ◮ Identify Interaction Patterns
    ◮ Clustering
    ◮ Topic Modeling
◮ Experiments and Results
    ◮ Dataset
    ◮ Results
◮ Conclusion

slide-3
SLIDE 3

Problem Introduction

Social Media News Media

slide-6
SLIDE 6

Motivation

◮ News -> Twitter
◮ Twitter -> News Media
◮ Explosion of information to comment/feed upon
◮ Cause for variations in such interdependencies
    ◮ Temporal popularity of a "topic"
    ◮ Geo-location (Africa vs Asia)

slide-21
SLIDE 21

One Example

A story chain of five news documents, each followed by its interaction state in brackets:

◮ D1, 1/27/2013: Fire at Kiss nightclub: security guards tried to stop nightclub output [N]
◮ D2, 1/27/2013: Rises to 233 the number killed in the fire nightclub in Santa Maria (RS) [T]
◮ D3, 1/27/2013: Bodies of fire victims in Santa Maria began to be veiled [N]
◮ D4, 1/28/2013: Relatives fill for information about fire victims [T]
◮ D5, 1/28/2013: Gaucho police arrest owner of Kiss nightclub and two band members [B]

slide-22
SLIDE 22

Goals

1. Understanding the type of information flow between news and Twitter.
2. Chaining similar news articles together.
3. Identifying major interaction patterns:
    ◮ Cluster story chains and understand their differences.
    ◮ Identify main topics of interest within such clusters.

slide-28
SLIDE 28

System Framework

Figure: system pipeline. News reports (documents) → Story Chaining → Tweet Retrieval → Interaction Pattern Mining → Interaction Pattern Based Clustering (clusters C0, C1, ..., CK) → Topic Modeling.

Outputs: Story-chain Clusters and Cluster Topic Distributions.

slide-29
SLIDE 29

Story Chaining Algorithm

Goal: identify all documents related to a news story and keep track of the news story as new documents arrive.¹

Method: to assess whether two documents refer to the same underlying context, we calculate their similarity scores with respect to three features:

◮ textual features, denoted by T(Di)
◮ spatial features, denoted by L(Di), e.g. city, state, country
◮ actors, denoted by A(Di), e.g. Hillary Clinton

¹ J. Schlachter, A. Ruvinskya, L. Asencios Reynoso, S. Muthiah, and N. Ramakrishnan, “Leveraging topic models to develop metrics for evaluating the quality of narrative threads extracted from news stories”, in Proc. of the 6th International Conference on Applied Human Factors and Ergonomics, AHFE, Elsevier, 2015.

slide-30
SLIDE 30

Story Chaining Algorithm (Cont.)

The total weighted similarity measure between two documents, Di and Dj, is then defined as follows:

\[
\mathrm{sim}(D_i, D_j) =
\underbrace{\alpha\, f(T(D_i), T(D_j))}_{\text{textual features}}
+ \underbrace{\beta\, f(L(D_i), L(D_j))}_{\text{spatial features}}
+ \underbrace{\eta\, f(A(D_i), A(D_j))}_{\text{actor features}}
\]

The coherence between a chain Cj and document Di is defined as

\[
\mathrm{coh}(D_i, C_j) = \theta\, g(L(D_i), L(C_j)) + \varphi\, g(A(D_i), A(C_j))
\]

where g is any similarity measure and the coefficients θ, φ are chosen such that θ + φ = 1.
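The two measures above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation. It assumes Jaccard overlap for both f and g, represents each document as a dict of feature sets, and aggregates a chain's features by set union. The weights `alpha`, `beta`, `eta` and the sample documents are hypothetical values chosen for the example.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections (assumed stand-in for f and g)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def sim(di, dj, alpha=0.5, beta=0.25, eta=0.25):
    """Weighted similarity over textual (T), spatial (L) and actor (A) features."""
    return (alpha * jaccard(di["T"], dj["T"])
            + beta * jaccard(di["L"], dj["L"])
            + eta * jaccard(di["A"], dj["A"]))


def coh(di, chain, theta=0.5, phi=0.5):
    """Coherence of document di with a chain; theta + phi must equal 1."""
    assert abs(theta + phi - 1.0) < 1e-9
    # Assumption: a chain's features are the union of its documents' features.
    chain_l = set().union(*(d["L"] for d in chain))
    chain_a = set().union(*(d["A"] for d in chain))
    return theta * jaccard(di["L"], chain_l) + phi * jaccard(di["A"], chain_a)


# Hypothetical documents, loosely modeled on the nightclub-fire example.
d1 = {"T": {"fire", "kiss", "nightclub"}, "L": {"santa maria"}, "A": {"police"}}
d2 = {"T": {"fire", "nightclub", "victims"}, "L": {"santa maria"}, "A": {"police"}}
print(sim(d1, d2), coh(d2, [d1]))
```

In an online setting, a new document would be appended to the chain it is most similar and coherent with, provided the scores clear some cut-off. The slides do not state that cut-off, so it is left out here.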

slide-31
SLIDE 31

Twitter Profile for News

1. Collect tweets based on URL.
2. Extract entity keywords from news.
3. Filter keywords together.
4. Download hourly count metrics.

slide-32
SLIDE 32

Interaction Patterns

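◮ Peak detection²
◮ Incoming influence (Wpre) and outgoing influence (Wpost):

\[
W_{pre} = \sum_{s \in S_{pre}} \frac{v_s}{t_A - t_s}, \qquad
W_{post} = \sum_{s \in S_{post}} \frac{v_s}{t_s - t_A} \tag{1}
\]

Figure: tweet-volume timeline around the article time tA, marking pre peaks and post peaks at times ts1, ts2, ts3, with peak volumes vs and distances tA − ts.

² M. Duarte, “Notes on scientific computing for biomechanics and motor control”, 2015.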
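Equation (1) can be illustrated with a short Python sketch. It is a sketch under stated assumptions: the peak detector below is a simple local-maximum rule standing in for the cited peak-detection routine, time is measured as an hourly index, and the `counts` series is made up for the example.

```python
def find_peaks(counts, min_height=0.0):
    """Indices of local maxima (left edge of a plateau); a simple stand-in detector."""
    return [i for i in range(1, len(counts) - 1)
            if counts[i] > counts[i - 1]
            and counts[i] >= counts[i + 1]
            and counts[i] >= min_height]


def influence_weights(counts, t_a, min_height=0.0):
    """Return (W_pre, W_post) for an article published at hour index t_a.

    counts[s] is the hourly tweet volume v_s. Each peak contributes
    v_s / |t_a - t_s|, so peaks closer to publication weigh more.
    """
    w_pre = w_post = 0.0
    for s in find_peaks(counts, min_height):
        if s < t_a:
            w_pre += counts[s] / (t_a - s)
        elif s > t_a:
            w_post += counts[s] / (s - t_a)
        # Assumption: a peak exactly at t_a is skipped to avoid dividing by zero.
    return w_pre, w_post


# Hypothetical hourly counts: one peak before and one after publication at hour 3.
counts = [0, 4, 0, 0, 0, 9, 0, 0]
print(influence_weights(counts, t_a=3))
```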

slide-33
SLIDE 33

Interaction States

N T B E

slide-34
SLIDE 34

Interaction States (Cont.)

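\[
\mathrm{State}(D_i) =
\begin{cases}
N, & \text{if } W_{pre} < \rho,\ W_{post} \ge (1 + \lambda) W_{pre} \\
E, & \text{if } W_{pre} < \rho,\ W_{post} < (1 + \lambda) W_{pre} \\
T, & \text{if } W_{pre} \ge \rho,\ W_{post} < (1 + \lambda) W_{pre} \\
B, & \text{if } W_{pre} \ge \rho,\ W_{post} \ge (1 + \lambda) W_{pre}
\end{cases}
\]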
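The four cases translate directly into code. The sketch below follows the definition literally; the values of `rho` and `lam` and the sample weights are hypothetical, since the slides do not give the thresholds used.

```python
def interaction_state(w_pre, w_post, rho, lam):
    """Map (W_pre, W_post) to one of the states N, E, T, B."""
    pre_active = w_pre >= rho                      # enough Twitter activity before the article
    post_dominant = w_post >= (1 + lam) * w_pre    # activity after clearly exceeds activity before
    if not pre_active:
        return "N" if post_dominant else "E"
    return "B" if post_dominant else "T"


def encode_chain(weights, rho, lam):
    """Encode a story chain, given (W_pre, W_post) per article, as a state string."""
    return "".join(interaction_state(p, q, rho, lam) for p, q in weights)


# Hypothetical weights for a four-article chain.
chain = [(0.2, 3.0), (0.8, 0.5), (4.0, 1.0), (2.0, 4.5)]
print(encode_chain(chain, rho=1.0, lam=0.5))
```

Taken literally, the definition assigns N to an article with no peaks at all (both weights zero), because 0 ≥ 0 holds. An implementation may want to treat that case separately.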

slide-35
SLIDE 35

Interaction States (Cont.)

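Figure: Geometric Interpretation of States. The (Wpre, Wpost) plane is split by the threshold Wpre = ρ and the line Wpost = (1 + λ)Wpre into the regions N, E, T and B. Wpre and Wpost are measured before and after the article publish time tA.

Legend: N: News → Twitter; T: Twitter → News; B: Twitter ↔ News.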

slide-36
SLIDE 36

Clustering on Encoded Chains

Clustering via qualitative encoding (e.g. “NNTNTBBE”):

◮ Levenshtein distance
◮ Jaro-Winkler distance
◮ Ratcliff-Obershelp pattern recognition

Clustering via quantitative encoding (e.g. “0.5, -0.9, 0.88, 0.3, -0.4”):

◮ Multi-dimensional Dynamic Time Warping (DTW)
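A rough Python sketch of three of these distances follows. It makes several assumptions. The Levenshtein and DTW routines are textbook dynamic programs, not the authors' code. The standard-library `difflib.SequenceMatcher` is used as a close relative of Ratcliff-Obershelp matching. Each element of a quantitative encoding is taken to be a tuple of numbers. Jaro-Winkler is omitted for brevity.

```python
from difflib import SequenceMatcher


def levenshtein(a, b):
    """Edit distance between two encoded chains (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def gestalt_similarity(a, b):
    """Ratcliff-Obershelp-style similarity in [0, 1], computed by difflib."""
    return SequenceMatcher(None, a, b).ratio()


def dtw(x, y):
    """Dynamic time warping cost between two sequences of equal-length vectors."""
    inf = float("inf")
    cost = [[inf] * (len(y) + 1) for _ in range(len(x) + 1)]
    cost[0][0] = 0.0
    for i, xi in enumerate(x, start=1):
        for j, yj in enumerate(y, start=1):
            # Euclidean distance between the two aligned vectors.
            d = sum((p - q) ** 2 for p, q in zip(xi, yj)) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(x)][len(y)]


print(levenshtein("NTNT", "NT"), gestalt_similarity("NTNT", "NTNT"))
```

Any of these can fill a pairwise distance matrix over story chains, which is the input a medoid-based clustering method needs.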

slide-37
SLIDE 37

Interpretation from a Different Dimension

◮ Are sports events always related to bi-directional interactions?
◮ Do Twitter users focus more on sports and entertainment?
◮ Latent Dirichlet Allocation (LDA) for hidden topic analysis on clusters.

The topic distribution for one cluster is defined by:

\[
C_{j,k} = \frac{\sum_{d_{ij} \in c_j} n_{d_{ij}}\, \theta(d_{ij}, k)}{\sum_{d_{ij}} n_{d_{ij}}} \tag{2}
\]

where

◮ n_{d_ij} refers to the frequency of d_ij in cluster Cj.
◮ θ(d_ij, k) refers to the proportion of topic k in document d_ij.
◮ k is the topic index.
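Equation (2) is a frequency-weighted average of per-document topic proportions. A minimal Python sketch follows, assuming the LDA proportions θ have already been computed elsewhere and are passed in as plain lists. The sample numbers are made up.

```python
def cluster_topic_distribution(docs):
    """Frequency-weighted topic distribution of one cluster.

    docs is a list of (n, theta) pairs: n is the document's frequency in the
    cluster and theta is its list of topic proportions (one entry per topic k).
    """
    total = sum(n for n, _ in docs)
    if total == 0:
        raise ValueError("cluster has no documents")
    num_topics = len(docs[0][1])
    return [sum(n * theta[k] for n, theta in docs) / total
            for k in range(num_topics)]


# Hypothetical cluster: one document seen twice, another seen once, two topics.
docs = [(2, [0.5, 0.5]), (1, [0.2, 0.8])]
print(cluster_topic_distribution(docs))
```

If each document's proportions sum to 1, the cluster distribution does too, so clusters can be compared topic by topic.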

slide-39
SLIDE 39

Dataset

Real data from Brazil during the period from Nov. 2012 to Sep. 2013:

◮ Protest related articles: GSR
◮ Other articles: NON-GSR

Table: Statistical properties of GSR and Non-GSR chains.

Category          % of Twitter starts    Avg-Time-Lag (hour)
GSR Chains        40%                    10.95
Non-GSR Chains    73%                    5.26

Figure: two bar charts of GSR article proportions, one by population category (Business, Medical, Legal, General, Ethnic, Labor, Agricultural, Education, Refugees) and one by event type (Government, Energy, Economics, Other, Wages, Housing).

slide-41
SLIDE 41

GSR Dataset

Category                        % News starts    % of Twitter starts
Housing related protests        100%             0%
Agriculture                     100%             0%
Medical                         74%              26%
Other (religious & cultural)    60%              40%
General Population              30%              70%
Govt. Policies                  23%              77%

Table: % of Twitter, News starts for GSR story-chains

slide-42
SLIDE 42

Cluster Results

Figure: Population Distribution of Clusters (K-Medoids). Bar chart over clusters C0 to C4; legend: Agricultural, Business, Education, Ethnic, General, Labor, Legal, Medical.

Figure: Event Type Distribution of Clusters (K-Medoids). Bar chart over clusters C0 to C4; legend: Economic Policies, Wages, Energy, Gov, Housing, Other.

slide-43
SLIDE 43

Topics in Clusters

ID    Frequent Sub-patterns    Top Topics
C0    “NBNBTNTN”, “NTNTN”      Local Events
C1    “NT”, “NTNT”             Local Events
C2    “TNT”                    Local Events, Ads, Technology
C3    “T”, “TB”                Others, Protest, Sports
C4    “TNENT”, “TEB”           Protest, Government, Entertainment

Table: Top topics for clusters

slide-44
SLIDE 44

Topic Distributions for Clusters

Figure: Topic distributions of 2 interaction pattern clusters. The X-axis labels refer to topic numbers

slide-45
SLIDE 45

Main Influencer

We define the influence weight of a story chain as the average of the difference of pre- and post- influence weights:

\[
\frac{\sum_{i} \left( W_{pre}^{i} - W_{post}^{i} \right)}{n}
\]

where the summation is over n, the number of articles in a chain.
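The definition above can be sketched as follows. Mapping the weight to a label is an assumption read off the formula: a positive value means pre-publication Twitter activity outweighs post-publication activity, and a negative value means the reverse. The tolerance `eps` for calling a chain balanced is hypothetical, as the slides do not state one.

```python
def chain_influence_weight(weights):
    """Average of (W_pre - W_post) over the articles of one story chain."""
    if not weights:
        raise ValueError("empty chain")
    return sum(pre - post for pre, post in weights) / len(weights)


def main_influencer(iw, eps=1e-3):
    """Label a chain by the sign of its influence weight (eps is an assumed tolerance)."""
    if abs(iw) < eps:
        return "Both"
    return "Twitter" if iw > 0 else "News"


# Hypothetical (W_pre, W_post) pairs for a two-article chain.
iw = chain_influence_weight([(3.0, 1.0), (2.0, 1.0)])
print(iw, main_influencer(iw))
```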

slide-46
SLIDE 46

Main Influencer (Cont.)

Table: Story Chains with Interaction Patterns and Main Influencer

IP: interaction pattern; IW: influence weight; MI: main influencer.

ID     IP           IW          MI         Story Summary
SC1    TT           0.514       Twitter    “Marco Feliciano protest at church door”
SC2    TN           0.48        Twitter    “25% Teachers are on strike.”
SC3    NNNNBNTBN    -0.422      News       “Fire in Kiss Nightclub in Santa Maria”
SC4    NBNNTN       -0.405      News       “Governor decree official mourning”
SC5    TTTNN        5.0e-05     Both       “Nadal back to Brazil”
SC6    NNTNTTNTN    -1.7e-04    Both       “Nissan sells more than 100 thousand”

slide-47
SLIDE 47

Conclusion

◮ A new framework for discovering the direction of information flow over time across news and Twitter.
◮ Uncover the interaction patterns over stories and test our proposed method on real data.
◮ Cluster on encoded story chains and discover topics.

Observation 1

Twitter, as a social network platform, serves as a fast way to draw public attention to many social events, such as sports news.

Observation 2

News media is quicker to report events regarding political, economic, and business issues.

slide-48
SLIDE 48

Thank you! Q&A