Intertemporal topic correlations in online media: A comparative study




SLIDE 1

Rationale Methodology Dataset Results

Intertemporal topic correlations in online media

A comparative study on weblogs and news websites Jean-Philippe Cointet*, Emmanuel Faure*, Camille Roth**

*CREA, CNRS/Ecole Polytechnique, Paris, France **CRESS, Department of Sociology, University of Surrey, Guildford, UK

March 28, 2007 — First ICWSM, Boulder, Col., USA

SLIDES 2-13

Context

Mimicking behaviors
Are there regularities in the way some group(s) of agents address and discuss issues after some other group(s) of agents did?

[Figure, animated over slides 3 to 13: press and blogs at time steps t = 1 to 11]

SLIDES 14-19

Context

Intertemporal correlations
We are interested in generalized patterns of intertemporal topic correlations between various information sources.
Weaker hypothesis: bloggers are part of a larger system of which they are an “easily” observable sample.

Global, macro-level viewpoint
Realism of studying information diffusion within blog networks (systems) questionable in some instances...
[Figure, built over slides 16 to 18: first media links, blogs links and personal links; then personal links and blogs links; then blogs links only]
...but we may always focus on dynamic patterns by creating a map of systematic topic correlations.

SLIDES 20-30

Context

[Figure, animated over slides 20 to 30: press, blogs β, blogs α, blogs γ]

SLIDES 31-33

Causal-states models

Signal
press   1 1 1 1 1 ...
blogs   1 1 1 1 1 1 ...
signal  a a b b c c c d d d a ...

Reconstructing a state-based dynamics
[Figure: state diagram with states 1, 2, 3, 4; transitions labeled symbol|probability: c|.5, b|.5, a|.5, b|.5, c|.67, a|.33, d|.67, d|.33]

SLIDE 34

Causal-states models

More complicated signal and alphabet...

blogs α   1 1 1 1 1 1 1 1
blogs β   1 1 1 1 1 1 1 1
blogs γ   1 1 1 1 1 1 1 1
press     1 1 1 1 1 1 1 1
alphabet  a b c d e f g h A B C D E F G H
signal    ...A H H f d c a e F f b c b d e F...

[Figure: the press, blogs β, blogs α, blogs γ diagram repeated three times in sequence, between "· · · →" and "→ · · ·"]
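The alphabet on this slide has 16 letters, which matches the 2^4 possible activity patterns of four sources. The sketch below shows one way such an encoding could work: each time step's tuple of 0/1 flags is read as a binary number and used as an index into the alphabet. The letter order, the source order and the function name `encode_signal` are assumptions for illustration; the slides do not state which pattern maps to which letter.

```python
# Assumed letter order; the slides only list the alphabet, not the mapping.
ALPHABET = "abcdefghABCDEFGH"


def encode_signal(activity, alphabet=ALPHABET):
    """Turn a sequence of 0/1 tuples (one tuple per time step) into a string."""
    symbols = []
    for flags in activity:
        if len(alphabet) < 2 ** len(flags):
            raise ValueError("alphabet too small for this number of sources")
        index = 0
        for bit in flags:
            # Read the flags as a binary number, first source = highest bit.
            index = index * 2 + bit
        symbols.append(alphabet[index])
    return "".join(symbols)


# Made-up activity for four sources over four days (not data from the study).
example = [(0, 0, 0, 0), (0, 0, 0, 1), (1, 0, 0, 0), (1, 1, 1, 1)]
signal = encode_signal(example)  # "abAH" under the assumed letter order
```

With two sources, the same function and a four-letter alphabet such as "abcd" would give the simpler kind of signal shown earlier in the deck.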

SLIDES 35-36

Causal-states models

Causal-state machine
(Crutchfield & Young, 1989; Shalizi, 2001)

automatically inferring (variable-length) hidden states...
...made of equivalence classes of signal histories...
...along with transition probabilities.

signal  A H H a a a h H H a a a A H H...

[Figure: inferred state machine for this signal; node labels H, A;h, a; transitions labeled symbol|probability: A|0.17, a|0.66, H|0.5, H|1, a|0.5, h|0.17]
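The idea of grouping signal histories into equivalence classes can be illustrated with a deliberately simplified sketch. It is not the reconstruction algorithm of the cited works: it assumes a fixed history length of one symbol and merges histories whose empirical next-symbol distributions agree within an arbitrary tolerance, with no statistical test. The function names and the tolerance are hypothetical.

```python
from collections import Counter, defaultdict


def next_symbol_distributions(signal, history_length=1):
    """Empirical distribution of the next symbol after each observed history."""
    counts = defaultdict(Counter)
    for i in range(history_length, len(signal)):
        history = signal[i - history_length:i]
        counts[history][signal[i]] += 1
    return {
        history: {s: n / sum(c.values()) for s, n in c.items()}
        for history, c in counts.items()
    }


def group_histories(distributions, tolerance=0.05):
    """Merge histories whose next-symbol distributions are nearly identical."""
    states = []  # list of (representative distribution, member histories)
    for history, dist in sorted(distributions.items()):
        for rep, members in states:
            symbols = set(rep) | set(dist)
            if all(abs(rep.get(s, 0.0) - dist.get(s, 0.0)) <= tolerance
                   for s in symbols):
                members.append(history)
                break
        else:
            # No existing state predicts the same future: open a new one.
            states.append((dist, [history]))
    return [members for _, members in states]


# The signal excerpt printed on the slide.
signal = "AHHaaahHHaaaAHH"
dists = next_symbol_distributions(signal)
states = group_histories(dists)  # [['A', 'h'], ['H'], ['a']]
```

On this excerpt the sketch puts A and h in the same class, since both are always followed by H, which appears consistent with the "A;h" label and the H|1 transition in the slide's figure. The probabilities estimated from such a short excerpt need not match the figure exactly.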

SLIDE 37

Data

Hand-made selection

Sample of 33 very active political blogs, 6 press sources.
Daily collection of posts during November 2006: presidential primary for the French Parti Socialiste (center-left).
Selection of 75 (lemmatized) terms; this set makes up our “topics”.

SLIDE 38

Data

Practical matters

Creation of blog groups

Classical Salton (1975) categorization
Three groups: α, β, γ, plus the press
Roughly left-, right-, indep.-leaning (α, β, γ)
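The slide does not detail how the Salton-style categorization was carried out. As a rough illustration only, the sketch below assumes each blog is represented by a vector of term counts and compares blogs by cosine similarity, the usual measure in the vector space model. The blog names and counts are made up, and how similar blogs would then be assigned to α, β or γ is left open.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two term-count vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for an empty vector
    return dot / (norm_u * norm_v)


# Hypothetical counts of three terms in three blogs.
blog_vectors = {
    "blog_1": [4, 0, 1],
    "blog_2": [8, 0, 2],  # same term profile as blog_1, twice the volume
    "blog_3": [0, 5, 0],  # no term in common with blog_1
}

similar = cosine_similarity(blog_vectors["blog_1"], blog_vectors["blog_2"])
unrelated = cosine_similarity(blog_vectors["blog_1"], blog_vectors["blog_3"])
```

Here `similar` is close to 1 and `unrelated` is 0, so blogs with the same relative term usage end up near each other regardless of how much they post.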

SLIDE 39

Data

Practical matters

Signal creation

For each term: evolution of occurrences in each blog group, transformed into a signal vector. (...A B d c...)
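A minimal sketch of the first half of this step, under stated assumptions: for one term, the daily occurrence counts of each source are turned into 0/1 activity flags, one tuple per day. The rule used here (a source is active on a day when its count exceeds a threshold) is an assumption; the slides only say that the evolution of occurrences is transformed into a signal. The group names, counts and function name are hypothetical.

```python
def activity_flags(daily_counts, threshold=0):
    """Return one tuple of 0/1 flags per day, in the order the sources are given.

    daily_counts maps each source to its list of daily occurrence counts
    for a single term.
    """
    series = list(daily_counts.values())
    if not series:
        return []
    n_days = len(series[0])
    if any(len(s) != n_days for s in series):
        raise ValueError("all sources need the same number of days")
    return [
        tuple(1 if s[day] > threshold else 0 for s in series)
        for day in range(n_days)
    ]


# Made-up counts of one term over three days (not data from the study).
counts = {
    "blogs_alpha": [0, 2, 3],
    "blogs_beta": [1, 0, 0],
    "blogs_gamma": [0, 0, 4],
    "press": [5, 0, 1],
}
flags = activity_flags(counts)  # [(0, 1, 0, 1), (1, 0, 0, 0), (1, 0, 1, 1)]
```

Each tuple describes which sources mentioned the term on a given day. The final mapping of each tuple to a letter of the signal alphabet is not shown here.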

SLIDE 40

Causal-state machine

S0: {a; G}
S1: {b; c; d; f; g; A; C; E; b}
S2: {B; D; F; H}
S3: {h}
S4: {e}

SLIDE 41

Thanks!

e-mails
c.roth@surrey.ac.uk
cointet@shs.polytechnique.fr
faure@shs.polytechnique.fr