SLIDE 1

The Structural Topic Model and Applied Social Science

Molly Roberts, Brandon Stewart, Dustin Tingley, Edoardo Airoldi

Harvard University, Departments of Government and Statistics

December 10, 2013

Roberts Et. Al (Harvard) STM 12/10/2013 1 / 20

SLIDE 4

Related Work

Roberts ME, Stewart BM, Airoldi EM. A Topic Model for Experimentation in the Social Sciences.

Roberts ME, Stewart BM, Tingley D, Lucas C, Leder-Luis J, Gadarian S, Albertson B, Rand D. Structural topic models for open-ended survey responses. Forthcoming at American Journal of Political Science.

SLIDE 5

How Do Senators Relate to Constituents?

Figure: Press Attention − Speech Attention by topic. Horizontal axis: Difference, from −0.04 to 0.04. Topics are labeled by stemmed keywords, including fund, water, veteran, energi, school, health, trade, tax, border, vote, war, iraq, immigr, judg and budget.

Grimmer (2010, 2013)

SLIDE 6

Why do some Muslim clerics support violent Jihad?

Figure: 100 Topics Occurring in "Normal" Fatwas (Jihad Score < 0). Horizontal axis: Difference in Topic Frequencies, from −0.04 to 0.04 (Non-Jihadi Clerics ← topic used more by → Jihadi Clerics), with bootstrapped 95% confidence intervals. Topics are grouped as Favorite Jihadi Topics, Favorite Non-Jihadi Topics and Evenly Split Topics. Topic labels in the plot include Shariah, The Prophet, Ibn Taymiyya, Ablutions, Money, Prayer, Permissibility, Heaven and Hell, Hajj, Duty, Hadeeth, Dating, Zakat, Surahs and Verses, Ramadan, Fasting, Divorce, Marriage, Sex, Fatwa Greeting Formula, Sin, Sheikh Uthaymeen, God's Oneness, Quran, Knowledge, Apostasy, Ulama, and Heaven and Earth.

Nielsen (2013)

SLIDE 7

How do we analyze open-ended survey responses?

Figure: Topic 1 and Party ID. Mean topic proportions (scale 0.0 to 1.0) by party identification (Strong Democrat, Moderate, Strong Republican), shown separately for Treated and Control respondents.

SLIDE 12

Social Sciences Applications

These problems share a common structure:

Topic models as a tool of measurement

◮ events between countries (O’Connor et al 2013)
◮ “constitutional moments” (Stewart and Young 2013)
◮ media control in China (Stewart and Roberts 2014)

Extensive “metadata” in documents

Topical Prevalence and Topical Content

The primary quantity of interest (QOI) is how an external variable drives topics.

SLIDE 16

In Practice

“Vanilla” LDA with post-hoc comparison

The exchangeability paradox.

Custom Models vs. Off the Shelf

SLIDE 22

Our Approach

General framework for including covariates

Two types of covariates:

◮ Topical Prevalence: Logistic Normal GLM
◮ Topical Content: Multinomial Logit on Words

Builds off: DMR (Mimno and McCallum 2008), SAGE (Eisenstein et al 2011) and the CTM (Blei and Lafferty 2007)

SLIDE 23

Latent Dirichlet Allocation

Figure: Plate Notation of Latent Dirichlet Allocation

Graphic from David Blei’s website: http://www.cs.princeton.edu/~blei/modeling-science.pdf
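Since the plate diagram itself did not survive extraction, the generative process it depicts can be sketched in code. This is a minimal illustration of standard LDA (theta ~ Dirichlet(alpha), z ~ Mult(theta), w ~ Mult(beta_z)), not code from the talk; the function names, the toy vocabulary and the parameter values are all assumptions made for the example.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]

def generate_document(alpha, beta, vocab, n_words, rng):
    """Generate one document under LDA.

    alpha: Dirichlet prior over topics (length K)
    beta:  K word distributions, each aligned with vocab
    """
    theta = sample_dirichlet(alpha, rng)      # document-level topic mixture
    topics = list(range(len(alpha)))
    words = []
    for _ in range(n_words):
        z = rng.choices(topics, weights=theta)[0]      # topic for this token
        words.append(rng.choices(vocab, weights=beta[z])[0])  # word given topic
    return theta, words

# Toy setup (hypothetical values): two topics over four stemmed words.
rng = random.Random(0)
vocab = ["tax", "budget", "war", "iraq"]
beta = [[0.45, 0.45, 0.05, 0.05],
        [0.05, 0.05, 0.45, 0.45]]
theta, words = generate_document([0.5, 0.5], beta, vocab, 10, rng)
```

Note that the prior on theta is the same for every document here; that shared prior is what the covariate-based models later in the talk replace.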

SLIDE 24

Structural Topic Model

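Figure: Plate diagram of the Structural Topic Model, with nodes µ, θ, z, w, β, γ, κ and Σ, covariates X and Y, and plates N, K and D.

Language Model:

$\theta_d \sim \text{LogisticNormal}(\mu_d, \Sigma)$
$z_{d,n} \sim \text{Mult}(\theta_d)$
$w_{d,n} \sim \text{Mult}(\beta_d^{k=z_{d,n}})$

Topic Prevalence:

$\mu_{d,k} = X_d \gamma_k$
$\gamma_k \sim N(0, \sigma_k^2)$
$\sigma_k^2 \sim \text{Gamma}(s^{\gamma}, r^{\gamma})$

Topical Content:

$\beta_{d,v}^{k} \propto \exp(m_v + \kappa_v^{\cdot,k} + \kappa_v^{y,\cdot} + \kappa_v^{y,k})$
$\kappa_v^{y,k} \sim \text{Laplace}(0, \tau_v^{y,k})$
$\tau_v^{y,k} \sim \text{Gamma}(s^{\kappa}, r^{\kappa})$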
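The prevalence part of the model can be made concrete with a short simulation. The sketch below draws a topic mixture for one document from a logistic normal whose mean is a linear function of the document's covariates. It is an illustration under simplifying assumptions, not the authors' implementation: Σ is taken to be diagonal, the last topic is fixed at zero as a reference category, and the names `draw_theta`, `x_d`, `gamma` and `sigma_diag` are made up for the example.

```python
import math
import random

def softmax(values):
    """Map real-valued scores to a probability vector."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def draw_theta(x_d, gamma, sigma_diag, rng):
    """Draw topic proportions for one document.

    x_d:        covariate vector for the document (length P)
    gamma:      K-1 coefficient vectors, each of length P
    sigma_diag: K-1 variances (diagonal covariance is an assumption)
    """
    eta = []
    for gamma_k, var_k in zip(gamma, sigma_diag):
        mu_dk = sum(x * g for x, g in zip(x_d, gamma_k))  # mu_{d,k} = X_d gamma_k
        eta.append(rng.gauss(mu_dk, math.sqrt(var_k)))
    eta.append(0.0)  # reference topic fixed at zero (assumed convention)
    return softmax(eta)

# Hypothetical two-topic example: covariates are (intercept, treatment indicator).
rng = random.Random(0)
gamma = [[0.0, 2.0]]      # treatment raises the prevalence of topic 0
sigma_diag = [0.01]
theta_control = draw_theta([1.0, 0.0], gamma, sigma_diag, rng)
theta_treated = draw_theta([1.0, 1.0], gamma, sigma_diag, rng)
```

With these toy values the treated document puts more weight on topic 0 than the control document, which is the sense in which covariates shift the prior on the topic mixture.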

SLIDE 33

A Tale of Two Covariates

Prevalence

◮ Prior on the mixture over topics is now document-specific
◮ $\eta \sim N(X\gamma, \Sigma)$
◮ Documents which have similar covariates will tend to talk about the same topics.

Content

◮ Distribution over words is now document-specific
◮ Topics are sparse deviations from a word-specific baseline

$\beta_{k,g} \propto \exp(m + \kappa^{(k)} + \kappa^{(g)} + \kappa^{(k,g)})$

◮ Documents which have similar covariates will tend to talk about topics in the same way.

Regularizing priors to avoid false positives
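The content formula can be read as a softmax over the vocabulary: start from baseline log frequencies m and add topic, group and topic-by-group deviations. The sketch below computes one such word distribution. The function name and the toy numbers are assumptions for illustration; in the model the κ terms would be estimated under sparsity-inducing priors rather than set by hand.

```python
import math

def word_distribution(m, kappa_topic, kappa_group, kappa_interaction):
    """Compute beta_{k,g} over the vocabulary for one topic k and group g.

    Each argument is a list aligned with the vocabulary:
    m is the baseline log frequency, the kappas are additive deviations.
    """
    scores = [mv + kt + kg + ki
              for mv, kt, kg, ki in zip(m, kappa_topic, kappa_group, kappa_interaction)]
    peak = max(scores)                       # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical four-word vocabulary with known baseline frequencies.
baseline = [0.4, 0.3, 0.2, 0.1]
m = [math.log(p) for p in baseline]
zeros = [0.0, 0.0, 0.0, 0.0]

# With every deviation at zero, the topic reproduces the baseline.
beta_plain = word_distribution(m, zeros, zeros, zeros)

# A single nonzero (sparse) topic deviation boosts only the last word.
beta_shifted = word_distribution(m, [0.0, 0.0, 0.0, 1.5], zeros, zeros)
```

Because most deviations are zero under the sparse prior, a topic differs from the baseline only on the few words that characterize it.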

SLIDE 40

Inference and Implementation

Semi-collapsed, non-conjugate, mean-field variational EM

Propagating estimation uncertainty (method of composition)

Forthcoming R package

◮ Various meta-data topic models
◮ Post-estimation tools (labeling, evaluation statistics, plotting)
◮ Automated model selection
◮ Covariate uncertainty calculation

SLIDE 41

Applications

In This Paper:

Open-Ended Survey Response (1 of 3)

Media Coverage of China (short example from longer paper)

SLIDE 47

Open-Ended Response

Researchers opt for closed-ended responses. This requires:

◮ Choosing an arbitrary scale
◮ Choosing researcher-defined categories
◮ Sometimes putting an “other” open-ended option. A debate exists on whether this is a good idea.

There are workflow advantages to closed-ended responses.

In 10 minutes I can move from an mTurk survey, get 100 closed-ended responses to questions, put the data in R and type lm()

We want open-ended analysis to be (almost) that easy.

SLIDE 54

Survey Experiment

Rand et al., Nature, “Spontaneous giving and calculated greed.”

Gut responses are cooperative; calculated responses lead to defection in prisoner’s dilemma

Subjects were told to

1. Write about when they have acted out of intuition, or feeling
2. Write about a time when they reflected and thought a lot about something.

Afterward, subjects were asked to describe their reasoning.

SLIDE 55

Intuition Priming Effects

Topic 4: want, monei, more, keep, give, myself, make, gave, group, greedi, put, littl, lose, need, figur, even, gain, kept, less, left

Topic 1: believ, good, feel, felt, go, chanc, right, god, decis, life, greater, reason, base, more, profit, fact, out, get, answer, plai

Figure: Difference in Topic Proportions (Treated-Control) for Topic 4 and Topic 1.

SLIDE 56

Different Intuitive Strategy: Women vs. Men

Figure: Stemmed topic words arranged between Men and Women. Legible words include god, life, make, fair, middl, possibl, still, best, same, decid, thing, someon, decis, seem, go, try, worri, end, believ, plai, choic, group, more, benefit, amount, team, time, much, share, hope, will, littl, do, right, feel, keep, wai, fact, peopl, know, risk, trust, chanc, figur, self, entir, think, felt, good and take.

SLIDE 64

Conclusion

Applied Social Science

◮ Explanation vs. prediction/exploration
◮ Background covariates on documents
◮ Need off-the-shelf tools

Our Contribution

◮ A new topic model for incorporating covariate info
◮ New software tools (releasing in the next few weeks)
◮ Methods for model selection, labeling topics and others

SLIDE 65

Thanks!

Papers at:

scholar.harvard.edu/~bstewart

SLIDE 66

LDA and STM

Figure: Two panels, LDA and STM, each with axis labels Covariate and Proportion in Topic of Interest; both axes run from 0.0 to 1.0.