The Structural Topic Model and Applied Social Science
Molly Roberts, Brandon Stewart, Dustin Tingley, Edoardo Airoldi
Harvard University, Departments of Government and Statistics
December 10, 2013
Roberts Et. Al (Harvard) STM 12/10/2013 1 / 20
[Figure: Press Attention − Speech Attention. x-axis: Difference, from −0.04 to 0.04.
Word labels, first group: fund 000 depart million busi water program guard violenc farm disast secur veteran energi land school climat drug histor health beef student nuclear drug trade secur cell honor children social public tax
Word labels, second group: border vote war iraq bankruptci brac immigr judg budget academi serv]
[Figure: 100 Topics Occurring in "Normal" Fatwas (Jihad Score < 0).
x-axis: Difference in Topic Frequencies, from −0.04 to 0.04 (Non-Jihadi Clerics <--- topic used more by ---> Jihadi Clerics).
Legend: Favorite Jihadi Topics; Favorite Non-Jihadi Topics; Evenly Split Topics; Bootstrapped 95% Confidence Interval.
Topic labels: The Prophet Ibn Taymiyya Ablutions Money Prayer Permissibility Heaven and Hell Hajj Duty Hadeeth Dating Zakat Surahs and Verses Hadeeth Ramadan Fasting Hadeeth Divorce, Marriage, Sex Fatwa Greeting Formula Sin Sheikh Uthaymeen God's Oneness Quran Knowledge Apostasy Quran Ulama Heaven and Earth Knowledge]
[Figure: Topic 1 and Party ID. Mean Topic Proportions (axis ticks 0.2 to 1.0) by party ID (Strong Democrat, Moderate, Strong Republican); groups: Treated, Control.]
◮ events between countries (O’Connor et al 2013)
◮ “constitutional moments” (Stewart and Young 2013)
◮ media control in China (Stewart and Roberts 2014)
◮ Topical Prevalence: Logistic Normal GLM
◮ Topical Content: Multinomial Logit on Words
Graphic from David Blei’s website: http://www.cs.princeton.edu/~blei/modeling-science.pdf
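Plates in the graphical model: N, K, D.

Language Model:
\[
\theta_d \sim \mathrm{LogisticNormal}(\mu_d, \Sigma), \qquad
z_{d,n} \sim \mathrm{Mult}(\theta_d), \qquad
w_{d,n} \sim \mathrm{Mult}\big(\beta_d^{k=z_{d,n}}\big)
\]

Topic Prevalence:
\[
\mu_{d,k} = X_d \gamma_k, \qquad
\gamma_k \sim N(0, \sigma_k^2), \qquad
\sigma_k^2 \sim \mathrm{Gamma}(s^{\gamma}, r^{\gamma})
\]

Topical Content:
\[
\beta_{d,v}^{k} \propto \exp\big(m_v + \kappa_v^{\cdot,k} + \kappa_v^{y,\cdot} + \kappa_v^{y,k}\big), \qquad
\kappa_v^{y,k} \sim \mathrm{Laplace}(0, \tau_v^{y,k}), \qquad
\tau_v^{y,k} \sim \mathrm{Gamma}(s^{\kappa}, r^{\kappa})
\]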
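To make the language model and prevalence equations concrete, here is a minimal simulation sketch of drawing one document. It is an illustration under stated assumptions, not the authors' software: Σ is taken to be diagonal, the last topic's score is fixed at 0 for identification, and the function name `draw_document` and all numeric values are hypothetical.

```python
import math
import random

def softmax(values):
    """Map real-valued scores to a probability vector."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def draw_document(x_d, gamma, sigma, beta_d, n_words, rng):
    """Simulate one document (hypothetical helper, not a library function).

    x_d    : covariate vector for document d (length P)
    gamma  : P x (K-1) prevalence coefficients
    sigma  : standard deviations of a diagonal Sigma (length K-1); an assumption
    beta_d : K x V word distributions for this document
    """
    k_minus_1 = len(sigma)
    # mu_{d,k} = X_d gamma_k
    mu = [sum(x_d[p] * gamma[p][k] for p in range(len(x_d)))
          for k in range(k_minus_1)]
    # theta_d ~ LogisticNormal(mu_d, Sigma): Gaussian draw, then softmax.
    # The last score is fixed at 0 (an identification assumption).
    eta = [rng.gauss(mu[k], sigma[k]) for k in range(k_minus_1)] + [0.0]
    theta = softmax(eta)
    words = []
    for _ in range(n_words):
        # z_{d,n} ~ Mult(theta_d)
        z = rng.choices(range(len(theta)), weights=theta)[0]
        # w_{d,n} ~ Mult(beta_d^{k = z_{d,n}})
        w = rng.choices(range(len(beta_d[z])), weights=beta_d[z])[0]
        words.append(w)
    return theta, words

# Toy inputs (made up): intercept plus one binary covariate, K = 3, V = 3.
rng = random.Random(0)
x_d = [1.0, 1.0]
gamma = [[0.5, -0.5], [1.0, 0.0]]
sigma = [1.0, 1.0]
beta_d = [[0.7, 0.2, 0.1], [0.1, 0.7, 0.2], [0.2, 0.1, 0.7]]
theta, words = draw_document(x_d, gamma, sigma, beta_d, n_words=20, rng=rng)
```

Changing `x_d` shifts the mean of the Gaussian draw, which is how covariates move expected topic proportions in this sketch.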
◮ Prior on the mixture over topics is now document-specific
◮ η ∼ N(Xγ, Σ)
◮ Documents which have similar covariates will tend to talk about the
◮ Distribution over words is now document-specific
◮ Topics are sparse deviations from a word-specific baseline
◮ Documents which have similar covariates will tend to talk about topics
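The idea that topics are sparse deviations from a word-specific baseline can be sketched as follows: word probabilities are proportional to the exponential of a baseline log-frequency plus deviation terms, most of which are zero. The function name `topic_word_distribution`, its argument names, and the numbers are hypothetical and chosen only for illustration.

```python
import math

def topic_word_distribution(m, kappa_topic, kappa_cov, kappa_inter):
    """Return word probabilities proportional to exp(m + deviations).

    m           : baseline log-frequency per word (length V)
    kappa_topic : topic-specific deviations per word
    kappa_cov   : covariate-specific deviations per word
    kappa_inter : topic-by-covariate interaction deviations per word
    """
    scores = [m[v] + kappa_topic[v] + kappa_cov[v] + kappa_inter[v]
              for v in range(len(m))]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up baseline word frequencies over a three-word vocabulary.
baseline = [0.5, 0.3, 0.2]
m = [math.log(p) for p in baseline]
zeros = [0.0, 0.0, 0.0]

# With all deviations at zero, the distribution equals the baseline.
no_deviation = topic_word_distribution(m, zeros, zeros, zeros)

# A single non-zero (sparse) deviation raises the probability of one word.
boosted = topic_word_distribution(m, [0.0, 0.0, 1.0], zeros, zeros)
```

Because a deviation of zero leaves a word at its baseline rate, a sparsity-inducing prior on the deviations keeps each topic described by a small set of distinctive words.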
◮ Various meta-data topic models
◮ Post-estimation tools (labeling, evaluation statistics, plotting)
◮ Automated model selection
◮ Covariate uncertainty calculation
Topic 4: want, monei, more, keep, give, myself, make, gave, group, greedi, put, littl, lose, need, figur, even, gain, kept, less, left
Topic 1: believ, good, feel, felt, go, chanc, right, god, decis, life, greater, reason, base, more, profit, fact, out, get, answer, plai
[Figure: Difference in Topic Proportions (Treated-Control) for Topic 4 and Topic 1; x-axis from 0.00 to 0.15.]
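A treated-minus-control difference in topic proportions can be sketched as a simple difference in means over per-document proportions. This is only an illustration: the proportions below are made-up numbers, the helper `topic_proportion_difference` is hypothetical, and the sketch leaves out any uncertainty estimate.

```python
from statistics import mean

def topic_proportion_difference(theta, treated, topic):
    """Mean proportion of `topic` among treated documents minus controls.

    theta   : list of per-document topic proportion vectors
    treated : list of booleans, True if the document is in the treated group
    topic   : index of the topic of interest
    """
    treated_vals = [row[topic] for row, t in zip(theta, treated) if t]
    control_vals = [row[topic] for row, t in zip(theta, treated) if not t]
    return mean(treated_vals) - mean(control_vals)

# Hypothetical topic proportions for four documents and two topics.
theta = [
    [0.7, 0.3],
    [0.6, 0.4],
    [0.4, 0.6],
    [0.5, 0.5],
]
treated = [True, True, False, False]
diff = topic_proportion_difference(theta, treated, topic=0)
```

A positive value means the topic takes up a larger average share of treated documents than of control documents.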
[Figure: plotted words: fair, middl, possibl, worri, plai, choic, group, amount, team, time, littl, keep, wai, fact, chanc, figur, self, entir, think, take.]
◮ Explanation vs. prediction/exploration
◮ Background covariates on documents
◮ Need off-the-shelf tools
◮ A new topic model for incorporating covariate info
◮ New software tools (releasing in the next few weeks)
◮ Methods for model selection, labeling topics and others
[Figure: two panels, LDA and STM. Axis label in each panel: Covariate Proportion in Topic of Interest; both axes run from 0.0 to 1.0.]