Latent Topic Networks: A Versatile Probabilistic Programming Framework for Topic Models
James Foulds, Shachi Kumar, Lise Getoor
Jack Baskin School of Engineering, University of California, Santa Cruz

Probabilistic latent variable modeling
Data: complicated, noisy, high-dimensional.
Latent variable model: maps the data to low-dimensional, semantically meaningful representations.
Goal: understand, explore, predict.
– Authorship (Rosen-Zvi et al., 2004)
– Conversational influence (Nguyen et al., 2014)
– Knowledge base construction (Movshovitz-Attias and Cohen, 2015)
– Machine translation (Mimno et al., 2009)
– Political analysis (Grimmer, 2010; Gerrish and Blei, 2011, 2012)
– Recommender systems (Wang and Blei, 2011; Diao et al., 2014)
– Scientific impact (Dietz et al., 2007; Foulds and Smyth, 2013)
– Social network analysis (Chang et al., 2009)
– Word-sense disambiguation (Boyd-Graber et al., 2007)
– …
Sparse, stochastic, collapsed, distributed algorithms, …
"There's no end to speeding up LDA!" (Max Welling)
The same pipeline: data (complicated, noisy, high-dimensional), a latent variable model, low-dimensional semantically meaningful representations, and the goal of understanding, exploring, and predicting.
In the standard workflow the (algorithm, model) pair is carefully co-designed for tractability, and then we evaluate and iterate, so each change to the model can require redesigning the algorithm.
The goal here: replace the hand-crafted latent variable model with a general-purpose modeling framework, keeping the evaluate-and-iterate loop.
LDA likelihood over topic assignments Z, words W, document-topic distributions θ, and topic-word distributions Φ (sketched below).
Latent topic networks keep the LDA likelihood and add:
– Networks of dependencies between topics and distributions over topics
– Observed covariates X
– Labeled data Y
– Additional latent variables Z
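For reference, the LDA part of the model is the standard generative process; a sketch of its joint distribution in the usual notation (this is textbook LDA, nothing specific to this paper):

\[
p(\mathbf{w}, \mathbf{z}, \theta, \Phi \mid \alpha, \beta)
  \;=\; \prod_{k=1}^{K} p(\Phi_k \mid \beta)
        \prod_{d=1}^{D} \Big[ p(\theta_d \mid \alpha)
        \prod_{n=1}^{N_d} p(z_{dn} \mid \theta_d)\, p(w_{dn} \mid \Phi_{z_{dn}}) \Big].
\]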
Shachi Kumar, Master's student, UCSC
Systems for encoding domain knowledge, covariates, and correlations
(comparison dimensions: correlations / dependencies, observed covariates, additional latent variables, constraints, probabilistic programming)

Custom topic models:
– CTM (Blei and Lafferty, 2007)
– DMR (Mimno and McCallum, 2008)
– Dirichlet Forests (Andrzejewski et al., 2009)
– xLDA (Wahabzada et al., 2010)
– SAGE (Eisenstein et al., 2011)
– STM (Roberts et al., 2013)

Graphical modeling and probabilistic programming systems:
– CTRF (Zhu and Xing, 2010)
– Fold.all (Andrzejewski et al., 2011)
– Logic LDA (Mei et al., 2014)

Latent Topic Networks (this work)
Case study: modeling scientific impact (Foulds and Smyth, 2013, EMNLP)
– Latent variables for document influence and citation-edge influence
– Probabilistic dependencies along the citation graph
– Dependencies are restricted to the citation graph: when a cited document's influence and its topic value are both high, the citing document also has that topic.
Such knowledge is written as weighted logical rules: each rule consists of a rule weight (e.g. 5.0), predicates, and logical operators (an illustrative rule is shown below).
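As an illustration in the spirit of the citation-influence example above, such a rule might look like the following (the weight 5.0 is the one shown on the slide; the predicate names are hypothetical, not the paper's exact syntax):

\[
5.0 :\;\; \mathrm{Cites}(d', d) \,\wedge\, \mathrm{Influential}(d) \,\wedge\, \mathrm{HasTopic}(d, t)
       \;\Rightarrow\; \mathrm{HasTopic}(d', t).
\]

The rule weight controls how strongly violations of the rule are penalized.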
The rules define a conditional random field over continuous random variables between 0 and 1.
Its feature functions are hinge-loss functions of a linear function of those variables, and each hinge loss encodes the distance to satisfaction of one instantiated rule.
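Concretely, a conditional random field of this form (continuous variables in [0, 1], hinge-loss features) has a density that can be sketched as follows, with λ_j the weight and ℓ_j the linear function for the j-th instantiated rule:

\[
P(\mathbf{y} \mid \mathbf{x}) \;\propto\; \exp\!\Big( -\sum_j \lambda_j \, \max\{\ell_j(\mathbf{y}, \mathbf{x}),\, 0\}^{p_j} \Big), \qquad p_j \in \{1, 2\},
\]

so a rule contributes nothing when it is satisfied (ℓ_j ≤ 0) and is penalized in proportion to its distance to satisfaction otherwise.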
The model's objective couples two parts: the LDA log posterior and the weighted hinge-loss terms from the rules (combined objective sketched below).
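Schematically (a sketch of how the two parts combine; the exact arguments of each ℓ_j depend on which variables the rule touches), the log posterior being optimized is

\[
\log p(\theta, \Phi \mid \mathbf{w})
  \;=\; \underbrace{\log p_{\mathrm{LDA}}(\theta, \Phi, \mathbf{w})}_{\text{LDA log posterior}}
  \;-\; \underbrace{\sum_j \lambda_j \max\{\ell_j(\theta, \Phi),\, 0\}^{p_j}}_{\text{hinge-loss terms}}
  \;+\; \mathrm{const}.
\]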
Inference maximizes the LDA EM lower bound minus the hinge-loss terms.
This is a convex optimization, solved in parallel using consensus ADMM (a generic sketch of consensus ADMM follows).
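To make the consensus ADMM step concrete, here is a minimal sketch of the generic consensus-ADMM update pattern on a toy objective (a shared scalar with quadratic local terms). This only illustrates the algorithmic skeleton and assumes nothing about the paper's actual subproblems, where the local terms would instead be the per-rule hinge losses and EM lower-bound terms.

import numpy as np

def consensus_admm(local_targets, rho=1.0, n_iters=100):
    """Minimize sum_i 0.5*(x - c_i)^2 over a shared scalar x via consensus ADMM.

    Toy stand-in: each local term is quadratic so its update has closed form.
    In the latent topic network setting the local terms would be hinge losses
    and EM lower-bound terms (an assumption for illustration only).
    """
    c = np.asarray(local_targets, dtype=float)
    x = np.zeros_like(c)   # local copies of the consensus variable
    u = np.zeros_like(c)   # scaled dual variables
    z = 0.0                # consensus (global) variable
    for _ in range(n_iters):
        # Local updates, independent across terms (hence parallelizable):
        #   x_i = argmin_x 0.5*(x - c_i)^2 + (rho/2)*(x - z + u_i)^2
        x = (c + rho * (z - u)) / (1.0 + rho)
        # Consensus update: pull the local copies back together.
        z = float(np.mean(x + u))
        # Dual updates penalize disagreement with the consensus value.
        u = u + x - z
    return z

print(consensus_admm([1.0, 2.0, 6.0]))  # converges to the minimizer, the mean 3.0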
Case study: State of the Union addresses. A topic model over the addresses, with Republican and Democrat party-bias terms on the document-topic distributions θ, varying with time (years).
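As a hedged illustration of how such structure could be written as weighted rules (the rule forms and predicate names here are hypothetical, not taken from the paper):

\[
\lambda_1 : \mathrm{Republican}(d) \Rightarrow \mathrm{Theta}(d, k_{\mathrm{rep}}), \qquad
\lambda_2 : \mathrm{Democrat}(d) \Rightarrow \mathrm{Theta}(d, k_{\mathrm{dem}}),
\]
\[
\lambda_3 : \mathrm{Successive}(d, d') \wedge \mathrm{Theta}(d, k) \Rightarrow \mathrm{Theta}(d', k).
\]

The first two would encode party bias toward particular topics; the third would tie consecutive addresses together over time.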
Example learned topics include a Republican topic and a Democrat topic.
A topic's prevalence over time shows peaks around WW I, WW II, and Vietnam.
Held-out perplexity (lower is better):

Model                   Document completion    Fully held-out
Latent topic networks   2.33 × 10³             2.43 × 10³
LDA topic model         2.36 × 10³             2.59 × 10³
Dynamic topic model     2.43 × 10³             2.55 × 10³
Future work:
– Using our framework to answer substantive questions in social science
– New language primitives, non-parametric Bayesian models, algorithmic advances, …