Supervised and Relational Topic Models
David M. Blei
Department of Computer Science, Princeton University
October 5, 2009
Joint work with Jonathan Chang and Jon McAuliffe
Topic modeling
- Large electronic archives of document collections require new statistical tools for analyzing text.
- Topic models have emerged as a powerful technique for unsupervised analysis of large document collections.
- Topic models posit latent topics in text using hidden random variables, and uncover that structure with posterior inference.
- Useful for tasks like browsing, search, information retrieval, etc.
Examples of topic modeling
Five example topics from a legal corpus (top words, one topic per column in the original figure):
- contractual, expectation, gain, promises, expectations, breach, enforcing, supra, note, perform
- employment, industrial, local, jobs, employees, relations, unfair, agreement, economic, case
- female, men, women, see, sexual, note, employer, discrimination, harassment, gender
- markets, earnings, investors, sec, research, structure, managers, firm, risk, large
- criminal, discretion, justice, civil, process, federal, see, officer, parole, inmates
Examples of topic modeling
[Figure: topics discovered from a corpus of computer science abstracts, together with representative paper titles. Example topics include function words (the, of, a, is, and); algorithms and complexity (n, algorithm, time, log, bound, polynomial); logic and programming (logic, programs, systems, language); graph theory (graph, graphs, edge, minimum, vertices); distributed systems (messages, protocol, asynchronous, networks, queuing); databases (database, constraints, algebra, boolean, relational); learning theory (learning, learnable, statistical, examples, classes); and networks and routing (networks, protocol, network, packets, link). Representative titles include "An optimal algorithm for intersecting line segments in the plane", "A new approach to the maximum-flow problem", "Quantum lower bounds by polynomials", "How bad is selfish routing?", "Authoritative sources in a hyperlinked environment", and "On XML integrity constraints in the presence of DTDs".]
Examples of topic modeling
Top topic words by decade:
- 1880: electric, machine, power, engine, steam, two, machines, iron, battery, wire
- 1890: electric, power, company, steam, electrical, machine, two, system, motor, engine
- 1900: apparatus, steam, power, engine, engineering, water, construction, engineer, room, feet
- 1910: air, water, engineering, apparatus, room, laboratory, engineer, made, gas, tube
- 1920: apparatus, tube, air, pressure, water, glass, gas, made, laboratory, mercury
- 1930: tube, apparatus, glass, air, mercury, laboratory, pressure, made, gas, small
- 1940: air, tube, apparatus, glass, laboratory, rubber, pressure, small, mercury, gas
- 1950: tube, apparatus, glass, air, chamber, instrument, small, laboratory, pressure, rubber
- 1960: tube, system, temperature, air, heat, chamber, power, high, instrument, control
- 1970: air, heat, power, system, temperature, chamber, high, flow, tube, design
- 1980: high, power, design, heat, system, systems, devices, instruments, control, large
- 1990: materials, high, power, current, applications, technology, devices, design, device, heat
- 2000: devices, device, materials, current, gate, high, light, silicon, material, technology
Examples of topic modeling
[Figure: topics estimated from the Science archive. Example topics include genetics (wild type, mutant, mutations, mutants, mutation), plant science (plants, plant, gene, genes, arabidopsis), cell biology (p53, cell cycle, activity, cyclin, regulation), science policy (science, scientists, research, funding, nih), geology (volcanic, deposits, magma, eruption, volcanism; mantle, crust, meteorites; earthquake, earthquakes, fault), climate (ocean, ice, changes, climate change; co2, carbon, carbon dioxide, methane), medicine (patients, disease, treatment, drugs, clinical), evolution and ecology (genetic, population, populations, variation; fossil record, birds, fossils, dinosaurs; species, forest, forests, ecosystems), neuroscience (synapses, ltp, glutamate, synaptic, neurons; neurons, stimulus, motor, visual, cortical; brain, memory, subjects, task), astronomy (stars, astronomers, universe, galaxies, galaxy; sun, solar wind, earth, planets), physics (magnetic, magnetic field, spin, superconductivity; physicists, particles, physics, particle, experiment), chemistry (reaction, reactions, molecule, molecules, transition state; enzyme, enzymes, active site), immunology (virus, hiv, aids, infection, viruses; antigen, t cells, antigens, immune response), and computing (computer, problem, information, computers, problems).]
Supervised topic models
- These applications of topic modeling work in the same way.
- Fit a model using a likelihood criterion. Then, hope that the resulting model is useful for the task at hand.
- Supervised topic models and relational topic models fit topics explicitly to perform prediction.
- Useful for building topic models that can
  - Predict the rating of a review
  - Predict the category of an image
  - Predict the links emitted from a document
Outline
1 Unsupervised topic models
2 Supervised topic models
3 Relational topic models
Probabilistic modeling
1 Treat data as observations that arise from a generative probabilistic process that includes hidden variables.
  - For documents, the hidden variables reflect the thematic structure of the collection.
2 Infer the hidden structure using posterior inference.
  - What are the topics that describe this collection?
3 Situate new data into the estimated model.
  - How does this query or new document fit into the estimated topic structure?
Intuition behind LDA
Simple intuition: Documents exhibit multiple topics.
Generative model
[Figure: topics, documents, and topic proportions and assignments. Example topics: gene 0.04, dna 0.02, genetic 0.01, ...; life 0.02, evolve 0.01, organism 0.01, ...; brain 0.04, neuron 0.02, nerve 0.01, ...; data 0.02, number 0.02, computer 0.01, ...]
- Each document is a random mixture of corpus-wide topics
- Each word is drawn from one of those topics
The posterior distribution
[Figure: the same topics, topic proportions, and assignments, now hidden; only the documents are observed.]
- In reality, we only observe the documents
- Our goal is to infer the underlying topic structure
Latent Dirichlet allocation
[Graphical model: α → θd → Zd,n → Wd,n ← βk ← η, with plates over words (N), documents (D), and topics (K).]
- α: Dirichlet parameter
- θd: per-document topic proportions
- Zd,n: per-word topic assignment
- Wd,n: observed word
- βk: topics
- η: topic hyperparameter
Each piece of the structure is a random variable.
Latent Dirichlet allocation
[Graphical model for LDA, as above.]
βk ∼ Dir(η), k = 1, . . . , K
θd ∼ Dir(α), d = 1, . . . , D
Zd,n | θd ∼ Mult(1, θd), d = 1, . . . , D; n = 1, . . . , N
Wd,n | θd, zd,n, β1:K ∼ Mult(1, βzd,n), d = 1, . . . , D; n = 1, . . . , N
Latent Dirichlet allocation
[Graphical model for LDA, as above.]
1 Draw each topic βk ∼ Dir(η), for k ∈ {1, . . . , K}.
2 For each document:
  1 Draw topic proportions θd ∼ Dir(α).
  2 For each word:
    1 Draw Zd,n ∼ Mult(θd).
    2 Draw Wd,n ∼ Mult(βzd,n).
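To make the generative process concrete, here is a minimal sketch in Python/NumPy. All sizes and hyperparameters are illustrative values chosen for the example, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and hyperparameters (assumptions, not from the talk)
K, V, D, N = 5, 1000, 100, 50   # topics, vocabulary size, documents, words per doc
eta, alpha = 0.1, 0.5           # symmetric Dirichlet hyperparameters

# 1. Draw each topic beta_k ~ Dir(eta): a distribution over the vocabulary.
beta = rng.dirichlet(np.full(V, eta), size=K)   # K x V matrix

docs = []
for d in range(D):
    # 2.1 Draw topic proportions theta_d ~ Dir(alpha).
    theta = rng.dirichlet(np.full(K, alpha))
    # 2.2 For each word, draw Z_{d,n} ~ Mult(theta_d), then W_{d,n} ~ Mult(beta_{z_{d,n}}).
    z = rng.choice(K, size=N, p=theta)
    w = np.array([rng.choice(V, p=beta[zn]) for zn in z])
    docs.append(w)
```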
Latent Dirichlet allocation
[Graphical model for LDA, as above.]
- From a collection of documents, infer
  - Per-word topic assignment zd,n
  - Per-document topic proportions θd
  - Per-corpus topic distributions βk
- Use posterior expectations to perform the task at hand, e.g., information retrieval, document similarity, etc.
Latent Dirichlet allocation
[Graphical model for LDA, as above.]
- Computing the posterior is intractable:

  p(θ, z1:N | w1:N, α, β1:K) = [ p(θ | α) ∏_{n=1}^N p(zn | θ) p(wn | zn, β1:K) ] / [ ∫θ p(θ | α) ∏_{n=1}^N Σ_{z=1}^K p(zn | θ) p(wn | zn, β1:K) dθ ]
- Several approximation techniques have been developed.
Latent Dirichlet allocation
[Graphical model for LDA, as above.]
- Mean field variational methods (Blei et al., 2001, 2003)
- Expectation propagation (Minka and Lafferty, 2002)
- Collapsed Gibbs sampling (Griffiths and Steyvers, 2002)
- Collapsed variational inference (Teh et al., 2006)
Example inference
[Figure: inferred topic proportions for an example document; x-axis: topics, y-axis: probability (0.0–0.4).]
Example topics
Four example topics (top words, one topic per column in the original figure):
- human, genome, dna, genetic, genes, sequence, gene, molecular, sequencing, map, information, genetics, mapping, project, sequences
- evolution, evolutionary, species, organisms, life, origin, biology, groups, phylogenetic, living, diversity, group, new, two, common
- disease, host, bacteria, diseases, resistance, bacterial, new, strains, control, infectious, malaria, parasite, parasites, united, tuberculosis
- computer, models, information, data, computers, system, network, systems, model, parallel, methods, networks, software, new, simulations
Used in exploratory tools for document collections.
LDA summary
- LDA is a powerful model for
  - Visualizing the hidden thematic structure in large corpora
  - Generalizing new data to fit into that structure
- LDA is a mixed membership model (Erosheva, 2004) that builds on the work of Deerwester et al. (1990) and Hofmann (1999).
- For document collections and other grouped data, this might be more appropriate than a simple finite mixture.
- The same model was independently invented for population genetics analysis (Pritchard et al., 2000).
LDA summary
- Modular: It can be embedded in more complicated models.
- General: The data generating distribution can be changed.
- Variational inference is fast; it lets us analyze large data sets.
- See Blei et al. (2003) for details and a quantitative comparison, and my website for code and other papers.
- Jonathan Chang’s excellent R package “lda” contains Gibbs sampling code for this model and many others.
Supervised topic models
- But LDA is an unsupervised model. How can we build a topic model that is good at the task we care about?
- Many data are paired with response variables:
  - User reviews paired with a number of stars
  - Web pages paired with a number of “diggs”
  - Documents paired with links to other documents
  - Images paired with a category
- Supervised topic models are topic models of documents and responses, fit to find topics predictive of the response.
Supervised LDA
[Graphical model for sLDA: LDA as above, plus an observed per-document response Yd with parameters η, σ².]
1 Draw topic proportions θ | α ∼ Dir(α).
2 For each word:
  - Draw topic assignment zn | θ ∼ Mult(θ).
  - Draw word wn | zn, β1:K ∼ Mult(βzn).
3 Draw response variable y | z1:N, η, σ² ∼ N(η⊤z̄, σ²), where z̄ = (1/N) Σ_{n=1}^N zn.
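Relative to the LDA sketch earlier, the only new step is the final response draw. A minimal illustration (the coefficient vector and variance are made-up values):

```python
import numpy as np

rng = np.random.default_rng(0)

K, N, alpha = 5, 50, 0.5        # illustrative values
eta_coef = rng.normal(size=K)   # regression coefficients eta (illustrative)
sigma2 = 0.25                   # response variance sigma^2 (illustrative)

theta = rng.dirichlet(np.full(K, alpha))
z = rng.choice(K, size=N, p=theta)

# z_bar = (1/N) sum_n z_n, where each z_n is a one-hot topic indicator.
z_bar = np.bincount(z, minlength=K) / N

# 3. Draw the response: y | z_{1:N}, eta, sigma^2 ~ N(eta^T z_bar, sigma^2).
y = rng.normal(eta_coef @ z_bar, np.sqrt(sigma2))
```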
Supervised LDA
[Graphical model for sLDA, as above.]
- The response variable y is drawn after the document because it depends on z1:N, an assumption of partial exchangeability.
- Consequently, y is necessarily conditioned on the words.
- In a sense, this blends generative and discriminative modeling.
Supervised LDA
[Graphical model for sLDA, as above.]
- Given a set of document-response pairs, fit the model parameters by maximum likelihood.
- Given a new document, compute a prediction of its response.
- Both of these activities hinge on variational inference.
Variational inference (in general)
- Variational methods are a deterministic alternative to MCMC.
- Let x1:N be observations and z1:M be latent variables.
- Our goal is to compute the posterior distribution

  p(z1:M | x1:N) = p(z1:M, x1:N) / ∫ p(z1:M, x1:N) dz1:M

- For many interesting distributions, the marginal likelihood of the observations in the denominator is difficult to compute efficiently.
Variational inference
- Use Jensen’s inequality to bound the log probability of the observations:

  log p(x1:N) = log ∫ p(z1:M, x1:N) dz1:M
              = log ∫ [p(z1:M, x1:N) / qν(z1:M)] qν(z1:M) dz1:M
              ≥ Eqν[log p(z1:M, x1:N)] − Eqν[log qν(z1:M)]

- We have introduced a distribution over the latent variables with free variational parameters ν.
- We optimize those parameters to tighten this bound.
- This is the same as finding the member of the family qν that is closest in KL divergence to p(z1:M | x1:N).
Mean-field variational inference
- The factorization of qν determines the complexity of the optimization.
- In mean field variational inference, qν is fully factored:

  qν(z1:M) = ∏_{m=1}^M qνm(zm)

- The latent variables are independent under q.
- Each is governed by its own variational parameter νm.
- In the true posterior they can exhibit dependence (often, this is what makes exact inference difficult).
MFVI and conditional exponential families
- Suppose the distribution of each latent variable, conditional on all other variables, is in the exponential family:

  p(zm | z−m, x) = hm(zm) exp{gm(z−m, x)⊤zm − am(gm(z−m, x))}

- Assume qν is fully factorized, and each factor is in the same exponential family as the corresponding conditional:

  qνm(zm) = hm(zm) exp{νm⊤zm − am(νm)}
MFVI and conditional exponential families
- Variational inference is the following coordinate ascent algorithm (see the sketch below):

  νm = Eqν[gm(Z−m, x)]
- Notice the relationship to Gibbs sampling.
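For LDA, this coordinate ascent reduces to the updates of Blei et al. (2003): each φn is updated from the current γ, and γ collects the φn. A minimal per-document E-step sketch, assuming the topics beta and hyperparameter alpha are given (w is the document's vector of word indices):

```python
import numpy as np
from scipy.special import digamma

def lda_e_step(w, beta, alpha, iters=50):
    """Mean-field coordinate ascent for one document:
    q(theta) = Dir(gamma), q(z_n) = Mult(phi_n)."""
    K = beta.shape[0]
    N = len(w)
    gamma = np.full(K, alpha + N / K)   # initialize q(theta)
    phi = np.full((N, K), 1.0 / K)      # initialize each q(z_n)
    for _ in range(iters):
        # phi_{n,k} is proportional to beta_{k, w_n} * exp(digamma(gamma_k)).
        log_phi = np.log(beta[:, w].T) + digamma(gamma)
        phi = np.exp(log_phi - log_phi.max(axis=1, keepdims=True))
        phi /= phi.sum(axis=1, keepdims=True)
        # gamma = alpha + sum_n phi_n.
        gamma = alpha + phi.sum(axis=0)
    return gamma, phi
```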
Variational inference
- Alternative to MCMC; replace sampling with optimization.
- Deterministic approximation to posterior distribution.
- Uses established optimization methods (block coordinate ascent; Newton-Raphson; interior-point).
- Faster, more scalable than MCMC for large problems.
- Biased, whereas MCMC is not.
- Emerging as a useful framework for fully Bayesian and empirical Bayesian inference problems. Many open issues!
- Good papers: Beal’s Ph.D. thesis, Wainwright and Jordan (2009)
Variational inference in sLDA
[Graphical model for sLDA, as above.]
- In sLDA the variational bound is

  E[log p(θ | α)] + Σ_{n=1}^N E[log p(Zn | θ)] + Σ_{n=1}^N E[log p(wn | Zn, β1:K)] + E[log p(y | Z1:N, η, σ²)] + H(q)

- As in Blei, Ng, and Jordan (2003), we use the fully factorized variational distribution

  q(θ, z1:N | γ, φ1:N) = q(θ | γ) ∏_{n=1}^N q(zn | φn)
Variational inference in sLDA
- The distinguishing term is

  E[log p(y | Z1:N, η, σ²)] = −(1/2) log(2πσ²) − (y² − 2y η⊤E[Z̄] + η⊤E[Z̄Z̄⊤]η) / (2σ²)

- The first expectation is

  E[Z̄] = φ̄ := (1/N) Σ_{n=1}^N φn

- The second expectation is (see the sketch below)

  E[Z̄Z̄⊤] = (1/N²) ( Σ_{n=1}^N Σ_{m≠n} φn φm⊤ + Σ_{n=1}^N diag{φn} )
- Linear in φn, which leads to an easy coordinate ascent algorithm.
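Both expectations are simple functions of the variational parameters. A sketch, where phi is the N × K matrix whose rows are the φn:

```python
import numpy as np

def zbar_moments(phi):
    """E[Zbar] and E[Zbar Zbar^T] under the factorized q(z_{1:N})."""
    N, K = phi.shape
    e_zbar = phi.mean(axis=0)   # phi_bar = (1/N) sum_n phi_n
    s = phi.sum(axis=0)
    # sum_{n != m} phi_n phi_m^T = (sum_n phi_n)(sum_m phi_m)^T - sum_n phi_n phi_n^T
    cross = np.outer(s, s) - phi.T @ phi
    # E[Zbar Zbar^T] = (1/N^2) (sum_{n != m} phi_n phi_m^T + sum_n diag{phi_n})
    e_zzT = (cross + np.diag(s)) / N**2
    return e_zbar, e_zzT
```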
Maximum likelihood estimation
- The M-step is an MLE under expected sufficient statistics.
- Define:
  - y = y1:D is the response vector.
  - A is the D × K matrix whose rows are Z̄d⊤.
- The MLE of the coefficients solves the expected normal equations (sketched below):

  E[A⊤A] η = E[A]⊤y  ⇒  η̂new ← (E[A⊤A])⁻¹ E[A]⊤y

- The MLE of the variance is

  σ̂²new ← (1/D){ y⊤y − y⊤E[A] (E[A⊤A])⁻¹ E[A]⊤y }
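A sketch of this M-step, assuming the per-document moments above have been accumulated into EA (the D × K matrix whose rows are E[Z̄d]) and EAA (the K × K matrix E[A⊤A] = Σd E[Z̄d Z̄d⊤]); these names are illustrative:

```python
import numpy as np

def slda_mstep(y, EA, EAA):
    """Solve the expected normal equations E[A^T A] eta = E[A]^T y,
    then plug the solution into the variance formula."""
    eta_hat = np.linalg.solve(EAA, EA.T @ y)
    D = len(y)
    # sigma^2 <- (1/D) { y^T y - y^T E[A] eta_hat }
    sigma2_hat = (y @ y - y @ (EA @ eta_hat)) / D
    return eta_hat, sigma2_hat
```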
Prediction
- We have fit sLDA parameters to a corpus using variational EM.
- We have a new document w1:N with unknown response value.
- First, run variational inference in the unsupervised LDA model to obtain γ and φ1:N for the new document. (LDA ⇔ integrating the unobserved Y out of sLDA.)
- Predict y using the sLDA expected value (see the sketch below):

  E[Y | w1:N, α, β1:K, η, σ²] ≈ η⊤Eq[Z̄] = η⊤φ̄
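Reusing the earlier sketches, prediction is one unsupervised E-step followed by a dot product (hypothetical usage; beta, alpha, and eta_hat are assumed already fit):

```python
# w_new: word indices of the new document
gamma, phi = lda_e_step(w_new, beta, alpha)   # unsupervised LDA inference
phi_bar = phi.mean(axis=0)                    # E_q[Zbar]
y_pred = eta_hat @ phi_bar                    # E[Y | w_{1:N}] ≈ eta^T phi_bar
```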
Example: Movie reviews
[Figure: the ten topics of the fitted model arranged along the coefficient axis (roughly −30 to +20). Topics at the positive end contain words like "both, motion, simple, perfect, fascinating, power, complex" and "cinematography, screenplay, performances, pictures, effective"; topics near zero contain generic words like "however, his, their, character, many, while, performance, between" and "more, has, than, films, director, will, characters, one, from, there, which, who, much, what"; topics at the negative end contain words like "awful, featuring, routine, dry, offered, charlie, paris", "not, about, movie, all, would, they, its, have, like, you, was, just, some, out", and "bad, guys, watchable, its, not, one, movie, least, problem, unfortunately, supposed, worse, flat, dull".]
- A 10-topic sLDA model fit to movie reviews (Pang and Lee, 2005).
- Response: the number of stars associated with each review.
- Each component of the coefficient vector η is associated with a topic.
Predictive R2
[Figure: predictive R² (0.0–0.5) vs. number of topics (5–50); sLDA is red.]
Held out likelihood
[Figure: per-word held-out log likelihood (−6.42 to −6.37) vs. number of topics (5–50); sLDA is red.]
Diverse response types with GLMs
- We want to work with response variables that don’t live in the reals:
  - binary / multiclass classification
  - count data
  - waiting time
- Model the response with a generalized linear model (a sketch follows this list):

  p(y | ζ, δ) = h(y, δ) exp{ (ζy − A(ζ)) / δ },  where ζ = η⊤z̄
- Complicates inference, but allows for flexible modeling.
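For example, a binary response replaces the Gaussian draw with a Bernoulli GLM whose natural parameter is ζ = η⊤z̄. A minimal sketch (the logistic link shown here is one standard choice, not the only one):

```python
import numpy as np

def bernoulli_response_prob(eta_coef, z_bar):
    """Bernoulli GLM response: p(y = 1) = sigmoid(eta^T z_bar).
    Count data (Poisson) and waiting times (exponential) slot in
    the same way, changing only h, A, and the dispersion delta."""
    zeta = eta_coef @ z_bar
    return 1.0 / (1.0 + np.exp(-zeta))
```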
Example: Multi-class classification
[Figure: average accuracy (0.64–0.78) vs. number of components (20–120) for image classification on the LabelMe dataset. Example scene classes and annotations: highway (car, sign, road); inside city (buildings, car, sidewalk); tall building (trees, buildings, occluded, window); street (tree, car, sidewalk).]
(SLDA for image classification, with Chong Wang)
Supervised topic models
- sLDA enables model-based regression where the predictor “variable” is a text document.
- It can easily be used wherever LDA is used in an unsupervised fashion (e.g., images, genes, music).
- sLDA is a supervised dimension-reduction technique, whereas LDA performs unsupervised dimension reduction.
- LDA + regression compared to sLDA is like principal components regression compared to partial least squares.
- Paper: Blei and McAuliffe, NIPS 2007.
Relational topic models
[Figure: a citation network of machine learning papers. Visible excerpts include "Irrelevant features and the subset selection problem", "Learning with many irrelevant features", "Evaluation and selection of biases in machine learning", "Utilizing prior concepts for learning", "Improving tactical plans with genetic algorithms", "An evolutionary approach to learning in robots", and "Using a genetic algorithm to learn strategies for collision avoidance and local navigation".]
- Many data sets contain connected observations.
- For example:
- Citation networks of documents
- Hyperlinked networks of web pages.
- Friend-connected social network profiles
Relational topic models
[Figure repeated from the previous slide: a citation network of machine learning papers.]
- Research has focused on finding communities and patterns in the link structure of these networks (Kemp et al. 2004; Hoff et al. 2002; Hofman and Wiggins 2007; Airoldi et al. 2008).
- By adapting supervised topic modeling, we can build a good model of content and structure.
- RTMs find related hidden structure in both types of data.
Relational topic models
[Graphical model for the RTM: two documents d and d′, each with LDA structure (α → θ → z → w, plates Nd and Nd′, shared topics βk), whose topic assignments zd,n and zd′,n jointly generate the link indicator yd,d′ with link parameters η.]
- Binary response variable with each pair of documents.
- Adapt the variational EM algorithm for sLDA with a binary GLM response model, with different link probability functions (see the sketch below).
- Allows predictions that are out of reach for traditional models.
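A minimal sketch of one such link probability function. The sigmoid-of-elementwise-product form below follows Chang and Blei (2009), with ν an intercept; the ψe variant that appears on later slides replaces the sigmoid with an exponential:

```python
import numpy as np

def link_prob_sigmoid(zbar_d, zbar_dprime, eta_coef, nu):
    """One RTM link probability function:
    p(y_{d,d'} = 1) = sigmoid(eta^T (zbar_d * zbar_d') + nu),
    where * is the elementwise product of the two documents'
    empirical topic frequencies."""
    score = eta_coef @ (zbar_d * zbar_dprime) + nu
    return 1.0 / (1.0 + np.exp(-score))
```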
Predictive performance of one type given the other
[Figure: link log likelihood and word log likelihood vs. number of topics (5–25) on the Cora corpus (McCallum et al., 2000).]
Note: Traditional models of networks cannot perform these tasks.
Predictive performance of one type given the other
[Figure: link log likelihood and word log likelihood vs. number of topics (5–25) on the WebKB corpus (Craven et al., 1998).]
Predictive performance of one type given the other
[Figure: link log likelihood and word log likelihood vs. number of topics (5–25) on the PNAS corpus (courtesy of JSTOR).]
Predicting links from documents
Query document: “Markov chain Monte Carlo convergence diagnostics: A comparative review”

Documents retrieved under the RTM (ψe):
- Minorization conditions and convergence rates for Markov chain Monte Carlo
- Rates of convergence of the Hastings and Metropolis algorithms
- Possible biases induced by MCMC convergence diagnostics
- Bounding convergence time of the Gibbs sampler in Bayesian image restoration
- Self regenerative Markov chain Monte Carlo
- Auxiliary variable methods for Markov chain Monte Carlo with applications
- Rate of Convergence of the Gibbs Sampler by Gaussian Approximation
- Diagnosing convergence of Markov chain Monte Carlo algorithms
- Exact Bound for the Convergence of Metropolis Chains

Documents retrieved under LDA + Regression:
- Self regenerative Markov chain Monte Carlo
- Minorization conditions and convergence rates for Markov chain Monte Carlo
- Gibbs-markov models
- Auxiliary variable methods for Markov chain Monte Carlo with applications
- Markov Chain Monte Carlo Model Determination for Hierarchical and Graphical Models
- Mediating instrumental variables
- A qualitative framework for probabilistic inference
- Adaptation for Self Regenerative MCMC
Given a new document, which documents is it likely to link to?
Predicting links from documents
Query document: “Competitive environments evolve better solutions for complex tasks”

Documents retrieved under the RTM (ψe):
- Coevolving High Level Representations
- A Survey of Evolutionary Strategies
- Genetic Algorithms in Search, Optimization and Machine Learning
- Strongly typed genetic programming in evolving cooperation strategies
- Solving combinatorial problems using evolutionary algorithms
- A promising genetic algorithm approach to job-shop scheduling. . .
- Evolutionary Module Acquisition
- An Empirical Investigation of Multi-Parent Recombination Operators. . .
- A New Algorithm for DNA Sequence Assembly

Documents retrieved under LDA + Regression:
- Identification of protein coding regions in genomic DNA
- Solving combinatorial problems using evolutionary algorithms
- A promising genetic algorithm approach to job-shop scheduling. . .
- A genetic algorithm for passive management
- The Performance of a Genetic Algorithm on a Chaotic Objective Function
- Adaptive global optimization with local search
- Mutation rates as adaptations
Given a new document, which documents is it likely to link to?
Spatially consistent topics
[Figure: five topics (Topic 1 through Topic 5), shown over the map of US states.]
- Links are the adjacency matrix of states.
- Documents are geographically tagged news articles from Yahoo! These data are not exchangeable.
- The RTM finds spatially consistent topics.
Summary
- Relational topic modeling allows us to analyze connected documents, or other data for which the mixed-membership assumptions are appropriate.
- Traditional models cannot predict with new and unlinked data.
- RTMs allow for such predictions:
  - links given the new words of a document
  - words given the links of a new document
- Paper: Chang and Blei, AISTATS 2009.
JSTOR Discipline Analysis
- Another kind of “supervised” topic model is one where the topics have external meaning (Ramage and Manning, 2009).
- JSTOR attaches each journal to a discipline.
  - E.g., JASA is in Statistics; PNAS is in General Science.
- We can measure how interdisciplinary the articles are:
  - Compute a distribution over terms for each discipline.
  - Find topic proportions for each article.
- (Work done by Sean Gerrish)
Example articles and their journals’ disciplines:
- “Tax Innovation in the States: Capitalizing on Political Opportunity”. Frances Stokes Berry; William D. Berry. American Journal of Political Science (1992), pp. 715-742. Journal disciplines: Political Science.
- “Chaos and Nonlinear Forecastability in Economics and Finance”. Blake LeBaron. Philosophical Transactions: Physical Sciences and Engineering (1994), pp. 397-404. Journal disciplines: Mathematics; Biological Sciences; General Science.
- “Reply: Theory Is Not a Social Dilemma”. Gerald Marwell; Pamela Oliver. Social Psychology Quarterly (1994), p. 373. Journal disciplines: Psychology; Sociology.
http://dbrowser.jstor.org
General Science over Time
[Figure: log proportion (log.prop) of discipline-specific language over time in PNAS (1920–2000) and Science (1880–2000). Legend: Biological Sciences; Botany & Plant Sciences; Developmental & Cell Biology; Ecology & Evolutionary Biology; General Science; Geography; History of Science & Technology; Mathematics.]
Navel Gazing
[Figure: log proportion (log.prop) of discipline-specific language over time in the Journal of the American Statistical Association (1940–2000) and The Annals of Statistics (1975–2005). Legend: Business; Economics; Finance; Mathematics; Statistics.]