SLIDE 1

Supervised and Relational Topic Models

David M. Blei

Department of Computer Science, Princeton University

October 5, 2009

Joint work with Jonathan Chang and Jon McAuliffe

SLIDE 2

Topic modeling

  • Large electronic archives of document collections require new statistical tools for analyzing text.
  • Topic models have emerged as a powerful technique for unsupervised analysis of large document collections.
  • Topic models posit latent topics in text using hidden random variables, and uncover that structure with posterior inference.
  • Useful for tasks like browsing, search, information retrieval, etc.

SLIDE 3

Examples of topic modeling

[Figure: five topics discovered from a corpus of legal documents, shown as their top ten words:]

  • contractual, expectation, gain, promises, expectations, breach, enforcing, supra, note, perform
  • employment, industrial, local, jobs, employees, relations, unfair, agreement, economic, case
  • female, men, women, see, sexual, note, employer, discrimination, harassment, gender
  • markets, earnings, investors, sec, research, structure, managers, firm, risk, large
  • criminal, discretion, justice, civil, process, federal, see, officer, parole, inmates

SLIDE 4

Examples of topic modeling

[Figure: topics discovered from a corpus of computer science abstracts, e.g. "the, of, a, is, and" (a function-word topic), "n, algorithm, time, log, bound", "graph, graphs, edge, minimum, vertices", "networks, protocol, network, packets, link", "database, constraints, algebra, boolean, relational", "learning, learnable, statistical, examples, classes", and "quantum, automata, nc, automaton, languages", together with example paper titles such as "An optimal algorithm for intersecting line segments in the plane", "A new approach to the maximum-flow problem", "Quantum lower bounds by polynomials", and "How bad is selfish routing?".]

SLIDE 5

Examples of topic modeling

[Figure: the ten most probable words in one topic, per decade:]

1880: electric, machine, power, engine, steam, two, machines, iron, battery, wire
1890: electric, power, company, steam, electrical, machine, two, system, motor, engine
1900: apparatus, steam, power, engine, engineering, water, construction, engineer, room, feet
1910: air, water, engineering, apparatus, room, laboratory, engineer, made, gas, tube
1920: apparatus, tube, air, pressure, water, glass, gas, made, laboratory, mercury
1930: tube, apparatus, glass, air, mercury, laboratory, pressure, made, gas, small
1940: air, tube, apparatus, glass, laboratory, rubber, pressure, small, mercury, gas
1950: tube, apparatus, glass, air, chamber, instrument, small, laboratory, pressure, rubber
1960: tube, system, temperature, air, heat, chamber, power, high, instrument, control
1970: air, heat, power, system, temperature, chamber, high, flow, tube, design
1980: high, power, design, heat, system, systems, devices, instruments, control, large
1990: materials, high, power, current, applications, technology, devices, design, device, heat
2000: devices, device, materials, current, gate, high, light, silicon, material, technology

SLIDE 6

Examples of topic modeling

[Figure: a grid of topics discovered from a large general-science corpus, e.g. "wild type, mutant, mutations, mutants, mutation", "neurons, stimulus, motor, visual, cortical", "stars, astronomers, universe, galaxies, galaxy", "patients, disease, treatment, drugs, clinical", "volcanic, deposits, magma, eruption, volcanism", "magnetic, magnetic field, spin, superconductivity, superconducting", "virus, hiv, aids, infection, viruses", "receptor, receptors, ligand, ligands, apoptosis", "co2, carbon, carbon dioxide, methane, water", and "brain, memory, subjects, left, task".]

SLIDE 7

Supervised topic models

  • These applications of topic modeling work in the same way.
  • Fit a model using a likelihood criterion. Then, hope that the resulting model is useful for the task at hand.
  • Supervised topic models and relational topic models fit topics explicitly to perform prediction.
  • Useful for building topic models that can
  • Predict the rating of a review
  • Predict the category of an image
  • Predict the links emitted from a document

SLIDE 8

Outline

1 Unsupervised topic models
2 Supervised topic models
3 Relational topic models

SLIDE 9

Probabilistic modeling

1 Treat data as observations that arise from a generative probabilistic process that includes hidden variables
  • For documents, the hidden variables reflect the thematic structure of the collection.
2 Infer the hidden structure using posterior inference
  • What are the topics that describe this collection?
3 Situate new data into the estimated model.
  • How does this query or new document fit into the estimated topic structure?

SLIDE 10

Intuition behind LDA

Simple intuition: Documents exhibit multiple topics.

SLIDE 11

Generative model

[Figure: cartoon of the LDA generative process; topics shown as word distributions, e.g. "gene 0.04, dna 0.02, genetic 0.01, ...", "life 0.02, evolve 0.01, organism 0.01, ...", "brain 0.04, neuron 0.02, nerve 0.01, ...", "data 0.02, number 0.02, computer 0.01, ...", alongside a document with its topic proportions and per-word topic assignments.]

  • Each document is a random mixture of corpus-wide topics
  • Each word is drawn from one of those topics

SLIDE 12

The posterior distribution

[Figure: the same cartoon, but with the topics, topic proportions, and topic assignments hidden; only the documents are observed.]

  • In reality, we only observe the documents
  • Our goal is to infer the underlying topic structure
SLIDE 13

Latent Dirichlet allocation

[Figure: graphical model for LDA, with nodes α, θd, Zd,n, Wd,n, βk, η and plates N, D, K.]

  • α: Dirichlet parameter
  • θd: per-document topic proportions
  • Zd,n: per-word topic assignment
  • Wd,n: observed word
  • βk: topics
  • η: topic hyperparameter

Each piece of the structure is a random variable.

SLIDE 14

Latent Dirichlet allocation

[Figure: LDA graphical model, as above.]

βk ∼ Dir(η),  k = 1, ..., K
θd ∼ Dir(α),  d = 1, ..., D
Zd,n | θd ∼ Mult(1, θd),  d = 1, ..., D; n = 1, ..., N
Wd,n | θd, zd,n, β1:K ∼ Mult(1, βzd,n),  d = 1, ..., D; n = 1, ..., N

SLIDE 15

Latent Dirichlet allocation

[Figure: LDA graphical model, as above.]

1 Draw each topic βk ∼ Dir(η), for k ∈ {1, ..., K}.
2 For each document:
  1 Draw topic proportions θd ∼ Dir(α).
  2 For each word:
    1 Draw Zd,n ∼ Mult(θd).
    2 Draw Wd,n ∼ Mult(βzd,n).

SLIDE 16

Latent Dirichlet allocation

[Figure: LDA graphical model, as above.]

  • From a collection of documents, infer
  • Per-word topic assignments zd,n
  • Per-document topic proportions θd
  • Per-corpus topic distributions βk
  • Use posterior expectations to perform the task at hand, e.g., information retrieval, document similarity, etc.

SLIDE 17

Latent Dirichlet allocation

[Figure: LDA graphical model, as above.]

  • Computing the posterior is intractable; it is the joint distribution divided by the marginal likelihood of the document:

p(θ, z1:N | w1:N, α, β1:K) =
    p(θ | α) ∏n=1..N p(zn | θ) p(wn | zn, β1:K)
    / ∫ p(θ | α) ∏n=1..N Σzn=1..K p(zn | θ) p(wn | zn, β1:K) dθ

  • Several approximation techniques have been developed.
SLIDE 18

Latent Dirichlet allocation

[Figure: LDA graphical model, as above.]

  • Mean field variational methods (Blei et al., 2001, 2003)
  • Expectation propagation (Minka and Lafferty, 2002)
  • Collapsed Gibbs sampling (Griffiths and Steyvers, 2002)
  • Collapsed variational inference (Teh et al., 2006)
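Of these, collapsed Gibbs sampling is the simplest to write down. Below is a compact sketch under assumed symmetric hyperparameters; `docs` is a list of word-index arrays, as in the generative sketch above. This is illustrative, not a tuned implementation.

```python
import numpy as np

def gibbs_lda(docs, K, V, alpha=0.5, eta=0.1, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    D = len(docs)
    n_dk = np.zeros((D, K))                  # topic counts per document
    n_kv = np.zeros((K, V))                  # word counts per topic
    n_k = np.zeros(K)                        # total word count per topic
    z = [rng.integers(K, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):           # initialize counts from random z
        for n, w in enumerate(doc):
            k = z[d][n]
            n_dk[d, k] += 1; n_kv[k, w] += 1; n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k = z[d][n]                  # remove the current assignment
                n_dk[d, k] -= 1; n_kv[k, w] -= 1; n_k[k] -= 1
                # collapsed full conditional p(z_{d,n} = k | z_-(d,n), w)
                p = (n_dk[d] + alpha) * (n_kv[:, w] + eta) / (n_k + V * eta)
                k = rng.choice(K, p=p / p.sum())
                z[d][n] = k                  # record the new assignment
                n_dk[d, k] += 1; n_kv[k, w] += 1; n_k[k] += 1
    return z, n_dk, n_kv
```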
SLIDE 19

Example inference

[Figure: bar plot of inferred topic proportions for an example document; x-axis: topics (1 to ~100), y-axis: probability (0.0 to 0.4).]

SLIDE 20

Example topics

Four topics, shown as their fifteen most probable words:

  • human, genome, dna, genetic, genes, sequence, gene, molecular, sequencing, map, information, genetics, mapping, project, sequences
  • evolution, evolutionary, species, organisms, life, origin, biology, groups, phylogenetic, living, diversity, group, new, two, common
  • disease, host, bacteria, diseases, resistance, bacterial, new, strains, control, infectious, malaria, parasite, parasites, united, tuberculosis
  • computer, models, information, data, computers, system, network, systems, model, parallel, methods, networks, software, new, simulations

SLIDE 21

Used in exploratory tools of document collections

SLIDE 22

LDA summary

  • LDA is a powerful model for
  • Visualizing the hidden thematic structure in large corpora
  • Generalizing new data to fit into that structure
  • LDA is a mixed membership model (Erosheva, 2004) that builds on the work of Deerwester et al. (1990) and Hofmann (1999).
  • For document collections and other grouped data, this might be more appropriate than a simple finite mixture.
  • The same model was independently invented for population genetics analysis (Pritchard et al., 2000).

SLIDE 23

LDA summary

  • Modular: It can be embedded in more complicated models.
  • General: The data generating distribution can be changed.
  • Variational inference is fast; it lets us analyze large data sets.
  • See Blei et al., 2003 for details and a quantitative comparison. See my web-site for code and other papers.
  • Jonathan Chang’s excellent R package “lda” contains Gibbs sampling code for this model and many others.

SLIDE 24

Supervised topic models

  • But LDA is an unsupervised model. How can we build a topic model that is good at the task we care about?
  • Many data are paired with response variables.
  • User reviews paired with a number of stars
  • Web pages paired with a number of “diggs”
  • Documents paired with links to other documents
  • Images paired with a category
  • Supervised topic models are topic models of documents and responses, fit to find topics predictive of the response.

SLIDE 25

Supervised LDA

[Figure: sLDA graphical model; LDA with an added per-document response node Yd governed by parameters η and σ².]

1 Draw topic proportions θ | α ∼ Dir(α).
2 For each word:
  • Draw topic assignment zn | θ ∼ Mult(θ).
  • Draw word wn | zn, β1:K ∼ Mult(βzn).
3 Draw response variable y | z1:N, η, σ² ∼ N(η⊤z̄, σ²), where z̄ = (1/N) Σn=1..N zn.
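Continuing the toy numpy sketch from the LDA slides, the per-document generative step gains one line for the response; the regression parameters here are assumed values for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N, V = 4, 50, 100
alpha, eta_dir = 0.5, 0.1
beta = rng.dirichlet(np.full(V, eta_dir), size=K)
eta = rng.normal(size=K)            # regression coefficients eta (assumed values)
sigma2 = 0.25                       # response variance sigma^2 (assumed value)

theta = rng.dirichlet(np.full(K, alpha))             # theta ~ Dir(alpha)
z = rng.choice(K, size=N, p=theta)                   # z_n ~ Mult(theta)
w = np.array([rng.choice(V, p=beta[k]) for k in z])  # w_n ~ Mult(beta_{z_n})
zbar = np.bincount(z, minlength=K) / N               # empirical topic frequencies zbar
y = rng.normal(eta @ zbar, np.sqrt(sigma2))          # y ~ N(eta^T zbar, sigma^2)
```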

SLIDE 26

Supervised LDA

[Figure: sLDA graphical model, as above.]

  • The response variable y is drawn after the document because it depends on z1:N, an assumption of partial exchangeability.
  • Consequently, y is necessarily conditioned on the words.
  • In a sense, this blends generative and discriminative modeling.
SLIDE 27

Supervised LDA

[Figure: sLDA graphical model, as above.]

  • Given a set of document-response pairs, fit the model parameters by maximum likelihood.
  • Given a new document, compute a prediction of its response.
  • Both of these activities hinge on variational inference.
SLIDE 28

Variational inference (in general)

  • Variational methods are a deterministic alternative to MCMC.
  • Let x1:N be observations and z1:M be latent variables.
  • Our goal is to compute the posterior distribution

p(z1:M | x1:N) = p(z1:M, x1:N) / ∫ p(z1:M, x1:N) dz1:M

  • For many interesting distributions, the marginal likelihood of the observations (the denominator) is difficult to compute efficiently.
SLIDE 29

Variational inference

  • Use Jensen’s inequality to bound the log probability of the observations:

log p(x1:N) = log ∫ p(z1:M, x1:N) dz1:M
            = log ∫ [ p(z1:M, x1:N) / qν(z1:M) ] qν(z1:M) dz1:M
            ≥ Eqν[log p(z1:M, x1:N)] − Eqν[log qν(z1:M)]

  • We have introduced a distribution over the latent variables with free variational parameters ν.
  • We optimize those parameters to tighten this bound.
  • This is the same as finding the member of the family qν that is closest in KL divergence to p(z1:M | x1:N).
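The equivalence of the last two points follows from an identity worth making explicit: the log marginal likelihood equals the bound plus the KL divergence from qν to the posterior, so with the left-hand side fixed in ν, raising the bound lowers the KL.

```latex
\log p(x_{1:N}) =
  \underbrace{\mathbb{E}_{q_\nu}[\log p(z_{1:M}, x_{1:N})] - \mathbb{E}_{q_\nu}[\log q_\nu(z_{1:M})]}_{\text{the bound above}}
  + \underbrace{\mathrm{KL}\big(q_\nu(z_{1:M}) \,\|\, p(z_{1:M} \mid x_{1:N})\big)}_{\ge 0}
```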

SLIDE 30

Mean-field variational inference

  • The factorization of qν determines the complexity of the optimization.
  • In mean-field variational inference, qν is fully factored:

qν(z1:M) = ∏m=1..M qνm(zm)

  • Under qν, the latent variables are independent.
  • Each is governed by its own variational parameter νm.
  • In the true posterior they can exhibit dependence (often, this is what makes exact inference difficult).

SLIDE 31

MFVI and conditional exponential families

  • Suppose the distribution of each latent variable, conditional on all the other variables, is in the exponential family:

p(zm | z−m, x) = hm(zm) exp{ gm(z−m, x)⊤ zm − am(gm(z−m, x)) }

  • Assume qν is fully factorized, and each factor is in the same exponential family as the corresponding conditional:

qνm(zm) = hm(zm) exp{ νm⊤ zm − am(νm) }

SLIDE 32

MFVI and conditional exponential families

  • Variational inference is the following coordinate ascent algorithm: iterate over the latent variables, setting

νm = Eqν[ gm(Z−m, x) ]

  • Notice the relationship to Gibbs sampling: where Gibbs sampling resamples zm from its full conditional, mean-field inference sets νm to the expected natural parameter of that conditional.
SLIDE 33

Variational inference

  • Alternative to MCMC; replaces sampling with optimization.
  • Deterministic approximation to the posterior distribution.
  • Uses established optimization methods (block coordinate ascent; Newton-Raphson; interior-point).
  • Faster and more scalable than MCMC for large problems.
  • Biased, whereas MCMC is not.
  • Emerging as a useful framework for fully Bayesian and empirical Bayesian inference problems. Many open issues!
  • Good references: Beal’s Ph.D. thesis; Wainwright and Jordan (2008).
SLIDE 34

Variational inference in sLDA

[Figure: sLDA graphical model, as above.]

  • In sLDA the variational bound is

E[log p(θ | α)] + Σn=1..N E[log p(Zn | θ)] + Σn=1..N E[log p(wn | Zn, β1:K)]
  + E[log p(y | Z1:N, η, σ²)] + H(q)

  • As in Blei, Ng, and Jordan (2003), we use the fully factorized variational distribution

q(θ, z1:N | γ, φ1:N) = q(θ | γ) ∏n=1..N q(zn | φn)

SLIDE 35

Variational inference in sLDA

  • The distinguishing term is

E[log p(y | Z1:N, η, σ²)] = −(1/2) log(2πσ²) − [ y² − 2y η⊤E[Z̄] + η⊤ E[Z̄Z̄⊤] η ] / (2σ²)

  • The first expectation is

E[Z̄] = φ̄ := (1/N) Σn=1..N φn

  • The second expectation is

E[Z̄Z̄⊤] = (1/N²) ( Σn=1..N Σm≠n φn φm⊤ + Σn=1..N diag{φn} )

  • These are linear in φn, which leads to an easy coordinate ascent algorithm.
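Both expectations are simple functions of the variational parameters. A numpy sketch, assuming `phi` is an N × K array whose rows are the φn:

```python
import numpy as np

def expected_zbar(phi):
    """E[Zbar] = (1/N) * sum_n phi_n."""
    return phi.mean(axis=0)

def expected_zbar_outer(phi):
    """E[Zbar Zbar^T] = (1/N^2) * (sum_{n != m} phi_n phi_m^T + sum_n diag(phi_n))."""
    N, K = phi.shape
    s = phi.sum(axis=0)
    cross = np.outer(s, s) - phi.T @ phi   # subtracts the n == m terms
    return (cross + np.diag(s)) / N**2
```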
SLIDE 36

Maximum likelihood estimation

  • The M-step is an MLE under expected sufficient statistics.
  • Define:
  • y = y1:D, the response vector
  • A, the D × K matrix whose rows are z̄d⊤
  • The MLE of the coefficients solves the expected normal equations:

E[A⊤A] η = E[A]⊤y  ⇒  η̂new ← (E[A⊤A])⁻¹ E[A]⊤y

  • The MLE of the variance is

σ̂²new ← (1/D) { y⊤y − y⊤E[A] (E[A⊤A])⁻¹ E[A]⊤y }
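In code the M-step is two lines of linear algebra. A sketch, under the assumption that the E-step accumulated `E_A` (the D × K matrix of the E[Z̄d]) and `E_AtA` (the K × K matrix E[A⊤A]):

```python
import numpy as np

def slda_mstep(y, E_A, E_AtA):
    eta_hat = np.linalg.solve(E_AtA, E_A.T @ y)      # solve E[A^T A] eta = E[A]^T y
    D = len(y)
    sigma2_hat = (y @ y - y @ (E_A @ eta_hat)) / D   # (1/D){y^T y - y^T E[A] eta_hat}
    return eta_hat, sigma2_hat
```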

SLIDE 37

Prediction

  • We have fit sLDA parameters to a corpus, using variational EM.
  • We have a new document w1:N with unknown response value.
  • First, run variational inference in the unsupervised LDA model to obtain γ and φ1:N for the new document. (LDA ⇔ integrating the unobserved Y out of sLDA.)
  • Predict y using the sLDA expected value:

E[Y | w1:N, α, β1:K, η, σ²] ≈ η⊤ Eq[Z̄] = η⊤φ̄
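With the fitted coefficients and the new document’s variational parameters in hand, the prediction is a single inner product (continuing the sketches above):

```python
def predict_response(phi, eta_hat):
    """Predicted response: eta^T phibar, with phibar the mean of the phi_n."""
    return eta_hat @ phi.mean(axis=0)
```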

SLIDE 38

Example: Movie reviews

[Figure: the ten topics of an sLDA model fit to movie reviews, arranged along the real line by their regression coefficients (roughly −30 to +20); topics at the negative end contain words like “bad, guys, watchable, awful, unfortunately, worse, flat, dull”, and topics at the positive end contain words like “both, motion, simple, perfect, fascinating” and “cinematography, screenplay, performances, effective”.]

  • 10-topic sLDA model on movie reviews (Pang and Lee, 2005).
  • Response: the number of stars associated with each review.
  • Each component of the coefficient vector η is associated with a topic.
SLIDE 39

Predictive R2

[Figure: predictive R² vs. number of topics (5 to 50); sLDA is red, the comparison is LDA + regression; y-axis 0.0 to 0.5.]

SLIDE 40

Held out likelihood

[Figure: per-word held-out log likelihood vs. number of topics (5 to 50); sLDA is red; y-axis −6.42 to −6.37.]

SLIDE 41

Diverse response types with GLMs

  • We want to work with response variables that don’t live in the reals:
  • binary / multiclass classification
  • count data
  • waiting time
  • Model the response with a generalized linear model,

p(y | ζ, δ) = h(y, δ) exp{ (ζy − A(ζ)) / δ },  where ζ = η⊤z̄

  • This complicates inference, but allows for flexible modeling.
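For example, a binary response can use the logistic link: with ζ = η⊤z̄, p(y = 1) = 1/(1 + e^(−ζ)). A hypothetical two-line sketch:

```python
import numpy as np

def binary_response_prob(zbar, eta):
    """p(y = 1 | zbar, eta) under a logistic GLM with zeta = eta^T zbar."""
    return 1.0 / (1.0 + np.exp(-(eta @ zbar)))
```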
SLIDE 42

Example: Multi-class classification

[Figure: image classification on the LabelMe dataset; example scene classes annotated with topic words, e.g. “highway: car, sign, road”, “inside city: buildings, car, sidewalk”, “tall building: trees, buildings, occluded, window”, “street: tree, car, sidewalk”; plot of average accuracy (about 0.64 to 0.78) vs. number of components (20 to 120).]

(sLDA for image classification, with Chong Wang)

SLIDE 43

Supervised topic models

  • sLDA enables model-based regression where the predictor “variable” is a text document.
  • It can easily be used wherever LDA is used in an unsupervised fashion (e.g., images, genes, music).
  • sLDA is a supervised dimension-reduction technique, whereas LDA performs unsupervised dimension reduction.
  • LDA + regression compared to sLDA is like principal components regression compared to partial least squares.
  • Paper: Blei and McAuliffe, NIPS 2007.
SLIDE 44

Relational topic models

[Figure: a citation network of machine-learning papers; each node is a document, shown with its abstract and title, e.g. “Irrelevant features and the subset selection problem”, “Learning with many irrelevant features”, “Evaluation and selection of biases in machine learning”, “Utilizing prior concepts for learning”, “Improving tactical plans with genetic algorithms”, “An evolutionary approach to learning in robots”, and “Using a genetic algorithm to learn strategies for collision avoidance and local navigation”.]
  • Many data sets contain connected observations.
  • For example:
  • Citation networks of documents
  • Hyperlinked networks of web-pages.
  • Friend-connected social network profiles
SLIDE 45

Relational topic models

[Figure: the same citation network as above.]

  • Research has focused on finding communities and patterns in the link structure of these networks (Kemp et al. 2004, Hoff et al. 2002, Hofman and Wiggins 2007, Airoldi et al. 2008).
  • By adapting supervised topic modeling, we can build a good model of content and structure.
  • RTMs find related hidden structure in both types of data.
SLIDE 46

Relational topic models

[Figure: RTM graphical model; two sLDA-style documents d and d′ whose topic assignments zd,n and zd′,n jointly generate a binary link indicator yd,d′ with parameters η.]

  • A binary response variable is associated with each pair of documents.
  • Adapt the variational EM algorithm for sLDA, with a binary GLM response model (with different link probability functions; one choice is sketched below).
  • This allows predictions that are out of reach for traditional models.
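One concrete choice is a logistic link, in which the link probability depends on the element-wise product of the two documents’ empirical topic frequencies. The sketch below is illustrative, with eta and nu standing in for fitted parameters:

```python
import numpy as np

def link_prob(zbar_d, zbar_dp, eta, nu):
    """p(y_{d,d'} = 1) = sigmoid(eta^T (zbar_d * zbar_d') + nu)."""
    return 1.0 / (1.0 + np.exp(-(eta @ (zbar_d * zbar_dp) + nu)))
```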
SLIDE 47

Predictive performance of one type given the other

[Figure: link log likelihood and word log likelihood vs. number of topics (5 to 25) on the Cora corpus (McCallum et al., 2000).]

Note: Traditional models of networks cannot perform these tasks.

SLIDE 48

Predictive performance of one type given the other

[Figure: link log likelihood and word log likelihood vs. number of topics (5 to 25) on the WebKB corpus (Craven et al., 1998).]

SLIDE 49

Predictive performance of one type given the other

[Figure: link log likelihood and word log likelihood vs. number of topics (5 to 25) on the PNAS corpus (courtesy of JSTOR).]

SLIDE 50

Predicting links from documents

Given a new document, which documents is it likely to link to?

Query document: “Markov chain Monte Carlo convergence diagnostics: A comparative review”

Top documents retrieved by RTM (ψe):
  • Minorization conditions and convergence rates for Markov chain Monte Carlo
  • Rates of convergence of the Hastings and Metropolis algorithms
  • Possible biases induced by MCMC convergence diagnostics
  • Bounding convergence time of the Gibbs sampler in Bayesian image restoration
  • Self regenerative Markov chain Monte Carlo
  • Auxiliary variable methods for Markov chain Monte Carlo with applications
  • Rate of Convergence of the Gibbs Sampler by Gaussian Approximation
  • Diagnosing convergence of Markov chain Monte Carlo algorithms
  • Exact Bound for the Convergence of Metropolis Chains

Top documents retrieved by LDA + Regression:
  • Self regenerative Markov chain Monte Carlo
  • Minorization conditions and convergence rates for Markov chain Monte Carlo
  • Gibbs-markov models
  • Auxiliary variable methods for Markov chain Monte Carlo with applications
  • Markov Chain Monte Carlo Model Determination for Hierarchical and Graphical Models
  • Mediating instrumental variables
  • A qualitative framework for probabilistic inference
  • Adaptation for Self Regenerative MCMC

SLIDE 51

Predicting links from documents

Given a new document, which documents is it likely to link to?

Query document: “Competitive environments evolve better solutions for complex tasks”

Top documents retrieved by RTM (ψe):
  • Coevolving High Level Representations
  • A Survey of Evolutionary Strategies
  • Genetic Algorithms in Search, Optimization and Machine Learning
  • Strongly typed genetic programming in evolving cooperation strategies
  • Solving combinatorial problems using evolutionary algorithms
  • A promising genetic algorithm approach to job-shop scheduling. . .
  • Evolutionary Module Acquisition
  • An Empirical Investigation of Multi-Parent Recombination Operators. . .
  • A New Algorithm for DNA Sequence Assembly

Top documents retrieved by LDA + Regression:
  • Identification of protein coding regions in genomic DNA
  • Solving combinatorial problems using evolutionary algorithms
  • A promising genetic algorithm approach to job-shop scheduling. . .
  • A genetic algorithm for passive management
  • The Performance of a Genetic Algorithm on a Chaotic Objective Function
  • Adaptive global optimization with local search
  • Mutation rates as adaptations

SLIDE 52

Spatially consistent topics

[Figure: five topics (Topic 1 through Topic 5), each highlighted on a map of U.S. states.]

  • Links are the adjacency matrix of states.
  • Documents are geographically tagged news articles from Yahoo! (These are not exchangeable.)
  • The RTM finds spatially consistent topics.
SLIDE 53

Summary

  • Relational topic modeling allows us to analyze connected documents, or other data for which the mixed-membership assumptions are appropriate.
  • Traditional models cannot predict with new and unlinked data.
  • RTMs allow for such predictions:
  • links given the new words of a document
  • words given the links of a new document
  • Paper: Chang and Blei, AISTATS 2009.
SLIDE 54

JSTOR Discipline Analysis

  • Another kind of “supervised” topic model is one where the topics have external meaning (Ramage and Manning, 2009).
  • JSTOR attaches each journal to a discipline.
  • E.g., JASA is in Statistics; PNAS is in General Science.
  • We can measure how interdisciplinary the articles are:
  • Compute a distribution over terms for each discipline
  • Find topic proportions for each article
  • (Work done by Sean Gerrish)
SLIDE 55

Example articles and their journal disciplines:

  • “Tax Innovation in the States: Capitalizing on Political Opportunity,” Frances Stokes Berry and William D. Berry, American Journal of Political Science (1992), pp. 715-742. Journal disciplines: Political Science.
  • “Chaos and Nonlinear Forecastability in Economics and Finance,” Blake LeBaron, Philosophical Transactions: Physical Sciences and Engineering (1994), pp. 397-404. Journal disciplines: Mathematics; Biological Sciences; General Science.
  • “Reply: Theory Is Not a Social Dilemma,” Gerald Marwell and Pamela Oliver, Social Psychology Quarterly (1994), p. 373. Journal disciplines: Psychology; Sociology.

http://dbrowser.jstor.org

SLIDE 56

General Science over Time

[Figure: log proportion of each discipline’s language over time in PNAS (1920-2000) and Science (1880-2000); legend: Biological Sciences, Botany & Plant Sciences, Developmental & Cell Biology, Ecology & Evolutionary Biology, General Science, Geography, History of Science & Technology, Mathematics.]
SLIDE 57

Navel Gazing

[Figure: log proportion of each discipline’s language over time in the Journal of the American Statistical Association (1940-2005) and The Annals of Statistics (1975-2005); legend: Business, Economics, Finance, Mathematics, Statistics.]
SLIDE 58

“We should seek out unfamiliar summaries of observational material, and establish their useful properties... And still more novelty can come from finding, and evading, still deeper lying constraints.” (John Tukey, The Future of Data Analysis, 1962)