SLIDE 1

Latent Topic Networks: A Versatile Probabilistic Programming Framework for Topic Models

James Foulds, Shachi Kumar, Lise Getoor

Jack Baskin School of Engineering, University of California, Santa Cruz

SLIDE 2-5

Probabilistic latent variable modeling

Data (complicated, noisy, high-dimensional) → Latent variable model → Low-dimensional, semantically meaningful representations → Understand, explore, predict

SLIDE 6

Topic models

  • Topic models are foundational building blocks for powerful latent variable models
    – Authorship (Rosen-Zvi et al., 2004)
    – Conversational influence (Nguyen et al., 2014)
    – Knowledge base construction (Movshovitz-Attias and Cohen, 2015)
    – Machine translation (Mimno et al., 2009)
    – Political analysis (Grimmer, 2010; Gerrish and Blei, 2011, 2012)
    – Recommender systems (Wang and Blei, 2011; Diao et al., 2014)
    – Scientific impact (Dietz et al., 2007; Foulds and Smyth, 2013)
    – Social network analysis (Chang et al., 2009)
    – Word-sense disambiguation (Boyd-Graber et al., 2007)
    – …

SLIDE 7-10

Custom topic models

  • Custom latent variable topic models are useful for data mining and computational social science
  • The challenge is scalability

Sparse, stochastic, collapsed, distributed algorithms, …

Max Welling: "There's no end to speeding up LDA!"

SLIDE 11

Custom topic models

  • Custom latent variable topic models are useful for data mining and computational social science
  • The bottleneck is human effort and expertise

Design time >> run time

SLIDE 12-15

Custom topic models

Data (complicated, noisy, high-dimensional) → Latent variable model → Low-dimensional, semantically meaningful representations → Understand, explore, predict

(Algorithm, model) pair carefully co-designed for tractability

Evaluate, iterate

SLIDE 16

Custom topic models

Data (complicated, noisy, high-dimensional) → General-purpose modeling framework → Low-dimensional, semantically meaningful representations → Understand, explore, predict

Evaluate, iterate

SLIDE 17-19

Our contribution

  • We introduce latent topic networks
    – A versatile, general-purpose framework for specifying custom topic models
    – Models and domain knowledge specified using a simple logical probabilistic programming language
    – A highly parallelizable EM training algorithm

SLIDE 20-24

Latent topic networks

[Graphical model diagram: the LDA likelihood over topic assignments Z and words W, extended with networks of dependencies between topics and distributions over topics, observed covariates X, labeled data Y, and additional latent variables]

SLIDE 25-29

Previously…

Grad student + ≈6 months = Topic modeling research paper

SLIDE 30

Latent topic networks

Grad student + 1 weekend = New custom topic model

Shachi Kumar, Master's student, UCSC

SLIDE 31-32

Related work

Systems are compared on: correlations / dependencies, observed covariates, additional latent variables, constraints, and probabilistic programming.

Systems for encoding domain knowledge, covariates, and correlations:
  – CTM (Blei and Lafferty, 2007)
  – DMR (Mimno and McCallum, 2008)
  – Dirichlet Forests (Andrzejewski et al., 2009)
  – xLDA (Wahabzada et al., 2010)
  – SAGE (Eisenstein et al., 2011)
  – STM (Roberts et al., 2013)

Graphical modeling and probabilistic programming systems:
  – CTRF (Zhu and Xing, 2010)
  – Fold.all (Andrzejewski et al., 2011)
  – Logic LDA (Mei et al., 2014)
  – Latent Topic Networks (this work)

SLIDE 33-35

Example: modeling influence in citation networks

Which are the most important articles?

What are the influence relationships between articles?

Foulds and Smyth (2013), EMNLP

SLIDE 36-37

Topical influence regression

Latent variables for document influence and citation edge influence

Probabilistic dependencies along the citation graph

Foulds and Smyth (2013), EMNLP

SLIDE 38-42

Encoding dependencies via logical rules

Rules restrict dependencies to the citation graph: if the influence and the topic value are both high, then the citing document also has the topic.

Entire model with just 5 rules!
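
As a rough illustration of what such rules might look like in PSL-style syntax, here is a sketch; the predicate names (Cites, Influences, HasTopic), the weights, and the exact rule set are illustrative assumptions on my part, not the five rules from the paper:

  5.0: Cites(D2, D1) & Influences(D1, D2) & HasTopic(D1, T) -> HasTopic(D2, T)
  5.0: Cites(D2, D1) & HasTopic(D1, T) & HasTopic(D2, T) -> Influences(D1, D2)

The Cites(D2, D1) atom restricts groundings to edges of the citation graph, and each grounded rule contributes a soft, weighted preference rather than a hard constraint.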

SLIDE 43

Statistical relational learning

  • An "interface layer for AI"
    – Programming languages for specifying models and encoding domain knowledge
    – Typically based on first-order logic

SLIDE 44-49

Probabilistic soft logic (PSL)

  • A first-order logic-based SRL language
  • Specifies a class of highly scalable continuous graphical models called hinge-loss MRFs

Each rule consists of a rule weight (e.g., 5.0), predicates, and logical operators

Continuous random variables!
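
For concreteness, a PSL rule has this general shape; the predicate names below are placeholders of my own, not the rule shown on the slide:

  5.0: Link(A, B) & Label(A, T) -> Label(B, T)

The leading 5.0 is the rule weight, Link and Label are predicates whose groundings take continuous truth values in [0, 1], and & and -> are the (softened) logical operators. Each grounding of the rule becomes a weighted hinge-loss feature in the resulting model.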

SLIDE 50-56

Hinge-loss MRFs

  • Conditional random field over continuous random variables between 0 and 1
  • Feature functions are hinge-loss functions: a linear function of the variables, thresholded at zero, and optionally squared
  • Hinge losses encode the distance to satisfaction for each instantiated rule
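
For reference, the standard hinge-loss MRF density has the following form (the notation is mine, but the structure matches the slide: each potential is a linear function passed through a hinge and optionally squared):

  p(\mathbf{y} \mid \mathbf{x}) \;=\; \frac{1}{Z(\boldsymbol{\lambda}, \mathbf{x})} \exp\Big( -\sum_{j=1}^{m} \lambda_j \,\big( \max\{\ell_j(\mathbf{y}, \mathbf{x}),\, 0\} \big)^{p_j} \Big), \qquad y_i \in [0, 1],\; p_j \in \{1, 2\},

where each \ell_j is a linear function of \mathbf{y} and \mathbf{x} (the distance to satisfaction of one grounded rule) and \lambda_j \ge 0 is the corresponding rule weight.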

SLIDE 57-58

Latent Dirichlet allocation

  • Priors: Dirichlet priors on the document-topic and topic-word distributions
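
As a reminder, the standard LDA generative process is (notation mine: \theta_d are document-topic distributions, \phi_k are topic-word distributions):

  \theta_d \sim \mathrm{Dirichlet}(\alpha), \quad \phi_k \sim \mathrm{Dirichlet}(\beta), \quad z_{di} \sim \mathrm{Discrete}(\theta_d), \quad w_{di} \sim \mathrm{Discrete}(\phi_{z_{di}}).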

SLIDE 59

Latent topic networks

  • Priors: hinge-loss MRFs

SLIDE 60-64

Log posterior objective function

LDA log posterior plus hinge-loss terms

Tractability from convexity, instead of conjugacy!
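
Schematically (my notation, consistent with the slide's decomposition into an LDA term and hinge-loss terms), the objective being maximized is

  \mathcal{L}(\theta, \phi) \;=\; \underbrace{\log p_{\mathrm{LDA}}(\theta, \phi \mid \mathbf{w})}_{\text{LDA log posterior}} \;-\; \underbrace{\sum_{j} \lambda_j \,\big( \max\{\ell_j(\theta, \phi, \ldots),\, 0\} \big)^{p_j}}_{\text{hinge-loss terms}} \;+\; \mathrm{const},

where the hinge terms come from the hinge-loss MRF priors over the parameters and any additional variables they mention. Each hinge term is convex, which is what keeps the optimization tractable without requiring conjugate priors.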

SLIDE 65-68

Training algorithm

  • Expectation Maximization
    – E-step: the same as for LDA
    – M-step: maximize the LDA EM lower bound minus the hinge-loss terms

Convex optimization! Solve in parallel using consensus ADMM
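
For intuition, the M-step for a single document's topic distribution \theta_d takes roughly this form (a schematic sketch in my notation, with \bar{n}_{dk} the expected topic counts from the E-step; any Dirichlet pseudo-count terms are omitted):

  \max_{\theta_d \in \Delta^{K-1}} \; \sum_{k} \bar{n}_{dk} \log \theta_{dk} \;-\; \sum_{j} \lambda_j \,\big( \max\{\ell_j(\theta),\, 0\} \big)^{p_j}.

The first term is concave in \theta_d and each hinge term is convex, so the M-step is a convex problem; because the objective decomposes over grounded rules, it can be split across machines and solved with consensus ADMM as the slide describes.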

SLIDE 69-71

Weight learning

  • Optimize a pseudo-likelihood approximation
  • The gradient involves expectations, approximated by importance sampling from the implied Dirichlet prior
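
As background (this is the generic log-linear form, not necessarily the exact expression from the slide), for a density p_\lambda(\mathbf{y}) \propto \exp\big(-\sum_j \lambda_j \phi_j(\mathbf{y})\big) the gradient of the log-likelihood with respect to each weight is a difference between expected and observed feature values:

  \frac{\partial}{\partial \lambda_j} \log p_\lambda(\mathbf{y}) \;=\; \mathbb{E}_{p_\lambda}\big[\phi_j(\mathbf{y})\big] \;-\; \phi_j(\mathbf{y}).

Pseudo-likelihood replaces the intractable full expectation with per-variable conditional expectations, and the slides indicate those expectations are estimated here by importance sampling from the implied Dirichlet prior.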

SLIDE 72-75

Case study: Exploring influence in citation networks

SLIDE 76-81

Case study: Modeling US Presidential State of the Union addresses

  • The US President updates Congress on the state of the Union, roughly annually
  • Do these addresses depict the true, underlying state of the Union?
  • Are they biased by political agendas?

[Model diagram: each address over time (years) is modeled with a topic model as a latent state of the Union plus a Republican or Democrat party bias]
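
One way a model of this shape might be sketched in PSL-style rules; the predicates (State, RepublicanBias, DemocratBias, Address, Republican, Democrat, Next) and weights are purely illustrative assumptions on my part, not the rules used in the case study:

  5.0: State(Y, T) -> Address(Y, T)
  5.0: Republican(Y) & RepublicanBias(T) -> Address(Y, T)
  5.0: Democrat(Y) & DemocratBias(T) -> Address(Y, T)
  5.0: State(Y, T) & Next(Y, Y2) -> State(Y2, T)

Here Address(Y, T) stands for how much topic T appears in the address of year Y, State(Y, T) for the latent state of the Union, the bias predicates for party-specific topic preferences, and the last rule encourages the latent state to evolve smoothly over time.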

SLIDE 82-84

Case study: Modeling US Presidential State of the Union addresses

[Example inferred topics: a Democrat topic and a Republican topic]

SLIDE 85-87

Case study: Modeling US Presidential State of the Union addresses

[Plot of inferred quantities over time, annotated with WW I, WW II, and Vietnam]

SLIDE 88

Case study: Modeling US Presidential State of the Union addresses

                          Document Completion Perplexity   Fully Held-Out Perplexity
  Latent topic networks   2.33 × 10³                        2.43 × 10³
  LDA topic model         2.36 × 10³                        2.59 × 10³
  Dynamic topic model     2.43 × 10³                        2.55 × 10³

SLIDE 89-91

Conclusion

  • We introduce latent topic networks, a versatile general-purpose framework for building and inferring custom topic models.
  • Our experimental results show that models specified in our framework, with just a few lines of code in a logical language, can be competitive with state-of-the-art special-purpose models.
  • Future directions
    – Using our framework to answer substantive questions in social science
    – New language primitives, non-parametric Bayesian models, algorithmic advances, …

SLIDE 92

Thanks to my collaborators at UC Santa Cruz

  • Lise Getoor
  • Shachi Kumar

SLIDE 93-94

Code: coming soon at psl.cs.umd.edu!

Thank you for your attention