SLIDE 1

Latent Topic Networks: A Versatile Probabilistic Programming Framework for Topic Models

James Foulds, Shachi Kumar, Lise Getoor

Jack Baskin School of Engineering, University of California, Santa Cruz

SLIDE 2-5

Probabilistic latent variable modeling

Data (complicated, noisy, high-dimensional) → Latent variable model → Low-dimensional, semantically meaningful representations → Understand, explore, predict

SLIDE 6

Topic models

  • Topic models are foundational building blocks for powerful latent variable models
    – Authorship (Rosen-Zvi et al., 2004)
    – Conversational influence (Nguyen et al., 2014)
    – Knowledge base construction (Movshovitz-Attias and Cohen, 2015)
    – Machine translation (Mimno et al., 2009)
    – Political analysis (Grimmer, 2010; Gerrish and Blei, 2011, 2012)
    – Recommender systems (Wang and Blei, 2011; Diao et al., 2014)
    – Scientific impact (Dietz et al., 2007; Foulds and Smyth, 2013)
    – Social network analysis (Chang et al., 2009)
    – Word-sense disambiguation (Boyd-Graber et al., 2007)
    – …

SLIDE 7-10

Custom topic models

  • Custom latent variable topic models are useful for data mining and computational social science
  • The challenge is scalability

Sparse, stochastic, collapsed, distributed algorithms, …

Max Welling: "There's no end to speeding up LDA!"

SLIDE 11

Custom topic models

  • Custom latent variable topic models are useful for data mining and computational social science
  • The bottleneck is human effort and expertise

Design time >> run time

SLIDE 12-15

Custom topic models

Data (complicated, noisy, high-dimensional) → Latent variable model → Low-dimensional, semantically meaningful representations → Understand, explore, predict

(Algorithm, model) pair carefully co-designed for tractability

Evaluate, iterate

SLIDE 16

Custom topic models

Data (complicated, noisy, high-dimensional) → General-purpose modeling framework → Low-dimensional, semantically meaningful representations → Understand, explore, predict

Evaluate, iterate

SLIDE 17-19

Our contribution

  • We introduce latent topic networks
    – A versatile, general-purpose framework for specifying custom topic models
    – Models and domain knowledge specified using a simple logical probabilistic programming language
    – A highly parallelizable EM training algorithm

SLIDE 20-24

Latent topic networks

[Graphical model diagram: the LDA likelihood over topic assignments Z and words W, extended with networks of dependencies between topics and distributions over topics, observed covariates X, labeled data Y, and additional latent variables]

SLIDE 25-29

Previously…

Grad student + ≈6 months = Topic modeling research paper

SLIDE 30

Latent topic networks

Grad student + 1 weekend = New custom topic model

Shachi Kumar, Master's student, UCSC

SLIDE 31-32

Related work

Systems are compared on: correlations / dependencies, observed covariates, additional latent variables, constraints, and probabilistic programming.

Systems for encoding domain knowledge, covariates, and correlations:
  – CTM (Blei and Lafferty, 2007)
  – DMR (Mimno and McCallum, 2008)
  – Dirichlet Forests (Andrzejewski et al., 2009)
  – xLDA (Wahabzada et al., 2010)
  – SAGE (Eisenstein et al., 2011)
  – STM (Roberts et al., 2013)

Graphical modeling and probabilistic programming systems:
  – CTRF (Zhu and Xing, 2010)
  – Fold.all (Andrzejewski et al., 2011)
  – Logic LDA (Mei et al., 2014)
  – Latent Topic Networks (this work)

SLIDE 33-35

Example: modeling influence in citation networks

Which are the most important articles?

What are the influence relationships between articles?

Foulds and Smyth (2013), EMNLP

SLIDE 36-37

Topical influence regression

Latent variables for document influence and citation edge influence

Probabilistic dependencies along the citation graph

Foulds and Smyth (2013), EMNLP

SLIDE 38-42

Encoding dependencies via logical rules

Rules restrict dependencies to the citation graph: if the influence and the topic value are both high, then the citing document also has the topic.

Entire model with just 5 rules!
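
As a rough illustration of what such rules might look like in PSL-style syntax, here is a sketch; the predicate names (Cites, Influences, HasTopic), the weights, and the exact rule set are illustrative assumptions on my part, not the five rules from the paper:

  5.0: Cites(D2, D1) & Influences(D1, D2) & HasTopic(D1, T) -> HasTopic(D2, T)
  5.0: Cites(D2, D1) & HasTopic(D1, T) & HasTopic(D2, T) -> Influences(D1, D2)

The Cites(D2, D1) atom restricts groundings to edges of the citation graph, and each grounded rule contributes a soft, weighted preference rather than a hard constraint.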

SLIDE 43

Statistical relational learning

  • An "interface layer for AI"
    – Programming languages for specifying models and encoding domain knowledge
    – Typically based on first-order logic

SLIDE 44-49

Probabilistic soft logic (PSL)

  • A first-order logic-based SRL language
  • Specifies a class of highly scalable continuous graphical models called hinge-loss MRFs

Each rule consists of a rule weight (e.g., 5.0), predicates, and logical operators

Continuous random variables!
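
For concreteness, a PSL rule has this general shape; the predicate names below are placeholders of my own, not the rule shown on the slide:

  5.0: Link(A, B) & Label(A, T) -> Label(B, T)

The leading 5.0 is the rule weight, Link and Label are predicates whose groundings take continuous truth values in [0, 1], and & and -> are the (softened) logical operators. Each grounding of the rule becomes a weighted hinge-loss feature in the resulting model.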

SLIDE 50-56

Hinge-loss MRFs

  • Conditional random field over continuous random variables between 0 and 1
  • Feature functions are hinge-loss functions: a linear function of the variables, thresholded at zero, and optionally squared
  • Hinge losses encode the distance to satisfaction for each instantiated rule
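
For reference, the standard hinge-loss MRF density has the following form (the notation is mine, but the structure matches the slide: each potential is a linear function passed through a hinge and optionally squared):

  p(\mathbf{y} \mid \mathbf{x}) \;=\; \frac{1}{Z(\boldsymbol{\lambda}, \mathbf{x})} \exp\Big( -\sum_{j=1}^{m} \lambda_j \,\big( \max\{\ell_j(\mathbf{y}, \mathbf{x}),\, 0\} \big)^{p_j} \Big), \qquad y_i \in [0, 1],\; p_j \in \{1, 2\},

where each \ell_j is a linear function of \mathbf{y} and \mathbf{x} (the distance to satisfaction of one grounded rule) and \lambda_j \ge 0 is the corresponding rule weight.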

SLIDE 57-58

Latent Dirichlet allocation

  • Priors: Dirichlet priors on the document-topic and topic-word distributions
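
As a reminder, the standard LDA generative process is (notation mine: \theta_d are document-topic distributions, \phi_k are topic-word distributions):

  \theta_d \sim \mathrm{Dirichlet}(\alpha), \quad \phi_k \sim \mathrm{Dirichlet}(\beta), \quad z_{di} \sim \mathrm{Discrete}(\theta_d), \quad w_{di} \sim \mathrm{Discrete}(\phi_{z_{di}}).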

SLIDE 59

Latent topic networks

  • Priors: hinge-loss MRFs

SLIDE 60-64

Log posterior objective function

LDA log posterior plus hinge-loss terms

Tractability from convexity, instead of conjugacy!
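
Schematically (my notation, consistent with the slide's decomposition into an LDA term and hinge-loss terms), the objective being maximized is

  \mathcal{L}(\theta, \phi) \;=\; \underbrace{\log p_{\mathrm{LDA}}(\theta, \phi \mid \mathbf{w})}_{\text{LDA log posterior}} \;-\; \underbrace{\sum_{j} \lambda_j \,\big( \max\{\ell_j(\theta, \phi, \ldots),\, 0\} \big)^{p_j}}_{\text{hinge-loss terms}} \;+\; \mathrm{const},

where the hinge terms come from the hinge-loss MRF priors over the parameters and any additional variables they mention. Each hinge term is convex, which is what keeps the optimization tractable without requiring conjugate priors.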

SLIDE 65-68

Training algorithm

  • Expectation Maximization
    – E-step: the same as for LDA
    – M-step: maximize the LDA EM lower bound minus the hinge-loss terms

Convex optimization! Solve in parallel using consensus ADMM
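
For intuition, the M-step for a single document's topic distribution \theta_d takes roughly this form (a schematic sketch in my notation, with \bar{n}_{dk} the expected topic counts from the E-step; any Dirichlet pseudo-count terms are omitted):

  \max_{\theta_d \in \Delta^{K-1}} \; \sum_{k} \bar{n}_{dk} \log \theta_{dk} \;-\; \sum_{j} \lambda_j \,\big( \max\{\ell_j(\theta),\, 0\} \big)^{p_j}.

The first term is concave in \theta_d and each hinge term is convex, so the M-step is a convex problem; because the objective decomposes over grounded rules, it can be split across machines and solved with consensus ADMM as the slide describes.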

SLIDE 69-71

Weight learning

  • Optimize a pseudo-likelihood approximation
  • The gradient involves expectations, approximated by importance sampling from the implied Dirichlet prior
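
As background (this is the generic log-linear form, not necessarily the exact expression from the slide), for a density p_\lambda(\mathbf{y}) \propto \exp\big(-\sum_j \lambda_j \phi_j(\mathbf{y})\big) the gradient of the log-likelihood with respect to each weight is a difference between expected and observed feature values:

  \frac{\partial}{\partial \lambda_j} \log p_\lambda(\mathbf{y}) \;=\; \mathbb{E}_{p_\lambda}\big[\phi_j(\mathbf{y})\big] \;-\; \phi_j(\mathbf{y}).

Pseudo-likelihood replaces the intractable full expectation with per-variable conditional expectations, and the slides indicate those expectations are estimated here by importance sampling from the implied Dirichlet prior.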

SLIDE 72-75

Case study: Exploring influence in citation networks

SLIDE 76-81

Case study: Modeling US Presidential State of the Union addresses

  • The US President updates Congress on the state of the Union, roughly annually
  • Do these addresses depict the true, underlying state of the Union?
  • Are they biased by political agendas?

[Model diagram: each address over time (years) is modeled with a topic model as a latent state of the Union plus a Republican or Democrat party bias]
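
One way a model of this shape might be sketched in PSL-style rules; the predicates (State, RepublicanBias, DemocratBias, Address, Republican, Democrat, Next) and weights are purely illustrative assumptions on my part, not the rules used in the case study:

  5.0: State(Y, T) -> Address(Y, T)
  5.0: Republican(Y) & RepublicanBias(T) -> Address(Y, T)
  5.0: Democrat(Y) & DemocratBias(T) -> Address(Y, T)
  5.0: State(Y, T) & Next(Y, Y2) -> State(Y2, T)

Here Address(Y, T) stands for how much topic T appears in the address of year Y, State(Y, T) for the latent state of the Union, the bias predicates for party-specific topic preferences, and the last rule encourages the latent state to evolve smoothly over time.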

SLIDE 82-84

Case study: Modeling US Presidential State of the Union addresses

[Example inferred topics: a Democrat topic and a Republican topic]

SLIDE 85-87

Case study: Modeling US Presidential State of the Union addresses

[Plot of inferred quantities over time, annotated with WW I, WW II, and Vietnam]

SLIDE 88

Case study: Modeling US Presidential State of the Union addresses

                          Document Completion Perplexity   Fully Held-Out Perplexity
  Latent topic networks   2.33 × 10³                        2.43 × 10³
  LDA topic model         2.36 × 10³                        2.59 × 10³
  Dynamic topic model     2.43 × 10³                        2.55 × 10³

SLIDE 89-91

Conclusion

  • We introduce latent topic networks, a versatile general-purpose framework for building and inferring custom topic models.
  • Our experimental results show that models specified in our framework, with just a few lines of code in a logical language, can be competitive with state-of-the-art special-purpose models.
  • Future directions
    – Using our framework to answer substantive questions in social science
    – New language primitives, non-parametric Bayesian models, algorithmic advances, …

SLIDE 92

Thanks to my collaborators at UC Santa Cruz

  • Lise Getoor
  • Shachi Kumar

SLIDE 93-94

Code: coming soon at psl.cs.umd.edu!

Thank you for your attention