Neural Models for Documents with Metadata
Dallas Card, Chenhao Tan, Noah A. Smith
July 18, 2018
Outline

Main points of this talk:

1. Introducing Scholar¹: a neural model for documents with metadata
   - Background (LDA, SAGE, SLDA, etc.)
   - Model and related work
   - Experiments and results
2. The power of neural variational inference for interactive modeling

¹ Sparse Contextual Hidden and Observed Language Autoencoder
Latent Dirichlet Allocation
Blei, Ng, and Jordan. Latent Dirichlet Allocation. JMLR, 2003.
David Blei. Probabilistic topic models. Comm. ACM, 2012.
Types of metadata
- Date or time
- Author(s)
- Rating
- Sentiment
- Ideology
- etc.
Variations and extensions
- Author-topic model (Rosen-Zvi et al., 2004)
- Supervised LDA (SLDA; McAuliffe and Blei, 2008)
- Dirichlet-multinomial regression (Mimno and McCallum, 2008)
- Sparse additive generative models (SAGE; Eisenstein et al., 2011)
- Structural topic model (Roberts et al., 2014)
- ...
Desired features of model

- Fast, scalable inference
- Easy modification by end users
- Incorporation of metadata:
  - Covariates: features which influence the text (as in SAGE)
  - Labels: features to be predicted along with the text (as in SLDA)
- Possibility of sparse topics
- Incorporation of additional prior knowledge

→ Use variational autoencoder (VAE) style of inference (Kingma and Welling, 2014)
Desired outcome

- Coherent groupings of words (something like topics), with offsets for observed metadata
- Encoder to map from documents to latent representations
- Classifier to predict labels from the latent representation
Model

- Generator network: p(words | θi) = fg(θi)
- The true posterior p(θi | words) is intractable, so approximate it with a variational distribution q(θi | words)
- Encoder network: q(θi | words) = fe(words)
- ELBO = Eq[log p(words | θi)] − DKL[q(θi | words) ‖ p(θi)]
- Logistic normal on document representations: θi = softmax(ri), with prior ri ∼ N(0, I)
- Approximate the expectation with S Monte Carlo samples:
  ELBO ≈ (1/S) Σ_{s=1}^{S} log p(words | ri^(s)) − DKL[q(ri | words) ‖ p(ri)]
- Reparameterization trick: ri^(s) = μq + ε^(s) ⊙ σq, with ε^(s) ∼ N(0, I) (see the code sketch below)
- (Srivastava and Sutton, 2017; Miao et al., 2016)
- Finally, incorporate labels yi and covariates ci into both the encoder and the generator (detailed on the next slide)
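As a concrete illustration, here is a minimal PyTorch sketch of the sampled ELBO with the reparameterization trick. All names here (`encoder`, `generator`, etc.) are placeholders for exposition, assuming simple bag-of-words inputs; this is not the authors' released implementation.

```python
# Minimal sketch of the Monte Carlo ELBO with the reparameterization trick.
# Assumptions: encoder(word_counts) returns (mu, log_sigma); generator(theta)
# returns unnormalized logits over the vocabulary.
import torch
import torch.nn.functional as F

def elbo(word_counts, encoder, generator, n_samples=1):
    """word_counts: (batch, vocab) bag-of-words count vectors."""
    mu, log_sigma = encoder(word_counts)           # variational parameters
    total_recon = 0.0
    for _ in range(n_samples):
        eps = torch.randn_like(mu)                 # eps ~ N(0, I)
        r = mu + eps * log_sigma.exp()             # reparameterization trick
        theta = F.softmax(r, dim=-1)               # document-topic proportions
        log_p_words = generator(theta).log_softmax(dim=-1)
        total_recon = total_recon + (word_counts * log_p_words).sum(-1)
    recon = total_recon / n_samples
    # Closed-form KL between q(r | words) = N(mu, diag(sigma^2)) and p(r) = N(0, I)
    kl = 0.5 * (mu.pow(2) + (2 * log_sigma).exp() - 2 * log_sigma - 1).sum(-1)
    return (recon - kl).mean()                     # average over the batch
```

Maximizing this quantity with respect to both network parameters trains the encoder and generator jointly, exactly as in the VAE framing above.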
Scholar

- Generator network: p(word | θi, ci) = softmax(d + θiᵀ B(topic) + ciᵀ B(cov)) (sketched in code below)
- Optionally include interactions between topics and covariates
- Label network: p(yi | θi, ci) = fy(θi, ci)
- Encoder: μi = fμ(words, ci, yi) and log σi = fσ(words, ci, yi)
- Optional incorporation of word vectors to embed the input
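A hedged sketch of that generator, assuming plain linear deviation matrices; the class layout and initialization are illustrative choices, not taken from the paper's code.

```python
# Illustrative Scholar-style generator: word probabilities come from a
# background term d plus topic deviations B_topic and covariate deviations B_cov.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, n_topics, n_covariates, vocab_size):
        super().__init__()
        self.d = nn.Parameter(torch.zeros(vocab_size))  # background log-frequencies
        self.B_topic = nn.Parameter(0.01 * torch.randn(n_topics, vocab_size))
        self.B_cov = nn.Parameter(0.01 * torch.randn(n_covariates, vocab_size))

    def forward(self, theta, c):
        # p(word | theta_i, c_i) = softmax(d + theta_i^T B_topic + c_i^T B_cov)
        logits = self.d + theta @ self.B_topic + c @ self.B_cov
        return torch.softmax(logits, dim=-1)
```

Because the deviations enter additively in log space, each row of B(topic) and B(cov) can later be read off directly as a topic- or covariate-specific shift from the background.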
Optimization

- Stochastic optimization using mini-batches of documents
- Tricks from Srivastava and Sutton (2017):
  - Adam optimizer with a high learning rate to bypass mode collapse
  - Batch-norm layers to avoid divergence
  - Annealing away from the batch-norm output to keep results interpretable (sketched below)
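The annealing trick could look roughly like the following sketch: blend the batch-normed and raw logits, moving toward the raw logits over training so the final word distributions reflect the learned parameters directly. The schedule and the Adam hyperparameters in the comment are assumptions based on Srivastava and Sutton (2017), not values stated in this talk.

```python
# Hedged sketch of annealing away from the batch-norm output.
import torch
import torch.nn as nn

def annealed_batchnorm(logits: torch.Tensor, bn: nn.BatchNorm1d, eta: float) -> torch.Tensor:
    """Blend batch-normed and raw logits; anneal eta from 0 to 1 over training."""
    return (1.0 - eta) * bn(logits) + eta * logits

# Assumed optimizer setup, in the spirit of Srivastava and Sutton (2017):
# optimizer = torch.optim.Adam(model.parameters(), lr=0.002, betas=(0.99, 0.999))
```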
Output of Scholar

- B(topic), B(cov): coherent groupings of positive and negative deviations from the background (∼ topics); a topic-extraction sketch follows this list
- fμ, fσ: encoder network mapping from words to topics: θ̂i = softmax(fe(words, ci, yi, ε))
- fy: classifier mapping from θ̂i to labels: ŷi = fy(θ̂i, ci)
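Reading topics off B(topic) amounts to sorting each row; a minimal NumPy sketch (a hypothetical helper, not from the released code):

```python
# Print each topic's most positive and most negative word deviations.
import numpy as np

def top_words(B_topic: np.ndarray, vocab: list, n: int = 8):
    """B_topic: (n_topics, vocab_size) matrix of log-deviation weights."""
    for k, row in enumerate(B_topic):
        order = np.argsort(row)
        pos = [vocab[j] for j in order[-n:][::-1]]  # strongest positive deviations
        neg = [vocab[j] for j in order[:n]]         # strongest negative deviations
        print(f"Topic {k}: +{pos}  -{neg}")
```

The same helper applies to B(cov), where each row instead characterizes how a covariate value shifts word usage.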
Evaluation

1. Performance as a topic model, without metadata (perplexity and coherence; a coherence sketch follows this list)
2. Performance as a classifier, compared to SLDA
3. Exploratory data analysis
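Topic coherence can be measured several ways; one common choice is NPMI over each topic's top words, computed from document-level co-occurrence. The exact metric used in the talk is an assumption here; this sketch just makes the idea concrete.

```python
# NPMI coherence for one topic from document co-occurrence counts (illustrative).
import numpy as np
from itertools import combinations

def npmi_coherence(top_words: list, doc_word_sets: list) -> float:
    """top_words: a topic's top words; doc_word_sets: one set of words per doc."""
    n_docs = len(doc_word_sets)
    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1 = sum(w1 in d for d in doc_word_sets) / n_docs
        p2 = sum(w2 in d for d in doc_word_sets) / n_docs
        p12 = sum(w1 in d and w2 in d for d in doc_word_sets) / n_docs
        if p12 == 0:
            scores.append(-1.0)                # never co-occur: minimum NPMI
        elif p12 >= 1.0:
            scores.append(1.0)                 # degenerate: co-occur everywhere
        else:
            pmi = np.log(p12 / (p1 * p2))
            scores.append(pmi / -np.log(p12))  # normalize PMI into [-1, 1]
    return float(np.mean(scores))
```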
Quantitative results: basic model

[Figure: perplexity, coherence, and sparsity for LDA, SAGE, NVDM, Scholar, Scholar +wv (word vectors), and Scholar +sparsity on the IMDB dataset (Maas et al., 2011)]
Classification results
[Figure: classification accuracy of LR, SLDA, Scholar (labels), and Scholar (covariates) on the IMDB dataset (Maas et al., 2011)]
Exploratory Data Analysis
- Data: Media Frames Corpus (Card et al., 2015)
- Collection of thousands of news articles annotated in terms of tone and framing
- Relevant metadata: year of publication, newspaper, etc.
Tone as a label