Representing Documents via Latent Keyphrase Inference (April 15th, 2016) - PowerPoint PPT Presentation




SLIDE 1

Representing Documents via Latent Keyphrase Inference

  • April 15th, 2016
SLIDE 2


Document Representation in Vector Space

Critical for document retrieval, categorization

SLIDE 3

• Bag-of-Words or Phrases
• Cons: Sparse on short texts

Traditional Methods

SLIDE 4

• Topic models [LDA]
• Cons: Difficult for humans to infer topic semantics

Each topic is a distribution over words, each document is a mixture of corpus-wide topics

SLIDE 5

• Concept-based models [ESA]
• Cons: Low coverage of concepts in human-curated knowledge base

Every Wikipedia article represents a concept

Article words are associated with the concept (TF.IDF weights), which helps infer concepts from a document

Concept: Panthera, with associated words Cat [0.92], Leopard [0.84], Roar [0.77]

SLIDE 6

• Word/document embedding models [word2vec, paragraph2vec]
• Cons: Difficult to explain what each dimension means

SLIDE 7

• Use domain keyphrases as the entries in the vector
• Identify document keyphrases (a subset of domain keyphrases) by evaluating the relatedness between (doc, domain keyphrase) pairs
• Unsupervised model

Document Representation Using Keyphrases

Corpus Domain Keyphrases <K1, K2, …, KM>
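The representation on this slide, a fixed vector whose entries are the domain keyphrases <K1, K2, …, KM>, can be sketched in a few lines. The scores here are hypothetical placeholders for the relatedness the model infers; this only illustrates the vector layout itself.

```python
# Sketch: a document as a vector over the fixed domain-keyphrase vocabulary.
# `doc_scores` stands in for the (doc, domain keyphrase) relatedness scores
# the model would infer; the keyphrases below are illustrative.

def keyphrase_vector(doc_scores, domain_keyphrases):
    """Map a {keyphrase: score} dict onto the fixed keyphrase vocabulary."""
    return [doc_scores.get(k, 0.0) for k in domain_keyphrases]

domain_keyphrases = ["support vector machine", "topic model", "word embedding"]
doc_scores = {"topic model": 0.8, "word embedding": 0.3}
vec = keyphrase_vector(doc_scores, domain_keyphrases)
# vec == [0.0, 0.8, 0.3]
```

Absent keyphrases simply get a zero entry, so the vector stays sparse but every dimension remains interpretable as a named keyphrase.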

SLIDE 8

• Where to get domain keyphrases from a given corpus?
  • Mining Quality Phrases from Massive Text Corpora [SIGMOD15]
• How to identify document keyphrases?
  • Can be latent mentions (short text)
  • Relatedness scores

Challenges

SLIDE 9

• Powered by Bayesian inference on the “Domain Keyphrase Silhouette”
  • Domain Keyphrase Silhouette: a topic centered on a domain keyphrase
  • “Reverse” topic models
  • Learned from the corpus

How to identify document keyphrases?

SLIDE 10


Framework for Latent Keyphrase Inference (LAKI)

SLIDE 11


SLIDE 12

• Learning a Hierarchical Bayesian Network (DAG)
• Binary variables
• Domain Keyphrase Silhouette
• Task 1, Model Learning: learning link weights
• Task 2, Structure Learning: learning network structure

SLIDE 13


Noise / prior: aggregated over all other links connected with the node

• Use Z to represent K (domain keyphrases) and T (content units)
• Noisy-OR:
  • A parent node more easily activates its children when the link weight is larger
  • A child node is influenced by all its parents

Task 1: Model Learning given Structure

Toy example
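The noisy-OR relation on this slide can be sketched directly: each link weight is the probability that an active parent alone turns the child on, and a leak term stands in for the noise/prior link. This is a generic noisy-OR sketch with illustrative numbers, not the paper's implementation.

```python
def noisy_or_prob(parent_active, weights, leak=0.01):
    """P(child = 1) under a noisy-OR gate.

    parent_active: list of 0/1 parent states
    weights: per-link activation probabilities (a larger weight means an
             active parent turns the child on more easily)
    leak: noise/prior term covering causes outside the model
    """
    p_off = 1.0 - leak              # probability the child stays off
    for z, w in zip(parent_active, weights):
        if z:                       # each active parent independently
            p_off *= (1.0 - w)      # fails to fire with prob (1 - w)
    return 1.0 - p_off

# A child with two active parents is more likely on than with one:
one_parent = noisy_or_prob([1, 0], [0.6, 0.7])
two_parents = noisy_or_prob([1, 1], [0.6, 0.7])
# two_parents > one_parent
```

With the leak set to zero and a single active parent, the child's activation probability is exactly that link's weight, which matches the "larger link weight, easier activation" reading of the slide.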

SLIDE 14

• Training data: documents
• Expectation step: for each document, collect sufficient statistics
  • Link firing (parent and child both activated) probability
  • Node activation probability
• Maximization step: update link weights

Fully observed: content units. Partially observed: document keyphrases.

Maximum Likelihood Estimation
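The M-step update implied by the slide can be sketched for a single link. The per-document posteriors `p_fire` (parent and child both active and the link fired) and `p_parent` (parent active) are hypothetical stand-ins for the sufficient statistics the E-step would collect; the exact estimator in the paper may differ.

```python
# Minimal EM-style M-step sketch for one link (parent -> child), assuming
# the E-step already produced per-document posteriors:
#   p_fire[d]   = P(link fired, i.e. parent and child both active) in doc d
#   p_parent[d] = P(parent active) in doc d

def m_step_link_weight(p_fire, p_parent, eps=1e-12):
    """New link weight = expected link firings / expected parent activations."""
    return sum(p_fire) / (sum(p_parent) + eps)

w = m_step_link_weight([0.9, 0.2, 0.0], [1.0, 0.5, 0.1])
# w ≈ 1.1 / 1.6 ≈ 0.6875
```

This is the usual maximum-likelihood shape for noisy-OR parameters: the weight moves toward the fraction of parent activations that actually fired the link.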

SLIDE 15

• Domain keyphrases are connected to content units
  • Helps infer document keyphrases from content units
• Domain keyphrases are interconnected
  • Helps infer document keyphrases from other keyphrases

Task 2: Structure Learning

SLIDE 16

• Data-driven, DAG, similar to an ontology
• Heuristic: two nodes are connected only if they are
  • Closely related: word2vec similarity
  • Co-occurring frequently
• Links always point to less frequent nodes
• Works well in practice

A Heuristic Approach
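The edge heuristic above can be sketched as follows. The thresholds, the embedding and count inputs, and the helper names are all illustrative assumptions; only the three rules (embedding similarity, frequent co-occurrence, link toward the less frequent node) come from the slide.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def maybe_edge(a, b, emb, cooccur, freq, sim_thresh=0.5, co_thresh=5):
    """Return a directed edge (parent, child) or None.

    emb: node -> embedding vector; cooccur: {frozenset pair: count};
    freq: node -> corpus frequency. The more frequent node is the parent,
    which keeps links pointing at less frequent nodes.
    """
    if cosine(emb[a], emb[b]) < sim_thresh:
        return None                     # not closely related
    if cooccur.get(frozenset((a, b)), 0) < co_thresh:
        return None                     # do not co-occur often enough
    return (a, b) if freq[a] >= freq[b] else (b, a)

emb = {"machine learning": [1.0, 0.1], "svm": [0.9, 0.2]}
cooccur = {frozenset(("machine learning", "svm")): 12}
freq = {"machine learning": 1000, "svm": 200}
edge = maybe_edge("machine learning", "svm", emb, cooccur, freq)
# edge == ("machine learning", "svm")
```

Directing every link from the more frequent node to the less frequent one is one simple way to avoid cycles, consistent with the slide's DAG requirement.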

SLIDE 17


SLIDE 18

• Exact inference is slow: it is NP-hard to compute posterior probabilities for noisy-OR networks
• Approximate inference instead:
  • Pruning irrelevant nodes using an efficient scoring function
  • Gibbs sampling

Inference
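The pruning step mentioned on this slide can be sketched as a cheap pre-filter before sampling. The scoring function here (summed link weight to the document's observed content units) is an illustrative assumption, not necessarily the paper's exact one.

```python
# Sketch of the pruning idea: score each candidate keyphrase cheaply
# against the document's observed content units, keep only the top ones,
# and run the expensive sampler on that small subgraph.

def prune_candidates(content_units, link_weight, top_k=100):
    """Keep the top_k keyphrases by summed link weight to observed units.

    link_weight: {(keyphrase, content_unit): weight} for network edges.
    """
    scores = {}
    for (kp, unit), w in link_weight.items():
        if unit in content_units:
            scores[kp] = scores.get(kp, 0.0) + w
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]

weights = {("data mining", "clustering"): 0.7,
           ("data mining", "graph"): 0.4,
           ("databases", "sql"): 0.9}
kept = prune_candidates({"clustering", "graph"}, weights, top_k=1)
# kept == ["data mining"]
```

Keyphrases with no link to any observed content unit never enter the score table, so they are dropped before sampling ever sees them.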

SLIDE 19

• Two text-related tasks to evaluate document representation quality:
  • Phrase relatedness
  • Document classification
• Two datasets

Experiments

SLIDE 20


• ESA (Explicit Semantic Analysis)
• KBLink: uses link structure in Wikipedia
• BoW (bag-of-words)
• ESA-C: extends ESA by replacing Wikipedia with the domain corpus
• LSA (Latent Semantic Analysis)
• LDA (Latent Dirichlet Allocation)
• Word2Vec: a neural network computing word embeddings
• EKM: uses explicit keyphrase detection

Methods

SLIDE 21


Phrase Relatedness Correlation
Document Classification

SLIDE 22


Case Study

SLIDE 23


SLIDE 24


Time Complexity

[Charts: Running Time (ms) vs. #Samples, #Quality Phrases After Pruning, and #Words, each on the Academia and Yelp datasets]

SLIDE 25


Breakdown of Processing Time

SLIDE 26


• We have introduced a novel document representation method using latent keyphrases
  • Each dimension is explainable
  • Works for short text
  • Works for closed-domain text
• We have developed an efficient inference method for real-time keyphrase identification
• Future work:
  • Better structure learning approaches
  • Combining with knowledge bases
  • Trying inference methods other than Gibbs sampling
• Code available at http://jialu.info

Conclusion