Word Semantic Representations using Bayesian Probabilistic Tensor Factorization
Jingwei Zhang, Jeremy Salwen, Michael Glass, and Alfio Gliozzo
Department of Computer Science, Columbia University; IBM T.J. Watson Research Center
Tuesday 21st
Outline
1 Introduction
Objectives Motivating Idea
2 Bayesian Probabilistic Tensor Factorization
Background Model Algorithm
3 Experimental Validation
Resources Task Results
4 Related Works
Word Vector Representations
5 Conclusion
Objectives
Combining word relatedness measures. There are many approaches to word relatedness:
- Manually constructed lexical resources
- Distributional vector space approaches
- Topic-based vector spaces
- Continuous word representations
Goal: a word embedding method capable of distinguishing synonyms from antonyms.
- J. Zhang, J. Salwen, M. Glass, A. Gliozzo
Columbia University & IBM Research Word Semantic Representations with BPTF
Motivating Idea
Resources for word relatedness can be complementary: manual resources capture interesting relationships, while automatic methods provide high coverage without extensive human effort.
Collaborative Filtering
Bayesian Probabilistic Matrix Factorization (BPMF) was introduced for collaborative filtering (Salakhutdinov and Mnih 2008 [10]). Bayesian Probabilistic Tensor Factorization (BPTF) extended it with temporal factors (Xiong et al. 2010 [13]). Both achieve competitive results on real-world recommendation data sets.
Hypothesis
- There is some latent set of word vectors.
- The word relatedness measures are constructed through these latent vectors.
- Each word relatedness measure has an associated perspective vector.
- Combining the perspective with the dot product of the word vectors gives the word relatedness measure, plus some Gaussian noise.
Basics
Bayesian probabilistic: we determine the probability of a parameterization of our model by considering the probability of the data given the model and the prior for the model.
Tensor factorization: we find vectors that, when combined, give high probability to the observed tensor.
BPTF Model - Tensor
Relatedness tensor R ∈ ℝ^(N×N×K), where N is the number of words and K the number of relatedness measures (perspectives).
[Example slices over the words joy, gladden, sorrow, sadden, anger: R(1), lexical similarity, with entries of 1 for synonym pairs, −1 for antonym pairs, and blanks for unobserved pairs; R(2), distributional similarity, with continuous scores (e.g. .1 to .7).]
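As a sketch of how such a relatedness tensor might be assembled, with unobserved entries marked NaN (the toy vocabulary and scores below are illustrative, not the paper's actual data):

```python
import numpy as np

# Toy vocabulary; scores are illustrative only.
words = ["joy", "gladden", "sorrow", "sadden", "anger"]
N, K = len(words), 2

# R[:, :, 0]: lexical slice (+1 synonyms, -1 antonyms, NaN unobserved).
# R[:, :, 1]: distributional-similarity slice (continuous scores).
R = np.full((N, N, K), np.nan)
idx = {w: i for i, w in enumerate(words)}

def set_pair(k, w1, w2, score):
    """Record a symmetric relatedness score in slice k."""
    i, j = idx[w1], idx[w2]
    R[i, j, k] = R[j, i, k] = score

set_pair(0, "joy", "gladden", 1.0)    # synonyms
set_pair(0, "joy", "sorrow", -1.0)    # antonyms
set_pair(1, "sorrow", "sadden", 0.7)  # distributionally similar

observed = ~np.isnan(R)  # mask of observed entries used during factorization
```

Only the observed entries enter the likelihood; the factorization then fills in the rest.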
BPTF Model [10, 13]
R(k)ij | Vi, Vj, Pk ∼ N(⟨Vi, Vj, Pk⟩, α⁻¹),

where ⟨·, ·, ·⟩ is a generalization of the dot product:

⟨Vi, Vj, Pk⟩ ≡ Σ_{d=1}^{D} Vi(d) Vj(d) Pk(d)

- α is the precision, the reciprocal of the variance
- Vi and Vj are the latent vectors of word i and word j
- Pk is the latent vector for perspective k
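A minimal sketch of this observation model, assuming toy random vectors (`tri_dot` is a hypothetical helper name, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 40  # latent dimensionality used in the paper

def tri_dot(v_i, v_j, p_k):
    """Generalized dot product <V_i, V_j, P_k> = sum_d V_i[d]*V_j[d]*P_k[d]."""
    return float(np.sum(v_i * v_j * p_k))

# One observation R^(k)_ij ~ N(<V_i, V_j, P_k>, 1/alpha)
v_i, v_j, p_k = rng.normal(size=(3, D))
alpha = 2.0                              # precision = 1 / variance
mean = tri_dot(v_i, v_j, p_k)
r_kij = rng.normal(mean, 1.0 / np.sqrt(alpha))
```

Note that the generalized dot product is symmetric in Vi and Vj, matching the symmetry of word relatedness.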
Vectors and Perspectives
Vi ∼ N(µV, ΛV⁻¹),
Pk ∼ N(µP, ΛP⁻¹),

where µV and µP are D-dimensional mean vectors, and ΛV and ΛP are D-by-D precision matrices.
Hyperparameters

Conjugate priors:
p(α) = W(α | Ŵ0, ν̂0)
p(µV, ΛV) = N(µV | µ0, (β0 ΛV)⁻¹) W(ΛV | W0, ν0)
p(µP, ΛP) = N(µP | µ0, (β0 ΛP)⁻¹) W(ΛP | W0, ν0)
[Graphical model: each observed entry R(k)ij depends on the word vectors Vi, Vj (with prior parameters µV, ΛV) and the perspective vector Pk (with prior parameters µP, ΛP); hyperpriors µ0, W0, ν0; observation precision α; i, j = 1, ..., N with i ≠ j; k = 1, ..., K.]
Gibbs sampling
Algorithm 1: Gibbs Sampling for BPTF
  Initialize the parameters.
  repeat
    Sample the hyperparameters α, µV, ΛV, µP, ΛP
    for i = 1 to N do: sample Vi
    for k = 1 to 2 do: sample Pk
  until convergence
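The loop structure of Algorithm 1 can be sketched as follows. The `sample_*` functions here are placeholders for the true conjugate conditional posteriors (Gaussian for each Vi and Pk, Normal-Wishart for the hyperparameters), not the paper's actual updates:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, D, n_iter = 5, 2, 4, 10  # toy sizes; the paper uses D = 40

V = rng.normal(size=(N, D))    # latent word vectors
P = rng.normal(size=(K, D))    # latent perspective vectors

def sample_hyperparameters(V, P):
    # Placeholder: BPTF samples alpha and the Normal-Wishart parameters
    # (mu_V, Lambda_V, mu_P, Lambda_P) from their conjugate posteriors;
    # here we simply return a fixed precision.
    return 2.0

def sample_word_vector(i, V, P, alpha):
    # Placeholder for the Gaussian conditional posterior of V_i.
    return V[i] + 0.01 * rng.normal(size=V.shape[1])

def sample_perspective(k, V, P, alpha):
    # Placeholder for the Gaussian conditional posterior of P_k.
    return P[k] + 0.01 * rng.normal(size=P.shape[1])

samples = []
for _ in range(n_iter):                  # "until convergence" in the slide
    alpha = sample_hyperparameters(V, P)
    for i in range(N):                   # sample each word vector in turn
        V[i] = sample_word_vector(i, V, P, alpha)
    for k in range(K):                   # sample each perspective vector
        P[k] = sample_perspective(k, V, P, alpha)
    samples.append((V.copy(), P.copy()))  # keep samples for prediction
```

The retained samples are what the later prediction step averages over.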
Out-of-vocabulary embedding
Generalize to words not present in a given perspective:
- One option: include all words in the BPTF procedure.
- More efficient: Gibbs-sample only the word vectors Vi, then compute R(k)ij for the perspective of interest via the generalized dot product with Pk.
Predictions
Generalize and regularize the relatedness tensor by averaging over samples:

p(R̂(k)ij | R) ≈ (1/M) Σ_{m=1}^{M} p(R̂(k)ij | Vi(m), Vj(m), Pk(m), α(m))
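The Monte Carlo average can be sketched as follows, with hypothetical random draws standing in for real Gibbs posterior samples:

```python
import numpy as np

rng = np.random.default_rng(1)
D, M = 4, 50  # toy dimensionality and number of Gibbs samples

# Hypothetical posterior samples (in practice, the Gibbs sampler's output).
V_i_samples = rng.normal(size=(M, D))
V_j_samples = rng.normal(size=(M, D))
P_k_samples = rng.normal(size=(M, D))

# Posterior predictive mean of R^(k)_ij: average the generalized dot
# product <V_i^(m), V_j^(m), P_k^(m)> over the M samples.
per_sample = np.sum(V_i_samples * V_j_samples * P_k_samples, axis=1)
r_hat = per_sample.mean()
```

Averaging over samples both smooths the prediction and propagates posterior uncertainty, rather than committing to a single factorization.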
Tuning
- Number of dimensions for latent word and perspective vectors: D = 40
- Untuned hyper-priors: µ0 = 0, ν0 = ν̂0 = D, β0 = 1, W0 = Ŵ0 = I
Thesaurus
1 WordNet
2 Roget's Thesaurus
3 Encarta Thesaurus¹
4 Macquarie Thesaurus²

¹ Not available.
Neural word embeddings
- Linguistic regularities [7] (e.g. King − Man + Woman ≈ Queen).
- Better for rare words: morphologically trained word vectors [5].
(Source: T. Mikolov)
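A sketch of how such analogies are answered with vector arithmetic and cosine similarity; the 3-d embeddings below are hand-picked toys, not learned vectors:

```python
import numpy as np

# Toy 3-d embeddings chosen by hand so the analogy works;
# real models learn this structure from large text corpora.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.1, 0.8, 0.1]),
    "woman": np.array([0.1, 0.8, 0.9]),
    "queen": np.array([0.9, 0.8, 0.9]),
}

def analogy(a, b, c, emb):
    """Return the word whose vector is closest (by cosine) to a - b + c."""
    target = emb[a] - emb[b] + emb[c]
    best, best_sim = None, -np.inf
    for w, v in emb.items():
        if w in (a, b, c):        # exclude the query words themselves
            continue
        sim = v @ target / (np.linalg.norm(v) * np.linalg.norm(target))
        if sim > best_sim:
            best, best_sim = w, sim
    return best

analogy("king", "man", "woman", emb)  # -> "queen"
```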
Evaluation
The GRE antonym test dataset by Mohammad et al.
- Development set: 162 questions
- Test set: 950 questions

Example GRE antonym question: desultory
1 phobic  2 entrenched  3 fabulous  4 systematic  5 inconsequential
Previous Work
- Lin and Zhao [4] identify antonyms by looking for pre-identified phrases in corpus data.
- Turney [12] uses supervised classification for analogies, transforming antonym pairs into analogy relations.
- Mohammad et al. [8, 9] use corpus co-occurrence statistics and the structure of a published thesaurus.
- PILSA from Yih et al. [14] achieves state-of-the-art performance on the GRE antonym questions.
Evaluation
                               Dev. Set             Test Set
                               Prec. Rec.  F1       Prec. Rec.  F1
WordNet lookup                 0.40  0.40  0.40     0.42  0.41  0.42
WordNet PILSA                  0.63  0.62  0.62     0.60  0.60  0.60
WordNet MRLSA                  0.66  0.65  0.65     0.61  0.59  0.60
Encarta lookup                 0.65  0.61  0.63     0.61  0.56  0.59
Encarta PILSA                  0.86  0.81  0.84     0.81  0.74  0.77
Encarta MRLSA                  0.87  0.82  0.84     0.82  0.74  0.78
Encarta PILSA + S2Net + Embed  0.88  0.87  0.87     0.81  0.80  0.81
W&E MRLSA                      0.88  0.85  0.87     0.81  0.77  0.79
WordNet lookup                 0.48  0.44  0.46     0.46  0.43  0.44
WordNet&Morpho BPTF            0.63  0.63  0.63     0.63  0.62  0.62
Roget lookup                   0.61  0.44  0.51     0.55  0.39  0.45
Roget&Morpho BPTF              0.80  0.80  0.80     0.76  0.75  0.76
W&R lookup                     0.62  0.54  0.58     0.59  0.51  0.55
W&R BPMF                       0.59  0.59  0.59     0.52  0.52  0.52
W&R&Morpho BPTF                0.88  0.88  0.88     0.82  0.82  0.82
Convergence Curve
[Figure: convergence curves showing RMSE (0.0 to 2.5) vs. number of iterations (20 to 140) for BPMF and BPTF.]
Word Vector Representations
Core methods:
- Latent Semantic Analysis (LSA) (Deerwester et al. 1990 [2])
- Polarity Inducing LSA (PILSA): LSA on a thesaurus (Yih et al. 2012 [14])
- Distributional similarity (Harris 1954 [3])
- Neural language models (Mikolov 2012 [6]; Socher et al. 2011 [11]; Luong et al. 2013 [5])
- Multi-Relational LSA (MRLSA): Tucker decomposition over a tensor (Chang et al. 2013 [1])
Conclusion
- Combining word relatedness measures: BPTF can combine matrices that express word relatedness as a number.
- Word embeddings that distinguish antonyms: a key limitation of distributional approaches can be mitigated by adding a lexicon slice.
- Code: https://github.com/antonyms/AntonymPipeline
References I
[1] K.-W. Chang, W.-t. Yih, and C. Meek. Multi-relational latent semantic analysis. In EMNLP, 2013.
[2] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. JASIS, 41(6):391–407, 1990.
[3] Z. Harris. Distributional structure. Word, 10(23):146–162, 1954.
[4] D. Lin and S. Zhao. Identifying synonyms among distributionally similar words. In Proceedings of IJCAI-03, pages 1492–1493, 2003.
[5] M.-T. Luong, R. Socher, and C. D. Manning. Better word representations with recursive neural networks for morphology. In CoNLL, Sofia, Bulgaria, 2013.
References II
[6] T. Mikolov. Statistical language models based on neural networks. PhD thesis, Brno University of Technology, 2012.
[7] T. Mikolov, W.-t. Yih, and G. Zweig. Linguistic regularities in continuous space word representations. In Proceedings of NAACL-HLT, pages 746–751, 2013.
[8] S. Mohammad, B. Dorr, and G. Hirst. Computing word-pair antonymy. In EMNLP, pages 982–991. Association for Computational Linguistics, 2008.
[9] S. M. Mohammad, B. J. Dorr, G. Hirst, and P. D. Turney. Computing lexical contrast. Computational Linguistics, 39(3):555–590, 2013.
[10] R. Salakhutdinov and A. Mnih. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In ICML, pages 880–887. ACM, 2008.
References III
[11] R. Socher, C. C. Lin, C. Manning, and A. Y. Ng. Parsing natural scenes and natural language with recursive neural networks. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 129–136, 2011.
[12] P. D. Turney. A uniform approach to analogies, synonyms, antonyms, and associations. In COLING, pages 905–912, 2008.
[13] L. Xiong, X. Chen, T.-K. Huang, J. G. Schneider, and J. G. Carbonell. Temporal collaborative filtering with Bayesian probabilistic tensor factorization. In SDM, volume 10, pages 211–222. SIAM, 2010.
[14] W.-t. Yih, G. Zweig, and J. C. Platt. Polarity inducing latent semantic analysis. In EMNLP-CoNLL, pages 1212–1222. Association for Computational Linguistics, 2012.