SLIDE 1

“Riemannian Optimization Methods for Deep Learning” (Metode de optimizare Riemanniene pentru învăţare profundă), a project co-financed by the European Regional Development Fund through the Competitiveness Operational Programme 2014-2020.

On The Information Geometry of Word Embedding

Riccardo Volpi, joint work with D. Marinelli, P. Hlihor, and L. Malagò

Romanian Institute of Science and Technology
Synergies in GDA Workshop, 08 December 2017


SLIDE 3

1/6

Word Embedding

A word embedding maps the words of a dictionary into a real vector space, based on the notion of context: “You shall know a word by the company it keeps” (Firth, 1957).

p(χ | w) = exp(u_wᵀ v_χ) / Z_w

▸ This is the general model used by Skip-Gram (Mikolov et al., ’13) and GloVe (Pennington et al., ’14)
▸ Analogies of the form a : b = c : d can be solved by

  argmin_d ‖u_a − u_b − u_c + u_d‖² = argmin_d Σ_{χ∈D} ( ln [p(χ|a) / p(χ|b)] − ln [p(χ|c) / p(χ|d)] )²

▸ The space of word embeddings has a linear geometry (cf. Arora et al., ’16), where vectors express semantic relationships between contexts
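The softmax model and the vector-arithmetic analogy solver described above can be sketched in a few lines of NumPy. This is a toy illustration with random embeddings, not the talk's actual data; all names and sizes are hypothetical, and one exact analogy is planted so the solver has a well-defined answer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embeddings for illustration only: rows of U are word vectors u_w,
# rows of V are context vectors v_chi (sizes are arbitrary).
dim, n_words = 8, 50
U = rng.normal(size=(n_words, dim))
V = rng.normal(size=(n_words, dim))
# Plant an exact analogy so that word 3 answers "0 : 1 = 2 : ?".
U[3] = U[1] + U[2] - U[0]

def conditional(w):
    """p(chi | w) = exp(u_w^T v_chi) / Z_w, i.e. a softmax over contexts."""
    scores = V @ U[w]
    scores -= scores.max()          # subtract max for numerical stability
    p = np.exp(scores)
    return p / p.sum()

def solve_analogy(a, b, c):
    """Solve a : b = c : ? as argmin_d ||u_a - u_b - u_c + u_d||^2."""
    target = U[b] + U[c] - U[a]     # the minimizer satisfies u_d ~ u_b + u_c - u_a
    return int(np.argmin(np.linalg.norm(U - target, axis=1)))

answer = solve_analogy(0, 1, 2)     # word 3, by construction
```

In practice the query words a, b, c are usually excluded from the argmin; that detail is omitted here for brevity.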


SLIDE 5

2/6

Exponential Family and Conditional Distributions

Consider the joint probability distribution for W and X:

p(χ, w) = exp(wᵀ C χ) / Z,  with C = Uᵀ V

▸ The conditional distributions p(χ | w) = exp(u_wᵀ v_χ) / Z_w lie on the boundary of the joint statistical model
▸ Each column vector of U identifies a point p_w in the conditional model
▸ For a fixed V, all conditional simplexes are homeomorphic to one another

We aim at characterizing the geometry of word embeddings, based on alternative geometries for the exponential family studied in Information Geometry (Amari and Nagaoka, ’00).
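The relation between the joint model p(χ, w) = exp(wᵀCχ)/Z with C = UᵀV and its conditionals can be checked numerically: conditioning on a one-hot w simply renormalizes one row of the joint table. A minimal sketch with toy random matrices (all sizes hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
dim, n_words, n_contexts = 4, 6, 6   # toy sizes, purely illustrative

# U has the word vectors u_w as columns, V the context vectors v_chi.
U = rng.normal(size=(dim, n_words))
V = rng.normal(size=(dim, n_contexts))
C = U.T @ V                          # C = U^T V, as in the joint model

# Joint model over one-hot pairs (w, chi): p(chi, w) = exp(w^T C chi) / Z.
joint = np.exp(C)
joint /= joint.sum()                 # global normalization by Z

# Conditioning on w renormalizes one row: p(chi | w) = exp(u_w^T v_chi) / Z_w.
cond = joint / joint.sum(axis=1, keepdims=True)
```

Each row of `cond` is one conditional distribution p_w, i.e. one point in the conditional simplex identified by the column u_w of U.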

SLIDE 6

3/6

Geometric Word Analogies

Let p_w be the conditional probability p(χ | W = w), and let p be a reference distribution.

▸ The logarithmic map Log_{p_a} : M → T_{p_a}M defines ∆_a^b = Log_{p_a}(p_b)
▸ The parallel transport Π_{p_a}^p : T_{p_a}M → T_pM carries tangent vectors to T_pM
▸ Norms are computed by ‖A‖²_p = aᵀ I(p) a, where I(p) is the Fisher information matrix

Analogies of the form a : b = c : ? can be solved by

  argmin_d ‖ Π_{p_a}^p ∆_a^b − Π_{p_c}^p ∆_c^d ‖²_p
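The norm ‖A‖²_p = aᵀ I(p) a is easy to compute for categorical distributions, using the standard fact that in the overparametrized simplex coordinates the Fisher metric acts as I(p) = diag(1/p) on tangent vectors whose entries sum to zero. This is a generic sketch of that computation, not the talk's exact parametrization:

```python
import numpy as np

def fisher_norm_sq(a, p):
    """||A||_p^2 = a^T I(p) a for a categorical distribution p.

    Assumes the standard simplex coordinates, where the Fisher metric
    is I(p) = diag(1/p) restricted to tangent vectors with sum(a) = 0.
    """
    return float(np.sum(a * a / p))

p = np.array([0.5, 0.3, 0.2])
a = np.array([0.10, -0.05, -0.05])   # a valid tangent vector: entries sum to zero
norm_sq = fisher_norm_sq(a, p)
```

Note how components sitting on low-probability coordinates are weighted more heavily than under the Euclidean norm.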

SLIDE 7

4/6

The Framework in Practice: The Full Simplex

▸ For d = #(D), any point (ρ)χ in the interior of the simplex

corresponds to a conditional probability p(χ∣W = w)

▸ By setting ρ ↦ √ρ, the probability simplex is mapped to the

positive spherical orthant and the geometry of the sphere is

  • btained
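The √ρ map makes the full-simplex geometry concrete: geodesic distances under the Fisher-Rao metric become arc lengths on the sphere. A minimal sketch, using the standard closed form d(p, q) = 2 arccos(Σ_i √(p_i q_i)) (the factor 2 corresponds to the radius-2 sphere on which the round metric matches the Fisher metric):

```python
import numpy as np

def simplex_to_sphere(p):
    """The map rho -> sqrt(rho) sends the probability simplex onto the
    positive orthant of the unit sphere (the squares sum to 1)."""
    return np.sqrt(p)

def fisher_rao_distance(p, q):
    """Geodesic distance between two categoricals under the Fisher-Rao
    metric, computed as an arc length on the sphere:
    d(p, q) = 2 * arccos( sum_i sqrt(p_i * q_i) )."""
    inner = np.clip(np.dot(simplex_to_sphere(p), simplex_to_sphere(q)), -1.0, 1.0)
    return 2.0 * np.arccos(inner)

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])
d_pq = fisher_rao_distance(p, q)
```

The `np.clip` guards against floating-point round-off pushing the inner product slightly above 1 for nearly identical distributions.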

SLIDE 9

5/6

The Framework in Practice: The Exponential Family

▸ For d ≤ #(D), the Riemannian geometry of the exponential family is defined by the Fisher-Rao metric
▸ Moreover, there are at least two other affine geometries of interest: the exponential geometry and the mixture geometry

▸ [Proposition] Let p₀ be the uniform distribution over D, and let ᵉΠ_q^p and ᵉ∆_a^b be defined according to the exponential geometry. Under the hypothesis of an isotropic distribution for the v’s,

  argmin_d ‖ ᵉΠ_{p_a}^p (ᵉ∆_a^b) − ᵉΠ_{p_c}^p (ᵉ∆_c^d) ‖²_{p₀}

reduces to

  argmin_d ‖u_a − u_b − u_c + u_d‖²
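A rough sketch of why the reduction holds, using standard Information Geometry facts (this is a reconstruction, not the talk's proof): the key point is that the exponential connection is flat in the natural parameters, where it acts by plain vector subtraction.

```latex
% For fixed V the conditionals p_w form an exponential family with
% natural parameter u_w.  The e-connection is flat in these coordinates:
%     eLog_{p_a}(p_b)  corresponds to  u_b - u_a,
% and e-parallel transport acts as the identity on natural parameters.
% The compared (transported) vectors are therefore
%     (u_b - u_a) - (u_d - u_c) = -(u_a - u_b - u_c + u_d),
% and if the v_chi are isotropically distributed, the Fisher norm at
% the uniform p_0 is proportional to the Euclidean norm, yielding
%     argmin_d  || u_a - u_b - u_c + u_d ||^2 .
```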


SLIDE 11

6/6

Conclusions and Future Perspectives

▸ The language of Information Geometry can be used to describe the geometry of word embeddings
▸ We have defined a parameter-invariant way to solve word analogies
▸ The exponential geometry of the exponential family allows us to recover the standard way of solving word analogies
▸ Future perspective: evaluating experimentally the role of different geometries of word embeddings

“One geometry cannot be more true than another; it can only be more convenient.” Henri Poincaré, Science and Hypothesis, 1902.