Covariance in Unsupervised Learning of Probabilistic Grammars

Covariance in Unsupervised Learning of Probabilistic Grammars
Cohen and Smith (2010)
Presenter: Alice Lai

Introduction
• A framework for modeling covariance in probabilistic grammars
• Express priors using logistic normal distributions
• Experiments on dependency grammar induction with parameter tying within and across grammars

Grammar Induction
• Grammar induction: unsupervised discovery of grammatical structure
• Bayesian models are used to specify priors over probabilistic grammars
• Many models use Dirichlet distributions because the Dirichlet is a conjugate prior to the multinomial

Dependency Grammars
• Syntax is a directed tree: words are vertices, edges are dependency relations
• Two words have a dependency relation if one is an argument or modifier of the other
Figure from Nivre (2005), Dependency Grammar and Dependency Parsing.

Dependency Model with Valence
• Proposed by Klein and Manning (2004)
• Each word has:
  • Binomial distribution over whether it has any left/right children
  • Geometric distribution over the number of left/right children
• Inference is cubic in the length of the sentence
• Maximum likelihood via EM algorithm

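DMV Example
Sentence: x = ⟨$ DT JJ NN VBZ⟩; y is the dependency tree over $ DT JJ NN VBZ (tree figure).
\[ p(\mathbf{x}, \mathbf{y} \mid \theta) = \theta_c(\mathrm{VBZ} \mid \$, r) \times p(\mathbf{y}^{(1)} \mid \mathrm{VBZ}, \theta) \]
\[ p(\mathbf{y}^{(1)} \mid \mathrm{VBZ}, \theta) = \theta_s(\neg\mathrm{stop} \mid \mathrm{VBZ}, l, f) \times \theta_c(\mathrm{NN} \mid \mathrm{VBZ}, l) \times p(\mathbf{y}^{(2)} \mid \mathrm{NN}, \theta) \times \theta_s(\mathrm{stop} \mid \mathrm{VBZ}, l, t) \times \theta_s(\mathrm{stop} \mid \mathrm{VBZ}, r, f) \]
\[ p(\mathbf{y}^{(2)} \mid \mathrm{NN}, \theta) = \theta_s(\neg\mathrm{stop} \mid \mathrm{NN}, l, f) \times \theta_c(\mathrm{JJ} \mid \mathrm{NN}, l) \times \theta_s(\mathrm{stop} \mid \mathrm{JJ}, r, f) \times \theta_s(\mathrm{stop} \mid \mathrm{JJ}, l, f) \times \theta_s(\neg\mathrm{stop} \mid \mathrm{NN}, l, t) \times \theta_c(\mathrm{DT} \mid \mathrm{NN}, l) \times \theta_s(\mathrm{stop} \mid \mathrm{DT}, r, f) \times \theta_s(\mathrm{stop} \mid \mathrm{DT}, l, f) \times \theta_s(\mathrm{stop} \mid \mathrm{NN}, l, t) \times \theta_s(\mathrm{stop} \mid \mathrm{NN}, r, f) \]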
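The factorization on this slide can be traced with a short recursive sketch. The parameter values below are toy numbers chosen for illustration, not estimates from any corpus, and the dictionary layout (`theta_stop`, `theta_attach`, `children`) is a hypothetical encoding. Heads are keyed by POS tag, which only works here because each tag occurs once in the example sentence.

```python
from collections import defaultdict

# Toy parameters (illustrative assumptions, not learned values).
# theta_stop[(head, direction, has_child)] = theta_s(stop | head, direction, has_child)
theta_stop = defaultdict(lambda: 0.5)
# theta_attach[(child, head, direction)] = theta_c(child | head, direction)
theta_attach = defaultdict(lambda: 0.25)


def subtree_prob(head, children):
    """Probability of the subtree rooted at `head` under the DMV factorization."""
    prob = 1.0
    for direction in ("l", "r"):
        has_child = False  # the f/t flag on the slide
        for child in children.get(head, {}).get(direction, []):
            prob *= 1.0 - theta_stop[(head, direction, has_child)]  # not-stop
            prob *= theta_attach[(child, head, direction)]          # choose child
            prob *= subtree_prob(child, children)                   # recurse
            has_child = True
        prob *= theta_stop[(head, direction, has_child)]            # stop
    return prob


# Tree from the slide: VBZ heads NN on its left; NN heads JJ, then DT, on its left.
children = {"VBZ": {"l": ["NN"]}, "NN": {"l": ["JJ", "DT"]}}

# p(x, y | theta) = theta_c(VBZ | $, r) * p(y^(1) | VBZ, theta)
joint = theta_attach[("VBZ", "$", "r")] * subtree_prob("VBZ", children)
print(joint)
```

With these toy values every stop or not-stop factor is 0.5 and every attachment factor is 0.25, so the product has one factor per term in the equations above.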

Modeling Covariance
• We expect to see covariance in probabilistic grammars
• Words and word classes (e.g. parts of speech) follow patterns
• Example: the probability that a word class has singular noun arguments is related to the probability that it has plural noun arguments
• Use logistic normal distribution to model covariance

Logistic Normal Distribution
• Logistic transformation of a multivariate normal distribution to points on the probability simplex
• Used by Blei and Lafferty (2006) for correlated topic models
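As a rough illustration of the transformation, the sketch below draws a vector from a multivariate normal and maps it to the simplex with a softmax. The mean and covariance are made-up values, the helper names are hypothetical, and only the Python standard library is used (a hand-written Cholesky factor stands in for a linear algebra package).

```python
import math
import random


def cholesky(sigma):
    """Lower-triangular L with L L^T = sigma (sigma must be positive definite)."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def sample_normal(mu, sigma, rng):
    """Draw eta ~ N(mu, sigma) as mu + L z with z standard normal."""
    L = cholesky(sigma)
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + sum(L[i][k] * z[k] for k in range(i + 1)) for i, m in enumerate(mu)]


def softmax(eta):
    """Logistic transformation: map a real vector to the probability simplex."""
    m = max(eta)  # subtract the max for numerical stability
    exps = [math.exp(e - m) for e in eta]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
mu = [0.0, 0.0, 0.0]
# Events 0 and 1 are positively correlated; event 2 is independent of both.
sigma = [[1.0, 0.8, 0.0],
         [0.8, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
theta = softmax(sample_normal(mu, sigma, rng))
print(theta)
```

The off-diagonal entries of `sigma` are what a Dirichlet prior cannot express: they let the probabilities of two events rise and fall together across draws.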

Limitations of LN Distribution
• Covariance only modeled within a multinomial, not across multinomials
• Probabilistic grammar models involve multiple multinomials
• We want to model the correlation between different verb types (VBD, VBZ) both taking nouns as arguments

Partitioned LN Distribution
• Define a Gaussian over $N = \sum_{k=1}^{K} N_k$ variables with one $N \times N$ covariance matrix
• Covariance matrix models correlations between all pairs of events across all multinomials
• Apply the logistic transformation to subvectors to get individual multinomials
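Written out, the construction above amounts to one Gaussian draw followed by a separate softmax per block. The indexing $\eta_{k,i}$ (entry $i$ of the $k$-th subvector) is a notational assumption made here for readability:

```latex
\[
  \eta \sim \mathcal{N}(\mu, \Sigma), \qquad
  \eta \in \mathbb{R}^{N}, \qquad
  N = \sum_{k=1}^{K} N_k
\]
\[
  \theta_{k,i} = \frac{\exp(\eta_{k,i})}{\sum_{i'=1}^{N_k} \exp(\eta_{k,i'})},
  \qquad k = 1, \dots, K, \quad i = 1, \dots, N_k
\]
```

Each $\theta_k$ sums to one on its own, while off-diagonal blocks of $\Sigma$ couple events that belong to different multinomials.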

Shared LN Distribution
• $N \times N$ size covariance matrix is expensive to create
• Instead of a single normal vector for all multinomials, use several normal vectors
• Partition normal vectors, use $N$ normal experts to sample from multinomials, recombine parts of vectors and take average
• Result: $\theta \sim \mathrm{SLN}(\mu, \Sigma, \delta)$
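A minimal sketch of the recombination step, under stated assumptions: the experts are given as fixed vectors rather than sampled from their Gaussians, and the `shares` structure of `(expert index, start, end)` triples is a hypothetical encoding of which slice of which expert feeds each multinomial.

```python
import math


def softmax(eta):
    """Map a real vector to the probability simplex."""
    m = max(eta)
    exps = [math.exp(e - m) for e in eta]
    total = sum(exps)
    return [e / total for e in exps]


def shared_logistic_normal(experts, shares):
    """For each multinomial, average its expert slices, then apply softmax."""
    thetas = []
    for parts in shares:
        slices = [experts[n][start:end] for n, start, end in parts]
        averaged = [sum(column) / len(slices) for column in zip(*slices)]
        thetas.append(softmax(averaged))
    return thetas


# Three experts: one per multinomial, plus one shared across both.
experts = [
    [0.0, 0.0],            # expert 0: multinomial 0 only
    [1.0, -1.0],           # expert 1: multinomial 1 only
    [2.0, 0.0, 1.0, 1.0],  # expert 2: first half -> multinomial 0, second half -> multinomial 1
]
shares = [
    [(0, 0, 2), (2, 0, 2)],  # multinomial 0 averages expert 0 and expert 2[0:2]
    [(1, 0, 2), (2, 2, 4)],  # multinomial 1 averages expert 1 and expert 2[2:4]
]
thetas = shared_logistic_normal(experts, shares)
print(thetas)
```

Because both multinomials draw on expert 2, any covariance between the two halves of that expert carries over to the two multinomials without needing a full covariance matrix over all events.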

SLN Example

Bayesian Models over Grammars
• Use maximum a posteriori framework for learning with symmetric Dirichlet priors (Smith 2006)
• This model: treat $\theta$ as a hidden variable: integrate out $\theta$ in the probability of the data
• Estimate $\alpha$, the distribution over grammar parameters

Two Model Variations
Model 1: grammar parameters $\theta$ drawn once per sentence
Model 2: grammar parameters $\theta$ drawn once for all sentences in corpus
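One way to write the difference between the two variations is through the marginal likelihood of a corpus. The corpus notation $x_1, \dots, x_M$ and the per-sentence parameters $\theta_m$ are assumptions introduced here; $\alpha$ stands for the parameters of the prior, as on the previous slide.

```latex
% Model 1: a fresh theta_m is drawn for every sentence
\[
  p(x_1, \dots, x_M \mid \alpha)
    = \prod_{m=1}^{M} \int p(\theta_m \mid \alpha)
      \sum_{y} p(x_m, y \mid \theta_m) \, d\theta_m
\]
% Model 2: a single theta is drawn once and shared by the whole corpus
\[
  p(x_1, \dots, x_M \mid \alpha)
    = \int p(\theta \mid \alpha)
      \prod_{m=1}^{M} \sum_{y} p(x_m, y \mid \theta) \, d\theta
\]
```

In Model 1 the integral sits inside the product, so sentences are independent given $\alpha$; in Model 2 the shared $\theta$ couples all sentences.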

Choosing the Prior Distribution
• Raiffa and Schlaifer (1961) establish 3 necessary qualities for prior distributions
  1) Analytical tractability
  2) Richness
  3) Interpretability
• Most literature has focused on (1), using a Dirichlet prior because it is conjugate to the multinomial family
• What about (2) and (3)?

Dirichlet Priors
• Computationally, a good choice for prior because of analytic tractability
• May encourage sparse solutions (eliminating unnecessary grammar rules)
• However, no explicit covariance structure when drawing $\theta$ from a Dirichlet distribution

LN Priors
• Define one LN distribution for each multinomial
• SLN covariance: define one normal expert for each single multinomial and other experts across related multinomials
• Prior over $\theta_k$ that allows covariance among $\langle \theta_{k,1}, \dots, \theta_{k,N_k} \rangle$
• For SLN, covariance among $\theta_{k,i}$ not directly defined
• Normal experts $\eta_{i,j}$ define this relationship. Think of $\eta_{i,j}$ as weights associated with event probabilities.

Decoding
• How to choose an analysis (grammatical structure $\mathbf{y}$) given the input
• Viterbi decoding: the most likely analysis
• Minimum Bayes risk decoding: the analysis that minimizes risk
• $\mathrm{cost}(\mathbf{y}, \mathbf{y}^{*})$ is the cost of choosing $\mathbf{y}$ when the correct analysis is $\mathbf{y}^{*}$
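The contrast between the two decoders can be sketched on a toy posterior. Everything below is an illustrative assumption: analyses are encoded as tuples of head indices (0 is the root), the posterior is restricted to three hand-picked candidates with made-up probabilities, and the cost is the number of words whose head differs. A real parser would compute the risk over all trees with dynamic programming rather than enumerate candidates.

```python
def attachment_cost(y, y_star):
    """Number of words whose head in y differs from the head in y_star."""
    return sum(1 for head, head_star in zip(y, y_star) if head != head_star)


def viterbi_decode(posterior):
    """Return the single most probable analysis."""
    return max(posterior, key=posterior.get)


def mbr_decode(posterior, cost):
    """Return the analysis with the lowest expected cost under the posterior."""
    def risk(y):
        return sum(p * cost(y, y_star) for y_star, p in posterior.items())
    return min(posterior, key=risk)


# Toy posterior over three candidate analyses of a 3-word sentence.
# Entry i of each tuple is the head of word i+1 (0 means root).
posterior = {
    (0, 1, 1): 0.4,  # most probable, but far from the other two
    (2, 0, 2): 0.3,
    (2, 0, 1): 0.3,
}

viterbi = viterbi_decode(posterior)
mbr = mbr_decode(posterior, attachment_cost)
print(viterbi, mbr)
```

Here Viterbi picks the 0.4 analysis, while MBR prefers an analysis that agrees with more of the posterior mass on individual attachments (expected cost 1.1 versus 1.5 for either alternative).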

3 Decoding Techniques
1) Viterbi decoding applied to point estimate of $\theta$
2) MBR decoding applied to point estimate of $\theta$
  • Loss function is dependency attachment error.
3) Committee decoding: randomly sample grammar weights, apply decoding, average results
  • Viterbi and MBR ignore covariance matrix $\Sigma$
  • This method has generalization error guarantees

Variational Inference
• Bound the log-likelihood and optimize with respect to approximate posterior $q(\theta, \mathbf{y})$
• Mean-field approximation: $q(\theta, \mathbf{y})$ is factorized and has form $q(\theta, \mathbf{y}) = q(\theta)\, q(\mathbf{y})$
• LN prior requires additional approximation because of lack of conjugacy
  • First-order Taylor approximation to log of normalization of LN distribution
  • Use inside-outside algorithm with weighted grammar for inference
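The bound referred to above is the standard one obtained from Jensen's inequality. The version below is a generic sketch for a single sentence $\mathbf{x}$, written in terms of $\theta$ for brevity; the exact form used by Cohen and Smith (stated over the underlying normal variables and including the Taylor approximation) is not reproduced here.

```latex
\[
  \log p(\mathbf{x} \mid \mu, \Sigma)
    = \log \int \sum_{\mathbf{y}}
        p(\theta \mid \mu, \Sigma)\, p(\mathbf{x}, \mathbf{y} \mid \theta)\, d\theta
    \;\ge\;
    \mathbb{E}_{q}\!\left[
      \log p(\theta \mid \mu, \Sigma)
      + \log p(\mathbf{x}, \mathbf{y} \mid \theta)
      - \log q(\theta, \mathbf{y})
    \right]
\]
\[
  q(\theta, \mathbf{y}) = q(\theta)\, q(\mathbf{y})
\]
```

The gap in the inequality is the KL divergence from $q(\theta, \mathbf{y})$ to the true posterior, so tightening the bound moves $q$ toward the posterior within the factorized family.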
