Probabilistic Grammars and Hierarchical Dirichlet Processes


1. Probabilistic Grammars and Hierarchical Dirichlet Processes (Liang et al., 2009)

Sean Massung & Gourab Kundu
CS 598jhm, April 9th, 2013

2. Background

This paper (a book chapter) describes a Bayesian approach to the problem of syntactic parsing and the underlying problems of grammar induction and grammar refinement.

- Grammar induction: estimating grammars from raw sentences alone, without any other form of supervision. Early approaches performed poorly because of the coarse-grained nature of the syntactic categories.
- Grammar refinement: "splitting" coarse-grained syntactic categories into finer, more accurate and descriptive labels, e.g. parent annotation (syntactic) or lexicalization (semantic).

3. PCFG Example

Rule probabilities φ_s(γ):

    S  → NP VP      0.9
    S  → S CONJ S   0.1
    NP → JJ JJ NNS  0.5
    NP → PRP        0.5
    VP → VP NP      0.4
    VP → VBP NP     0.3
    VP → VBG NP     0.3

Example parse of "They have many theoretical ideas":

    (S (NP (PRP They))
       (VP (VBP have)
           (NP (JJ many) (JJ theoretical) (NNS ideas))))
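A minimal sketch of how this rule table might be represented and used to score the example parse; the data structures and function names below are our own illustrative choices, not from the slides.

```python
# Toy PCFG from the slide: each (nonterminal, expansion) pair maps to
# its rule probability phi_s(gamma). Emission probabilities for the
# preterminals (PRP, VBP, JJ, NNS) are omitted, as on the slide.
RULES = {
    ("S",  ("NP", "VP")):        0.9,
    ("S",  ("S", "CONJ", "S")):  0.1,
    ("NP", ("JJ", "JJ", "NNS")): 0.5,
    ("NP", ("PRP",)):            0.5,
    ("VP", ("VP", "NP")):        0.4,
    ("VP", ("VBP", "NP")):       0.3,
    ("VP", ("VBG", "NP")):       0.3,
}

def tree_prob(tree):
    """Product of rule probabilities over the internal nodes of a tree
    given as nested tuples (label, child, ...); leaves are plain strings."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return 1.0  # preterminal -> word: emission probability omitted
    p = RULES[(label, tuple(c[0] for c in children))]
    for c in children:
        p *= tree_prob(c)
    return p

# The parse of "They have many theoretical ideas" shown above:
t = ("S",
     ("NP", ("PRP", "They")),
     ("VP", ("VBP", "have"),
            ("NP", ("JJ", "many"), ("JJ", "theoretical"), ("NNS", "ideas"))))
print(tree_prob(t))  # 0.9 * 0.5 * 0.3 * 0.5 = 0.0675
```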

4. Mathematical Definition

Formally, a PCFG is specified by the following:

- Σ, a set of terminal symbols (the words in the sentence)
- S, a set of nonterminal symbols (the syntactic categories)
- Root ∈ S, a designated nonterminal starting symbol
- φ, rule probabilities: φ = (φ_s(γ) : s ∈ S, γ ∈ Σ ∪ (S × S)), such that φ_s(γ) ≥ 0 and

    $\sum_{\gamma} \phi_s(\gamma) = 1$

Note the restriction on γ: either γ ∈ Σ or γ ∈ (S × S). Such productions put the PCFG in Chomsky normal form.
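As a quick sanity check of the normalization constraint, reusing the RULES table from the sketch above (the helper below is our own illustration):

```python
from collections import defaultdict

def check_normalization(rules, tol=1e-9):
    """Verify that sum_gamma phi_s(gamma) = 1 for every nonterminal s."""
    totals = defaultdict(float)
    for (s, _gamma), p in rules.items():
        totals[s] += p
    return {s: abs(total - 1.0) < tol for s, total in totals.items()}

print(check_normalization(RULES))  # {'S': True, 'NP': True, 'VP': True}
```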

5. Mathematical Definition II

A parse tree has a set of nonterminal nodes N along with the corresponding symbols s = (s_i ∈ S : i ∈ N).

- Let N_E denote the nodes having one terminal child and N_B the nodes having two nonterminal children.
- The tree structure is represented by c = (c_j(i) : i ∈ N_B, j = 1, 2) for the nonterminal nodes and x = (x_i : i ∈ N_E) for the terminal nodes (the "yield").

The joint probability of a parse tree z = (N, s, c) and its yield x is then

    $p(x, z \mid \phi) = \prod_{i \in N_B} \phi_{s_i}\!\left(s_{c_1(i)}, s_{c_2(i)}\right) \prod_{i \in N_E} \phi_{s_i}(x_i)$
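This factorization can be transcribed directly, separating the binary-production factors (over N_B) from the emission factors (over N_E). The sketch reuses RULES and t from above; the emission table is invented for illustration, since the slides give no lexical probabilities.

```python
# Hypothetical emission probabilities phi^E_s(word); all values invented.
EMIT = {("PRP", "They"): 0.1, ("VBP", "have"): 0.2,
        ("JJ", "many"): 0.05, ("JJ", "theoretical"): 0.01,
        ("NNS", "ideas"): 0.02}

def joint_prob(tree):
    """p(x, z | phi): a product of binary-production factors over N_B and
    emission factors over N_E, mirroring the displayed equation."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return EMIT[(label, children[0])]              # i in N_E
    p = RULES[(label, tuple(c[0] for c in children))]  # i in N_B
    for c in children:
        p *= joint_prob(c)
    return p

print(joint_prob(t))  # tree_prob(t) times the emission factors
```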

6. HDP-PCFG: Generating the Parse Tree and Its Yield

Given rule probabilities φ, where each syntactic category z has φ^T_z (rule-type parameters), φ^E_z (emission parameters), and φ^B_z (binary-production parameters), we can generate a tree and its yield as follows.

For each node i in the parse tree:

- t_i ∼ Mult(φ^T_{z_i})
- If t_i = Emission: x_i ∼ Mult(φ^E_{z_i})
- If t_i = BinaryProduction: (z_{c_1(i)}, z_{c_2(i)}) ∼ Mult(φ^B_{z_i})
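A sketch of this top-down generative process (the dictionary-based parameterization and the recursion-depth guard are our own choices; the slides specify only the distributions):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_node(z, phi_T, phi_E, phi_B, depth=0, max_depth=10):
    """Recursively expand symbol z: draw a rule type t_i, then either
    emit a word or draw a pair of child symbols and recurse."""
    types, tp = zip(*phi_T[z].items())
    t = rng.choice(types, p=tp)
    if t == "Emission" or depth >= max_depth:  # guard against runaway recursion
        words, wp = zip(*phi_E[z].items())
        return (z, rng.choice(words, p=wp))
    pairs, bp = zip(*phi_B[z].items())
    left, right = pairs[rng.choice(len(pairs), p=bp)]
    return (z,
            sample_node(left,  phi_T, phi_E, phi_B, depth + 1, max_depth),
            sample_node(right, phi_T, phi_E, phi_B, depth + 1, max_depth))

# Tiny invented parameters for illustration:
phi_T = {"S": {"BinaryProduction": 1.0},
         "NP": {"Emission": 1.0}, "VP": {"Emission": 1.0}}
phi_E = {"S": {"<unk>": 1.0}, "NP": {"they": 1.0}, "VP": {"run": 1.0}}
phi_B = {"S": {("NP", "VP"): 1.0}}
print(sample_node("S", phi_T, phi_E, phi_B))  # ('S', ('NP', 'they'), ('VP', 'run'))
```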

7. This Paper's Focus

- Traditionally, PCFGs are defined with a fixed, finite S, and the parameters φ are fit using smoothed maximum likelihood.
- This paper develops a nonparametric version of the PCFG that allows S to be countably infinite.
- The model then performs posterior inference over S and the set of parse trees to find φ.
- This model is called the Hierarchical Dirichlet Process PCFG (HDP-PCFG) and is described in the next section.

8. HDP-PCFG: Generating the Grammar

- β ∼ GEM(α)
- For each grammar symbol z ∈ {1, 2, ...}:
  - φ^T_z ∼ Dir(α^T)
  - φ^E_z ∼ Dir(α^E)
  - φ^B_z ∼ DP(α^B, ββ^⊤)

What do β, φ^{T,E,B}_z, and ββ^⊤ look like?
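A NumPy sketch of a truncated version of this process; the truncation level K and the finite Dirichlet used to approximate DP(α^B, ββ^⊤) are our own assumptions (a standard truncation trick), not details from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

def gem_truncated(alpha, K):
    """Truncated stick-breaking draw of beta ~ GEM(alpha) with K sticks."""
    v = rng.beta(1.0, alpha, size=K)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    beta = v * remaining
    beta[-1] = 1.0 - beta[:-1].sum()  # fold leftover mass into the last stick
    return beta

K, vocab = 20, 50
alpha = alpha_T = alpha_E = alpha_B = 1.0
beta = gem_truncated(alpha, K)

phi_T = rng.dirichlet([alpha_T, alpha_T], size=K)       # over {emission, binary}
phi_E = rng.dirichlet(np.full(vocab, alpha_E), size=K)  # word distributions
base = np.outer(beta, beta).ravel()                     # flattened beta beta^T
phi_B = np.stack([rng.dirichlet(alpha_B * base + 1e-12) for _ in range(K)])
# phi_B[z].reshape(K, K)[l, r] approximates
# P(left child = l, right child = r | parent symbol z)
```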

9. HDP-PCFG: The Whole Process

- β ∼ GEM(α)
- For each grammar symbol z ∈ {1, 2, ...}:
  - φ^T_z ∼ Dir(α^T)
  - φ^E_z ∼ Dir(α^E)
  - φ^B_z ∼ DP(α^B, ββ^⊤)
- For each node i in the parse tree:
  - t_i ∼ Mult(φ^T_{z_i})
  - If t_i = Emission: x_i ∼ Mult(φ^E_{z_i})
  - If t_i = BinaryProduction: (z_{c_1(i)}, z_{c_2(i)}) ∼ Mult(φ^B_{z_i})

10. Why Is an HDP Model Advantageous?

- It allows the complexity of the grammar to grow as more training data becomes available: a DP prior penalizes the use of more symbols than the training data supports...
- ...which in turn means the level of sophistication of the grammar can adequately match the corpus.

Can you think of any disadvantages?

11. Hierarchical Dirichlet Process

- How is this a hierarchical DP?
- How is this related to the HDP-HMM from Thursday?
- Why not a simpler model: for each symbol z, draw a distribution separately over left children l_z ∼ DP(β) and right children r_z ∼ DP(β)?

12. Bayesian Inference for HDP-PCFG

- The authors use a structured mean-field approximation (variational inference with KL divergence as the dissimilarity function).
- The random variables of interest are the parameters θ = (β, φ), the parse tree z, and the observed yield x.
- The goal is thus to approximate the posterior p(θ, z | x): we want a q(θ, z) solving

    $\operatorname{argmin}_{q \in \mathcal{Q}} \mathrm{KL}\left(q(\theta, z) \,\|\, p(\theta, z \mid x)\right)$

  where Q is a tractable subset of distributions.
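To ground the objective, here is KL divergence between two discrete distributions, the quantity mean-field inference minimizes with q restricted to the tractable family (a generic illustration, not the HDP-PCFG-specific objective):

```python
import numpy as np

def kl(q, p):
    """KL(q || p) for discrete distributions on the same support."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    mask = q > 0  # 0 * log 0 is taken to be 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

print(kl([0.5, 0.5], [0.9, 0.1]))  # note KL is asymmetric in its arguments
```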

13. Bayesian Inference for HDP-PCFG (continued)

The set of approximate distributions Q is defined to be those that factor as follows:

    $\mathcal{Q} = \left\{ q : q(\beta) \left[ \prod_{z=1}^{K} q(\phi^T_z)\, q(\phi^E_z)\, q(\phi^B_z) \right] q(z) \right\}$

Additional constraints are also introduced:

- q(β) is degenerate and truncated
- q(φ^{T,E,B}_z) are Dirichlet distributions
- q(z) is any multinomial distribution

Note that K is fixed. How does this affect the approximation?
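A sketch of what the variational parameters for this factorized family might look like as arrays; all shapes and names are our own illustrative choices.

```python
import numpy as np

K, vocab = 20, 50

# Degenerate, truncated q(beta): a point estimate over K symbols.
beta_hat = np.full(K, 1.0 / K)

# Dirichlet variational parameters (pseudo-counts) for each symbol z:
tau_T = np.ones((K, 2))        # q(phi^T_z) over {emission, binary}
tau_E = np.ones((K, vocab))    # q(phi^E_z) over the vocabulary
tau_B = np.ones((K, K * K))    # q(phi^B_z) over flattened child pairs (l, r)

# q(z) is a multinomial over parse trees; in practice it is represented
# implicitly by inside-outside chart posteriors rather than enumerated.
```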

14. Coordinate Ascent

- The optimization problem of finding the best q is non-convex.
- The authors use a coordinate ascent algorithm to find a local optimum (a skeleton of the loop follows below).

Iteratively:

1. Optimize q(z), keeping q(φ) and q(β) fixed
2. Optimize q(φ), keeping q(z) and q(β) fixed
3. Optimize q(β), keeping q(z) and q(φ) fixed
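The skeleton of that loop; the three update functions are stubs standing in for the paper's actual updates (inside-outside recursions for q(z), Dirichlet posterior updates for q(φ), and an update of the truncated point estimate for q(β)).

```python
def update_qz(state):
    """Recompute q(z) given q(phi), q(beta): in the paper, dynamic
    programming with expected log rule weights. Stubbed out here."""
    return state

def update_qphi(state):
    """Update the Dirichlet posteriors q(phi) from expected rule counts
    under q(z). Stubbed out here."""
    return state

def update_qbeta(state):
    """Update the degenerate, truncated q(beta). Stubbed out here."""
    return state

def coordinate_ascent(state, n_iters=50):
    """Each sweep holds two factors fixed and optimizes the third, so the
    variational objective is non-decreasing and reaches a local optimum."""
    for _ in range(n_iters):
        state = update_qz(state)
        state = update_qphi(state)
        state = update_qbeta(state)
    return state
```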

15. Prediction

We want to parse a new sentence with the induced grammar. The prediction is given by

    $z^{*}_{\text{new}} = \operatorname{argmax}_{z_{\text{new}}} \; \mathbb{E}_{p(\theta, z \mid x)}\left[ p(z_{\text{new}} \mid \theta, x_{\text{new}}) \right]$
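Maximizing this expectation exactly is intractable. One common approximation (our illustration, not necessarily the paper's exact procedure) substitutes a point summary of the rule weights, e.g. their posterior means, and runs standard Viterbi CKY on the new sentence:

```python
import math
from collections import defaultdict

def viterbi_cky(words, binary, emit, root="S"):
    """Most probable parse under point-estimate weights, for a grammar in
    Chomsky normal form. binary[(A, B, C)] = weight of A -> B C;
    emit[(A, w)] = weight of A -> w. Returns (log prob, tree) or None."""
    n = len(words)
    chart = defaultdict(dict)  # chart[(i, j)][A] = (log prob, backpointer)
    for i, w in enumerate(words):
        for (A, word), p in emit.items():
            if word == w:
                chart[(i, i + 1)][A] = (math.log(p), w)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (A, B, C), p in binary.items():
                    if B in chart[(i, k)] and C in chart[(k, j)]:
                        lp = (math.log(p) + chart[(i, k)][B][0]
                              + chart[(k, j)][C][0])
                        if A not in chart[(i, j)] or lp > chart[(i, j)][A][0]:
                            chart[(i, j)][A] = (lp, (k, B, C))
    if root not in chart[(0, n)]:
        return None

    def build(i, j, A):
        bp = chart[(i, j)][A][1]
        if isinstance(bp, str):
            return (A, bp)
        k, B, C = bp
        return (A, build(i, k, B), build(k, j, C))

    return chart[(0, n)][root][0], build(0, n, root)
```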
