Variational Inference for Tutorial Outline Structured NLP Models - PDF document

Variational Inference for Tutorial Outline Structured NLP Models 1. Structured Models and Factor Graphs 2. Mean Field 3. Structured Mean Field 4. Belief Propagation 5. Structured Belief Propagation ACL, August 4, 2013 David Burkett and Dan Klein 6. Wrap-Up Part 1: Structured Models and Structured NLP Models Factor Graphs Example: Hidden Markov Model (Sample Application: Part of Speech Tagging) Outputs (POS tags) Inputs (words) Goal: Queries from posterior Structured NLP Models Structured NLP Models Example: Hidden Markov Model Example: Hidden Markov Model

Structured NLP Models Structured NLP Models Example: Hidden Markov Model Example: Hidden Markov Model Structured NLP Models Structured NLP Models Example: Hidden Markov Model Example: Hidden Markov Model Factor Graph Notation Factor Graph Notation Variables Y i Variables Y i Factors Factors Cliques Cliques Binary Factor Unary Factor

(Lafferty et al., 2001) Factor Graph Notation Structured NLP Models Example: Conditional Random Field (Sample Application: Named Entity Recognition) Variables Y i Factors Cliques Variables have factor (clique) neighbors: Factors have variable neighbors: Structured NLP Models Structured NLP Models Example: Conditional Random Field Example: Conditional Random Field Structured NLP Models Structured NLP Models Example: Edge-Factored Dependency Parsing Example: Edge-Factored Dependency Parsing L O O O O L O O O O O L L L L L O O O L the cat ate the rat the cat ate the rat (McDonald et al., 2005)

Structured NLP Models Structured NLP Models Example: Edge-Factored Dependency Parsing Example: Edge-Factored Dependency Parsing O O O O O O L O R O O R O L R O L L O L rat rat the cat ate the the cat ate the Structured NLP Models Inference Example: Edge-Factored Dependency Parsing Input: Factor Graph L L L L L L Output: Marginals L L L L Inference Complex Structured Models Typical NLP Approach: Dynamic Programs! POS Tagging Joint Examples: Sequence Models (Forward/Backward) Named Entity Phrase Structure Parsing (CKY, Inside/Outside) Recognition Dependency Parsing (Eisner algorithm) ITG Parsing (Bitext Inside/Outside) (Sutton et al., 2004)

Complex Structured Models Complex Structured Models Word Alignment Dependency Parsing with Second Order Features vi el gato frío (McDonald & Pereira, 2006) I saw the cold cat (Carreras, 2007) (Taskar et al., 2005) Complex Structured Models Variational Inference Word Alignment Approximate inference techniques that can be applied to any graphical model This tutorial: vi Mean Field: Approximate the joint distribution el with a product of marginals Belief Propagation: Apply tree inference gato algorithms even if your graph isn’t a tree Structure: What changes when your factor graph frío has tractable substructures I saw the cold cat Mean Field Warmup Part 2: Mean Field Wanted: Idea: coordinate ascent Key object: assignments Iterated Conditional Modes (Besag, 1986)

Mean Field Warmup Mean Field Warmup Wanted: Wanted: Mean Field Warmup Mean Field Warmup Wanted: Wanted: Iterated Conditional Modes Mean Field Warmup Example Wanted: Approximate Result:

Iterated Conditional Modes Iterated Conditional Modes Example Example Iterated Conditional Modes Iterated Conditional Modes Example Example Iterated Conditional Modes Iterated Conditional Modes Example Example

Iterated Conditional Modes Mean Field Intro Example Mean Field is coordinate ascent, just like Iterated Conditional Modes, but with soft assignments to each variable! Mean Field Intro Mean Field Intro Wanted: Idea: coordinate ascent Key object: (approx) marginals Mean Field Intro Mean Field Intro Wanted:

Mean Field Intro Mean Field Intro Wanted: Wanted: Mean Field Procedure Mean Field Procedure Wanted: Wanted: Mean Field Procedure Mean Field Procedure Wanted: Wanted:

Example Results Mean Field Derivation Goal: Approximation: Constraint: Objective: Procedure: Coordinate ascent on each What’s the update? Mean Field Update Approximate Expectations f 1) Y i 2) 3-9) Lots of algebra 10) General: General Update * Mean Field Inference Example Exponential Family: 1 1 1 2 5 1 2 1 .4 .6 .5 .5 Generic: .7 .2 .5 .5 .5 .3 .2 .1

Mean Field Inference Example Mean Field Inference Example 1 1 1 1 1 2 5 1 2 5 1 2 1 1 2 1 .4 .6 .5 .5 .4 .6 .5 .5 .7 .2 .5 .69 .7 .2 .5 .69 .3 .2 .1 .31 .3 .2 .1 .31 Mean Field Inference Example Mean Field Inference Example 1 1 1 1 1 2 5 1 2 5 1 2 1 1 2 1 .4 .6 .40 .60 .4 .6 .40 .60 .7 .2 .5 .69 .7 .2 .5 .73 .3 .2 .1 .31 .3 .2 .1 .27 Mean Field Inference Example Mean Field Inference Example 1 1 1 1 1 2 5 1 2 5 1 2 1 1 2 1 .4 .6 .38 .62 .4 .6 .38 .62 .7 .2 .5 .73 .7 .2 .5 .73 .28 .45 .27 .27 .10 .17 .3 .2 .1 .3 .2 .1

Mean Field Inference Example Mean Field Inference Example 2 1 1 1 2 1 1 1 9 1 1 1 1 1 1 5 .67 .33 .67 .33 .62 .38 .82 .18 .67 .44 .22 .67 .44 .22 .62 .56 .06 .82 .67 .15 .33 .22 .11 .33 .22 .11 .38 .06 .31 .18 .15 .03 Mean Field Q&A Why Only Local Optima?! Are the marginals guaranteed to converge to Variables: the right thing, like in sampling? No Discrete distributions: Is the algorithm at least guaranteed to e.g. P(0,1,0,…0) = 1 converge to something? All distributions Yes (all convex combos) So it’s just like EM? Mean field approximable Yes (can represent all discrete ones, but not all) Mean Field Approximation Part 3: Structured Mean Field Model: Approximate Graph: … … … … … … … …

Structured Mean Field Structured Mean Field Approximation Approximation Model: Approximate Graph: … … … … … … … … (Xing et al, 2003) Structured Mean Field Structured Mean Field Approximation Approximation Computing Structured Updates Computing Structured Updates Marginal probability of under . ?? Updating . Computed with consists of computing forward-backward all marginals .

Structured Mean Field Notation Structured Mean Field Notation Structured Mean Field Notation Structured Mean Field Notation Structured Mean Field Notation Structured Mean Field Notation Neighbors: Connected Components

Structured Mean Field Updates Expected Feature Counts Naïve Mean Field: Structured Mean Field: Component Factorizability * Component Factorizability * (Abridged) Condition Example Feature Generic Condition Use conjunctive indicator features (pointwise product) Joint Parsing and Alignment Joint Parsing and Alignment Input: Sentences project project 产品产品 and and 、、 product product 项目项目 of of 水平水平 levels levels 高高 High High (Burkett et al, 2010)

Joint Parsing and Alignment Joint Parsing and Alignment Alignments Output: Trees contain Output: Nodes project project 产品产品 and and 、、 product product 项目项目 of of 水平水平 levels levels 高高 High High Joint Parsing and Alignment Joint Parsing and Alignment Alignments Output: Output: contain Bispans project project 产品产品 and and 、、 product product 项目项目 of of 水平水平 levels levels 高高 High High Joint Parsing and Alignment Joint Parsing and Alignment Variables Variables project project 产品产品 and and 、、 product product 项目项目 of of 水平水平 levels levels 高高 High High

Joint Parsing and Alignment Joint Parsing and Alignment Variables Factors project project 产品产品 and and 、、 product product 项目项目 of of 水平水平 levels levels 高高 High High Joint Parsing and Alignment Joint Parsing and Alignment Factors Factors project project 产品产品 and and 、、 product product 项目项目 of of 水平水平 levels levels 高高 High High Joint Parsing and Alignment Notational Abuse Factors Subscript Omission: Shorthand: project 产品 and 、 product 项目 of 水平 levels 高 Skip Nonexistent Substructures: High Structural factors are implicit

Model Form Training Expected Feature Marginals Counts Structured Mean Field Approximate Component Scores Approximation Monolingual parser: Score for If we knew : Score for To compute : Score for Expected Feature Counts Inference Procedure Initialize: For fixed : Marginals computed Marginals computed with bitext inside-outside with inside-outside

Inference Procedure Approximate Marginals Iterate marginal updates: …until convergence! Decoding Structured Mean Field Summary Split the model into pieces you have dynamic programs for Substitute expected feature counts for actual counts in cross-component factors Iterate computing marginals until convergence (Minimum Risk) Structured Mean Field Tips Break Time! Try to make sure cross-component features are products of indicators You don’t have to run all the way to convergence; marginals are usually pretty good after just a few rounds Recompute marginals for fast components more frequently than for slow ones e.g. For joint parsing and alignment, the two monolingual tree marginals ( ) were updated until convergence between each update of the ITG marginals ( )

Variational Inference for Tutorial Outline Structured NLP Models - PDF document

Variational Inference for Tutorial Outline Structured NLP Models 1. Structured Models and Factor Graphs 2. Mean Field 3. Structured Mean Field 4. Belief Propagation 5. Structured Belief Propagation ACL, August 4, 2013 David Burkett and Dan

NLP Programming Tutorial 11 - The Structured Perceptron Graham Neubig Nara Institute of Science

Deep Variational Inference FLARE Reading Group Presentation Wesley Tansey 9/28/2016 What is

Variational Inference for GPs: Presenters Group1: Stochastic variational inference. Slides 2 - 28

Variational Auto-encoders 2 VARIATIONAL AUTO-ENCODERS INTRODUCTION VARIATIONAL AUTO-ENCODERS

A STRUCTURED L IFE A STRUCTURED L IFE A STRUCTURED L IFE A STRUCTURED L IFE A STRUCTURED L IFE

Rejection Sampling Variational Inference Karan Grewal CSC2547 / STA4273 Overview Variational

CS480/680 Machine Learning Lecture 11: February 11 th , 2020 Variational Inference Zahra

Structured Prediction Introduction What is structured prediction? CS 6355: Structured Prediction

NLP Programming Tutorial 4 - Word Segmentation Graham Neubig Nara Institute of Science and

Tutorial Tutorial A2 is out, its called Inpainting Tutorial Tutorial A2 is out, its called

NLP Programming Tutorial 0 - Programming Basics Graham Neubig Nara Institute of Science and

An Introduction to An Introduction to Variational Variational Methods for Graphical Models

SI485i : NLP Missing Topics and the Future Who cares about NLP? NLP has expanded quickly

SI425 : NLP Missing Topics and the Future Who cares about NLP? NLP has expanded quickly

Lecture Variational 13 Inference Panini Kaushal Scribes : - Margulies Smedeuranh Niklas

CS 285 Instructor: Sergey Levine UC Berkeley Todays Lecture 1. Probabilistic latent variable

UCL Tutorial on: Deep Belief Nets (An updated and extended version of my 2007 NIPS tutorial)

Freedom of Associa.on and Internet Infrastructure dra3-tenoever-hrpc-associa.on-01 Gisela Prez

REJECTED! What has God done? Life And Beauty Claim My Freedom He Is To Blame What To Do With

Here are the songs we sang this Sunday. This shows the song name, the artist who performed the

X-Phi Meets A-Phi David Chalmers What are the Targets? n What is experimental philosophy? n

Resource Allocation Task Force report Magorzata Krakowian 1 www.egi.eu www.egi.eu

TAKE ACTION: CHOOSE EFFECTIVE POLICIES & PROGRAMS IN DEPTH Sarah Kile Kate Konkle Karen

Kingdom Economics with Jesus Mark 12:13 17 By Paul Lamey Kingdom Economics with Jesus (Mark

Variational Inference for Tutorial Outline Structured NLP Models - PDF document

Variational Inference for Tutorial Outline Structured NLP Models 1. Structured Models and Factor Graphs 2. Mean Field 3. Structured Mean Field 4. Belief Propagation 5. Structured Belief Propagation ACL, August 4, 2013 David Burkett and Dan

NLP Programming Tutorial 11 - The Structured Perceptron Graham Neubig Nara Institute of Science

Deep Variational Inference FLARE Reading Group Presentation Wesley Tansey 9/28/2016 What is

Variational Inference for GPs: Presenters Group1: Stochastic variational inference. Slides 2 - 28

Variational Auto-encoders 2 VARIATIONAL AUTO-ENCODERS INTRODUCTION VARIATIONAL AUTO-ENCODERS

A STRUCTURED L IFE A STRUCTURED L IFE A STRUCTURED L IFE A STRUCTURED L IFE A STRUCTURED L IFE

Rejection Sampling Variational Inference Karan Grewal CSC2547 / STA4273 Overview Variational

CS480/680 Machine Learning Lecture 11: February 11 th , 2020 Variational Inference Zahra

Structured Prediction Introduction What is structured prediction? CS 6355: Structured Prediction

NLP Programming Tutorial 4 - Word Segmentation Graham Neubig Nara Institute of Science and

Tutorial Tutorial A2 is out, its called Inpainting Tutorial Tutorial A2 is out, its called

NLP Programming Tutorial 0 - Programming Basics Graham Neubig Nara Institute of Science and

An Introduction to An Introduction to Variational Variational Methods for Graphical Models

SI485i : NLP Missing Topics and the Future Who cares about NLP? NLP has expanded quickly

SI425 : NLP Missing Topics and the Future Who cares about NLP? NLP has expanded quickly

Lecture Variational 13 Inference Panini Kaushal Scribes : - Margulies Smedeuranh Niklas

CS 285 Instructor: Sergey Levine UC Berkeley Todays Lecture 1. Probabilistic latent variable

UCL Tutorial on: Deep Belief Nets (An updated and extended version of my 2007 NIPS tutorial)

Freedom of Associa.on and Internet Infrastructure dra3-tenoever-hrpc-associa.on-01 Gisela Prez

REJECTED! What has God done? Life And Beauty Claim My Freedom He Is To Blame What To Do With

Here are the songs we sang this Sunday. This shows the song name, the artist who performed the

X-Phi Meets A-Phi David Chalmers What are the Targets? n What is experimental philosophy? n

Resource Allocation Task Force report Magorzata Krakowian 1 www.egi.eu www.egi.eu

TAKE ACTION: CHOOSE EFFECTIVE POLICIES &amp; PROGRAMS IN DEPTH Sarah Kile Kate Konkle Karen

Kingdom Economics with Jesus Mark 12:13 17 By Paul Lamey Kingdom Economics with Jesus (Mark

TAKE ACTION: CHOOSE EFFECTIVE POLICIES & PROGRAMS IN DEPTH Sarah Kile Kate Konkle Karen