SLIDE 1

Discriminative Learning over Constrained Latent Representations

Ming-Wei Chang, Dan Goldwasser, Dan Roth and Vivek Srikumar

Computer Science Department, University of Illinois at Urbana-Champaign

SLIDES 2–4

A one-minute version of the talk

What we did: we provide a general recipe for many important NLP problems. Our algorithm: Learning over Constrained Latent Representations.

Example NLP problems: transliteration (Klementiev and Roth 2008), textual entailment (RTE) (Dagan, Glickman, and Magnini 2006), paraphrase identification (Dolan, Quirk, and Brockett 2004), question answering, and many more!

Problems of interest: binary classification tasks that require an intermediate representation.
SLIDES 5–11

Example task: Paraphrase Identification

Sentence 1: Alan will face murder charges, Bob said.
Sentence 2: Bob said Alan will be charged with murder.

Q: Are sentence 1 and sentence 2 paraphrases of each other? (Yes/No)

Yes, but why? They carry the same information!

Justifying the decision requires an intermediate representation. (This is just an example; the real intermediate representation is more complicated.)

Problem of interest: a binary output problem, y ∈ {−1, 1}, with an intermediate representation h: some structure that justifies the positive label. The intermediate representation is latent (not present in the data).
SLIDE 12

Limitations of existing approaches: two-stage approach

Most systems: a two-stage approach Stage 1: Generate the intermediate representation Obtain intermediate representation → Fix it (ignore the second stage) ! X → H

  • Page. 4/27
slide-13
SLIDE 13

Limitations of existing approaches: two-stage approach

Most systems: a two-stage approach Stage 1: Generate the intermediate representation Obtain intermediate representation → Fix it (ignore the second stage) ! X → H Stage 2: Classification based on the intermediate representation Extract features using the fixed representation and learn: Φ(X, H) → Y

  • Page. 4/27
slide-14
SLIDE 14

Limitations of existing approaches: two-stage approach

Most systems: a two-stage approach Stage 1: Generate the intermediate representation Obtain intermediate representation → Fix it (ignore the second stage) ! X → H Stage 2: Classification based on the intermediate representation Extract features using the fixed representation and learn: Φ(X, H) → Y Problem: the intermediate representation ignores the binary task

  • Page. 4/27
slide-15
SLIDE 15

Limitations of existing approaches: two-stage approach

Most systems: a two-stage approach Stage 1: Generate the intermediate representation Obtain intermediate representation → Fix it (ignore the second stage) ! X → H Stage 2: Classification based on the intermediate representation Extract features using the fixed representation and learn: Φ(X, H) → Y Problem: the intermediate representation ignores the binary task

  • Page. 4/27
slide-16
SLIDE 16

Limitations of existing approaches: inference

Observation: decisions on intermediate representation are interdependent Alan Bob will said face Alan murder will charges be , charged Bob with said murder

  • Page. 5/27
slide-17
SLIDE 17

Limitations of existing approaches: inference

Observation: decisions on intermediate representation are interdependent Alan Bob will said face Alan murder will charges be , charged Bob with said murder

  • Page. 5/27
slide-18
SLIDE 18

Limitations of existing approaches: inference

Observation: decisions on intermediate representation are interdependent Alan Bob will said face Alan murder will charges be , charged Bob with said murder

  • Page. 5/27
slide-19
SLIDE 19

Limitations of existing approaches: inference

Observation: decisions on intermediate representation are interdependent Alan Bob will said face Alan murder will charges be , charged Bob with said murder Many frameworks use custom designed inference procedures Difficult to add linguistic intuition/constraints on the intermediate representation Difficult to generalize to other tasks

  • Page. 5/27
slide-20
SLIDE 20

Learning Constrained Latent Representation (LCLR)

Property 1: Jointly learn intermediate representations and labels

X H Φ(X, H) Y

  • Page. 6/27
slide-21
SLIDE 21

Learning Constrained Latent Representation (LCLR)

Property 1: Jointly learn intermediate representations and labels

X H Φ(X, H) Y

input

  • Page. 6/27
slide-22
SLIDE 22

Learning Constrained Latent Representation (LCLR)

Property 1: Jointly learn intermediate representations and labels

X H Φ(X, H) Y

input intermediate rep- resentation

  • Page. 6/27
slide-23
SLIDE 23

Learning Constrained Latent Representation (LCLR)

Property 1: Jointly learn intermediate representations and labels

X H Φ(X, H) Y

input intermediate rep- resentation features

  • Page. 6/27
slide-24
SLIDE 24

Learning Constrained Latent Representation (LCLR)

Property 1: Jointly learn intermediate representations and labels

X H Φ(X, H) Y

input intermediate rep- resentation features binary label

  • Page. 6/27
slide-25
SLIDE 25

Learning Constrained Latent Representation (LCLR)

Property 1: Jointly learn intermediate representations and labels

X H Φ(X, H) Y feedback

input intermediate rep- resentation features binary label

  • Page. 6/27
slide-26
SLIDE 26

Learning Constrained Latent Representation (LCLR)

Property 1: Jointly learn intermediate representations and labels

X H Φ(X, H) Y feedback

input intermediate rep- resentation features binary label Find an intermediate representation that helps the binary task

  • Page. 6/27
slide-27
SLIDE 27

Learning Constrained Latent Representation (LCLR)

Property 1: Jointly learn intermediate representations and labels

X H Φ(X, H) Y feedback

input intermediate rep- resentation features binary label Find an intermediate representation that helps the binary task

Property 2: Constraint-based inference for the intermediate representation

Uses integer linear programming on latent variables Easy to inject constraints on latent variables Easy to generalize to other tasks

  • Page. 6/27
slide-28
SLIDE 28

Outline

1

Motivation and Contribution

2

Property 1: Jointly learn intermediate representations and labels

3

Property 2: Constraint-based inference for the intermediate representation

4

LCLR: Putting Everything Together

5

Experiments

  • Page. 7/27
slide-29
SLIDE 29

Outline

1

Motivation and Contribution

2

Property 1: Jointly learn intermediate representations and labels

3

Property 2: Constraint-based inference for the intermediate representation

4

LCLR: Putting Everything Together

5

Experiments

  • Page. 8/27
slide-30
SLIDE 30

The intuition behind the joint approach

Alan Bob will said face Alan murder will charges be , charged Bob with said murder

Yes/NO

  • Page. 9/27
slide-31
SLIDE 31

The intuition behind the joint approach

Alan Bob will said face Alan murder will charges be , charged Bob with said murder

Yes/NO

intermediate representation ⇔ {1, −1} Only positive examples have good intermediate representations No negative example has a good intermediate representation

  • Page. 9/27
slide-32
SLIDE 32

The intuition behind the joint approach

Alan Bob will said face Alan murder will charges be , charged Bob with said murder

Yes/NO

intermediate representation ⇔ {1, −1} Only positive examples have good intermediate representations No negative example has a good intermediate representation x: a sentence pair h: an alignment between two sentences H(x): all possible alignments for x

  • Page. 9/27
slide-33
SLIDE 33

The intuition behind the joint approach

Alan Bob will said face Alan murder will charges be , charged Bob with said murder

Yes/NO

intermediate representation ⇔ {1, −1} Only positive examples have good intermediate representations No negative example has a good intermediate representation x: a sentence pair, weight vector: u h: an alignment between two sentences H(x): all possible alignments for x

  • Page. 9/27
slide-34
SLIDE 34

The intuition behind the joint approach

Alan Bob will said face Alan murder will charges be , charged Bob with said murder

Yes/NO

intermediate representation ⇔ {1, −1} Only positive examples have good intermediate representations No negative example has a good intermediate representation x: a sentence pair, weight vector: u h: an alignment between two sentences H(x): all possible alignments for x Pair x1 is positive

There must exist a good explanation that justifies the positive label ∃h, uTΦ(x1, h) ≥ 0

Pair x2 is negative

No explanation is good enough to justify the positive label ∀h, uTΦ(x2, h) ≤ 0

  • Page. 9/27
SLIDES 35–44

Geometric interpretation: the case of two examples

Pair x1 is positive: there must exist a good explanation that justifies the positive label: ∃h, uᵀΦ(x1, h) ≥ 0, or equivalently max_h uᵀΦ(x1, h) ≥ 0.

Pair x2 is negative: no explanation is good enough to justify the positive label: ∀h, uᵀΦ(x2, h) ≤ 0, or equivalently max_h uᵀΦ(x2, h) ≤ 0.

[Figure: the feature-vector sets {Φ(x1, h) | h ∈ H(x1)} and {Φ(x2, h) | h ∈ H(x2)}, the weight vector u, and the maximizers Φ(x1, h1*) and Φ(x2, h2*).]

The prediction function: max_h uᵀΦ(x, h).
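To make the prediction function concrete, here is a minimal sketch (not the authors' code) that enumerates a small candidate set H(x) explicitly and predicts positive exactly when the best-scoring latent representation clears zero; feature_vec is a hypothetical stand-in for Φ:

    import numpy as np

    def lclr_predict(u, x, candidates, feature_vec):
        # f(x, u) = max_h u^T Phi(x, h); candidates plays the role of H(x)
        best = max(u @ feature_vec(x, h) for h in candidates)
        return 1 if best >= 0 else -1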
SLIDES 46–49

Integer Linear Programming for LCLR

Why is a declarative framework important? No more custom-designed inference procedures; easy to generalize to other tasks; easy to inject constraints and linguistic intuition. Check out the CCM tutorial!

The declarative framework plugs into LCLR. For paraphrasing, model the input as graphs, with Ga the first sentence and Gb the second sentence:
• Each vertex in Ga can be mapped to at most one vertex in Gb (and vice versa).
• Each edge in Ga can be mapped to at most one edge in Gb (and vice versa).
• An edge mapping is active iff the corresponding node mappings are active.
A sketch of these constraints as an ILP appears below.
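The following is a minimal sketch of the alignment ILP under stated assumptions: it uses the open-source PuLP modeler (not the solver the authors used), integer node/edge identifiers, and hypothetical per-part scores standing in for uᵀΦ_s(x):

    from pulp import LpProblem, LpVariable, LpMaximize, lpSum

    def alignment_ilp(nodes_a, nodes_b, edges_a, edges_b, node_score, edge_score):
        prob = LpProblem("latent_alignment", LpMaximize)
        # One binary variable per candidate node mapping and edge mapping (the parts h_s).
        n = {(i, j): LpVariable(f"n_{i}_{j}", cat="Binary")
             for i in nodes_a for j in nodes_b}
        e = {(p, q): LpVariable(f"e_{p[0]}_{p[1]}_{q[0]}_{q[1]}", cat="Binary")
             for p in edges_a for q in edges_b}
        # Objective: summed scores of the active parts, i.e. u^T sum_s h_s Phi_s(x).
        prob += (lpSum(node_score[k] * v for k, v in n.items())
                 + lpSum(edge_score[k] * v for k, v in e.items()))
        # Each vertex maps to at most one vertex on the other side (and vice versa).
        for i in nodes_a:
            prob += lpSum(n[i, j] for j in nodes_b) <= 1
        for j in nodes_b:
            prob += lpSum(n[i, j] for i in nodes_a) <= 1
        # Each edge maps to at most one edge on the other side (and vice versa).
        for p in edges_a:
            prob += lpSum(e[p, q] for q in edges_b) <= 1
        for q in edges_b:
            prob += lpSum(e[p, q] for p in edges_a) <= 1
        # An edge mapping is active iff both corresponding node mappings are active.
        for (i1, i2) in edges_a:
            for (j1, j2) in edges_b:
                prob += e[(i1, i2), (j1, j2)] <= n[i1, j1]
                prob += e[(i1, i2), (j1, j2)] <= n[i2, j2]
                prob += e[(i1, i2), (j1, j2)] >= n[i1, j1] + n[i2, j2] - 1
        prob.solve()
        return n, e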
SLIDES 50–54

Finding the intermediate representation using ILP

Sentence 1: Alan will face murder charges, Bob said.
Sentence 2: Bob said Alan will be charged with murder.
(We need this because of the formulation; you do not need to parse the symbols on this page.)

Γ(x) is the set of all "parts" that x can generate; here |Γ(x)| = 8 × 8 = 64.
Rewrite h as a binary vector, h ∈ {0, 1}⁶⁴, e.g. h = (0, 0, 0, …, 1, 0, 0, 1, 1).
There is a feature vector Φ_s(x) for every part h_s.

Inference problem = ILP formulation:
max_{h∈H} uᵀΦ(x, h) = max_{h∈H} uᵀ Σ_{s∈Γ(x)} h_s Φ_s(x)
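As a sanity check on this decomposition, note that without cross-part constraints the max decouples: a part s is switched on exactly when its score uᵀΦ_s(x) is positive. A tiny sketch with made-up numbers (not from the paper):

    import numpy as np

    u = np.array([0.5, -1.0, 2.0])      # weight vector (hypothetical values)
    Phi = np.array([[1.0, 0.0, 1.0],    # rows: Phi_s(x) for three parts
                    [0.0, 1.0, 0.0],
                    [1.0, 1.0, 1.0]])
    scores = Phi @ u                    # u^T Phi_s(x) for each part s
    h = (scores > 0).astype(int)        # unconstrained argmax: h_s = 1 iff score_s > 0
    print(h, float(scores @ h))         # best h and max_h u^T sum_s h_s Phi_s(x)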
SLIDES 56–59

LCLR: the objective function

Review: logistic regression (LR) and support vector machines (SVM).

Decision function: uᵀΦ(x) ≥ 0
Objective function:
min_u ½‖u‖² + C Σ_{i=1}^{l} ℓ(−y_i uᵀΦ(x_i))

[Plot: hinge loss, squared hinge loss, and logistic loss as functions of the margin.]
SLIDES 60–63

LCLR: the objective function

Learning over Constrained Latent Representations.

Decision function (ILP): max_{h∈H} uᵀ Σ_{s∈Γ(x)} h_s Φ_s(x) ≥ 0
Objective function:
min_u ½‖u‖² + C Σ_{i=1}^{l} ℓ(−y_i max_{h∈H} uᵀ Σ_{s∈Γ(x_i)} h_s Φ_s(x_i))

Beyond standard LR/SVM: LCLR solves an inference problem (the max) to select h, which also affects the features.
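A minimal sketch of evaluating this objective, assuming the squared-hinge loss and an explicitly enumerated latent set per example (toy setup, not the paper's solver):

    import numpy as np

    def lclr_objective(u, examples, C):
        """examples: list of (y, phis) with y in {-1, +1} and
        phis = [Phi(x, h) for h in H(x)] as numpy arrays."""
        sq_hinge = lambda z: max(0.0, 1.0 + z) ** 2   # ell(z)
        reg = 0.5 * float(u @ u)
        loss = sum(sq_hinge(-y * max(float(u @ phi) for phi in phis))
                   for y, phis in examples)
        return reg + C * loss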
SLIDES 64–66

Challenges in optimizing the objective function

min_u ½‖u‖² + C Σ_{i=1}^{l} ℓ(−y_i max_{h∈H} uᵀ Σ_{s∈Γ(x_i)} h_s Φ_s(x_i))

Not a regular LR/SVM problem: LCLR has an inference procedure inside the minimization problem.

No shortcut: the naive loop (find the best representation for all examples, obtain a new weight vector from a LR/SVM package with the updated representations, repeat) does not minimize the objective function.
SLIDE 67

LCLR: optimization procedure

Algorithm 1: Find the best intermediate representations for positive examples 2: Find the weight vector with this intermediate representation

Still need to do inference for negative examples Not a regular SVM problem even in this step!

3: Repeat!

  • Page. 17/27
slide-68
SLIDE 68

LCLR: optimization procedure

Algorithm 1: Find the best intermediate representations for positive examples 2: Find the weight vector with this intermediate representation

Still need to do inference for negative examples Not a regular SVM problem even in this step!

3: Repeat! This algorithm converges when ℓ is monotonically increasing and convex.

  • Page. 17/27
slide-69
SLIDE 69

LCLR: optimization procedure

Algorithm 1: Find the best intermediate representations for positive examples 2: Find the weight vector with this intermediate representation

Still need to do inference for negative examples Not a regular SVM problem even in this step!

3: Repeat! This algorithm converges when ℓ is monotonically increasing and convex. Properties of the algorithm: Asymmetric nature Asymmetry between positive and negative examples Converting a non-convex problem into a series of smaller convex problems

  • Page. 17/27
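A minimal sketch of the alternating procedure under stated assumptions: explicitly enumerated latent sets, the squared-hinge loss, and a plain subgradient-descent inner solver (the paper uses dual coordinate descent and cutting-plane methods instead):

    import numpy as np

    def lclr_train(examples, C, outer_iters=10, inner_iters=200, lr=0.01):
        """examples: list of (y, phis), y in {-1, +1},
        phis = [Phi(x, h) for h in H(x)] as numpy arrays."""
        u = np.zeros(len(examples[0][1][0]))
        for _ in range(outer_iters):
            # Step 1: fix the best current representation for each POSITIVE example.
            fixed = [(y, [max(phis, key=lambda p: float(u @ p))] if y > 0 else phis)
                     for y, phis in examples]
            # Step 2: solve for u; negatives still require inference inside the loop.
            for _ in range(inner_iters):
                grad = u.copy()                      # gradient of (1/2)||u||^2
                for y, phis in fixed:
                    phi = max(phis, key=lambda p: float(u @ p))
                    z = -y * float(u @ phi)
                    if 1.0 + z > 0:                  # subgradient of max(0, 1+z)^2
                        grad += C * 2.0 * (1.0 + z) * (-y) * phi
                u -= lr * grad
        return u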
SLIDES 70–72

Comparison to other latent-variable frameworks

Inference procedure: other frameworks often use application-specific inference; LCLR allows you to add constraints and to generalize to other tasks.

Learning: not only for SVM; many different loss functions can be used. Dual coordinate descent and cutting-plane methods mean fewer parameters to tune and allow a parallel inference procedure.

CRF-like latent-variable frameworks: LCLR can use logistic regression and thereby has a probabilistic interpretation. LCLR solves the "max" problem, while CRF-like models solve the "sum" problem; "max" enables adding constraints. (Details on SLIDE 94.)
SLIDE 74

Experimental setting

Tasks:
• Transliteration: is named entity B a transliteration of A?
• Textual entailment: does sentence A entail sentence B?
• Paraphrase identification

Goal of the experiments: determine whether a joint approach is better than a two-stage approach. We compare the two-stage approach against LCLR with exactly the same features and the same definition of the latent structures. Our two-stage approach uses a domain-dependent heuristic to find an intermediate representation; LCLR finds the intermediate representation automatically. LCLR is initialized with the two-stage solution.
SLIDES 75–82

Experimental results

Transliteration:

System                      | Joint | ILP | Acc  | MRR
(Goldwasser and Roth 2008)  |       |  ⋆  | N/A  | 89.4
Our two-stage               |       |  ⋆  | 80.0 | 85.7
Our LCLR                    |   ⋆   |  ⋆  | 92.3 | 95.4

Entailment:

System                      | Joint | ILP | Acc
Median of TAC 2009 systems  |       |     | 61.5
Our two-stage               |       |  ⋆  | 65.0
Our LCLR                    |   ⋆   |  ⋆  | 66.8
SLIDES 83–86

Paraphrase Identification

Experiments using (Dolan, Quirk, and Brockett 2004):

System                             | Joint | ILP | Acc
(Qiu, Kan, and Chua 2006)          |       |     | 72.00
(Das and Smith 2009)               |   ⋆   |     | 73.86
(Wan, Dras, Dale, and Paris 2006)  |       |     | 75.60
Our two-stage                      |       |  ⋆  | 76.23
Our LCLR                           |   ⋆   |  ⋆  | 76.41

Experiments using a noisy data set:

System                             | Joint | ILP | Acc
Our two-stage                      |       |  ⋆  | 72.00
Our LCLR                           |   ⋆   |  ⋆  | 72.75
SLIDES 87–88

Conclusions

LCLR = constraint-based inference + large-margin learning.

Contributions: the LCLR joint approach is better than two-stage approaches; LCLR allows the use of constraints on latent variables; and it is a novel learning framework.

Bonus: learning structures with indirect supervision. Easy-to-get binary labeled data can be used to improve the learning of structures. Check out our ICML paper this year!
SLIDE 89

Thank you!

Our learning code is available: the JLIS package
http://l2r.cs.uiuc.edu/~cogcomp/software.php
SLIDES 90–93

Main idea: learning with indirect supervision

[Diagram: labeled structures train a machine learning model that is applied to testing data; indirect supervision over unlabeled examples provides an additional training signal.]

Indirect supervision: a form of supervision that does not tell you the target output directly.

Advantages of using indirect supervision: it lets you directly use human/domain knowledge to improve the model; it allows supervision signals that are much easier to obtain than labeled structures; and it can reuse existing labeled data from related tasks. Indirect supervision greatly reduces the supervision effort!
SLIDE 94

Compared to CRF-like latent-variable frameworks

CRF-like latent-variable framework:
P(y = 1 | x) = Σ_h P(y = 1, h | x) = Σ_h exp(uᵀφ(x, h, y = 1)) / Σ_{h,y} exp(uᵀφ(x, h, y))

LCLR with the logistic loss:
P(y = 1 | x) = max_h exp(uᵀφ(x, h)) / (1 + max_h exp(uᵀφ(x, h)))

Difference 1: LCLR only models the "goodness" of the best representation. This is important for many NLP problems, where only positive examples have good representations.

Difference 2: LCLR only needs to solve the max inference. Sometimes calculating the sum is a lot harder! A small numeric contrast appears below.
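A tiny sketch contrasting the two probabilities over an enumerated latent set, with made-up scores and the simplifying assumption that the negative class scores are the negated positive-class scores, i.e. uᵀφ(x, h, y = −1) = −uᵀφ(x, h, y = 1):

    import numpy as np

    s = np.array([1.2, -0.3, 0.7])   # u^T phi(x, h, y=1) for each h in H(x)

    # CRF-like: sum over h in the numerator, over (h, y) in the denominator.
    p_sum = np.exp(s).sum() / (np.exp(s).sum() + np.exp(-s).sum())
    # LCLR with logistic loss: only the single best h matters.
    p_max = np.exp(s.max()) / (1.0 + np.exp(s.max()))
    print(p_sum, p_max)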
SLIDE 95

Paraphrase Identification: revisited

Sentence 1: Alan will face murder charges, Bob said.
Sentence 2: Bob said Alan will be charged with murder.

The word-level intermediate representation shown earlier is not expressive enough; for example, word ordering is a problem.

The real setting: the input, two word sequences, becomes two graphs; we used the Stanford Parser to construct a dependency parse tree for each sentence. Integer linear programming solves the resulting graph-matching problem over four types of sub-structure: node matching, node deletion, edge matching, and edge deletion. Constraints enforce consistency: an edge matching is active if and only if the corresponding nodes are matched.
SLIDES 96–97

References

Dagan, I., O. Glickman, and B. Magnini (Eds.) (2006). The PASCAL Recognising Textual Entailment Challenge.
Das, D. and N. A. Smith (2009). Paraphrase identification as probabilistic quasi-synchronous recognition. In ACL.
Dolan, W., C. Quirk, and C. Brockett (2004). Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources. In COLING.
Goldwasser, D. and D. Roth (2008). Active sample selection for named entity transliteration. In ACL (short paper).
Klementiev, A. and D. Roth (2008). Named entity transliteration and discovery in multilingual corpora. In C. Goutte, N. Cancedda, M. Dymetman, and G. Foster (Eds.), Learning Machine Translation.
Qiu, L., M.-Y. Kan, and T.-S. Chua (2006). Paraphrase recognition via dissimilarity significance classification. In EMNLP.
Wan, S., M. Dras, R. Dale, and C. Paris (2006). Using dependency-based features to take the "para-farce" out of paraphrase. In Proc. of the Australasian Language Technology Workshop (ALTW).