Structured Perceptron with Inexact Search


  1. Structured Perceptron with Inexact Search. Liang Huang, Suphan Fayong, Yang Guo. Information Sciences Institute, University of Southern California. NAACL 2012, Montréal, June 2012. [Title slide figure: a binary example (y = +1 vs. y = -1) beside a structured example: "the man bit the dog" tagged DT NN VBD DT NN, with its Chinese translation "那 人 咬 了 狗" ("the man bit the dog").]

  2-5. Structured Perceptron (Collins 02) [Figure: binary classification runs (x, w) through exact inference to get z, and updates weights if y ≠ z, with labels y = +1 / y = -1; structured classification runs the same pipeline with x = "the man bit the dog" and y = DT NN VBD DT NN. Inference is trivial in the binary case (a constant number of classes) but hard in the structured case (exponentially many classes).]
  • challenge: search efficiency (exponentially many classes)
  • often use dynamic programming (DP)
  • but still too slow for repeated use, e.g. parsing is O(n³)
  • and DP can't use non-local features
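The exact DP search mentioned above can be sketched as a first-order Viterbi tagger. This is a minimal illustration only; the weight keys such as ("emit", word, tag) and ("trans", prev, tag) are assumptions for the sketch, not the paper's feature set.

```python
# Minimal Viterbi decoder for sequence tagging (a sketch, assuming a
# first-order model: score = sum of emission + transition weights).
# Weight keys like ("emit", word, tag) are illustrative assumptions.

def viterbi(words, tags, w):
    """Exact search: best tag sequence under sparse weights w (a dict)."""
    n = len(words)
    # best[i][t] = (score of best prefix ending in tag t at position i, backpointer)
    best = [{t: (w.get(("emit", words[0], t), 0.0), None) for t in tags}]
    for i in range(1, n):
        row = {}
        for t in tags:
            emit = w.get(("emit", words[i], t), 0.0)
            prev_score, prev_tag = max(
                (best[i - 1][p][0] + w.get(("trans", p, t), 0.0), p) for p in tags)
            row[t] = (prev_score + emit, prev_tag)
        best.append(row)
    # backtrace from the best final tag
    tag = max(tags, key=lambda t: best[-1][t][0])
    seq = [tag]
    for i in range(n - 1, 0, -1):
        tag = best[i][tag][1]
        seq.append(tag)
    return list(reversed(seq))
```

Exactness comes from keeping the best score for every (position, tag) pair, which is also why the features must stay local to adjacent tags.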

  6-11. Perceptron w/ Inexact Inference [Figure: the same pipeline with exact inference replaced by inexact inference (beam search or greedy search); caption: "does it still work???"]
  • routine use of inexact inference in NLP (e.g. beam search)
  • how does structured perceptron work with inexact search?
  • so far, most structured learning theory assumes exact search
  • would search errors break these learning properties?
  • if so, how do we modify learning to accommodate inexact search?
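Beam search, the inexact inference in the figure above, can be sketched as follows, under the same illustrative scoring as before (the weight keys are assumptions). Truncating the candidate list at each step is exactly where search errors arise.

```python
# A sketch of inexact inference by beam search for tagging.
# Scoring assumes illustrative weight keys ("emit", word, tag) and
# ("trans", prev, tag); this is not the paper's feature set.

def beam_search(words, tags, w, beam_size=2):
    """Keep only the top-k partial tag sequences at each position."""
    beam = [(0.0, [])]  # (score, tag prefix)
    for word in words:
        cands = []
        for score, prefix in beam:
            for t in tags:
                s = score + w.get(("emit", word, t), 0.0)
                if prefix:
                    s += w.get(("trans", prefix[-1], t), 0.0)
                cands.append((s, prefix + [t]))
        cands.sort(key=lambda c: c[0], reverse=True)
        beam = cands[:beam_size]  # search error possible: best path may fall out here
    return beam[0][1]
```

With beam_size = 1 this degenerates to greedy search; with beam_size = |tags|^n it recovers exact search.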

  12-15. Prior work: Early update (Collins/Roark) [Figure: greedy or beam search on x; early update on the prefixes y′, z′.]
  • a partial answer: "early update" (Collins & Roark, 2004)
  • a heuristic for perceptron with greedy or beam search
  • updates on prefixes rather than full sequences
  • works much better than standard update in practice, but...
  • two major problems with early update:
    • there is no theoretical justification: why does it work?
    • it learns too slowly (due to partial examples), e.g. 40 epochs
  • we'll solve both problems in a much larger framework
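Early update can be sketched on top of beam search: decode left to right, and as soon as the gold prefix falls off the beam, update on the prefixes y′ (gold) and z′ (top of the beam) and stop. The featurization below is an illustrative assumption, not Collins & Roark's feature set.

```python
# A hedged sketch of early update (Collins & Roark 2004) on beam search.
# prefix_features uses illustrative emission/transition counts.

def prefix_features(words, tags):
    feats = {}
    for i, t in enumerate(tags):
        feats[("emit", words[i], t)] = feats.get(("emit", words[i], t), 0) + 1
        if i > 0:
            feats[("trans", tags[i - 1], t)] = feats.get(("trans", tags[i - 1], t), 0) + 1
    return feats

def early_update(words, gold, tags, w, beam_size=1):
    beam = [(0.0, [])]
    for i, word in enumerate(words):
        cands = []
        for score, prefix in beam:
            for t in tags:
                s = score + w.get(("emit", word, t), 0.0)
                if prefix:
                    s += w.get(("trans", prefix[-1], t), 0.0)
                cands.append((s, prefix + [t]))
        cands.sort(key=lambda c: c[0], reverse=True)
        beam = cands[:beam_size]
        gold_prefix = gold[:i + 1]
        if all(prefix != gold_prefix for _, prefix in beam):
            # gold fell off the beam: update on the prefixes y', z' and stop early
            for f, v in prefix_features(words[:i + 1], gold_prefix).items():
                w[f] = w.get(f, 0.0) + v
            for f, v in prefix_features(words[:i + 1], beam[0][1]).items():
                w[f] = w.get(f, 0.0) - v
            return w
    if beam[0][1] != gold:  # gold survived but isn't on top: standard full update
        for f, v in prefix_features(words, gold).items():
            w[f] = w.get(f, 0.0) + v
        for f, v in prefix_features(words, beam[0][1]).items():
            w[f] = w.get(f, 0.0) - v
    return w
```

The early stop is why it learns slowly: each update uses only part of the example.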

  16. Our Contributions
  • theory: a framework for perceptron w/ inexact search
    • explains early update (and others) as a special case
  • practice: new update methods within the framework
    • converge faster and better than early update
    • real impact on state-of-the-art parsing and tagging
    • more advantageous when search errors are more severe

  17. In this talk...
  • Motivations: Structured Learning and Search Efficiency
  • Structured Perceptron and Inexact Search
    • perceptron does not converge with inexact search
    • early update (Collins/Roark '04) seems to help; but why?
  • New Perceptron Framework for Inexact Search
    • explains early update as a special case
    • convergence theory with arbitrarily inexact search
    • new update methods within this framework
  • Experiments

  18-21. Structured Perceptron (Collins 02)
  • a simple generalization of the binary/multiclass perceptron
  • online learning: for each example (x, y) in the data
    • inference: find the best output z given the current weights w
    • update weights if y ≠ z
  [Figure: the binary and structured pipelines side by side, as before; the number of classes is a trivial constant in the binary case but exponential in the structured case, which makes exact inference hard.]
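The online learning loop above can be sketched as follows; decode stands for any argmax inference routine (e.g. Viterbi for exact search, beam search for inexact), and the sparse featurization is an illustrative assumption.

```python
# The structured perceptron training loop (Collins 2002), as a sketch.
# `decode` is any inference routine; `features` is an illustrative
# emission/transition featurization, not the paper's feature set.

def features(words, tags):
    feats = {}
    for i, t in enumerate(tags):
        feats[("emit", words[i], t)] = feats.get(("emit", words[i], t), 0) + 1
        if i > 0:
            feats[("trans", tags[i - 1], t)] = feats.get(("trans", tags[i - 1], t), 0) + 1
    return feats

def train(data, tagset, decode, epochs=5):
    w = {}
    for _ in range(epochs):
        for x, y in data:
            z = decode(x, tagset, w)      # inference under current weights
            if z != y:                    # update only on mistakes
                for f, v in features(x, y).items():
                    w[f] = w.get(f, 0.0) + v
                for f, v in features(x, z).items():
                    w[f] = w.get(f, 0.0) - v
    return w
```

The update pulls the weights toward the gold features and away from the predicted ones; with exact decoding, the convergence guarantee on the next slides applies.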

  22-25. Convergence with Exact Search
  • linear classification: converges iff the data is separable
  • structured: converges iff the data is separable & search is exact
    • there is an oracle vector that correctly labels all examples
    • one vs. the rest (the correct label scores better than all incorrect labels)
  • theorem: if separable, then # of updates ≤ R²/δ²
  [Figure: geometric picture of the diameter R and margin δ, with the correct output y separated from incorrect outputs z ≠ y; Rosenblatt (1957) for the binary case (y = ±1), extended to the structured case by Collins (2002).]
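The R²/δ² bound follows the classical perceptron convergence argument, adapted to the structured case by Collins (2002); a sketch, writing Φ for the feature map and u for a unit oracle vector (the exact constants match the slide, the notation is an assumption):

```latex
% Assumptions: separability with margin \delta, i.e. a unit vector u with
%   u \cdot (\Phi(x,y) - \Phi(x,z)) \ge \delta \quad \text{for every incorrect } z,
% and R bounding each update: \|\Phi(x,y) - \Phi(x,z)\| \le R.
% Each mistake sets w^{(k+1)} = w^{(k)} + \Phi(x,y) - \Phi(x,z).

% Lower bound: every update gains at least \delta along u:
u \cdot w^{(k)} \ge k\delta .

% Upper bound: the model preferred z, so w^{(k)} \cdot (\Phi(x,y) - \Phi(x,z)) \le 0, hence
\|w^{(k+1)}\|^2 \le \|w^{(k)}\|^2 + \|\Phi(x,y) - \Phi(x,z)\|^2 \le \|w^{(k)}\|^2 + R^2,
% so \|w^{(k)}\|^2 \le kR^2.

% Combining: k\delta \le u \cdot w^{(k)} \le \|w^{(k)}\| \le \sqrt{k}\,R
% gives k \le R^2 / \delta^2.
```

With inexact search the upper-bound step breaks: the model may not have preferred z over y, so the cross term can be positive, which is why convergence can fail and motivates the talk's framework.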
