SLIDE 1

CS11-747 Neural Networks for NLP

Structured Perceptron/ Margin Methods

Graham Neubig

Site https://phontron.com/class/nn4nlp2020/

SLIDE 2

Types of Prediction

  • Two classes (binary classification)

    I hate this movie → positive / negative

  • Multiple classes (multi-class classification)

    I hate this movie → very good / good / neutral / bad / very bad

  • Exponential/infinite labels (structured prediction)

    POS tagging:  I hate this movie → PRP VBP DT NN
    Translation:  I hate this movie → kono eiga ga kirai

SLIDE 3

Many Varieties of Structured Prediction!

  • Models:
      • RNN-based decoders
      • Convolutional/self-attentional decoders
      • CRFs w/ local factors
  • Training algorithms:
      • Maximum likelihood w/ teacher forcing
      • Sequence-level likelihood
      • Structured perceptron, structured large margin
      • Reinforcement learning/minimum risk training
      • Sampling corruptions of data

    (The likelihood-based training algorithms were covered already; the structured perceptron, margin methods, and sampling corruptions of data are covered today.)

SLIDE 4

Reminder: Globally Normalized Models

  • Locally normalized models: each decision made by the model has a probability that adds to one

    P(Y | X) = \prod_{j=1}^{|Y|} \frac{e^{S(y_j | X, y_1, \ldots, y_{j-1})}}{\sum_{\tilde{y}_j \in V} e^{S(\tilde{y}_j | X, y_1, \ldots, y_{j-1})}}

  • Globally normalized models (a.k.a. energy-based models): each sentence has a score, which is not normalized over a particular decision

    P(Y | X) = \frac{e^{S(X, Y)}}{\sum_{\tilde{Y} \in V^*} e^{S(X, \tilde{Y})}}
SLIDE 5

Globally Normalized Likelihood

SLIDE 6

Difficulties Training Globally Normalized Models

  • The partition function (the denominator below) is problematic: it sums over an exponentially large space of outputs
  • Two options for calculating the partition function:
      • Structure the model to allow enumeration via dynamic programming, e.g. linear-chain CRF, CFG
      • Estimate the partition function by sub-sampling the hypothesis space

    P(Y | X) = \frac{e^{S(X, Y)}}{\sum_{\tilde{Y} \in V^*} e^{S(X, \tilde{Y})}}
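To make the dynamic-programming option concrete, here is a minimal sketch (not from the slides; the array names and layout are illustrative assumptions) of computing the log partition function log Z for a linear-chain model with the forward algorithm:

    import numpy as np
    from scipy.special import logsumexp

    def log_partition(emit, trans):
        """Forward algorithm for a linear-chain model.

        emit:  (T, K) array, emit[t, k] = score of tag k at position t
        trans: (K, K) array, trans[i, j] = score of tag j following tag i
        Returns log Z, summing over all K^T possible tag sequences.
        """
        alpha = emit[0].copy()                 # log-scores of length-1 prefixes
        for t in range(1, len(emit)):
            # alpha[j] = logsumexp_i(alpha[i] + trans[i, j]) + emit[t, j]
            alpha = logsumexp(alpha[:, None] + trans, axis=0) + emit[t]
        return logsumexp(alpha)                # finally, sum over the last tag

With log Z in hand, the globally normalized negative log-likelihood is simply log Z - S(X, Y), which is the exactly trainable linear-chain CRF case mentioned above.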
SLIDE 7

Two Methods for Approximation

  • Sampling:
      • Sample k hypotheses according to the probability distribution
      • + Unbiased estimator: as k gets large, the estimate approaches the true distribution
      • - High variance: what if we get low-probability samples?
  • Beam search:
      • Search for the k best hypotheses
      • - Biased estimator: may result in systematic differences from the true distribution
      • + Lower variance: more likely to get high-probability outputs

  (A small sketch of both estimators follows below.)
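As a rough sketch of how the two estimators plug in (the function and argument names are assumptions, not from the slides): given k scored hypotheses, beam search just sums the k best, while sampling uses an importance-weighted average under the proposal distribution that generated the samples.

    import numpy as np
    from scipy.special import logsumexp

    def approx_log_partition(scores, proposal_logprobs=None):
        """Approximate log Z from k scored hypotheses.

        scores: S(X, Y~) for each of the k hypotheses.
        proposal_logprobs: log q(Y~) for each sampled hypothesis; if None,
        the hypotheses are assumed to be a beam's k-best, and summing them
        gives a biased (lower-bound) but low-variance estimate.
        """
        s = np.asarray(scores, dtype=float)
        if proposal_logprobs is None:
            return logsumexp(s)                    # beam: sum the k best only
        q = np.asarray(proposal_logprobs, dtype=float)
        # sampling: Z = E_q[exp(S) / q], so average the importance weights
        return logsumexp(s - q) - np.log(len(s))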
SLIDE 8

Un-normalized Models: Structured Perceptron

SLIDE 9

Normalization often Not Necessary for Inference!

  • At inference time, we often just want the best hypothesis:

    \hat{Y} = \operatorname{argmax}_Y P(Y | X)

  • For a globally normalized model, P(Y | X) = e^{S(X, Y)} / \sum_{\tilde{Y} \in V^*} e^{S(X, \tilde{Y})}, and the denominator does not depend on Y
  • If that's all we need, no need for normalization! The same best hypothesis is found by maximizing the unnormalized score:

    \hat{Y} = \operatorname{argmax}_Y S(X, Y)
SLIDE 10

The Structured Perceptron Algorithm

  • An extremely simple way of training (non-probabilistic) global models
  • Find the one best, and if its score is better than that of the correct answer, adjust parameters to fix this:

    \hat{Y} = \operatorname{argmax}_{\tilde{Y} \neq Y} S(\tilde{Y} | X; \theta)
    if S(\hat{Y} | X; \theta) \geq S(Y | X; \theta) then
        \theta \leftarrow \theta + \alpha \left( \frac{\partial S(Y | X; \theta)}{\partial \theta} - \frac{\partial S(\hat{Y} | X; \theta)}{\partial \theta} \right)
    end if

    (Find the one best; if its score is at least that of the reference, increase the score of the reference and decrease the score of the one-best; here, as an SGD update.)
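For the classic linear case S(Y | X; w) = w · f(X, Y), the gradients in the update above are just feature vectors, and the algorithm reduces to the sketch below; `decode` and `features` are hypothetical helpers, not from the slides.

    import numpy as np

    def perceptron_update(w, X, Y, decode, features, alpha=1.0):
        """One structured perceptron step for a linear model.

        decode(w, X) returns the current one-best structure; features(X, Y)
        returns a feature vector, so dS/dw = features(X, Y).
        """
        Y_hat = decode(w, X)     # find the one best
        if Y_hat != Y:           # the one best scores at least as high as gold
            w = w + alpha * (features(X, Y) - features(X, Y_hat))
        return w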

SLIDE 11

Structured Perceptron Loss

  • Structured perceptron can also be expressed as a loss function!

    \ell_{percept}(X, Y) = \max(0, S(\hat{Y} | X; \theta) - S(Y | X; \theta))

  • The resulting gradient looks like the perceptron algorithm:

    \frac{\partial \ell_{percept}(X, Y; \theta)}{\partial \theta} =
      \begin{cases}
        \frac{\partial S(\hat{Y} | X; \theta)}{\partial \theta} - \frac{\partial S(Y | X; \theta)}{\partial \theta} & \text{if } S(\hat{Y} | X; \theta) \geq S(Y | X; \theta) \\
        0 & \text{otherwise}
      \end{cases}

  • This is a normal loss function, so it can be used in NNs
  • But! It requires finding the argmax in addition to the true candidate: we must do prediction during training
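Since it is a normal loss function, it is one line in an autograd framework; here is a hedged PyTorch-style sketch (the class examples use DyNet, so this framing is an assumption):

    import torch

    def percept_loss(score_hat, score_gold):
        """Structured perceptron loss: max(0, S(Y_hat) - S(Y_gold)).

        score_hat and score_gold are scalar tensors the network assigns to
        the argmax and gold structures; backprop recovers the gradient above.
        """
        return torch.clamp(score_hat - score_gold, min=0.0)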
SLIDE 12

Contrasting Perceptron and Global Normalization

  • Globally normalized probabilistic model:

    \ell_{global}(X, Y; \theta) = -\log \frac{e^{S(Y | X)}}{\sum_{\tilde{Y}} e^{S(\tilde{Y} | X)}}

  • Structured perceptron:

    \ell_{percept}(X, Y) = \max(0, S(\hat{Y} | X; \theta) - S(Y | X; \theta))

  • Global structured perceptron?

    \ell_{global\text{-}percept}(X, Y) = \sum_{\tilde{Y}} \max(0, S(\tilde{Y} | X; \theta) - S(Y | X; \theta))

  • This has the same computational problems as globally normalized probabilistic models

SLIDE 13

Structured Training and Pre-training

  • Neural network models have lots of parameters and a big output space; training is hard
  • Tradeoffs between training algorithms:
      • Selecting just one negative example is inefficient
      • Teacher forcing efficiently updates all parameters, but suffers from exposure bias
  • Thus, it is common to pre-train with teacher forcing, then fine-tune with a more complicated algorithm

SLIDE 14

Hinge Loss and Cost-sensitive Training

SLIDE 15

Perceptron and Uncertainty

  • Which is better, the dotted or the dashed decision boundary? (figure of two separating boundaries omitted)
  • Both have zero perceptron loss! The perceptron loss cannot prefer the boundary with the larger margin, which motivates adding a margin explicitly.
SLIDE 16

Adding a “Margin” with Hinge Loss

  • Penalize when the incorrect answer is within margin m of the correct one:

    \ell_{hinge}(x, y; \theta) = \max(0, m + S(\hat{y} | x; \theta) - S(y | x; \theta))

    (figure comparing the perceptron and hinge loss curves omitted)
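A minimal NumPy sketch of the multi-class version (the names and score-vector framing are illustrative assumptions); setting m = 0 recovers the perceptron loss:

    import numpy as np

    def hinge_loss(scores, gold, m=1.0):
        """max(0, m + best wrong score - gold score) over a score vector."""
        wrong = np.delete(scores, gold)     # scores of all incorrect answers
        return max(0.0, m + wrong.max() - scores[gold])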

SLIDE 17

Hinge Loss for Any Classifier!

  • We can swap cross-entropy for hinge loss anytime

    (figure: a tagger over “I hate this movie”, with a hinge loss computed at each of the four tag decisions PRP VBP DT NN)

  • e.g. in DyNet:

    loss = dy.pickneglogsoftmax(score, answer)   # cross-entropy loss
    loss = dy.hinge(score, answer, m=1)          # drop-in hinge replacement

SLIDE 18

Cost-augmented Hinge

  • Sometimes some decisions are worse than others
      • e.g. a VB -> VBP mistake is not so bad; a VB -> NN mistake is much worse for downstream applications
  • Cost-augmented hinge defines a cost for each incorrect decision, and sets the margin equal to this cost:

    \ell_{ca\text{-}hinge}(x, y; \theta) = \max(0, \mathrm{cost}(\hat{y}, y) + S(\hat{y} | x; \theta) - S(y | x; \theta))
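Continuing the NumPy sketch above, the only change is that each candidate's margin comes from a cost vector (an assumed input with cost[gold] = 0):

    import numpy as np

    def ca_hinge_loss(scores, gold, cost):
        """Cost-augmented hinge: each wrong answer's margin equals its cost,
        e.g. small for a VB -> VBP confusion, large for VB -> NN."""
        aug = scores + cost             # cost-augmented scores
        y_hat = int(np.argmax(aug))     # the largest margin violator
        return max(0.0, aug[y_hat] - scores[gold])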

SLIDE 19

Costs over Sequences

  • Zero-one loss: 1 if the sentences differ, zero otherwise

    \mathrm{cost}_{zero\text{-}one}(\hat{Y}, Y) = \delta(\hat{Y} \neq Y)

  • Hamming loss: 1 for every differing element (assumes the lengths are identical)

    \mathrm{cost}_{hamming}(\hat{Y}, Y) = \sum_{j=1}^{|Y|} \delta(\hat{y}_j \neq y_j)

  • Other losses: edit distance, 1-BLEU, etc.
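Both costs are a few lines in Python (a sketch, treating sequences as lists of tags):

    def cost_zero_one(Y_hat, Y):
        """1 if the sequences differ anywhere, 0 otherwise."""
        return float(Y_hat != Y)

    def cost_hamming(Y_hat, Y):
        """Count of differing positions (assumes equal lengths)."""
        return float(sum(a != b for a, b in zip(Y_hat, Y)))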

SLIDE 20

Structured Hinge Loss

  • Hinge loss over the sequence with the largest margin violation:

    \hat{Y} = \operatorname{argmax}_{\tilde{Y} \neq Y} \left[ \mathrm{cost}(\tilde{Y}, Y) + S(\tilde{Y} | X; \theta) \right]

    \ell_{ca\text{-}hinge}(X, Y; \theta) = \max(0, \mathrm{cost}(\hat{Y}, Y) + S(\hat{Y} | X; \theta) - S(Y | X; \theta))

  • Problem: How do we find the argmax above?
  • Solution: In some cases, where the loss can be calculated easily, we can consider the loss during search.

SLIDE 21

Cost-Augmented Decoding for Hamming Loss

  • Hamming loss is decomposable over each word
  • Solution: add a score equal to the cost (here, +1) to each incorrect choice during search (a sketch follows below)

    (figure: tagging “I hate this movie”; candidate tags such as NN, VBP, PRP, DT receive model scores at each position, and +1 is added to the score of every incorrect tag during search)
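Here is a minimal sketch of this trick for a model with only local per-position scores (`emit` and the greedy search are simplifying assumptions; the same score adjustment works inside Viterbi):

    import numpy as np

    def cost_augmented_decode(emit, gold, cost=1.0):
        """Cost-augmented search for the Hamming cost.

        emit: (T, K) per-position tag scores; gold: length-T gold tag ids.
        Because Hamming cost decomposes per position, adding `cost` to every
        incorrect tag's score makes ordinary search return the argmax of
        cost(Y~, Y) + S(Y~ | X).
        """
        aug = emit + cost                          # +cost on every tag...
        aug[np.arange(len(gold)), gold] -= cost    # ...then remove it on the gold tag
        return aug.argmax(axis=1)                  # cost-augmented one-best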

SLIDE 22

Simpler Remedies to Exposure Bias

SLIDE 23

What’s Wrong w/ Structured Hinge Loss?

  • It may work, but…
  • Considers fewer hypotheses, so unstable
  • Requires decoding, so slow
  • Generally must resort to pre-training (and even then, it’s not as stable as teacher forcing w/ MLE)

SLIDE 24

Solution 1: Sample Mistakes in Training (Ross et al. 2010)

  • DAgger (also known as “scheduled sampling”, etc.) randomly samples wrong decisions and feeds them into the model during training
  • Start with no mistakes, and then gradually introduce them using annealing (a sketch follows below)
  • How to choose the next tag? Use the gold standard, or create a “dynamic oracle” (e.g. Goldberg and Nivre 2013)

    (figure: a tagger over “I hate this movie”; at each step a loss is computed against the gold tags PRP VBP DT NN, but the tag fed to the next step is sampled from the model, e.g. NN, VB, DT, NN)
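A hedged sketch of the sampling decision (in a real decoder this happens step by step inside the unrolled network; `model_sample` is an assumed callback):

    import numpy as np

    def choose_next_inputs(gold_tags, model_sample, p_mistake):
        """Pick the tag fed to each next decoder step, scheduled-sampling style.

        With probability p_mistake, annealed upward from 0 during training,
        feed the model's own sampled tag instead of the gold one;
        model_sample(t) returns the model's sampled tag at step t.
        """
        fed = []
        for t, gold in enumerate(gold_tags):
            if np.random.rand() < p_mistake:
                fed.append(model_sample(t))   # feed a possibly wrong decision
            else:
                fed.append(gold)              # feed the gold tag
        return fed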

SLIDE 25

Solution 2: Drop Out Inputs

  • Basic idea: Simply don’t input the previous decision sometimes during training (Gal and Ghahramani 2015)
  • Helps ensure that the model doesn’t rely too heavily on predictions, while still using them (a sketch follows below)

    (figure: a tagger over “I hate this movie” predicting PRP VBP DT NN, with some of the previous-prediction inputs to the classifier dropped, marked “x”)
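A minimal sketch of withholding the previous decision (the p_drop value and the function shape are illustrative assumptions):

    import numpy as np

    def drop_prev_input(prev_embedding, p_drop=0.2, training=True):
        """With probability p_drop, zero out the previous-prediction input
        during training so the model cannot rely on it always being there."""
        if training and np.random.rand() < p_drop:
            return np.zeros_like(prev_embedding)
        return prev_embedding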

SLIDE 26

Solution 3: Corrupt Training Data

  • Reward-augmented maximum likelihood (Norouzi et al. 2016)
  • Basic idea: randomly sample incorrect training data, then train w/ maximum likelihood (a sketch follows below)
  • The sampling probability is proportional to the goodness of the output
  • Can be shown to approximately minimize risk (next class)

    (figure: the gold tags PRP VBP DT NN for “I hate this movie” are corrupted by sampling into PRP NN DT NN, which is then trained on with MLE)
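A rough sketch of the corruption step under a Hamming-style reward (the temperature tau, the edit cap, and this particular sampling scheme are illustrative assumptions, not the paper's exact procedure):

    import numpy as np

    def raml_corrupt(gold_tags, n_tags, tau=1.0, max_edits=2):
        """Sample a corrupted training target: the number of Hamming edits is
        drawn proportionally to exp(-edits / tau), so outputs closer to the
        gold (higher reward) are sampled more often; the result is then
        trained on with plain maximum likelihood."""
        edits = np.arange(max_edits + 1)
        p = np.exp(-edits / tau)
        k = int(np.random.choice(edits, p=p / p.sum()))  # number of corruptions
        out = list(gold_tags)
        for j in np.random.choice(len(out), size=k, replace=False):
            out[j] = int(np.random.randint(n_tags))      # replace with a random tag
        return out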

SLIDE 27

Questions?