
Structured Perceptron / Margin Methods (Graham Neubig): PowerPoint Presentation



  1. CS11-747 Neural Networks for NLP
     Structured Perceptron / Margin Methods
     Graham Neubig
     Site: https://phontron.com/class/nn4nlp2020/

  2. Types of Prediction
     • Two classes (binary classification): "I hate this movie" → positive / negative
     • Multiple classes (multi-class classification): "I hate this movie" → very good / good / neutral / bad / very bad
     • Exponential/infinite labels (structured prediction): "I hate this movie" → PRP VBP DT NN (tagging), or → "kono eiga ga kirai" (translation)
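To make the contrast concrete, here is a minimal sketch (the sentence, tag set, and class lists below are illustrative, not the course's data) showing how the size of the output space differs across the three settings; for structured prediction it grows exponentially with the input length:

```python
# Illustrative sketch: size of the output space for each prediction type.

sentence = "I hate this movie".split()

binary_labels = ["positive", "negative"]                  # 2 possible outputs
multiclass_labels = ["very good", "good", "neutral",
                     "bad", "very bad"]                   # 5 possible outputs

# Structured prediction (e.g. POS tagging): one tag per token, so the number of
# possible output sequences grows exponentially with the sentence length.
pos_tags = ["PRP", "VBP", "DT", "NN"]                     # toy tag set
num_tag_sequences = len(pos_tags) ** len(sentence)        # 4 ** 4 = 256

print(len(binary_labels), len(multiclass_labels), num_tag_sequences)
```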

  3. Many Varieties of Structured Prediction!
     • Models:
       • RNN-based decoders (covered already)
       • Convolution/self-attentional decoders (covered already)
       • CRFs w/ local factors
     • Training algorithms:
       • Maximum likelihood w/ teacher forcing (covered)
       • Sequence-level likelihood (covered)
       • Structured perceptron, structured large margin (today)
       • Reinforcement learning/minimum risk training
       • Sampling corruptions of data

  4. Reminder: Globally Normalized Models
     • Locally normalized models: each decision made by the model has a probability that adds to one
       $$P(Y \mid X) = \prod_{j=1}^{|Y|} \frac{e^{S(y_j \mid X, y_1, \ldots, y_{j-1})}}{\sum_{\tilde{y}_j \in V} e^{S(\tilde{y}_j \mid X, y_1, \ldots, y_{j-1})}}$$
     • Globally normalized models (a.k.a. energy-based models): each sentence has a score, which is not normalized over a particular decision
       $$P(Y \mid X) = \frac{e^{S(X,Y)}}{\sum_{\tilde{Y} \in V^*} e^{S(X,\tilde{Y})}}$$
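As a minimal numerical sketch of the distinction (the vocabulary, scoring function, and sequence length below are invented), the difference is where the softmax normalization is applied: per decision for a locally normalized model, once over whole output sequences for a globally normalized one:

```python
import math
from itertools import product

V = ["a", "b"]   # toy output vocabulary
MAX_LEN = 2      # restrict V* to sequences of exactly this length so the sum is finite


def local_score(y_j, prefix):
    """Score of emitting token y_j given the previously emitted prefix (toy definition)."""
    return 1.0 if y_j == "a" else 0.5 - 0.1 * len(prefix)


def global_score(Y):
    """A single unnormalized score for a whole output sequence."""
    return sum(local_score(y, Y[:j]) for j, y in enumerate(Y))


def p_local(Y):
    """Locally normalized: softmax over the vocabulary at every decision."""
    p = 1.0
    for j, y in enumerate(Y):
        Z_j = sum(math.exp(local_score(v, Y[:j])) for v in V)
        p *= math.exp(local_score(y, Y[:j])) / Z_j
    return p


def p_global(Y):
    """Globally normalized: one partition function over all candidate sequences."""
    Z = sum(math.exp(global_score(Y_tilde)) for Y_tilde in product(V, repeat=MAX_LEN))
    return math.exp(global_score(Y)) / Z


print(p_local(("a", "b")), p_global(("a", "b")))
```

Here the global score happens to be a sum of the same per-decision scores only to keep the example small; in general S(X, Y) need not decompose over individual decisions, and the sum over V* in the denominator is what the next slides are concerned with.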

  5. Globally Normalized Likelihood
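This title slide presumably covers training by maximizing the globally normalized probability defined on slide 4; taking its negative log (my expansion, following directly from that formula) gives

$$-\log P(Y \mid X) = -S(X, Y) + \log \sum_{\tilde{Y} \in V^*} e^{S(X, \tilde{Y})},$$

so the difficulty lies entirely in the second, partition-function term, which is the subject of the next slide.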

  6. Difficulties Training Globally Normalized Models
     • Partition function is problematic:
       $$P(Y \mid X) = \frac{e^{S(X,Y)}}{\sum_{\tilde{Y} \in V^*} e^{S(X,\tilde{Y})}}$$
     • Two options for calculating the partition function:
       • Structure the model to allow enumeration via dynamic programming, e.g. linear-chain CRF, CFG (a forward-algorithm sketch follows below)
       • Estimate the partition function through sub-sampling of the hypothesis space
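The first option relies on the score decomposing into local factors. A minimal sketch, assuming a linear-chain factorization with per-position emission scores and tag-to-tag transition scores (the numbers below are invented), computes log Z with the standard forward algorithm in O(T·K²) rather than O(K^T):

```python
import math
from itertools import product


def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_partition(emissions, transitions):
    """Forward algorithm for a linear-chain model: log Z over all tag sequences."""
    T, K = len(emissions), len(emissions[0])
    # alpha[k] = log-sum-exp of the scores of all prefixes ending in tag k
    alpha = list(emissions[0])
    for t in range(1, T):
        alpha = [
            emissions[t][k]
            + logsumexp([alpha[j] + transitions[j][k] for j in range(K)])
            for k in range(K)
        ]
    return logsumexp(alpha)


def brute_force_log_partition(emissions, transitions):
    """Same quantity by enumerating all K**T sequences (feasible only for tiny cases)."""
    T, K = len(emissions), len(emissions[0])
    total = 0.0
    for tags in product(range(K), repeat=T):
        s = sum(emissions[t][k] for t, k in enumerate(tags))
        s += sum(transitions[a][b] for a, b in zip(tags, tags[1:]))
        total += math.exp(s)
    return math.log(total)


# 3 tokens, 2 tags; arbitrary scores. Both computations agree.
emissions = [[0.5, 1.0], [0.2, 0.3], [1.5, 0.1]]
transitions = [[0.1, 0.4], [0.7, 0.0]]
print(log_partition(emissions, transitions),
      brute_force_log_partition(emissions, transitions))
```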

  7. Two Methods for Approximation
     • Sampling:
       • Sample k samples according to the probability distribution
       • + Unbiased estimator: as k gets large, it will approach the true distribution
       • - High variance: what if we get low-probability samples?
     • Beam search:
       • Search for the k best hypotheses
       • - Biased estimator: may result in systematic differences from the true distribution
       • + Lower variance: more likely to get high-probability outputs
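A minimal sketch of how each method produces a small hypothesis set to stand in for V* (the vocabulary, length, scoring function, and k below are all toy choices): sampling draws hypotheses stochastically from the model distribution, while beam search keeps the same top-scoring ones every time:

```python
import math
import random
from itertools import product

V = ["a", "b", "c"]   # toy output vocabulary
LENGTH = 4            # fixed output length, so the hypothesis space is enumerable here
K = 5                 # size of the approximate hypothesis set
random.seed(0)


def score(Y):
    """Unnormalized score of a whole sequence (toy definition)."""
    return sum(1.0 if y == "a" else 0.2 for y in Y) - 0.3 * len(set(Y))


all_hyps = list(product(V, repeat=LENGTH))               # only enumerable because it's tiny
true_Z = sum(math.exp(score(Y)) for Y in all_hyps)

# Sampling: draw K hypotheses from the model distribution. Unbiased in the limit of
# many samples, but any particular draw may consist of low-probability outputs.
probs = [math.exp(score(Y)) / true_Z for Y in all_hyps]
sampled_hyps = random.choices(all_hyps, weights=probs, k=K)

# Beam search (here exact K-best, since the space is enumerable). Deterministic and
# low variance, but systematically favors high-scoring hypotheses.
beam_hyps = sorted(all_hyps, key=score, reverse=True)[:K]

# During training, the sum over V* in the partition function is replaced by a sum
# over whichever small hypothesis set was produced above.
approx_Z = sum(math.exp(score(Y)) for Y in beam_hyps)
print(true_Z, approx_Z)
```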

  8. Un-normalized Models: Structured Perceptron
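This final slide introduces the structured perceptron. As a minimal sketch of the classic (Collins-style) version named in the title, here is a linear score over sparse features with a brute-force argmax over an explicit candidate set; the feature function and toy data are invented, and a neural variant instead takes a gradient step on the score difference S(X, Y_hat) - S(X, Y_ref) whenever the predicted Y_hat differs from the reference:

```python
from itertools import product


def dot(weights, feats):
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def perceptron_step(weights, feature_fn, candidates, X, Y_ref):
    """If the model's best candidate differs from the reference, add the reference's
    features to the weights and subtract the incorrect prediction's features."""
    Y_hat = max(candidates, key=lambda Y: dot(weights, feature_fn(X, Y)))
    if Y_hat != Y_ref:
        for f, v in feature_fn(X, Y_ref).items():
            weights[f] = weights.get(f, 0.0) + v
        for f, v in feature_fn(X, Y_hat).items():
            weights[f] = weights.get(f, 0.0) - v
    return Y_hat


# Toy usage: tag a two-word sentence; candidates are all tag pairs.
def feature_fn(X, Y):
    return {f"{word}={tag}": 1.0 for word, tag in zip(X, Y)}


X = ("I", "hate")
candidates = list(product(["PRP", "VBP", "NN"], repeat=len(X)))
weights = {}
for _ in range(3):
    perceptron_step(weights, feature_fn, candidates, X, Y_ref=("PRP", "VBP"))
print(weights)
```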
