Softmax-Margin CRFs: Training Log-Linear Models with Cost Functions
Kevin Gimpel and Noah A. Smith


  1. Softmax-Margin CRFs: Training Log-Linear Models with Cost Functions (Kevin Gimpel and Noah A. Smith)

  2.-4. [Venn diagram: training methods organized by three overlapping properties: "is convex," "based on probabilistic inference," and "uses a cost function." Slide 2 places the Perceptron, Boosting, Max-Margin, MIRA, Conditional Likelihood, Minimum Error Rate Training, Latent Variable Conditional Likelihood, and Risk. Slide 3 adds Softmax-Margin; slide 4 adds the Jensen Risk Bound.]

  5. Linear Models for Structured Prediction
     Predict by choosing the highest-scoring output:
       \hat{y} = \arg\max_{y \in \mathcal{Y}(x)} \theta^\top \mathbf{f}(x, y)
     where \theta is the weight vector and \mathbf{f}(x, y) the feature vector. For a probabilistic interpretation, exponentiate and normalize:
       p_\theta(y \mid x) = \frac{\exp\{\theta^\top \mathbf{f}(x, y)\}}{\sum_{y' \in \mathcal{Y}(x)} \exp\{\theta^\top \mathbf{f}(x, y')\}}
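The scoring and normalization above can be sketched in a few lines. This is a toy illustration, not the authors' code: the weight vector, feature vectors, and candidate set are made-up, and the candidate outputs are small enough to enumerate directly.

```python
import math

# Hypothetical weights and feature vectors f(x, y) for a tiny candidate set Y(x).
theta = [1.0, -0.5, 2.0]
candidates = {
    "y1": [1.0, 0.0, 1.0],
    "y2": [0.0, 1.0, 1.0],
    "y3": [1.0, 1.0, 0.0],
}

def score(theta, f):
    # Linear model: theta^T f(x, y)
    return sum(t * fi for t, fi in zip(theta, f))

# Decoding: pick the highest-scoring output.
y_hat = max(candidates, key=lambda y: score(theta, candidates[y]))

# Probabilistic interpretation: exponentiate and normalize.
Z = sum(math.exp(score(theta, f)) for f in candidates.values())
p = {y: math.exp(score(theta, f)) / Z for y, f in candidates.items()}

print(y_hat)            # -> "y1", the argmax output
print(sum(p.values()))  # -> 1.0 (up to floating point)
```

In a real structured model Y(x) is exponentially large and the argmax and the normalizer are computed by dynamic programming rather than enumeration.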

  6.-9. Training
     The standard approach is to maximize conditional likelihood:
       \min_\theta \sum_{i=1}^{n} \Big( -\theta^\top \mathbf{f}(x_i, y_i) + \log \sum_{y \in \mathcal{Y}(x_i)} \exp\{\theta^\top \mathbf{f}(x_i, y)\} \Big)
     Another approach maximizes margin (Taskar et al., 2003):
       \min_\theta \sum_{i=1}^{n} \Big( -\theta^\top \mathbf{f}(x_i, y_i) + \max_{y \in \mathcal{Y}(x_i)} \big( \theta^\top \mathbf{f}(x_i, y) + \mathrm{cost}(y_i, y) \big) \Big)
     where cost(y_i, y) is a task-specific cost function; the inner max is "cost-augmented decoding."
     Softmax-margin replaces the "max" with "softmax" (Sha and Saul, 2006; Povey et al., 2008):
       \min_\theta \sum_{i=1}^{n} \Big( -\theta^\top \mathbf{f}(x_i, y_i) + \log \sum_{y \in \mathcal{Y}(x_i)} \exp\{\theta^\top \mathbf{f}(x_i, y) + \mathrm{cost}(y_i, y)\} \Big)
     The inner summation is "cost-augmented summing."
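The three per-example losses can be compared numerically on a toy instance. Everything here (weights, features, costs) is made up for illustration; the point is only that log-sum-exp upper-bounds max, so the softmax-margin loss upper-bounds both the max-margin loss and (since costs are nonnegative) the conditional-likelihood loss.

```python
import math

# Hypothetical training pair with gold output "y1" and a small candidate set.
theta = [1.0, -0.5, 2.0]
feats = {"y1": [1.0, 0.0, 1.0], "y2": [0.0, 1.0, 1.0], "y3": [1.0, 1.0, 0.0]}
cost = {"y1": 0.0, "y2": 1.0, "y3": 2.0}  # task-specific cost against gold "y1"
gold = "y1"

def score(y):
    return sum(t * f for t, f in zip(theta, feats[y]))

def log_sum_exp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

# Conditional likelihood loss: -score(gold) + log sum_y exp(score(y))
cl = -score(gold) + log_sum_exp([score(y) for y in feats])

# Max-margin loss: inner max is "cost-augmented decoding".
mm = -score(gold) + max(score(y) + cost[y] for y in feats)

# Softmax-margin loss: inner sum is "cost-augmented summing".
sm = -score(gold) + log_sum_exp([score(y) + cost[y] for y in feats])

print(sm >= mm, sm >= cl)  # softmax-margin bounds both
```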

  10.-11. Properties of Softmax-Margin
     - Has a probabilistic interpretation in the minimum divergence framework (Jelinek, 1997); details in the technical report
     - Is a bound on:
       - Max-margin (because "softmax" bounds "max")
       - Conditional likelihood
       - Risk

  12. Risk?
     Risk is the expected value of the cost function (Smith and Eisner, 2006; Li and Eisner, 2009):
       \min_\theta \sum_{i=1}^{n} \sum_{y \in \mathcal{Y}(x_i)} p_\theta(y \mid x_i)\, \mathrm{cost}(y_i, y)
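Risk is just an expectation of the cost under the model distribution, which is easy to see on the same toy instance (again with made-up numbers):

```python
import math

# Hypothetical model scores and costs against gold output "y1".
theta = [1.0, -0.5, 2.0]
feats = {"y1": [1.0, 0.0, 1.0], "y2": [0.0, 1.0, 1.0], "y3": [1.0, 1.0, 0.0]}
cost = {"y1": 0.0, "y2": 1.0, "y3": 2.0}

scores = {y: sum(t * f for t, f in zip(theta, feats[y])) for y in feats}
Z = sum(math.exp(s) for s in scores.values())
p = {y: math.exp(s) / Z for y, s in scores.items()}

# Risk = sum_y p(y | x) * cost(gold, y): expected cost under the model.
risk = sum(p[y] * cost[y] for y in feats)
print(risk)  # small, since the model puts most mass on the zero-cost gold output
```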

  13.-15. Bounding Conditional Likelihood and Risk
     The softmax-margin objective decomposes, per example, as:
       -\theta^\top \mathbf{f}(x_i, y_i) + \log \sum_{y \in \mathcal{Y}(x_i)} \exp\{\theta^\top \mathbf{f}(x_i, y) + \mathrm{cost}(y_i, y)\}
       = \underbrace{-\theta^\top \mathbf{f}(x_i, y_i) + \log \sum_{y \in \mathcal{Y}(x_i)} \exp\{\theta^\top \mathbf{f}(x_i, y)\}}_{\text{conditional likelihood}} \;+\; \underbrace{\log \sum_{y \in \mathcal{Y}(x_i)} p_\theta(y \mid x_i) \exp\{\mathrm{cost}(y_i, y)\}}_{\text{bound on risk via Jensen's inequality}}
     Softmax-margin is therefore a convex bound on max-margin, conditional likelihood, and risk.
     The second term alone is the "Jensen risk bound," which is easier to optimize than risk (cf. Li and Eisner, 2009).
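The decomposition can be checked numerically: on any instance, softmax-margin loss equals conditional-likelihood loss plus the Jensen risk bound, and the Jensen risk bound upper-bounds risk because log E[exp(cost)] >= E[cost]. A sketch with the same hypothetical toy numbers:

```python
import math

theta = [1.0, -0.5, 2.0]
feats = {"y1": [1.0, 0.0, 1.0], "y2": [0.0, 1.0, 1.0], "y3": [1.0, 1.0, 0.0]}
cost = {"y1": 0.0, "y2": 1.0, "y3": 2.0}
gold = "y1"

scores = {y: sum(t * f for t, f in zip(theta, feats[y])) for y in feats}

def log_sum_exp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

log_Z = log_sum_exp(list(scores.values()))
p = {y: math.exp(s - log_Z) for y, s in scores.items()}

cl = -scores[gold] + log_Z                                              # conditional likelihood loss
sm = -scores[gold] + log_sum_exp([scores[y] + cost[y] for y in feats])  # softmax-margin loss
jrb = math.log(sum(p[y] * math.exp(cost[y]) for y in feats))            # Jensen risk bound
risk = sum(p[y] * cost[y] for y in feats)                               # expected cost

print(abs(sm - (cl + jrb)) < 1e-6)  # exact decomposition
print(jrb >= risk)                  # Jensen's inequality
```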

  16. Implementation
     Going from conditional likelihood to softmax-margin:
     - If the cost function factors the same way as the features, it's easy:
       - Add additional features for the cost function
       - Keep their weights fixed
     - If not, use a simpler cost function or use approximate inference
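The "cost as fixed-weight features" idea can be sketched concretely. This is my own minimal illustration, not the paper's code, and it simplifies to a zeroth-order tagger (independent positions, no transition scores) so that inference is a per-position sum: because Hamming cost decomposes per position, cost-augmented summing is just ordinary inference after adding a fixed penalty of 1 to every local score that disagrees with the gold label.

```python
import math

# Hypothetical per-position label scores for a 3-position input (made-up numbers).
labels = ["O", "B", "I"]
local_scores = [
    {"O": 1.0, "B": 0.2, "I": -0.3},
    {"O": 0.1, "B": 0.8, "I": 0.4},
    {"O": 0.5, "B": -0.1, "I": 0.9},
]
gold = ["O", "B", "I"]

def log_partition(scores):
    # Zeroth-order model: positions are independent, so log Z sums per position.
    total = 0.0
    for scores_t in scores:
        m = max(scores_t.values())
        total += m + math.log(sum(math.exp(v - m) for v in scores_t.values()))
    return total

# Cost-augment the local scores: Hamming cost adds 1 wherever the label
# disagrees with gold, i.e., a "cost feature" with weight fixed at 1.
augmented = [
    {y: s + (0.0 if y == gold[t] else 1.0) for y, s in scores_t.items()}
    for t, scores_t in enumerate(local_scores)
]

log_Z = log_partition(local_scores)    # normalizer for conditional likelihood
log_Z_cost = log_partition(augmented)  # normalizer for softmax-margin
print(log_Z_cost >= log_Z)             # adding nonnegative cost only adds mass
```

With transition features (a first-order CRF) the same trick works inside the forward algorithm, since the per-position cost terms ride along with the local potentials.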

  17. Experiments
     - English named-entity recognition (CoNLL 2003)
     - Compared softmax-margin and the Jensen risk bound with five baselines:
       - Perceptron (Collins, 2002)
       - 1-best MIRA with cost-augmented decoding (Crammer et al., 2006)
       - Max-margin via subgradient descent (Ratliff et al., 2006)
       - Conditional likelihood (Lafferty et al., 2001)
       - Risk (Xiong et al., 2009)
     - For risk and the Jensen risk bound, initialized using the output of conditional likelihood training
     - Used Hamming cost as the cost function

  18.-20. Results
     Method                  Test F1
     ----------------------  -------
     Perceptron              83.98*
     MIRA                    85.72*
     Max-Margin              85.28*
     Conditional Likelihood  85.46*
     Risk                    85.59*
     Jensen Risk Bound       85.65*
     Softmax-Margin          85.84
     * Indicates significance (compared with softmax-margin)
     Callouts: softmax-margin gives a significant improvement with equal training time and implementation difficulty (slide 19); the Jensen risk bound gives comparable performance with half the training time (slide 20).

  21. [Summary Venn diagram, as in slides 2-4: Softmax-Margin and the Jensen Risk Bound placed among the Perceptron, Max-Margin, MIRA, Conditional Likelihood, and Risk, classified by "is convex," "based on probabilistic inference," and "uses a cost function."]

  22.-23. [Plot: performance vs. training time for Softmax-Margin, MIRA, the Jensen Risk Bound, Risk, Conditional Likelihood, Max-Margin, and the Perceptron. Slide 23 annotates each method with the inference it requires: (cost-augmented) decoding, (cost-augmented) summing, or expectations of products.]
