Softmax-Margin CRFs: Training Log-Linear Models with Cost Functions
Kevin Gimpel and Noah A. Smith
[Venn diagram: training methods (Perceptron, Boosting, Minimum Error Rate Training, Conditional Likelihood, Latent Variable Conditional Likelihood, Max-Margin, MIRA, Risk) grouped by three properties: uses a cost function, is convex, and based on probabilistic inference. Softmax-Margin and the Jensen Risk Bound are then added to the diagram, with softmax-margin in the intersection of all three properties.]
Linear Models for Structured Prediction
Predict the highest-scoring output:

$$\hat{y} = \operatorname*{argmax}_{y' \in \mathcal{Y}(x)} \theta^\top f(x, y')$$

where $x$ is the input, $y'$ ranges over outputs in $\mathcal{Y}(x)$, $\theta$ is the weight vector, and $f$ is the feature vector function.

For a probabilistic interpretation, exponentiate and normalize:

$$p_\theta(y \mid x) = \frac{\exp\{\theta^\top f(x, y)\}}{\sum_{y' \in \mathcal{Y}(x)} \exp\{\theta^\top f(x, y')\}}$$
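As a concrete illustration (not from the slides), here is a minimal sketch of both views with a toy, fully enumerable output space; the feature function, weights, and input are invented for the example:

```python
import math

theta = [1.0, -0.5, 2.0]  # toy weight vector

def f(x, y):
    # Hypothetical feature vector for illustration only.
    return [x * y, float(y == 2), 1.0]

def score(x, y):
    # Linear score theta^T f(x, y).
    return sum(t * v for t, v in zip(theta, f(x, y)))

x, outputs = 1.5, [0, 1, 2]  # Y(x): a small enumerable output space

# Decoding view: predict the highest-scoring output.
y_hat = max(outputs, key=lambda y: score(x, y))

# Probabilistic view: exponentiate and normalize.
Z = sum(math.exp(score(x, y)) for y in outputs)
p = {y: math.exp(score(x, y)) / Z for y in outputs}
print(y_hat, p)
```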
Training

Standard approach is to maximize conditional likelihood:

$$\min_\theta \sum_{i=1}^{n} \Bigl( -\theta^\top f(x^{(i)}, y^{(i)}) + \log \sum_{y \in \mathcal{Y}(x^{(i)})} \exp\bigl\{\theta^\top f(x^{(i)}, y)\bigr\} \Bigr)$$

Another approach maximizes margin (Taskar et al., 2003):

$$\min_\theta \sum_{i=1}^{n} \Bigl( -\theta^\top f(x^{(i)}, y^{(i)}) + \max_{y \in \mathcal{Y}(x^{(i)})} \bigl\{\theta^\top f(x^{(i)}, y) + \mathrm{cost}(y^{(i)}, y)\bigr\} \Bigr)$$

Here $\mathrm{cost}(y^{(i)}, y)$ is a task-specific cost function, and the inner max is "cost-augmented decoding".

Softmax-margin: replace "max" with "softmax" (Sha and Saul, 2006; Povey et al., 2008):

$$\min_\theta \sum_{i=1}^{n} \Bigl( -\theta^\top f(x^{(i)}, y^{(i)}) + \log \sum_{y \in \mathcal{Y}(x^{(i)})} \exp\bigl\{\theta^\top f(x^{(i)}, y) + \mathrm{cost}(y^{(i)}, y)\bigr\} \Bigr)$$

The inner log-sum-exp is "cost-augmented summing".
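The following sketch (mine, not the authors') computes the three per-example objectives above by brute-force enumeration over the toy output space from the previous sketch; the 0/1 `cost` function is a placeholder:

```python
import math

def logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def losses(score, outputs, y_gold, cost):
    s_gold = score(y_gold)
    # Conditional likelihood: -score(gold) + log sum_y exp{score(y)}
    cll = -s_gold + logsumexp([score(y) for y in outputs])
    # Max-margin: -score(gold) + max_y {score(y) + cost(gold, y)}
    # (the inner max is cost-augmented decoding)
    mm = -s_gold + max(score(y) + cost(y_gold, y) for y in outputs)
    # Softmax-margin: replace the max with a log-sum-exp
    # (the inner sum is cost-augmented summing)
    sm = -s_gold + logsumexp([score(y) + cost(y_gold, y) for y in outputs])
    return cll, mm, sm

score = lambda y: {0: 2.0, 1: 3.5, 2: 4.5}[y]   # toy scores from the sketch above
cost = lambda y_gold, y: 0.0 if y == y_gold else 1.0
print(losses(score, [0, 1, 2], y_gold=2, cost=cost))
```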
Properties of Softmax-Margin

- Has a probabilistic interpretation in the minimum divergence framework (Jelinek, 1997); details in the technical report
- Is a bound on:
  - Max-margin (because "softmax" bounds "max")
  - Conditional likelihood
  - Risk
Risk?

Risk is the expected value of the cost function (Smith and Eisner, 2006; Li and Eisner, 2009):

$$\min_\theta \sum_{i=1}^{n} \sum_{y \in \mathcal{Y}(x^{(i)})} p_\theta(y \mid x^{(i)}) \, \mathrm{cost}(y^{(i)}, y)$$
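A minimal sketch of this definition, reusing the toy scores from above (the gold output and the 0/1 cost are invented for illustration):

```python
import math

scores = {0: 2.0, 1: 3.5, 2: 4.5}                # toy scores; gold output is 2
Z = sum(math.exp(s) for s in scores.values())
p = {y: math.exp(s) / Z for y, s in scores.items()}
cost = lambda y: 0.0 if y == 2 else 1.0
risk = sum(p[y] * cost(y) for y in scores)       # sum_y p(y | x) cost(y_gold, y)
print(risk)
```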
Bounding Conditional Likelihood and Risk

Softmax-margin:

$$\sum_{i=1}^{n} \Bigl( -\theta^\top f(x^{(i)}, y^{(i)}) + \log \sum_{y \in \mathcal{Y}(x^{(i)})} \exp\bigl\{\theta^\top f(x^{(i)}, y) + \mathrm{cost}(y^{(i)}, y)\bigr\} \Bigr)$$

$$= \underbrace{\sum_{i=1}^{n} \Bigl( -\theta^\top f(x^{(i)}, y^{(i)}) + \log \sum_{y} \exp\bigl\{\theta^\top f(x^{(i)}, y)\bigr\} \Bigr)}_{\text{conditional likelihood}} \;+\; \underbrace{\sum_{i=1}^{n} \log \sum_{y} p_\theta(y \mid x^{(i)}) \exp\bigl\{\mathrm{cost}(y^{(i)}, y)\bigr\}}_{\text{bound on risk via Jensen's inequality}}$$

Softmax-margin is a convex bound on max-margin, conditional likelihood, and risk.

Jensen Risk Bound: the second term on its own; easier to optimize than risk (cf. Li and Eisner, 2009):

$$\min_\theta \sum_{i=1}^{n} \log \sum_{y \in \mathcal{Y}(x^{(i)})} p_\theta(y \mid x^{(i)}) \, \exp\bigl\{\mathrm{cost}(y^{(i)}, y)\bigr\}$$
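The sketch below, under the same toy assumptions as the earlier snippets, numerically checks both claims: softmax-margin decomposes exactly into conditional likelihood plus the Jensen risk bound, and the Jensen risk bound dominates risk:

```python
import math

scores = {0: 2.0, 1: 3.5, 2: 4.5}                # toy scores; gold output is 2
cost = lambda y: 0.0 if y == 2 else 1.0

def lse(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

cll = -scores[2] + lse(list(scores.values()))
sm = -scores[2] + lse([s + cost(y) for y, s in scores.items()])
Z = sum(math.exp(s) for s in scores.values())
p = {y: math.exp(s) / Z for y, s in scores.items()}
jrb = math.log(sum(p[y] * math.exp(cost(y)) for y in scores))   # log E[e^cost]
risk = sum(p[y] * cost(y) for y in scores)                      # E[cost]

assert abs(sm - (cll + jrb)) < 1e-9   # softmax-margin = CLL + Jensen risk bound
assert jrb >= risk                    # Jensen's inequality: log E[e^C] >= E[C]
```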
Implementation

Conditional likelihood → Softmax-margin

If the cost function factors the same way as the features, it's easy (a sketch follows below):
- Add additional features for the cost function
- Keep their weights fixed

If not, use a simpler cost function or use approximate inference.
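A minimal sketch of this recipe, assuming a per-position Hamming cost and features that factor over positions (as in a linear-chain CRF); the names `COST_WEIGHT` and `cost_augmented_score` are mine:

```python
COST_WEIGHT = 1.0  # weight of the cost features, clamped (never updated)

def hamming_cost(y_gold, y):
    # Hamming cost factors over positions, just like local CRF features.
    return sum(1.0 for a, b in zip(y_gold, y) if a != b)

def cost_augmented_score(score, y_gold, y):
    # score(y) is the model's usual linear score theta^T f(x, y);
    # adding the fixed-weight cost term turns decoding/summing into their
    # cost-augmented versions, reusing the same dynamic program.
    return score(y) + COST_WEIGHT * hamming_cost(y_gold, y)

print(cost_augmented_score(lambda y: 0.0, ["B-PER", "O"], ["O", "O"]))  # -> 1.0
```

Because the cost decomposes per position, the same forward algorithm that computes the partition function computes the cost-augmented sum.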
Experiments

English named-entity recognition (CoNLL 2003). Compared softmax-margin and the Jensen risk bound with five baselines:
- Perceptron (Collins, 2002)
- 1-best MIRA with cost-augmented decoding (Crammer et al., 2006)
- Max-margin via subgradient descent (Ratliff et al., 2006)
- Conditional likelihood (Lafferty et al., 2001)
- Risk (Xiong et al., 2009)

For risk and the Jensen risk bound, initialized using the output of conditional likelihood training. Used Hamming cost for the cost function.
Results

Method                   Test F1
----------------------   -------
Perceptron               83.98*
MIRA                     85.72
Max-Margin               85.28*
Conditional Likelihood   85.46*
Risk                     85.59
Jensen Risk Bound        85.65
Softmax-Margin           85.84

* indicates a statistically significant difference compared with softmax-margin

Softmax-margin gives a significant improvement over conditional likelihood with equal training time and implementation difficulty; the Jensen risk bound gives performance comparable to risk with half the training time.
[Venn diagram repeated: Softmax-Margin and the Jensen Risk Bound now shown alongside Risk, Conditional Likelihood, Max-Margin, MIRA, and Perceptron, grouped by the same three properties.]

[Plot: the same methods arranged by training time vs. performance, annotated with what each requires during training: decoding (Perceptron), cost-augmented decoding (MIRA, Max-Margin), summing (Conditional Likelihood), cost-augmented summing (Softmax-Margin, Jensen Risk Bound), and expectations of products (Risk).]
See the extended technical report for:
- Probabilistic interpretation for softmax-margin in the minimum divergence framework (Jelinek, 1997)
- Softmax-margin training with hidden variables
- Additional experiments

Thank you!
Loss Functions for Binary Classification

[Plot: loss functions for binary classification.]
Training Method          Requirements               Cost Function   Convex   Prob. Interp.
----------------------   ------------------------   -------------   ------   -------------
Perceptron               decoding                   no              yes      no
MIRA                     cost-augmented decoding    yes             yes      no
Max-Margin               cost-augmented decoding    yes             yes      no
Conditional Likelihood   summing                    no              yes      yes
Risk                     expectations of products   yes             no       yes
Jensen Risk Bound        cost-augmented summing     yes             no       yes
Softmax-Margin           cost-augmented summing     yes             yes      yes