Dual-Decomposed Learning with Factorwise Oracles for Structured Prediction of Large Output Domain - PowerPoint PPT Presentation



SLIDE 1

Dual-Decomposed Learning with Factorwise Oracles for Structured Prediction of Large Output Domain

Xiangru Huang ∗

Joint work 1 with Ian E.H. Yen†, Kai Zhong∗, Ruohan Zhang∗, Chia Dai†, Pradeep Ravikumar† and Inderjit Dhillon∗.

∗ University of Texas at Austin    † Carnegie Mellon University

1 [1] Dual-Decomposed Learning with Factorwise Oracles for Structural SVM of Large Output Domain. NIPS 2016.
SLIDE 2

Outline

◮ Motivations
◮ Key Idea
◮ Methodology Sketch
◮ Experimental Results

SLIDE 3

Problem Setting

◮ Classification: learn function g : X → Y

SLIDE 4

Problem Setting

◮ Classification: learn function g : X → Y
◮ Structural: assume structured dependencies on the output

g : X → Y1 × Y2 × · · · × Ym

SLIDE 5

Example: Sequence Labeling

◮ Unigram Factor:

θu : Yt × Xt → R

◮ Bigram Factor:

Yb = Yt−1 × Yt,  θb : Yb → R

Figure: Sequence Labeling
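To make the two factor types concrete, here is a minimal Python sketch of how unigram and bigram factors jointly score one label sequence. The names (`score_sequence`, `unigram`, `bigram`) and the toy numbers are illustrative, not from the talk.

```python
# Hypothetical sketch: scoring a label sequence y_1..y_T with
# unigram factors theta_u(y_t, x_t) and bigram factors theta_b(y_{t-1}, y_t).

def score_sequence(labels, unigram, bigram):
    """labels: list of label ids y_1..y_T
    unigram[t][y]   : score theta_u(y, x_t) of label y at position t
    bigram[(a, b)]  : transition score theta_b(a, b)
    """
    s = sum(unigram[t][y] for t, y in enumerate(labels))
    s += sum(bigram[(labels[t - 1], labels[t])] for t in range(1, len(labels)))
    return s

unigram = [{0: 1.0, 1: 0.2}, {0: 0.1, 1: 0.8}]          # T = 2, two labels
bigram = {(0, 0): 0.0, (0, 1): 0.5, (1, 0): -0.5, (1, 1): 0.0}
print(round(score_sequence([0, 1], unigram, bigram), 3))  # 1.0 + 0.8 + 0.5 = 2.3
```

The structured predictor g returns the labeling that maximizes this score; the cost of that maximization is exactly the inference bottleneck discussed next.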

SLIDE 6

Example: Multi-Label Classification with Pairwise Interaction

◮ Unigram Factor :

θu : Yk × X → R

◮ Bigram Factor :

Yb = Yk × Yk′,  θb : Yb → R

Figure: Multi-Label with Pairwise Interaction

SLIDE 7

Motivations

◮ g : X → Y1 × Y2 × · · · × Ym

SLIDE 8

Motivations

◮ g : X → Y1 × Y2 × · · · × Ym
◮ Learning requires inference at every iteration.
◮ Exact inference is slow: each iteration takes O(|Yi|^n) for an n-gram factor, where |Yi| ≥ 3000.

SLIDE 9

Motivations

◮ g : X → Y1 × Y2 × · · · × Ym
◮ Learning requires inference at every iteration.
◮ Exact inference is slow: each iteration takes O(|Yi|^n) for an n-gram factor, where |Yi| ≥ 3000.
◮ Approximation degrades performance.

SLIDE 10

Key Idea: Dual Decomposed Learning

◮ Structural Oracle (joint inference) is too expensive.

SLIDE 11

Key Idea: Dual Decomposed Learning

◮ Structural Oracle (joint inference) is too expensive.
◮ Reduce Structural SVM to Multiclass SVMs via soft enforcement of consistency between factors.

SLIDE 12

Key Idea: Dual Decomposed Learning

◮ Structural Oracle (joint inference) is too expensive.
◮ Reduce Structural SVM to Multiclass SVMs via soft enforcement of consistency between factors.
◮ (Cheap) Active Sets + Factorwise Oracles + Message Passing (between factors).

SLIDE 13

Key Idea: Factorwise Oracles

◮ Inner-Product (unigram) Factor: θw(x, y) = ⟨wy, x⟩.
  ◮ Reduces to a primal- and dual-sparse Extreme Multiclass SVM.
  ◮ Reduces O(D · |Yi|) to O(|Fu| · |Ai|), where D is the feature dimension, |Fu| the number of unigram factors, and Ai the active set (details in [2])².
◮ Indicator (bigram) Factor: θ(y1, y2) = v_{y1,y2}.
  ◮ Maintain a priority queue on v_{y1,y2}.
  ◮ Reduces O(|Y1||Y2|) to O(|A1||A2|), where A1, A2 are the active sets.

2 [2] PD-Sparse: A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification. ICML 2016.
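The bigram oracle can be sketched in a few lines. This is an illustrative simplification, not the paper's exact procedure: where the paper maintains a priority queue on the v entries, the sketch simply scans the sparse nonzero entries, which conveys the same point — the maximizer of v[y1, y2] + m1[y1] + m2[y2] is either the best pair under the messages alone (where v = 0) or one of the few nonzero v entries, so the search touches only the active sets and nnz(v) instead of all |Y1| × |Y2| pairs. All names (`bigram_oracle`, `m1`, `m2`) are assumptions for exposition.

```python
# Illustrative bigram factorwise oracle over active sets.
# m1, m2: message scores over active labels A1, A2 (dicts label -> score).
# v: sparse dict (y1, y2) -> weight; entries not present are zero.

def bigram_oracle(v, m1, m2):
    # Candidate 1: best pair ignoring v (v = 0 there), over active sets only.
    y1 = max(m1, key=m1.get)
    y2 = max(m2, key=m2.get)
    best_score, best = m1[y1] + m2[y2], (y1, y2)
    # Candidate 2: the few nonzero v entries.
    for (a, b), val in v.items():
        s = val + m1.get(a, 0.0) + m2.get(b, 0.0)
        if s > best_score:
            best_score, best = s, (a, b)
    return best

m1 = {0: 0.5, 1: 0.1}
m2 = {0: 0.2, 1: 0.4}
v = {(1, 1): 1.0}                  # one nonzero bigram weight
print(bigram_oracle(v, m1, m2))    # (1, 1): 1.0 + 0.1 + 0.4 beats 0.5 + 0.4
```

Restricting the search to the active sets is the approximation the method makes by design; the priority queue in the talk just makes the nonzero-entry scan lazier.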

SLIDE 14

Methodology Sketch

◮ Original problem:

min_w  (1/2)‖w‖² + C ∑_{i=1}^n L(w; x_i, y_i)

where L is the structural hinge loss.

3 Lacoste-Julien et al. Block-Coordinate Frank-Wolfe Optimization for Structural SVMs. ICML 2013.

SLIDE 15

Methodology Sketch

◮ Original problem:

min_w  (1/2)‖w‖² + C ∑_{i=1}^n L(w; x_i, y_i)

where L is the structural hinge loss.

◮ Dual-decomposed into independent problems:

min_{α_f ∈ Δ^{|Y_f|}}  G(α) := (1/2) ∑_{f∈F} ‖φ(x_f, y_f)^T α_f‖² − ∑_{j∈V} δ_j^T α_j

(independent Multiclass SVMs) with consistency constraints M_{jf} α_f = α_j, ∀(j, f) ∈ E.

◮ The standard approach³ finds a feasible descent direction, which however needs joint inference.

3 Lacoste-Julien et al. Block-Coordinate Frank-Wolfe Optimization for Structural SVMs. ICML 2013.
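The consistency constraint M_{jf} α_f = α_j can be read as a marginalization: for a bigram factor f over Y1 × Y2, M_{jf} sums the factor variable α_f (indexed by label pairs) onto the unigram variable α_j. A toy illustration, with names and data layout assumed purely for exposition:

```python
# Toy view of M_jf alpha_f = alpha_j: marginalize a bigram dual variable
# (a weight per label pair) onto one of its unigram endpoints.

def marginalize(alpha_f, side, K):
    """alpha_f: dict (y1, y2) -> weight; side 0 or 1 selects which
    unigram variable j to marginalize onto; K = |Y_j|."""
    alpha_j = [0.0] * K
    for (y1, y2), a in alpha_f.items():
        alpha_j[(y1, y2)[side]] += a
    return alpha_j

alpha_f = {(0, 1): 0.25, (1, 1): 0.75}
print(marginalize(alpha_f, 0, 2))  # [0.25, 0.75]
print(marginalize(alpha_f, 1, 2))  # [0.0, 1.0]
```

Enforcing these equalities only softly (next slide) is what lets each multiclass SVM be solved independently.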

SLIDE 16

Methodology Sketch

◮ Dual-decomposed into independent problems:

min_{α_f ∈ Δ^{|Y_f|}}  G(α) := (1/2) ∑_{f∈F} ‖φ(x_f, y_f)^T α_f‖² − ∑_{j∈V} δ_j^T α_j

with consistency constraints M_{jf} α_f = α_j, ∀(j, f) ∈ E.

◮ Augmented Lagrangian Method:

L(α, λ) := ∑_F G_F(α_F) + (ρ/2) ∑_{(j,f)∈E} ‖M_{jf} α_f − α_j + λ^t_{jf}‖²

where the G_F are independent Multiclass SVMs, the penalty terms are (sparse) messages between factors, and the multipliers are updated incrementally:

λ^{t+1}_{jf} = λ^t_{jf} + η(M_{jf} α^{t+1}_f − α^{t+1}_j)

SLIDE 17

Methodology Sketch

◮ Augmented Lagrangian Method:

L(α, λ) := ∑_F G_F(α_F) + (ρ/2) ∑_{(j,f)∈E} ‖M_{jf} α_f − α_j + λ^t_{jf}‖²

where the G_F are independent Multiclass SVMs, the penalty terms are (sparse) messages between factors, and the multipliers are updated incrementally:

λ^{t+1}_{jf} = λ^t_{jf} + η(M_{jf} α^{t+1}_f − α^{t+1}_j)

◮ Update α and λ alternately.
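The alternating scheme can be demonstrated on a toy consensus problem. This is a minimal sketch under assumed data, not the paper's solver: minimize f(a_f) + g(a_j) subject to a_f = a_j via an augmented Lagrangian, taking f(x) = (x − 2)² and g(x) = (x − 4)² so each block minimization has a closed form.

```python
# ADMM-style alternation: closed-form block updates on alpha, then an
# incremental multiplier update on the consistency residual.

def solve(rho=1.0, eta=1.0, iters=200):
    a_f = a_j = lam = 0.0
    for _ in range(iters):
        # alpha-step: minimize each block of L(alpha, lambda) in closed form
        a_f = (2 * 2 + rho * (a_j - lam)) / (2 + rho)   # argmin of f + penalty
        a_j = (2 * 4 + rho * (a_f + lam)) / (2 + rho)   # argmin of g + penalty
        # lambda-step: lam^{t+1} = lam^t + eta * (residual)
        lam += eta * (a_f - a_j)
    return a_f, a_j

a_f, a_j = solve()
print(round(a_f, 3), round(a_j, 3))   # 3.0 3.0 (the consensus optimum)
```

Both blocks converge to the consensus value 3 (the minimizer of (x − 2)² + (x − 4)²), mirroring how the α_f and α_j in the slide are driven to agreement by the penalty and multiplier updates.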

SLIDE 18

Experiments: Sequence Labeling (on ChineseOCR)

◮ ChineseOCR: N = 12,064, T = 14.4, D = 400, K = 3,039.
◮ |Yb| = 3,039² = 9,235,521 (bigram language model).
◮ Decoding: Viterbi Algorithm.

Figure: Test error vs. training time on ChineseOCR (BCFW, GDMM-subFMO, SSG, Soft-BCFW ρ=1, Soft-BCFW ρ=10).

Figure: Objective vs. training time on ChineseOCR (same methods).
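The Viterbi decoding used here is the standard dynamic program: O(T · K²) instead of the O(K^T) brute force. A self-contained sketch with toy sizes (not the 3,039-label OCR model):

```python
# Standard Viterbi decoding for a chain with unigram and bigram scores.

def viterbi(unigram, bigram):
    """unigram[t][y]: score of label y at position t;
    bigram[a][b]: transition score a -> b.  Returns the best label sequence."""
    T, K = len(unigram), len(unigram[0])
    score = list(unigram[0])           # best score ending in each label
    back = []                          # backpointers per position
    for t in range(1, T):
        prev = score
        score, ptr = [], []
        for y in range(K):
            b, s = max(((a, prev[a] + bigram[a][y]) for a in range(K)),
                       key=lambda p: p[1])
            score.append(s + unigram[t][y])
            ptr.append(b)
        back.append(ptr)
    y = max(range(K), key=lambda k: score[k])
    path = [y]
    for ptr in reversed(back):         # follow backpointers to recover path
        y = ptr[y]
        path.append(y)
    return path[::-1]

unigram = [[1.0, 0.2], [0.1, 0.8], [0.6, 0.5]]
bigram = [[0.0, 0.5], [-0.5, 0.0]]
print(viterbi(unigram, bigram))        # [0, 1, 1]
```

At K = 3,039 each Viterbi step is a K × K transition maximization, which is exactly the cost the active-set bigram oracle avoids during training.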

SLIDE 19

Experiments: Multi-Label Classification (on RCV1)

◮ RCV1-regions: N = 23,149, D = 47,236, K = 228.
◮ |Fb| = 228² = 51,984 (pairwise interaction).
◮ Decoding: Linear Program.

Figure: Test error vs. training time on RCV1-regions (BCFW, GDMM-subFMO, SSG, Soft-BCFW ρ=1, Soft-BCFW ρ=10).

Figure: Objective vs. training time on RCV1-regions (same methods).