

SLIDE 1

Paired-Dual Learning for Fast Training of Latent Variable Hinge-Loss MRFs

Stephen H. Bach* (Maryland), Bert Huang* (Virginia Tech), Jordan Boyd-Graber (Colorado), Lise Getoor (UC Santa Cruz)

ICML 2015

* Equal Contributors

SLIDE 2

This Talk

§ In rich, structured domains, latent variables can capture fundamental aspects and increase accuracy
§ Learning with latent variables requires repeated inference
§ Recent work has overcome the inference bottleneck in discrete models, but using continuous variables introduces new challenges
§ We introduce paired-dual learning (PDL)
§ PDL is so fast that it often finishes before traditional methods make a single parameter update

SLIDE 3

Latent Variable Models

SLIDE 4

Community Detection

SLIDE 5

Latent User Attributes

  • Popular?
  • Introverted?
  • Connector?

SLIDE 6

Image Reconstruction

§ Latent variables can represent archetypical components
§ Learned components for face reconstruction:

[Figure: face reconstructions (Originals, With LVs, Without)]

SLIDE 7

Learning with Latent Variables

SLIDE 8

Model

§ Observations: x
§ Targets: y, with ground-truth labels ŷ
§ Latent variables (unlabeled): z
§ Parameters: w

$$P(y, z \mid x; w) = \frac{1}{Z(x; w)} \exp\left(-w^\top \phi(x, y, z)\right)$$

$$Z(x; w) = \sum_{y, z} \exp\left(-w^\top \phi(x, y, z)\right)$$
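Read as code, the model is just an exponentiated negative linear energy. A minimal sketch in Python (names are illustrative, not from the paper; `phi_xyz` stands for a precomputed feature vector φ(x, y, z)):

```python
import numpy as np

def energy(w, phi_xyz):
    """Linear energy w^T phi(x, y, z); lower energy means more probable."""
    return float(np.dot(w, phi_xyz))

def unnormalized_density(w, phi_xyz):
    """exp(-energy); dividing by Z(x; w) would give P(y, z | x; w)."""
    return float(np.exp(-energy(w, phi_xyz)))
```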

SLIDE 9

Learning Objective

[Diagram: optimize w by alternating with inference in P(y, z | x; w) and inference in P(z | x, ŷ; w)]

$$\log P(\hat{y} \mid x; w) = \log Z(x, \hat{y}; w) - \log Z(x; w) = \min_{\rho \in \Delta(y,z)} \; \max_{q \in \Delta(z)} \; \mathbb{E}_\rho\!\left[w^\top \phi(x, y, z)\right] - H(\rho) - \mathbb{E}_q\!\left[w^\top \phi(x, \hat{y}, z)\right] + H(q)$$
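The saddle-point form follows from the standard variational characterization of the log-partition function, stated here in the deck's notation (a textbook identity, not specific to this paper):

```latex
\log Z(x; w) = \max_{\rho \in \Delta(y,z)} \; \mathbb{E}_\rho\!\left[-w^\top \phi(x, y, z)\right] + H(\rho)
```

Applying this identity to log Z(x, ŷ; w) (over q ∈ Δ(z), with y clamped to ŷ) and to log Z(x; w) (over ρ ∈ Δ(y, z)), then negating the second term, yields the min-max expression above.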

SLIDE 10

Traditional Method

§ Perform full inference in each distribution
§ Compute the gradient with respect to w
§ Update w using the gradient

[Diagram: optimize w; each gradient step ∇w requires full inference in both P(y, z | x; w) and P(z | x, ŷ; w)]
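A minimal sketch of this traditional loop in Python, under the assumption that each full inference call returns the expected feature vector of its distribution (the two inference callables are hypothetical placeholders, not the paper's API):

```python
import numpy as np

def traditional_learning(w, infer_joint, infer_clamped,
                         step_size=0.1, num_steps=100, lam=1.0):
    """infer_joint(w)   -> E_rho[phi(x, y, z)]   (full inference, expensive)
    infer_clamped(w) -> E_q[phi(x, y_hat, z)] (full inference, expensive)"""
    for _ in range(num_steps):
        e_joint = infer_joint(w)      # run to convergence every step
        e_clamped = infer_clamped(w)  # run to convergence every step
        # Gradient of the L2-regularized negative log-likelihood.
        grad = lam * w - e_joint + e_clamped
        w = w - step_size * grad
    return w
```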

SLIDE 11

How can we solve the inference bottleneck?

SLIDE 12

Smart Supervised Learning

§ The supervised learning objective contains an inner inference problem
§ Interleave inference and learning

  • e.g., Taskar et al. [ICML 2005], Meshi et al. [ICML 2010], Hazan and Urtasun [NIPS 2010]

§ Idea: turn the saddle-point optimization into a joint minimization by dualizing the inner inference problem

SLIDE 13

Smart Latent Variable Learning

§ For discrete models, Schwing et al. [ICML 2012] proposed dualizing one of the inferences and interleaving with parameter updates

[Diagram: optimize w with gradient steps ∇w interleaved with inference in P(y, z | x; w) and P(z | x, ŷ; w)]

SLIDE 14

How can we solve the inference bottleneck for continuous models?

SLIDE 15

Continuous Structured Prediction

§ The learning objective contains expectations and entropy functions that are intractable for continuous distributions
§ Recently, there’s been a lot of work on developing

  • continuous probabilistic graphical models
  • continuous probabilistic programming languages

SLIDE 16

Hinge-Loss Markov Random Fields

§ Natural language processing

  • Beltagy et al. [ACL 2014], Foulds et al. [ICML 2015]

§ Social network analysis

  • Huang et al. [SBP 2013], West et al. [TACL 2014], Li et al. [2014]

§ Massive open online course (MOOC) analysis

  • Ramesh et al. [AAAI 2014, ACL 2015]

§ Bioinformatics

  • Fakhraei et al. [TCBB 2014]

SLIDE 17

Hinge-Loss Markov Random Fields

§ MRFs over continuous variables in [0,1] with hinge-loss potential functions, where each ℓ_j is a linear function and p_j ∈ {1, 2}:

$$P(y) \propto \exp\left(-\sum_{j=1}^{m} w_j \left(\max\{\ell_j(y),\, 0\}\right)^{p_j}\right)$$
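
As a concrete reading, here is a minimal Python sketch of the HL-MRF energy, assuming each linear function is parameterized as ℓ_j(y) = A[j]·y + b[j] (the parameterization and names are illustrative):

```python
import numpy as np

def hinge_loss_energy(y, weights, A, b, p):
    """Energy sum_j w_j * max(ell_j(y), 0)^(p_j), ell_j(y) = A[j] @ y + b[j];
    the HL-MRF density is P(y) ∝ exp(-energy) for y in [0, 1]^n."""
    ell = A @ y + b                     # linear functions ell_j(y)
    hinges = np.maximum(ell, 0.0) ** p  # hinge-loss potentials, p_j in {1, 2}
    return float(weights @ hinges)

# One squared-hinge potential 2 * max(y0 - y1, 0)^2:
w = np.array([2.0])
A = np.array([[1.0, -1.0]])
b = np.array([0.0])
p = np.array([2])
print(hinge_loss_energy(np.array([0.9, 0.2]), w, A, b, p))  # 2 * 0.7^2 = 0.98
```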

SLIDE 18

MAP Inference in HL-MRFs

§ Exact MAP inference in HL-MRFs is very fast, thanks to the alternating direction method of multipliers (ADMM)
§ ADMM decomposes inference by

  • Forming the augmented Lagrangian $L_w(y, z, \alpha, \bar{y}, \bar{z})$
  • Iteratively updating blocks of variables
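To illustrate the block-update pattern ADMM uses (a toy consensus problem, not the paper's actual solver): each term gets a local copy of the variable, and ADMM alternates local updates, a consensus-averaging step, and dual updates.

```python
import numpy as np

def consensus_admm(targets, rho=1.0, iters=50):
    """Toy: minimize sum_k (y - t_k)^2 by giving each term a local copy y_k
    constrained to equal a consensus variable y_bar. The closed-form local
    update is specific to this quadratic toy; HL-MRF potentials have their
    own closed-form updates."""
    m = len(targets)
    y_local = np.zeros(m)  # local copies y_k
    y_bar = 0.0            # consensus variable
    u = np.zeros(m)        # scaled dual variables
    for _ in range(iters):
        # Local blocks: argmin (y_k - t_k)^2 + (rho/2)(y_k - y_bar + u_k)^2
        y_local = (2 * targets + rho * (y_bar - u)) / (2 + rho)
        y_bar = np.mean(y_local + u)  # consensus step
        u += y_local - y_bar          # dual updates enforce agreement
    return y_bar

print(consensus_admm(np.array([0.0, 1.0, 2.0])))  # converges to the mean, 1.0
```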

SLIDE 19

Paired-Dual Learning

SLIDE 20

Continuous Latent Variables

§ The objective is the same, but the expectations and entropies are intractable

$$\arg\min_w \; \max_{\rho \in \Delta(y,z)} \; \min_{q \in \Delta(z)} \; \frac{\lambda}{2}\|w\|^2 - \mathbb{E}_\rho\!\left[w^\top \phi(x, y, z)\right] + H(\rho) + \mathbb{E}_q\!\left[w^\top \phi(x, \hat{y}, z)\right] - H(q)$$

SLIDE 21

Variational Approximations

§ We can restrict the distribution families to single points

  • In other words, we can approximate expectations with MAP
  • Great for models with fast, convex inference, like HL-MRFs

§ But the entropy of a point distribution is always zero
§ Therefore, w = 0 is always a global optimum: the max over (y, z) can always match the minimizing clamped configuration (ŷ, z'), so the objective below is at least λ/2 ‖w‖², which w = 0 minimizes

$$\arg\min_w \; \max_{y,z} \; \min_{z'} \; \frac{\lambda}{2}\|w\|^2 - w^\top \phi(x, y, z) + w^\top \phi(x, \hat{y}, z')$$
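A tiny numeric check of this degeneracy (the feature table is made up purely for illustration): with point distributions and no entropy terms, the saddle value is bounded below by the regularizer alone, so w = 0 always wins.

```python
import numpy as np

# Toy feature table phi(x, y, z) over two label values and two latent values.
phi = {("y0", "z0"): np.array([1.0, 0.0]),
       ("y0", "z1"): np.array([0.5, 0.5]),
       ("y1", "z0"): np.array([0.0, 1.0]),
       ("y1", "z1"): np.array([0.2, 0.8])}

def saddle_value(w, y_hat="y1", lam=1.0):
    max_joint = max(-(w @ f) for f in phi.values())               # max over (y, z)
    min_clamped = min(w @ phi[(y_hat, z)] for z in ("z0", "z1"))  # min over z'
    return lam / 2 * (w @ w) + max_joint + min_clamped

print(saddle_value(np.zeros(2)))            # 0.0: the global optimum
print(saddle_value(np.array([1.0, -1.0])))  # strictly positive
```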

SLIDE 22

Entropy Surrogates

§ We design surrogates to fill the role of the entropy terms

  • They need to be tractable
  • The choice should be tailored to the problem and model
  • Options include the curvature and one-sided vs. two-sided forms

§ Goal: require non-zero parameters to predict the ground truth
§ Example:

$$-\max\{y, 0\}^2 - \max\{1 - y, 0\}^2$$
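A quick sketch of this example surrogate: like a true entropy, it is concave and peaks at y = 0.5, so confident predictions near 0 or 1 must be driven by non-zero parameters rather than by the surrogate.

```python
import numpy as np

def entropy_surrogate(y):
    """Example surrogate from the slide: -max(y, 0)^2 - max(1 - y, 0)^2.
    Concave on [0, 1] with its maximum at y = 0.5."""
    y = np.asarray(y, dtype=float)
    return -np.maximum(y, 0.0) ** 2 - np.maximum(1.0 - y, 0.0) ** 2

print(entropy_surrogate([0.0, 0.5, 1.0]))  # [-1.  -0.5 -1. ]
```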

SLIDE 23

Paired-Dual Learning

§ Repeatedly solving the inner inference problems with ADMM is still expensive
§ But we can replace the inference problems with their augmented Lagrangians

$$\arg\min_w \; \max_{y,z} \; \min_{z'} \; \frac{\lambda}{2}\|w\|^2 - w^\top \phi(x, y, z) + h(y, z) + w^\top \phi(x, \hat{y}, z') - h(\hat{y}, z')$$

SLIDE 24

Paired-Dual Learning

§ If the inner maxes and mins were solved to convergence, this objective would be equivalent
§ Instead, paired-dual learning iteratively updates the parameters and blocks of Lagrangian variables

$$\arg\min_w \; \max_{v, \bar{v}} \; \min_{\alpha} \; \min_{v', \bar{v}'} \; \max_{\alpha'} \; \frac{\lambda}{2}\|w\|^2 + L'_w(v', \alpha', \bar{v}') - L_w(v, \alpha, \bar{v})$$

[Diagram: interleaved optimization, alternating gradient steps ∇w on the parameters with block updates of the Lagrangian variables (y, z, α, ȳ, z̄) in L_w and (z', α', z̄') in L'_w]
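A schematic of the interleaving in Python (hypothetical structure, not the authors' code): instead of running ADMM to convergence before each gradient step, PDL takes only N cheap ADMM block updates on each augmented Lagrangian per parameter update.

```python
def paired_dual_learning(w, full_lag, clamped_lag,
                         num_epochs=100, n_admm_steps=1,
                         step_size=0.1, lam=1.0):
    """full_lag holds (y, z, alpha, ...) for L_w; clamped_lag holds
    (z', alpha', ...) for L'_w. Both are assumed to expose one ADMM
    block update and a gradient with respect to w (placeholders here)."""
    for _ in range(num_epochs):
        for _ in range(n_admm_steps):   # N block updates, not convergence
            full_lag.block_update(w)
            clamped_lag.block_update(w)
        # Gradient of lam/2 ||w||^2 + L'_w - L_w at the current duals.
        grad = lam * w + clamped_lag.grad_w(w) - full_lag.grad_w(w)
        w = w - step_size * grad
    return w
```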

SLIDE 25

Evaluation

SLIDE 26

Evaluation

§ Three real-world problems:

  • Community detection
  • Latent user attributes
  • Image reconstruction

§ Learning methods:

  • Paired-dual learning (PDL) (N=1, N=10)
  • Expectation maximization (EM)
  • Primal gradient descent (Primal)

§ Evaluated:

  • Learning objective
  • Predictive performance
  • Both plotted against ADMM (inference) iterations

SLIDE 27

Community Detection

§ Case Study: 2012 Venezuelan Presidential Election

  • Incumbent: Hugo Chávez
  • Challenger: Henrique Capriles
[Photos: Chávez (left, by Agência Brasil, CC BY 3.0 Brazil) and Capriles (right, by Wilfredor, CC BY-SA 3.0 Unported)]

SLIDE 28

[Plot: learning objective vs. ADMM iterations, Twitter (one fold); curves for PDL N=1, PDL N=10, EM, and Primal]

SLIDE 29

[Plot: AuPR vs. ADMM iterations, Twitter (one fold); curves for PDL N=1, PDL N=10, EM, and Primal]

SLIDE 30

Latent User Attributes

§ Task: trust prediction in the Epinions social network [Richardson et al., ISWC 2003]
§ Latent variables represent whether users are:

  • Trusting?
  • Trustworthy?

SLIDE 31

[Plot: learning objective vs. ADMM iterations, Epinions (one fold); curves for PDL N=1, PDL N=10, EM, and Primal]

SLIDE 32

[Plot: AuPR vs. ADMM iterations, Epinions (one fold); curves for PDL N=1, PDL N=10, EM, and Primal]

SLIDE 33

Image Reconstruction

§ Tested on the Olivetti faces dataset [Samaria and Harter, 1994], using the experimental protocol of Poon and Domingos [UAI 2012]
§ Latent variables capture facial structure

[Figure: face reconstructions (Originals, With LVs, Without)]

SLIDE 34

[Plot: learning objective vs. ADMM iterations, image reconstruction; curves for PDL N=1, PDL N=10, EM, and Primal]

SLIDE 35

[Plot: MSE vs. ADMM iterations, image reconstruction; curves for PDL N=1, PDL N=10, EM, and Primal]

SLIDE 36

Conclusion

SLIDE 37

Conclusion

§ Continuous latent variables

  • Capture rich, nuanced information in structured domains
  • Learning them introduces new challenges

§ Paired-dual learning

  • Learns accurate models much faster than traditional methods, often before they make a single parameter update
  • Makes large-scale, latent variable hinge-loss MRFs practical

§ Open questions

  • Convergence proof for paired-dual learning
  • Should we also use it for discrete models?

Thank You!

bach@cs.umd.edu @stevebach