Mutual Angular Regularization of Latent Variable Models: Theory, Algorithm and Applications
Pengtao Xie
Joint work with Yuntian Deng and Eric Xing
Carnegie Mellon University
Latent Variable Models (LVMs)
[Diagram: latent variable models, a family of machine learning models, extract patterns from data]
Latent Variable Models
• Topic Models: latent topics behind observed words
• Gaussian Mixture Model: latent groups behind observed feature vectors
• Other examples: Hidden Markov Model, Kalman Filtering, Restricted Boltzmann Machine, Deep Belief Network, Factor Analysis, Neural Network, Sparse Coding, Matrix Factorization, Distance Metric Learning, Principal Component Analysis, etc.
Latent Variable Models
• Latent factors behind data are represented as components in LVMs
• Topics in documents (Topic Models): Politics (Obama, Constitution, Government), Economics (GDP, Bank, Marketing), Education (University, Knowledge, Student)
• Groups in images (Gaussian Mixture Model): Tiger, Car, Food
Motivation I: Popularity of latent factors is skewed
• The popularity of latent factors follows a power-law distribution
• Topics in news: dominant topics such as Politics (Obama, Constitution, Government) and Economics (GDP, Bank, Marketing), alongside many long-tail topics
• Groups in Flickr photos: dominant groups such as Furniture (Sofa, Closet, Curtain) and Flower (Rose, Tulip, Lily), alongside many long-tail groups such as Diamond, Painting, Car, Food
Standard LVMs are insufficient to capture long-tail factors
• Latent Dirichlet Allocation (LDA)
  • “Extremely common words tend to dominate all topics” (Wallach, 2009)
  • Tencent Peacock LDA: “When learning ≥ 10^5 topics, around 20% ∼ 40% topics have duplicates in practice” (Wang, 2015)
• Restricted Boltzmann Machine
  • Ran on the 20-Newsgroups dataset
  • Many duplicate topics (e.g., the three exemplar topics below are all about politics)
  • Common words occur repeatedly across topics, such as iraq, clinton, united, weapons

Topic 1: president, clinton, iraq, united, spkr, house, people, lewinsky, government, white
Topic 2: iraq, united, un, weapons, iraqi, nuclear, india, minister, saddam, military
Topic 3: iraq, un, iraqi, lewinsky, saddam, clinton, baghdad, inspectors, weapons, white
Standard LVMs are insufficient to capture long-tail factors
[Figure: latent factors behind the data vs. components learned by the LVM — the components concentrate on dominant factors and miss the long-tail ones]
Long-tail factors are important
• The number of long-tail factors is large
• Long-tail factors are more important than dominant factors in some applications
• Example: Tencent applied topic models to advertising and showed that long-tail topics such as “lose weight” and “nursing” improve click-through rate by 40% (Jin, 2015)
Diversification
[Figure: with diversification, the components in the LVM spread out to cover both dominant and long-tail latent factors behind the data]
Motivation II: Tradeoff induced by the number of components K
• Tradeoff between expressiveness and complexity
  • Small K: low expressiveness, low complexity
  • Large K: high expressiveness, high complexity
• Can we achieve the best of both worlds: a small K with high expressiveness and low complexity?
Reduce model complexity without sacrificing expressiveness
• Use the components to capture the principal directions of the data point cloud
[Figure: data samples and LVM components, without diversification vs. with diversification]
Mutual Angular Regularization of LVMs
• Goal: encourage the components to diversely spread out, in order to (1) improve the coverage of long-tail latent factors and (2) reduce model complexity without compromising expressiveness
• Approach:
  • Define a score based on mutual angles to measure the diversity of components
  • Use the score to regularize latent variable models and control the geometry of the latent space during learning
Outline
• Mutual Angular Regularizer
• Algorithm
• Applications
• Theory
Mutual Angular Regularizer
• Components are parametrized by vectors
  • In Latent Dirichlet Allocation, each topic is a multinomial vector
  • In Sparse Coding, each dictionary item is a real-valued vector
• Measure the dissimilarity between two vectors
• Measure the diversity of a vector set
Dissimilarity between two vectors
• Desired property: invariant to scale, translation, rotation and orientation of the two vectors
• Euclidean distance, L1 distance: the distance d is variant to scale
• Negative cosine similarity: the score a is variant to orientation (flipping one vector changes a = 0.6 to a = −0.6)
Dissimilarity between two vectors
• Non-obtuse angle θ between two vectors
• Invariant to scale, translation, rotation and orientation of the two vectors
• Definition: θ(x, y) = arccos( |x·y| / (‖x‖ ‖y‖) )
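A minimal numpy sketch of this dissimilarity measure (the function name `non_obtuse_angle` is mine, not from the slides). It checks the invariances claimed above: scaling or flipping a vector leaves the angle unchanged, unlike Euclidean distance or plain cosine similarity.

```python
import numpy as np

def non_obtuse_angle(x, y):
    """Non-obtuse angle between x and y: arccos(|x . y| / (||x|| ||y||)), in [0, pi/2]."""
    cos = abs(np.dot(x, y)) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(cos, 0.0, 1.0))  # clip guards against round-off

x = np.array([1.0, 2.0, 0.5])
y = np.array([-0.3, 1.0, 2.0])

print(non_obtuse_angle(x, y))        # some angle in [0, pi/2]
print(non_obtuse_angle(3.0 * x, y))  # unchanged: invariant to scale
print(non_obtuse_angle(-x, y))       # unchanged: invariant to orientation (sign flip)
```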
Measure the diversity of a vector set
• Based on the pairwise dissimilarity measure between vectors
• The diversity of a set of vectors A = {a_i}_{i=1..K} is defined as the Mutual Angular Regularizer:

Ω(A) = [1/(K(K−1))] Σ_{i=1..K} Σ_{j≠i} θ_ij − [1/(K(K−1))] Σ_{i=1..K} Σ_{j≠i} ( θ_ij − [1/(K(K−1))] Σ_{p=1..K} Σ_{q≠p} θ_pq )²,
where θ_ij = arccos( |a_i·a_j| / (‖a_i‖ ‖a_j‖) )

• The first term is the mean of the pairwise angles, the second their variance
  • Mean: summarizes how different these vectors are from each other on the whole
  • Variance: encourages the vectors to spread out evenly
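Below is a hedged Python sketch of this score (mean minus variance of the pairwise non-obtuse angles); the function name and the toy vector sets are mine. A set of near-parallel components scores low, while orthonormal components score close to π/2.

```python
import numpy as np

def mutual_angular_regularizer(A):
    """Mean minus variance of the pairwise non-obtuse angles between the
    columns of A (one component per column), following the definition above."""
    K = A.shape[1]
    U = A / np.linalg.norm(A, axis=0)        # unit-length columns
    G = np.abs(U.T @ U)                      # |cos| of pairwise angles
    theta = np.arccos(np.clip(G, 0.0, 1.0))  # pairwise non-obtuse angles
    angles = theta[~np.eye(K, dtype=bool)]   # exclude the i == j entries
    return angles.mean() - angles.var()

A_similar = np.random.randn(10, 1) + 0.01 * np.random.randn(10, 5)  # nearly parallel columns
A_spread = np.linalg.qr(np.random.randn(10, 5))[0]                  # orthonormal columns
print(mutual_angular_regularizer(A_similar))  # small score: low diversity
print(mutual_angular_regularizer(A_spread))   # score near pi/2: high diversity
```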
LVM with Mutual Angular Regularization (MAR-LVM)

max_A  L(D; A) + λ Ω(A)

Ω(A) = [1/(K(K−1))] Σ_{i=1..K} Σ_{j≠i} θ_ij − [1/(K(K−1))] Σ_{i=1..K} Σ_{j≠i} ( θ_ij − [1/(K(K−1))] Σ_{p=1..K} Σ_{q≠p} θ_pq )²

θ_ij = arccos( |a_i·a_j| / (‖a_i‖ ‖a_j‖) )

where L(D; A) is the data objective (e.g., log-likelihood) of the LVM with components A, and λ controls the strength of the diversity regularization.
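As a sketch of how the regularizer plugs into learning (the function and its arguments below are placeholders of mine, not the authors' code), the MAR-LVM objective is simply the model's own data objective plus λ times the diversity score of its components:

```python
def mar_objective(A, data, log_likelihood, lam):
    """Generic MAR-LVM objective: the LVM's own data log-likelihood L(D; A)
    plus lam * Omega(A).  `log_likelihood` is a placeholder for the specific
    model's objective (LDA, GMM, sparse coding, ...); A holds one component per
    column; `mutual_angular_regularizer` is the score sketched earlier."""
    return log_likelihood(data, A) + lam * mutual_angular_regularizer(A)
```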
Algorithm
• Challenge: the mutual angular regularizer is non-smooth and non-convex w.r.t. the parameter vectors A = {a_i}_{i=1..K}
• Derive a smooth lower bound
  • The lower bound is easier to derive if the parameter vectors lie on a sphere
  • Decompose the parameter vectors into magnitudes and directions
• Proved that optimizing the lower bound with a gradient ascent method can increase the mutual angular regularizer in each iteration
Optimization
• Reparametrize: a_i = g_i ã_i with ‖ã_i‖ = 1 and g_i > 0, i.e. A = diag(g) Ã (g: magnitudes, Ã: unit directions)

max_{g, Ã}  L(D; diag(g) Ã) + λ Ω(Ã)
s.t.  ∀ i, ‖ã_i‖ = 1, g_i > 0

• Alternating optimization (a code skeleton follows):
  • Fix g, optimize Ã:  max_Ã  L(D; diag(g) Ã) + λ Ω(Ã)   s.t.  ∀ i, ‖ã_i‖ = 1
  • Fix Ã, optimize g:  max_g  L(D; diag(g) Ã)   s.t.  ∀ i, g_i > 0
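A hedged skeleton of this alternating scheme (all names and update rules are mine; the slides do not spell out the inner updates). Directions stay on the unit sphere by re-normalizing after each step, magnitudes stay positive by clipping, and `step_directions` / `step_magnitudes` stand in for whatever gradient updates the specific LVM uses.

```python
import numpy as np

def decompose(A):
    """Split each component (column of A) into a magnitude g_i > 0 and a unit direction."""
    g = np.linalg.norm(A, axis=0)
    return g, A / g

def alternating_optimization(A0, step_directions, step_magnitudes, n_iters=100):
    """Alternate between the two subproblems.  `step_directions(A_dir, g)` should
    return updated directions (e.g., a gradient step on the data term plus the
    smooth lower bound of the regularizer); `step_magnitudes(A_dir, g)` should
    return updated magnitudes (a step on the data term only, since the regularizer
    depends only on directions).  Both are placeholders for the model at hand."""
    g, A_dir = decompose(A0)
    for _ in range(n_iters):
        A_dir = step_directions(A_dir, g)                # fix g, update the directions
        A_dir = A_dir / np.linalg.norm(A_dir, axis=0)    # project back to the unit sphere
        g = np.maximum(step_magnitudes(A_dir, g), 1e-8)  # fix directions, keep g > 0
    return A_dir * g                                     # recombine into the components A
```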
Optimize Ã

max_Ã  L(D; diag(g) Ã) + λ Ω(Ã)   s.t.  ∀ i, ‖ã_i‖ = 1

• Lower bound:  Ω(Ã) ≥ Γ(Ã) = arcsin( det(Ã Ã⊤) ) − ( π/2 − arcsin( det(Ã Ã⊤) ) )²
• Intuition of the lower bound: det(Ã Ã⊤) measures the volume of the parallelepiped formed by the vectors in Ã (it equals the squared volume). The larger det(Ã Ã⊤) is, the more likely (though not surely) that the vectors in Ã have larger angles. Γ(Ã) is an increasing function of det(Ã Ã⊤), so a larger Γ(Ã) is likely to yield a larger Ω(Ã).
• Optimize the lower bound, which is smooth and much more amenable to optimization:

max_Ã  L(D; diag(g) Ã) + λ Γ(Ã)   s.t.  ∀ i, ‖ã_i‖ = 1
Close Alignment between the Regularizer and its Lower Bound
• If the lower bound is optimized with projected gradient ascent (PGA), the mutual angular regularizer can be increased in each iteration of the PGA procedure
  • Optimizing the lower bound with PGA can increase the mean of the angles in each iteration
  • Optimizing the lower bound with PGA can decrease the variance of the angles in each iteration

Ω(A) = [1/(K(K−1))] Σ_{i=1..K} Σ_{j≠i} θ_ij − [1/(K(K−1))] Σ_{i=1..K} Σ_{j≠i} ( θ_ij − [1/(K(K−1))] Σ_{p=1..K} Σ_{q≠p} θ_pq )²
       (first term: mean of the angles; second term: variance of the angles)
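The sketch below illustrates this behavior numerically. For simplicity it ascends log det of the Gram matrix of the unit directions rather than the exact Γ(Ã) above (a monotone surrogate of the same determinant, chosen by me, not the authors' bound), projecting back to the unit sphere after each step; the printed mean of the pairwise angles rises toward π/2 while their variance shrinks.

```python
import numpy as np

def pairwise_angles(U):
    """Pairwise non-obtuse angles between the unit-norm columns of U."""
    G = np.abs(U.T @ U)
    K = U.shape[1]
    return np.arccos(np.clip(G, 0.0, 1.0))[~np.eye(K, dtype=bool)]

rng = np.random.default_rng(0)
U = rng.normal(size=(8, 6))
U = U / np.linalg.norm(U, axis=0)                # random unit-norm directions

lr = 0.05
for t in range(201):
    if t % 50 == 0:
        ang = pairwise_angles(U)
        print(f"iter {t:3d}  mean angle {ang.mean():.3f}  angle variance {ang.var():.4f}")
    grad = 2.0 * U @ np.linalg.inv(U.T @ U)      # gradient of log det(U^T U) w.r.t. U
    U = U + lr * grad                            # gradient ascent step
    U = U / np.linalg.norm(U, axis=0)            # project each column back to the unit sphere
```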
Geometric Interpretation of the Close Alignment
• The gradient of the lower bound w.r.t. a_i is orthogonal to all the other vectors a_1, …, a_{i−1}, a_{i+1}, …, a_K
• Moving a_i along its gradient direction enlarges its angle with the other vectors
• Example: a_1, a_2, a_3 are parameter vectors; g_1 is the gradient w.r.t. a_1; g_1 is orthogonal to a_2 and a_3; letting â_1 = a_1 + g_1, the angle between â_1 and a_3 is greater than the angle between a_1 and a_3 (and likewise for a_2)
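A quick numerical check of this property (again using the determinant of the Gram matrix of unit-norm columns as a stand-in for the exact lower bound; all names are mine): the gradient with respect to one vector has zero inner product with every other vector, and moving along it increases the angle to the others.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(15, 5))
A = A / np.linalg.norm(A, axis=0)        # unit-norm parameter vectors a_1..a_5 (columns)

grad = 2.0 * A @ np.linalg.inv(A.T @ A)  # gradient of log det(A^T A) w.r.t. A

g1 = grad[:, 0]                          # gradient with respect to a_1
print(A[:, 1:].T @ g1)                   # ~ all zeros: g_1 is orthogonal to a_2, ..., a_5

a1_hat = A[:, 0] + 0.5 * g1              # move a_1 along its gradient direction
a1_hat = a1_hat / np.linalg.norm(a1_hat)
old_angle = np.arccos(abs(A[:, 1] @ A[:, 0]))
new_angle = np.arccos(abs(A[:, 1] @ a1_hat))
print(old_angle, new_angle)              # the angle between a_1 and a_2 increases
```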