

1. Mutual Angular Regularization of Latent Variable Models: Theory, Algorithm and Applications
Pengtao Xie, joint work with Yuntian Deng and Eric Xing, Carnegie Mellon University

2. Latent Variable Models (LVMs)
[Diagram: machine learning → latent variable models → patterns]

3. Latent Variable Models
 Topic models: topics (latent) behind words (observed)
 Gaussian mixture model: groups (latent) behind feature vectors (observed)
 Other examples: Hidden Markov Model, Kalman Filtering, Restricted Boltzmann Machine, Deep Belief Network, Factor Analysis, Neural Network, Sparse Coding, Matrix Factorization, Distance Metric Learning, Principal Component Analysis, etc.

4. Latent Variable Models
Latent factors behind data are captured by components in LVMs.
 Topics in documents (topic models): Politics (Obama, Constitution, Government), Economics (GDP, Bank, Marketing), Education (University, Knowledge, Student)
 Groups in images (Gaussian mixture model): Tiger, Car, Food

5. Motivation I: popularity of latent factors is skewed
 Popularity of latent factors follows a power-law distribution.
 Topics in news — dominant: Politics (Obama, Constitution, Government), Economics (GDP, Bank, Marketing); long-tail: Car, Food
 Groups in Flickr photos — dominant: Furniture (Sofa, Closet, Curtain), Flower (Rose, Tulip, Lily); long-tail: Diamond, Painting

6. Standard LVMs are insufficient to capture long-tail factors
 Latent Dirichlet Allocation (LDA): “Extremely common words tend to dominate all topics” (Wallach, 2009)
 Tencent Peacock LDA: “When learning ≥ 10^5 topics, around 20%∼40% of topics have duplicates in practice” (Wang, 2015)
 Restricted Boltzmann Machine, run on the 20-Newsgroups dataset:
   Many duplicate topics (e.g., the three exemplar topics below are all about politics)
   Common words occur repeatedly across topics, such as iraq, clinton, united, weapons

Topic 1: president, clinton, iraq, united, spkr, house, people, lewinsky, government, white
Topic 2: iraq, united, un, weapons, iraqi, nuclear, india, minister, saddam, military
Topic 3: iraq, un, iraqi, lewinsky, saddam, clinton, baghdad, inspectors, weapons, white

7. Standard LVMs are insufficient to capture long-tail factors
[Figure: latent factors behind data vs. components in the LVM — the components miss the long-tail factors]

8. Long-tail factors are important
 The number of long-tail factors is large.
 Long-tail factors are more important than dominant factors in some applications.
 Example: Tencent applied topic models to advertising and showed that long-tail topics such as “lose weight” and “nursing” improve click-through rate by 40% (Jin, 2015).

9. Diversification
[Figure: latent factors behind data vs. components in the LVM, with diversification]

10. Motivation II: tradeoff induced by the number of components k
 Tradeoff between expressiveness and complexity:
   Small k: low expressiveness, low complexity
   Large k: high expressiveness, high complexity
 Can we achieve the best of both worlds?
   Small k: high expressiveness, low complexity

11. Reduce model complexity without sacrificing expressiveness
[Figure: data samples and LVM components, without vs. with diversification]
Use the components to capture the principal directions of the data point cloud.

12. Mutual Angular Regularization of LVMs
 Goal: encourage the components to diversely spread out, to (1) improve the coverage of long-tail latent factors and (2) reduce model complexity without compromising expressiveness.
 Approach:
   Define a score based on mutual angles to measure the diversity of components.
   Use the score to regularize latent variable models and control the geometry of the latent space during learning.

13. Outline
 Mutual Angular Regularizer
 Algorithm
 Applications
 Theory

14. Mutual Angular Regularizer
 Components are parametrized by vectors: in Latent Dirichlet Allocation, each topic has a multinomial vector; in Sparse Coding, each dictionary item has a real vector.
 First, measure the dissimilarity between two vectors; then, measure the diversity of a vector set.

15. Dissimilarity between two vectors
Desired: invariance to scale, translation, rotation and orientation of the two vectors.
 Euclidean distance, L1 distance: the distance d is variant to scale. [Figure: scaling the vectors changes d.]
 Negative cosine similarity: variant to orientation. [Figure: flipping one vector changes the similarity from a = 0.6 to a = −0.6.]

16. Dissimilarity between two vectors: the non-obtuse angle θ
[Figure: the non-obtuse angle θ between two vectors, unchanged under scaling, rotation and sign flips]
 Invariant to scale, translation, rotation and orientation of the two vectors.
 Definition: $\theta(\mathbf{x},\mathbf{y}) = \arccos\left(\dfrac{|\mathbf{x}^{\top}\mathbf{y}|}{\|\mathbf{x}\|\,\|\mathbf{y}\|}\right)$
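As a quick illustration, a minimal numpy sketch of this angle measure (the function name nonobtuse_angle is ours, not from the slides):

```python
import numpy as np

def nonobtuse_angle(x, y):
    # arccos of the absolute cosine similarity: rescaling either vector
    # or flipping its sign (orientation) leaves the angle unchanged.
    cos = np.abs(x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(cos, 0.0, 1.0))  # clip guards rounding error

x, y = np.array([1.0, 0.0]), np.array([1.0, 1.0])
print(nonobtuse_angle(x, y))       # pi/4
print(nonobtuse_angle(x, -3 * y))  # also pi/4: scale- and sign-invariant
```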

17. Measure the diversity of a vector set
 Based on the pairwise dissimilarity measure between vectors.
 The diversity of a set of vectors $A=\{\mathbf{a}_i\}_{i=1}^{K}$ is defined as the Mutual Angular Regularizer

$$\Omega(A)=\underbrace{\frac{1}{K(K-1)}\sum_{i=1}^{K}\sum_{j\neq i}\theta_{ij}}_{\text{mean of angles}}\;-\;\underbrace{\frac{1}{K(K-1)}\sum_{i=1}^{K}\sum_{j\neq i}\Big(\theta_{ij}-\frac{1}{K(K-1)}\sum_{p=1}^{K}\sum_{q\neq p}\theta_{pq}\Big)^{2}}_{\text{variance of angles}}$$

where $\theta_{ij}=\arccos\dfrac{|\mathbf{a}_i^{\top}\mathbf{a}_j|}{\|\mathbf{a}_i\|\,\|\mathbf{a}_j\|}$.
 Mean: summarizes how different these vectors are from each other on the whole.
 Variance: encourages the vectors to spread out evenly.
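A sketch of the regularizer in numpy, reusing nonobtuse_angle from the snippet above (again, the naming is ours):

```python
def mutual_angular_regularizer(A):
    # Omega(A) = mean - variance of the pairwise non-obtuse angles
    # between the rows of A (shape K x d, one component per row).
    K = A.shape[0]
    angles = np.array([nonobtuse_angle(A[i], A[j])
                       for i in range(K) for j in range(K) if j != i])
    return angles.mean() - angles.var()

# Mutually orthogonal components score highest; near-duplicates score ~0.
print(mutual_angular_regularizer(np.eye(3)))                        # pi/2
print(mutual_angular_regularizer(np.array([[1., 0.], [1., 1e-2]]))) # ~0.01
```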

18. LVM with Mutual Angular Regularization (MAR-LVM)

$$\max_{A}\;\mathcal{L}(D;A)+\lambda\,\Omega(A)$$

where $\Omega(A)$ is the mutual angular regularizer of the previous slide, with $\theta_{ij}=\arccos\dfrac{|\mathbf{a}_i^{\top}\mathbf{a}_j|}{\|\mathbf{a}_i\|\,\|\mathbf{a}_j\|}$.
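Schematically, the regularized objective just adds the diversity score to the model's data-fit term. In this toy sketch, log_likelihood and the weight lam are placeholders, not anything specified on the slides:

```python
lam = 0.1  # diversity/fit trade-off (hypothetical value)

def mar_lvm_objective(A, data, log_likelihood):
    # L(D; A) + lambda * Omega(A): fit the data while keeping
    # the component vectors mutually diverse.
    return log_likelihood(data, A) + lam * mutual_angular_regularizer(A)
```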

19. Algorithm
 Challenge: the mutual angular regularizer is non-smooth and non-convex w.r.t. the parameter vectors $A=\{\mathbf{a}_i\}_{i=1}^{K}$.
 Approach: derive a smooth lower bound. The lower bound is easier to derive when the parameter vectors lie on a sphere, so decompose the parameter vectors into magnitudes and directions.
 Guarantee: we proved that optimizing the lower bound with (projected) gradient ascent increases the mutual angular regularizer in each iteration.

20. Optimization
 Reparametrize each component into magnitude and direction: $\mathbf{a}_i=g_i\tilde{\mathbf{a}}_i$ with $g_i>0$ and $\|\tilde{\mathbf{a}}_i\|=1$, i.e. $A=\mathrm{diag}(\mathbf{g})\tilde{A}$.
 Since the angles depend only on the directions, $\Omega(A)=\Omega(\tilde{A})$, and the problem becomes

$$\max_{\mathbf{g},\tilde{A}}\;\mathcal{L}(D;\mathrm{diag}(\mathbf{g})\tilde{A})+\lambda\,\Omega(\tilde{A})\qquad\text{s.t. }\forall i,\;\|\tilde{\mathbf{a}}_i\|=1,\;g_i>0$$

 Alternating optimization:
   Fix $\tilde{A}$, optimize $\mathbf{g}$: $\max_{\mathbf{g}}\;\mathcal{L}(D;\mathrm{diag}(\mathbf{g})\tilde{A})$ s.t. $\forall i,\;g_i>0$.
   Fix $\mathbf{g}$, optimize $\tilde{A}$: $\max_{\tilde{A}}\;\mathcal{L}(D;\mathrm{diag}(\mathbf{g})\tilde{A})+\lambda\,\Omega(\tilde{A})$ s.t. $\forall i,\;\|\tilde{\mathbf{a}}_i\|=1$.
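A small runnable sketch of the magnitude/direction decomposition (names ours), reusing mutual_angular_regularizer from above; note that the regularizer depends only on the directions, so it enters only the $\tilde{A}$-subproblem:

```python
def decompose(A):
    # A = diag(g) @ A_tilde: g holds row magnitudes, A_tilde unit directions.
    g = np.linalg.norm(A, axis=1)
    return g, A / g[:, None]

A = np.random.randn(4, 10)
g, A_tilde = decompose(A)
assert np.allclose(np.diag(g) @ A_tilde, A)
assert np.allclose(np.linalg.norm(A_tilde, axis=1), 1.0)
# Angles are scale-invariant, so Omega(A) == Omega(A_tilde):
assert np.isclose(mutual_angular_regularizer(A),
                  mutual_angular_regularizer(A_tilde))
```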

21. Optimize $\tilde{A}$
 Subproblem: $\max_{\tilde{A}}\;\mathcal{L}(D;\mathrm{diag}(\mathbf{g})\tilde{A})+\lambda\,\Omega(\tilde{A})$ s.t. $\forall i,\;\|\tilde{\mathbf{a}}_i\|=1$.
 Lower bound:

$$\Omega(\tilde{A})\;\ge\;\Gamma(\tilde{A})=\arcsin\!\big(\det(\tilde{A}\tilde{A}^{\top})\big)-\Big(\tfrac{\pi}{2}-\arcsin\!\big(\det(\tilde{A}\tilde{A}^{\top})\big)\Big)^{2}$$

 Intuition of the lower bound: $\det(\tilde{A}\tilde{A}^{\top})$ is the (squared) volume of the parallelepiped formed by the vectors in $\tilde{A}$; the larger it is, the more likely (though not surely) that the vectors in $\tilde{A}$ have large mutual angles. $\Gamma(\tilde{A})$ is an increasing function of $\det(\tilde{A}\tilde{A}^{\top})$, so a larger $\Gamma(\tilde{A})$ is likely to yield a larger $\Omega(\tilde{A})$.
 Optimize the lower bound instead, which is smooth and much more amenable to optimization:

$$\max_{\tilde{A}}\;\mathcal{L}(D;\mathrm{diag}(\mathbf{g})\tilde{A})+\lambda\,\Gamma(\tilde{A})\qquad\text{s.t. }\forall i,\;\|\tilde{\mathbf{a}}_i\|=1$$
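A numerical sketch of the lower bound and a projected-gradient-ascent step on it. Caveats: the squared term follows our reconstruction of the slide's formula, the gradient is a finite-difference stand-in for the analytic one, and for clarity the sketch ascends $\Gamma$ alone, dropping the likelihood term:

```python
def gamma_lower_bound(A_tilde):
    # Gamma built from arcsin(det(A_tilde A_tilde^T)); for unit-norm rows
    # the Gram determinant lies in [0, 1] and grows as the vectors spread out.
    s = np.arcsin(np.clip(np.linalg.det(A_tilde @ A_tilde.T), 0.0, 1.0))
    return s - (np.pi / 2 - s) ** 2

def num_grad(f, X, eps=1e-6):
    # Crude central-difference gradient, good enough for a demo.
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        Xp, Xm = X.copy(), X.copy()
        Xp[idx] += eps
        Xm[idx] -= eps
        G[idx] = (f(Xp) - f(Xm)) / (2 * eps)
    return G

def pga_step(A_tilde, lr=0.1):
    # Gradient step on Gamma, then project each row back onto the unit sphere.
    A_new = A_tilde + lr * num_grad(gamma_lower_bound, A_tilde)
    return A_new / np.linalg.norm(A_new, axis=1, keepdims=True)

A_tilde = decompose(np.random.randn(3, 5))[1]
for _ in range(20):
    A_tilde = pga_step(A_tilde)        # Omega(A_tilde) rises along the way
print(mutual_angular_regularizer(A_tilde))  # approaches pi/2
```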

22. Close alignment between the regularizer and its lower bound
 If the lower bound is optimized with projected gradient ascent (PGA), the mutual angular regularizer increases in each iteration of the PGA procedure:
   Optimizing the lower bound with PGA increases the mean of the angles in each iteration.
   Optimizing the lower bound with PGA decreases the variance of the angles in each iteration.
 Recall that $\Omega(A)$ = (mean of angles) − (variance of angles).

23. Geometric interpretation of the close alignment
 The gradient of the lower bound w.r.t. $\mathbf{a}_i$ is orthogonal to all the other vectors $\mathbf{a}_1,\ldots,\mathbf{a}_{i-1},\mathbf{a}_{i+1},\ldots,\mathbf{a}_K$.
 Moving $\mathbf{a}_i$ along its gradient direction therefore enlarges its angles with the other vectors.
 Example: $\mathbf{a}_1,\mathbf{a}_2,\mathbf{a}_3$ are parameter vectors; the gradient $\mathbf{g}_1$ of the lower bound w.r.t. $\mathbf{a}_1$ is orthogonal to $\mathbf{a}_2$ and $\mathbf{a}_3$; after the update $\hat{\mathbf{a}}_1=\mathbf{a}_1+\mathbf{g}_1$, the angle between $\hat{\mathbf{a}}_1$ and $\mathbf{a}_3$ is greater than the angle between $\mathbf{a}_1$ and $\mathbf{a}_3$.
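A quick numerical check of this geometry, reusing the helpers from the sketches above (the step size 0.5 is arbitrary):

```python
A_tilde = decompose(np.random.randn(3, 4))[1]
G = num_grad(gamma_lower_bound, A_tilde)
g1 = G[0]                                  # gradient w.r.t. a_1

# Orthogonal to all the other vectors:
print(A_tilde[1] @ g1, A_tilde[2] @ g1)    # both ~0

# Moving a_1 along g1 widens its angle with a_3:
a1_hat = A_tilde[0] + 0.5 * g1
print(nonobtuse_angle(a1_hat, A_tilde[2]) >
      nonobtuse_angle(A_tilde[0], A_tilde[2]))   # True
```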
