Bag-of-components: an online algorithm for batch learning of mixture models


1. Bag-of-components: an online algorithm for batch learning of mixture models

Olivier Schwander, Frank Nielsen
Université Pierre et Marie Curie, Paris, France; École polytechnique, Palaiseau, France
October 29, 2015

2. Exponential families: definition

$$p(x; \lambda) = p_F(x; \theta) = \exp\big(\langle t(x) \mid \theta \rangle - F(\theta) + k(x)\big)$$

- $\lambda$: source parameter
- $t(x)$: sufficient statistic
- $\theta$: natural parameter
- $F(\theta)$: log-normalizer
- $k(x)$: carrier measure

$F$ is a strictly convex and differentiable function; $\langle \cdot \mid \cdot \rangle$ is a scalar product.
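As a concrete instance of this decomposition, here is a minimal Python sketch (not from the slides; all function names are mine) writing the univariate Gaussian in canonical form, with $t(x) = (x, x^2)$ and $k(x) = 0$:

```python
import numpy as np

def t(x):
    """Sufficient statistic of the univariate Gaussian: t(x) = (x, x^2)."""
    return np.array([x, x * x])

def source_to_natural(mu, sigma2):
    """Source parameters (mu, sigma^2) -> natural parameters theta."""
    return np.array([mu / sigma2, -1.0 / (2.0 * sigma2)])

def F(theta):
    """Log-normalizer: F(theta) = -theta1^2/(4 theta2) + (1/2) log(-pi/theta2)."""
    t1, t2 = theta
    return -t1 ** 2 / (4.0 * t2) + 0.5 * np.log(-np.pi / t2)

def pdf(x, theta):
    """p_F(x; theta) = exp(<t(x) | theta> - F(theta)), since k(x) = 0 here."""
    return np.exp(t(x) @ theta - F(theta))
```

As a sanity check, `pdf(0.0, source_to_natural(0.0, 1.0))` evaluates to $1/\sqrt{2\pi}$, the standard normal density at zero.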

3. Multiple parameterizations: dual parameter spaces

Source parameterizations (not unique): $\lambda_1 \in \Lambda_1, \lambda_2 \in \Lambda_2, \ldots, \lambda_n \in \Lambda_n$.

Two canonical parameterizations, linked by the Legendre transform $(F, \Theta) \leftrightarrow (F^\star, H)$:

- natural parameters $\theta \in \Theta$, with $\theta = \nabla F^\star(\eta)$
- expectation parameters $\eta \in H$, with $\eta = \nabla F(\theta)$
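For the Gaussian sketch above, both gradient maps are explicit. A minimal pair of conversion helpers (names mine; they are mutually inverse by construction):

```python
import numpy as np

def natural_to_expectation(theta):
    """eta = grad F(theta) = (E[x], E[x^2]) for the univariate Gaussian."""
    t1, t2 = theta
    mu, sigma2 = -t1 / (2.0 * t2), -1.0 / (2.0 * t2)
    return np.array([mu, sigma2 + mu ** 2])

def expectation_to_natural(eta):
    """theta = grad F*(eta), the inverse map."""
    e1, e2 = eta
    mu, sigma2 = e1, e2 - e1 ** 2
    return np.array([mu / sigma2, -1.0 / (2.0 * sigma2)])
```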

4. Bregman divergences: definition and properties

$$B_F(x \,\|\, y) = F(x) - F(y) - \langle x - y, \nabla F(y) \rangle$$

- $F$ is a strictly convex and differentiable function
- No symmetry!
- Contains a lot of common divergences: squared Euclidean, Mahalanobis, Kullback-Leibler, Itakura-Saito...
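The definition translates directly into code. A generic sketch (assuming $F$ and its gradient are supplied by the caller; nothing here is specific to the slides):

```python
import numpy as np

def bregman(F, gradF, x, y):
    """B_F(x || y) = F(x) - F(y) - <x - y, grad F(y)>."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return F(x) - F(y) - (x - y) @ gradF(y)

# F(x) = <x, x> recovers the squared Euclidean distance:
sq, grad_sq = lambda x: x @ x, lambda x: 2.0 * x

# F(x) = sum x log x gives the generalized Kullback-Leibler divergence
# sum p log(p/q) - sum p + sum q:
negent = lambda x: np.sum(x * np.log(x))
grad_negent = lambda x: np.log(x) + 1.0
```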

5. Bregman centroids

- Right-sided centroid: $c_R = \arg\min_c \sum_i \omega_i \, B_F(x_i \,\|\, c)$
- Left-sided centroid: $c_L = \arg\min_c \sum_i \omega_i \, B_F(c \,\|\, x_i)$

Both are available in closed form:

$$c_R = \sum_i \omega_i x_i \qquad c_L = \nabla F^\star\Big(\sum_i \omega_i \nabla F(x_i)\Big)$$
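Both closed forms are one-liners. A sketch (helper names mine, with `gradFstar` standing for $\nabla F^\star = (\nabla F)^{-1}$):

```python
import numpy as np

def right_centroid(points, weights):
    """argmin_c sum_i w_i B_F(x_i || c): the weighted arithmetic mean,
    independent of F."""
    w = np.asarray(weights, float) / np.sum(weights)
    return (w[:, None] * np.asarray(points, float)).sum(axis=0)

def left_centroid(points, weights, gradF, gradFstar):
    """argmin_c sum_i w_i B_F(c || x_i): the mean taken in gradient space."""
    w = np.asarray(weights, float) / np.sum(weights)
    return gradFstar(sum(wi * gradF(x) for wi, x in zip(w, points)))
```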

6. Link with exponential families [Banerjee 2005]

Bijection with Bregman divergences:

$$\log p_F(x \mid \theta) = -B_{F^\star}(t(x) \,\|\, \eta) + F^\star(t(x)) + k(x)$$

Kullback-Leibler divergence between members of the same exponential family:

$$\mathrm{KL}\big(p_F(x; \theta_1), p_F(x; \theta_2)\big) = B_F(\theta_2 \,\|\, \theta_1) = B_{F^\star}(\eta_1 \,\|\, \eta_2)$$

Kullback-Leibler centroids are therefore obtained in closed form through the Bregman centroids.
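Continuing the sketches above, the KL divergence between two Gaussians reduces to a Bregman divergence on the natural parameters (a sanity-check sketch combining the earlier helpers, not the authors' code):

```python
def kl_exponential_family(theta1, theta2):
    """KL(p_F(.; theta1) || p_F(.; theta2)) = B_F(theta2 || theta1),
    with grad F given by natural_to_expectation."""
    return bregman(F, natural_to_expectation, theta2, theta1)

# Sanity check: KL(N(0,1) || N(1,2)) = (1/2) log 2 ~ 0.3466, matching the
# closed-form Gaussian KL: log(s2/s1) + (s1^2 + (m1 - m2)^2)/(2 s2^2) - 1/2.
```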

7. Maximum likelihood estimator: a Bregman centroid

$$
\begin{aligned}
\hat\eta &= \arg\max_\eta \sum_i \log p_F(x_i; \eta) \\
&= \arg\min_\eta \sum_i \Big( B_{F^\star}(t(x_i) \,\|\, \eta) - \underbrace{\big(F^\star(t(x_i)) + k(x_i)\big)}_{\text{does not depend on } \eta} \Big) \\
&= \arg\min_\eta \sum_i B_{F^\star}(t(x_i) \,\|\, \eta) \\
&= \frac{1}{N} \sum_i t(x_i)
\end{aligned}
$$

and $\hat\theta = \nabla F^\star(\hat\eta)$.
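In expectation coordinates the MLE is thus just the average sufficient statistic. A two-line sketch reusing `t()` from the Gaussian example (names mine):

```python
import numpy as np

def mle_eta(xs):
    """eta_hat = (1/N) sum_i t(x_i); for the Gaussian, (E[x], E[x^2])."""
    xs = np.asarray(xs, float)
    return np.array([xs.mean(), (xs ** 2).mean()])

# Source parameters are recovered as mu = eta1, sigma^2 = eta2 - eta1^2,
# and theta_hat = expectation_to_natural(eta_hat).
```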

8. Mixtures of exponential families

$$m(x; \omega, \theta) = \sum_{1 \le i \le k} \omega_i \, p_F(x; \theta_i)$$

Fixed parameters:
- family of the components $p_F$
- number of components $k$ (model selection techniques to choose it)

Learning a mixture:
- input: observations $x_1, \ldots, x_N$
- output: the weights $\omega_i$ (with $\sum_i \omega_i = 1$) and the component parameters $\theta_i$

9. Bregman soft clustering: EM for exponential families [Banerjee 2005]

E-step:

$$p(i, j) = \frac{\omega_j \, p_F(x_i; \theta_j)}{m(x_i)}$$

M-step:

$$
\begin{aligned}
\eta_j &= \arg\max_\eta \sum_i p(i, j) \log p_F(x_i; \theta_j) \\
&= \arg\min_\eta \sum_i p(i, j) \Big( B_{F^\star}(t(x_i) \,\|\, \eta) - \underbrace{\big(F^\star(t(x_i)) + k(x_i)\big)}_{\text{does not depend on } \eta} \Big) \\
&= \frac{\sum_i p(i, j) \, t(x_i)}{\sum_u p(u, j)}
\end{aligned}
$$
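A compact sketch of one possible implementation, run entirely in expectation parameters (reusing `t()`, `pdf()` and `expectation_to_natural()` from the sketches above; this is my reading of the algorithm, not the authors' code):

```python
import numpy as np

def bregman_soft_clustering(xs, etas, weights, n_iter=50):
    """EM for a k-component mixture of univariate Gaussians."""
    xs = np.asarray(xs, float)
    T = np.stack([t(x) for x in xs])                      # N x dim(t)
    for _ in range(n_iter):
        thetas = [expectation_to_natural(e) for e in etas]
        # E-step: p(i, j) proportional to w_j * p_F(x_i; theta_j).
        lik = np.array([[w * pdf(x, th) for w, th in zip(weights, thetas)]
                        for x in xs])
        post = lik / lik.sum(axis=1, keepdims=True)
        # M-step: eta_j is the posterior-weighted mean sufficient statistic.
        etas = [(post[:, j][:, None] * T).sum(0) / post[:, j].sum()
                for j in range(len(etas))]
        weights = post.mean(axis=0)
    return etas, weights
```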

10. Joint estimation of mixture models

Exploit shared information between multiple point sets:
- to improve quality
- to improve speed

Inspiration:
- dictionary methods
- transfer learning

Efficient algorithms for:
- building the mixtures
- comparing the mixtures

11. Co-mixtures: sharing the components of all the mixtures

$$
\begin{aligned}
m_1(x \mid \omega^{(1)}, \eta) &= \sum_{i=1}^{k} \omega_i^{(1)} \, p_F(x \mid \eta_i) \\
&\;\;\vdots \\
m_S(x \mid \omega^{(S)}, \eta) &= \sum_{i=1}^{k} \omega_i^{(S)} \, p_F(x \mid \eta_i)
\end{aligned}
$$

- same components $\eta_1, \ldots, \eta_k$ everywhere
- different weights $\omega^{(l)}$ for each point set

12. co-Expectation-Maximization

Maximize the mean of the likelihoods over all the mixtures.

E-step: one posterior matrix for each dataset,

$$p^{(l)}(i, j) = \frac{\omega_j^{(l)} \, p_F(x_i^{(l)}; \theta_j)}{m_l(x_i^{(l)} \mid \omega^{(l)}, \eta)}$$

M-step: maximization on each dataset,

$$\eta_j^{(l)} = \frac{\sum_i p^{(l)}(i, j) \, t(x_i^{(l)})}{\sum_u p^{(l)}(u, j)}$$

then aggregation of the per-dataset estimates:

$$\eta_j = \frac{1}{S} \sum_{l=1}^{S} \eta_j^{(l)}$$
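A sketch of one co-EM round over S datasets (again my reading, reusing the helpers above; only the final aggregation step differs from plain EM):

```python
import numpy as np

def co_em_step(datasets, etas, weight_list):
    """One E/M/aggregation round: shared etas, per-dataset weights."""
    thetas = [expectation_to_natural(e) for e in etas]
    per_set_etas, new_weights = [], []
    for xs, w in zip(datasets, weight_list):
        lik = np.array([[wj * pdf(x, th) for wj, th in zip(w, thetas)]
                        for x in xs])
        post = lik / lik.sum(axis=1, keepdims=True)       # E-step
        T = np.stack([t(x) for x in xs])
        per_set_etas.append([(post[:, j][:, None] * T).sum(0)
                             / post[:, j].sum() for j in range(len(etas))])
        new_weights.append(post.mean(axis=0))             # M-step
    S = len(datasets)
    etas = [sum(es[j] for es in per_set_etas) / S         # aggregation
            for j in range(len(etas))]
    return etas, new_weights
```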

13. Variational approximation of the Kullback-Leibler divergence [Hershey & Olsen 2007]

$$\mathrm{KL}_{\text{var}}(m_1 \,\|\, m_2) = \sum_{i=1}^{K} \omega_i^{(1)} \log \frac{\sum_j \omega_j^{(1)} \, e^{-\mathrm{KL}(p_F(\cdot\,; \theta_i) \,\|\, p_F(\cdot\,; \theta_j))}}{\sum_j \omega_j^{(2)} \, e^{-\mathrm{KL}(p_F(\cdot\,; \theta_i) \,\|\, p_F(\cdot\,; \theta_j))}}$$

With shared parameters, precompute $D_{ij} = e^{-\mathrm{KL}(p_F(\cdot \mid \eta_i) \,\|\, p_F(\cdot \mid \eta_j))}$ once; the fast version is then

$$\mathrm{KL}_{\text{var}}(m_1 \,\|\, m_2) = \sum_i \omega_i^{(1)} \log \frac{\sum_j \omega_j^{(1)} D_{ij}}{\sum_j \omega_j^{(2)} D_{ij}}$$
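Once $D$ is precomputed, each mixture comparison costs two matrix-vector products and a dot product. A sketch (function name mine):

```python
import numpy as np

def kl_var(w1, w2, D):
    """Fast variational KL between mixtures sharing components:
    sum_i w1_i * log( (D @ w1)_i / (D @ w2)_i )."""
    w1, w2 = np.asarray(w1, float), np.asarray(w2, float)
    return float(w1 @ np.log((D @ w1) / (D @ w2)))
```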

14. co-Segmentation

Segmentation from 5D RGBxy mixtures.

[Figure: segmentation results on example images; panels: Original, EM, Co-EM]

15. Transfer learning

Increase the quality of one particular mixture of interest:
- first image: only 1% of the points
- two other images: full set of points
- not enough points on the first image for plain EM

16. Bag of Components

Training step (costly but offline):
- run the co-mixture learning (Comix) on some training set
- keep the component parameters as a dictionary $D = \{\theta_1, \ldots, \theta_K\}$

Online learning of mixtures: for a new point set, assign each observation $x_j$ as it arrives to the best component of the dictionary,

$$\arg\max_{\theta \in D} p_F(x_j; \theta) \quad \text{or equivalently} \quad \arg\min_{\theta \in D} B_{F^\star}(t(x_j) \,\|\, \eta), \text{ with } \eta = \nabla F(\theta)$$
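The online step is a plain nearest-component assignment. A sketch under the same Gaussian assumptions as above (the dictionary is stored in natural parameters; the empirical assignment frequencies become the weights):

```python
import numpy as np

def boc_fit(xs, dictionary):
    """Assign each incoming observation to its most likely dictionary
    component; the learned mixture is the dictionary plus these weights."""
    counts = np.zeros(len(dictionary))
    for x in xs:
        j = max(range(len(dictionary)), key=lambda i: pdf(x, dictionary[i]))
        counts[j] += 1
    return counts / counts.sum()   # component parameters stay fixed
```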

17. Nearest neighbor search

Naive version:
- linear search
- $O(\text{number of samples} \times \text{number of components})$
- same order of magnitude as one step of EM

Improvements, using computational Bregman geometry to speed up the search:
- Bregman ball trees
- hierarchical clustering
- approximate nearest neighbor

18. Image segmentation

Segmentation on a random subset of the pixels.

[Figure: segmentation results with 100%, 10%, and 1% of the pixels; rows: EM, BoC]

19. Computation times

[Bar chart: computation times for the training phase and for learning with 100%, 10%, and 1% of the points, comparing EM and BoC; vertical axis from 0 to 120]

20. Summary

Comix:
- mixtures with shared components
- compact description of a large collection of mixtures
- fast KL approximations
- dictionary-like methods

Bag of Components:
- online method
- predictable running time (no iteration)
- works with only a few points
- fast
