Learning Mixtures of Spherical Gaussians:
Moment Methods and Spectral Decompositions
Daniel Hsu and Sham M. Kakade
Microsoft Research, New England
Also based on work with Anima Anandkumar (UCI), Rong Ge (Princeton), Matus Telgarsky (UCSD).
1
◮ Many applications in machine learning and statistics:
  lots of high-dimensional data, but mostly unlabeled.
◮ Unsupervised learning: discover interesting structure of the
  population from unlabeled data.
◮ This talk: learn about sub-populations in the data source.
2
Mixture of Gaussians:   ∑_{i=1}^k wi N(µi, Σi)

k sub-populations, each modeled as a multivariate Gaussian N(µi, Σi) together with a mixing weight wi.

Goal: efficient algorithm that approximately recovers the parameters from samples.

(Alternative goal: density estimation. Not in this talk.)
3
◮ Input: i.i.d. sample S ⊂ Rd from an unknown mixture of
  Gaussians with parameters θ⋆ := {(µi⋆, Σi⋆, wi⋆) : i ∈ [k]}.
◮ Each data point is drawn from one of the k Gaussians N(µi⋆, Σi⋆)
  (N(µi⋆, Σi⋆) is chosen with probability wi⋆).
◮ But the “labels” are not observed.
◮ Goal: estimate parameters θ = {(µi, Σi, wi) : i ∈ [k]} such that θ ≈ θ⋆.
◮ In practice: local search for the maximum-likelihood
  parameters (E-M algorithm).
4
Well-separated mixtures: estimation is easier if there is a large minimum separation between the component means (Dasgupta, ’99):

  sep := min_{i≠j} ‖µi − µj‖ / max{σi, σj}.

◮ sep = Ω(d^c) or sep = Ω(k^c): simple clustering methods,
  perhaps after dimension reduction
  (Dasgupta, ’99; Vempala-Wang, ’02; and many more.)

Recent developments:
◮ No minimum separation requirement, but current methods
  require exp(Ω(k)) running time / sample size
  (Kalai-Moitra-Valiant, ’10; Belkin-Sinha, ’10; Moitra-Valiant, ’10)
5
Information-theoretic barrier: Gaussian mixtures in R1 can require exp(Ω(k)) samples to estimate parameters, even when the components are well-separated (Moitra-Valiant, ’10).

These hard instances are degenerate in high dimensions!

Our result: efficient algorithms for non-degenerate models in high dimensions (d ≥ k) with spherical covariances.
6
Theorem (H-Kakade, ’13)
Assume {µ1⋆, µ2⋆, . . . , µk⋆} are linearly independent, wi⋆ > 0 for all i ∈ [k], and Σi⋆ = σi⋆² I for all i ∈ [k].
There is an algorithm that, given independent draws from the mixture of k spherical Gaussians, returns ε-accurate parameters (up to permutation, under the ℓ2 metric) w.h.p. The running time and sample complexity are poly(d, k, 1/ε, 1/wmin, 1/λmin), where λmin := k-th largest singular value of [µ1⋆ | µ2⋆ | · · · | µk⋆].
(Also using new techniques from Anandkumar-Ge-H-Kakade-Telgarsky, ’12.)
7
Introduction
Learning algorithm
  Method-of-moments
  Choice of moments
  Solving the moment equations
Concluding remarks
8
Let S ⊂ Rd be an i.i.d. sample from an unknown mixture of spherical Gaussians:

  ∑_{i=1}^k wi⋆ N(µi⋆, σi⋆² I).

Estimation via method-of-moments (Pearson, 1894): find parameters θ such that

  Eθ[ p(x) ] ≈ Ê_{x∈S}[ p(x) ]

for some functions p : Rd → R (typically multivariate polynomials).

Q1 Which moments to use?
Q2 How to (approximately) solve the moment equations?
9
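As a toy illustration of the moment-matching idea (not part of the talk; the setup below, a 1-D two-component mixture with unit component variances and made-up parameters, is purely illustrative), one can compare empirical moments of a sample against a candidate model's moments:

import numpy as np

rng = np.random.default_rng(0)

def sample_mixture(n, mus, ws):
    # Draw n points from sum_i ws[i] * N(mus[i], 1).
    labels = rng.choice(len(ws), size=n, p=ws)
    return rng.normal(loc=np.array(mus)[labels], scale=1.0)

def model_moments(mus, ws):
    # First three moments E[x^r], r = 1, 2, 3, of the unit-variance mixture.
    mus, ws = np.asarray(mus, float), np.asarray(ws, float)
    m1 = np.sum(ws * mus)
    m2 = np.sum(ws * (mus**2 + 1.0))          # E[x^2] = sum_i w_i (mu_i^2 + 1)
    m3 = np.sum(ws * (mus**3 + 3.0 * mus))    # E[x^3] = sum_i w_i (mu_i^3 + 3 mu_i)
    return np.array([m1, m2, m3])

S = sample_mixture(100_000, mus=[-2.0, 3.0], ws=[0.4, 0.6])
empirical = np.array([np.mean(S**r) for r in (1, 2, 3)])

# Method-of-moments: parameters are "good" when model moments match empirical ones.
print(empirical - model_moments([-2.0, 3.0], [0.4, 0.6]))   # near zero (sampling error only)
print(empirical - model_moments([0.0, 1.0], [0.5, 0.5]))    # clearly nonzero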
moment order | reliable estimates? | unique solution?
1st, 2nd     | ✓                   | ✗
Ω(k)th       | ✗                   | ✓

1st- and 2nd-order moments (e.g., mean, covariance):
◮ Fairly easy to get reliable estimates: Ê_{x∈S}[x ⊗ x] ≈ Eθ⋆[x ⊗ x].
◮ But can have multiple solutions to the moment equations:
  Eθ1[x ⊗ x] ≈ Ê_{x∈S}[x ⊗ x] ≈ Eθ2[x ⊗ x] with θ1 ≠ θ2.
  [Achlioptas-McSherry, ’05; Vempala-Wang, ’02; Chaudhuri-Rao, ’08]

Ω(k)th-order moments (e.g., Eθ[degree-k poly(x)]):
◮ Uniquely pin down the solution.
◮ But empirical estimates are very unreliable.
  [Prony, 1795; Lindsay, ’89; Belkin-Sinha, ’10; Moitra-Valiant, ’10]

Can we get the best of both worlds?
Yes! In high dimensions (d ≥ k), low-order multivariate moments suffice
(1st-, 2nd-, and 3rd-order moments). [this work]
10
Second- and third-order multivariate moments:

  Eθ[x ⊗ x] = ∑_{i=1}^k wi µi ⊗ µi + some sparse matrix;
  Eθ[x ⊗ x ⊗ x] = ∑_{i=1}^k wi µi ⊗ µi ⊗ µi + some sparse tensor.

Trick: the “sparse stuff” can be estimated and thus removed.

Upshot: the following can be readily estimated (with estimates M and T):

  Mθ⋆ := ∑_{i=1}^k wi⋆ µi⋆ ⊗ µi⋆   and   Tθ⋆ := ∑_{i=1}^k wi⋆ µi⋆ ⊗ µi⋆ ⊗ µi⋆.

Claim: {(µi, wi)} are uniquely determined by Mθ and Tθ.
11
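To make the ⊗ notation and the multilinear-form view concrete, here is a small sketch (illustrative only; the parameters are made up) that builds Mθ and Tθ from known {(µi, wi)} via outer products:

import numpy as np

rng = np.random.default_rng(1)
d, k = 6, 3
mus = rng.normal(size=(k, d))          # component means mu_1, ..., mu_k (rows)
ws = np.array([0.2, 0.3, 0.5])         # mixing weights

# M = sum_i w_i mu_i (x) mu_i  and  T = sum_i w_i mu_i (x) mu_i (x) mu_i
M = np.einsum('i,ia,ib->ab', ws, mus, mus)           # d x d matrix
T = np.einsum('i,ia,ib,ic->abc', ws, mus, mus, mus)  # d x d x d tensor

# Viewed as multilinear forms: M(u, v) = u' M v and T(u, v, w) = sum_abc T[a,b,c] u_a v_b w_c.
u = rng.normal(size=d)
print(np.allclose(u @ M @ u, np.sum(ws * (mus @ u) ** 2)))        # True
print(np.allclose(np.einsum('abc,a,b,c->', T, u, u, u),
                  np.sum(ws * (mus @ u) ** 3)))                   # True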
View Mθ : Rd × Rd → R and Tθ : Rd × Rd × Rd → R as bilinear and trilinear functions.

Lemma
If the {µi} are linearly independent and all wi > 0, then each of the k distinct, isolated local maximizers u∗ of

  max Tθ(u, u, u)   s.t.   Mθ(u, u) ≤ 1

satisfies, for some i ∈ [k],

  Mθ(·, u∗) = √wi µi,   Tθ(u∗, u∗, u∗) = 1/√wi.

∴ {(µi, wi) : i ∈ [k]} are uniquely determined by Mθ and Tθ.
12
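A quick numerical check of the lemma (again an illustrative sketch with made-up parameters): for each i, take u∗ with ⟨µj, u∗⟩ = 0 for j ≠ i and wi ⟨µi, u∗⟩² = 1, and verify the two identities:

import numpy as np

rng = np.random.default_rng(2)
d, k = 6, 3
mus = rng.normal(size=(k, d))
ws = np.array([0.2, 0.3, 0.5])
M = np.einsum('i,ia,ib->ab', ws, mus, mus)
T = np.einsum('i,ia,ib,ic->abc', ws, mus, mus, mus)

B_pinv = np.linalg.pinv(mus)           # d x k; mus @ B_pinv = I_k (mus has full row rank)
for i in range(k):
    u_star = B_pinv[:, i] / np.sqrt(ws[i])          # <mu_j, u*> = delta_ij / sqrt(w_i)
    print(np.allclose(M @ u_star, np.sqrt(ws[i]) * mus[i]))                  # M(., u*) = sqrt(w_i) mu_i
    print(np.allclose(np.einsum('abc,a,b,c->', T, u_star, u_star, u_star),
                      1.0 / np.sqrt(ws[i])))                                 # T(u*,u*,u*) = 1/sqrt(w_i)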
max Tθ(u, u, u)   s.t.   Mθ(u, u) ≤ 1

  =  max ∑_{i=1}^k wi ⟨µi, u⟩³   s.t.   ∑_{i=1}^k wi ⟨µi, u⟩² ≤ 1

Maximizers are directions u∗ orthogonal to all but one µj.

Combine with the constraint wj ⟨µj, u∗⟩² ≤ 1 to get

  Mθ u∗ = ( ∑_{i=1}^k wi µi ⊗ µi ) u∗ = ∑_{i=1}^k wi ⟨µi, u∗⟩ µi = ±√wj µj.
13
Effectively want to solve

  minθ ‖Tθ − T‖²   s.t.   Mθ = M.   (†)

Not convex in the parameters θ = {(µi, wi)}.

What we do: find one component (µi, wi) at a time, using local optimization of a related (also non-convex) objective function:

  max T(u, u, u)   s.t.   M(u, u) ≤ 1.   (‡)

[Figure: the objective (‡) has k isolated local maxima, one per component (µ1⋆, w1⋆), (µ2⋆, w2⋆), (µ3⋆, w3⋆).]

New robust algorithm for “tensor eigen-decomposition” efficiently approximates all local optima, each corresponding to a component. → Near-optimal solution to (†).
14
Want to find all local maximizers of

  max T(u, u, u)   s.t.   M(u, u) ≤ 1.   (‡)

Must address initialization and convergence issues.

Crucially using the special tensor structure of T ≈ Tθ⋆, together with the non-linearity of u ↦ T(·, u, u):
◮ Random initialization is good with significant probability.
  (“Good” ⇒ a simple iteration will quickly converge to some local max.)
◮ Can check whether the initialization was good by checking the objective
  value after a few steps.
◮ If the value is large enough: the initialization was good; improve by
  taking a few more steps.
◮ Else: abandon and restart.
(A small code sketch of this scheme appears after this slide.)
15
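Below is a minimal sketch of this idea in the exact-moment (noise-free) case: whiten with M, run the simple power-type iteration on the whitened tensor with random restarts, keep the best run, and deflate. This is a simplified illustration with made-up parameters, not the paper's robust procedure.

import numpy as np

rng = np.random.default_rng(3)
d, k = 6, 3
mus = rng.normal(size=(k, d))
ws = np.array([0.2, 0.3, 0.5])
M = np.einsum('i,ia,ib->ab', ws, mus, mus)
T = np.einsum('i,ia,ib,ic->abc', ws, mus, mus, mus)

# Whiten: W with W' M W = I_k; then T(W., W., W.) = sum_i (1/sqrt(w_i)) v_i^(x3)
# with orthonormal v_i = sqrt(w_i) W' mu_i.
evals, evecs = np.linalg.eigh(M)
W = evecs[:, -k:] / np.sqrt(evals[-k:])              # d x k
Tw = np.einsum('abc,ai,bj,ck->ijk', T, W, W, W)      # k x k x k whitened tensor

estimates = []
for _ in range(k):
    best_v, best_val = None, -np.inf
    for _ in range(10):                              # random restarts
        v = rng.normal(size=k); v /= np.linalg.norm(v)
        for _ in range(50):                          # power iteration: v <- Tw(I, v, v)
            v = np.einsum('ijk,j,k->i', Tw, v, v)
            v /= np.linalg.norm(v)
        val = float(np.einsum('ijk,i,j,k->', Tw, v, v, v))
        if val > best_val:                           # keep the run with the largest objective
            best_v, best_val = v, val
    w_hat = 1.0 / best_val**2                        # objective value = 1/sqrt(w_i)
    mu_hat = best_val * (np.linalg.pinv(W.T) @ best_v)   # un-whiten to recover mu_i
    estimates.append((w_hat, mu_hat))
    Tw = Tw - best_val * np.einsum('i,j,k->ijk', best_v, best_v, best_v)  # deflate

for w_hat, mu_hat in estimates:
    print(round(w_hat, 3), np.round(mu_hat, 2))      # recovers (w_i, mu_i) up to ordering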
Introduction
Learning algorithm
Concluding remarks
  Open problems and summary
16
◮ Can also handle mixtures of Gaussians with somewhat
  more general covariances, under incoherence conditions:

    Eθ[x ⊗ x] = ∑_{i=1}^k wi µi ⊗ µi + some sparse matrix.

◮ Question #1: What about mixtures of Gaussians with
  arbitrary covariances?
◮ Question #2: How to handle degenerate cases / k ≫ d?
  (Practical relevance: automatic speech recognition.)
17
◮ Learning mixtures of spherical Gaussians:
  worst-case (information-theoretically) hard, but non-degenerate cases are easy.
◮ Structure in low-order multivariate moments uniquely
  determines the model parameters under a natural non-degeneracy condition,
  which permits a computationally efficient estimation algorithm.
◮ Similar story for many other statistical models
  (e.g., HMMs (Mossel-Roch, ’06; H-Kakade-Zhang, ’09), topic models (Arora-Ge-Moitra, ’12; Anandkumar et al, ’12), ICA (Arora et al, ’12)).
◮ Open problem: efficient estimators for highly …
18
Related survey/overview-ish paper:
◮ Tensor decompositions for latent variable models
(with Anandkumar, Ge, Kakade, and Telgarsky): http://arxiv.org/abs/1210.7559
19
◮ First-order moments:

    E[x] = ∑_{i=1}^k wi µi.

◮ Second-order moments:

    E[x ⊗ x] = ∑_{i=1}^k wi µi ⊗ µi + σ̄² I,   where σ̄² := ∑_{i=1}^k wi σi².

Fact: σ̄² is the smallest eigenvalue of Cov(x) = E[x ⊗ x] − E[x] ⊗ E[x].
20
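Illustrative sketch of this step (parameters are made up): estimate σ̄² as the smallest eigenvalue of the sample covariance and form M̂ = Ê[x ⊗ x] − σ̄² I.

import numpy as np

rng = np.random.default_rng(4)
d, k, n = 6, 3, 200_000
mus = rng.normal(size=(k, d)) * 3.0
sigmas2 = np.array([0.5, 1.0, 2.0])      # per-component spherical variances
ws = np.array([0.2, 0.3, 0.5])

labels = rng.choice(k, size=n, p=ws)
X = mus[labels] + rng.normal(size=(n, d)) * np.sqrt(sigmas2[labels])[:, None]

cov = np.cov(X, rowvar=False)
sigma_bar2_hat = np.linalg.eigvalsh(cov)[0]          # smallest eigenvalue of Cov(x)
print(sigma_bar2_hat, np.sum(ws * sigmas2))          # should be close

M_hat = (X.T @ X) / n - sigma_bar2_hat * np.eye(d)   # approx sum_i w_i mu_i (x) mu_i
M_true = np.einsum('i,ia,ib->ab', ws, mus, mus)
print(np.max(np.abs(M_hat - M_true)))                # small (sampling error)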
◮ Third-order moments:

    E[x ⊗ x ⊗ x] = ∑_{i=1}^k wi µi ⊗ µi ⊗ µi
                   + ∑_{i=1}^d ( m ⊗ ei ⊗ ei + ei ⊗ m ⊗ ei + ei ⊗ ei ⊗ m ),

  where m := ∑_{i=1}^k wi σi² µi.

Fact: m = E[ (u⊤(x − E[x]))² x ] for any unit-norm eigenvector u of Cov(x) corresponding to the eigenvalue σ̄².
21
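Continuing the previous sketch (still illustrative, made-up parameters): estimate m via the eigenvector formula, then subtract the “sparse” part of the third-order moment to get T̂ ≈ ∑_{i=1}^k wi µi ⊗ µi ⊗ µi.

import numpy as np

rng = np.random.default_rng(4)
d, k, n = 6, 3, 200_000
mus = rng.normal(size=(k, d)) * 3.0
sigmas2 = np.array([0.5, 1.0, 2.0])
ws = np.array([0.2, 0.3, 0.5])
labels = rng.choice(k, size=n, p=ws)
X = mus[labels] + rng.normal(size=(n, d)) * np.sqrt(sigmas2[labels])[:, None]

cov = np.cov(X, rowvar=False)
evals, evecs = np.linalg.eigh(cov)
sigma_bar2_hat, u = evals[0], evecs[:, 0]            # bottom eigenpair of Cov(x)

# m = E[ (u'(x - E[x]))^2 x ]
proj2 = ((X - X.mean(axis=0)) @ u) ** 2
m_hat = (proj2[:, None] * X).mean(axis=0)
print(np.max(np.abs(m_hat - np.einsum('i,i,ia->a', ws, sigmas2, mus))))   # small

# T_hat = E_hat[x (x) x (x) x] - sum_a ( m (x) e_a (x) e_a + e_a (x) m (x) e_a + e_a (x) e_a (x) m )
E3 = np.einsum('na,nb,nc->abc', X, X, X) / n
I = np.eye(d)
correction = (np.einsum('a,bc->abc', m_hat, I)
              + np.einsum('b,ac->abc', m_hat, I)
              + np.einsum('c,ab->abc', m_hat, I))
T_hat = E3 - correction
T_true = np.einsum('i,ia,ib,ic->abc', ws, mus, mus, mus)
print(np.max(np.abs(T_hat - T_true)))                # small relative to entries of T_true; shrinks with larger n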
max T(u, u, u)   s.t.   M(u, u) ≤ 1

  =  max ∑_{i=1}^k wi ⟨µi, u⟩³   s.t.   ∑_{i=1}^k wi ⟨µi, u⟩² ≤ 1

  =  max ∑_{i=1}^k (1/√wi) θi³   s.t.   ∑_{i=1}^k θi² ≤ 1     (θi := √wi ⟨µi, u⟩)

Isolated local maxima are 1/√w1, 1/√w2, . . . , achieved at

  (1, 0, 0, . . . ), (0, 1, 0, . . . ), . . .

Translates to directions u∗ orthogonal to all but one µj.
22