Sharp bounds for learning a mixture of two Gaussians
Moritz Hardt Eric Price
IBM Almaden
2014-05-28
Moritz Hardt, Eric Price (IBM) Sharp bounds for learning a mixture of two Gaussians 2014-05-28 1 / 25
◮ Male and female heights are each very close to a Gaussian distribution.
◮ “Method of moments”
◮ Royce ’58, Gridgeman ’70, Gupta-Huang ’80
◮ Clustering: Dasgupta ’99, DA ’00
◮ Spectral methods: VW ’04, AK ’05, KSV ’05, AM ’05, VW ’05
◮ Extended to general k mixtures: Moitra-Valiant ’10, Belkin-Sinha ’10
◮ Our result: tight upper and lower bounds for the sample complexity.
◮ For k = 2 mixtures, arbitrary d dimensions.
◮ Male/female average heights, std. deviations.
◮ Quite general: for any mixture of known unimodal distributions.
◮ $\mu_i$ to $\pm\epsilon\sigma$
◮ $\sigma_i^2$ to $\pm\epsilon^2\sigma^2$
◮ Previously: $O(1/\epsilon^{300})$.
◮ Moreover: algorithm is almost the same as Pearson (1894).
◮ “$\sigma^2$” is max variance in any coordinate.
◮ Get each entry of covariance matrix to $\pm\epsilon^2\sigma^2$.
◮ Previously: $O((d/\epsilon)^{300{,}000})$.
[Figure: plot over Height (cm), axis ticks from 140 to 200]
◮ Convert to “central moments”
◮ $M'_2 = M_2 - M_1^2$ is independent of translation.
◮ $X_4 = M_4 - 3M_2^2$ is independent of adding $N(0, \sigma^2)$.
◮ “Excess kurtosis” coined by Pearson, appearing in every Wikipedia
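The two invariances above can be checked numerically. The following is a minimal sketch, assuming that the moments entering $X_4$ are central moments (so that $X_4$ is the fourth cumulant); the mixture parameters, the shift of 170, and the noise level 1.5 are hypothetical values chosen only for illustration.

```python
import numpy as np

def central_moment(x, k):
    """k-th central moment of the sample x."""
    return np.mean((x - np.mean(x)) ** k)

def excess_fourth_moment(x):
    """X4 = M4 - 3*M2^2, computed from central moments (assumption)."""
    return central_moment(x, 4) - 3.0 * central_moment(x, 2) ** 2

rng = np.random.default_rng(0)
n = 1_000_000
# Hypothetical mixture: equal weights, means -2 and +2, unit variances.
labels = rng.random(n) < 0.5
x = np.where(labels, rng.normal(-2.0, 1.0, n), rng.normal(2.0, 1.0, n))

# Translating the data leaves the second central moment unchanged.
m2 = central_moment(x, 2)
m2_shifted = central_moment(x + 170.0, 2)

# Adding independent N(0, s^2) noise leaves X4 unchanged, up to sampling error.
x4 = excess_fourth_moment(x)
x4_noisy = excess_fourth_moment(x + rng.normal(0.0, 1.5, n))

print(m2, m2_shifted)
print(x4, x4_noisy)
```

For this particular mixture the population value of $X_4$ is $-32$, so both printed estimates of $X_4$ should land near that value even though the added noise changes the variance substantially.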
◮ Positive roots correspond to mixtures that match on five moments.
◮ Usually have two roots.
◮ Pearson’s proposal: choose candidate with closer 6th moment.
◮ Usually works well
◮ Not when there’s a double root.
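The sixth-moment tie-break can be sketched as follows. This is an illustration only: it does not solve Pearson's polynomial, and the candidate tuples `(p1, mu1, sigma1, mu2, sigma2)`, the helper names, and the empirical value are all hypothetical. It uses the standard formula for raw moments of a Gaussian, $E[X^n] = \sum_{k \text{ even}} \binom{n}{k} \mu^{n-k} \sigma^k (k-1)!!$.

```python
from math import comb

def odd_double_factorial(k):
    """(k-1)!! for even k >= 0, which equals E[Z^k] for standard normal Z."""
    result = 1
    for j in range(1, k, 2):
        result *= j
    return result

def gaussian_raw_moment(mu, sigma, n):
    """E[X^n] for X ~ N(mu, sigma^2)."""
    return sum(
        comb(n, k) * mu ** (n - k) * sigma ** k * odd_double_factorial(k)
        for k in range(0, n + 1, 2)
    )

def mixture_raw_moment(mixture, n):
    """n-th raw moment of a two-component mixture (p1, mu1, sigma1, mu2, sigma2)."""
    p1, mu1, s1, mu2, s2 = mixture
    return (p1 * gaussian_raw_moment(mu1, s1, n)
            + (1.0 - p1) * gaussian_raw_moment(mu2, s2, n))

def pick_by_sixth_moment(candidates, empirical_m6):
    """Return the candidate whose sixth moment is closest to the empirical one."""
    return min(candidates,
               key=lambda m: abs(mixture_raw_moment(m, 6) - empirical_m6))

# Two hypothetical candidates and a hypothetical empirical sixth moment.
candidate_a = (0.5, -1.0, 1.0, 1.0, 1.0)
candidate_b = (0.5, -1.0, 1.2, 1.0, 0.8)
chosen = pick_by_sixth_moment([candidate_a, candidate_b], empirical_m6=76.1)
```

When the two candidates nearly coincide (the double-root case), their sixth moments are also nearly equal, so this comparison becomes sensitive to estimation error in the empirical moment.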
◮ Then for |
◮ Compactness: true for any closed and bounded region.
◮ For unbounded variables, dominating terms show q → ∞.
◮ Issue is that x > 0 isn’t closed.
◮ Can use $X_3$, $X_4$ to get an O(1) approximation $\hat{\alpha}$ to $\alpha$.
◮ $x \in [\hat{\alpha}/10, \hat{\alpha}]$ is closed.
[Figure panels: Large ∆, Small ∆]
◮ If components are Ω(1) standard deviations apart, $O(1/\epsilon^2)$ samples suffice.
◮ In general, $O(1/\epsilon^{12})$ samples suffice to get $\epsilon\sigma$ accuracy.
◮ Can match up $\{\mu_{1,i}, \mu_{2,i}\}$ with $\{\mu_{1,j}, \mu_{2,j}\}$.
◮ Project $x \to \langle v, x \rangle$ for many random $v$.
◮ For $\mu' \neq \mu$, will have $\langle \mu', v \rangle \neq \langle \mu, v \rangle$ with constant probability.
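The projection step can be illustrated with a short simulation. This is a sketch under stated assumptions, not the talk's algorithm: the dimension, the two means, the identity covariances, and the gap threshold of 0.1 times the distance between the means are all hypothetical choices. It projects samples onto random unit vectors and estimates how often a random direction keeps the two projected means noticeably apart.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5
# Hypothetical d-dimensional mixture: two distinct means, identity covariances.
mu1 = np.zeros(d)
mu2 = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
n = 20_000
labels = rng.random(n) < 0.5
samples = np.where(labels[:, None],
                   rng.normal(size=(n, d)) + mu1,
                   rng.normal(size=(n, d)) + mu2)

def random_unit_vector(dim, rng):
    """Uniformly random direction on the unit sphere."""
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

def project(samples, v):
    """Map each x to <v, x>, giving samples from a one-dimensional mixture."""
    return samples @ v

# One projection: the projected data is a 1-D mixture with means <mu1, v>, <mu2, v>.
v = random_unit_vector(d, rng)
projected = project(samples, v)
expected_mean = 0.5 * (v @ mu1) + 0.5 * (v @ mu2)

# Fraction of random directions that keep the projected means apart.
trials = 1000
gap_threshold = 0.1 * np.linalg.norm(mu1 - mu2)  # hypothetical threshold
hits = sum(
    abs(u @ mu1 - u @ mu2) > gap_threshold
    for u in (random_unit_vector(d, rng) for _ in range(trials))
)
fraction_separated = hits / trials
print(fraction_separated)
```

Each projected data set can then be handed to a one-dimensional procedure, and directions where the projected means differ are the ones that help tell the two means apart.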
◮ Necessary to get sixth moment to $\pm(\epsilon\sigma)^6$.
◮ Constant means and variances.
◮ Add $N(0, \sigma^2)$ to each mixture as $\sigma$ grows.
◮ $H^2(P, Q) := \frac{1}{2} \int \left(\sqrt{p(x)} - \sqrt{q(x)}\right)^2 dx$
◮ $H^2$ is subadditive on product measures
◮ Sample complexity is $\Omega(1/H^2(F, F'))$
◮ $H^2 \lesssim \mathrm{TV} \lesssim H$, but often $H \approx \mathrm{TV}$.
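The relationship between squared Hellinger distance and total variation can be checked numerically. The sketch below assumes the normalization $H^2 = \frac{1}{2}\int(\sqrt{p} - \sqrt{q})^2$ and $\mathrm{TV} = \frac{1}{2}\int|p - q|$, approximates both with a Riemann sum on a grid, and compares $H^2$ against the standard closed form for two univariate Gaussians. The two Gaussians and the grid are hypothetical choices.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def hellinger_sq_and_tv(p, q, dx):
    """Riemann-sum approximations of H^2 and TV for densities on a uniform grid."""
    h2 = 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * dx
    tv = 0.5 * np.sum(np.abs(p - q)) * dx
    return h2, tv

def hellinger_sq_gaussians(mu1, s1, mu2, s2):
    """Closed form of H^2 between N(mu1, s1^2) and N(mu2, s2^2)."""
    var_sum = s1 ** 2 + s2 ** 2
    return 1.0 - np.sqrt(2.0 * s1 * s2 / var_sum) * np.exp(
        -0.25 * (mu1 - mu2) ** 2 / var_sum)

# Hypothetical pair of distributions on a wide, fine grid.
x = np.linspace(-20.0, 20.0, 200_001)
dx = x[1] - x[0]
p = gaussian_pdf(x, 0.0, 1.0)
q = gaussian_pdf(x, 0.5, 1.2)

h2, tv = hellinger_sq_and_tv(p, q, dx)
h2_exact = hellinger_sq_gaussians(0.0, 1.0, 0.5, 1.2)
print(h2, h2_exact, tv)
```

With this normalization the inequalities read $H^2 \le \mathrm{TV} \le \sqrt{2}\,H$, which is the precise form of the $\lesssim$ chain in the last bullet.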
$\mathbb{E}_{x \sim G}[\Delta(x)^2] \lesssim 1/\sigma^{2k+2}$
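◮ $\Theta(\epsilon^{-12} \log d)$ samples necessary and sufficient to estimate $\mu_i$ to $\pm\epsilon\sigma$, $\sigma_i^2$ to $\pm\epsilon^2\sigma^2$.
◮ If the means have $\Delta\sigma$ separation, just $O(\epsilon^{-2}\Delta^{-12})$ for $\epsilon\Delta\sigma$ accuracy.
◮ Lower bound extends, so $\Omega(\epsilon^{-6k})$.
◮ Do we really care about finding an $O(\epsilon^{-18})$ algorithm?
◮ Solving the system of equations gets nasty.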
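To get a feel for how quickly these rates grow, the sketch below evaluates the three expressions with all hidden constant factors set to 1. That is an assumption made purely for illustration, so the outputs indicate scaling and are not actual sample counts. The function names are hypothetical, and `k` is assumed to denote the number of mixture components.

```python
import math

def general_rate(eps, d):
    """eps^-12 * log d, constant factor omitted (assumption); requires d >= 2."""
    return eps ** -12 * math.log(d)

def separated_rate(eps, delta):
    """eps^-2 * delta^-12 for means separated by delta * sigma, constant omitted."""
    return eps ** -2 * delta ** -12

def k_component_lower_rate(eps, k):
    """eps^(-6k) lower-bound scaling for k components, constant omitted."""
    return eps ** (-6 * k)

for eps in (0.5, 0.1):
    print(eps,
          general_rate(eps, d=10),
          separated_rate(eps, delta=1.0),
          k_component_lower_rate(eps, k=2),
          k_component_lower_rate(eps, k=3))
```

For `k = 2` the lower-bound expression reduces to $\epsilon^{-12}$, consistent with the two-component rate, and for `k = 3` it gives the $\epsilon^{-18}$ mentioned in the bullet above.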