
Advances in ML: Theory Meets Practice. Julie Josse. Review on Missing Values Methods with Demos (PowerPoint PPT presentation).



  1. Advances in ML: Theory Meets Practice. Julie Josse. Review on Missing Values Methods with Demos. Lausanne, 26 January.

  2. Dealing with missing values; PCA with missing values / matrix completion; categorical/mixed data.

  3. PCA imputation

  4. PCA (complete): find the subspace that best represents the data. [Figure 1: Camel or dromedary?] ⇒ Best approximation with projection ⇒ Best representation of the variability ⇒ Does not distort the distances between individuals.

  5. PCA (complete): find the subspace that best represents the data. [Figure 1: Camel or dromedary? Source: J.P. Fénelon] ⇒ Best approximation with projection ⇒ Best representation of the variability ⇒ Does not distort the distances between individuals.

  6. PCA reconstruction. [Figure: the data matrix $X$, its low-rank reconstruction $\hat\mu \approx F V'$, and the observations with their projections in the $(x_1, x_2)$ plane.] ⇒ Minimizes the distance between the observations and their projections ⇒ Approximates $X_{n \times p}$ by a low-rank matrix, $S < p$. With $\|A\|_2^2 = \mathrm{tr}(A A^\top)$: $\operatorname{argmin}_{\mu} \{ \|X - \mu\|_2^2 : \mathrm{rank}(\mu) \le S \}$.

  7. PCA reconstruction. [Figure: same as slide 6, with some entries of $X$ shown as NA.] ⇒ Minimizes the distance between the observations and their projections ⇒ Approximates $X_{n \times p}$ by a low-rank matrix, $S < p$. With $\|A\|_2^2 = \mathrm{tr}(A A^\top)$: $\operatorname{argmin}_{\mu} \{ \|X - \mu\|_2^2 : \mathrm{rank}(\mu) \le S \}$. SVD of $X$: $\hat\mu^{PCA} = U_{n \times S} \, \Lambda^{1/2}_{S \times S} \, V'_{p \times S} = F_{n \times S} \, V'_{p \times S}$, with $F = U \Lambda^{1/2}$ the principal component scores and $V$ the principal axes (loadings).
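
A minimal numpy sketch of this rank-$S$ SVD reconstruction on a complete matrix (the function name pca_reconstruction and the column-centering choice are mine, not from the slides):

```python
import numpy as np

def pca_reconstruction(X, S):
    """Best rank-S approximation of X: mu_hat = U_S Lambda_S^{1/2} V_S' (plus the column means)."""
    means = X.mean(axis=0)
    U, d, Vt = np.linalg.svd(X - means, full_matrices=False)  # d holds the singular values sqrt(lambda_s)
    return U[:, :S] @ np.diag(d[:S]) @ Vt[:S, :] + means
```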

  8. Missing values in PCA. ⇒ PCA: least squares, $\operatorname{argmin}_{\mu} \{ \|X_{n \times p} - \mu_{n \times p}\|_2^2 : \mathrm{rank}(\mu) \le S \}$. ⇒ PCA with missing values: weighted least squares, $\operatorname{argmin}_{\mu} \{ \|W_{n \times p} * (X - \mu)\|_2^2 : \mathrm{rank}(\mu) \le S \}$, with $W_{ij} = 0$ if $X_{ij}$ is missing, $W_{ij} = 1$ otherwise; $*$ denotes elementwise multiplication. Many algorithms: weighted alternating least squares (Gabriel & Zamir, 1979); iterative PCA (Kiers, 1997).
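
As a small illustration of the weighted criterion, a sketch (naming is mine) that builds the mask $W$ from the missing cells and evaluates $\|W * (X - \mu)\|_2^2$:

```python
import numpy as np

def weighted_loss(X, mu):
    """|| W * (X - mu) ||_2^2 with W_ij = 0 on missing cells, 1 elsewhere."""
    W = (~np.isnan(X)).astype(float)
    diff = np.nan_to_num(X - mu)          # missing cells contribute 0
    return float(np.sum((W * diff) ** 2))
```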

  9. Iterative PCA. Toy data with one missing entry, $(x_1, x_2)$: (-2.0, -2.01), (-1.5, -1.48), (0.0, -0.01), (1.5, NA), (2.0, 1.98). [Figure: scatter plot of the observed points in the $(x_1, x_2)$ plane.]

  10. Iterative PCA. Initialization $\ell = 0$: $X^0$ obtained by mean imputation; the missing $x_2$ entry is replaced by 0.00. [Figure: the imputed point (1.5, 0.00) added to the scatter plot.]

  11. Iterative PCA. PCA on the completed data set → $(U^\ell, \Lambda^\ell, V^\ell)$. [Figure: the fitted values (-1.98, -2.04), (-1.44, -1.56), (0.15, -0.18), (1.00, 0.57), (2.27, 1.67) shown alongside the data.]

  12. Iterative PCA. Missing values are imputed with the fitted matrix $\hat\mu^\ell = U^\ell (\Lambda^\ell)^{1/2} V^{\ell\prime}$. [Figure: same data and fitted values as slide 11.]

  13. Iterative PCA. The new imputed data set is $\hat X^\ell = W * X + (1 - W) * \hat\mu^\ell$; here the missing $x_2$ entry becomes 0.57. [Figure: the updated point (1.5, 0.57).]
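
The update on this slide is a one-liner in numpy; a hedged sketch (impute_step is my name for it):

```python
import numpy as np

def impute_step(X, mu_hat):
    """hat X = W * X + (1 - W) * mu_hat: observed cells are kept, missing cells are filled in."""
    return np.where(np.isnan(X), mu_hat, X)
```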

  14. Iterative PCA. [Figure: the completed data with the imputed entry 0.57 in the $(x_1, x_2)$ plane.]

  15. Iterative PCA. [Figure: after another estimation and imputation step the fitted values change and the imputed entry becomes 0.90.]

  16. Iterative PCA. The estimation and imputation steps are repeated until convergence. [Figure: the imputed entry is updated at each iteration.]

  17. Iterative PCA. PCA on the completed data set → $(U^\ell, \Lambda^\ell, V^\ell)$; missing values are imputed with the fitted matrix $\hat\mu^\ell = U^\ell (\Lambda^\ell)^{1/2} V^{\ell\prime}$. [Figure: at convergence the imputed $x_2$ entry is 1.46.]

  18. Iterative PCA.
      1. Initialization $\ell = 0$: $X^0$ (mean imputation).
      2. Step $\ell$: (a) PCA on the completed data → $(U^\ell, \Lambda^\ell, V^\ell)$; $S$ dimensions kept. (b) Missing values are imputed with $(\hat\mu^S)^\ell = U^\ell (\Lambda^\ell)^{1/2} V^{\ell\prime}$; the new imputed data set is $\hat X^\ell = W * X + (1 - W) * (\hat\mu^S)^\ell$.
      3. The estimation and imputation steps are repeated until convergence.
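
A minimal numpy sketch of this algorithm, assuming a mean-imputation start and a fixed number of dimensions $S$ (function and parameter names are mine; the implementation referenced later in the talk is the missMDA package):

```python
import numpy as np

def iterative_pca_impute(X, S, n_iter=100, tol=1e-6):
    """Iterative PCA imputation: alternate rank-S PCA fit and imputation of the missing cells."""
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)                              # W_ij = 0 where miss is True
    # Step 0: mean imputation of each column
    X_hat = np.where(miss, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        # (a) PCA on the completed data: center, SVD, keep S dimensions
        means = X_hat.mean(axis=0)
        U, d, Vt = np.linalg.svd(X_hat - means, full_matrices=False)
        mu_hat = U[:, :S] @ np.diag(d[:S]) @ Vt[:S, :] + means
        # (b) impute the missing cells with the fitted values: W*X + (1-W)*mu_hat
        X_new = np.where(miss, mu_hat, X)
        if np.max(np.abs(X_new - X_hat)) < tol:     # stop when the imputations stabilize
            return X_new
        X_hat = X_new
    return X_hat
```

On the toy two-column data of slides 9 to 17 one would call iterative_pca_impute(X, S=1) to reproduce the single imputed cell.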

  19. Iterative PCA (algorithm as on slide 18). ⇒ $\hat\mu$ from incomplete data: an EM algorithm for the model $X = \mu + \varepsilon$, $\varepsilon_{ij} \overset{iid}{\sim} \mathcal{N}(0, \sigma^2)$, with $\mu$ of low rank, $x_{ij} = \sum_{s=1}^{S} \sqrt{\lambda_s} \, \tilde u_{is} \tilde v_{js} + \varepsilon_{ij}$. ⇒ Completed data: good imputation (matrix completion, Netflix).
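
To make the model concrete, a small simulation sketch (entirely illustrative: the sizes, noise level and missingness rate are my own choices) drawing data from the low-rank-plus-noise model and removing entries at random:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, S, sigma = 100, 6, 2, 0.5
mu = rng.normal(size=(n, S)) @ rng.normal(size=(S, p))     # rank-S signal mu
X_full = mu + rng.normal(scale=sigma, size=(n, p))         # add N(0, sigma^2) noise
X = X_full.copy()
X[rng.random((n, p)) < 0.2] = np.nan                       # about 20% of the cells set to missing
```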

  20. Iterative PCA (algorithm and model as on slide 19). Reduction of variability (imputation by $U \Lambda^{1/2} V'$). Selecting $S$? Generalized cross-validation (J. & Husson, 2012).
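
The generalized cross-validation formula of J. & Husson (2012) is not given on the slide; as a rough illustrative stand-in, one can hold out a fraction of the observed cells, impute with each candidate $S$, and keep the rank with the smallest error on the held-out cells (a sketch; the imputer argument could be the iterative_pca_impute sketch given after slide 18):

```python
import numpy as np

def select_rank_cv(X, S_max, impute_fn, frac=0.1, seed=0):
    """Pick the rank S whose imputation best recovers held-out observed cells.
    impute_fn(X_with_holes, S) must return a completed matrix."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(X)
    hide = observed & (rng.random(X.shape) < frac)   # extra cells held out for validation
    X_train = X.copy()
    X_train[hide] = np.nan
    errors = [np.mean((impute_fn(X_train, S)[hide] - X[hide]) ** 2)
              for S in range(1, S_max + 1)]
    return int(np.argmin(errors)) + 1
```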

  21. Soft-thresholding iterative SVD. ⇒ Overfitting issues of iterative PCA: many parameters ($U_{n \times S}$, $V_{p \times S}$) relative to the number of observed values ($S$ large, many NAs); noisy data. ⇒ Regularized versions. Initialization, estimation and imputation steps: the PCA imputation $\hat\mu^{PCA}_{ij} = \sum_{s=1}^{S} \sqrt{\lambda_s} \, u_{is} v_{js}$ is replaced by a "shrunk" imputation $\hat\mu^{Soft}_{ij} = \sum_{s=1}^{p} (\sqrt{\lambda_s} - \lambda)_+ \, u_{is} v_{js}$, which solves $\operatorname{argmin}_{\mu} \|W * (X - \mu)\|_2^2 + \lambda \|\mu\|_*$ for the model $X = \mu + \varepsilon$. SoftImpute for large matrices: T. Hastie, R. Mazumder (2015), Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares, JMLR. Implemented in softImpute.
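
A sketch of a soft-thresholded iterative SVD in this spirit (the zero initialization and stopping rule are my own choices; the softImpute R package cited on the slide is the reference implementation for large matrices):

```python
import numpy as np

def soft_impute(X, lam, n_iter=100, tol=1e-6):
    """Iteratively fill missing cells with a soft-thresholded SVD fit."""
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    X_hat = np.where(miss, 0.0, X)                   # start from zero-imputation
    for _ in range(n_iter):
        U, d, Vt = np.linalg.svd(X_hat, full_matrices=False)
        d_shrunk = np.maximum(d - lam, 0.0)          # soft-threshold the singular values
        mu_hat = U @ np.diag(d_shrunk) @ Vt          # shrunk low-rank fit
        X_new = np.where(miss, mu_hat, X)            # keep the observed entries
        if np.max(np.abs(X_new - X_hat)) < tol:
            return X_new
        X_hat = X_new
    return X_hat
```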

  22. Regularized iterative PCA. ⇒ Initialization, estimation and imputation steps. In missMDA (YouTube). The imputation step $\hat\mu^{PCA}_{ij} = \sum_{s=1}^{S} \sqrt{\lambda_s} \, u_{is} v_{js}$ is replaced by a "shrunk" imputation step (Efron & Morris, 1972): $\hat\mu^{rPCA}_{ij} = \sum_{s=1}^{S} \left( \frac{\lambda_s - \hat\sigma^2}{\lambda_s} \right) \sqrt{\lambda_s} \, u_{is} v_{js} = \sum_{s=1}^{S} \frac{\lambda_s - \hat\sigma^2}{\sqrt{\lambda_s}} \, u_{is} v_{js}$. $\sigma^2$ small → regularized PCA ≈ PCA; $\sigma^2$ large → mean imputation. $\hat\sigma^2 = \frac{\mathrm{RSS}}{\mathrm{ddl}} = \frac{n \sum_{s=S+1}^{p} \lambda_s}{np - p - nS - pS + S^2 + S}$, with the degrees of freedom counted from $(X_{n \times p}; U_{n \times S}; V_{p \times S})$.
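
A sketch of the shrunk imputation step under these definitions (the $n$-scaling of the eigenvalues is my reading of the slide; in practice one would call imputePCA from the missMDA package named above):

```python
import numpy as np

def regularized_pca_fit(X_completed, S):
    """Rank-S fit with singular values shrunk by (lambda_s - sigma^2_hat)/lambda_s."""
    n, p = X_completed.shape
    means = X_completed.mean(axis=0)
    U, d, Vt = np.linalg.svd(X_completed - means, full_matrices=False)
    lam = d ** 2 / n                                     # PCA eigenvalues lambda_s
    ddl = n * p - p - n * S - p * S + S ** 2 + S         # degrees of freedom from the slide
    sigma2 = n * lam[S:].sum() / ddl                     # sigma^2_hat = RSS / ddl
    shrink = np.maximum((lam[:S] - sigma2) / lam[:S], 0.0)  # shrinkage factor per dimension
    return U[:, :S] @ np.diag(d[:S] * shrink) @ Vt[:S, :] + means
```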
