missing data imputation using optimal transport

Missing Data Imputation using Optimal Transport Boris Muzellec Julie - PowerPoint PPT Presentation

Missing Data Imputation using Optimal Transport Boris Muzellec Julie Josse Claire Boyer Marco Cuturi


  1. Missing Data Imputation using Optimal Transport Boris Muzellec Julie Josse Claire Boyer Marco Cuturi

  2. The missing data issue
     • Big data is plagued with missing values
     • What to do?
       Option 1: Remove entries with missing values ⇒ information loss, not sustainable
         - Example with 25% missing rate: [Figure: panels for 2d, 3d, 6d, 10d]
         - With 1% missing rate: 5d: 95% rows kept; 300d: 5% rows kept
       Option 2: Impute with reasonable guesses
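
The "rows kept" figures on this slide can be reproduced with a short calculation. The sketch below assumes that every entry is missing independently with the same probability, which the slide does not state explicitly; under that assumption a row with d features is complete with probability (1 - p)^d. The function names are illustrative and are not taken from the presentation.

```python
import numpy as np


def complete_row_fraction(missing_rate: float, n_features: int) -> float:
    """Expected fraction of rows without any missing entry.

    Assumption (not from the slides): each entry is missing independently
    with probability `missing_rate`.
    """
    return (1.0 - missing_rate) ** n_features


def simulated_complete_row_fraction(missing_rate, n_features,
                                    n_rows=20_000, seed=0):
    """Monte Carlo check of the closed form under the same assumption."""
    rng = np.random.default_rng(seed)
    # True marks a missing entry.
    mask = rng.random((n_rows, n_features)) < missing_rate
    # A row is kept only if none of its entries is missing.
    return float((~mask.any(axis=1)).mean())


# Dimensions quoted on the slide for a 1% missing rate.
for d in (5, 300):
    exact = complete_row_fraction(0.01, d)
    simulated = simulated_complete_row_fraction(0.01, d)
    print(f"d={d}: exact={exact:.3f}, simulated={simulated:.3f}")
```

Dropping incomplete rows therefore becomes untenable as the number of features grows, even when the missing rate per entry is small.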

  3. Outline 1. Missing data and Optimal Transport 2. Non-parametric imputation with OT 3. Fitting parametric imputation models with OT

  4. How to impute?
     Deforms joint and marginal distributions:
       - Mean imputation
       - Regression (conditional expectation)
     Preserves distributions:
       • Using a conditional model:
         - With logistic, multinomial, Poisson regressions: R's mice (Van Buuren, 2011)
       • Assuming a joint model:
         - EM + Gaussian distribution: Amelia (Honaker et al., 2011)
         - Low-rank models: Softimpute (Mazumder et al., 2010)
         - VAE and GAN: MIWAE (Mattei & Frellsen, 2019), GAIN (Yoon et al., 2018)
         - …
     This work:
       - Preserves distributions
       - Parametric assumption not necessary
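
A small simulation can illustrate why mean and regression imputation deform distributions. This is a minimal sketch on toy data, not material from the presentation: the bivariate Gaussian, the correlation of 0.6, the 30% missing rate, and all variable names are assumptions chosen for illustration. Values of one variable are removed completely at random, then filled in either with the observed mean or with a linear conditional-expectation fit.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Toy data (assumed): two standard Gaussian variables with correlation 0.6.
cov = np.array([[1.0, 0.6], [0.6, 1.0]])
data = rng.multivariate_normal([0.0, 0.0], cov, size=n)
x, y = data[:, 0], data[:, 1]

# Remove 30% of y completely at random (assumed missingness mechanism).
missing = rng.random(n) < 0.3

# Mean imputation: every missing y is replaced by the observed mean.
y_mean = y.copy()
y_mean[missing] = y[~missing].mean()

# Regression imputation: replace missing y by a linear estimate of E[y | x]
# fitted on the observed pairs.
slope, intercept = np.polyfit(x[~missing], y[~missing], 1)
y_reg = y.copy()
y_reg[missing] = slope * x[missing] + intercept

var_true, var_mean, var_reg = np.var(y), np.var(y_mean), np.var(y_reg)
corr_true = np.corrcoef(x, y)[0, 1]
corr_mean = np.corrcoef(x, y_mean)[0, 1]
corr_reg = np.corrcoef(x, y_reg)[0, 1]

print(f"variance of y:    true={var_true:.2f} mean-imp={var_mean:.2f} reg-imp={var_reg:.2f}")
print(f"correlation(x,y): true={corr_true:.2f} mean-imp={corr_mean:.2f} reg-imp={corr_reg:.2f}")
```

In this setup both imputations shrink the variance of the imputed variable. Mean imputation also weakens the correlation with x, while regression imputation strengthens it, because every imputed point lies exactly on the fitted line.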
