Attention-based Learning for Missing Data Imputation in HoloClean
Richard Wu (1), Aoqian Zhang (1), Ihab F. Ilyas (1), Theodoros Rekatsinas (2)
(1) University of Waterloo  (2) University of Wisconsin-Madison
Problem
- Missing data is a persistent problem in many fields
○ Sciences
○ Data mining
○ Finance
- Missing data can reduce downstream statistical power
- Most models require complete data
Modern ML for Data Cleaning: HoloClean
- Framework for holistic data repairing driven by probabilistic inference
- Unifies qualitative (integrity constraints and external sources) with quantitative data repairing methods (statistical inference)
Available at www.holoclean.io
Missing Values in Real Data Sets
Challenges
- Values may not be missing completely at random (MCAR/i.i.d.) but systematically
- Mixed types (discrete and continuous) introduce mixed distributions
- Drawbacks of current methods:
○ Heuristic-based (impute mean/mode)
○ Requires predefined rules
○ Complex ML models that are difficult to train, slow, and hard to interpret
Contribution
- A simple attention architecture that exploits structure across attributes
Our results:
- >54% lower run time than baselines
- Missing completely at random (MCAR): 3% higher accuracy and 26.7% reduction in normalized RMS
- Systematic: 43% higher accuracy and 7.4% reduction in normalized RMS
How does AimNet improve on the MVI (missing value imputation) problem?
Key idea: Exploit the structure in data with a model that learns schema-level relationships between attributes via dot-product attention.
Architecture overview
(1) Model mixed data
- Encode w/ non-linear layers (continuous)
- Embedding lookup (discrete)
(2) Identify relevant context
- Attention helps identify schema-level importance
(3) Prediction
- Inverse of encoding (continuous)
- Softmax over possible values (discrete)
Learned via self-supervision: mask and predict observed values
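To make the training signal concrete, here is a minimal sketch of mask-and-predict self-supervision; the function and names are ours for illustration, not the authors' code:

```python
import random

def make_training_example(row, rng=random):
    """Self-supervision sketch: hide one observed cell of a row and use
    it as the prediction target; the remaining observed cells are the
    context the model conditions on. `None` marks a missing cell."""
    observed = [i for i, v in enumerate(row) if v is not None]
    target = rng.choice(observed)
    context = {i: v for i, v in enumerate(row) if v is not None and i != target}
    return context, target, row[target]

# e.g. ("Chicago", 60603, 35, None): hide one observed cell, predict it from the rest
context, target_attr, label = make_training_example(("Chicago", 60603, 35, None))
```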
How do we encode mixed types?
Convert context values to vector embeddings.
Input: raw data → Output: embeddings
- Continuous values: [-12, 3.5] → Dense layer (5x2) → activation → Dense layer (5x5) → [0.1, 1.2, -5, 2, 15]
- Discrete values: embedding lookup, e.g.
  (City, Chicago) → [1, 0, -1.3, 5, -7]
  (Name, Joe) → [0, 2, -1, 2.5, 1]
  (Zip Code, 10010) → ...
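A minimal PyTorch sketch of this encoding step, assuming the 5-dimensional vectors from the figure; class and layer names are illustrative, not the paper's implementation:

```python
import torch
import torch.nn as nn

class MixedEncoder(nn.Module):
    """Map each cell to a shared d-dim vector space: continuous values
    go through small non-linear (dense + activation) layers, discrete
    values through an embedding lookup."""
    def __init__(self, n_discrete_values: int, cont_dim: int = 2, d: int = 5):
        super().__init__()
        self.cont_proj = nn.Sequential(
            nn.Linear(cont_dim, d), nn.ReLU(), nn.Linear(d, d))
        self.embed = nn.Embedding(n_discrete_values, d)

    def encode_continuous(self, x):   # e.g. [-12, 3.5] -> 5-dim vector
        return self.cont_proj(x)

    def encode_discrete(self, idx):   # e.g. id of (City, Chicago) -> 5-dim vector
        return self.embed(idx)
```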
Attention layer
Attention where Q/K are derived from attributes rather than values.
Context value vectors:
  (City, Chicago) → [1, 0, -1.3, 5, -7] (V_City)
  (Zip code, 60603) → [1.2, 0.5, -2, 3, 5] (V_Zip code)
  (Age, 35) → [0, 1, 2, 3, -1.5] (V_Age)
Target: County, with key K_County = [-1, 5, 0.5, 1.2, -2]
Attention weights: softmax(Q K_County^T) = [0.09, 0.90, 0.01]
Output: context vector (attention-weighted sum of the V's)
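A sketch of this attribute-level attention, under our reading of the slide: one learned query vector per context attribute, one learned key per target attribute, and a context vector formed as the attention-weighted sum of the value vectors (names are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SchemaAttention(nn.Module):
    """Dot-product attention whose queries/keys come from attribute
    identities (the schema), not from the cell values themselves."""
    def __init__(self, n_attrs: int, d: int = 5):
        super().__init__()
        self.Q = nn.Parameter(torch.randn(n_attrs, d))  # query per attribute
        self.K = nn.Parameter(torch.randn(n_attrs, d))  # key per attribute

    def forward(self, values, context_idx, target_idx):
        # values: (batch, n_context, d) value vectors V of the context cells
        scores = self.Q[context_idx] @ self.K[target_idx]  # (n_context,)
        weights = F.softmax(scores, dim=-1)                # e.g. [0.09, 0.90, 0.01]
        # Context vector = attention-weighted sum of the value vectors.
        return torch.einsum('c,bcd->bd', weights, values)
```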
Prediction
Input: context vector [-1, 5, 0.5, 1.2, -2]
- Salary (continuous): Dense layer (1x5) → output: 100600
- County (discrete): Dense layer (5x5) → activation → matmul with candidate value embeddings (County A: [0, 100, 0, 0, 0]^T, County B: [0, 0, 0, 0, 50]^T) → softmax → [0.99, 0.01] → output: County A
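A matching PyTorch sketch of the prediction step: a dense layer inverts the encoding for continuous targets, and a matmul against candidate-value embeddings followed by a softmax scores discrete targets (again illustrative, not the authors' code):

```python
import torch
import torch.nn as nn

class MixedPredictor(nn.Module):
    """Decode a d-dim context vector into a prediction for either a
    continuous or a discrete target attribute."""
    def __init__(self, d: int, value_embeds: torch.Tensor):
        super().__init__()
        self.to_scalar = nn.Linear(d, 1)   # inverse of the continuous encoding
        self.value_embeds = value_embeds   # (n_candidates, d), e.g. County A/B

    def predict_continuous(self, ctx):     # -> e.g. Salary = 100600
        return self.to_scalar(ctx).squeeze(-1)

    def predict_discrete(self, ctx):       # -> e.g. [0.99, 0.01] over counties
        return (ctx @ self.value_embeds.T).softmax(dim=-1)
```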
Questions
- Can AimNet impute values that are missing completely at random (MCAR/i.i.d.)?
- Does AimNet's emphasis on structure help it with systematic bias in missing values?
- Can we interpret the structure that AimNet learns in the data?
Experimental setup
- 14 real data sets
- Missing types
○ MCAR/i.i.d.
○ Systematic
- Evaluation
○ Accuracy (discrete)
○ Normalized RMS (continuous)
- Training: self-supervised learning where targets = observed values
[Table: the 14 data sets, grouped into mostly discrete vs. mostly continuous]
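For orientation, one common definition of the normalized RMS reported for continuous attributes; the paper's exact normalization may differ:

```python
import numpy as np

def normalized_rms(pred: np.ndarray, truth: np.ndarray) -> float:
    """Normalized RMS: ||pred - truth|| / ||truth|| over the imputed cells.
    One common convention, shown here for orientation only."""
    return float(np.linalg.norm(pred - truth) / np.linalg.norm(truth))
```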
Experiment results
- >54% lower run time than baselines
- Missing completely at random (MCAR): 3% higher accuracy and 26.7% reduction in normalized RMS
- Systematic: 43% higher accuracy and 7.4% reduction in normalized RMS
Attention identifies structure between attributes that helps it deal with systematic bias in missing values
MCAR (20% missing) results
AimNet outperforms baselines on both discrete and continuous attributes on almost all data sets
- 3% higher accuracy
- 26.7% lower NRMS
Baselines: HCQ = HoloClean with quantization, XGB = XGBoost, MIDAS = Denoising Autoencoder, GAIN = GAN, MF = Random Forest, MICE = Linear regression with multiple iterations
Chicago taxi data set
- Benchmark in TFX data validation pipeline
- Pickup/dropoff info, fare, company
- Naturally-occurring missing values w/ ground truth
- Systematic bias between companies
[Figure caption: all within the "17031040401" census tract]
Chicago taxi: naturally-occurring missing data
- Values are missing systematically (not i.i.d.)
- Attention learns the relationship between Census Tract and Latitude/Longitude
Chicago taxi results
AimNet outperforms baselines by a huge margin
- Accuracy: 73% vs. 27% (XGB)
- Run time: 53 min vs. 124 min (HoloClean w/ quantization)
What if we inject systematic errors into other real data sets?
AimNet still outperforms baselines in almost all cases
Does the attention layer actually help?
[Figure: results with vs. without the attention layer at domain sizes of 5, 50, and 200 classes]
As the domain size increases, attention leads to better performance
- Learns schema-level dependencies
Architecture summary
- Encode: learns projections for continuous and embeddings for discrete data
- Structure: new variation of attention to learn structural dependencies between attributes
- Prediction: mixed-type prediction using projections (continuous) and softmax classification (discrete)
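Wiring the three illustrative sketches from earlier slides together, a forward pass for a single row could look like the following; all names, sizes, and indices are ours, purely to show the data flow:

```python
import torch

# Assumes the MixedEncoder, SchemaAttention, and MixedPredictor sketches above.
enc  = MixedEncoder(n_discrete_values=100, cont_dim=2, d=5)
attn = SchemaAttention(n_attrs=4, d=5)
head = MixedPredictor(d=5, value_embeds=torch.randn(2, 5))  # 2 candidate counties

city = enc.encode_discrete(torch.tensor([3]))               # (1, 5)
zipc = enc.encode_discrete(torch.tensor([17]))              # (1, 5)
age  = enc.encode_continuous(torch.tensor([[35.0, 0.0]]))   # (1, 5)
values = torch.stack([city, zipc, age], dim=1)              # (1, 3, 5)

# Attend over the three context cells to impute attribute 3 (e.g. County).
ctx = attn(values, context_idx=torch.tensor([0, 1, 2]), target_idx=3)
probs = head.predict_discrete(ctx)                          # (1, 2) over candidates
```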
Conclusion
- A simple attention-based architecture modestly outperforms existing methods on i.i.d. missing values
- AimNet outperforms the state of the art in the presence of systematically missing values by a large margin
- The attention mechanism learns structural properties of the data, which improves MVI with systematic bias
Appendix
Hyperparameter Sensitivity
Multi-task and Single-task
MCAR (40% missing) results
MCAR (60% missing) results
Census Tracts form Voronoi-like cells