Maximum Likelihood with Bias-Corrected Calibration is Hard-To-Beat - PowerPoint PPT Presentation

SLIDE 1

Maximum Likelihood with Bias-Corrected Calibration is Hard-To-Beat at Label Shift Adaptation

Amr M. Alexandari*, Anshul Kundaje†, Avanti Shrikumar*† *co-first authors †co-corresponding authors

Amr Alexandari PhD Student

  • Dept. of Computer Science

Anshul Kundaje Assistant Professor

  • Depts. of CS & Genetics
SLIDE 2

Label Shift Illustrated

[Figure: a classifier is trained on source-domain patients ("Train Model")]

SLIDE 3

Label Shift Illustrated

[Figure: after the label proportions shift, the original model under-predicts]

SLIDE 4

Label Shift Illustrated

[Figure: the classifier's predictions are updated for the new label proportions]

SLIDE 5

Label Shift Illustrated

We don’t have ground-truth labels for the new patients!

How do we update our classifier?

SLIDE 6

Main Contributions

  • An approach that achieves state-of-the-art on label shift adaptation
  • Scales to datasets with high-dimensional inputs
  • Does not require model retraining
  • Combines Maximum Likelihood with specific types of calibration that correct for systematic bias
  • Calibration with Temperature Scaling (TS) was insufficient (& sometimes harmful!)
  • Achieved state-of-the-art with extensions of TS (one of which we propose)

SLIDE 7

Formal Definition of Label Shift

Let:

  • 𝑧 denote our labels (whether or not a person has the disease)
  • 𝒚 denote the observed symptoms
  • 𝑞(𝒚, 𝑧) denote the joint distribution of (𝒚, 𝑧) at the beginning of the outbreak (“source domain”)
  • 𝑟(𝒚, 𝑧) denote the joint distribution at the widespread stage (“target domain”), when we don’t know the labels
  • Goal: adapt a source-domain classifier that predicts 𝑞(𝑧|𝒚) to instead predict 𝑟(𝑧|𝒚) for the target domain

Core assumption: the disease has the same symptoms irrespective of outbreak stage, i.e. 𝑞(𝒚|𝑧) = 𝑟(𝒚|𝑧).

  • Thus, the difference between the source & target domains is caused exclusively by a shift in the label proportions 𝑞(𝑧) and 𝑟(𝑧). Formally, 𝑟(𝒚, 𝑧) = 𝑞(𝒚|𝑧) 𝑟(𝑧)
  • Also called prior probability shift (Amos, 2008); corresponds to “anti-causal learning”, i.e. predicting the cause 𝑧 from the effects 𝒚 (Schölkopf, 2012).
  • Anti-causal learning is appropriate here because the disease status 𝑧 causes the symptoms 𝒚.
SLIDE 8

Estimating 𝑟(𝑧|𝒚) with Bayes’ Rule

  • Although 𝑞(𝒚|𝑧) is preserved, computing it is hard when 𝒚 is high-dimensional.
  • Much easier to estimate 𝑞(𝑧|𝒚) and 𝑞(𝑧) from the source domain, as 𝑧 is lower-dimensional.
  • If we know 𝑟(𝑧), we can retrieve 𝑟(𝑧|𝒚) without ever estimating 𝑞(𝒚|𝑧), using Bayes’ Rule (first shown in Saerens et al., 2002):

We first write 𝑟(𝑧|𝒚) = 𝑟(𝒚, 𝑧) / 𝑟(𝒚) = 𝑟(𝒚|𝑧) 𝑟(𝑧) / Σ_𝑧* 𝑟(𝒚|𝑧*) 𝑟(𝑧*)   (the 𝑟(𝒚|𝑧) terms are not explicitly known)

Substituting 𝑟(𝒚|𝑧) = 𝑞(𝒚|𝑧) (the label shift assumption), we have 𝑟(𝑧|𝒚) = 𝑞(𝒚|𝑧) 𝑟(𝑧) / Σ_𝑧* 𝑞(𝒚|𝑧*) 𝑟(𝑧*)

Through Bayes’ rule, observe that 𝑞(𝒚|𝑧) = 𝑞(𝑧|𝒚) 𝑞(𝒚) / 𝑞(𝑧)

Substituting, 𝑞(𝒚) cancels out, giving 𝑟(𝑧|𝒚) = [𝑞(𝑧|𝒚) 𝑟(𝑧) / 𝑞(𝑧)] / Σ_𝑧* [𝑞(𝑧*|𝒚) 𝑟(𝑧*) / 𝑞(𝑧*)]

Reminders:

  • 𝒚 denotes features (e.g. symptoms)
  • 𝑧 denotes labels (e.g. disease status)
  • 𝑞 indicates source-domain (labels known)
  • 𝑟 indicates target domain (labels unknown)
  • Label shift assumes 𝑟(𝒚|𝑧) = 𝑞(𝒚|𝑧)
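The reweighting identity on this slide can be checked numerically on a small discrete example (a minimal sketch; the toy distributions below are made up for illustration):

```python
import numpy as np

# Toy label-shift setup: 2 labels z, 3 discrete symptom values y.
p_y_given_z = np.array([[0.7, 0.2, 0.1],    # q(y|z=0) = r(y|z=0)
                        [0.1, 0.3, 0.6]])   # q(y|z=1) = r(y|z=1)
q_z = np.array([0.9, 0.1])                  # source label prior q(z)
r_z = np.array([0.5, 0.5])                  # target label prior r(z)

# Source posterior q(z|y) via Bayes' rule.
q_joint = p_y_given_z * q_z[:, None]        # q(y, z), shape (z, y)
q_z_given_y = q_joint / q_joint.sum(axis=0)

# Direct target posterior r(z|y), using the shared p(y|z).
r_joint = p_y_given_z * r_z[:, None]
r_direct = r_joint / r_joint.sum(axis=0)

# Adapted posterior: reweight q(z|y) by r(z)/q(z) and renormalize.
r_adapted = q_z_given_y * (r_z / q_z)[:, None]
r_adapted /= r_adapted.sum(axis=0)

assert np.allclose(r_direct, r_adapted)     # the two routes agree
```

Note that the adapted route never touches 𝑞(𝒚|𝑧) directly, which is the whole point when 𝒚 is high-dimensional.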
SLIDE 9

Reminders:

  • 𝒚 denotes features (e.g. symptoms)
  • 𝑧 denotes labels (e.g. disease status)
  • 𝑞 indicates source-domain (labels known)
  • 𝑟 indicates target domain (labels unknown)
  • Label shift assumes 𝑟(𝒚|𝑧) = 𝑞(𝒚|𝑧)
  • If we estimate 𝑞(𝑧|𝒚), 𝑞(𝑧) from source data & are told 𝑟(𝑧), we can find 𝑟(𝑧|𝒚) using Bayes’ rule

SLIDE 10

A Simple Iterative Approach to Label Shift…

In practice, we are not told 𝑟(𝑧) – how can we estimate it?

  • Could use 𝑞(𝑧|𝒚) to predict on the test set & average the predictions to estimate 𝑟(𝑧)
  • Could then use 𝑟(𝑧) to update 𝑞(𝑧|𝒚), and repeat the process until convergence!

Reminders:

  • 𝒚 denotes features (e.g. symptoms)
  • 𝑧 denotes labels (e.g. disease status)
  • 𝑞 indicates source-domain (labels known)
  • 𝑟 indicates target domain (labels unknown)
  • Label shift assumes 𝑟(𝒚|𝑧) = 𝑞(𝒚|𝑧)
  • If we estimate 𝑞(𝑧|𝒚), 𝑞(𝑧) from source data & are told 𝑟(𝑧), we can find 𝑟(𝑧|𝒚) using Bayes’ rule
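The iterative procedure can be sketched in a few lines of NumPy (a minimal sketch; the function name is illustrative, and `preds` is assumed to hold calibrated source-domain predictions 𝑞(𝑧|𝒚ᵢ) on the target set):

```python
import numpy as np

def em_label_shift(preds, q_z, n_iter=1000, tol=1e-10):
    """EM estimate of the target label prior r(z) under label shift.

    preds: (N, K) calibrated source-domain predictions q(z|y_i) on target data
    q_z:   (K,) source-domain label prior q(z)
    Returns (r_z, adapted), where adapted[i] estimates r(z|y_i).
    """
    r_z = q_z.copy()                              # initialize r(z) at q(z)
    for _ in range(n_iter):
        # E-step: Bayes update r(z|y_i) ∝ q(z|y_i) * r(z)/q(z)
        post = preds * (r_z / q_z)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: the new r(z) is the average posterior over the target set
        r_new = post.mean(axis=0)
        if np.abs(r_new - r_z).max() < tol:
            r_z = r_new
            break
        r_z = r_new
    adapted = preds * (r_z / q_z)
    adapted /= adapted.sum(axis=1, keepdims=True)
    return r_z, adapted
```

Each iteration only needs the model's predictions and the source prior, which is why the procedure scales to high-dimensional 𝒚.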

SLIDE 24

Iterative approach ↔ Maximum Likelihood

  • The simple iterative approach is a valid EM algorithm that optimizes the log likelihood Σᵢ log Σ_𝑧 𝑟(𝒚ᵢ|𝑧) 𝑟(𝑧) w.r.t. the parameters 𝑟(𝑧). First shown in Saerens et al. (2002).
  • Note: Saerens et al. (2002) has been incorrectly described in several recent papers as being unable to scale to high-dimensional 𝒚 because it requires estimating 𝑞(𝒚|𝑧). The algorithm only requires 𝑞(𝑧|𝒚) and 𝑞(𝑧), and thus scales to high-dimensional 𝒚.
  • In our paper, we further showed the optimization is concave; thus, EM converges to the global optimum, and one can use any convex optimizer for Max. Likelihood
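In terms of the quantities the model actually provides, the objective can be rewritten (dropping terms constant in 𝑟):

```latex
\mathcal{L}(r) \;=\; \sum_i \log \sum_z q(\boldsymbol{y}_i \mid z)\, r(z)
\;=\; \sum_i \log \sum_z \frac{q(z \mid \boldsymbol{y}_i)}{q(z)}\, r(z) \;+\; \text{const}
```

Each summand is the log of a linear (hence concave) function of 𝑟(𝑧), and a sum of concave functions is concave, so any local optimum over the probability simplex is global.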

Reminders:

  • 𝒚 denotes features (e.g. symptoms)
  • 𝑧 denotes labels (e.g. disease status)
  • 𝑞 indicates source-domain (labels known)
  • 𝑟 indicates target domain (labels unknown)
  • Label shift assumes 𝑟(𝒚|𝑧) = 𝑞(𝒚|𝑧)
  • If we estimate 𝑞(𝑧|𝒚), 𝑞(𝑧) from source data & are told 𝑟(𝑧), we can find 𝑟(𝑧|𝒚) using Bayes’ rule

SLIDE 25

Recent Work on Label Shift Adaptation

  • Prior work (Lipton et al., ICML 2018) proposed Black Box Shift Estimation (BBSE) to estimate 𝑟(𝑧)/𝑞(𝑧). BBSE builds a confusion matrix using held-out data & does not assume the predicted 𝑞(𝑧|𝒚) are calibrated.
  • Azizzadenesheli et al., ICLR 2019 improved on BBSE with Regularized Learning under Label Shifts (RLLS). Also leverages a confusion matrix built on held-out data.

Reminders:

  • 𝒚 denotes features (e.g. symptoms)
  • 𝑧 denotes labels (e.g. disease status)
  • 𝑞 indicates source-domain (labels known)
  • 𝑟 indicates target domain (labels unknown)
  • Label shift assumes 𝑟(𝒚|𝑧) = 𝑞(𝒚|𝑧)
  • If we estimate 𝑞(𝑧|𝒚), 𝑞(𝑧) from source data & are told 𝑟(𝑧), we can find 𝑟(𝑧|𝒚) using Bayes’ rule
  • Given accurate 𝑞(𝑧|𝒚), 𝑞(𝑧), we can find 𝑟(𝑧) through Maximum Likelihood (including EM)
  • Major drawback: both BBSE and RLLS require model retraining using 𝑟(𝑧)/𝑞(𝑧) as the importance weights. Importance weighting does not work as well as expected with deep neural networks (Byrd & Lipton, 2019)
  • Neither BBSE nor RLLS were benchmarked against Max Likelihood (which does not require retraining)
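The confusion-matrix estimator at the heart of BBSE can be sketched as follows (a minimal sketch of the Lipton et al. (2018) formulation with hard argmax predictions, not the authors' implementation; names are illustrative):

```python
import numpy as np

def bbse_weights(val_preds, val_labels, test_preds, n_classes):
    """Black Box Shift Estimation (sketch): solve C w = mu for w ≈ r(z)/q(z).

    C[i, j] = P(argmax prediction = i, true label = j) on held-out source data
    mu[i]   = fraction of target samples whose argmax prediction is i
    """
    hard_val = val_preds.argmax(axis=1)
    C = np.zeros((n_classes, n_classes))
    np.add.at(C, (hard_val, val_labels), 1.0 / len(val_labels))
    hard_test = test_preds.argmax(axis=1)
    mu = np.bincount(hard_test, minlength=n_classes) / len(test_preds)
    w = np.linalg.solve(C, mu)
    return np.clip(w, 0.0, None)  # importance weights must be non-negative
```

Because only argmax predictions enter the confusion matrix, BBSE does not need the predicted probabilities to be calibrated; the weights would then be used to retrain the model.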

SLIDE 26

CIFAR10 benchmarking

  • Evaluation metric: mean squared error in the estimate of 𝑟(𝑧)/𝑞(𝑧)
  • Dirichlet shift (𝛽 = 0.1) simulated over 10 trials for each of 10 different trained models (100 trials in total). 𝑁=2000 samples were used in validation & test sets (results are qualitatively similar for different 𝛽 and 𝑁 as well).

SLIDE 27

Problem: Miscalibration

  • Bayes’ rule for deriving 𝑟(𝑧|𝒚) given 𝑟(𝑧) assumes we have accurate 𝑞(𝑧|𝒚). In practice, this is often not the case because 𝑞(𝑧|𝒚) from modern neural networks is typically miscalibrated (Guo et al., 2017)
  • (Loosely) calibration means: if the model says 𝑞(disease|𝒚) = 0.5, then there is actually a 50% chance that the person has the disease
  • Even when modern neural networks rank the predictions correctly, the probabilities themselves may be very inaccurate (e.g. 𝑞(disease|𝒚) may be 0.9 when it should be 0.5)

Reminders:

  • 𝒚 denotes features (e.g. symptoms)
  • 𝑧 denotes labels (e.g. disease status)
  • 𝑞 indicates source-domain (labels known)
  • 𝑟 indicates target domain (labels unknown)
  • Label shift assumes 𝑟(𝒚|𝑧) = 𝑞(𝒚|𝑧)
  • If we estimate 𝑞(𝑧|𝒚), 𝑞(𝑧) from source data & are told 𝑟(𝑧), we can find 𝑟(𝑧|𝒚) using Bayes’ rule
  • Given accurate 𝑞(𝑧|𝒚), 𝑞(𝑧), we can find 𝑟(𝑧) through Maximum Likelihood (including EM)

SLIDE 28

Getting Max. Likelihood Estimation to Work…

  • Both BBSE and RLLS require a held-out set on which to find the confusion matrix
  • We reasoned: if the major barrier to Max. Likelihood is the calibration requirement, why not use the held-out set to calibrate the predictions prior to doing the optimization?
  • Guo et al. (ICML 2017) recommended Temperature Scaling (TS), where the softmax logits 𝑨(𝒚ᵢ) are scaled by a “temperature” 𝑇 to optimize cross-entropy on the validation set: 𝑞(𝑧ₖ|𝒚ᵢ) = exp(𝐴ₖ(𝒚ᵢ)/𝑇) / Σⱼ exp(𝐴ⱼ(𝒚ᵢ)/𝑇)
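Fitting the temperature is a one-dimensional search over 𝑇 on the validation set; a minimal sketch (names are illustrative, and a simple grid search stands in for the gradient-based fit used in practice):

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_temperature(val_logits, val_labels):
    """Temperature Scaling: pick T > 0 minimizing validation cross-entropy
    of softmax(logits / T), via a log-spaced grid search."""
    idx = np.arange(len(val_labels))
    def nll(T):
        p = softmax(val_logits / T)
        return -np.log(p[idx, val_labels] + 1e-12).mean()
    grid = np.exp(np.linspace(-4.0, 4.0, 801))      # T from ~0.018 to ~54.6
    return float(grid[np.argmin([nll(T) for T in grid])])
```

Note that a single shared 𝑇 rescales all classes identically, which is why TS cannot remove class-specific bias.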

SLIDE 29

Trying Temperature Scaling…

  • Evaluation metric: mean squared error in the estimate of 𝑟(𝑧)/𝑞(𝑧)
  • Dirichlet shift (𝛽 = 0.1) simulated over 10 trials for each of 10 different trained models (100 trials in total). 𝑁=2000 samples were used in validation & test sets (results are qualitatively similar for different 𝛽 and 𝑁 as well).

SLIDE 30

Getting Max. Likelihood Estimation to Work…

  • Both BBSE and RLLS require a held-out set on which to find the confusion matrix
  • We reasoned: if the major barrier to Max. Likelihood is the calibration requirement, why not use the held-out set to calibrate the predictions prior to doing the optimization?
  • Guo et al. (ICML 2017) recommended Temperature Scaling (TS), where the softmax logits 𝑨(𝒚ᵢ) are scaled by a “temperature” 𝑇 to optimize cross-entropy on the validation set: 𝑞(𝑧ₖ|𝒚ᵢ) = exp(𝐴ₖ(𝒚ᵢ)/𝑇) / Σⱼ exp(𝐴ⱼ(𝒚ᵢ)/𝑇)
  • We observed systematic bias in 𝒒(𝒛) from Temperature Scaling. To fix, we devised a variant that includes per-class bias correction terms, called Bias-Corrected Temperature Scaling (BCTS):

BCTS: 𝑞(𝑧ₖ|𝒚ᵢ) = exp(𝐴ₖ(𝒚ᵢ)/𝑇 + 𝑏ₖ) / Σⱼ exp(𝐴ⱼ(𝒚ᵢ)/𝑇 + 𝑏ⱼ)
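BCTS fitting extends the temperature search with per-class bias terms; a minimal sketch (names are illustrative; plain gradient descent on the validation NLL stands in for whatever optimizer one prefers, parametrized by the inverse temperature w = 1/T so the objective is convex):

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_bcts(val_logits, val_labels, n_steps=3000, lr=0.05):
    """Bias-Corrected Temperature Scaling (sketch):
    q(z_k|y) = softmax(A(y)/T + b)_k, fitting T and per-class biases b
    by gradient descent on the validation NLL."""
    n, k = val_logits.shape
    onehot = np.eye(k)[val_labels]
    w, b = 1.0, np.zeros(k)                 # w = 1/T (inverse temperature)
    for _ in range(n_steps):
        p = softmax(w * val_logits + b)
        err = (p - onehot) / n              # dNLL / d(scaled logits)
        b -= lr * err.sum(axis=0)           # chain rule: d(scaled)/db = 1
        w -= lr * (err * val_logits).sum()  # chain rule: d(scaled)/dw = A
    return 1.0 / w, b                       # temperature T, per-class biases b
```

The biases are only identified up to an additive constant (softmax is shift-invariant), so it is their differences that correct the systematic per-class bias.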

SLIDE 31

CIFAR10 benchmarking

  • Evaluation metric: mean squared error in the estimate of 𝑟(𝑧)/𝑞(𝑧)
  • Dirichlet shift (𝛽 = 0.1) simulated over 10 trials for each of 10 different trained models (100 trials in total). 𝑁=2000 samples were used in validation & test sets (results are qualitatively similar for different 𝛽 and 𝑁 as well).

SLIDE 32

MNIST results

  • Evaluation metric: mean squared error in the estimate of 𝑟(𝑧)/𝑞(𝑧)
  • Dirichlet shift (𝛽 = 0.1) simulated over 10 trials for each of 10 different trained models (100 trials in total). 𝑁=2000 samples were used in validation & test sets (results are qualitatively similar for different 𝛽 and 𝑁 as well).

SLIDE 33

CIFAR100 results

  • Evaluation metric: mean squared error in the estimate of 𝑟(𝑧)/𝑞(𝑧)
  • Dirichlet shift (𝛽 = 0.1) simulated over 10 trials for each of 10 different trained models (100 trials in total). 𝑁=7000 samples were used in validation & test sets (results are qualitatively similar for different 𝛽 and 𝑁 as well).

SLIDE 34

Diabetic Retinopathy Detection

  • Class proportion shift; target domain set to have 50% healthy instead of the original 73% healthy. 𝑁=1500 samples were used in validation & test sets (results are qualitatively similar for different % and 𝑁 as well).

SLIDE 35

Conclusion

  • Maximum Likelihood + specific types of calibration gives state-of-the-art performance at domain adaptation to label shift
  • The popular calibration approach of Temperature Scaling (TS) was not good enough
  • Adding terms to minimize systematic bias was important.
  • Alongside BCTS, we found Vector Scaling (VS), which also has bias correction, works well.
  • VS was introduced alongside TS in Guo et al. 2017, but did not outperform TS according to the ECE metric they used. This is consistent with arguments that the ECE metric used in Guo et al. (which considers only the most confidently-predicted class) may not be the best metric (Vaicenavicius et al., 2019).
  • Other calibration forms like Matrix-ODIR (Kull et al., NeurIPS 2019) may also work well
  • Main results independently confirmed by Garg, Wu, Balakrishnan & Lipton (2020), https://arxiv.org/abs/2003.07554, who studied why our ML+BCTS works well. The Garg et al. paper also includes a theoretical analysis of the impact of miscalibration error.