An Investigation of Why Overparameterization Exacerbates Spurious Correlations
Shiori Sagawa*, Aditi Raghunathan*, Pang Wei Koh*, Percy Liang
Models can latch onto spurious correlations
- Misleading heuristics: they might work on most training examples but may not always hold up
- Task: input = bird image → ML model → label = bird type (waterbird vs. landbird)
- Spurious correlation: background (water vs. land)
- Water background: true label = waterbird, prediction = waterbird ✓
- Land background: true label = waterbird, prediction = landbird ✕
Sagawa et al. (2020), Wah et al. (2011), Zhou et al. (2017)
Models can latch onto spurious correlations
- Task: input = face image → ML model → label = hair color (blonde hair vs. dark hair)
- Spurious correlation: gender
- True label = blonde hair, prediction = dark hair ✕
Sagawa et al. (2020), Liu et al. (2015)
Models can latch onto spurious correlations
- Label: object (waterbird vs. landbird); spurious attribute: background (water vs. land)
- Groups: waterbird on water background (majority), waterbird on land background (minority), landbird on water background (minority), landbird on land background (majority)
Sagawa et al. (2020)
Models perform well on average
- Average error: 0.03
- Per-group errors: 0.004, 0.05, 0.21, 0.40
Sagawa et al. (2020)
But models can have high worst-group error
- Worst-group error: 0.40, even though the average error is 0.03
- Per-group errors: 0.004, 0.05, 0.21, 0.40
Sagawa et al. (2020)
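A minimal sketch of the gap between the two metrics. The group sizes and error rates below are assumed purely for illustration (they are not the actual dataset statistics), but they show how a small average error can coexist with a large worst-group error:

```python
import numpy as np

# Hypothetical per-group sizes and error rates (illustrative assumptions only).
group_sizes = np.array([3000, 200, 60, 1000])
group_errors = np.array([0.004, 0.21, 0.40, 0.05])

# Average error weights each group by how common it is, so the large majority
# groups dominate and failures on small minority groups are hidden.
average_error = np.sum(group_sizes * group_errors) / np.sum(group_sizes)

# Worst-group error looks only at the hardest group.
worst_group_error = np.max(group_errors)

print(f"average error:     {average_error:.3f}")    # small (~0.03)
print(f"worst-group error: {worst_group_error:.2f}")  # large (0.40)
```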
Approaches for improving worst-group error fail on high-capacity models
- Approach: upweight minority groups in the training objective
- Low-capacity models: correct on all four groups (y = ±1, a = ±1) → more robust to the spurious correlation, low worst-group error
- High-capacity models: correct on the majority groups (a = y) but wrong on the minority groups (a = −y) → relies on the spurious correlation, high worst-group error
Sagawa et al. (2020)
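A minimal sketch of the upweighting approach in a linear setting, assuming group labels are available at training time and using inverse group frequency as the weight (one common choice; not necessarily the exact scheme from the talk):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_upweighted(X, y, a):
    """Fit logistic regression with each example weighted by the inverse
    frequency of its (label, attribute) group, so minority groups count
    as much as majority groups in the training objective."""
    group_id = (y == 1).astype(int) * 2 + (a == 1).astype(int)  # 4 groups from (y, a)
    counts = np.bincount(group_id, minlength=4)
    weights = 1.0 / counts[group_id]
    weights = weights * len(weights) / weights.sum()  # keep weights on an O(1) scale
    clf = LogisticRegression(max_iter=10_000)
    return clf.fit(X, y, sample_weight=weights)
```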
Overparameterization hurts worst-group error for models trained with the reweighted objective
- Average error: overparameterized is better than underparameterized
- Worst-group error: overparameterized is worse than underparameterized
Our work: why does overparameterization exacerbate worst-group error?
Overview
- 1. Empirical results
- 2. Analytical model and theoretical results
- 3. Subsampling
Overparameterization exacerbates worst-group error
[Figure: worst-group error vs. model size, for a ResNet10 and for logistic regression on random features]
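A minimal sketch of a random-features setup in this spirit: fixed random ReLU features followed by (effectively unregularized) logistic regression, where the number of random features controls overparameterization. The feature distribution and constants are assumptions for illustration, not taken from the talk:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def random_relu_features(X, n_features, rng):
    """Project inputs through a fixed random layer with a ReLU nonlinearity.
    Increasing n_features increases model capacity."""
    W = rng.normal(size=(X.shape[1], n_features)) / np.sqrt(X.shape[1])
    return np.maximum(X @ W, 0.0)

# Hypothetical usage: sweep n_features and track per-group error.
# rng = np.random.default_rng(0)
# Z_train = random_relu_features(X_train, n_features=10_000, rng=rng)
# clf = LogisticRegression(C=1e6, max_iter=10_000).fit(Z_train, y_train)
```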
Intuition: overparameterized models learn the spurious attribute and memorize minority groups
- Majority groups: y = 1, a = 1 and y = −1, a = −1; minority groups: y = 1, a = −1 and y = −1, a = 1
- Overparameterized models fit the majority groups via the generalizable spurious attribute, and fit the minority groups via non-generalizable "memorizing"
Overview
- 1. Empirical results
- 2. Analytical model and theoretical results
- 3. Subsampling
Toy example: data
- Label y ∈ {1, −1}, spurious attribute a ∈ {1, −1}
- Majority groups: a = y; minority groups: a = −y
- Majority fraction: the fraction of training examples that fall in the majority groups
Toy example: data
- Inputs contain core features and spurious features
- Spurious-to-core information ratio (SCR): how informative the spurious feature is relative to the core feature
- …
Toy example: data
- Inputs contain core features, spurious features, and N-dimensional noise features
- For large N ≫ n (many more noise dimensions than training points), individual training points can be "memorized" via the noise features
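A minimal sketch of a data generator in this spirit. Gaussian core/spurious/noise features and the particular parameterization of the SCR are assumptions made for illustration; the exact distributions and constants in the talk may differ:

```python
import numpy as np

def make_toy_data(n, p_maj, N, scr=2.0, sigma_core=1.0, rng=None):
    """Generate (X, y, a): label y, spurious attribute a, and features
    [core, spurious, noise]. Majority examples have a = y, minority a = -y.
    Higher SCR makes the spurious feature less noisy than the core feature."""
    rng = rng or np.random.default_rng(0)
    y = rng.choice([1, -1], size=n)
    is_majority = rng.random(n) < p_maj
    a = np.where(is_majority, y, -y)

    sigma_spu = sigma_core / np.sqrt(scr)           # assumed parameterization of SCR
    x_core = y + sigma_core * rng.normal(size=n)    # noisy view of the label
    x_spu = a + sigma_spu * rng.normal(size=n)      # cleaner view of the spurious attribute
    x_noise = rng.normal(size=(n, N)) / np.sqrt(N)  # high-dimensional noise; with N >> n,
                                                    # each point gets a nearly unique direction
    X = np.column_stack([x_core, x_spu, x_noise])
    return X, y, a

# X, y, a = make_toy_data(n=500, p_maj=0.9, N=5000)
```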
Toy example: linear classifier
- Logistic regression on the [core, spurious, noise] features
- In the overparameterized regime, equivalent to the max-margin (minimum-norm) classifier
- …
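A minimal sketch of fitting and evaluating this classifier, using very weakly regularized logistic regression as a stand-in for the max-margin solution (an assumption; one could instead run gradient descent on the logistic loss to convergence):

```python
from sklearn.linear_model import LogisticRegression

def fit_max_margin_like(X, y):
    """Weakly regularized logistic regression: on separable (overparameterized)
    data its direction approximates the max-margin / min-norm classifier."""
    clf = LogisticRegression(C=1e8, max_iter=100_000, fit_intercept=False)
    return clf.fit(X, y)

def worst_group_error(clf, X, y, a):
    """Evaluate error separately on each (y, a) group and report the worst."""
    errors = []
    for yy in (1, -1):
        for aa in (1, -1):
            mask = (y == yy) & (a == aa)
            if mask.any():
                errors.append((clf.predict(X[mask]) != y[mask]).mean())
    return max(errors)

# X, y, a = make_toy_data(n=500, p_maj=0.9, N=5000)  # from the earlier sketch
# clf = fit_max_margin_like(X, y)
# print(worst_group_error(clf, X, y, a))
```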
Worst-group error is provably higher in the overparameterized regime
Theorem (informal). With a sufficiently high majority fraction and sufficiently high SCR, with high probability the overparameterized model has high worst-group error, whereas the underparameterized model, in the asymptotic regime, attains low worst-group error.
- Intuition: the overparameterized model ends up learning the spurious feature; the underparameterized model learns the core feature.
Underparameterized models need to learn the core feature to achieve low reweighted loss
- Using the spurious feature → high reweighted loss ✕; using the core feature → low reweighted loss ✓
In the overparameterized regime, the minimum-norm inductive bias favors less memorization
- The norm scales with the number of points "memorized"
- Learning the spurious feature → memorize only the minority points → few examples memorized → ✓ low norm
- Learning the core feature → memorize the noisy outliers → many examples memorized → ✕ high norm
Intuition: memorizing as few examples as possible under the min-norm inductive bias
Learn spurious → memorize minority, low norm
- Model that uses the spurious feature: train error 0 on the majority groups (a = y), train error 1 on the minority groups (a = −y)
- Points to memorize: only the minority points → ✓ low norm
Learn core → memorize more, high norm
- Model that uses the (noisier) core feature: train error > 0 in every group (y = ±1, a = ±1)
- Points to memorize: outliers from all groups → ✕ high norm
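A rough back-of-the-envelope illustration of the norm comparison. Assuming each "memorized" point adds roughly a constant amount to the squared norm (so the norm grows with the number of memorized points), the constants below are made up purely to show why the min-norm bias can prefer the spurious solution when the majority fraction is high:

```python
# All numbers are illustrative assumptions, not quantities from the talk.
n = 1000            # training points
p_maj = 0.9         # majority fraction
outlier_rate = 0.3  # assumed fraction of points where the noisy core feature is misleading
c_mem = 1.0         # assumed squared-norm cost of memorizing one point via its noise direction

# "Learn spurious": generalize to the majority groups, memorize the minority points.
norm_sq_spurious = 1.0 + c_mem * (1 - p_maj) * n   # ~100 memorized points

# "Learn core": generalize via the core feature, memorize the misfit outliers everywhere.
norm_sq_core = 1.0 + c_mem * outlier_rate * n      # ~300 memorized points

print(norm_sq_spurious < norm_sq_core)  # True: min-norm bias prefers the spurious solution
```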
Overview
- 1. Empirical results
- 2. Simulations on synthetic data
- 3. Subsampling
Reweighting vs. subsampling
- Upweighting keeps all examples and raises the weight of minority examples; subsampling discards majority examples
- [Figure: number of examples per group (y = ±1, a = ±1) under upweighting vs. subsampling]
- Subsampling reduces the majority fraction, which lowers the memorization cost of learning the core feature
Chawla et al. (2011)
Subsampling the majority group → overparameterization helps worst-group error
- [Figure: worst-group error vs. model size under upweighting vs. subsampling]
- Potential tension between using all of the data vs. using large overparameterized models: both help average error, but we can't have both for good worst-group error.
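A minimal sketch of subsampling the majority groups so that every group ends up with the same number of examples (a hypothetical helper; the exact balancing rule used in the talk may differ):

```python
import numpy as np

def subsample_to_smallest_group(X, y, a, rng=None):
    """Downsample each (label, attribute) group to the size of the smallest
    group, discarding extra majority examples instead of reweighting them."""
    rng = rng or np.random.default_rng(0)
    groups = [np.flatnonzero((y == yy) & (a == aa)) for yy in (1, -1) for aa in (1, -1)]
    groups = [g for g in groups if len(g) > 0]
    n_min = min(len(g) for g in groups)
    keep = np.concatenate([rng.choice(g, size=n_min, replace=False) for g in groups])
    rng.shuffle(keep)
    return X[keep], y[keep], a[keep]

# X_sub, y_sub, a_sub = subsample_to_smallest_group(X, y, a)
```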
Thanks!
Thank you to Yair Carmon, John Duchi, Tatsunori Hashimoto, Ananya Kumar, Yiping Lu, Tengyu Ma, and Jacob Steinhardt Funded by Open Philanthropy Project Award, Stanford Graduate Fellowship, Google PhD Fellowship, Open Philanthropy Project AI Fellowship, and Facebook Fellowship Program.
Shiori Sagawa*, Pang Wei Koh*, Percy Liang, Aditi Raghunathan*