Learning Models from Data with Measurement Error: Tackling Underreporting




slide-1
SLIDE 1

Learning Models from Data with Measurement Error: Tackling Underreporting

Roy Adams, Yuelong Ji, Xiaobin Wang, and Suchi Saria

slide-2
SLIDE 2

Introduction

Goal: Estimate the distribution of outcome Y given exposure A and covariates X from non-experimental data.

slide-3
SLIDE 3

Introduction

Measurement error is a common source of bias when using non-experimental data.

Goal: Estimate the distribution of outcome Y given exposure A and covariates X from non-experimental data.

slide-4
SLIDE 4

Introduction

Measurement error is a common source of bias when using non-experimental data.

  • We focus on underreporting error.

Goal: Estimate the distribution of outcome Y given exposure A and covariates X from non-experimental data.

slide-5
SLIDE 5

Introduction

Measurement error is a common source of bias when using non-experimental data.

  • We focus on underreporting error.
  • E.g., survey data on sensitive variables such as drug use.

Goal: Estimate the distribution of outcome Y given exposure A and covariates X from non-experimental data.

slide-6
SLIDE 6

Model

[Figure: graphical model over the variables A, X, Y, and Ã]

Updated goal: Estimate the distribution of outcome Y given exposure A and covariates X when exposure observations Ã are subject to underreporting errors.

slide-7
SLIDE 7

Model

[Figure: graphical model over the variables A, X, Y, and Ã]

Updated goal: Estimate the distribution of outcome Y given exposure A and covariates X when exposure observations Ã are subject to underreporting errors.

Assumptions:

slide-8
SLIDE 8

Model

[Figure: graphical model over the variables A, X, Y, and Ã]

Updated goal: Estimate the distribution of outcome Y given exposure A and covariates X when exposure observations Ã are subject to underreporting errors.

Assumptions:

  • 1. Strict underreporting (A = 0 ⟹ Ã = 0)

slide-9
SLIDE 9

Model

[Figure: graphical model over the variables A, X, Y, and Ã]

Updated goal: Estimate the distribution of outcome Y given exposure A and covariates X when exposure observations Ã are subject to underreporting errors.

Assumptions:

  • 1. Strict underreporting (A = 0 ⟹ Ã = 0)
  • 2. Ã is independent of X given A
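To make the two assumptions concrete, here is a minimal simulation sketch. It is not taken from the slides: the binary variables, the logistic forms, every coefficient, and the name `simulate_underreported` are assumptions chosen for illustration, and `tau` is taken to mean the probability that a true exposure goes unreported.

```python
import math
import random


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate_underreported(n, tau, seed=0):
    """Draw n tuples (x, a, a_obs, y) under the two assumptions.

    All coefficients are made-up illustration values.
    tau is assumed to be P(a_obs = 0 | a = 1).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                          # covariate X
        a = int(rng.random() < sigmoid(-0.5 + 1.0 * x))  # true exposure A
        # Assumption 1 (strict underreporting): a == 0 forces a_obs == 0.
        # Assumption 2: the reporting draw depends on a only, never on x.
        a_obs = int(a == 1 and rng.random() >= tau)
        y = int(rng.random() < sigmoid(-1.0 + 0.8 * a + 0.5 * x))  # outcome Y
        rows.append((x, a, a_obs, y))
    return rows


rows = simulate_underreported(n=5000, tau=0.3)
```

In data generated this way, the observed exposure never exceeds the true exposure, so a naive analysis that treats Ã as A mixes unreported exposed subjects into the unexposed group.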

slide-10
SLIDE 10

Model

[Figure: graphical model over the variables A, X, Y, and Ã]

Outcome model: $p_\theta(Y \mid A, X)$
Exposure model: $p_\phi(A \mid X)$
Error model: $p_\tau(\tilde{A} \mid A)$

slide-11
SLIDE 11

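Model

[Figure: graphical model over the variables A, X, Y, and Ã]

Outcome model: $p_\theta(Y \mid A, X)$
Exposure model: $p_\phi(A \mid X)$
Error model: $p_\tau(\tilde{A} \mid A)$

Maximize the log marginal likelihood:

$$\max_{\theta,\phi,\tau} \sum_i \log \sum_a p_\theta(y_i \mid a, x_i)\, p_\tau(\tilde{a}_i \mid a)\, p_\phi(a \mid x_i)$$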
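The objective on this slide can be sketched in a few lines for the simplest case. The sketch assumes binary A, Ã, and Y, a scalar covariate, logistic outcome and exposure models, and an error model with a single parameter `tau` = P(Ã = 0 | A = 1). None of these modeling choices, nor the function names, come from the slides.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def bernoulli(p, value):
    """Probability of a binary value under success probability p."""
    return p if value == 1 else 1.0 - p


def log_marginal_likelihood(data, theta, phi, tau):
    """Sum over i of log sum over a of p(y_i|a,x_i) p(a_obs_i|a) p(a|x_i).

    data:  iterable of (x, a_obs, y) with scalar x and binary a_obs, y
    theta: (intercept, coef_a, coef_x) of the assumed logistic outcome model
    phi:   (intercept, coef_x) of the assumed logistic exposure model
    tau:   assumed P(a_obs = 0 | a = 1), with 0 <= tau < 1
    """
    total = 0.0
    for x, a_obs, y in data:
        lik = 0.0
        for a in (0, 1):  # marginalize over the unobserved true exposure
            p_y = bernoulli(sigmoid(theta[0] + theta[1] * a + theta[2] * x), y)
            # Strict underreporting: P(a_obs = 1 | a) is (1 - tau) if a == 1, else 0.
            p_obs = bernoulli((1.0 - tau) * a, a_obs)
            p_a = bernoulli(sigmoid(phi[0] + phi[1] * x), a)
            lik += p_y * p_obs * p_a
        total += math.log(lik)
    return total


# Hypothetical toy data: (x, a_obs, y)
toy = [(0.2, 1, 1), (-1.0, 0, 0), (0.5, 0, 1)]
ll = log_marginal_likelihood(toy, theta=(-1.0, 0.8, 0.5), phi=(-0.5, 1.0), tau=0.3)
```

In practice this function would be handed to a numerical optimizer over (θ, ϕ, τ); the slides do not say which optimizer the authors use.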

slide-12
SLIDE 12

Identifiability

slide-13
SLIDE 13

Identifiability

We prove three separate identifiability conditions:


slide-14
SLIDE 14

Identifiability

We prove three separate identifiability conditions:

  • 1. The error distribution is known


slide-15
SLIDE 15

Identifiability

We prove three separate identifiability conditions:

  • 1. The error distribution is known
  • 2. We have a second error-prone exposure observation


slide-16
SLIDE 16

Identifiability

We prove three separate identifiability conditions:

  • 1. The error distribution is known
  • 2. We have a second error-prone exposure observation
  • 3. Under assumptions about the form of the exposure distribution (see paper/poster for details)

slide-17
SLIDE 17

Identifiability

We prove three separate identifiability conditions:

  • 1. The error distribution is known
  • 2. We have a second error-prone exposure observation
  • 3. Under assumptions about the form of the exposure distribution (see paper/poster for details)

In particular: If X is not independent of A and p(A | X) is a logit, probit, or cloglog regression model, then p(Y, Ã | X) is identifiable.
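For reference, the three regression families named above differ only in the inverse link that maps a linear predictor to P(A = 1 | X). The sketch below uses the standard definitions of those links. The function names and the scalar-covariate parameterization `phi = (intercept, slope)` are illustrative assumptions, not notation from the slides.

```python
import math


def inv_logit(z):
    """Logistic regression: standard logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))


def inv_probit(z):
    """Probit regression: standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inv_cloglog(z):
    """Complementary log-log regression."""
    return 1.0 - math.exp(-math.exp(z))


def exposure_prob(x, phi, inv_link):
    """P(A = 1 | X = x) for a scalar covariate and phi = (intercept, slope)."""
    return inv_link(phi[0] + phi[1] * x)


# Same hypothetical coefficients under each of the three links
probs = [exposure_prob(0.5, (-0.5, 1.0), link)
         for link in (inv_logit, inv_probit, inv_cloglog)]
```

The requirement that X is not independent of A corresponds to a nonzero slope in this parameterization.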

slide-18
SLIDE 18

Drug use and childhood obesity

Maternal drug use and childhood obesity

[Figure: Smoking (a). Axes: underreporting rate τ (0.0 to 0.6) and risk difference, i.e. average causal effect (0.00 to 0.25). Legend: sensitivity analysis, subject-report, subject-report + cotinine.]

slide-19
SLIDE 19

Thanks!

Come see poster #75