
Statistical Modelling under Epistemic Data Imprecision

Some Results on Estimating Multinomial Distributions and Logistic Regression for Coarse Categorical Data

Julia Plass*, Thomas Augustin*, Marco Cattaneo**, Georg Schollmeyer*

*Department of Statistics, Ludwig-Maximilians-Universität München and **Department of Mathematics, University of Hull

21 July 2015


Our working group

Thomas Augustin, Julia Plass, Georg Schollmeyer (LMU Munich) and Marco Cattaneo (University of Hull). Research interests: survey statistics, deficient data. Talk on Thursday.

Epistemic vs. ontic interpretation (Couso, Dubois, Sánchez, 2014)

Epistemic imprecision: "Imprecise observation of something precise"

[Diagram: precise LATENT values are mapped to coarse OBSERVABLE sets by a coarsening process]

⇒ Truth is hidden due to the underlying coarsening mechanism

Ontic imprecision: "Precise observation of something imprecise"

[Diagram: the coarse set is itself the precise observation]

⇒ Truth is represented by the coarse observation


Examples of data under epistemic imprecision


Examples:
• Matched data sets with partially overlapping variables
• Coarsening as an anonymization technique
• Missing data as a special case

Here: PASS data with Ω_𝒴 = {<, ≥, na}, i.e. the observed answers "< 1000€", "≥ 1000€" and "< 1000€ or ≥ 1000€" (na).


Already existing approaches

Still common to enforce precise results ⇒ Biased results:

[Figure: relative bias of π̂_A if CAR is assumed (π_A = 0.6), as a function of coarsening parameters 1 and 2 (each on a 0.1–0.9 grid); color encodes the absolute value of the bias, the symbol (−/+) its sign]
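The figure's mechanism can be reproduced in a few lines. Below is a minimal sketch (ours, not from the talk), assuming error-free coarsening of a binary variable A/B, with the hypothetical names q_na_A and q_na_B playing the role of the two coarsening parameters:

```python
# Relative bias of the naive (CAR-based) estimate of pi_A = P(Y = A)
# when the coarsening probabilities actually differ between categories.
import numpy as np

def relative_bias_car(pi_A, q_na_A, q_na_B):
    """q_na_A, q_na_B: probabilities of the coarse answer 'na'
    given Y = A and Y = B, respectively (hypothetical names)."""
    p_A = pi_A * (1 - q_na_A)        # probability of precisely observing A
    p_B = (1 - pi_A) * (1 - q_na_B)  # probability of precisely observing B
    pi_A_car = p_A / (p_A + p_B)     # CAR just renormalizes the precise part
    return (pi_A_car - pi_A) / pi_A

# Grid over both coarsening parameters, as in the figure (pi_A = 0.6)
grid = np.arange(0.1, 1.0, 0.1)
bias = [[relative_bias_car(0.6, q1, q2) for q2 in grid] for q1 in grid]
print(np.round(bias, 2))  # zero on the diagonal (CAR holds), biased elsewhere
```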

Variety of set-valued approaches:
• via random sets (e.g. Nguyen, 2006)
• via likelihood-based belief functions (Denœux, 2014)
• using Bayesian approaches (de Cooman, Zaffalon, 2004)
• via profile likelihood (Cattaneo, Wiencierz, 2012)

Here: a likelihood-based approach influenced by the methodology of partial identification (Manski, 2003), for coarse categorical data only.

Basic idea for the i.i.d. case (regression: cf. poster)

Observation model Q (connecting the LATENT and the OBSERVABLE level): the latent variable Y with category probabilities π_i1 = π_1, ..., π_iK = π_K (i.i.d. case) is observed only as coarse data 𝒴 with distribution p_𝓎 = P(𝒴_i = 𝓎), i = 1, ..., n; the coarsening mechanism q_𝓎|y = P(𝒴_i = 𝓎 | Y_i = y) is assumed to be error-free (y ∈ 𝓎).

Main goal: estimation of π_ij = P(Y_i = j).

1. Use the random-set perspective and determine the maximum-likelihood estimator p̂_𝓎: the likelihood for the parameters p = (p_1, ..., p_{|Ω_𝒴|−1})^T,

   L(p) ∝ ∏_{𝓎 ∈ Ω_𝒴} p_𝓎^{n_𝓎},

is uniquely maximized by

   p̂_𝓎 = n_𝓎 / n,  𝓎 ∈ {1, ..., |Ω_𝒴| − 1},  and  p̂_{|Ω_𝒴|} = 1 − Σ_{m=1}^{|Ω_𝒴|−1} p̂_m.

2. Use the connection between p and γ = (q_𝓎|y^T, π_y^T)^T, i.e. the mapping Φ(γ) = p, and the invariance of the likelihood under parameter transformations:

   Γ̂ = {γ | Φ(γ) = p̂},

and thus

   π̂_y ∈ [ n_{y}/n , (Σ_{𝓎∋y} n_𝓎)/n ]  and  q̂_𝓎|y ∈ [ 0 , n_𝓎/(n_{y} + n_𝓎) ].

Illustration (PASS data): n_< = 238, n_≥ = 835, n_na = 338 (n = 1411), hence

   π̂_< ∈ [ 238/1411 , (238 + 338)/1411 ] ≈ [0.17, 0.41].
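Both estimation steps are easy to compute from the observed counts. A minimal sketch (ours; the frozenset representation of coarse observations and the function name are assumptions), reproducing the PASS bounds above:

```python
# ML estimation for coarse categorical data: relative frequencies on the
# observable level and the induced bounds for the latent pi_y (error-freeness).
from fractions import Fraction

def pi_hat_bounds(counts):
    """counts: dict mapping each coarse observation (a frozenset of
    latent categories) to its absolute frequency n_y."""
    n = sum(counts.values())
    p_hat = {obs: Fraction(c, n) for obs, c in counts.items()}   # p-hat = n_y / n
    bounds = {}
    for y in set().union(*counts):
        lower = p_hat.get(frozenset({y}), Fraction(0))           # n_{y} / n
        upper = sum(p for obs, p in p_hat.items() if y in obs)   # sum over obs containing y
        bounds[y] = (lower, upper)
    return bounds

# PASS illustration: n_< = 238, n_>= = 835, n_na = 338; "na" = {<, >=}
counts = {frozenset({"<"}): 238, frozenset({">="}): 835, frozenset({"<", ">="}): 338}
print(pi_hat_bounds(counts)["<"])   # (Fraction(238, 1411), Fraction(576, 1411))
```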

Reliable incorporation of auxiliary information

Starting from point-identifying assumptions, we use sensitivity parameters to allow the inclusion of partial knowledge.

Assumption about the exact value of R = q_na|≥ / q_na|< (Nordheim, 1984): e.g. Q specified by R = 1 or R = 4, where R = 1 corresponds to CAR (Heitjan, Rubin, 1991).

Rough evaluation of R: e.g. Q specified by R ≤ 1, i.e. the low income group has a higher tendency to report "na".

[Figure: admissible combinations of q_na|< and q_na|≥ in the unit square under the respective assumption]
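For a fixed R the model becomes point-identifying: with error-freeness, p_< = π_<·(1 − q_na|<) and p_≥ = (1 − π_<)·(1 − R·q_na|<), and eliminating π_< leaves a quadratic in q_na|<. A sketch of this calculation (our own derivation from the displayed model, not code from the talk):

```python
# Point estimate of pi_< under a fixed sensitivity parameter
# R = q_na|>= / q_na|<.  Writing q = q_na|<, error-freeness implies
#   p_<  = pi * (1 - q),   p_>= = (1 - pi) * (1 - R*q),
# and eliminating pi gives  R*q^2 - (1 + R - R*p_< - p_>=)*q + p_na = 0.
import math

def pi_under_R(n_lt, n_geq, n_na, R):
    n = n_lt + n_geq + n_na
    p_lt, p_geq, p_na = n_lt / n, n_geq / n, n_na / n
    b = -(1 + R - R * p_lt - p_geq)
    disc = math.sqrt(b * b - 4 * R * p_na)
    for q in sorted(((-b - disc) / (2 * R), (-b + disc) / (2 * R))):
        if 0 <= q < 1 and 0 <= R * q <= 1:        # admissible coarsening probs?
            pi = p_lt / (1 - q)
            if 0 <= pi <= 1:
                return pi
    raise ValueError("no admissible solution for this R")

# PASS counts from the slides:
print(pi_under_R(238, 835, 338, R=1))  # CAR: 238/1073 ~ 0.222
print(pi_under_R(238, 835, 338, R=4))  # ~ 0.181, inside the bounds [0.17, 0.41]
```

Under the rough evaluation R ≤ 1, one would instead sweep R over (0, 1] and collect the resulting point estimates π̂_< into a set-valued answer.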


Summary and outlook

Via the observation model Q, maximum-likelihood estimators referring to the latent variable can be obtained for both
• the homogeneous case, and
• the case with categorical covariates (cf. poster).

Auxiliary information is properly included via further restrictions on Q.

Next steps:
• Inclusion of auxiliary information via sets of priors
• Likelihood-based hypothesis tests and uncertainty regions for coarse categorical data
• Consideration of other "deficiency" processes

References

Couso, I., Dubois, D., Sánchez, L.: Random Sets and Random Fuzzy Sets as Ill-Perceived Random Variables. Springer, 2014.

Heitjan, D., Rubin, D.: Ignorability and Coarse Data. Annals of Statistics, 1991.

Manski, C.: Partial Identification of Probability Distributions. Springer, 2003.

Nordheim, E.: Inference from nonrandomly missing categorical data: An example from a genetic study on Turner's syndrome. Journal of the American Statistical Association, 1984.

Vansteelandt, S., Goetghebeur, E., Kenward, M., Molenberghs, G.: Ignorance and uncertainty regions as inferential tools in a sensitivity analysis. Statistica Sinica, 2006.