A Notion of Sufficiency for Statistical Modelling of Interval Data (PowerPoint presentation)


T. Augustin, E. Endres, M.E.G.V. Cattaneo, P. Fink, J. Plaß, U. Pötter, M. Seitz, G. Schollmeyer, A. Wiencierz. Durham, WPMSIIP 2016.


slide-1
SLIDE 1

A Notion of Sufficiency for Statistical Modelling of Interval Data

T. Augustin, E. Endres, M.E.G.V. Cattaneo, P. Fink, J. Plaß, U. Pötter, M. Seitz, G. Schollmeyer, A. Wiencierz

Durham, WPMSIIP 2016

Augustin et al. A Notion of Sufficiency for Interval Data 1 / 49

slide-2
SLIDE 2

1. Interval Data
2. Reliable Inference instead of Overprecision
3. Generalized Linear Models; Maximum Likelihood Estimation
4. Collecting Regions from Estimating Equations
5. Envelopes of Estimating Equations: One Dimensional Case
6. Penalty Approach
7. MLE-Equivalence
8. Concluding Remarks


slide-3
SLIDE 3

Interval Data


slide-4
SLIDE 4

Interval Data

Interval data, and more generally “imprecise”, “coarse”, “messy”, or “deficient” data, are quite common:
  • There is an underlying true value that is not observed in the granularity originally intended (epistemic point of view; cf., e.g., Couso & Dubois (2014, IJAR), Couso, Dubois & Sánchez (2014, Springer)).
  • finite precision of measurements
  • response effects like heaping
  • anonymization
  • compliance, increase of response rate
  • special case: missing data
  • categorical data: indecision between certain alternatives
  • matching of data
A better name would be “non-idealized data”.


slide-5
SLIDE 5

The two-layer perspective

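[Diagram: upper layer with ideal Xi and ideal Yi, linked by the effects of interest; lower layer with observable Xi and observable Yi (the data), each obtained from its ideal counterpart through an observation model; inference runs from the data back to the ideal layer.]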


slide-6
SLIDE 6

Interval Data: Example

German General Social Survey (ALLBUS) 2010: 2827 observations from Germany in total, 2000 report personal income (30% missing). An additional 10% report only income brackets.

[Figure: histogram of reported personal income (x-axis: income in €, 1000 to 8000; y-axis: frequencies).]


slide-7
SLIDE 7

Interval Data: Example

1. We see heaping at 1000 €, 2000 €, ..., less so at 500 €, 1500 €, ...
2. Both heaping and grouping depend on the amount of income reported.
3. Missingness (some 20% of the data) might as well depend on the amount of income.

Consequences:

1. Missingness, grouping, and heaping will rarely conform to the assumption of “coarsening at random” (CAR).
2. Missingness, grouping, and heaping add an additional type of uncertainty apart from classical statistical uncertainty. This uncertainty cannot be decreased by sampling more data.

Use credible inference procedures that do not rely on unsustainable “assumptions”!


slide-10
SLIDE 10

Probability Model

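Joint distribution of exact and interval-valued random variables with marginal distributions $P$ (exact data) and $P^*$ (observable, e.g. coarsened data):

  • underlying space $(\Omega, \mathring{\mathcal{F}}, \mathring{P})$
  • observable model $((\mathcal{X}^* \times \mathcal{Y}^*), \mathcal{F}^*, P^*)$, reached via the coarse variables $(\mathfrak{X}, \mathfrak{Y})$
  • ideal, exact model $((\mathcal{X} \times \mathcal{Y}), \mathcal{F}, P)$, reached via the exact variables $(\mathbf{X}, \mathbf{Y})$

Assumptions:
  • deficiency model: $\mathcal{X}^* \subset \mathcal{P}(\mathcal{X})$, $\mathcal{Y}^* \subset \mathcal{P}(\mathcal{Y})$
  • for coarse data, consistency condition (error freeness): $\Pr(\mathbf{X} \in \mathfrak{X}, \mathbf{Y} \in \mathfrak{Y}) = 1$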


slide-11
SLIDE 11

Reliable Inference instead of Overprecision


slide-12
SLIDE 12

Interval Data: Representations

Epistemic point of view: Couso & Dubois (2014, IJAR), Couso, Dubois & Sánchez (2014, Springer).

We represent interval-valued data as follows:
$$\mathfrak{x} := [\underline{\mathbf{x}}, \overline{\mathbf{x}}] = \{(x_1, \dots, x_n) \mid \underline{x}_1 \le x_1 \le \overline{x}_1, \dots, \underline{x}_n \le x_n \le \overline{x}_n\},$$
where it is assumed that the intervals contain the actual, underlying, “true” $\mathbf{x} \in \mathfrak{x}$. Analogously for the $Y$-variable.
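A minimal sketch of this representation, assuming interval data are stored as two tuples of lower and upper endpoints; the class `IntervalVector` and its methods `contains` and `vertices` are hypothetical names chosen for illustration. A precise vector lies in the box exactly when every component lies in its interval, and the vertices of the box are the selections that take one endpoint per component.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator, Sequence, Tuple


@dataclass(frozen=True)
class IntervalVector:
    """Box [lower, upper] = {x : lower[j] <= x[j] <= upper[j] for all j} (hypothetical helper)."""
    lower: Tuple[float, ...]
    upper: Tuple[float, ...]

    def __post_init__(self) -> None:
        if len(self.lower) != len(self.upper):
            raise ValueError("lower and upper must have the same length")
        if any(lo > hi for lo, hi in zip(self.lower, self.upper)):
            raise ValueError("every interval needs lower <= upper")

    def contains(self, x: Sequence[float]) -> bool:
        # x is a possible "true" value iff every component lies in its interval
        return len(x) == len(self.lower) and all(
            lo <= xj <= hi for lo, xj, hi in zip(self.lower, x, self.upper)
        )

    def vertices(self) -> Iterator[Tuple[float, ...]]:
        # one endpoint per component: 2**n selections (repeated if an interval is degenerate)
        return product(*zip(self.lower, self.upper))


# two reported income brackets (made-up values for illustration)
incomes = IntervalVector((1000.0, 1500.0), (1500.0, 2000.0))
print(incomes.contains((1200.0, 1800.0)))
```

Storing only the endpoints keeps the representation linear in n, while `vertices` makes explicit that the number of extreme selections grows as 2^n.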


slide-13
SLIDE 13

Manski’s Law of Decreasing Credibility

Reliability!? Credibility? “The credibility of inference decreases with the strength of the assumptions maintained.” (Manski (2003, p. 1))


slide-14
SLIDE 14

Reliable Inference Instead of Overprecision!!

Consequences from Manski’s Law of Decreasing Credibility:
  • Adding untenable assumptions to produce a precise solution may destroy the credibility of a statistical analysis, and therefore its relevance for the subject-matter questions.
  • Make realistic assumptions and let the data speak for themselves!
  • Extreme case: consider the set of all models that are compatible with the data (and then successively add further assumptions, if desirable).
  • The results may be imprecise, but they are more reliable.
  • The extent of imprecision is related to the data quality!
  • As a welcome by-product: clarification of the implications of certain assumptions.
  • Often still sufficient to answer the subject-matter question.


slide-15
SLIDE 15

Work in that direction

  • Interval analysis/reliable computing, i.i.d. case, e.g. Nguyen, Kreinovich, Wu, Xiang (2011, Springer)
  • Linear regression, e.g.,
      ◮ Rohwer & Pötter (2001, Juventa)
      ◮ Manski & Tamer (2002, Econometrica)
      ◮ Chernozhukov, Hong & Tamer (2007, Econometrica)
      ◮ Beresteanu & Molinari (2008, Econometrica)
      ◮ Cattaneo & Wiencierz (2012, IntJApproxReason)
      ◮ Beresteanu, Molchanov & Molinari (2012, J Econometrics)
      ◮ Bontemps, Magnac & Maurin (2012, Econometrica)
      ◮ Schollmeyer & Augustin (2015, IntJApproxReason)
  • What to do with generalized linear models?
      ◮ logit regression: Plass, Augustin, Cattaneo, Schollmeyer (2015, ISIPTA)
      ◮ Seitz (2015, Springer Best Masters)

slide-16
SLIDE 16

Generalized Linear Models; Maximum Likelihood Estimation


slide-17
SLIDE 17

Basic Notation, Regression Models

  • $n$ observations (“large”)
  • $\mathbf{Y} = (Y_1, \dots, Y_n)^\top$ response variable
  • $\mathbf{X} = (X_1, \dots, X_n)^\top$ covariates
  • $(X_i, Y_i)_{i=1,\dots,n}$ i.i.d.
  • here $Y_i$ one-dimensional, of metric, ordinal, or categorical scale; $X_i$ $p$-dimensional (metric or binary)
  • joint distribution: density with respect to an appropriate dominating measure
$$f_{(\mathbf{X},\mathbf{Y})}(\mathbf{x}, \mathbf{y}) = \prod_{i=1}^{n} f_{(X_i,Y_i)}(x_i, y_i) = \prod_{i=1}^{n} \underbrace{f_{Y_i \mid X_i}(y_i \mid x_i)}_{\text{model}} \cdot f_{X_i}(x_i)$$


slide-18
SLIDE 18

  • Typically only $f_{Y \mid X}(\cdot)$ is parametrized; $f_X(\cdot)$ is assumed to contain ancillary information only.
  • Regression parameters $\beta = (\beta_0, \beta_1, \dots, \beta_p)^\top$, further parameter $\gamma$.
  • Parametric model for $[Y_i \mid X_i]$, here a generalized linear model.


slide-19
SLIDE 19

Generalized Linear Models

E.g. Fahrmeir, Kneib, Lang, Marx (2013, Springer): generalizing linear regression
$$Y_i = \beta_0 + \beta_1' X_i + \varepsilon_i \iff Y_i \mid X_i \sim N(X_i'\beta, \sigma^2)$$
to other distributions:
  • Gamma distribution, inverse Gaussian distribution, Beta distribution
  • Poisson distribution → count data
  • Bernoulli/multinomial distribution → categorical data: logit/probit model

$$f(y_i \mid \vartheta_i, \gamma) = \mathrm{const}(y_i, \gamma) \cdot \exp\left(\frac{\vartheta_i y_i - b(\vartheta_i)}{\gamma}\right), \quad i = 1, \dots, n,$$
$$\vartheta_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip},$$
i.e. an exponential family with individual canonical parameter $\vartheta_i = (1, X_i')\,\beta$ (“canonical link”).
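As a worked instance (the Poisson case is our choice of example; the slide only lists it), the exponential-family form specializes to $b(\vartheta) = e^\vartheta$, dispersion $\gamma = 1$ and $\mathrm{const}(y, 1) = 1/y!$. The sketch evaluates the density in this form and compares it with the textbook Poisson probability mass function; the function names are hypothetical.

```python
import math


def poisson_expfam_pmf(y: int, theta: float) -> float:
    """Poisson pmf written as const(y) * exp(theta * y - b(theta)) with b(theta) = exp(theta)."""
    b = math.exp(theta)               # cumulant function b(theta)
    const = 1.0 / math.factorial(y)   # const(y, gamma) with dispersion gamma = 1
    return const * math.exp(theta * y - b)


def poisson_pmf(y: int, mu: float) -> float:
    """Textbook form mu**y * exp(-mu) / y!."""
    return mu ** y * math.exp(-mu) / math.factorial(y)


# canonical link: theta_i = beta0 + beta1 * x_i equals log E(Y_i | X_i)
beta0, beta1, x = 0.2, 0.5, 1.5
theta = beta0 + beta1 * x
print(poisson_expfam_pmf(3, theta), poisson_pmf(3, math.exp(theta)))
```

The two printed values coincide up to floating-point rounding, because $\exp(3\vartheta - e^\vartheta)/3! = \mu^3 e^{-\mu}/3!$ with $\mu = e^\vartheta$.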


slide-20
SLIDE 20

DGP

[Diagram: the data-generating process (DGP) produces the data; inference leads from the data back to the DGP.]


slide-21
SLIDE 21

Maximum Likelihood Estimation

After having observed the data, reinterpret the density as a function of the parameters (the likelihood), describing how plausible each parameter value is as the one that produced the data.

Maximum likelihood estimator (MLE): root of the derivative of the log-likelihood, the score function; for the canonical link,
$$\operatorname{score}(\beta) = \frac{1}{\gamma} \sum_{i=1}^{n} \begin{pmatrix} 1 \\ X_i \end{pmatrix} \bigl(Y_i - \mathrm{E}(Y_i \mid X_i)\bigr).$$
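A minimal sketch of computing the MLE as the root of this score function, assuming a Poisson GLM with canonical log link, one covariate and dispersion $\gamma = 1$ (our choice for illustration; function names are hypothetical). With the canonical link, Newton-Raphson and Fisher scoring coincide, because observed and expected information are equal.

```python
import math


def poisson_score_and_info(beta, x, y):
    """Score vector and Fisher information for log E(Y_i | X_i) = b0 + b1 * x_i (dispersion 1)."""
    b0, b1 = beta
    s0 = s1 = i00 = i01 = i11 = 0.0
    for xi, yi in zip(x, y):
        mu = math.exp(b0 + b1 * xi)   # E(Y_i | X_i) under the canonical log link
        r = yi - mu
        s0 += r                       # intercept component: sum (Y_i - E(Y_i | X_i))
        s1 += xi * r                  # slope component: sum X_i (Y_i - E(Y_i | X_i))
        i00 += mu                     # information: sum mu_i (1, x_i)^T (1, x_i)
        i01 += mu * xi
        i11 += mu * xi * xi
    return (s0, s1), (i00, i01, i11)


def fisher_scoring(x, y, tol=1e-10, max_iter=100):
    """Root of the score function; assumes sum(y) > 0 and non-constant x."""
    beta = [math.log(sum(y) / len(y)), 0.0]   # start from the intercept-only fit
    for _ in range(max_iter):
        (s0, s1), (i00, i01, i11) = poisson_score_and_info(beta, x, y)
        det = i00 * i11 - i01 * i01
        d0 = (i11 * s0 - i01 * s1) / det      # solve I(beta) * d = score(beta)
        d1 = (i00 * s1 - i01 * s0) / det
        beta[0] += d0
        beta[1] += d1
        if abs(d0) + abs(d1) < tol:
            break
    return tuple(beta)


beta_hat = fisher_scoring([0.0, 1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 5.0])
print(beta_hat)
```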


slide-22
SLIDE 22

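For discussion later:
  • general form $\operatorname{score}(\beta) = \mathbf{X}^\top \mathbf{D}(\beta)\, \boldsymbol{\Sigma}^{-1}(\beta)\, \bigl(\mathbf{Y} - \mathrm{E}(\mathbf{Y} \mid \mathbf{X})\bigr)$
  • quasi-likelihood models
  • multivariate $Y$
  • “Weibull-type”: $Y_i^\alpha$, $Y_i \ge 0$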


slide-23
SLIDE 23

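  • $\mathrm{E}(Y_i \mid X_i) = h(\eta_i)$: response function; $g(\mathrm{E}(Y_i \mid X_i)) = \eta_i$: link function
  • $\mathrm{E}(Y_i \mid X_i) = b'(\vartheta_i)$, $\vartheta_i = \psi(\mathrm{E}(Y_i \mid X_i))$
  • $\mathrm{Var}(Y_i \mid X_i) = \varphi \cdots$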
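For the Bernoulli distribution (our example), $b(\vartheta) = \log(1 + e^\vartheta)$, so $b'(\vartheta)$ is the logistic response function and the canonical link $\psi = (b')^{-1}$ is the logit. The sketch checks numerically that `b_prime` is the derivative of `b` and that `psi` inverts `b_prime`; the function names are hypothetical (note that `psi` here is the canonical link, not an estimating function).

```python
import math


def b(theta: float) -> float:
    """Bernoulli cumulant function b(theta) = log(1 + exp(theta))."""
    return math.log1p(math.exp(theta))


def b_prime(theta: float) -> float:
    """E(Y | X) = b'(theta): the logistic function."""
    return 1.0 / (1.0 + math.exp(-theta))


def psi(mu: float) -> float:
    """Canonical link psi = (b')^{-1}: the logit."""
    return math.log(mu / (1.0 - mu))


theta = 0.7
h = 1e-6
central_diff = (b(theta + h) - b(theta - h)) / (2 * h)   # numerical derivative of b
print(central_diff, b_prime(theta), psi(b_prime(theta)))
```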


slide-24
SLIDE 24

Collecting Regions from Estimating Equations


slide-25
SLIDE 25

Estimating Equations → Collection Regions

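Generalizing from the linear case, suppose there is a consistent (score) estimating equation for the ideal model $\{P_\vartheta \mid \vartheta \in \Theta\}$, i.e.
$$\forall \vartheta \in \Theta: \quad \mathrm{E}_\vartheta\bigl(\psi(\mathbf{X}, \mathbf{Y}; \vartheta)\bigr) = 0.$$
Then $\hat\vartheta := \operatorname{root}\bigl(\psi(\mathbf{X}, \mathbf{Y}; \vartheta)\bigr)$.

With interval data, one gets a set of estimating equations, one for each random vector (selection) $(\mathbf{X}, \mathbf{Y}) \in (\mathfrak{X}, \mathfrak{Y})$:
$$\Psi(\mathfrak{X}, \mathfrak{Y}; \vartheta) := \{\psi(\mathbf{X}, \mathbf{Y}; \vartheta) \mid \mathbf{X} \in \mathfrak{X}, \mathbf{Y} \in \mathfrak{Y}\},$$
$$\hat\Theta := \bigl\{\hat\vartheta \;\big|\; \exists\, \mathbf{X} \in \mathfrak{X}, \mathbf{Y} \in \mathfrak{Y}: \hat\vartheta = \operatorname{root}\bigl(\psi(\mathbf{X}, \mathbf{Y}; \vartheta)\bigr)\bigr\}.$$
$\hat\Theta$ is named “collection region” in Schollmeyer & Augustin (2015, IntJApproxReason).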
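A brute-force sketch of a collection region in the simplest setting we can think of (our assumption, echoing the “linear model without intercept” simulation in this deck): precise $x_i$, interval $y_i$, and estimating function $\psi(\mathbf{x}, \mathbf{y}; \vartheta) = \sum_i x_i (y_i - x_i \vartheta)$, whose root is $\hat\vartheta = \sum_i x_i y_i / \sum_i x_i^2$. Because this root is linear in $\mathbf{y}$, its range over the box is an interval whose endpoints are attained at vertices, so enumerating the $2^n$ vertex selections recovers $\hat\Theta$ exactly; this is only feasible for small $n$. Function names are hypothetical.

```python
from itertools import product


def root_no_intercept(x, y):
    """Root of psi(x, y; theta) = sum x_i * (y_i - x_i * theta)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def collection_region_bruteforce(x, y_lower, y_upper):
    """[min, max] of the root over all 2**n vertex selections of the interval responses."""
    roots = [
        root_no_intercept(x, y)
        for y in product(*zip(y_lower, y_upper))   # one endpoint per observation
    ]
    return min(roots), max(roots)


x = [1.0, 2.0, 3.0]
print(collection_region_bruteforce(x, [0.0, 1.0, 2.0], [1.0, 3.0, 4.0]))
```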


slide-26
SLIDE 26

Envelopes of Estimating Equations: One Dimensional Case


slide-27
SLIDE 27

Envelopes of Estimating Equations: One Dimensional Case

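Seitz (2015, Springer Best Masters, § 3.1)

Common form of the estimating function:
$$\psi(\mathbf{X}, \mathbf{Y}; \vartheta) = \sum_{i=1}^{n} \psi_i(X_i, Y_i; \vartheta).$$
For one-dimensional $\vartheta$ then, since each summand depends only on its own observation,
$$\min_{(\mathbf{X}, \mathbf{Y}) \in (\mathfrak{X}, \mathfrak{Y})} \psi(\mathbf{X}, \mathbf{Y}; \vartheta) = \sum_{i=1}^{n} \min_{(X_i, Y_i) \in (\mathfrak{X}_i, \mathfrak{Y}_i)} \psi_i(X_i, Y_i; \vartheta),$$
and analogously for the maximum.

If the sign of the derivative of the score function does not change, the bounds of the collection region are the roots of the lower and upper envelopes, which can be found by Fisher scoring; the envelopes are the sums of the individual lower and upper envelopes of the score functions, which usually can be calculated analytically.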
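A sketch of the envelope idea, assuming the same hypothetical setting of a linear model without intercept with precise $x_i$ and interval $y_i$. Each $\psi_i(y_i; \vartheta) = x_i (y_i - x_i \vartheta)$ is minimized (maximized) over $y_i \in [\underline{y}_i, \overline{y}_i]$ separately; since every selection's $\psi$ is decreasing in $\vartheta$, it lies between the envelopes, so its root lies between the envelopes' roots. To keep the sketch short we use bisection instead of Fisher scoring; function names are hypothetical.

```python
def lower_envelope(theta, x, y_lower, y_upper):
    # sum of the individual minima of psi_i over y_i in [y_lower_i, y_upper_i]
    return sum(min(xi * lo, xi * hi) - xi * xi * theta
               for xi, lo, hi in zip(x, y_lower, y_upper))


def upper_envelope(theta, x, y_lower, y_upper):
    # sum of the individual maxima of psi_i
    return sum(max(xi * lo, xi * hi) - xi * xi * theta
               for xi, lo, hi in zip(x, y_lower, y_upper))


def root_decreasing(f, lo=-1e6, hi=1e6, iters=200):
    """Bisection for a continuous, decreasing f with f(lo) > 0 > f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid      # root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


x, y_lo, y_hi = [1.0, 2.0, 3.0], [0.0, 1.0, 2.0], [1.0, 3.0, 4.0]
theta_low = root_decreasing(lambda t: lower_envelope(t, x, y_lo, y_hi))
theta_up = root_decreasing(lambda t: upper_envelope(t, x, y_lo, y_hi))
print(theta_low, theta_up)   # bounds of the collection region
```

Unlike vertex enumeration, this needs only $O(n)$ work per envelope evaluation.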


slide-28
SLIDE 28

One Parameter Case

[Figure: Simulation; linear model without intercept.]


slide-29
SLIDE 29

Exponential

[Figure: Exponential case.]


slide-30
SLIDE 30

Penalty Approach


slide-31
SLIDE 31

Parameter Estimation, Basic Form

Linear objective function with nonlinear equality constraints and box constraints:
$$\vartheta_l \to \min / \max$$
subject to
$$\psi_k(\mathbf{x}, \mathbf{y}; \vartheta) = 0, \quad k = 1, \dots, q,$$
$$x_i \in \mathfrak{X}_i, \quad y_i \in \mathfrak{Y}_i, \quad i = 1, \dots, n.$$


slide-32
SLIDE 32

Parameter Estimation, Penalty Form

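Seitz (2015, Springer Best Masters, § 3.5, 4)

$\hat\vartheta$ is a root of $\psi(\cdot)$ $\iff$ $\hat\vartheta := \arg\min_\vartheta \psi^2$ (provided a root exists).

Nonlinear objective function with box constraints:
$$\vartheta_l \pm \sum_{k=1}^{q} \rho_k \bigl(\psi_k(\mathbf{x}, \mathbf{y}; \vartheta)\bigr)^2 \to \min / \max$$
subject to $\mathbf{x} \in \mathfrak{x}$, $\mathbf{y} \in \mathfrak{y}$, with penalties $\rho_k$, $k = 1, \dots, q$ (the penalty term is added for minimization and subtracted for maximization).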
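A sketch of the penalty form for the lower bound, assuming the toy case $q = 1$, a linear model without intercept, precise $x_i$ and interval $y_i$ (our choice). Here $\psi = \sum_i x_i y_i - \vartheta \sum_i x_i^2$ depends on $\mathbf{y}$ only through $s = \sum_i x_i y_i \in [s_{\min}, s_{\max}]$, so the inner minimization over the box is done in closed form, and a ternary search over $\vartheta$ replaces a general nonlinear solver. The function name is hypothetical.

```python
def penalized_lower_bound(x, y_lower, y_upper, rho, left=-1e3, right=1e3, iters=300):
    """Minimise theta + rho * psi(x, y; theta)**2 over theta and y in the box (toy model)."""
    a = sum(xi * xi for xi in x)
    s_min = sum(min(xi * lo_i, xi * hi_i) for xi, lo_i, hi_i in zip(x, y_lower, y_upper))
    s_max = sum(max(xi * lo_i, xi * hi_i) for xi, lo_i, hi_i in zip(x, y_lower, y_upper))

    def objective(theta):
        target = theta * a                    # value of s that would make psi = 0
        s = min(max(target, s_min), s_max)    # best attainable s inside the box
        return theta + rho * (s - target) ** 2

    # the objective is convex in theta, so ternary search converges to its minimiser
    for _ in range(iters):
        m1 = left + (right - left) / 3
        m2 = right - (right - left) / 3
        if objective(m1) < objective(m2):
            right = m2
        else:
            left = m1
    return 0.5 * (left + right)


print(penalized_lower_bound([1.0, 2.0, 3.0], [0.0, 1.0, 2.0], [1.0, 3.0, 4.0], rho=100.0))
```

For this toy problem one can check by hand that the penalized minimizer is $\vartheta_L - 1/(2\rho A^2)$ with $A = \sum_i x_i^2$ and $\vartheta_L = s_{\min}/A$ the exact lower bound, so with finite penalties the result only approximates the bound, and the gap shrinks as $\rho$ grows.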


slide-33
SLIDE 33

Parameter Estimation: Heuristic Search

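Sequential evaluation:
1. Fix $\mathbf{X}$, $\mathbf{Y}$.
2. Search for the optimal vertex in $(\mathfrak{X}_1 \times \mathfrak{Y}_1)$.
3. Fix this optimum and search for the optimal vertex in $(\mathfrak{X}_2 \times \mathfrak{Y}_2)$, etc.
4. Repeat until there is no considerable change in the optimal solution.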
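A sketch of this sequential vertex search, assuming for illustration that the quantity to maximize is the least-squares slope of a simple linear regression with interval-valued $x$ and $y$ (nonlinear in $x$). For each unit, the four corners of $\mathfrak{X}_i \times \mathfrak{Y}_i$ are tried while all other units stay fixed; sweeps repeat until no corner improves the slope. Like any coordinate-wise search it may stop at a local optimum, so the value found is attainable but possibly below the true upper bound. Function names are hypothetical.

```python
from itertools import product


def ols_slope(x, y):
    """Least-squares slope of y on x; -inf if x is constant (slope undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return float("-inf")   # treated as the worst value when maximising
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx


def heuristic_max_slope(x_iv, y_iv, max_sweeps=100, tol=1e-12):
    """Coordinate-wise vertex search for a large slope over interval-valued (x, y).

    x_iv, y_iv: lists of (lower, upper) pairs, one per unit.
    Returns the best slope found and the selected precise data.
    """
    x = [lo for lo, _ in x_iv]     # initial selection: all lower endpoints
    y = [lo for lo, _ in y_iv]
    best = ols_slope(x, y)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(x)):
            # try the four corners of X_i x Y_i, all other units held fixed
            for xi, yi in product(x_iv[i], y_iv[i]):
                old = (x[i], y[i])
                x[i], y[i] = xi, yi
                value = ols_slope(x, y)
                if value > best + tol:
                    best, improved = value, True
                else:
                    x[i], y[i] = old   # keep the previous corner
        if not improved:
            break
    return best, x, y


best, x_sel, y_sel = heuristic_max_slope(
    [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)], [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
)
print(best, y_sel)
```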


slide-34
SLIDE 34

MLE-Equivalence


slide-35
SLIDE 35

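Def: MLE-equivalence for $A\vartheta$

Let $\mathcal{P}$ be a family of distributions parametrized in $\vartheta \in \Theta \subseteq \mathbb{R}^q$, and denote for each sample $(\mathbf{X}, \mathbf{Y}) \sim p_\vartheta \in \mathcal{P}$ the maximum likelihood estimator for $\vartheta$ by $\hat\vartheta(\mathbf{X}, \mathbf{Y})$. For a matrix $A \in \mathbb{R}^{\tilde q \times q}$, $\tilde q \le q$, call two samples $(\mathbf{X}^{(1)}, \mathbf{Y}^{(1)})$ and $(\mathbf{X}^{(2)}, \mathbf{Y}^{(2)})$ MLE-equivalent for $A\vartheta$ if
$$A\,\hat\vartheta\bigl(\mathbf{X}^{(1)}, \mathbf{Y}^{(1)}\bigr) = A\,\hat\vartheta\bigl(\mathbf{X}^{(2)}, \mathbf{Y}^{(2)}\bigr).$$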


slide-36
SLIDE 36

Examples

  • For arbitrary $A$ and sample $(\mathbf{X}, \mathbf{Y})$, let $(\mathbf{X}^{(1)}, \mathbf{Y}^{(1)}) = (\mathbf{X}, \mathbf{Y})$ and let $(\mathbf{X}^{(2)}, \mathbf{Y}^{(2)})$ be an order statistic of $(\mathbf{X}, \mathbf{Y})$ with respect to one of its components.
  • Of particular interest are specific $A$'s that select certain subvectors of $\vartheta = (\beta^\top, \zeta^\top)^\top$, in particular $A$ such that $A\vartheta = \beta$ $\Rightarrow$ MLE-equivalent for $\beta$.


slide-37
SLIDE 37

Theorem

GLM with canonical link function and $\mathbf{X}$ treated as fixed: all $(\mathbf{X}^{(1)}, \mathbf{Y}^{(1)})$ and $(\mathbf{X}^{(2)}, \mathbf{Y}^{(2)})$ with
$$\sum_{i=1}^{n} \begin{pmatrix} 1 \\ X^{(1)}_{i1} \\ \vdots \\ X^{(1)}_{ip} \end{pmatrix} Y^{(1)}_i = \sum_{i=1}^{n} \begin{pmatrix} 1 \\ X^{(2)}_{i1} \\ \vdots \\ X^{(2)}_{ip} \end{pmatrix} Y^{(2)}_i$$
are MLE-equivalent for $\beta$.
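A numerical illustration of the theorem, assuming a Poisson regression with canonical log link and one covariate; the model, the data and the helper `fit_poisson` (a plain Fisher-scoring loop) are our choices. The two response vectors differ but share the same covariates and the same $\sum_i (1, x_i)^\top y_i = (10, 21)^\top$, so their MLEs of $\beta$ should coincide up to rounding.

```python
import math


def fit_poisson(x, y, tol=1e-12, max_iter=100):
    """Fisher scoring for log E(Y_i) = b0 + b1 * x_i (canonical link, dispersion 1)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0   # start from the intercept-only fit
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        s0 = sum(yi - mi for yi, mi in zip(y, mu))             # score, intercept
        s1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))  # score, slope
        i00 = sum(mu)                                           # Fisher information
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        d0 = (i11 * s0 - i01 * s1) / det
        d1 = (i00 * s1 - i01 * s0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


x = [0.0, 1.0, 2.0, 3.0]
y1 = [1.0, 2.0, 2.0, 5.0]   # sum y = 10, sum x*y = 21
y2 = [1.0, 3.0, 0.0, 6.0]   # different responses, same sum y and sum x*y
print(fit_poisson(x, y1), fit_poisson(x, y2))
```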


slide-38
SLIDE 38

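For the proof, recall that the MLE for $\beta$ is the root of the score function
$$\operatorname{score}(\beta) = \frac{1}{\gamma} \sum_{i=1}^{n} \begin{pmatrix} 1 \\ X_i \end{pmatrix} \bigl(Y_i - \mathrm{E}(Y_i \mid X_i)\bigr).$$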
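The remaining step of the argument, spelled out under the slide's assumptions (canonical link, $\mathbf{X}$ fixed, $\mathrm{E}(Y_i \mid X_i) = b'(\vartheta_i)$ with $\vartheta_i = (1, X_i')\,\beta$): since the factor $1/\gamma > 0$ does not affect the root, setting the score to zero and moving the observed responses to one side gives

$$\operatorname{score}(\beta) = 0 \iff \sum_{i=1}^{n} \begin{pmatrix} 1 \\ X_i \end{pmatrix} b'\bigl((1, X_i')\,\beta\bigr) = \sum_{i=1}^{n} \begin{pmatrix} 1 \\ X_i \end{pmatrix} Y_i .$$

The left-hand side depends on the data only through $\mathbf{X}$, so for fixed $\mathbf{X}$ the estimating equation, and hence its root $\hat\beta$, depends on $\mathbf{Y}$ only through $\sum_{i} (1, X_i')^\top Y_i$. Two samples with the same covariates and the same value of this sum therefore have the same MLE for $\beta$.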


slide-39
SLIDE 39

Corollary

To calculate the collection region for fixed covariates and interval-valued response, it suffices to consider certain single representatives of MLE-equivalent samples.


slide-40
SLIDE 40

Algorithm ($\mathbf{X}$ precise)

Instead of solving the nonlinear (even nonconvex!) optimization problem in the penalty approach with $n$ box constraints, determine the $(p+1)$-dimensional “variational area” of
$$\sum_{i=1}^{n} \begin{pmatrix} 1 \\ X_{i1} \\ \vdots \\ X_{ip} \end{pmatrix} Y_i .$$
This is linear in $\mathbf{Y}$ and can even be described explicitly. (For one-dimensional $X$, w.l.o.g. $X > 0$: sort by $X$ and start with taking all minimal $Y$'s. The next point is as large (small) as possible by using the unit with the highest (smallest) $X$ value and the corresponding $Y_{\max}$ ($Y_{\min}$).) Then work with representatives from there.
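A sketch of the one-dimensional construction sketched in parentheses above, assuming precise scalar covariates and interval responses $[\underline{y}_i, \overline{y}_i]$; the function name is hypothetical. Starting from all minimal responses, switching unit $i$ to its maximal response moves the point $(\sum_i Y_i, \sum_i X_i Y_i)$ by $(d_i, d_i X_i)$ with $d_i = \overline{y}_i - \underline{y}_i \ge 0$. Adding these steps in order of decreasing (increasing) $X$ traces the upper (lower) boundary of the attainable set, which is a convex polygon (a sum of line segments).

```python
def boundary_chains(x, y_lower, y_upper):
    """Upper and lower boundary vertices of {(sum y_i, sum x_i * y_i) : y in the box}.

    Both chains start at the point with all minimal responses and end at the point
    with all maximal responses (n + 1 points each).
    """
    start = (sum(y_lower), sum(xi * lo for xi, lo in zip(x, y_lower)))
    # switching unit i from its minimal to its maximal response adds (d_i, d_i * x_i)
    steps = [(hi - lo, xi) for xi, lo, hi in zip(x, y_lower, y_upper)]

    def walk(ordered_steps):
        points = [start]
        for d, xi in ordered_steps:
            s, t = points[-1]
            points.append((s + d, t + d * xi))
        return points

    upper = walk(sorted(steps, key=lambda step: step[1], reverse=True))  # largest x first
    lower = walk(sorted(steps, key=lambda step: step[1]))                # smallest x first
    return upper, lower


upper, lower = boundary_chains([1, 2, 3], [0, 0, 0], [1, 1, 1])
print(upper)
print(lower)
```

Only a sort is needed, so the boundary costs $O(n \log n)$ instead of an optimization over $n$ boxes.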


slide-41
SLIDE 41

Lemma

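If the domain of the covariates is compact, then, without loss of generality, all covariates can be taken to be non-negative. For one dimension: either $\min X := \min_{i=1,\dots,n} X_i > 0$, or else consider $X_i^+ := X_i - \min X \ge 0$; regression with
$$\beta_0^+ + \beta_1^+ X_i^+ = \beta_0^+ + \beta_1^+ X_i - \beta_1^+ \min X = \tilde\beta_0 + \beta_1^+ X_i, \qquad \tilde\beta_0 := \beta_0^+ - \beta_1^+ \min X.$$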


slide-42
SLIDE 42

Corollary

Consider only regression models with a linear predictor and regression parameter $(\beta_0, \beta_1, \dots, \beta_p)'$: $(\tilde X_i, Y_i)_{i=1,\dots,n}$ and $(X_i, Y_i)_{i=1,\dots,n}$, where $\tilde X_i = X_i + c$, $c \in \mathbb{R}$, are MLE-equivalent for $(\beta_1, \dots, \beta_p)'$.
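A numerical check of this corollary, assuming a Poisson regression with log link and one covariate; the model, data, shift $c$ and helper `fit_poisson` are our choices. Shifting all covariate values by $c$ should leave the slope estimate unchanged and move the intercept to $\hat\beta_0 - c\,\hat\beta_1$, the reparametrization used in the lemma.

```python
import math


def fit_poisson(x, y, tol=1e-12, max_iter=100):
    """Fisher scoring for log E(Y_i) = b0 + b1 * x_i (canonical link, dispersion 1)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        s0 = sum(yi - mi for yi, mi in zip(y, mu))
        s1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        d0 = (i11 * s0 - i01 * s1) / det      # Newton step = Fisher scoring step
        d1 = (i00 * s1 - i01 * s0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 2.0, 5.0]
c = 5.0                                       # hypothetical shift of the covariate
b0, b1 = fit_poisson(x, y)
b0_shift, b1_shift = fit_poisson([xi + c for xi in x], y)
print(b1, b1_shift)                           # slopes agree up to rounding
print(b0 - c * b1, b0_shift)                  # intercept moves by -c * slope
```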

slide-43
SLIDE 43

In more detail

Let $\mathbf{X}$ be one-dimensional. Consider for $\mathbf{X} = (X_1, \dots, X_n)$ the order statistics $\mathbf{X}^{\uparrow} := (X_{(1)}, \dots, X_{(n)})$ and the reverse order statistics $\mathbf{X}^{\downarrow} := (X_{(n)}, \dots, X_{(1)})$.


slide-44
SLIDE 44

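Sort $\underline{\mathbf{Y}}$ and $\overline{\mathbf{Y}}$ accordingly:
$$\underline{\mathbf{Y}}^{\uparrow \mathbf{x}} = (\underline{Y}_{[1]}, \underline{Y}_{[2]}, \dots, \underline{Y}_{[n]}), \qquad \overline{\mathbf{Y}}^{\uparrow \mathbf{x}} = (\overline{Y}_{[1]}, \overline{Y}_{[2]}, \dots, \overline{Y}_{[n]})$$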


slide-45
SLIDE 45

Describe the vertices of the “upper polygon”, starting from
$$\left(\sum_{i=1}^{n} \underline{Y}_i,\ \sum_{i=1}^{n} \underline{Y}_i X_i\right).$$


slide-46
SLIDE 46
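Order statistics:

$\mathbf{X}^{\uparrow} = (X_{(1)}, \dots, X_{(n)})$; sort $\underline{\mathbf{Y}}$, $\overline{\mathbf{Y}}$ accordingly: $\underline{\mathbf{Y}}^{\uparrow \mathbf{x}} = (\underline{Y}_{[1]}, \underline{Y}_{[2]}, \dots, \underline{Y}_{[n]})$, i.e. $\underline{\mathbf{Y}}^{\downarrow \mathbf{x}} = (\underline{Y}_{[n]}, \underline{Y}_{[n-1]}, \dots, \underline{Y}_{[1]})$, etc.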


slide-47
SLIDE 47

First vertex; further on: increase $\sum_{i=1}^{n} \underline{Y}_i$ by $\epsilon$; for the highest (lowest) point, put all mass into the largest (smallest) $\mathbf{X}$-value.


slide-48
SLIDE 48

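Vertices of the lower envelope (with the convention $\sum_{\emptyset} := 0$), for $j = 0, \dots, n$:
$$\left(\sum_{i=1}^{j} \overline{Y}_{[i]} + \sum_{i=j+1}^{n} \underline{Y}_{[i]},\ \ \sum_{i=1}^{j} \overline{Y}_{[i]} X_{(i)} + \sum_{i=j+1}^{n} \underline{Y}_{[i]} X_{(i)}\right)$$
Vertices of the upper envelope:
$$\left(\sum_{i=1}^{j} \overline{Y}_{[n+1-i]} + \sum_{i=j+1}^{n} \underline{Y}_{[n+1-i]},\ \ \sum_{i=1}^{j} \overline{Y}_{[n+1-i]} \cdot X_{(n+1-i)} + \sum_{i=j+1}^{n} \underline{Y}_{[n+1-i]} \cdot X_{(n+1-i)}\right)$$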

Explicit characterization of vertices.


slide-49
SLIDE 49

⇒ Check for a given $\boldsymbol{\beta}^*$ whether or not it is in the collection region.
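A sketch of this membership check for one covariate, assuming a Poisson regression with log link (our choice; another canonical-link GLM would use its own $b'$), non-negative response intervals, and covariates that are not all equal. By the theorem, $\beta^*$ is the MLE of some selection exactly when $t(\beta^*) = \sum_i (1, x_i)^\top b'(\beta^*_0 + \beta^*_1 x_i)$ is attainable as $\sum_i (1, x_i)^\top y_i$ with $\mathbf{y}$ in the box, i.e. lies in the polygon bounded by the two vertex chains. All function names are hypothetical.

```python
import math
from bisect import bisect_left


def chains(x, y_lower, y_upper):
    """Upper and lower boundary vertices of {(sum y_i, sum x_i * y_i) : y in the box}."""
    start = (sum(y_lower), sum(xi * lo for xi, lo in zip(x, y_lower)))
    steps = [(hi - lo, xi) for xi, lo, hi in zip(x, y_lower, y_upper)]

    def walk(ordered_steps):
        points = [start]
        for d, xi in ordered_steps:
            s, t = points[-1]
            points.append((s + d, t + d * xi))
        return points

    upper = walk(sorted(steps, key=lambda step: step[1], reverse=True))
    lower = walk(sorted(steps, key=lambda step: step[1]))
    return upper, lower


def interpolate(chain, s):
    """Piecewise-linear boundary value of a chain at first coordinate s."""
    xs = [p[0] for p in chain]
    k = min(bisect_left(xs, s), len(xs) - 1)
    if k == 0:
        return chain[0][1]
    (s0, t0), (s1, t1) = chain[k - 1], chain[k]
    if s1 == s0:                 # zero-width step, e.g. from a degenerate interval
        return t1
    return t0 + (t1 - t0) * (s - s0) / (s1 - s0)


def in_collection_region(beta, x, y_lower, y_upper, tol=1e-9):
    b0, b1 = beta
    mu = [math.exp(b0 + b1 * xi) for xi in x]       # b'(theta_i) for the Poisson model
    s = sum(mu)                                      # first component of t(beta)
    t = sum(xi * mi for xi, mi in zip(x, mu))        # second component of t(beta)
    upper, lower = chains(x, y_lower, y_upper)
    if s < upper[0][0] - tol or s > upper[-1][0] + tol:
        return False
    s = min(max(s, upper[0][0]), upper[-1][0])       # clamp tiny overshoots
    return interpolate(lower, s) - tol <= t <= interpolate(upper, s) + tol


x = [0.0, 1.0, 2.0, 3.0]
print(in_collection_region((0.0, 0.0), x, [0.0] * 4, [2.0] * 4))
```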


slide-50
SLIDE 50

Concluding Remarks


slide-51
SLIDE 51

Concluding Remarks

  • Interval (coarse(ned)) data in generalized linear models
  • Optimization approach based on the score function
  • Try to make it more tractable by “MLE-equivalence”
  • ⇒ Sufficiency concept for coarse data (interval data)
