Strategy-proof estimators for simple regression By Javier Perote - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Strategy-proof estimators for simple regression

By Javier Perote (University of Salamanca) and Juan Perote-Peña (University of Zaragoza)

slide-2
SLIDE 2

MOTIVATION

  • First, this is the continuation of a research project that introduces private information and strategic considerations into well-known “aggregation” and “decision” techniques such as:

– Operations Research (PERT, queuing theory, linear programming, …)
– Multicriteria decision making
– Clustering techniques
– Econometrics

  • Are these techniques “robust” to individual manipulation using the private information?

slide-3
SLIDE 3

MOTIVATION

  • Secondly, strategic data manipulation evokes the literature on “robustness” to random contamination and on outlier detection: most of the estimators proposed in that literature use the properties of the median to aggregate data
  • Interestingly, the median as an allocation device to aggregate information is strategy-proof in some contexts, e.g., when individuals have “single-peaked” preferences on a single dimension in public goods allocation problems
  • Can the incentives literature (from social choice theory) answer questions on econometrics?

slide-4
SLIDE 4

STRUCTURE OF THE PAPER

  • First, we argue that the informational problem can be very important in some econometric studies. Therefore, designing estimators that are robust to data manipulation can be useful
  • Secondly, we examine the most popular estimator, OLS, and show that it may lead to sample contamination (it is NOT robust)
  • Then, we propose a whole family of estimators for the simple regression case that can be proved to be immune to this kind of data contamination
  • Finally, we compare some of them with OLS in a Monte Carlo experiment

slide-5
SLIDE 5

WHAT KIND OF PROBLEM?

  • Some econometric problems use information reported or declared by agents or individuals (e.g., in questionnaires) that cannot be easily and costlessly observed or verified, i.e., it is the agent’s private information
  • The information extracted from the data is (or can be) used to allocate “something” or to assess policies that might be important to the agents
  • Therefore, the agents might be tempted to report false information if they think that the data-processing procedure can be profitably manipulated

slide-6
SLIDE 6

AN EXAMPLE

  • A big firm or a government department has a number of divisions (perhaps located in different regions)
  • Measures of the output “produced” by the divisions cannot be verified without significant costs (inventory costs, monitoring costs, etc.), for instance the number of clients served in a month
  • Therefore, the information about each division’s output is privately owned by the division manager and is reported by him to the firm’s manager

slide-7
SLIDE 7

THE MODEL WITH THE EXAMPLE

  • Some of the inputs affecting each division’s output are known to the planner (the firm’s boss), maybe because the planner himself “allocated” them in the past (e.g., the number of workers in each division, the estimated demand in each region, the division’s monthly budget, etc.)
  • $N = \{1, 2, \ldots, n\}$: set of divisions (= agents)
  • $i, j \in N$: each agent is also an “observation”
  • $y_i, \ \forall i \in N$: division i’s measure of (true) output
  • $\tilde{y}_i, \ \forall i \in N$: division i’s reported output

slide-8
SLIDE 8

THE MODEL WITH THE EXAMPLE

  • $x_i, \ \forall i \in N$: publicly known explanatory variable
  • True data generating process: $y_i = \beta_0 + \beta_1 x_i + e_i$, where $i = 1, \ldots, n$ and $e_i \sim N(0, \sigma)$ is an i.i.d. random variable (error term or random shock)
  • Let

$$X = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_i \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \quad \text{and} \quad Y = \begin{bmatrix} y_1 \\ \vdots \\ y_i \\ \vdots \\ y_n \end{bmatrix}$$

  • $(X, Y)$: true sample

slide-9
SLIDE 9

THE MODEL WITH THE EXAMPLE

  • A regression estimator is a function “T” of the sample $(X, Y)$: $\hat{\beta}' = (\hat{\beta}_0, \hat{\beta}_1) = T(X, Y)$
  • The estimated or predicted values of the response variable for each observation are generated as: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i, \ \forall i \in N$
  • And the residuals $\hat{e}_i, \ \forall i \in N$, are the differences: $\hat{e}_i = y_i - \hat{y}_i$
  • The most widely used estimator is the OLS one: $\hat{\beta}_{OLS} = \arg\min \sum_{i=1}^{n} \hat{e}_i^2, \ \forall (X, Y)$
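The OLS definition on this slide can be illustrated with a minimal sketch. It uses the standard closed-form solution for simple regression; the function names `ols_fit` and `residuals` and the five data points are made up for illustration and do not come from the presentation.

```python
from statistics import mean

def ols_fit(x, y):
    """Return (b0, b1) minimising the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx              # slope estimate
    b0 = y_bar - b1 * x_bar     # intercept estimate
    return b0, b1

def residuals(x, y, b0, b1):
    """Differences between observed and predicted responses."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Illustrative sample (an assumption, not data from the slides)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [4.4, 4.1, 3.4, 3.1, 2.4]
b0, b1 = ols_fit(x, y)
res = residuals(x, y, b0, b1)
```

With an intercept in the model, the OLS residuals sum to zero, which is a quick sanity check on any implementation.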

slide-10
SLIDE 10

THE MODEL WITH THE EXAMPLE

  • When the true sample $(X, Y)$ is known to the planner, the OLS estimator is the unbiased estimator with minimum variance (good properties)
  • But when the true sample is unknown, the only information received by the planner is $(X, \tilde{Y})$ instead of $(X, Y)$. Applying OLS to the reported sample $(X, \tilde{Y})$ only maintains the good properties when no agent lies (i.e., $\tilde{Y} = Y$)
  • QUESTION: In which cases will the agents lie?

slide-11
SLIDE 11

THE MODEL WITH THE EXAMPLE

  • We must assume some “preferences” guiding the agents’ reporting behaviour. We opt for the…
  • SINGLE-PEAKEDNESS ASSUMPTION:
  • Agent $i \in N$ with true response value $y_i$ has single-peaked preferences $R_{y_i}$ on the real line E if:
  • (i) $y_i \, P_{y_i} \, v, \ \forall v \in E, \ v \neq y_i$
  • (ii) $\forall v, v'$ with $v > v' > 0$: $(y_i + v') \, P_{y_i} \, (y_i + v)$ and $(y_i - v') \, P_{y_i} \, (y_i - v)$

slide-12
SLIDE 12

EXAMPLE OF SINGLE-PEAKEDNESS

  • Possible single-peaked preferences for $i \in N$

[Figure: preference “intensity” plotted over the real line E representing predicted values $\hat{y}_i$, with $y_i$ marked on the axis]
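As a concrete instance of the single-peakedness assumption, the sketch below builds a hypothetical piecewise-linear utility with its maximum at the true value and checks conditions (i) and (ii) on a finite set of offsets. The utility form, the helper names and the numbers are assumptions for illustration; the slides only require that preferences fall away from the peak on both sides, not that they are linear or symmetric.

```python
def make_utility(peak, left_weight=1.0, right_weight=1.0):
    """Hypothetical piecewise-linear utility with its maximum at `peak`."""
    def utility(v):
        gap = v - peak
        return -right_weight * gap if gap >= 0 else left_weight * gap
    return utility

def is_single_peaked(utility, peak, offsets):
    """Check conditions (i) and (ii) on a finite set of positive offsets."""
    steps = sorted({o for o in offsets if o > 0})
    # (i) the peak is strictly preferred to every other value tested
    for o in steps:
        if utility(peak) <= utility(peak + o) or utility(peak) <= utility(peak - o):
            return False
    # (ii) on each side, a value closer to the peak is strictly preferred
    for near, far in zip(steps, steps[1:]):
        if utility(peak + near) <= utility(peak + far):
            return False
        if utility(peak - near) <= utility(peak - far):
            return False
    return True

# Asymmetric preferences are allowed as long as there is a single peak
asymmetric = make_utility(peak=3.0, left_weight=2.0, right_weight=0.5)
# A utility with two maxima (at 3 and 7) violates the assumption
two_peaks = lambda v: -min(abs(v - 3.0), abs(v - 7.0))
offsets = [0.5, 1.0, 2.0, 4.0]
```

A finite check like this cannot prove single-peakedness on the whole real line; it only rules out violations on the tested values.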

slide-13
SLIDE 13

THE MODEL WITH THE EXAMPLE

  • Let us use the partitioned notation: $\tilde{Y} = (\tilde{y}_i, \tilde{Y}_{-i})$
  • Def: Regression estimator $\hat{\beta}' = (\hat{\beta}_0, \hat{\beta}_1) = T(X, \tilde{y}_i, \tilde{Y}_{-i})$ is manipulable at sample $(X, \tilde{Y}) \in Z$ by observation $i \in \{1, \ldots, n\}$ if $\exists R_{\tilde{y}_i} \in \Re_{\tilde{y}_i}, \ \exists y_i \in E \ (y_i \neq \tilde{y}_i)$ such that

$$[\hat{\beta}_0(X, y_i, \tilde{Y}_{-i}) + \hat{\beta}_1(X, y_i, \tilde{Y}_{-i}) \, x_i] \; P_{\tilde{y}_i} \; [\hat{\beta}_0(X, \tilde{y}_i, \tilde{Y}_{-i}) + \hat{\beta}_1(X, \tilde{y}_i, \tilde{Y}_{-i}) \, x_i]$$

  • Def: Regression estimator $\hat{\beta}' = (\hat{\beta}_0, \hat{\beta}_1) = T(X, \tilde{y}_i, \tilde{Y}_{-i})$ is strategy-proof if it is NOT manipulable at any sample $(X, \tilde{Y}) \in Z$ for any observation $i \in \{1, \ldots, n\}$

slide-14
SLIDE 14

SOME EXAMPLES

  • The workers’ union’s wage setting problem

[Figure: plot with labels $L_i$, $w_i$, $p_i q_i - r K_i$ and $\tilde{y}_i = w_i L_i + FB_i$]

slide-15
SLIDE 15

SOME EXAMPLES

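  • The efficiency frontier estimation problem

[Figure: plot with axes $\log \sigma_i$ and $\log r_i$; DGP: $\log r_i = \beta_0 + \beta_1 \log \sigma_i + e_i$; labels $\sigma_i$, $\hat{\beta}_0$, $\hat{\beta}_1$ and $\tilde{y}_i = w_i L_i + FB_i$]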

slide-16
SLIDE 16

SOME EXAMPLES

  • The tax pay-as-you-go rates allocation problem

[Figure: plot of $t_i$ (average PAYG tax rate) and $I_i$ (income) with the PAYG tax schedule $t_i = \hat{\beta}_0 + \hat{\beta}_1 I_i$; marked values 20%, 30% and 10,000 $; label $\tilde{y}_i = w_i L_i + FB_i$]

slide-17
SLIDE 17

OLS IS NOT STRATEGY-PROOF

  • Example:

[Figure: scatter plot with axes $x_i$ and $\tilde{y}_i$ showing the true response variables for 5 observations; observation 2 is marked at $(x_2, y_2)$]

slide-18
SLIDE 18

OLS IS NOT STRATEGY-PROOF

  • Example:

[Figure: the same scatter plot with axes $x_i$ and $\tilde{y}_i$; observation 2 is marked at $(x_2, y_2)$]

The OLS estimator generates the regression line.

slide-19
SLIDE 19

OLS IS NOT STRATEGY-PROOF

  • Example:

[Figure: the same scatter plot with axes $x_i$ and $\tilde{y}_i$, showing $y_2$ and the lie $\tilde{y}_2 \neq y_2$ at $x_2$]

By lying and under-reporting $y_2$, agent 2 can be better off. The regression line shifts slightly downwards, and the new prediction for $x_2$ is closer to the true $y_2$.

slide-20
SLIDE 20

A STRATEGY-PROOF ESTIMATOR

  • Only recommended for the case of $Z = (X, \tilde{Y})$ such that $x_i > 0 \ \forall i \in N$ and $\beta_0 = 0$. It is an extension of the median voter theorem: the MV estimator, defined as:

$$\hat{\beta}_1 = \mathrm{med}_{i \in N} \left\{ \frac{\tilde{y}_i}{x_i} \right\}, \qquad \hat{\beta}_0 = \mathrm{med}(\tilde{y}_i - \hat{\beta}_1 x_i)$$

slide-21
SLIDE 21

A STRATEGY-PROOF ESTIMATOR

[Figure: scatter plot with axes $x_i$ and $\tilde{y}_i$ for a case of 5 observations; observation 2 is marked at $(x_2, \tilde{y}_2)$]

slide-22
SLIDE 22

A STRATEGY-PROOF ESTIMATOR

[Figure: the same scatter plot with axes $x_i$ and $\tilde{y}_i$]

$\hat{\beta}_1$ is the median of the slopes.

slide-23
SLIDE 23

A STRATEGY-PROOF ESTIMATOR

[Figure: the same scatter plot with the observations numbered 1 to 5]

$\hat{\beta}_1$ is the median of the slopes.

slide-24
SLIDE 24

A STRATEGY-PROOF ESTIMATOR

[Figure: the same scatter plot with the observations numbered 1 to 5]

and $\hat{\beta}_0$ is always the origin.
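A minimal sketch of the MV slope follows, together with a brute-force search for profitable lies. Several things here are assumptions rather than the authors' specification: the intercept is held at zero (the slide restricts the estimator to $\beta_0 = 0$), agents are assumed to prefer predictions closer to their true value (one symmetric case of single-peakedness), the sample is made up, and the helper names are hypothetical. `statistics.median_high` is used so that the estimate is always one of the reported ratios.

```python
from statistics import median_high

def mv_slope(x, y_reported):
    """Median of the ratios y_i / x_i; requires every x_i > 0."""
    if any(xi <= 0 for xi in x):
        raise ValueError("the MV estimator assumes x_i > 0 for all i")
    return median_high(yi / xi for xi, yi in zip(x, y_reported))

def profitable_lie_exists(x, y_true, lies):
    """Search a grid of lies for any agent who gets a closer prediction."""
    truthful = mv_slope(x, y_true)
    for i, (xi, yi) in enumerate(zip(x, y_true)):
        honest_gap = abs(truthful * xi - yi)
        for lie in lies:
            reported = list(y_true)
            reported[i] = lie          # only agent i deviates
            lying_gap = abs(mv_slope(x, reported) * xi - yi)
            if lying_gap < honest_gap - 1e-12:
                return True
    return False

# Illustrative sample (assumed); ratios are 1.5, 1.0, 1.25, 0.8, 1.1
x = [1.0, 2.0, 4.0, 5.0, 8.0]
y_true = [1.5, 2.0, 5.0, 4.0, 8.8]
lies = [k / 4 for k in range(-20, 61)]   # candidate reports from -5 to 15

slope = mv_slope(x, y_true)
found = profitable_lie_exists(x, y_true, lies)
```

The search finds no profitable lie on this grid, which matches the median voter logic: an agent whose ratio is below the median can only push the median further up, and vice versa.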

slide-25
SLIDE 25

CRM ESTIMATORS

  • The clockwise repeated median (CRM) estimators are a family of strategy-proof estimators valid for every sample $Z = (X, \tilde{Y})$ such that $x_i \neq x_j \ \forall i, j \in N$ ($i \neq j$)
  • They are parameterised by two sets $S, S' \subseteq N$ with either $S \cap S' = \emptyset$ or $S \subseteq S'$. First we calculate the clockwise angle of any pair of declared observations $i, j \in N$: $CWA((x_i, \tilde{y}_i), (x_j, \tilde{y}_j))$

[Formula: $CWA((x_i, \tilde{y}_i), (x_j, \tilde{y}_j))$ is given in terms of $\pi$, $\mathrm{sign}(x_j - x_i)$, and the sign and arctangent of the slope $\frac{\tilde{y}_j - \tilde{y}_i}{x_j - x_i}$]

slide-26
SLIDE 26

CRM ESTIMATORS

  • Then, we define the directing angle,

$$DA(X, \tilde{Y}) = \mathrm{med}_{i \in S} \ \mathrm{med}_{j \in S', \, j \neq i} \ CWA((x_i, \tilde{y}_i), (x_j, \tilde{y}_j))$$

  • And finally, the regression estimator is obtained as

$$\hat{\beta}_1 = \tan\left[\pi - DA(X, \tilde{Y}) - \frac{\pi}{2} \, \mathrm{sign}\big(DA(X, \tilde{Y}) - \pi\big)\right], \qquad \hat{\beta}_0 = \mathrm{med}_{i \in S} (\tilde{y}_i - \hat{\beta}_1 x_i)$$

  • Some members of this class are known estimators:

slide-27
SLIDE 27

CRM ESTIMATORS

  • If $S = S' = N$, we obtain a clockwise extension of the repeated median estimator (Siegel, 1982)
  • If $S = N \setminus \{h\}$, $S' = \{h\}$, we obtain a clockwise extension of the median star estimator (Simon, 1986)
  • If $S = \{h \in N : x_h \leq \mathrm{med}_j \, x_j\}$, $S' = \{h \in N : x_h > \mathrm{med}_j \, x_j\}$, we obtain the Brown and Mood (1951) technique and, slightly changed, Tukey’s (1970/71) resistant line method
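For orientation, here is a minimal sketch of the classical slope-based repeated median, the estimator that the case $S = S' = N$ extends. It is not the clockwise variant from the slides: it takes medians of pairwise slopes rather than of clockwise angles, and it uses the ordinary median instead of the "highest median" convention. The intercept follows the form $\mathrm{med}_i(\tilde{y}_i - \hat{\beta}_1 x_i)$; the function name and data are illustrative assumptions.

```python
from statistics import median

def repeated_median_fit(x, y):
    """Slope-based repeated median; all x values must be distinct."""
    n = len(x)
    # Inner median: for each i, the median slope of lines through i and j
    per_point = [
        median((y[j] - y[i]) / (x[j] - x[i]) for j in range(n) if j != i)
        for i in range(n)
    ]
    b1 = median(per_point)                              # outer median
    b0 = median(yi - b1 * xi for xi, yi in zip(x, y))   # median residual
    return b0, b1

# Points on the line y = 5 - 0.5 x, with the last response replaced by an outlier
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [4.5, 4.0, 3.5, 3.0, 10.0]
b0, b1 = repeated_median_fit(x, y)
```

On this sample the fit recovers the intercept 5 and the slope -0.5 despite the outlier, which illustrates the high breakdown point of this family.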

slide-28
SLIDE 28

CRM ESTIMATORS: AN EXAMPLE

  • Let’s consider $S = S' = N$ and sample $Z = (X, \tilde{Y})$

[Figure: scatter plot with axes $x_i$ and $\tilde{y}_i$ showing 5 declared observations, numbered 1 to 5]

First, we calculate each observation’s clockwise angle.

slide-29
SLIDE 29

CRM ESTIMATORS: AN EXAMPLE

[Figure: the 5 declared observations, numbered 1 to 5]

We start with the first one: first, find the vectors connecting observation 1 with every other observation…

slide-30
SLIDE 30

CRM ESTIMATORS: AN EXAMPLE

[Figure: the 5 declared observations, numbered 1 to 5]

Now, look at the clockwise angle of observation 1 with observation 2.

slide-31
SLIDE 31

CRM ESTIMATORS: AN EXAMPLE

[Figure: the 5 declared observations, numbered 1 to 5]

We first find the median of the 4 clockwise angles of observation 1. Note: when there is an even number of angles, we take the highest median (convention).

slide-32
SLIDE 32

CRM ESTIMATORS: AN EXAMPLE

[Figure: the 5 declared observations, numbered 1 to 5]

We represent the median angle with a small arrow pointing to the corresponding observation, and we proceed to find the median angle for each observation, 1 to 5.

slide-34
SLIDE 34

CRM ESTIMATORS: AN EXAMPLE

[Figure: the 5 declared observations with their median-angle arrows, numbered 1 to 5]

We re-order the arrows from the one with the biggest clockwise angle to the one with the smallest, and we find the median of all of them, i.e., the one starting from observation 3.

slide-35
SLIDE 35

CRM ESTIMATORS: AN EXAMPLE

[Figure: the 5 declared observations, numbered 1 to 5, with the regression line]

Observation 3 is called the directing observation; it points to observation 4, and its clockwise angle is the directing angle. The slope of the regression line is given by the directing angle, and the intercept follows immediately.

slide-36
SLIDE 36

CRM ESTIMATORS: AN EXAMPLE

[Figure: the 5 declared observations, numbered 1 to 5, with the regression line]

The CRM estimators are always such that the regression line passes through two different observations (here 3 and 4). Observations below (above) the regression line always have bigger (smaller) angles than the directing one.

slide-37
SLIDE 37

CRM ESTIMATORS: OTHER EXAMPLE

  • Let’s consider $S = \{1, 2\}$ and $S' = \{4, 5\}$: resistant line

[Figure: scatter plot with axes $x_i$ and $\tilde{y}_i$ showing observations numbered 1 to 5]

slide-38
SLIDE 38

CRM ESTIMATORS: OTHER EXAMPLE

[Figure: observations 1 to 5 with the resistant line regression line]

Agent 3 cannot change the line and agents 2 and 5 cannot be better off. Only 1 and 4 might want to lie.

slide-39
SLIDE 39

CRM ESTIMATORS: OTHER EXAMPLE

[Figure: observations 1 to 5 with the regression line and agent 4’s report $\tilde{y}_4$]

Agent 4’s lies below the regression line will not change it. If agent 4 reports a $\tilde{y}_4$ over the regression line, this will only shift it upwards.

slide-41
SLIDE 41

CRM ESTIMATORS: OTHER EXAMPLE

[Figure: observations 1 to 5 with the new regression line after agent 4’s lie $\tilde{y}_4$]

New regression line with the lie ($\tilde{y}_4$): agent 4 is now worse off, since his prediction is even further away.

slide-42
SLIDE 42

CRM ESTIMATORS: OTHER EXAMPLE

[Figure: observations 1 to 5 with the regression line and agent 1’s report $\tilde{y}_1$]

Agent 1 cannot change the line with lies over it, and can only shift it downwards by using lies ($\tilde{y}_1$) below the regression line; therefore lying does not pay.

slide-43
SLIDE 43

THE SIMULATION RESULTS

  • We undertake a Monte Carlo experiment comparing the OLS estimates when the sample is strategically manipulated with some CRM estimators that avoid manipulation but are biased. Two DGPs:
  • DGP1: $y_i = 5 - 0.5 x_i + e_i$, where $e_i \sim N(0, 1)$ i.i.d.
  • DGP2: $y_i = -5 + 0.5 x_i + e_i$, where $e_i \sim N(0, 1)$ i.i.d.
  • We must also assume a sample contamination process for OLS regression (somewhat arbitrary). In particular, less than 1/3 of the observations on average were strategically contaminated
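The shape of such an experiment can be sketched as follows. Everything beyond DGP1 is an assumption made for illustration: the contamination rule (each agent, with probability 0.3, exaggerates its own OLS residual by a hypothetical factor), the design points $x_i = 1, \ldots, 20$, the number of replications, and the use of the classical slope-based repeated median as a stand-in for the CRM estimators. The slides do not specify the contamination process they used.

```python
import random
from statistics import mean, median

def ols_fit(x, y):
    """Closed-form OLS for simple regression: returns (b0, b1)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1

def repeated_median_fit(x, y):
    """Classical slope-based repeated median (stand-in for CRM)."""
    n = len(x)
    b1 = median(
        median((y[j] - y[i]) / (x[j] - x[i]) for j in range(n) if j != i)
        for i in range(n)
    )
    return median(yi - b1 * xi for xi, yi in zip(x, y)), b1

def contaminate(x, y, rng, share=0.3, factor=2.0):
    """HYPOTHETICAL lying rule: some agents exaggerate their OLS residual."""
    b0, b1 = ols_fit(x, y)
    return [
        yi + factor * (yi - (b0 + b1 * xi)) if rng.random() < share else yi
        for xi, yi in zip(x, y)
    ]

def simulate(reps=200, n=20, seed=0):
    """Average slope estimate per method under DGP1: y = 5 - 0.5 x + e."""
    rng = random.Random(seed)
    x = [float(i) for i in range(1, n + 1)]
    slopes = {"ols": [], "contaminated_ols": [], "repeated_median": []}
    for _ in range(reps):
        y = [5.0 - 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]
        slopes["ols"].append(ols_fit(x, y)[1])
        slopes["contaminated_ols"].append(ols_fit(x, contaminate(x, y, rng))[1])
        slopes["repeated_median"].append(repeated_median_fit(x, y)[1])
    return {name: mean(values) for name, values in slopes.items()}

summary = simulate()
```

The robust estimator is fed the honest sample on the premise that a strategy-proof rule gives agents no reason to lie, while OLS is evaluated both on the honest sample and on the contaminated one. A fuller comparison would also track the spread of the estimates, not only their mean.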

slide-44
SLIDE 44

THE SIMULATION RESULTS

FIGURE 4: DGP $y_i = 5 - 0.5 x_i + e_i$; $V(e_i) = 1$.

[Figure: line chart; horizontal axis: Observations (ticks 1 to 19); vertical axis from -15 to 15; series: OLS, Repeated Median, Resistant Line, Median Star, Contaminated OLS]

slide-45
SLIDE 45

THE SIMULATION RESULTS

FIGURE 5: DGP $y_i = -5 + 0.5 x_i + e_i$; $V(e_i) = 1$.

[Figure: line chart; horizontal axis: Observations (ticks 1 to 19); vertical axis from -6 to 6; series: OLS, Repeated Median, Resistant Line, Median Star, Contaminated OLS]

slide-46
SLIDE 46

THE SIMULATION RESULTS

FIGURE 6: SIMULATED HISTOGRAMS FOR THE REGRESSION INTERCEPT ($y_i = 5 - 0.5 x_i + e_i$; $V(e_i) = 0.01$)

[Figure: histograms; horizontal axis from 4.0 to 5.8; vertical axis from 1 to 10; series: OLS, Contaminated OLS, Resistant Line, Repeated Median, Median Star]

slide-47
SLIDE 47

THE SIMULATION RESULTS

FIGURE 7: SIMULATED HISTOGRAMS FOR THE REGRESSION SLOPE ($y_i = 5 - 0.5 x_i + e_i$; $V(e_i) = 0.01$)

[Figure: histograms; horizontal axis from -0.60 to -0.41; vertical axis from 10 to 100; series: OLS, Contaminated OLS, Resistant Line, Repeated Median, Median Star]

slide-48
SLIDE 48

CONCLUSIONS

  • In some contexts, strategy-proofness might be an important desirable property when the information extraction is linked to resource allocation or policy assessment and part of the sample is private information
  • In these cases, a loss in consistency from using a CRM estimator instead of OLS might be a low price to pay for an “honestly revealed” sample
  • The commitment to use the CRM estimator for the resource allocation after extracting the information must be clear: there might be an inconsistency problem

slide-49
SLIDE 49

CONCLUSIONS

  • Most CRM estimators also have high breakdown points and are robust to contamination by outliers
  • CRM estimators only work for single-peaked preferences. If agents have other objectives, like minimising $\hat{y}_i - \tilde{y}_i$ instead of $\hat{y}_i - y_i$, for example, the search for strategy-proof estimators must start again