SLIDE 1

Multi-Observation Elicitation

Sebastian Casalaina-Martin (Colorado), Rafael Frongillo (Colorado), Tom Morgan (Harvard), Bo Waggoner (UPenn)
July 2017

SLIDE 2

Background: Properties of distributions

Property or statistic of a probability distribution: a map

$$\Gamma : \Delta_\mathcal{Y} \to \mathbb{R}$$

Examples:
- mean: $\Gamma(p) = \mathbb{E}_{Y \sim p}[Y]$
- entropy: $\Gamma(p) = \sum_y p(y) \log \frac{1}{p(y)}$
- mode: $\Gamma(p) = \operatorname{argmax}_y p(y)$
- variance: $\Gamma(p) = \mathbb{E}_{Y \sim p}[(Y - \mathbb{E}[Y])^2]$
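For concreteness, a minimal Python sketch (my illustration, not from the slides) computing these four properties for an arbitrary discrete distribution:

```python
import numpy as np

# Discrete distribution p over outcomes Y = {1, 2, 3}.
ys = np.array([1.0, 2.0, 3.0])
p = np.array([0.2, 0.3, 0.5])

mean = np.sum(p * ys)                    # Gamma(p) = E[Y]
entropy = np.sum(p * np.log(1.0 / p))    # Gamma(p) = sum_y p(y) log(1/p(y))
mode = ys[np.argmax(p)]                  # Gamma(p) = argmax_y p(y)
variance = np.sum(p * (ys - mean) ** 2)  # Gamma(p) = E[(Y - E[Y])^2]

print(mean, entropy, mode, variance)
```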
SLIDE 3

Background: Elicitation (1)

If we minimize expected loss under a distribution $p$, what property of $p$ do we get?

$$r^* = \operatorname{argmin}_{r \in \mathcal{R}} \ \mathbb{E}_{Y \sim p}\, \ell(r, Y) \quad \text{(minimize loss)}$$

$$\Gamma(p) = \psi(r^*) \quad \text{(link)}$$

Motivation: statistically consistent losses. Finite property space: classification, ranking, ... $\Gamma(p) \in \mathbb{R}^d$: regression, ...
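As a worked instance of this setup (standard, though not spelled out on the slide): take squared loss $\ell(r, y) = (r - y)^2$ and the identity link. Then

$$\frac{d}{dr}\, \mathbb{E}_{Y \sim p}(r - Y)^2 = 2\left(r - \mathbb{E}_{Y \sim p}[Y]\right) = 0 \iff r^* = \mathbb{E}_{Y \sim p}[Y],$$

so minimizing expected squared loss elicits the mean.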

SLIDE 4

Background: Elicitation (2)


Examples:
- The mean is elicited by squared loss.
- Variance: elicit the mean and second moment, then link.
- Any property is a link from the whole distribution... but then the dimension of the prediction $r$ is unbounded.
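A small numerical sketch of the variance construction just described (my illustration; the talk gives no code): elicit the pair (mean, second moment), each coordinate via squared loss, then apply the link $\psi(r_1, r_2) = r_2 - r_1^2$.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=100_000)  # samples from p

# Report r = (r1, r2); squared loss on Y and on Y^2 elicits
# the mean and the second moment, respectively.
def expected_loss(r):
    return np.mean((r[0] - y) ** 2 + (r[1] - y ** 2) ** 2)

r_star = minimize(expected_loss, x0=np.zeros(2)).x

# Link: psi(r1, r2) = r2 - r1^2 recovers the variance.
print(r_star[1] - r_star[0] ** 2)  # ~ 1.5^2 = 2.25
```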

SLIDE 5

This paper

What if the loss takes multiple i.i.d. observations?

$$r^* = \operatorname{argmin}_{r \in \mathcal{R}} \ \mathbb{E}_{Y_1, \dots, Y_m \sim p}\, \ell(r, Y_1, \dots, Y_m)$$

Examples:
- $\mathrm{Var}(p) = \operatorname{argmin}_r \mathbb{E}\left(r - \tfrac{1}{2}(Y_1 - Y_2)^2\right)^2$, since $\tfrac{1}{2}(Y_1 - Y_2)^2$ is an unbiased estimator of the variance.
- 2-norm: unbounded dimension → 1 dimension, 2 observations!

Motivating applications: crowd labeling; numerical simulations (climate science, engineering, ...); regression?
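A quick numerical check of the variance example (a sketch I added, with an arbitrary choice of $p$): the minimizer of the empirical two-observation loss lands at $\mathrm{Var}(p)$.

```python
import numpy as np

rng = np.random.default_rng(1)
# i.i.d. pairs (Y1, Y2) from p = Exponential(scale=2), so Var(p) = 4.
y1 = rng.exponential(scale=2.0, size=200_000)
y2 = rng.exponential(scale=2.0, size=200_000)

# Two-observation loss: (r - (1/2)(Y1 - Y2)^2)^2. Squared loss elicits
# the mean of its target, and E[(1/2)(Y1 - Y2)^2] = Var(p).
z = 0.5 * (y1 - y2) ** 2
grid = np.linspace(0.0, 10.0, 1001)
emp_loss = [np.mean((r - z) ** 2) for r in grid]

print(grid[np.argmin(emp_loss)])  # ~ 4.0
```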

SLIDE 6

Key concepts from prior research

Elicitable properties have convex level sets, linear structure. Simplex on $\mathcal{Y} = \{1, 2, 3\}$:

[Figure: two probability simplices over {1, 2, 3}. Mode: the set of all distributions with mode 3. Mean: the set of all distributions with mean 2.]
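The standard one-line reason behind the convexity claim (my gloss, not on the slide): expected loss is linear in $p$. If $r^*$ minimizes expected loss under both $p$ and $q$, then for any mixture $p_\lambda = \lambda p + (1 - \lambda) q$,

$$\mathbb{E}_{Y \sim p_\lambda}\, \ell(r, Y) = \lambda\, \mathbb{E}_{Y \sim p}\, \ell(r, Y) + (1 - \lambda)\, \mathbb{E}_{Y \sim q}\, \ell(r, Y),$$

so $r^*$ minimizes the mixture's expected loss as well, and $\Gamma(p_\lambda) = \psi(r^*) = \Gamma(p)$. Hence each level set of an elicitable property is convex.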

SLIDE 7

Results (1)

Geometric approach

Summary: k-observation level sets ↔ zeros of degree-k polynomials
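Where the degree-$k$ polynomials come from (my gloss on this summary): with $k$ i.i.d. observations, the expected loss is a degree-$k$ polynomial in the probabilities $p(y)$,

$$\mathbb{E}_{Y_1, \dots, Y_k \sim p}\, \ell(r, Y_1, \dots, Y_k) = \sum_{y_1, \dots, y_k} p(y_1) \cdots p(y_k)\, \ell(r, y_1, \dots, y_k),$$

so the set of distributions on which two reports tie is the zero set of a degree-$k$ polynomial in $p$.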

SLIDE 8

Results (2)

Upper and lower bounds.

Key example: the (integer) $k$-norm, $\|p\|_k = \left(\sum_y p(y)^k\right)^{1/k}$.

Idea: $\mathbb{1}[Y_1 = \cdots = Y_k]$ is an unbiased estimator of $\|p\|_k^k$, since $\mathbb{E}\,\mathbb{1}[Y_1 = \cdots = Y_k] = \sum_y p(y)^k$.

$$\mathrm{Loss}(r, Y_1, \dots, Y_k) = \left(r - \mathbb{1}[Y_1 = \cdots = Y_k]\right)^2, \qquad \mathrm{Link}(r) = r^{1/k}.$$

A similar approach works for products of expectations. Lower bound: the $k$-norm requires $k$ observations. The lower-bound approach is general (algebraic geometry).
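A minimal sketch of this construction (mine, with an arbitrary $p$ and $k = 3$): since squared loss elicits the mean of its target, the empirical minimizer is simply the fraction of all-equal $k$-tuples.

```python
import numpy as np

rng = np.random.default_rng(2)
p = np.array([0.5, 0.3, 0.2])
k = 3

# Draw i.i.d. k-tuples (Y1, ..., Yk) from p.
samples = rng.choice(len(p), size=(500_000, k), p=p)

# Loss(r, Y1..Yk) = (r - 1[Y1 = ... = Yk])^2; its empirical minimizer
# is the mean of the indicator, which estimates sum_y p(y)^k.
all_equal = np.all(samples == samples[:, :1], axis=1)
r_star = np.mean(all_equal)

print(r_star ** (1.0 / k))          # Link(r) = r^(1/k)
print(np.sum(p ** k) ** (1.0 / k))  # true k-norm, for comparison
```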

SLIDE 9

Why could this be useful?

Problem: Regress $x$ vs. $\mathrm{Var}(y \mid x)$. Old approach: regress on the mean and second moment, then link.

[Figure: left panel, data $y$ vs. $x$ with the mean, fitted mean, 2nd moment, and fitted 2nd moment; right panel, $\mathrm{Var}(y \mid x)$ vs. $x$, comparing the fitted variance from the old approach against the new approach, which matches the true variance.]

⇒ Requires good modeling and sufficient data for these (unimportant) proxies!
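A compact sketch of this comparison on synthetic data (my illustration; the talk's actual experiment may differ): the new approach regresses directly on the unbiased two-observation variance estimate, while the old approach links two separately fitted linear models, one of which (the second moment) is misspecified.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data: Var(y|x) = 0.3 * x is linear in x; two i.i.d. draws
# of y per x. Note E[y^2|x] = x^2 + 0.3x is quadratic, so a linear
# model for the second moment is misspecified.
n = 20_000
x = rng.uniform(1.0, 5.0, size=n)
std = np.sqrt(0.3 * x)
y1 = x + std * rng.standard_normal(n)
y2 = x + std * rng.standard_normal(n)

X = np.column_stack([np.ones(n), x])  # linear model w0 + w1 * x

# New approach: regress on z = (1/2)(Y1 - Y2)^2, an unbiased estimate
# of Var(y|x), with squared loss (ordinary least squares).
z = 0.5 * (y1 - y2) ** 2
w_new, *_ = np.linalg.lstsq(X, z, rcond=None)

# Old approach: fit the mean and second moment, then link
# Var = E[Y^2] - (E[Y])^2.
w_mean, *_ = np.linalg.lstsq(X, y1, rcond=None)
w_m2, *_ = np.linalg.lstsq(X, y1 ** 2, rcond=None)

x_test = np.array([1.0, 3.0, 5.0])
X_test = np.column_stack([np.ones_like(x_test), x_test])

print(X_test @ w_new)                              # ~ [0.3, 0.9, 1.5]
print(X_test @ w_m2 - (X_test @ w_mean) ** 2)      # badly biased
print(0.3 * x_test)                                # true Var(y|x)
```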

SLIDE 10

Future directions

- Elicitation frontiers and $(d, m)$-elicitability (in paper: central moments)
- Regression (in paper: preliminary results)
- Additional useful examples (e.g. the expected max of $k$ draws; risk measures)

Lots of COLT questions for multi-observation losses! Thanks!

SLIDE 11

Aside - comparison to property testing

Property Testing:
- An algorithmic problem: the distribution $p$ is initially unknown.
- The algorithm draws samples to estimate a property or test a hypothesis.

Property Elicitation:
- Existential questions, e.g....
  - Does there exist a one-dimensional loss function eliciting variance? No.
  - Two-dimensional? Yes.
- Describe all losses directly eliciting the mean: divergences.
