

slide-1
SLIDE 1

Estimating Discrete Choice Models with Market Level Zeroes: An Application to Scanner Data

Amit Gandhi, Zhentong Lu, Xiaoxia Shi

University of Wisconsin-Madison

February 3, 2015

slide-2
SLIDE 2

Introduction

I Zeroes are highly prevalent in choice data

I Discrete choice models (à la McFadden) were designed to explain corner solutions in individual demand

I Our research program: The empirical analysis of choice

data with market zeroes.

I Zero demand for a choice alternative after summing over

sample of consumers in a market

I A major feature of choice data from a diversity of

environments

I Causes serious problems for standard estimation

techniques

slide-3
SLIDE 3

Scanner Data

I Store level scanner data covering all Dominick’s Finer

Foods (DFF) stores in Chicago from 1989-1997

I ≈ 80 stores over 300 weeks

I For each week/store/UPC (universal product code) observation:

  I price

  I quantity

  I marketing (display, feature, etc.)

  I product characteristics (brand, size, premium, etc.)

  I wholesale price

slide-4
SLIDE 4

Product Variety

Category               Avg. No. of UPCs      Percent of Total Sales     Percent of
                       in a Store/Week       of the Top 20% UPCs        Zero Sales
Analgesics             224                   80.12%                     58.02%
Beer                   179                   87.18%                     50.45%
Bottled Juices         187                   74.40%                     29.87%
Cereals                212                   72.08%                     27.14%
Canned Soup            218                   76.25%                     19.80%
Fabric Softeners       123                   65.74%                     43.74%
Laundry Detergents     200                   65.52%                     50.46%
Refrigerated Juices    91                    83.18%                     27.83%
Soft Drinks            537                   91.21%                     38.54%
Toothbrushes           137                   73.69%                     58.63%
Canned Tuna            118                   82.74%                     35.34%
Toothpastes            187                   74.19%                     51.93%
Bathroom Tissues       50                    84.06%                     28.14%

slide-5
SLIDE 5

Long Tail

slide-6
SLIDE 6

Long Tail

slide-7
SLIDE 7

A “Big Data” Problem

I Quan and Williams (2014): data on 13.5 million shoe sales

across 100,000 products from online retailer

slide-8
SLIDE 8

A “Big Data” Problem

I Marwell (2014): collects daily data on project donations to

Kickstarter

Kickstarter

#Projects           90,876
#Days               555
Project-Day Obs.    2,615,839

             #Daily Projects    %Zero Contribution
Mean         4,713              0.59
Std. Dev.    1,706              0.14

slide-9
SLIDE 9

A “not so Big Data” Problem

I Nurski and Verboven (2013): Belgian data on 488 car

models in 588 towns for 2 consumer types (men and women).

slide-10
SLIDE 10

Discrete Choice Model

I Classic McFadden (1973, 1980) discrete choice model

I Markets $t = 1, \dots, T$ are the store/week realizations (a menu of products, prices, and promotion)

I Products $j = 1, \dots, J_t$ with attributes $x_{jt} \in \mathbb{R}^{d_w}$

I Consumers $i = 1, \dots, N_t$ with “demographics” $w_{it} \in \mathbb{R}^{d_z}$

$$
u_{ijt} = \begin{cases} \delta_{jt} + w_{it} \Gamma x_{jt} + \varepsilon_{ijt} & \text{if } j > 0 \\ \varepsilon_{i0t} & \text{if } j = 0 \end{cases}
$$

I BLP (1995) add the new layer:

$$
\delta_{jt} = \beta x_{jt} + \xi_{jt}
$$

slide-11
SLIDE 11

The Zeroes Problem

I Consider simplest case of “simple logit” ($\Gamma = 0$)

$$
\log\left(\frac{s_{jt}}{s_{0t}}\right) = \beta x_{jt} + \xi_{jt}, \qquad j = 1, \dots, J_t,
$$

where $E[\xi_{jt} \mid z_{jt}] = 0$.

I If $s_{jt} = 0$ then $\log(s_{jt})$ does not exist (or can only be defined as $-\infty$).

I However dropping zeroes induces selection bias

$$
E[\xi_{jt} \mid z_{jt}, s_{jt} > 0] \neq 0
$$

I IV estimation is asymptotically biased for $\beta$ (and the bias can be severe).

  I Will depend on the strength of this selection effect.

slide-12
SLIDE 12

Questions

I Why would the model generate an estimating equation that can’t be estimated with the data?

I Is it a deep rejection of the choice model?

I Is it a problem with the empirical strategy of taking model to data?

slide-13
SLIDE 13

Identification

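I For simplicity focus on “simple logit” ($\Gamma = 0$)

$$
\pi_{jt} = \frac{e^{\delta_{jt}}}{\sum_{k=0}^{J_t} e^{\delta_{kt}}}, \qquad j = 0, \dots, J_t.
$$

I consumer variation

$$
\delta_{jt} = \log\left(\frac{\pi_{jt}}{\pi_{0t}}\right) = \sigma_j^{-1}(\pi_t)
$$

I products/markets variation

$$
\delta_{jt} = \beta x_{jt} + \xi_{jt} \implies \beta = \left(E\left[z_{jt}' x_{jt}\right]\right)^{-1} E\left[z_{jt}' \delta_{jt}\right]
$$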
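The two logit mappings on this slide can be checked numerically. The following is a minimal sketch, assuming the outside good's mean utility is normalized to zero; the function names and the example utilities are illustrative and not taken from the slides.

```python
import math

def logit_shares(delta):
    """Choice probabilities (pi_0, ..., pi_J) from mean utilities delta_1..delta_J.

    The outside good (index 0) has its mean utility normalized to 0.
    """
    expd = [1.0] + [math.exp(d) for d in delta]
    total = sum(expd)
    return [e / total for e in expd]

def invert_shares(pi):
    """Logit inversion: delta_j = log(pi_j / pi_0) for j = 1..J."""
    return [math.log(p / pi[0]) for p in pi[1:]]

delta = [1.0, -0.5, -3.0]        # hypothetical mean utilities
pi = logit_shares(delta)
recovered = invert_shares(pi)    # matches delta up to rounding error
```

If any entry of the probability vector is exactly zero, `math.log` raises `ValueError`, which is the log-of-zero problem in miniature.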

slide-14
SLIDE 14

Estimation

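I Standard estimation (aka BLP) uses sample analogues of both stages.

I Replace $\pi_{jt}$ with

$$
\hat{\pi}^{MLE}_{jt} = s_{jt} = \frac{\sum_i y_{ijt}}{n_t}, \quad \text{which implies} \quad \hat{\delta}^{MLE}_{jt} = \sigma_j^{-1}(s_t)
$$

I Plug $\hat{\delta}^{MLE}_{jt}$ into 2SLS

$$
\hat{\beta} = \left(\sum_{t=1}^{T} \sum_{j=1}^{J_t} z_{jt}' x_{jt}\right)^{-1} \sum_{t=1}^{T} \sum_{j=1}^{J_t} z_{jt}' \hat{\delta}^{MLE}_{jt}
$$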

slide-15
SLIDE 15

What is happening?

I Source of problem is $\hat{\delta}_{jt} = \sigma_j^{-1}\left(\hat{\pi}^{MLE}_t\right)$

  I does not exist when $\hat{\pi}^{MLE}_{jt} = 0$

I Why use MLE in the first place?

  I MLE is potentially a bad estimator when choice data are sparse.

I Very old problem

  I Laplace’s “Law of Succession”

  I Multinomial cell probabilities and sparse contingency tables

I Zeroes arise when some $\pi_{jt}$’s are small and $n_t$ is finite.

I Treating $n_t$ as finite but $JT \to \infty$ makes $\pi_{jt}$, and hence $\delta_{jt}$, an incidental parameter.

slide-16
SLIDE 16

Bayesian Analysis of Multinomial Cells

I Consider multinomial probabilities

$$
\pi_t = (\pi_{0t}, \dots, \pi_{J_t t}) \in \Delta^{J_t}
$$

I We observe quantities $q_t = (q_{0t}, q_{1t}, \dots, q_{J_t t})$ for $n_t$ consumers

I The likelihood of $\pi_t$ is $q_t \sim MN(n_t, \pi_t)$

I Conjugate prior is $\pi_t \sim Dir(\alpha_{0t}, \dots, \alpha_{J_t t})$

  I Uniform prior: $\alpha_{jt} = 1$ (Laplace/De Morgan)

  I Non-informative prior: $\alpha_{jt} = .5$ (Jeffreys/Bernardo)

I Posterior is

$$
\pi_t \mid q_t, n_t \sim Dir(\alpha_{0t} + q_{0t}, \alpha_{1t} + q_{1t}, \dots, \alpha_{J_t t} + q_{J_t t})
$$

slide-17
SLIDE 17

Laplace’s “Law of Succession”

I “What is the probability the sun will rise tomorrow given that it has risen every day until now?”

I He used a uniform prior $\alpha_{jt} = 1$

I Bayesian estimate

$$
\hat{\pi}_{jt} = E[\pi_{jt} \mid q_t, n_t] = \frac{q_{jt} + 1}{n_t + J_t + 1}
$$

I $\hat{\pi}_{jt}$ “shrinks” empirical share $s_{jt}$ towards prior mean $1/(J_t + 1)$

I $\hat{\pi}_{jt}$ is consistent (like $s_{jt}$), i.e., $\hat{\pi}_{jt} \to_p \pi_{jt}$

  I Data dominates the prior in large samples
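The shrinkage on this slide is easy to see on a small example. The sketch below computes the posterior mean under a symmetric Dirichlet prior, which reduces to Laplace's rule when the prior parameter is 1; the function name and the counts are hypothetical.

```python
def posterior_mean_shares(q, alpha=1.0):
    """Posterior mean of the cell probabilities under a symmetric Dir(alpha) prior.

    q[0] is the outside good and q[1:] are the inside products, so len(q) = J + 1.
    With alpha = 1 this is Laplace's rule (q_j + 1) / (n + J + 1).
    """
    n = sum(q)
    cells = len(q)
    return [(qj + alpha) / (n + alpha * cells) for qj in q]

q = [7, 3, 0, 0]                              # hypothetical counts, n = 10, J = 3
empirical = [qj / sum(q) for qj in q]         # contains exact zeros
laplace = posterior_mean_shares(q, alpha=1.0) # every cell is strictly positive
```

Here the two products with zero sales each receive a Laplace share of 1/14 rather than 0, so the log share ratio is defined for every product.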

slide-18
SLIDE 18

Demand Application

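I We want to estimate

$$
\hat{\delta}_{jt} = E\left[\log\left(\frac{\pi_{jt}}{\pi_{0t}}\right) \,\middle|\, q_t, n_t\right] = \psi(\alpha_{jt} + q_{jt}) - \psi(\alpha_{0t} + q_{0t})
$$

where $\psi$ is the digamma function.

I Use $\hat{\delta}_t$ to compute “optimal market shares”

$$
\pi^*_{kt} = \frac{\exp(\hat{\delta}_{kt})}{1 + \sum_{j=1}^{J_t} \exp(\hat{\delta}_{jt})}
$$

I Plug optimal shares into 2SLS

$$
\hat{\beta} = \left(\sum_{t=1}^{T} \sum_{j=1}^{J_t} z_{jt}' x_{jt}\right)^{-1} \sum_{t=1}^{T} \sum_{j=1}^{J_t} z_{jt}' \sigma_j^{-1}(\pi^*_t)
$$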
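The posterior mean utilities and the resulting optimal shares can be computed directly from counts. This is a rough sketch under stated assumptions: the digamma function is approximated here with the standard recurrence plus asymptotic series (so no external library is needed), the prior is the uniform one from the Laplace slide, and all function names and counts are hypothetical.

```python
import math

def digamma(x):
    """Approximate psi(x) for x > 0: shift x upward with psi(x) = psi(x + 1) - 1/x,
    then apply the asymptotic series for large arguments."""
    if x <= 0:
        raise ValueError("this sketch supports x > 0 only")
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result

def posterior_mean_utilities(q, alpha):
    """delta_hat_j = psi(alpha_j + q_j) - psi(alpha_0 + q_0) for j = 1..J."""
    base = digamma(alpha[0] + q[0])
    return [digamma(a + c) - base for a, c in zip(alpha[1:], q[1:])]

def optimal_shares(delta_hat):
    """Logit shares implied by delta_hat; index 0 is the outside good."""
    expd = [1.0] + [math.exp(d) for d in delta_hat]
    total = sum(expd)
    return [e / total for e in expd]

q = [7, 3, 0, 0]                  # hypothetical counts, outside good first
alpha = [1.0] * len(q)            # uniform (Laplace) prior
delta_hat = posterior_mean_utilities(q, alpha)
pi_star = optimal_shares(delta_hat)
```

Both zero-sales products get the same finite utility, psi(1) - psi(8), so they stay in the second-stage regression instead of being dropped.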

slide-19
SLIDE 19

Why is this a good estimator?

I We take a “Frequentist” interpretation of the prior

  I “Empirical Bayes” approach.

I Choice probabilities $\pi_t$ are the endogenous variable of the structural model.

I Let $z_t = (z_{1t}, \dots, z_{J_t t})$ be the collection of exogenous variables

I Then the conditional distribution $\pi_t \mid z_t$ is the reduced form of the structural model

I Prior distribution = Reduced form

slide-20
SLIDE 20

Asymptotic Bias

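I Finite $n_t$ implies $\hat{\beta}$ will in general have asymptotic bias.

$$
\operatorname{plim}_{JT \to \infty} \hat{\beta} = \beta + Q_{xz}^{-1} E\left[z_{jt}' \left(\sigma_{jt}^{-1}(\hat{\pi}_t) - \sigma_{jt}^{-1}(\pi_t)\right)\right], \quad \text{where } Q_{xz} = E\left[z_{jt}' x_{jt}\right].
$$

Theorem

If optimal market shares $\pi^*_t$ are constructed from the “correct prior” $F_{\pi \mid z_t} = F^0_{\pi_t \mid z_t}$ then

$$
E\left[z_{jt}' \left(\sigma_{jt}^{-1}(\pi^*_t) - \sigma_{jt}^{-1}(\pi_t)\right)\right] = 0
$$

I Thus optimal market shares give consistent estimates $\hat{\beta} \to_p \beta$.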

slide-21
SLIDE 21

Robust Prior

I What happens if we are not exactly right about prior, i.e., $F_{\pi \mid z_t} \approx F^0_{\pi_t \mid z_t}$?

I Use the “Robust Priors” approach of Arellano and Bonhomme (ECMA 2009).

Theorem

If the prior $F_t \neq F^0_t$ is not exact then

$$
E\left[z_{jt}' \left(\sigma_{jt}^{-1}(\hat{\pi}_t) - \sigma_{jt}^{-1}(\pi_t)\right)\right] = n_t^{-1} \, KLIC\left(F^0_t, F_t\right) + o\left(n_t^{-1}\right)
$$

I So long as prior is sensible (and $n_t$ relatively large) the bias reduction will be good (and much better than the implicit MLE prior)

slide-22
SLIDE 22

Dirichlet and the Long Tail

I Dirichlet is a conjugate prior

  I gives closed form optimal shares $\pi^*_t$

I Dirichlet prior also gives rise to the long tail

  I A key feature of demand data.

Theorem

If $q_t \sim MN(n_t, \pi_t)$ and $\pi_t \sim Dir(\alpha \cdot 1_{J_t+1})$ (symmetric Dirichlet) then (for large $J_t$) the quantity histogram will exhibit the long tail shape (Pareto decay)

I A restatement of Chen (1980) on probability foundations for Zipf’s Law

I $\alpha$ is the concentration parameter

slide-23
SLIDE 23

An Illustration

500 Products and 10,000 consumers

Figure : Zipf’s Law and the Symmetric Dirichlet
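A simulation along the lines of this illustration can be sketched as follows, using 500 products and 10,000 consumers as on the slide. The concentration parameter 0.5, the seed, and the function name are assumptions made for the example; the slide does not state which value was used for the figure.

```python
import random

def simulate_ranked_quantities(num_products, num_consumers, alpha, seed=0):
    """Draw pi ~ symmetric Dir(alpha), then q ~ MN(n, pi); return q sorted descending."""
    rng = random.Random(seed)
    # A Dirichlet draw is a vector of independent Gamma(alpha, 1) variates, normalized.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_products)]
    total = sum(gammas)
    probs = [g / total for g in gammas]
    # Multinomial draw: each consumer picks one product with probabilities probs.
    counts = [0] * num_products
    for j in rng.choices(range(num_products), weights=probs, k=num_consumers):
        counts[j] += 1
    return sorted(counts, reverse=True)

ranked = simulate_ranked_quantities(500, 10_000, alpha=0.5)
top_20pct_share = sum(ranked[:100]) / sum(ranked)       # sales share of the top 20% of products
zero_fraction = sum(1 for c in ranked if c == 0) / len(ranked)
```

Plotting `ranked` against rank on log-log axes is one way to inspect the tail; smaller values of the concentration parameter concentrate sales on fewer products.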

slide-24
SLIDE 24

Picking the Prior

I Jeffreys’ prior

$$
\pi_t \mid z_t \sim Dir(.5 \cdot 1_{J_t+1})
$$

I If $\pi_t \sim Dir(\alpha \cdot 1_{J_t+1})$ and $q_t \sim MN(n_t, \pi_t)$ then

$$
q_t \sim \text{DirichletMultinomial}(\alpha)
$$

I $\hat{\alpha}$ can be estimated with MLE.

I More generally we can allow $\alpha_{jt} = \gamma z_{jt}$ and estimate $\hat{\gamma}$ (built into Stata).

I We can also allow for mixtures of Dirichlet priors for increased flexibility at little analytic cost

  I Posterior is also a mixture of Dirichlet distributions
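Estimating the concentration parameter by maximum likelihood can be illustrated on simulated data. This is a minimal sketch for the symmetric case only, assuming a plain grid search rather than whatever optimizer a packaged routine would use; the market sizes, the true value 0.5, and all function names are hypothetical.

```python
import math
import random

def draw_market(rng, num_cells, n, alpha):
    """One market: pi ~ symmetric Dir(alpha), then q ~ MN(n, pi)."""
    gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_cells)]
    q = [0] * num_cells
    # random.choices accepts unnormalized weights, so the gammas can be used directly.
    for j in rng.choices(range(num_cells), weights=gammas, k=n):
        q[j] += 1
    return q

def dm_loglik(alpha, markets):
    """Symmetric Dirichlet-multinomial log-likelihood, omitting the multinomial
    coefficient because it does not depend on alpha."""
    total = 0.0
    for q in markets:
        cells, n = len(q), sum(q)
        total += math.lgamma(cells * alpha) - math.lgamma(n + cells * alpha)
        total += sum(math.lgamma(c + alpha) - math.lgamma(alpha) for c in q)
    return total

def fit_alpha(markets, grid):
    """Grid-search MLE: the grid point with the highest log-likelihood."""
    return max(grid, key=lambda a: dm_loglik(a, markets))

rng = random.Random(0)
markets = [draw_market(rng, num_cells=30, n=300, alpha=0.5) for _ in range(20)]
grid = [10 ** (e / 20) for e in range(-60, 41)]   # log-spaced from 0.001 to 100
alpha_hat = fit_alpha(markets, grid)
```

The covariate-dependent version with $\alpha_{jt} = \gamma z_{jt}$ would replace the scalar `alpha` by a function of the instruments and search over $\gamma$ instead.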

slide-25
SLIDE 25

Mixed Logit

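I All the theory generalizes to mixed logit models:

$$
\left(\hat{\lambda}_{Bayes}', \hat{\beta}_{Bayes}'\right)' = \arg\min_{\lambda, \beta} \; \bar{m}^{Bayes}_T(\lambda, \beta)' \, W_T \, \bar{m}^{Bayes}_T(\lambda, \beta), \tag{0.1}
$$

where $\bar{m}^{Bayes}_T(\lambda, \beta) = T^{-1} \sum_{t=1}^{T} m^{Bayes}_t(\lambda, \beta)$ with

$$
m^{Bayes}_t(\lambda, \beta) = J_t^{-1} \sum_{j=1}^{J_t} z_{jt} \left[\sigma_j^{-1}\left(\pi^{Bayes}_t, x_t; \lambda\right) - x_{jt}' \beta\right], \tag{0.2}
$$

and

$$
\pi^{Bayes}_t := \sigma\left(\delta^{post}_t(\lambda \mid q_t), x_t; \lambda\right). \tag{0.3}
$$

I $\sigma_j^{-1}(\pi_t, x_t; \lambda) \approx \log\left(\pi_{jt} / \pi_{0t}\right)$ with second order approximation (Gandhi and Nevo 2013)

  I Log of zero is the first order problem for mixed logit

  I Can use logit optimal shares as an approximation to the optimal shares in general.
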
slide-26
SLIDE 26

Monte Carlo I: Binary Logit

I DGP

  I utility function:

$$
u_{it} = \begin{cases} \alpha + \beta x_t + \xi_t + \varepsilon_{it} & \text{inside good} \\ \varepsilon_{0t} & \text{outside good} \end{cases}
$$

  I random draws: $x_t \sim \text{Uniform}[0, 15]$, $\varepsilon_{it} \sim T1EV$, $\xi_t \sim N(0, .5^2 \times x_t)$

  I $\beta = 1$, $\alpha$ varies to produce different fractions of zeros

I Results

Fraction of Zeros    16.48%    36.90%    49.19%    63.70%
Empirical Share      .3833     .6589     .7965     .9424
Laplace Share        .2546     .5394     .6978     .8476
Optimal Share        -.0798    -.0924    -.0066    .0362

Note: T = 500, n = 10,000, Number of Repetitions = 1,000.
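One replication of a design like this can be sketched in a few lines. The example below is scaled down (T = 300, n = 1,000) to run quickly, uses a hypothetical intercept of -11, treats $.5^2 \times x_t$ as the variance of $\xi_t$, and uses plain OLS on the share inversion. It compares only the empirical-share and Laplace-share estimators and is not meant to reproduce the numbers in the table.

```python
import math
import random

def simulate_market(rng, n, a, b=1.0):
    """Draw one market from the binary logit DGP; return (x, inside-good count)."""
    x = rng.uniform(0.0, 15.0)
    xi = rng.gauss(0.0, 0.5 * math.sqrt(x))          # assumes variance .5^2 * x
    pi = 1.0 / (1.0 + math.exp(-(a + b * x + xi)))   # T1EV errors imply logit probabilities
    q = sum(rng.random() < pi for _ in range(n))
    return x, q

def ols_slope(xs, ys):
    """Slope from a regression of y on x with an intercept."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def run_once(T=300, n=1000, a=-11.0, seed=0):
    rng = random.Random(seed)
    data = [simulate_market(rng, n, a) for _ in range(T)]
    zero_frac = sum(1 for _, q in data if q == 0) / T
    # Empirical shares: markets with q = 0 or q = n have no finite log-odds and are dropped.
    kept = [(x, math.log(q / (n - q))) for x, q in data if 0 < q < n]
    # Laplace shares (q + 1) / (n + 2) are always interior, so every market is used.
    lap = [(x, math.log((q + 1) / (n - q + 1))) for x, q in data]
    return zero_frac, len(kept), ols_slope(*zip(*kept)), ols_slope(*zip(*lap))

zero_frac, n_kept, b_emp, b_lap = run_once()
```

Repeating `run_once` over many seeds and averaging the slope estimates minus the true value of 1 would give a bias figure for each estimator at a given fraction of zeros.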

slide-27
SLIDE 27

Monte Carlo II: Nested Logit

I DGP

  I utility function (Berry 1994):

$$
u_{it} = \alpha + \beta x_t + \xi_t + \left[\sum_g d_{jg} \zeta_{ig} + (1 - \lambda) \varepsilon_{it}\right]
$$

  I nesting structure $g$: {0}, {1, ..., 25}, {26, ..., 50} with nesting parameter $\lambda$

  I parameter of interest: $\beta$ (true value = -1), $\lambda$ (true value = .5)

  I vary $\alpha$ to produce different fractions of zeros

I Results

Fraction        $\beta$                   $\lambda$
of Zeros        ES      LS      OS        ES      LS      OS
13.3%           -.16    .28     .03       -.06    .07     -.01
20.4%           -.20    .32     .03       0.07    .10     -.01
32.3%           -.27    .33     .02       0.11    .13     -.01
48.1%           -.38    .31     .00       -.15    .13     -.02

Note: J = 50, T = 500, n = 15,000, Number of Repetitions = 1,000. ES = Empirical Share, LS = Laplace Share, OS = Optimal Share.

slide-28
SLIDE 28

Application: Loss Leader Hypothesis

[Figure: standardized price and standardized quantity by week]

slide-29
SLIDE 29

Application: Testing the Loss Leader Hypothesis

I Chevalier, Kashyap, and Rossi [2003] introduce a test:

I When a product becomes more popular its price can fall.

I Category specific effect distinguishes loss-leader from other theories of countercyclical prices (Warner and Barsky [1995])

I Big empirical effect for tuna during Lent.

slide-30
SLIDE 30

Tuna demand

slide-31
SLIDE 31

What does an ounce of tuna cost?

I Index weeks in the data by $t$, stores by $s$, and the UPCs by $j$.

I $s_{jst}$ = market share (in ounces) in week $t$ of tuna $j$ in store $s$.

I $p_{jst}$ = price/oz of tuna $j$ in store $s$ at time $t$.

I Price index for tuna in week $t$ is

$$
P_t = \sum_{j,s} s_{jst} \log p_{jst}.
$$

I Actual average (Nevo and Hatzitaskos [2006])

$$
\bar{P}_t = \frac{1}{N_t} \sum_{j,s} \log p_{jst}.
$$
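The difference between the two measures comes entirely from the share weights, which a toy week makes concrete. In this sketch the observations, the assumption that the shares sum to one across UPC/store pairs, and the function name are all hypothetical.

```python
import math

def price_indices(obs):
    """obs: list of (share, price_per_oz) pairs, one per UPC/store, for a single week.

    Returns the share-weighted price index P_t and the unweighted average of log prices.
    """
    weighted = sum(s * math.log(p) for s, p in obs)
    unweighted = sum(math.log(p) for _, p in obs) / len(obs)
    return weighted, unweighted

# Hypothetical week in which the cheapest item has the largest share.
week = [(0.5, 0.10), (0.3, 0.20), (0.2, 0.40)]
P, P_bar = price_indices(week)
```

Because the cheapest item carries the most weight here, the share-weighted index lies below the simple average, even though no individual price has changed.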

slide-32
SLIDE 32

Tuna during Lent

Table: Regression of Price Index on Lent

          $P$ (Price Index)    $\bar{P}$ (Average Price)
Lent      -.163                -.021
s.e.      (.0004)              (.0003)

slide-33
SLIDE 33

What is happening?

I The market share of cheaper UPCs in a given week goes up in the high demand period. Why?

I Demand Story: demand is more elastic in the high

demand period (consistent with Warner and Barsky [1995])

I Supply Story: retailer more aggressively promotes bigger

discounts in the high demand period inducing (consistent with loss leader)

slide-34
SLIDE 34

Logit and Nested Logit

Table: Demand in Lent vs. Non-Lent

                                 Price Coefficient       Avg. Own Price Elast.
                Share            Lent       Non-Lent     Lent       Non-Lent
Logit           Empirical        -.60       -.50         -.89       -.75
                                 (.019)     (.005)
                Optimal          -1.96      -2.01        -2.90      -3.01
                                 (.027)     (.008)
Nested Logit    Empirical        -.57       -.52         -1.39      -1.54
                                 (.014)     (.003)
                Optimal          -1.02      -.98         -5.81      -7.79
                                 (.015)     (.003)

slide-35
SLIDE 35

Key Variation in Data

I Sale = 5% reduction (or more) from high price of previous

3 weeks.

Table: Regression of Sales Price Index on Lent

          $P$ (Price Index)        $\bar{P}$ (Average Price)
          Sale       Regular       Sale       Regular
Lent      -.199      .035          .010       .001
s.e.      (.0017)    (.0003)       (.0016)    (.0003)
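The sale definition on this slide can be written as a simple rule on a price series. This is one possible reading, assuming the comparison is against the maximum price over the preceding three weeks and that weeks with no price history are never flagged; the function name and the price series are hypothetical.

```python
def flag_sales(prices, window=3, threshold=0.05):
    """Flag week t as a sale if its price is at least `threshold` below the
    highest price observed in the previous `window` weeks."""
    flags = []
    for t, price in enumerate(prices):
        previous = prices[max(0, t - window):t]
        if not previous:
            flags.append(False)   # no history to compare against
            continue
        flags.append(price <= (1 - threshold) * max(previous))
    return flags

prices = [1.00, 1.00, 0.90, 1.00, 0.96, 1.00]   # hypothetical weekly prices for one UPC/store
flags = flag_sales(prices)
```

In this series only the third week is flagged: 0.90 is 10% below the prior high of 1.00, while 0.96 is only 4% below it.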

slide-36
SLIDE 36

How Big is Promotion Effect

I Turn off sales in data.

I Set all $p^{sale}_{jt} = p^{regular}_{jt}$

I Leave promotions (deal + unobservable) the same in data.

I Predict new quantities $\tilde{q}_{jt}$.

I Use $\tilde{q}_{jt}$ to form counterfactual price index (with original price).

$$
P_t = \sum_{j,s} \tilde{s}_{jst} \log p_{jst}
$$

I Isolates the change in demand (composition) due only to promotions changing in high demand period.

slide-37
SLIDE 37

Result

Table: Original Regression

          $P$ (Price Index)    $\bar{P}$ (Average Price)
Lent      -.163                -.021
s.e.      (.0004)              (.0003)

Table: Counterfactual Regression

          $P$ (Price Index)    $\bar{P}$ (Average Price)
Lent      -.14                 -.02
s.e.      (.0005)              (.0003)

slide-38
SLIDE 38

Basic Story

I It is not that tuna prices are cheaper during Lent

I Instead consumers are more likely to be aware of sales prices (through promotional effort) in the high demand period.

I This change in promotional effort steers demand to more

discounted products.

I Demand steering explains most of the change in the price

index.

Table: Observed Promotions and Discounts

            5% Sale    10% Sale    25% Sale    50% Sale
Non-Lent    .62        .63         .64         .49
Lent        .70        .72         .84         .80

slide-39
SLIDE 39

Conclusion

I We provide an approach to estimating discrete choice

models with market zeroes

I We apply our approach to estimating demand with

scanner data

I revisit the loss leader hypothesis debate

I We can nest our approach within a dynamic stockpiling model (along lines of Hendel and Nevo 2006).

I Market zeroes arise in many settings where our approach is applicable

  I bilateral trade flows

  I crime regressions

slide-40
SLIDE 40

Judith A. Chevalier, Anil K. Kashyap, and Peter E. Rossi. Why don’t prices rise during periods of peak demand? Evidence from scanner data. American Economic Review, 93(1):15–37, 2003.

Aviv Nevo and Konstantinos Hatzitaskos. Why does the average price paid fall during high demand periods? Technical report, CSIO working paper, 2006.

Elizabeth J. Warner and Robert B. Barsky. The timing and magnitude of retail store markdowns: Evidence from weekends and holidays. The Quarterly Journal of Economics, 110(2):321–352, 1995.