
SLIDE 1

Estimating Discrete Choice Models with Market Level Zeroes: An Application to Scanner Data

Amit Gandhi, Zhentong Lu, Xiaoxia Shi

University of Wisconsin-Madison

February 3, 2015

SLIDE 2

Introduction

- Zeroes are highly prevalent in choice data
- Discrete choice models (à la McFadden) were designed to explain corner solutions in individual demand
- Our research program: the empirical analysis of choice data with market zeroes.
- Zero demand for a choice alternative after summing over the sample of consumers in a market
- A major feature of choice data from a diversity of environments
- Causes serious problems for standard estimation techniques

SLIDE 3

Scanner Data

- Store-level scanner data covering all Dominick’s Finer Foods (DFF) stores in Chicago from 1989 to 1997
- ≈ 80 stores over 300 weeks
- For each week/store/UPC (universal product code) observation:
  - price
  - quantity
  - marketing (display, feature, etc.)
  - product characteristics (brand, size, premium, etc.)
  - wholesale price

SLIDE 4

Product Variety

Category               Avg. No. of UPCs    Percent of Total Sales    Percent of
                       in a Store/Week     of the Top 20% of UPCs    Zero Sales
Analgesics             224                 80.12%                    58.02%
Beer                   179                 87.18%                    50.45%
Bottled Juices         187                 74.40%                    29.87%
Cereals                212                 72.08%                    27.14%
Canned Soup            218                 76.25%                    19.80%
Fabric Softeners       123                 65.74%                    43.74%
Laundry Detergents     200                 65.52%                    50.46%
Refrigerated Juices    91                  83.18%                    27.83%
Soft Drinks            537                 91.21%                    38.54%
Toothbrushes           137                 73.69%                    58.63%
Canned Tuna            118                 82.74%                    35.34%
Toothpastes            187                 74.19%                    51.93%
Bathroom Tissues       50                  84.06%                    28.14%

SLIDE 5

Long Tail

SLIDE 6

Long Tail

SLIDE 7

A “Big Data” Problem

- Quan and Williams (2014): data on 13.5 million shoe sales across 100,000 products from an online retailer

SLIDE 8

A “Big Data” Problem

- Marwell (2014): collects daily data on project donations to Kickstarter

Kickstarter
#Projects            90,876
#Days                555
Project-Day Obs.     2,615,839

             #Daily Projects    %Zero Contribution
Mean         4,713              0.59
Std. Dev.    1,706              0.14

SLIDE 9

A “not so Big Data” Problem

- Nurski and Verboven (2013): Belgian data on 488 car models in 588 towns for 2 consumer types (men and women).

SLIDE 10

Discrete Choice Model

- Classic McFadden (1973, 1980) discrete choice model
- Markets $t = 1, \dots, T$ are the store/week realizations (a menu of products, prices, and promotion)
- Products $j = 1, \dots, J_t$ with attributes $x_{jt} \in \mathbb{R}^{d_w}$
- Consumers $i = 1, \dots, N_t$ with “demographics” $w_{it} \in \mathbb{R}^{d_z}$

$$u_{ijt} = \begin{cases} \delta_{jt} + w_{it} \Gamma x_{jt} + \varepsilon_{ijt} & \text{if } j > 0 \\ \varepsilon_{i0t} & \text{if } j = 0 \end{cases}$$

- BLP (1995) add the new layer:

$$\delta_{jt} = \beta x_{jt} + \xi_{jt}$$

SLIDE 11

The Zeroes Problem

- Consider the simplest case of “simple logit” ($\Gamma = 0$)

$$\log\left(\frac{s_{jt}}{s_{0t}}\right) = \beta x_{jt} + \xi_{jt}, \qquad jt = 1, \dots, JT$$

where $E[\xi_{jt} \mid z_{jt}] = 0$.

- If $s_{jt} = 0$ then $\log(s_{jt})$ does not exist (or can only be defined as $-\infty$).
- However, dropping zeroes induces selection bias

$$E[\xi_{jt} \mid z_{jt}, s_{jt} > 0] \neq 0$$

- IV estimation is asymptotically biased for $\beta$ (and the bias can be severe).
- The size of the bias will depend on the strength of this selection effect.

SLIDE 12

Questions

- Why would the model generate an estimating equation that can’t be estimated with the data?
- Is it a deep rejection of the choice model?
- Is it a problem with the empirical strategy of taking the model to data?

SLIDE 13

Identification

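- For simplicity focus on “simple logit” ($\Gamma = 0$)

$$\pi_{jt} = \frac{e^{\delta_{jt}}}{\sum_{k=0}^{J_t} e^{\delta_{kt}}}, \qquad j = 0, \dots, J_t.$$

- Consumer variation

$$\delta_{jt} = \log\left(\frac{\pi_{jt}}{\pi_{0t}}\right) = \sigma_j^{-1}(\pi_t)$$

- Products/markets variation

$$\delta_{jt} = \beta x_{jt} + \xi_{jt} \implies \beta = E\left[z_{jt}' x_{jt}\right]^{-1} E\left[z_{jt}' \delta_{jt}\right]$$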
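The consumer-variation step can be sketched in a few lines of Python. This is only an illustration of the simple logit share map and its inverse with the outside good normalized to zero mean utility; the function names and the numbers are made up for the example.

```python
import math

def logit_shares(delta):
    """Map mean utilities of inside goods j = 1..J to choice probabilities.

    Returns [pi_0, pi_1, ..., pi_J]; the outside good has mean utility 0.
    """
    expd = [1.0] + [math.exp(d) for d in delta]
    total = sum(expd)
    return [e / total for e in expd]

def invert_shares(pi):
    """Recover delta_j = log(pi_j / pi_0) for j = 1..J."""
    return [math.log(p / pi[0]) for p in pi[1:]]

delta = [0.5, -1.0, 2.0]          # hypothetical mean utilities
pi = logit_shares(delta)
recovered = invert_shares(pi)     # equals delta up to rounding error
```

Passing a vector with a zero entry to `invert_shares` raises `ValueError`, which is the log-of-zero problem in miniature.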

SLIDE 14

Estimation

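- Standard estimation (aka BLP) uses sample analogues of both stages.
- Replace $\pi_{jt}$ with

$$\hat{\pi}_{jt}^{MLE} = s_{jt} = \frac{\sum_i y_{ijt}}{n_t}, \quad \text{which implies} \quad \hat{\delta}_{jt}^{MLE} = \sigma_j^{-1}(s_t)$$

- Plug $\hat{\delta}_{jt}^{MLE}$ into 2SLS

$$\hat{\beta} = \left( \sum_{t=1}^{T} \sum_{j=1}^{J_t} z_{jt}' x_{jt} \right)^{-1} \sum_{t=1}^{T} \sum_{j=1}^{J_t} z_{jt}' \hat{\delta}_{jt}^{MLE}$$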
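A minimal sketch of the plug-in estimator, assuming one scalar regressor, one scalar instrument, and no constant (so 2SLS reduces to the just-identified IV ratio). The helper names and the data are hypothetical. Returning `None` for a zero count is a choice made for the illustration: it exposes the observations that the plug-in step cannot use.

```python
import math

def mle_delta(q_j, q_0):
    """Plug-in mean utility log(s_j / s_0); undefined when q_j == 0."""
    if q_j == 0:
        return None  # log(0): this observation has no plug-in delta
    return math.log(q_j / q_0)

def iv_estimate(z, x, delta):
    """Scalar just-identified IV: (sum z*x)^(-1) * (sum z*delta)."""
    zx = sum(zi * xi for zi, xi in zip(z, x))
    zd = sum(zi * di for zi, di in zip(z, delta))
    return zd / zx

# Hypothetical observations: (q_j, q_0, x, z)
obs = [(30, 60, 1.0, 1.0), (10, 60, 2.0, 2.0), (0, 60, 3.0, 3.0)]
deltas = [mle_delta(qj, q0) for qj, q0, _, _ in obs]
# Only observations with positive sales survive, which is the selection issue.
usable = [(z, x, d) for (_, _, x, z), d in zip(obs, deltas) if d is not None]
beta_hat = iv_estimate(*zip(*usable))
```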

SLIDE 15

What is happening?

- Source of the problem is $\hat{\delta}_{jt} = \sigma_j^{-1}(\hat{\pi}_t^{MLE})$
- It does not exist when $\hat{\pi}_{jt}^{MLE} = 0$
- Why use MLE in the first place?
- MLE is potentially a bad estimator when choice data are sparse.
- Very old problem
  - Laplace’s “Law of Succession”
  - Multinomial cell probabilities and sparse contingency tables
- Zeroes arise when some $\pi_{jt}$’s are small and $n_t$ is finite.
- Treating $n_t$ as finite but $JT \to \infty$ makes $\pi_{jt}$, and hence $\delta_{jt}$, an incidental parameter.

SLIDE 16

Bayesian Analysis of Multinomial Cells

- Consider multinomial probabilities

$$\pi_t = (\pi_{0t}, \dots, \pi_{J_t t}) \in \Delta^{J_t}$$

- We observe quantities $q_t = (q_{0t}, q_{1t}, \dots, q_{J_t t})$ for $n_t$ consumers
- The likelihood of $\pi_t$ is $q_t \sim \mathrm{MN}(n_t, \pi_t)$
- Conjugate prior is $\pi_t \sim \mathrm{Dir}(\alpha_{0t}, \dots, \alpha_{J_t t})$
  - Uniform prior: $\alpha_{jt} = 1$ (Laplace/De Morgan)
  - Non-informative prior: $\alpha_{jt} = .5$ (Jeffreys/Bernardo)
- Posterior is

$$\pi_t \mid q_t, n_t \sim \mathrm{Dir}(\alpha_{0t} + q_{0t}, \alpha_{1t} + q_{1t}, \dots, \alpha_{J_t t} + q_{J_t t})$$
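The conjugate update is just element-wise addition of prior parameters and observed counts. A small sketch with made-up counts and a Jeffreys prior; the function names are illustrative.

```python
def dirichlet_posterior(alpha, q):
    """Posterior Dirichlet parameters: alpha_j + q_j for each alternative."""
    if len(alpha) != len(q):
        raise ValueError("alpha and q must have the same length")
    return [a + c for a, c in zip(alpha, q)]

def dirichlet_mean(alpha):
    """Mean of a Dirichlet distribution: alpha_j / sum(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]

q = [7, 3, 0]                                  # hypothetical counts; last product has zero sales
posterior = dirichlet_posterior([0.5] * 3, q)  # Jeffreys prior, alpha_j = .5
post_mean = dirichlet_mean(posterior)          # strictly positive even for the zero cell
```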

SLIDE 17

Laplace’s “Law of Succession”

- “What is the probability the sun will rise tomorrow given that it has risen every day until now?”
- He used a uniform prior $\alpha_{jt} = 1$
- Bayesian estimate

$$\hat{\pi}_{jt} = E[\pi_{jt} \mid q_t, n_t] = \frac{q_{jt} + 1}{n_t + J_t + 1}$$

- $\hat{\pi}_{jt}$ “shrinks” the empirical share $s_{jt}$ towards the prior mean $1/(J_t + 1)$
- $\hat{\pi}_{jt}$ is consistent (like $s_{jt}$), i.e., $\hat{\pi}_{jt} \to_p \pi_{jt}$
- Data dominates the prior in large samples
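The shrinkage and consistency claims can be read off one identity. Writing the empirical share as $q_{jt}/n_t$, the Laplace estimate is a weighted average of the empirical share and the prior mean (a rearrangement of the formula on this slide, nothing more):

```latex
\[
\frac{q_{jt} + 1}{n_t + J_t + 1}
  = \underbrace{\frac{n_t}{n_t + J_t + 1}}_{\text{weight on data}} \cdot \frac{q_{jt}}{n_t}
  + \underbrace{\frac{J_t + 1}{n_t + J_t + 1}}_{\text{weight on prior}} \cdot \frac{1}{J_t + 1}
\]
```

For fixed $J_t$ the weight on the data goes to one as $n_t \to \infty$, which is the sense in which data dominates the prior.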

SLIDE 18

Demand Application

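- We want to estimate

$$\hat{\delta}_{jt} = E\left[ \log\left(\frac{\pi_{jt}}{\pi_{0t}}\right) \,\middle|\, q_t, n_t \right] = \psi(\alpha_{jt} + q_{jt}) - \psi(\alpha_{0t} + q_{0t})$$

where $\psi$ is the digamma function.

- Use $\hat{\delta}_t$ to compute “optimal market shares”

$$\pi_{kt}^{*} = \frac{\exp(\hat{\delta}_{kt})}{1 + \sum_{j=1}^{J_t} \exp(\hat{\delta}_{jt})}$$

- Plug optimal shares into 2SLS

$$\hat{\beta} = \left( \sum_{t=1}^{T} \sum_{j=1}^{J_t} z_{jt}' x_{jt} \right)^{-1} \sum_{t=1}^{T} \sum_{j=1}^{J_t} z_{jt}' \sigma_j^{-1}(\pi_t^{*})$$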
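A sketch of the two steps before the 2SLS plug-in, using only the Python standard library. The digamma function is approximated by the usual recurrence plus asymptotic series (a library routine such as `scipy.special.digamma` would be the normal choice); the counts and the Jeffreys prior are assumptions for the example.

```python
import math

def digamma(x):
    """Digamma function psi(x) for x > 0 (recurrence plus asymptotic series)."""
    if x <= 0:
        raise ValueError("x must be positive")
    result = 0.0
    # Shift upward with psi(x) = psi(x + 1) - 1/x until the series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result

def posterior_delta(alpha, q):
    """Posterior mean of log(pi_j / pi_0), j = 1..J; index 0 is the outside good."""
    base = digamma(alpha[0] + q[0])
    return [digamma(a + c) - base for a, c in zip(alpha[1:], q[1:])]

def optimal_shares(delta_hat):
    """Logit shares implied by delta_hat; index 0 is the outside good."""
    expd = [1.0] + [math.exp(d) for d in delta_hat]
    total = sum(expd)
    return [e / total for e in expd]

q = [60, 30, 10, 0]        # hypothetical counts; last product has zero sales
alpha = [0.5] * 4          # Jeffreys prior, an assumption for this example
delta_hat = posterior_delta(alpha, q)
shares = optimal_shares(delta_hat)
```

Every entry of `delta_hat` is finite, including the one for the product with zero sales, so the inversion step never sees a zero share.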

SLIDE 19

Why is this a good estimator?

- We take a “Frequentist” interpretation of the prior
- “Empirical Bayes” approach.
- Choice probabilities $\pi_t$ are the endogenous variable of the structural model.
- Let $z_t = (z_{1t}, \dots, z_{J_t t})$ be the collection of exogenous variables
- Then the conditional distribution $\pi_t \mid z_t$ is the reduced form of the structural model
- Prior distribution = Reduced form

SLIDE 20

Asymptotic Bias

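- Finite $n_t$ implies $\hat{\beta}$ will in general have asymptotic bias.

$$\operatorname{plim}_{JT \to \infty} \hat{\beta} = \beta + Q_{xz}^{-1} E\left[ z_{jt}' \left( \sigma_{jt}^{-1}(\hat{\pi}_t) - \sigma_{jt}^{-1}(\pi_t) \right) \right], \quad \text{where } Q_{xz} = E\left[ z_{jt}' x_{jt} \right].$$

Theorem

If optimal market shares $\pi_t^{*}$ are constructed from the “correct prior” $F_{\pi \mid z_t} = F^{0}_{\pi_t \mid z_t}$ then

$$E\left[ z_{jt}' \left( \sigma_{jt}^{-1}(\pi_t^{*}) - \sigma_{jt}^{-1}(\pi_t) \right) \right] = 0$$

- Thus optimal market shares give consistent estimates $\hat{\beta} \to_p \beta$.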

SLIDE 21

Robust Prior

- What happens if we are not exactly right about the prior, i.e., $F_{\pi \mid z_t} \approx F^{0}_{\pi_t \mid z_t}$?
- Use the “Robust Priors” approach of Arellano and Bonhomme (ECMA 2009).

Theorem

If the prior $F_t \neq F_t^{0}$ is not exact then

$$E\left[ z_{jt}' \left( \sigma_{jt}^{-1}(\hat{\pi}_t) - \sigma_{jt}^{-1}(\pi_t) \right) \right] = n_t^{-1} \, \mathrm{KLIC}\left(F_t^{0}, F_t\right) + o\left(n_t^{-1}\right)$$

- So long as the prior is sensible (and $n_t$ relatively large) the bias reduction will be good (and much better than with the implicit MLE prior)

SLIDE 22

Dirichlet and the Long Tail

- Dirichlet is a conjugate prior
  - gives closed-form optimal shares $\pi_t^{*}$
- Dirichlet prior also gives rise to the long tail
  - A key feature of demand data.

Theorem

If $q_t \sim \mathrm{MN}(n_t, \pi_t)$ and $\pi_t \sim \mathrm{Dir}(\alpha \cdot 1_{J_t + 1})$ (symmetric Dirichlet) then (for large $J_t$) the quantity histogram will exhibit the long tail shape (Pareto decay)

- A restatement of Chen (1980) on probability foundations for Zipf’s Law
- $\alpha$ is the concentration parameter

SLIDE 23

An Illustration

500 products and 10,000 consumers

Figure: Zipf’s Law and the Symmetric Dirichlet
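A simulation along the lines of this illustration can be sketched with the standard library alone. The slide does not report the concentration parameter, so `alpha=0.1` below is an arbitrary assumption, as are the seed and the function name.

```python
import random

def simulate_quantities(num_products, num_consumers, alpha, seed=0):
    """Draw pi ~ symmetric Dirichlet(alpha), then multinomial counts; return counts ranked."""
    rng = random.Random(seed)
    # A symmetric Dirichlet draw is a vector of i.i.d. Gamma(alpha, 1) draws, normalized.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_products)]
    total = sum(gammas)
    probs = [g / total for g in gammas]
    # Multinomial draw: each consumer picks one product with probabilities probs.
    choices = rng.choices(range(num_products), weights=probs, k=num_consumers)
    counts = [0] * num_products
    for c in choices:
        counts[c] += 1
    return sorted(counts, reverse=True)

ranked = simulate_quantities(500, 10_000, alpha=0.1)
num_zero = sum(1 for c in ranked if c == 0)   # products with no sales at all
```

Plotting `ranked` against rank on log-log axes is one way to inspect the tail; smaller values of `alpha` concentrate sales on fewer products and leave more zeroes.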

SLIDE 24

Picking the Prior

- Jeffreys prior

$$\pi_t \mid z_t \sim \mathrm{Dir}(.5 \cdot 1_{J_t + 1})$$

- If $\pi_t \sim \mathrm{Dir}(\alpha \cdot 1_{J_t + 1})$ and $q_t \sim \mathrm{MN}(n_t, \pi_t)$ then

$$q_t \sim \mathrm{DirichletMultinomial}(\alpha)$$

- $\hat{\alpha}$ can be estimated with MLE.
- More generally we can allow $\alpha_{jt} = \gamma z_{jt}$ and estimate $\hat{\gamma}$ (built into Stata).
- We can also allow for mixtures of Dirichlet priors for increased flexibility at little analytic cost
  - Posterior is also a mixture of Dirichlet distributions
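For the symmetric case, the MLE step can be sketched with the Dirichlet-multinomial log-likelihood and a grid search. This is an illustration only: the grid search stands in for a proper optimizer, markets are assumed independent, and the counts are invented.

```python
import math

def dm_loglik(alpha, q):
    """Log-likelihood of counts q under a symmetric Dirichlet-multinomial(alpha)."""
    k = len(q)
    n = sum(q)
    # Multinomial coefficient n! / prod(q_j!)
    ll = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in q)
    # Gamma(k*alpha) / Gamma(n + k*alpha)
    ll += math.lgamma(k * alpha) - math.lgamma(n + k * alpha)
    # prod Gamma(q_j + alpha) / Gamma(alpha)
    ll += sum(math.lgamma(c + alpha) - math.lgamma(alpha) for c in q)
    return ll

def fit_alpha(markets, grid):
    """Grid-search MLE of alpha, pooling independent markets."""
    return max(grid, key=lambda a: sum(dm_loglik(a, q) for q in markets))

markets = [[40, 9, 1, 0, 0], [35, 10, 4, 1, 0], [44, 5, 1, 0, 0]]  # hypothetical counts
grid = [0.05 * i for i in range(1, 61)]
alpha_hat = fit_alpha(markets, grid)
```

A quick sanity check on the likelihood: with two alternatives and `alpha = 1` every split of `n` consumers is equally likely, so `dm_loglik(1.0, [3, 2])` equals `log(1/6)`.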

SLIDE 25

Mixed Logit

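- All the theory generalizes to mixed logit models:

$$(\hat{\lambda}_{Bayes}', \hat{\beta}_{Bayes}')' = \arg\min_{\lambda, \beta} \bar{m}_T^{Bayes}(\lambda, \beta)' \, W_T \, \bar{m}_T^{Bayes}(\lambda, \beta), \tag{0.1}$$

where $\bar{m}_T^{Bayes}(\lambda, \beta) = T^{-1} \sum_{t=1}^{T} m_t^{Bayes}(\lambda, \beta)$ with

$$m_t^{Bayes}(\lambda, \beta) = J_t^{-1} \sum_{j=1}^{J_t} z_{jt} \left[ \sigma_j^{-1}(\pi_t^{Bayes}, x_t; \lambda) - x_{jt}' \beta \right], \tag{0.2}$$

and

$$\pi_t^{Bayes} := \sigma(\delta_t^{post}(\lambda \mid q_t), x_t; \lambda). \tag{0.3}$$

- $\sigma_j^{-1}(\pi_t, x_t; \lambda) \approx \log\left(\pi_{jt} / \pi_{0t}\right)$ with second order approximation (Gandhi and Nevo 2013)
- Log of zero is the first order problem for mixed logit
- Can use logit optimal shares as an approximation to the optimal shares in general.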

SLIDE 26

Monte Carlo I: Binary Logit

- DGP
  - utility function:

$$u_{it} = \begin{cases} \alpha + \beta x_t + \xi_t + \varepsilon_{it} & \text{inside good} \\ \varepsilon_{0t} & \text{outside good} \end{cases}$$

  - random draws: $x_t \sim \mathrm{Uniform}[0, 15]$, $\varepsilon_{it} \sim T1EV$, $\xi_t \sim N(0, .5^2 \times x_t)$
  - $\beta = 1$, $\alpha$ varies to produce different fractions of zeros
- Results

Fraction of Zeros    16.48%    36.90%    49.19%    63.70%
Empirical Share      .3833     .6589     .7965     .9424
Laplace Share        .2546     .5394     .6978     .8476
Optimal Share        .0798     .0924     .0066     .0362

Note: T = 500, n = 10,000, number of repetitions = 1,000.

SLIDE 27

Monte Carlo II: Nested Logit

- DGP
  - utility function (Berry 1994):

$$u_{it} = \alpha + \beta x_t + \xi_t + \left[ \sum_g d_{jg} \zeta_{ig} + (1 - \lambda) \varepsilon_{it} \right]$$

  - nesting structure $g$: {0}, {1, ..., 25}, {26, ..., 50} with nesting parameter $\lambda$
  - parameters of interest: $\beta$ (true value = -1), $\lambda$ (true value = .5)
  - vary $\alpha$ to produce different fractions of zeros
- Results

Fraction            β                        λ
of Zeros     ES     LS     OS         ES     LS     OS
13.3%        .16    .28    .03        .06    .07    .01
20.4%        .20    .32    .03        .07    .10    .01
32.3%        .27    .33    .02        .11    .13    .01
48.1%        .38    .31    .00        .15    .13    .02

Note: ES = Empirical Share, LS = Laplace Share, OS = Optimal Share. J = 50, T = 500, n = 15,000, number of repetitions = 1,000.

SLIDE 28

Application: Loss Leader Hypothesis

Figure: price (standardized) and quantity (standardized) by week.

SLIDE 29

Application: Testing the Loss Leader Hypothesis

- Chevalier, Kashyap, and Rossi [2003] introduce a test:
  - When a product becomes more popular its price can fall.
  - Category-specific effect distinguishes loss-leader from other theories of countercyclical prices (Warner and Barsky [1995])
- Big empirical effect for tuna during Lent.

SLIDE 30

Tuna demand

SLIDE 31

What does an ounce of tuna cost?

- Index weeks in the data by $t$, stores by $s$, and the UPCs by $j$.
- $s_{jst}$ = market share (in ounces) in week $t$ of tuna $j$ in store $s$.
- $p_{jst}$ = price/oz of tuna $j$ in store $s$ at time $t$.
- Price index for tuna in week $t$ is

$$P_t = \sum_{j,s} s_{jst} \log p_{jst}.$$

- Actual average (Nevo and Hatzitaskos [2006])

$$\bar{P}_t = \frac{1}{N_t} \sum_{j,s} \log p_{jst}.$$
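The difference between the two measures is easy to see on a toy week. The sketch below assumes shares are normalized over all UPC/store pairs in the week and that $N_t$ counts those pairs; both are assumptions, and the data are invented.

```python
import math

def price_indices(obs):
    """obs: list of (ounces_sold, price_per_oz), one entry per UPC/store pair in a week.

    Returns (share-weighted log price index, unweighted average log price).
    """
    total_oz = sum(oz for oz, _ in obs)
    weighted = sum((oz / total_oz) * math.log(p) for oz, p in obs)
    simple = sum(math.log(p) for _, p in obs) / len(obs)
    return weighted, simple

# Hypothetical week: most ounces are sold at the cheaper price.
week = [(900, 0.10), (100, 0.20)]
P_t, P_bar = price_indices(week)
```

Here the weighted index is below the simple average because the cheaper item carries 90% of the volume. Shifting volume toward cheaper items lowers the weighted index even when no price changes.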

SLIDE 32

Tuna during Lent

Table: Regression of Price Index on Lent

        $P$              $\bar{P}$
        (Price Index)    (Average Price)
Lent    .163             .021
s.e.    (.0004)          (.0003)

SLIDE 33

What is happening?

- The market share of cheaper UPCs is going up in the high demand period. Why?
- Demand story: demand is more elastic in the high demand period (consistent with Warner and Barsky [1995])
- Supply story: the retailer more aggressively promotes bigger discounts in the high demand period (consistent with loss leader)

SLIDE 34

Logit and Nested Logit

Table: Demand in Lent vs. Non-Lent

                                  Price Coefficient      Avg. Own Price Elast.
                                  Lent      Non-Lent     Lent      Non-Lent
Logit           Empirical Share   .60       .50          .89       .75
                                  (.019)    (.005)
                Optimal Share     1.96      2.01         2.90      3.01
                                  (.027)    (.008)
Nested Logit    Empirical Share   .57       .52          1.39      1.54
                                  (.014)    (.003)
                Optimal Share     1.02      .98          5.81      7.79
                                  (.015)    (.003)

SLIDE 35

Key Variation in Data

- Sale = a reduction of 5% (or more) from the highest price of the previous 3 weeks.
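The sale definition translates directly into a rolling comparison. A sketch for a single UPC/store price series; returning `None` for the first weeks, where there is not yet a full 3-week history, is an assumption of this example rather than something the slides specify.

```python
def flag_sales(prices, window=3, threshold=0.05):
    """Flag week t as a sale if its price is at least `threshold` below the
    highest price of the previous `window` weeks."""
    flags = []
    for t, price in enumerate(prices):
        if t < window:
            flags.append(None)  # not enough history to classify
            continue
        reference = max(prices[t - window:t])
        flags.append(price <= (1.0 - threshold) * reference)
    return flags

prices = [2.00, 2.00, 2.00, 1.80, 2.00, 1.95]   # hypothetical weekly prices
flags = flag_sales(prices)
# Week 3 (1.80 vs. a 2.00 high) is a sale; 1.95 is only 2.5% below 2.00, so it is not.
```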

Table: Regression of Sales Price Index on Lent

        $P$ (Price Index)       $\bar{P}$ (Average Price)
        Sale       Regular      Sale       Regular
Lent    .199       .035         .010       .001
s.e.    (.0017)    (.0003)      (.0016)    (.0003)

SLIDE 36

How Big is Promotion Effect

- Turn off sales in data.
- Set all $p_{jt}^{sale} = p_{jt}^{regular}$
- Leave promotions (deal + unobservable) the same in data.
- Predict new quantities $\tilde{q}_{jt}$.
- Use $\tilde{q}_{jt}$ to form a counterfactual price index (with original prices):

$$P_t = \sum_{j,s} \tilde{s}_{jst} \log p_{jst}$$

- Isolates the change in demand (composition) due only to promotions changing in the high demand period.

SLIDE 37

Result

Table: Original Regression

        $P$              $\bar{P}$
        (Price Index)    (Average Price)
Lent    .163             .021
s.e.    (.0004)          (.0003)

Table: Counterfactual Regression

        $P$              $\bar{P}$
        (Price Index)    (Average Price)
Lent    .14              .02
s.e.    (.0005)          (.0003)

SLIDE 38

Basic Story

- It is not that tuna prices are cheaper during Lent
- Instead consumers are more likely to be aware of sale prices (through promotional effort) in the high demand period.
- This change in promotional effort steers demand to more discounted products.
- Demand steering explains most of the change in the price index.

Table: Observed Promotions and Discounts

            5% Sale    10% Sale    25% Sale    50% Sale
Non-Lent    .62        .63         .64         .49
Lent        .70        .72         .84         .80

SLIDE 39

Conclusion

- We provide an approach to estimating discrete choice models with market zeroes
- We apply our approach to estimating demand with scanner data
  - revisit the loss leader hypothesis debate
- We can nest our approach within a dynamic stockpiling model (along the lines of Hendel and Nevo 2006).
- Market zeroes arise in many settings where our approach is applicable
  - bilateral trade flows
  - crime regressions

SLIDE 40

Judith A. Chevalier, Anil K. Kashyap, and Peter E. Rossi. Why don’t prices rise during periods of peak demand? Evidence from scanner data. American Economic Review, 93(1):15–37, 2003.

Aviv Nevo and Konstantinos Hatzitaskos. Why does the average price paid fall during high demand periods? Technical report, CSIO working paper, 2006.

Elizabeth J. Warner and Robert B. Barsky. The timing and magnitude of retail store markdowns: Evidence from weekends and holidays. The Quarterly Journal of Economics, 110(2):321–352, 1995.