Estimating Discrete Choice Models with Market Level Zeroes: An Application to Scanner Data
Amit Gandhi, Zhentong Lu, Xiaoxia Shi
University of Wisconsin-Madison
February 3, 2015
Introduction
- Zeroes are highly prevalent in choice data.
- Discrete choice models (à la McFadden) were designed to explain corner solutions in individual demand.
- Our research program: the empirical analysis of choice data with market zeroes.
  - A market zero is zero demand for a choice alternative after summing over the sample of consumers in a market.
  - Market zeroes are a major feature of choice data from a wide range of environments.
  - They cause serious problems for standard estimation techniques.
Scanner Data
- Store-level scanner data covering all Dominick’s Finer Foods (DFF) stores in Chicago from 1989 to 1997
  - ≈ 80 stores over 300 weeks
- For each week/store/UPC (universal product code) observation:
  - price
  - quantity
  - marketing (display, feature, etc.)
  - product characteristics (brand, size, premium, etc.)
  - wholesale price
Product Variety
Category | Avg. No. of UPCs in a Store/Week | Percent of Total Sales of the Top 20% UPCs | Percent of Zero Sales
Analgesics | 224 | 80.12% | 58.02%
Beer | 179 | 87.18% | 50.45%
Bottled Juices | 187 | 74.40% | 29.87%
Cereals | 212 | 72.08% | 27.14%
Canned Soup | 218 | 76.25% | 19.80%
Fabric Softeners | 123 | 65.74% | 43.74%
Laundry Detergents | 200 | 65.52% | 50.46%
Refrigerated Juices | 91 | 83.18% | 27.83%
Soft Drinks | 537 | 91.21% | 38.54%
Toothbrushes | 137 | 73.69% | 58.63%
Canned Tuna | 118 | 82.74% | 35.34%
Toothpastes | 187 | 74.19% | 51.93%
Bathroom Tissues | 50 | 84.06% | 28.14%
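As a sketch of how the three columns of this table could be computed, the function below works on long-format scanner rows. The function name, the input layout (one row per store-week-UPC, with zero quantity when a carried UPC did not sell), and the choice to pool sales over all store-weeks for the top-20% share are assumptions, not the authors' definitions.

```python
import numpy as np

def variety_stats(store_week, upc, qty):
    """Category summary from long-format scanner rows (one row per store-week-UPC observation).

    Assumes a row exists for every UPC carried in a store-week, with qty = 0 if it did not sell.
    The top-20% share is computed on sales pooled over all store-weeks (an assumption).
    """
    store_week = np.asarray(store_week)
    qty = np.asarray(qty, dtype=float)
    n_markets = np.unique(store_week).size
    avg_upcs = qty.size / n_markets                            # avg. UPCs observed per store-week
    _, upc_idx = np.unique(np.asarray(upc), return_inverse=True)
    upc_sales = np.sort(np.bincount(upc_idx, weights=qty))[::-1]  # pooled sales per UPC, largest first
    k = max(1, int(np.ceil(0.2 * upc_sales.size)))             # number of UPCs in the top 20%
    top20_share = upc_sales[:k].sum() / upc_sales.sum()
    zero_pct = np.mean(qty == 0)                               # share of observations with zero sales
    return avg_upcs, top20_share, zero_pct

# Two hypothetical store-weeks: three UPCs carried in the first, four in the second
avg, top, zero = variety_stats([1, 1, 1, 2, 2, 2, 2],
                               ["a", "b", "c", "a", "b", "c", "d"],
                               [10, 0, 2, 8, 1, 0, 0])
print(avg, top, zero)
```

Whether UPCs with no row at all (not carried that week) count as zeros is a definitional choice that changes the zero percentage; the sketch counts only rows that are present.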
Long Tail
A “Big Data” Problem
- Quan and Williams (2014): data on 13.5 million shoe sales across 100,000 products from an online retailer.
- Marwell (2014): collects daily data on project donations to Kickstarter.

Kickstarter | Value
#Projects | 90,876
#Days | 555
Project-Day Obs. | 2,615,839

 | #Daily Projects | %Zero Contribution
Mean | 4,713 | 0.59
Std. Dev. | 1,706 | 0.14
A “not so Big Data” Problem
- Nurski and Verboven (2013): Belgian data on 488 car models in 588 towns for 2 consumer types (men and women).
Discrete Choice Model
- Classic McFadden (1973, 1980) discrete choice model
- Markets t = 1, ..., T are the store/week realizations (a menu of products, prices, and promotions).
- Products j = 1, ..., J_t with attributes $x_{jt} \in \mathbb{R}^{d_w}$
- Consumers i = 1, ..., N_t with “demographics” $w_{it} \in \mathbb{R}^{d_z}$
$$u_{ijt} = \begin{cases} \delta_{jt} + w_{it}'\Gamma x_{jt} + \varepsilon_{ijt} & \text{if } j > 0 \\ \varepsilon_{i0t} & \text{if } j = 0 \end{cases}$$
- BLP (1995) add a new layer:
$$\delta_{jt} = \beta x_{jt} + \xi_{jt}$$
The Zeroes Problem
- Consider the simplest case, the “simple logit” (Γ = 0):
$$\log\left(\frac{s_{jt}}{s_{0t}}\right) = \beta x_{jt} + \xi_{jt}, \qquad jt = 1, \dots, JT,$$
where $E[\xi_{jt} \mid z_{jt}] = 0$.
- If $s_{jt} = 0$, then $\log(s_{jt})$ does not exist (or can only be defined as $-\infty$).
- However, dropping zeroes induces selection bias:
$$E[\xi_{jt} \mid z_{jt}, s_{jt} > 0] \neq 0$$
- IV estimation is then asymptotically biased for β, and the bias can be severe.
- The size of the bias depends on the strength of this selection effect.
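A minimal illustration of both points on this slide, assuming a single hypothetical market (index 0 is the outside good): zero sales give log(0) = -inf, and in a simulated binary logit with x held fixed, the product-markets that survive the zero filter have ξ with a clearly positive mean. The utility level -9, n = 10,000, and the N(0, 1) draw for ξ are illustrative choices, not the paper's design.

```python
import numpy as np

def empirical_logit_delta(q):
    """log(s_j / s_0) from one market's quantities (index 0 = outside good).

    Returns -inf for products with zero sales, because log(0) is undefined.
    """
    q = np.asarray(q, dtype=float)
    s = q / q.sum()
    with np.errstate(divide="ignore"):
        return np.log(s[1:] / s[0])

delta = empirical_logit_delta([500, 40, 7, 0, 3])
keep = np.isfinite(delta)
print(delta)        # the third inside product is -inf
print(delta[keep])  # "dropping zeroes": which rows survive depends on realized sales

# Selection: among product-markets with positive sales, xi no longer has mean zero.
rng = np.random.default_rng(0)
xi = rng.normal(size=20_000)
p = 1.0 / (1.0 + np.exp(-(-9.0 + xi)))   # binary-logit inside-good probability, x held fixed
q_inside = rng.binomial(10_000, p)
print(xi.mean(), xi[q_inside > 0].mean())  # near 0 vs. clearly positive
```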
Questions
- Why would the model generate an estimating equation that can’t be estimated with the data?
  - Is it a deep rejection of the choice model?
  - Is it a problem with the empirical strategy of taking the model to data?
Identification
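- For simplicity, focus on the “simple logit” (Γ = 0):
$$\pi_{jt} = \frac{e^{\delta_{jt}}}{\sum_{k=0}^{J_t} e^{\delta_{kt}}}, \qquad j = 0, \dots, J_t.$$
- Consumer variation:
$$\delta_{jt} = \log\left(\frac{\pi_{jt}}{\pi_{0t}}\right) = \sigma_j^{-1}(\pi_t)$$
- Product/market variation:
$$\delta_{jt} = \beta x_{jt} + \xi_{jt} \implies \beta = E\left[z_{jt}' x_{jt}\right]^{-1} E\left[z_{jt}' \delta_{jt}\right]$$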
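The consumer-variation step is the logit inversion. A small sketch, assuming $\delta_{0t} = 0$ for the outside good at index 0 (helper names are hypothetical), checks that mapping δ to shares and back recovers δ:

```python
import numpy as np

def logit_shares(delta):
    """pi_j = exp(delta_j) / sum_k exp(delta_k), with delta_0 = 0 for the outside good (index 0)."""
    d = np.concatenate(([0.0], np.asarray(delta, dtype=float)))
    e = np.exp(d - d.max())   # subtracting the max leaves ratios unchanged and avoids overflow
    return e / e.sum()

def logit_inverse(pi):
    """sigma_j^{-1}(pi) = log(pi_j / pi_0) for j = 1..J."""
    pi = np.asarray(pi, dtype=float)
    return np.log(pi[1:] / pi[0])

delta = np.array([-1.0, 0.5, -3.0])
pi = logit_shares(delta)
print(pi, logit_inverse(pi))
```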
Estimation
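- Standard estimation (a.k.a. BLP) uses sample analogues of both stages.
- Replace $\pi_{jt}$ with $\hat\pi^{MLE}_{jt} = s_{jt} = \sum_i y_{ijt} / n_t$, which implies $\hat\delta^{MLE}_{jt} = \sigma_j^{-1}(s_t)$.
- Plug $\hat\delta^{MLE}_{jt}$ into 2SLS:
$$\hat\beta = \left(\sum_{t=1}^{T}\sum_{j=1}^{J_t} z_{jt}' x_{jt}\right)^{-1}\sum_{t=1}^{T}\sum_{j=1}^{J_t} z_{jt}'\,\hat\delta^{MLE}_{jt}$$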
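A sketch of this second stage, written for the just-identified case where $z_{jt}$ and $x_{jt}$ have the same dimension (the form of the formula above); an over-identified 2SLS would instead use the projection on the instruments. The function name and the non-finite check are illustrative additions.

```python
import numpy as np

def iv_estimate(Z, X, delta):
    """Just-identified IV: beta_hat = (sum z'x)^{-1} sum z'delta, stacking all (j, t) rows.

    Z, X: (JT, k) arrays of instruments and characteristics; delta: (JT,) inverted mean utilities.
    A non-finite delta (e.g. the log of a zero share) makes the estimate undefined.
    """
    Z, X, delta = (np.asarray(a, dtype=float) for a in (Z, X, delta))
    if not np.all(np.isfinite(delta)):
        raise ValueError("delta contains non-finite values (zero shares?)")
    return np.linalg.solve(Z.T @ X, Z.T @ delta)

# Noiseless check: with delta = X beta exactly and Z = X, IV returns beta
rng = np.random.default_rng(0)
x = rng.uniform(size=50)
X = np.column_stack([np.ones(50), x])
beta = np.array([1.0, -2.0])
beta_hat = iv_estimate(X, X, X @ beta)
print(beta_hat)
```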
What is happening?
- The source of the problem is $\hat\delta_{jt} = \sigma_j^{-1}(\hat\pi^{MLE}_t)$:
  - it does not exist when $\hat\pi^{MLE}_{jt} = 0$.
- Why use MLE in the first place?
  - MLE is potentially a bad estimator when choice data are sparse.
- This is a very old problem:
  - Laplace’s “Law of Succession”
  - multinomial cell probabilities and sparse contingency tables
- Zeroes arise when some $\pi_{jt}$ are small and $n_t$ is finite.
- Treating $n_t$ as finite but letting $JT \to \infty$ makes $\pi_{jt}$, and hence $\delta_{jt}$, an incidental parameter.
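To see why a finite $n_t$ produces zeroes even when every $\pi_{jt} > 0$: under $q_t \sim MN(n_t, \pi_t)$ each $q_{jt}$ is Binomial$(n_t, \pi_{jt})$. A worked example, with illustrative numbers not taken from the data:

```latex
P(q_{jt} = 0) = (1 - \pi_{jt})^{n_t} \approx e^{-n_t \pi_{jt}},
\qquad \pi_{jt} = 10^{-4},\; n_t = 10{,}000
\;\Rightarrow\; P(q_{jt} = 0) \approx e^{-1} \approx 0.37 .
```

So a product chosen by one consumer in ten thousand records zero sales in roughly a third of markets, and holding $n_t$ fixed while $JT$ grows does not make such cells disappear.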
Bayesian Analysis of Multinomial Cells
- Consider multinomial probabilities
$$\pi_t = (\pi_{0t}, \dots, \pi_{J_t t}) \in \Delta^{J_t}$$
- We observe quantities $q_t = (q_{0t}, q_{1t}, \dots, q_{J_t t})$ for $n_t$ consumers.
- The likelihood of $\pi_t$ is given by $q_t \sim MN(n_t, \pi_t)$.
- The conjugate prior is $\pi_t \sim \mathrm{Dir}(\alpha_{0t}, \dots, \alpha_{J_t t})$.
  - Uniform prior: $\alpha_{jt} = 1$ (Laplace/De Morgan)
  - Non-informative prior: $\alpha_{jt} = 0.5$ (Jeffreys/Bernardo)
- The posterior is
$$\pi_t \mid q_t, n_t \sim \mathrm{Dir}(\alpha_{0t} + q_{0t}, \alpha_{1t} + q_{1t}, \dots, \alpha_{J_t t} + q_{J_t t})$$
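The conjugate update is parameter addition. A minimal sketch (hypothetical helper names; index 0 is the outside good) that also returns the posterior mean, which stays strictly positive for a zero-sales product:

```python
import numpy as np

def dirichlet_posterior(q, alpha):
    """Posterior Dirichlet parameters alpha_j + q_j for multinomial counts q under a Dir(alpha) prior."""
    q = np.asarray(q, dtype=float)
    alpha = np.broadcast_to(np.asarray(alpha, dtype=float), q.shape)  # scalar alpha = symmetric prior
    return alpha + q

def posterior_mean(q, alpha):
    """E[pi_j | q] = (alpha_j + q_j) / sum_k (alpha_k + q_k)."""
    post = dirichlet_posterior(q, alpha)
    return post / post.sum()

q = [6, 3, 1, 0]   # outside good first; the last product has zero sales
print(dirichlet_posterior(q, 0.5))
print(posterior_mean(q, 0.5))
```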
Laplace’s “Law of Succession”
- “What is the probability that the sun will rise tomorrow, given that it has risen every day until now?”
- Laplace used a uniform prior $\alpha_{jt} = 1$.
- Bayesian estimate:
$$\hat\pi_{jt} = E[\pi_{jt} \mid q_t, n_t] = \frac{q_{jt} + 1}{n_t + J_t + 1}$$
- $\hat\pi_{jt}$ “shrinks” the empirical share $s_{jt}$ towards the prior mean $1/(J_t + 1)$.
- $\hat\pi_{jt}$ is consistent (like $s_{jt}$), i.e., $\hat\pi_{jt} \xrightarrow{p} \pi_{jt}$ as $n_t \to \infty$.
- The data dominate the prior in large samples.
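The shrinkage and consistency claims can be read off by rewriting Laplace’s estimate as a weighted average, with $s_{jt} = q_{jt}/n_t$; the sunrise question is the binary case $J_t = 1$ with $q_{jt} = n_t$:

```latex
\hat\pi_{jt} = \frac{q_{jt} + 1}{n_t + J_t + 1}
 = w_t\, s_{jt} + (1 - w_t)\,\frac{1}{J_t + 1},
\qquad w_t = \frac{n_t}{n_t + J_t + 1} \to 1 \text{ as } n_t \to \infty,

\text{sunrise: } J_t = 1,\; q_{jt} = n_t \;\Rightarrow\; \hat\pi = \frac{n_t + 1}{n_t + 2},
\qquad n_t = 100 \;\Rightarrow\; \hat\pi = \frac{101}{102} \approx 0.990 .
```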
Demand Application
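- We want to estimate
$$\hat\delta_{jt} = E\left[\log\left(\frac{\pi_{jt}}{\pi_{0t}}\right) \,\middle|\, q_t, n_t\right] = \psi(\alpha_{jt} + q_{jt}) - \psi(\alpha_{0t} + q_{0t}),$$
where ψ is the digamma function.
- Use $\hat\delta_t$ to compute “optimal market shares” (with $\hat\delta_{0t} = 0$):
$$\pi^*_{kt} = \frac{\exp(\hat\delta_{kt})}{1 + \sum_{j=1}^{J_t}\exp(\hat\delta_{jt})}, \qquad k = 0, \dots, J_t.$$
- Plug the optimal shares into 2SLS:
$$\hat\beta = \left(\sum_{t=1}^{T}\sum_{j=1}^{J_t} z_{jt}' x_{jt}\right)^{-1}\sum_{t=1}^{T}\sum_{j=1}^{J_t} z_{jt}'\,\sigma_j^{-1}(\pi^*_t)$$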
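A sketch of the first two steps for one market, assuming a symmetric prior $\alpha_{jt} = 0.5$ (the Jeffreys choice from the “Picking the Prior” slide) and index 0 as the outside good. It uses `scipy.special.digamma`; the helper names and the example counts are hypothetical.

```python
import numpy as np
from scipy.special import digamma

def posterior_mean_delta(q, alpha=0.5):
    """delta_hat_j = psi(alpha + q_j) - psi(alpha + q_0) for j = 1..J (index 0 = outside good)."""
    post = np.asarray(q, dtype=float) + alpha
    return digamma(post[1:]) - digamma(post[0])

def optimal_shares(delta_hat):
    """Logit shares implied by delta_hat, with the outside good's delta normalized to 0."""
    expd = np.exp(delta_hat)
    denom = 1.0 + expd.sum()
    return np.concatenate(([1.0 / denom], expd / denom))

def logit_inverse(shares):
    """Simple-logit inversion sigma_j^{-1}(pi) = log(pi_j / pi_0)."""
    shares = np.asarray(shares, dtype=float)
    return np.log(shares[1:] / shares[0])

q = np.array([900, 60, 35, 5, 0])     # the last product has zero sales
d_hat = posterior_mean_delta(q, alpha=0.5)
pi_star = optimal_shares(d_hat)
print(d_hat)                    # finite for every product, including the zero-sales one
print(logit_inverse(pi_star))   # recovers d_hat
```

The 2SLS step then uses $\sigma_j^{-1}(\pi^*_t)$ exactly where the standard estimator uses the MLE-based $\hat\delta$.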
Why is this a good estimator?
- We take a “frequentist” interpretation of the prior:
  - an “empirical Bayes” approach.
- Choice probabilities $\pi_t$ are the endogenous variable of the structural model.
- Let $z_t = (z_{1t}, \dots, z_{J_t t})$ be the collection of exogenous variables.
- Then the conditional distribution $\pi_t \mid z_t$ is the reduced form of the structural model.
- Prior distribution = reduced form.
Asymptotic Bias
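- A finite $n_t$ implies that $\hat\beta$ will in general have asymptotic bias:
$$\operatorname*{plim}_{JT \to \infty} \hat\beta = \beta + Q_{xz}^{-1}\, E\left[z_{jt}'\left(\sigma_j^{-1}(\hat\pi_t) - \sigma_j^{-1}(\pi_t)\right)\right], \qquad Q_{xz} = E\left[z_{jt}' x_{jt}\right].$$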
Theorem
If the optimal market shares $\pi^*_t$ are constructed from the “correct prior” $F_{\pi|z_t} = F^0_{\pi_t|z_t}$, then
$$E\left[z_{jt}'\left(\sigma_j^{-1}(\pi^*_t) - \sigma_j^{-1}(\pi_t)\right)\right] = 0.$$
- Thus optimal market shares give consistent estimates: $\hat\beta \xrightarrow{p} \beta$.
Robust Prior
- What happens if we are not exactly right about the prior, i.e., $F_{\pi|z_t} \approx F^0_{\pi_t|z_t}$?
- Use the “robust priors” approach of Arellano and Bonhomme (Econometrica 2009).
Theorem
If the prior $F_t \neq F^0_t$ is not exact, then
$$E\left[z_{jt}'\left(\sigma_j^{-1}(\hat\pi_t) - \sigma_j^{-1}(\pi_t)\right)\right] = n_t^{-1}\,\mathrm{KLIC}\left(F^0_t, F_t\right) + o\left(n_t^{-1}\right).$$
- So long as the prior is sensible (and $n_t$ is relatively large), the bias reduction will be good, and much better than with the implicit MLE prior.
Dirichlet and the Long Tail
- The Dirichlet is a conjugate prior:
  - it gives closed-form optimal shares $\pi^*_t$.
- The Dirichlet prior also gives rise to the long tail,
  - a key feature of demand data.
Theorem
If $q_t \sim MN(n_t, \pi_t)$ and $\pi_t \sim \mathrm{Dir}(\alpha \cdot \mathbf{1}_{J_t+1})$ (symmetric Dirichlet), then (for large $J_t$) the quantity histogram will exhibit the long-tail shape (Pareto decay).
- A restatement of Chen (1980) on probability foundations for Zipf’s Law
- α is the concentration parameter.
An Illustration
500 products and 10,000 consumers
Figure: Zipf’s Law and the Symmetric Dirichlet
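The illustration’s design can be reproduced in a few lines. This sketch assumes α = 0.5 (the slide does not state the concentration parameter) and an arbitrary seed; it draws one market and summarizes the rank-ordered quantities. Plotting log quantity against log rank would show the tail.

```python
import numpy as np

def simulate_market(n_products=500, n_consumers=10_000, alpha=0.5, seed=0):
    """One market: pi ~ symmetric Dir(alpha * 1_{J+1}), q ~ MN(n, pi). Index 0 is the outside good."""
    rng = np.random.default_rng(seed)
    pi = rng.dirichlet(np.full(n_products + 1, alpha))
    q = rng.multinomial(n_consumers, pi)
    return pi, q

pi, q = simulate_market()
inside = q[1:]
ranked = np.sort(inside)[::-1]                       # inside-good quantities by sales rank
zero_frac = np.mean(inside == 0)                     # share of products with zero sales
top20 = ranked[: len(ranked) // 5].sum() / ranked.sum()
print(f"zero-sales products: {zero_frac:.1%}, top-20% share of sales: {top20:.1%}")
```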
Picking the Prior
- Jeffreys prior:
$$\pi_t \mid z_t \sim \mathrm{Dir}(0.5 \cdot \mathbf{1}_{J_t+1})$$
- If $\pi_t \sim \mathrm{Dir}(\alpha \cdot \mathbf{1}_{J_t+1})$ and $q_t \sim MN(n_t, \pi_t)$, then
$$q_t \sim \text{Dirichlet-Multinomial}(\alpha)$$
  - $\hat\alpha$ can be estimated by MLE.
- More generally, we can allow $\alpha_{jt} = \gamma z_{jt}$ and estimate $\hat\gamma$ (built into Stata).
- We can also allow for mixtures of Dirichlet priors for increased flexibility at little analytic cost:
  - the posterior is then also a mixture of Dirichlet distributions.
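A sketch of estimating a common symmetric α by maximum likelihood, pooling independent markets. It codes the Dirichlet-Multinomial log pmf with `scipy.special.gammaln` and optimizes over log α with `scipy.optimize.minimize_scalar(method="bounded")`. The search bounds, helper names, and simulated data are assumptions, and this is not the Stata routine mentioned above.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

def dm_loglik(q, alpha):
    """Log pmf of counts q under a symmetric Dirichlet-Multinomial with concentration alpha."""
    q = np.asarray(q, dtype=float)
    n, k = q.sum(), q.size
    a = np.full(k, alpha)
    return (gammaln(n + 1) - gammaln(q + 1).sum()          # multinomial coefficient
            + gammaln(a.sum()) - gammaln(n + a.sum())
            + (gammaln(q + a) - gammaln(a)).sum())

def fit_symmetric_alpha(markets, log_bounds=(-5.0, 5.0)):
    """MLE of a common alpha across independent markets; searching over log(alpha) keeps alpha > 0."""
    def neg_ll(log_a):
        a = np.exp(log_a)
        return -sum(dm_loglik(q, a) for q in markets)
    res = minimize_scalar(neg_ll, bounds=log_bounds, method="bounded")
    return float(np.exp(res.x))

# Hypothetical data: 200 markets, 50 products plus an outside good, 2,000 consumers each
rng = np.random.default_rng(1)
true_alpha = 0.5
markets = [rng.multinomial(2_000, rng.dirichlet(np.full(51, true_alpha))) for _ in range(200)]
alpha_hat = fit_symmetric_alpha(markets)
print(alpha_hat)
```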
Mixed Logit
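- All the theory generalizes to mixed logit models:
$$\left(\hat\lambda_{Bayes}', \hat\beta_{Bayes}'\right)' = \arg\min_{\lambda, \beta}\ \bar m^{Bayes}_T(\lambda, \beta)'\, W_T\, \bar m^{Bayes}_T(\lambda, \beta), \tag{0.1}$$
where $\bar m^{Bayes}_T(\lambda, \beta) = T^{-1}\sum_{t=1}^{T} m^{Bayes}_t(\lambda, \beta)$ with
$$m^{Bayes}_t(\lambda, \beta) = J_t^{-1}\sum_{j=1}^{J_t} z_{jt}\left[\sigma_j^{-1}\left(\pi^{Bayes}_t, x_t; \lambda\right) - x_{jt}'\beta\right] \tag{0.2}$$
and
$$\pi^{Bayes}_t := \sigma\left(\delta^{post}_t(\lambda \mid q_t), x_t; \lambda\right). \tag{0.3}$$
- $\sigma_j^{-1}(\pi_t, x_t; \lambda) \approx \log\left(\pi_{jt}/\pi_{0t}\right)$ with a second-order approximation (Gandhi and Nevo 2013).
- The log of zero is the first-order problem for mixed logit.
- We can use logit optimal shares as an approximation to the optimal shares in general.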
Monte Carlo I: Binary Logit
- DGP:
  - utility function:
$$u_{it} = \begin{cases} \alpha + \beta x_t + \xi_t + \varepsilon_{it} & \text{inside good} \\ \varepsilon_{i0t} & \text{outside good} \end{cases}$$
  - random draws: $x_t \sim \mathrm{Uniform}[0, 15]$, $\varepsilon_{it} \sim$ T1EV, $\xi_t \sim N(0, 0.5^2 \times x_t)$
  - β = 1; α varies to produce different fractions of zeros
- Results:

Fraction of Zeros | 16.48% | 36.90% | 49.19% | 63.70%
Empirical Share | .3833 | .6589 | .7965 | .9424
Laplace Share | .2546 | .5394 | .6978 | .8476
Optimal Share | -.0798 | -.0924 | -.0066 | .0362

Note: T = 500, n = 10,000, number of repetitions = 1,000.
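A minimal version of this DGP, assuming α = -14 to generate a sizable fraction of zeros and using OLS of the log-odds on x (x is exogenous in this design; the slide does not say which instruments were used). It contrasts dropping zeros with Laplace shares; the optimal-share row depends on the prior and is not reproduced here.

```python
import numpy as np

def simulate_binary_logit(alpha, beta=1.0, T=500, n=10_000, seed=0):
    """Monte Carlo I DGP: one inside good per market t, n consumers, T markets."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 15.0, size=T)
    xi = rng.normal(0.0, 0.5 * np.sqrt(x))      # sd 0.5*sqrt(x_t), i.e. variance 0.5^2 * x_t
    delta = alpha + beta * x + xi
    p_inside = 1.0 / (1.0 + np.exp(-delta))      # binary-logit probability of the inside good
    q = rng.binomial(n, p_inside)                # consumers (out of n) choosing the inside good
    return x, q

def ols_slope(x, y):
    """Slope from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

n = 10_000
x, q = simulate_binary_logit(alpha=-14.0, n=n)   # alpha = -14 is an illustrative choice
zero_frac = np.mean(q == 0)

# Empirical shares: log-odds undefined at s = 0 (or s = 1), so those markets are dropped.
ok = (q > 0) & (q < n)
s = q[ok] / n
beta_emp = ols_slope(x[ok], np.log(s / (1 - s)))

# Laplace shares (q + 1) / (n + 2), i.e. J_t = 1: defined in every market.
s_lap = (q + 1) / (n + 2)
beta_lap = ols_slope(x, np.log(s_lap / (1 - s_lap)))

print(f"zeros: {zero_frac:.1%}  beta (zeros dropped): {beta_emp:.3f}  beta (Laplace): {beta_lap:.3f}")
```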
Monte Carlo II: Nested Logit
- DGP:
  - utility function (Berry 1994):
$$u_{ijt} = \alpha + \beta x_{jt} + \xi_{jt} + \left[\sum_g d_{jg}\zeta_{ig} + (1 - \lambda)\varepsilon_{ijt}\right]$$
  - nesting structure g: {0}, {1, ..., 25}, {26, ..., 50}, with nesting parameter λ
  - parameters of interest: β (true value = -1) and λ (true value = 0.5); α varies to produce different fractions of zeros
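- Results (ES = empirical shares, LS = Laplace shares, OS = optimal shares):

Fraction of Zeros | β: ES | β: LS | β: OS | λ: ES | λ: LS | λ: OS
13.3% | -.16 | .28 | .03 | -.06 | .07 | -.01
20.4% | -.20 | .32 | .03 | 0.07 | .10 | -.01
32.3% | -.27 | .33 | .02 | 0.11 | .13 | -.01
48.1% | -.38 | .31 | .00 | -.15 | .13 | -.02

Note: J = 50, T = 500, n = 15,000, number of repetitions = 1,000.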
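For reference, the standard Berry (1994) nested-logit inversion for this utility is $\log(s_{jt}/s_{0t}) = \alpha + \beta x_{jt} + \lambda \log(s_{jt|g}) + \xi_{jt}$, so both the dependent variable and the within-nest share regressor break at a zero. A sketch computing both for one market (hypothetical helper; index 0 is the outside good):

```python
import numpy as np

def nested_logit_regressors(q, nest):
    """Return (log(s_j/s_0), log(s_{j|g})) for inside goods j = 1..J.

    q: quantities with index 0 the outside good; nest: nest label of each inside good.
    Zero sales make both terms -inf, which is the zeroes problem again.
    """
    q = np.asarray(q, dtype=float)
    nest = np.asarray(nest)
    s = q / q.sum()
    inside = s[1:]
    nest_total = np.array([inside[nest == g].sum() for g in nest])  # total share of each good's nest
    with np.errstate(divide="ignore", invalid="ignore"):
        y = np.log(inside / s[0])
        log_within = np.log(inside / nest_total)
    return y, log_within

y, lw = nested_logit_regressors([50, 20, 10, 0, 15, 5], [1, 1, 1, 2, 2])
print(y, lw)
```

Because $\log(s_{jt|g})$ depends on $\xi_{jt}$ through the shares, it is endogenous and needs its own instrument in the IV step.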
Application: Loss Leader Hypothesis
Figure: price (standardized) and quantity (standardized) by week
Application: Testing the Loss Leader Hypothesis
- Chevalier, Kashyap, and Rossi [2003] introduce a test:
  - When a product becomes more popular, its price can fall.
  - A category-specific effect distinguishes the loss-leader hypothesis from other theories of countercyclical prices (Warner and Barsky [1995]).
- There is a big empirical effect for tuna during Lent.
Tuna demand
What does an ounce of tuna cost?
- Index weeks in the data by t, stores by s, and UPCs by j.
- $s_{jst}$ = market share (in ounces) in week t of tuna j in store s.
- $p_{jst}$ = price per ounce of tuna j in store s at time t.
- Price index for tuna in week t:
$$P_t = \sum_{j,s} s_{jst}\log p_{jst}$$
- Actual average (Nevo and Hatzitaskos [2006]):
$$\bar P_t = \frac{1}{N_t}\sum_{j,s}\log p_{jst}$$
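Both indices are straightforward to compute for one week. A sketch assuming arrays over the (UPC, store) cells observed in week t, with shares taken in ounces within the week (the function name is hypothetical):

```python
import numpy as np

def tuna_price_indices(ounces, price_per_oz):
    """Share-weighted index P_t and simple average Pbar_t for one week.

    ounces, price_per_oz: arrays over the (UPC, store) cells observed in week t.
    """
    ounces = np.asarray(ounces, dtype=float)
    log_p = np.log(np.asarray(price_per_oz, dtype=float))
    shares = ounces / ounces.sum()       # market shares in ounces within the week
    P = np.sum(shares * log_p)           # share-weighted price index
    P_bar = log_p.mean()                 # unweighted average over the N_t cells
    return P, P_bar

P, P_bar = tuna_price_indices([30, 10], [0.10, 0.20])
print(P, P_bar)   # the cheaper item sells more, so P_t < Pbar_t
```

A cell with zero ounces sold gets zero weight in $P_t$ but still counts in $\bar P_t$, which is one reason the two indices can move differently.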
Tuna during Lent
Table: Regression of Price Index on Lent

 | P (Price Index) | $\bar P$ (Average Price)
Lent | -.163 | -.021
s.e. | (.0004) | (.0003)
What is happening?
- The market share of cheaper UPCs is going up in the high-demand period. Why?
  - Demand story: demand is more elastic in the high-demand period (consistent with Warner and Barsky [1995]).
  - Supply story: the retailer more aggressively promotes bigger discounts in the high-demand period (consistent with the loss-leader hypothesis).
Logit and Nested Logit
Table: Demand in Lent vs. Non-Lent

Model | Share | Price Coef., Lent | Price Coef., Non-Lent | Avg. Own-Price Elast., Lent | Avg. Own-Price Elast., Non-Lent
Logit | Empirical | -.60 (.019) | -.50 (.005) | -.89 | -.75
Logit | Optimal | -1.96 (.027) | -2.01 (.008) | -2.90 | -3.01
Nested Logit | Empirical | -.57 (.014) | -.52 (.003) | -1.39 | -1.54
Nested Logit | Optimal | -1.02 (.015) | -.98 (.003) | -5.81 | -7.79

Note: standard errors in parentheses.
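For the simple logit with price entering utility linearly with coefficient α, the own-price elasticity is $\alpha p_j (1 - s_j)$. The slides do not state how price enters or how the average is taken, so the linear-price specification, the simple mean, and the inputs below are assumptions; the nested-logit formula differs and is not shown.

```python
import numpy as np

def logit_own_price_elasticity(price_coef, price, share):
    """Own-price elasticity alpha * p_j * (1 - s_j) for each product (logit, price linear in utility)."""
    price = np.asarray(price, dtype=float)
    share = np.asarray(share, dtype=float)
    return price_coef * price * (1.0 - share)

# Hypothetical inputs: a price coefficient, prices, and shares for three products
elast = logit_own_price_elasticity(-2.0, [1.5, 1.0, 0.8], [0.10, 0.05, 0.02])
avg_elast = elast.mean()   # one possible "average own-price elasticity"
print(elast, avg_elast)
```

Because this elasticity is proportional to α, the larger price coefficients obtained with optimal shares translate into proportionally larger elasticities in the logit rows.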
Key Variation in Data
- Sale = a reduction of 5% or more from the high price of the previous 3 weeks.
Table: Regression of Sales Price Index on Lent

 | P (Price Index): Sale | P (Price Index): Regular | $\bar P$ (Average Price): Sale | $\bar P$ (Average Price): Regular
Lent | -.199 | .035 | .010 | .001
s.e. | (.0017) | (.0003) | (.0016) | (.0003)
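A sketch of the sale flag, reading “high price of the previous 3 weeks” as the maximum price over the three preceding weeks (an interpretation). The function name and the choice to leave the first weeks unflagged are assumptions.

```python
import numpy as np

def flag_sales(prices, threshold=0.05, window=3):
    """Flag week w as a sale if its price is at least `threshold` below the max of the previous `window` weeks.

    The first `window` weeks have no complete look-back window and are left unflagged.
    """
    prices = np.asarray(prices, dtype=float)
    sale = np.zeros(prices.size, dtype=bool)
    for w in range(window, prices.size):
        high = prices[w - window:w].max()          # high price over the look-back window
        sale[w] = prices[w] <= (1.0 - threshold) * high
    return sale

weekly = [1.00, 1.00, 1.00, 0.90, 0.99, 0.80, 1.00]
print(flag_sales(weekly))
```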
How Big is Promotion Effect
- Turn off sales in the data:
  - set all $p^{sale}_{jt} = p^{regular}_{jt}$;
  - leave promotions (deal + unobservable) as they are in the data.
- Predict new quantities $\tilde q_{jt}$.
- Use $\tilde q_{jt}$ to form a counterfactual price index (with the original prices):
$$\tilde P_t = \sum_{j,s}\tilde s_{jst}\log p_{jst}$$
- This isolates the change in demand composition due only to promotions changing in the high-demand period.
Result
Table: Original Regression

 | P (Price Index) | $\bar P$ (Average Price)
Lent | -.163 | -.021
s.e. | (.0004) | (.0003)

Table: Counterfactual Regression

 | P (Price Index) | $\bar P$ (Average Price)
Lent | -.14 | -.02
s.e. | (.0005) | (.0003)
Basic Story
- It is not that tuna prices are cheaper during Lent.
- Instead, consumers are more likely to be aware of sale prices (through promotional effort) in the high-demand period.
- This change in promotional effort steers demand toward more heavily discounted products.
- Demand steering explains most of the change in the price index.

Table: Observed Promotions and Discounts

 | 5% Sale | 10% Sale | 25% Sale | 50% Sale
Non-Lent | .62 | .63 | .64 | .49
Lent | .70 | .72 | .84 | .80
Conclusion
- We provide an approach to estimating discrete choice models with market zeroes.
- We apply our approach to estimating demand with scanner data:
  - we revisit the loss-leader hypothesis debate.
- We can nest our approach within a dynamic stockpiling model (along the lines of Hendel and Nevo 2006).
- Market zeroes arise in many settings where our approach is applicable:
  - bilateral trade flows
  - crime regressions