Computer Mediated Transactions
Hal Varian, Google, April 7
Outline -- what does CMT enable?
There is now a computer in the middle of most economic transactions. What does this enable?
1. Data extraction and analysis
2. Personalization and
Initial claims: good leading indicator for recessions
Grey bars indicate recessions
Google Correlate with initial claims data
Initial claims and [unemployment filing]
Nowcasting initial claims
Predict NSA (not seasonally adjusted) initial claims ($y_t$), using lagged values of initial claims and contemporaneous queries on [unemployment filing] ($x_t$)
Base: $y_t = a_0 + a_1 y_{t-1} + a_{52} y_{t-52} + e_t$
Trends: $y_t = a_0 + a_1 y_{t-1} + a_{52} y_{t-52} + b x_t + e_t$
Result: $R^2$ goes from 80.8% to 87.6%
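The comparison between the base and trends regressions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the series are synthetic (generated with a fixed seed), the names `r_squared`, `r2_base`, and `r2_trends` are hypothetical, and the resulting numbers will not match the 80.8% and 87.6% reported on the slide.

```python
import numpy as np

# Synthetic weekly data (an assumption, not the real claims or query series).
rng = np.random.default_rng(0)
n = 260                      # about five years of weekly observations
x = rng.normal(size=n)       # stand-in for the [unemployment filing] query index
y = np.zeros(n)              # stand-in for NSA initial claims
for t in range(n):
    lag1 = y[t - 1] if t >= 1 else 0.0
    lag52 = y[t - 52] if t >= 52 else 0.0
    y[t] = 0.5 * lag1 + 0.3 * lag52 + 1.0 * x[t] + rng.normal(scale=0.5)


def r_squared(features, target):
    """In-sample R^2 of an OLS fit with an intercept."""
    design = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    centered = target - target.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


# Align y_t with y_{t-1}, y_{t-52}, and x_t for t = 52, ..., n-1.
target = y[52:]
lag1 = y[51:-1]
lag52 = y[:-52]
x_now = x[52:]

r2_base = r_squared(np.column_stack([lag1, lag52]), target)
r2_trends = r_squared(np.column_stack([lag1, lag52, x_now]), target)
print(f"base R^2:   {r2_base:.3f}")
print(f"trends R^2: {r2_trends:.3f}")
```

Because the two models are nested, in-sample $R^2$ cannot fall when $x_t$ is added; an out-of-sample comparison is the more demanding check of whether the query series actually helps.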
How can we make variable selection easier? Big data
Rows or columns?
How to choose best predictors?
Simple correlation? Judgment? Stepwise regression? Lasso, LARS, Elastic Net?
Spike-and-slab regression
Kalman filter for trend and seasonality
Spike-and-slab (George-McCulloch [1997]; Madigan-Raftery [1994]) for regression
Prior probability that a variable is included (spike)
Prior probability distribution over coefficient value (slab)
Sample from simulated posterior, average to get prediction
See Scott and Varian (2012, 2013) for details
Download R packages from CRAN (BoomSpikeSlab, bsts)
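To make the spike and slab ideas concrete, here is a minimal Python sketch of the regression part only. It is not the BoomSpikeSlab or bsts implementation and makes several assumptions: the slab is Zellner's g-prior with g = n, the spike is an independent prior inclusion probability per variable, and instead of sampling from a simulated posterior it enumerates all 2^p models exactly, which is feasible only for a handful of predictors. The Kalman filter for trend and seasonality is left out. The function name `inclusion_probabilities` and the synthetic data are hypothetical.

```python
import itertools
import numpy as np


def inclusion_probabilities(X, y, prior_incl=0.2, g=None):
    """Posterior inclusion probability of each column of X.

    Spike: each variable enters with prior probability `prior_incl`.
    Slab: Zellner g-prior on the coefficients of the included variables.
    All 2^p models are enumerated, so keep p small.
    """
    n, p = X.shape
    g = float(n) if g is None else float(g)
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    sst = yc @ yc

    models, log_post = [], []
    for gamma in itertools.product([0, 1], repeat=p):
        idx = [j for j in range(p) if gamma[j]]
        size = len(idx)
        if size == 0:
            r2 = 0.0
        else:
            coef, *_ = np.linalg.lstsq(Xc[:, idx], yc, rcond=None)
            resid = yc - Xc[:, idx] @ coef
            r2 = 1.0 - (resid @ resid) / sst
        # Log marginal likelihood relative to the intercept-only model.
        log_ml = 0.5 * (n - 1 - size) * np.log1p(g) \
            - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))
        log_prior = size * np.log(prior_incl) + (p - size) * np.log(1.0 - prior_incl)
        models.append(gamma)
        log_post.append(log_ml + log_prior)

    log_post = np.array(log_post)
    weights = np.exp(log_post - log_post.max())   # stabilize before normalizing
    weights /= weights.sum()
    return weights @ np.array(models, dtype=float)


# Synthetic example: only columns 0 and 2 actually drive y.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=200)
probs = inclusion_probabilities(X, y)
print(np.round(probs, 3))
```

Averaging predictions across models with these posterior weights corresponds to the "average to get prediction" step; with many candidate predictors, enumeration is replaced by posterior sampling, as in the packages named above.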
New Home Sales in US
Raw correlation
Predictors chosen by model
Visualize how much each predictor contributes to model fit
Trend
Seasonal
[appreciation rate]
[irs 1031]
[century 21 realtors]
[real estate purchase]
[80-20 mortgage]
One month ahead forecast
Does 23% better than simple AR1 model
Geo-amplification
You can do the same thing for any geographically distributed variable
Find queries or query categories that are predictive of that variable
Make predictions/extrapolations to other geographies
Many applications: social science, policy, marketing, politics
Example: New York Times index of “hard places” (June 26, 2014)
Where are the hardest places to live in the U.S.?
What queries are associated with “hard places”?
Based on state level data and Google Correlate
What queries are associated with “easy places”?
Based on state level data and Google Correlate
Assembled in America
Predictors of survey response
Top and bottom cities' predicted score
Top:
Kershaw, SC: 83.2%
Summersville, WV: 82.8%
Grundy, VA: 82.8%
Chesnee, SC: 82.7%
Duffield, VA: 82.5%
Norton, VA: 82.3%
Jonesville, VA: 82.2%
Walnut Cove, NC: 82.2%
Weston, WV: 82.2%
Ennice, NC: 82.1%
Bottom:
Calipatria, CA: 40.2%
Fremont, CA: 40.2%
Mountain View, CA: 40.8%
San Jose, CA: 41.4%
Berkeley, CA: 41.4%
Redmond, WA: 41.5%
Glendale, CA: 41.5%
Cupertino, CA: 41.6%
Palo Alto, CA: 41.7%
Daggett, CA: 41.9%
Assembled in America by DMA
You want randomized experiments to reduce systematic
Impact of class size on performance
Impact of ad impressions on movie revenue
Super Bowl facts
Experimentation capability should be coded in
static code:
    const threshold = 3.14
    if (x > threshold) do something
learning code:
    param threshold = {3.13, 3.14, 3.15}
    performance = (num_right, num_wrong)
    if (x > threshold) do something
    report performance
Research challenge: How to turn legacy code into learning code?
Nice example: Keith Winstein et al., An Experimental Study of the Learnability of Congestion Control
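The "learning code" idea can be sketched as runnable Python. Everything here is an illustrative assumption rather than the presenter's implementation: the function names `run_experiment` and `best_threshold` are hypothetical, candidates are assigned in round-robin order (a real system would more likely randomize), and the samples are made up so that the true decision boundary sits at 3.14.

```python
def run_experiment(samples, candidates):
    """Try each candidate threshold on a share of the traffic.

    `samples` is a list of (x, truth) pairs, where `truth` is the decision
    that turned out to be correct. Samples are assigned to candidates in
    round-robin order, and a (num_right, num_wrong) tally is kept for each.
    """
    tallies = {c: [0, 0] for c in candidates}
    for i, (x, truth) in enumerate(samples):
        threshold = candidates[i % len(candidates)]
        decision = x > threshold            # the "do something" branch
        if decision == truth:
            tallies[threshold][0] += 1
        else:
            tallies[threshold][1] += 1
    return {c: tuple(t) for c, t in tallies.items()}


def best_threshold(performance):
    """Pick the candidate with the highest share of correct decisions."""
    return max(
        performance,
        key=lambda c: performance[c][0] / max(1, sum(performance[c])),
    )


# Made-up observations; the correct decision is x > 3.14.
values = [3.135, 3.145, 3.0, 3.3]
samples = [(values[i % 4], values[i % 4] > 3.14) for i in range(12)]

performance = run_experiment(samples, [3.13, 3.14, 3.15])
for threshold, (num_right, num_wrong) in performance.items():
    print(f"threshold={threshold}: right={num_right}, wrong={num_wrong}")
print("best:", best_threshold(performance))
```

The point of the design is that the constant becomes a parameter with a reported performance record, so the treat-and-compare loop is built into the code rather than bolted on afterwards.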
pay you.”
payment.”
discount.”
store, I will pay you.”
treat-compare cycle
become viable
Impact of class size on performance
What would happen to auto fatalities if you changed the minimum drinking age?
causal effect