Standing Between a Bayesian and a Frequentist: An Empirical Bayes Exploration of Movies, Baseball, and Long Beach Basketball
Arthur Berg
Pennsylvania State University
Introduction Bayes Estimation Empirical Bayes Basketball Arthur Berg Standing Between a Bayesian and a Frequentist 2 / 28
Thomas Bayes: English mathematician, Presbyterian minister
$P(H \mid E) = \dfrac{P(E \mid H)\,P(H)}{P(E)}$
Sir Ronald Fisher FRS (1890-1962): English statistician, evolutionary biologist, geneticist
"Let the data speak for itself."
1st Goal: List the top 250 movies of all time.
Movies are rated on a scale of 1 to 10.
Some movies are rated by many people, and some by only a few.
Movies with fewer than 3000 votes are not considered.
The average rating across all movies is C = 6.9.
⋆ $\mu_i$ represents the mean rating by everyone who has seen movie i.
⋆ The real goal is to construct the best estimate of $\mu_i$, then pick the top 250.
The frequentist approach uses only $\bar{X}_i$, the average rating for movie i:
$\hat{\mu}_i^{(\mathrm{Fisher})} = \bar{X}_i$
The Bayesian approach shrinks $\bar{X}_i$ towards C, with more shrinking applied when the number of votes for movie i is small:
$\hat{\mu}_i^{(\mathrm{Bayes})} = \alpha_i \bar{X}_i + (1 - \alpha_i)\,C$, where $\alpha_i \in (0,1)$
Rank  WR   R    Title                                     Votes
1     9.2  9.2  The Shawshank Redemption (1994)           546,155
2     9.1  9.2  The Godfather (1972)                      427,961
3     9.0  9.0  The Godfather: Part II (1974)             257,643
4     8.9  9.0  The Good, the Bad and the Ugly (1966)     170,045
5     8.9  9.0  Pulp Fiction (1994)                       436,456
6     8.9  8.9  Inception (2010)                          265,531
7     8.9  8.9  Schindler’s List (1993)                   289,170
8     8.9  8.9  12 Angry Men (1957)                       126,983
9     8.8  8.9  One Flew Over the Cuckoo’s Nest (1975)    225,419
10    8.8  8.9  The Dark Knight (2008)                    487,800
⋯
85    8.5  8.7  Black Swan (2010)                         20,326
⋯
142   8.2  8.3  Avatar (2009)                             285,005
⋯
240   8.0  8.5  True Grit (2010)                          6,444
$WR_i = \dfrac{v_i R_i + mC}{v_i + m} = \underbrace{\dfrac{v_i}{v_i + m}}_{\alpha_i} R_i + \underbrace{\dfrac{m}{v_i + m}}_{1-\alpha_i} C$
▸ $R_i$ = average rating of movie i ($\bar{X}_i$)
▸ $v_i$ = total number of votes from regular voters
▸ $m$ = minimum # of votes to make the list = 3000
▸ $C$ = grand mean across all movies in the database = 6.9
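As a rough illustration of the weighted-rating formula, the sketch below plugs in two rows of the ranking table. The function name `weighted_rating` is made up for this example, and the R values are the one-decimal figures shown in the table, so results only agree with the listed WR after rounding.

```python
def weighted_rating(R, v, m=3000, C=6.9):
    """Shrink a movie's average rating R (from v votes) toward the grand mean C."""
    alpha = v / (v + m)              # weight on the movie's own average
    return alpha * R + (1 - alpha) * C

# Two rows from the ranking table (R is rounded to one decimal there).
true_grit = weighted_rating(R=8.5, v=6_444)
black_swan = weighted_rating(R=8.7, v=20_326)
print(round(true_grit, 1), round(black_swan, 1))   # 8.0 8.5
```

With only 6,444 votes, True Grit is pulled from 8.5 down to about 8.0, while a movie with hundreds of thousands of votes barely moves.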
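$X_i = (X_{i,1}, \dots, X_{i,v_i})$ represents the $v_i$ ratings of movie i.
prior: $\mu_i \sim N(\mu_0, \sigma_0^2)$
conditional: $X_{i,j} \mid \mu_i \overset{\text{iid}}{\sim} N(\mu_i, \sigma^2)$, $j = 1, \dots, v_i$
$\hat{\mu}_i^{(\mathrm{Bayes})} = E[\mu_i \mid X_i] = \left(\dfrac{v_i}{v_i + \sigma^2/\sigma_0^2}\right)\bar{X}_i + \left(\dfrac{\sigma^2/\sigma_0^2}{v_i + \sigma^2/\sigma_0^2}\right)\mu_0 = \dfrac{v_i}{v_i + m} R_i + \dfrac{m}{v_i + m} C$
⇒ $\mu_0 = C$, $m = \sigma^2/\sigma_0^2$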
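The identification $m = \sigma^2/\sigma_0^2$ can be checked numerically. The sketch below computes the normal-normal posterior mean by precision weighting and compares it with the shrinkage form; the variance values are hypothetical, chosen only so that their ratio equals 3000.

```python
def posterior_mean(xbar, v, mu0, sigma2, sigma0_2):
    """Posterior mean of mu_i under the normal-normal model, via precision weighting."""
    prior_precision = 1.0 / sigma0_2      # information carried by the prior
    data_precision = v / sigma2           # information carried by the v ratings
    return (data_precision * xbar + prior_precision * mu0) / (data_precision + prior_precision)

def shrinkage_form(xbar, v, mu0, m):
    """The same estimate written as v/(v+m) * xbar + m/(v+m) * mu0."""
    return (v * xbar + m * mu0) / (v + m)

# Hypothetical variances (not from the slides) with sigma2 / sigma0_2 = 3000.
sigma2, sigma0_2 = 3.0, 0.001
a = posterior_mean(8.5, 6444, 6.9, sigma2, sigma0_2)
b = shrinkage_form(8.5, 6444, 6.9, m=sigma2 / sigma0_2)
print(abs(a - b) < 1e-9)   # True
```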
1. Does shrinking really help?
2. How much to shrink by?
▸ In 1956, Charles Stein proved the existence of an estimator better than the sample mean under certain assumptions.
▸ In 1961, Willard James and Charles Stein explicitly constructed such an estimator.
$\mu_i \sim N(\mu_0, \sigma_0^2)$
$X_i \mid \mu_i \overset{\text{iid}}{\sim} N(\mu_i, \sigma^2)$, $i = 1, \dots, n$
$\hat{\mu}_i^{(\mathrm{Bayes})} = E[\mu_i \mid X_i] = \underbrace{\dfrac{\sigma^2}{\sigma_0^2 + \sigma^2}}_{\alpha}\,\mu_0 + \underbrace{\dfrac{\sigma_0^2}{\sigma_0^2 + \sigma^2}}_{1-\alpha}\,X_i$
$\hat{\mu}_i^{(\mathrm{JS})} = \underbrace{\dfrac{(n-3)\,\sigma^2}{\sum_{j=1}^{n}(X_j - \bar{X})^2}}_{\alpha}\,\bar{X} + \underbrace{\left(1 - \dfrac{(n-3)\,\sigma^2}{\sum_{j=1}^{n}(X_j - \bar{X})^2}\right)}_{1-\alpha}\,X_i$
In practice, if $\sigma^2$ is unknown, an estimate is used.
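To get a feel for whether shrinking helps, here is a minimal simulation sketch of the James-Stein formula with known $\sigma^2$. The setup is an assumption made for illustration only: 10 true means drawn from a standard normal, one noisy observation each, and total squared error averaged over 2000 seeded trials.

```python
import random

def james_stein(x, sigma2):
    """Shrink each x[i] toward the grand mean, following the (JS) formula."""
    n = len(x)
    xbar = sum(x) / n
    s = sum((xi - xbar) ** 2 for xi in x)
    alpha = (n - 3) * sigma2 / s          # weight on the grand mean (may exceed 1)
    return [alpha * xbar + (1 - alpha) * xi for xi in x]

def simulate(n=10, sigma2=1.0, trials=2000, seed=0):
    """Average total squared error of the raw X_i versus the James-Stein estimates."""
    rng = random.Random(seed)
    err_raw = err_js = 0.0
    for _ in range(trials):
        mu = [rng.gauss(0.0, 1.0) for _ in range(n)]       # hypothetical true means
        x = [rng.gauss(m, sigma2 ** 0.5) for m in mu]      # one noisy observation each
        js = james_stein(x, sigma2)
        err_raw += sum((xi - m) ** 2 for xi, m in zip(x, mu))
        err_js += sum((ji - m) ** 2 for ji, m in zip(js, mu))
    return err_raw / trials, err_js / trials

risk_raw, risk_js = simulate()
print(risk_raw, risk_js)   # the second number should be noticeably smaller
```

The sketch uses the plain estimator as written on the slide, without the common "positive-part" modification that caps the shrinkage weight at 1.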
2nd Goal: Predict final batting averages from pre-season performances.
Pre-season batting averages for 18 major league players are provided.
Season final batting averages for the same players are also recorded.
The data are from the 1970 season and were published by Efron and Morris in JASA (1975) and Scientific American (1977).
The frequentist approach uses only $X_i$, the pre-season batting average for player i:
$\hat{p}_i^{(\mathrm{Fisher})} = X_i$
The Empirical Bayes approach shrinks $X_i$ towards $\bar{X}$ by some empirically determined amount:
$\hat{p}_i^{(\mathrm{Stein})} = \hat{\alpha} X_i + (1 - \hat{\alpha})\,\bar{X}$, where $\hat{\alpha} \in (0,1)$
#   Name         hits/AB  pre-season (ˆµ(ML))  season final (µ)
1   Clemente     18/45    0.400                0.346
2   Robinson     17/45    0.378                0.298
3   Howard       16/45    0.356                0.276
4   Johnstone    15/45    0.333                0.222
5   Berry        14/45    0.311                0.273
6   Spencer      14/45    0.311                0.270
7   Kessinger    13/45    0.289                0.263
8   Alvarado     12/45    0.267                0.210
9   Santo        11/45    0.244                0.269
10  Swoboda      11/45    0.244                0.230
11  Unser        10/45    0.222                0.264
12  Williams     10/45    0.222                0.256
13  Scott        10/45    0.222                0.303
14  Petrocelli   10/45    0.222                0.264
15  Rodriguez    10/45    0.222                0.226
16  Campaneris   9/45     0.200                0.286
17  Munson       8/45     0.178                0.316
18  Alvis        7/45     0.156                0.200
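The shrinkage can be tried directly on this table. The sketch below is one possible way to do it, not necessarily the calculation behind the slides: it assumes $\sigma^2$ is estimated by the binomial variance $\bar{X}(1-\bar{X})/45$, and it takes $\hat{\alpha}$ (the weight kept on each player's own average) to be one minus the James-Stein weight on $\bar{X}$.

```python
hits = [18, 17, 16, 15, 14, 14, 13, 12, 11, 11, 10, 10, 10, 10, 10, 9, 8, 7]
final = [0.346, 0.298, 0.276, 0.222, 0.273, 0.270, 0.263, 0.210, 0.269,
         0.230, 0.264, 0.256, 0.303, 0.264, 0.226, 0.286, 0.316, 0.200]
at_bats = 45

x = [h / at_bats for h in hits]          # pre-season averages X_i
n = len(x)
xbar = sum(x) / n
sigma2 = xbar * (1 - xbar) / at_bats     # ASSUMPTION: binomial variance estimate
s = sum((xi - xbar) ** 2 for xi in x)
alpha_hat = 1 - (n - 3) * sigma2 / s     # weight kept on each player's own X_i
stein = [alpha_hat * xi + (1 - alpha_hat) * xbar for xi in x]

# Total squared error against the season-final averages.
err_raw = sum((xi - p) ** 2 for xi, p in zip(x, final))
err_stein = sum((si - p) ** 2 for si, p in zip(stein, final))
print(round(alpha_hat, 2), err_stein < err_raw)   # 0.21 True
```

Under these assumptions each pre-season average keeps only about a fifth of its distance from the group mean, and the shrunken estimates land closer to the season-final averages in total squared error.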
[Figure: batting average (0.0 to 0.4) for players 1-18, pre-season vs. season final]
▸ Genome-wide association studies (GWAS)
▸ SNPs: AA/Aa/aa or 0/1/2 (∼ 10^7)
▸ Estimated effects of the top SNPs are biased upward (winner’s curse).
[Figure: batting average for players 1-18, pre-season vs. season final]
▸ Ranking bias estimator: part frequentist, part Bayesian, with robust properties
▸ Applied to 2 GWAS, each with 2,000 cases and 3,000 controls: Crohn’s Disease and Type 1 Diabetes
Opponent                #
alaska anchorage        1
arizona state           1
boise state             1
byu cougars             1
byu hawaii              1
cal poly                7
cal state fullerton     6
cal state northridge    6
clemson                 2
cs monterey bay         1
duke                    1
green bay               2
idaho                   1
idaho state             1
iowa                    1
kentucky                1
loyola marymount        2
montana                 1
montana state           1
new mexico state        1
north carolina          1
notre dame              1
[name missing]          1
pacific                 8
pepperdine              2
saint mary’s            1
saint peter’s           1
san diego state         1
san francisco state     1
syracuse                1
temple                  1
texas                   1
uc davis                6
uc irvine               6
uc riverside            6
uc santa barbara        7
ucla                    1
[name missing]          1
utah state              2
washington              1
weber state             2
west virginia           1
wisconsin               1
All Games
  All 3 Seasons (93)   56%
  08-09 Season (30)    50%
  09-10 Season (33)    52%
  10-11 Season (30)    67%
Conference Games
  All 3 Seasons        67%
  08-09 Season         63%
  09-10 Season         50%
  10-11 Season         88%
[Figure: Spread = 49ers Score − Opponent Score (10-11 Season), by opponent: uc santa barbara, cal state northridge, uc davis, cal poly, uc riverside, cal state fullerton, pacific, uc irvine]
[Figure: Over/Under = 49ers Score + Opponent Score (10-11 Season), by opponent: uc irvine, cal state fullerton, cal state northridge, uc riverside, uc davis, pacific, uc santa barbara, cal poly]
$x$ = LB Score
$y$ = Opponent Score
Over/Under $= x + y$
Spread $= x - y$
$x = \dfrac{\text{Over/Under} + \text{Spread}}{2}$
$y = \dfrac{\text{Over/Under} - \text{Spread}}{2}$
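The inversion above is a two-line computation. A small sketch, with a hypothetical function name and an illustrative pair of inputs (an over/under of 121 and a spread of 11):

```python
def scores_from_lines(over_under, spread):
    """Recover (LB score, opponent score) from the over/under and the spread."""
    x = (over_under + spread) / 2   # LB score
    y = (over_under - spread) / 2   # opponent score
    return x, y

print(scores_from_lines(over_under=121, spread=11))   # (66.0, 55.0)
```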
Rank  Opponent               LB Score  Opp. Score  Spread  Over/Under
2     Cal Poly               66        55          11      121
3     Cal State Northridge   81        66          15      147
4     Pacific                69        68          1       136
5     UC Santa Barbara       72        55          17      126
6     Cal State Fullerton    79        71          7       150
7     UC Riverside           75        66          9       141
8     UC Irvine              82        80          2       161
      UC Davis               76        64          13      140
Using the 09-10 season to predict the 10-11 season:
  (adjusted prediction error for spread) / (unadjusted prediction error for spread) = 197 / 341 = 58%
  (adjusted prediction error for over/under) / (unadjusted prediction error for over/under) = 513 / 818 = 63%
Using the 08-09 season to predict the 09-10 season:
  (adjusted prediction error for spread) / (unadjusted prediction error for spread) = 150 / 194 = 78%
  (adjusted prediction error for over/under) / (unadjusted prediction error for over/under) = 442 / 641 = 69%
▸ All bets are “pay $110 to win $100”.
▸ Long Beach is the favorite; UCI is the underdog.
Casino        Spread      Over/Under
LV Hilton     [missing]   148.5
Wynn          [missing]   149
MGM Mirage    [missing]   NA
Predicted     [missing]   161
These predictions recommend betting on UCI (still expecting LB to win) and betting on “over” for the over/under option.
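The “pay $110 to win $100” terms imply a break-even win rate above one half. The sketch below works out that threshold; the function names are invented for this example, and it assumes a losing bet forfeits the full stake while a winning bet returns the stake plus the payout.

```python
def break_even_probability(stake=110.0, payout=100.0):
    """Win probability at which a 'pay stake to win payout' bet has zero expected profit."""
    return stake / (stake + payout)

def expected_profit(p_win, stake=110.0, payout=100.0):
    """Expected profit per bet for a bettor who wins with probability p_win."""
    return p_win * payout - (1 - p_win) * stake

print(round(break_even_probability(), 4))   # 0.5238
```

So a prediction method has to pick the right side a little more than 52% of the time before these bets stop losing money on average.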
1 I do not necessarily encourage
2 I am not liable for any bets made