Arthur Berg Pennsylvania State University Introduction Bayes - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Standing Between a Bayesian and a Frequentist: An Empirical Bayes Exploration of Movies, Baseball, and Long Beach Basketball

Arthur Berg

Pennsylvania State University

slide-2
SLIDE 2

Introduction Bayes Estimation Empirical Bayes Basketball Arthur Berg Standing Between a Bayesian and a Frequentist 2 / 28

slide-3
SLIDE 3

Bayesian and Frequentist Representatives

  • Rev. Thomas Bayes FRS (1702-1761): English mathematician and Presbyterian minister.

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$

  • Sir Ronald Fisher FRS (1890-1962): English statistician, evolutionary biologist, and geneticist. "Let the data speak for itself."

slide-4
SLIDE 4

Bayes Estimator as a Convex Combination

1st Goal: List the top 250 movies of all time.

▸ Movies are rated on a scale of 1 to 10.
▸ Some movies are rated by many people, and some by only a few.
▸ Movies with fewer than 3000 votes are not considered.
▸ The average rating across all movies is $C = 6.9$.

⋆ $\mu_i$ represents the mean rating by everyone who has seen movie $i$.
⋆ The real goal is to construct the best estimate of $\mu_i$, then pick the top 250.

The frequentist approach uses only $\bar X_i$, the average rating for movie $i$:

$$\hat\mu_i^{(\text{Fisher})} = \bar X_i$$

The Bayesian approach shrinks $\bar X_i$ towards $C$, with more shrinking applied when the number of votes for movie $i$ is small:

$$\hat\mu_i^{(\text{Bayes})} = \alpha_i \bar X_i + (1 - \alpha_i)\,C, \qquad \alpha_i \in (0, 1)$$

slide-5
SLIDE 5

Internet Movie Database: Top 250

Rank  WR   R    Title                                     Votes
1     9.2  9.2  The Shawshank Redemption (1994)           546,155
2     9.1  9.2  The Godfather (1972)                      427,961
3     9.0  9.0  The Godfather: Part II (1974)             257,643
4     8.9  9.0  The Good, the Bad and the Ugly (1966)     170,045
5     8.9  9.0  Pulp Fiction (1994)                       436,456
6     8.9  8.9  Inception (2010)                          265,531
7     8.9  8.9  Schindler’s List (1993)                   289,170
8     8.9  8.9  12 Angry Men (1957)                       126,983
9     8.8  8.9  One Flew Over the Cuckoo’s Nest (1975)    225,419
10    8.8  8.9  The Dark Knight (2008)                    487,800
⋯
85    8.5  8.7  Black Swan (2010)                         20,326
⋯
142   8.2  8.3  Avatar (2009)                             285,005
⋯
240   8.0  8.5  True Grit (2010)                          6,444

slide-6
SLIDE 6

IMDb Weighted Ranking: “a true Bayesian estimate”

$$WR_i = \frac{v_i R_i + mC}{v_i + m} = \underbrace{\frac{v_i}{v_i + m}}_{\alpha_i}\,\underbrace{R_i}_{\bar X_i} + \underbrace{\frac{m}{v_i + m}}_{1 - \alpha_i}\,C$$

▸ $R_i$ = average rating of movie $i$ (that is, $\bar X_i$)
▸ $v_i$ = total number of votes from regular voters
▸ $m$ = minimum number of votes to make the list = 3000
▸ $C$ = grand mean across all movies in the database = 6.9
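As a quick check of the weighted-ranking formula, the sketch below recomputes WR for three rows of the Top 250 table. The function name `weighted_rating` is illustrative and not from the slides, and because the R values in the table are already rounded to one decimal, the results should only be expected to agree with the listed WR after rounding.

```python
def weighted_rating(r, v, m=3000, c=6.9):
    """Shrink a movie's average rating r (from v votes) toward the grand mean c.

    m and c default to the values quoted on the slide (m = 3000, C = 6.9).
    """
    alpha = v / (v + m)              # weight on the movie's own average
    return alpha * r + (1 - alpha) * c


# (title, R, votes) copied from the Top 250 table
rows = [
    ("The Shawshank Redemption (1994)", 9.2, 546155),
    ("Black Swan (2010)", 8.7, 20326),
    ("True Grit (2010)", 8.5, 6444),
]

for title, r, v in rows:
    print(f"{title}: alpha = {v / (v + 3000):.3f}, WR = {weighted_rating(r, v):.1f}")
```

The fewer the votes, the smaller the weight on the movie's own average: True Grit, with 6,444 votes, is pulled from 8.5 to about 8.0, while The Shawshank Redemption barely moves.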

slide-7
SLIDE 7

A Bayesian Calculation

$X_i = (X_{i,1}, \ldots, X_{i,v_i})$ represents the $v_i$ ratings of movie $i$.

prior: $\mu_i \sim N(\mu_0, \sigma_0^2)$

conditional: $X_{i,j} \mid \mu_i \overset{\text{iid}}{\sim} N(\mu_i, \sigma^2)$, $j = 1, \ldots, v_i$

$$\hat\mu_i^{(\text{Bayes})} = E[\mu_i \mid X_i] = \left(\frac{v_i}{v_i + \sigma^2/\sigma_0^2}\right)\bar X_i + \left(\frac{\sigma^2/\sigma_0^2}{v_i + \sigma^2/\sigma_0^2}\right)\mu_0 = \frac{v_i}{v_i + m}\,R_i + \frac{m}{v_i + m}\,C$$

$$\Rightarrow \quad \mu_0 = C, \qquad m = \sigma^2/\sigma_0^2$$
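The identification $m = \sigma^2/\sigma_0^2$ can be checked numerically. The sketch below computes the normal-normal posterior mean in its usual precision-weighted form and compares it with the shrinkage form on the slide. Both function names and the particular variances are illustrative choices, picked only so that $m = 3000$.

```python
def posterior_mean_precision(xbar, v, mu0, sigma2, sigma0_2):
    """Posterior mean as a precision-weighted average of prior mean and sample mean."""
    prior_precision = 1.0 / sigma0_2
    data_precision = v / sigma2      # precision of the mean of v observations
    return (prior_precision * mu0 + data_precision * xbar) / (prior_precision + data_precision)


def posterior_mean_shrinkage(xbar, v, mu0, sigma2, sigma0_2):
    """The same posterior mean written in the slide's v/(v+m), m/(v+m) form."""
    m = sigma2 / sigma0_2
    return v / (v + m) * xbar + m / (v + m) * mu0


# Illustrative variances giving m = sigma2 / sigma0_2 = 3000
sigma2, sigma0_2 = 3.0, 0.001
a = posterior_mean_precision(8.5, 6444, 6.9, sigma2, sigma0_2)
b = posterior_mean_shrinkage(8.5, 6444, 6.9, sigma2, sigma0_2)
print(a, b)   # the two forms agree up to floating-point rounding
```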

slide-8
SLIDE 8

1. Does shrinking really help?
2. How much to shrink by?

$$\text{Prediction Error} = \sum_{i=1}^{n} (\mu_i - \hat\mu_i)^2$$

slide-9
SLIDE 9

Standing Between a Bayesian and a Frequentist

▸ In 1956, Charles Stein proved the existence of an estimator better than the sample mean under certain assumptions.
▸ In 1961, Willard James and Charles Stein explicitly constructed such an estimator.

slide-10
SLIDE 10

The James-Stein Estimator (n ≥ 4)

$$\mu_i \sim N(\mu_0, \sigma_0^2), \qquad X_i \mid \mu_i \overset{\text{iid}}{\sim} N(\mu_i, \sigma^2) \quad (i = 1, \ldots, n)$$

$$\hat\mu_i^{(\text{Bayes})} = E[\mu_i \mid X_i] = \underbrace{\left(\frac{\sigma^2}{\sigma_0^2 + \sigma^2}\right)}_{\alpha}\mu_0 + \underbrace{\left(\frac{\sigma_0^2}{\sigma_0^2 + \sigma^2}\right)}_{1 - \alpha} X_i$$

$$\hat\mu_i^{(\text{JS})} = \underbrace{\left(\frac{(n - 3)\,\sigma^2}{\sum_{j=1}^{n} (X_j - \bar X)^2}\right)}_{\hat\alpha}\bar X + \underbrace{\left(1 - \frac{(n - 3)\,\sigma^2}{\sum_{j=1}^{n} (X_j - \bar X)^2}\right)}_{1 - \hat\alpha} X_i$$

The James-Stein estimator replaces the unknown prior mean $\mu_0$ with $\bar X$ and the unknown weight $\alpha$ with an estimate computed from the data. In practice, if $\sigma^2$ is unknown, an estimate is used.
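A minimal sketch of the James-Stein formula above, in plain Python. The function name `james_stein` is illustrative. The code follows the formula as written, so the shrinkage weight is not clipped to [0, 1] (a positive-part variant would clip it, but that variant is not on the slide), and the guard for identical observations is an added assumption to avoid dividing by zero.

```python
def james_stein(x, sigma2):
    """Shrink each observation in x toward the grand mean, given known variance sigma2."""
    n = len(x)
    if n < 4:
        raise ValueError("the estimator on the slide requires n >= 4")
    xbar = sum(x) / n
    s = sum((xj - xbar) ** 2 for xj in x)
    if s == 0:                        # all observations equal: nothing to shrink
        return list(x)
    alpha = (n - 3) * sigma2 / s      # estimated weight on the grand mean
    return [alpha * xbar + (1 - alpha) * xi for xi in x]


estimates = james_stein([1.0, 2.0, 3.0, 4.0, 5.0], sigma2=1.0)
print(estimates)
```

In this toy input the grand mean is 3 and the sum of squares is 10, so the weight on the grand mean is (5 − 3)·1/10 = 0.2, and every observation moves 20% of the way toward 3.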

slide-11
SLIDE 11

Predicting Batting Averages

2nd Goal: Predict final batting averages from pre-season performances.

▸ Pre-season batting averages for 18 major league players are provided.
▸ Season final batting averages for the same players are also recorded.
▸ The data are from the 1970 season and were published by Efron and Morris in JASA (1975) and Scientific American (1977).

The frequentist approach uses only $X_i$, the pre-season batting average for player $i$:

$$\hat p_i^{(\text{Fisher})} = X_i$$

The Empirical Bayes approach shrinks $X_i$ towards $\bar X$ by an empirically determined amount:

$$\hat p_i^{(\text{Stein})} = \hat\alpha X_i + (1 - \hat\alpha)\,\bar X, \qquad \hat\alpha \in (0, 1)$$

slide-12
SLIDE 12

#   Name         hits/AB  pre-season (µ̂(ML))  season final (µ)
1   Clemente     18/45    0.400                0.346
2   Robinson     17/45    0.378                0.298
3   Howard       16/45    0.356                0.276
4   Johnstone    15/45    0.333                0.222
5   Berry        14/45    0.311                0.273
6   Spencer      14/45    0.311                0.270
7   Kessinger    13/45    0.289                0.263
8   Alvarado     12/45    0.267                0.210
9   Santo        11/45    0.244                0.269
10  Swoboda      11/45    0.244                0.230
11  Unser        10/45    0.222                0.264
12  Williams     10/45    0.222                0.256
13  Scott        10/45    0.222                0.303
14  Petrocelli   10/45    0.222                0.264
15  Rodriguez    10/45    0.222                0.226
16  Campaneris   9/45     0.200                0.286
17  Munson       8/45     0.178                0.316
18  Alvis        7/45     0.156                0.200
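The table above is small enough to try the shrinkage directly. The sketch below shrinks each pre-season average toward the grand mean and compares the squared prediction error against the season-final averages. One loud assumption: the variance $\sigma^2$ is taken to be the binomial variance $\bar X(1 - \bar X)/45$ at the pooled average. The slides do not say how $\sigma^2$ was obtained, and Efron and Morris's published analysis works on a transformed scale, so the numbers here need not match theirs.

```python
# hits/AB and season-final columns copied from the table
hits = [18, 17, 16, 15, 14, 14, 13, 12, 11, 11, 10, 10, 10, 10, 10, 9, 8, 7]
final = [0.346, 0.298, 0.276, 0.222, 0.273, 0.270, 0.263, 0.210, 0.269,
         0.230, 0.264, 0.256, 0.303, 0.264, 0.226, 0.286, 0.316, 0.200]
AT_BATS = 45

x = [h / AT_BATS for h in hits]       # pre-season averages (the ML estimates)
n = len(x)
xbar = sum(x) / n

# ASSUMPTION (not from the slides): binomial variance at the pooled average
sigma2 = xbar * (1 - xbar) / AT_BATS

s = sum((xi - xbar) ** 2 for xi in x)
alpha = (n - 3) * sigma2 / s          # estimated weight on the grand mean
js = [alpha * xbar + (1 - alpha) * xi for xi in x]


def prediction_error(truth, estimate):
    """Sum of squared differences, as in the Prediction Error definition."""
    return sum((t - e) ** 2 for t, e in zip(truth, estimate))


ml_err = prediction_error(final, x)
js_err = prediction_error(final, js)
print(f"weight on grand mean: {alpha:.3f}")
print(f"prediction error, pre-season averages: {ml_err:.4f}")
print(f"prediction error, shrunken estimates:  {js_err:.4f}")
```

Under this variance assumption most of the weight goes to the grand mean, and the shrunken estimates have a clearly smaller prediction error than the raw pre-season averages.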

slide-13
SLIDE 13

Batting Average Dataset

[Figure: pre-season and season-final batting averages for players 1 to 18; vertical axis "Batting Average", 0.0 to 0.4.]

slide-14
SLIDE 14

James-Stein Estimation of Batting Averages

[Figure: the same plot of pre-season and season-final batting averages for players 1 to 18, with a dash marker added for each player; vertical axis "Batting Average", 0.0 to 0.4.]

slide-15
SLIDE 15

Ranking Bias: Empirical Bayes + Order Statistics

▸ Genome-wide association studies
▸ SNPs: AA/Aa/aa or 0/1/2 ($\sim 10^7$)
▸ Estimated effects of the top SNPs are biased up (winner’s curse).
▸ Ranking bias estimator: part frequentist, part Bayesian, with robust properties
▸ Applied to 2 GWAS studies with 2,000 cases and 3,000 controls: Crohn’s Disease and Type 1 Diabetes

[Figure: batting-average plot for players 1 to 18, pre-season vs. season final.]

slide-16
SLIDE 16


49ers Statistics—http://www.longbeachstate.com/


slide-17
SLIDE 17

Opponents Over 3 Seasons: 08-09, 09-10, 10-11

Opponent               #
alaska anchorage       1
arizona state          1
boise state            1
byu cougars            1
byu hawaii             1
cal poly               7
cal state fullerton    6
cal state northridge   6
clemson                2
cs monterey bay        1
duke                   1
green bay              2
idaho                  1
idaho state            1
iowa                   1
kentucky               1
loyola marymount       2
montana                1
montana state          1
new mexico state       1
north carolina         1
notre dame             1
oregon                 1
pacific                8
pepperdine             2
saint mary’s           1
saint peter’s          1
san diego state        1
san francisco state    1
syracuse               1
temple                 1
texas                  1
uc davis               6
uc irvine              6
uc riverside           6
uc santa barbara       7
ucla                   1
univ. san francisco    1
utah state             2
washington             1
weber state            2
west virginia          1
wisconsin              1

slide-18
SLIDE 18

Winning Percentages

Season (games)        All Games   Conference Games
All 3 Seasons (93)    56%         67%
08-09 Season (30)     50%         63%
09-10 Season (33)     52%         50%
10-11 Season (30)     67%         88%

The game counts in parentheses are those listed for all games.

slide-19
SLIDE 19

Spread = 49ers Score − Opponent Score (10−11 Season)

[Figure: spread by conference opponent, axis ticks at 5, 10, 15. Opponents shown: uc santa barbara, cal state northridge, uc davis, cal poly, uc riverside, cal state fullerton, pacific, uc irvine.]

slide-20
SLIDE 20

Spread = 49ers Score − Opponent Score (10−11 Season)

[Figure: the same spread plot by conference opponent (axis ticks at 5, 10, 15), with a dash marker added for each of the eight opponents.]

slide-21
SLIDE 21

Over/Under = 49ers Score + Opponent Score (10−11 Season)

[Figure: over/under (total score) by conference opponent, axis ticks at 120, 140, 160. Opponents shown: uc irvine, cal state fullerton, cal state northridge, uc riverside, uc davis, pacific, uc santa barbara, cal poly.]

slide-22
SLIDE 22

Over/Under = 49ers Score + Opponent Score (10−11 Season)

[Figure: the same over/under plot by conference opponent (axis ticks at 120, 140, 160), with a dash marker added for each of the eight opponents.]

slide-23
SLIDE 23

Conversion Formulas

$x$ = LB Score, $y$ = Opponent Score

$$\text{Over/Under} = x + y, \qquad \text{Spread} = x - y$$

$$x = \frac{\text{Over/Under} + \text{Spread}}{2}, \qquad y = \frac{\text{Over/Under} - \text{Spread}}{2}$$
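The conversion formulas translate directly into code. In the sketch below the function names `to_scores` and `to_lines` are illustrative, and the sample values 161 and 2 are the over/under and spread predicted for UC Irvine later in the slides.

```python
def to_scores(over_under, spread):
    """Recover (LB score, opponent score) from the total and the spread."""
    x = (over_under + spread) / 2
    y = (over_under - spread) / 2
    return x, y


def to_lines(x, y):
    """Recover (over/under, spread) from the two scores."""
    return x + y, x - y


lb, opp = to_scores(161, 2)
print(lb, opp)              # 81.5 79.5
print(to_lines(lb, opp))    # (161.0, 2.0)
```

Because the two maps are inverses of each other, predicting the spread and the over/under is equivalent to predicting the two scores.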

slide-24
SLIDE 24

Predictions

Rank  Opponent               LB Score  O. Score  Spread  Over/Under
2     Cal Poly               66        55        11      121
3     Cal State Northridge   81        66        15      147
4     Pacific                69        68        1       136
5     UC Santa Barbara       72        55        17      126
6     Cal State Fullerton    79        71        7       150
7     UC Riverside           75        66        9       141
8     UC Irvine              82        80        2       161
      UC Davis               76        64        13      140

slide-25
SLIDE 25

How good are the predictions?

Using the 09-10 season to predict the 10-11 season:

$$\frac{\text{adjusted prediction error for spread}}{\text{unadjusted prediction error for spread}} = \frac{197}{341} = 58\%$$

$$\frac{\text{adjusted prediction error for over/under}}{\text{unadjusted prediction error for over/under}} = \frac{513}{818} = 63\%$$

Using the 08-09 season to predict the 09-10 season:

$$\frac{\text{adjusted prediction error for spread}}{\text{unadjusted prediction error for spread}} = \frac{150}{194} = 78\%$$

$$\frac{\text{adjusted prediction error for over/under}}{\text{unadjusted prediction error for over/under}} = \frac{442}{641} = 69\%$$

slide-26
SLIDE 26

LB vs UCI: Vegas Odds (as of 3am on game day)

▸ All bets are “pay $110 to win $100”.
▸ Long Beach is the favorite; UCI is the underdog.

Casino       Spread  Over/Under
LV Hilton    10      148.5
Wynn         9.5     149
MGM Mirage   10      NA
Predicted    2       161

These predictions recommend betting on UCI (still expecting LB to win) and betting on “over” for the over/under option.
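The recommendation follows from comparing each predicted value with the posted line. The sketch below spells out that comparison. It assumes spreads are expressed as the favorite's expected margin of victory, treats an exact tie or a missing line as "no bet", and uses the illustrative function name `recommend`. None of these conventions are stated on the slide.

```python
def recommend(pred_spread, pred_total, line_spread, line_total):
    """Return (side, total) picks; None means no pick.

    Spreads are the favorite's margin of victory (an assumed convention).
    """
    side = None
    if line_spread is not None and pred_spread != line_spread:
        # Favorite predicted to win by less than the line: take the underdog.
        side = "favorite" if pred_spread > line_spread else "underdog"
    total = None
    if line_total is not None and pred_total != line_total:
        total = "over" if pred_total > line_total else "under"
    return side, total


# (casino, spread, over/under) copied from the slide; None stands for "NA"
lines = [("LV Hilton", 10, 148.5), ("Wynn", 9.5, 149), ("MGM Mirage", 10, None)]
for casino, spread, total in lines:
    print(casino, recommend(2, 161, spread, total))
```

With a predicted spread of 2 against lines of 9.5 to 10, the underdog (UCI) is the pick at every casino, and a predicted total of 161 against 148.5 to 149 gives "over" wherever a total is posted.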

slide-27
SLIDE 27

Disclaimers:

1. I do not necessarily encourage sports betting.
2. I am not liable for any bets made based on my presentation.

slide-28
SLIDE 28


Thank You!!

Beach.ArthurBerg.com berg@psu.edu
