Arthur Berg Pennsylvania State University Introduction Bayes - PowerPoint PPT Presentation

Standing Between a Bayesian and a Frequentist: An Empirical Bayes Exploration of Movies, Baseball, and Long Beach Basketball
Arthur Berg, Pennsylvania State University

Outline: Introduction, Bayes Estimation, Empirical Bayes, Basketball

Bayesian and Frequentist Representatives

Sir Ronald Fisher FRS (1890-1962): English Statistician, Evolutionary Biologist, Geneticist. "Let the data speak for itself."

Rev. Thomas Bayes FRS (1702-1761): English Mathematician, Presbyterian Minister.

$$P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}$$

Bayes Estimator as a Convex Combination

1st Goal: List the top 250 movies of all time.

Movies are rated on a scale of 1 to 10. Some movies are rated by many people, and some by only a few. Movies with fewer than 3000 votes are not considered. The average rating across all movies is $C = 6.9$.

⋆ $\mu_i$ represents the mean rating by everyone who has seen movie $i$.
⋆ The real goal is to construct the best estimate of $\mu_i$, then pick the top 250.

The frequentist approach uses only $\bar{X}_i$, the average rating for movie $i$:

$$\hat{\mu}_i^{(\text{Fisher})} = \bar{X}_i$$

The Bayesian approach shrinks $\bar{X}_i$ towards $C$, with more shrinking applied when the number of votes for movie $i$ is small:

$$\hat{\mu}_i^{(\text{Bayes})} = \alpha_i \bar{X}_i + (1 - \alpha_i)\, C, \quad \text{where } \alpha_i \in (0, 1)$$

Internet Movie Database: Top 250

Rank  WR   R    Title                                     Votes
1     9.2  9.2  The Shawshank Redemption (1994)           546,155
2     9.1  9.2  The Godfather (1972)                      427,961
3     9.0  9.0  The Godfather: Part II (1974)             257,643
4     8.9  9.0  The Good, the Bad and the Ugly (1966)     170,045
5     8.9  9.0  Pulp Fiction (1994)                       436,456
6     8.9  8.9  Inception (2010)                          265,531
7     8.9  8.9  Schindler's List (1993)                   289,170
8     8.9  8.9  12 Angry Men (1957)                       126,983
9     8.8  8.9  One Flew Over the Cuckoo's Nest (1975)    225,419
10    8.8  8.9  The Dark Knight (2008)                    487,800
...
85    8.5  8.7  Black Swan (2010)                         20,326
...
142   8.2  8.3  Avatar (2009)                             285,005
...
240   8.0  8.5  True Grit (2010)                          6,444

IMDb Weighted Ranking: "a true Bayesian estimate"

$$WR_i = \frac{v_i R_i + m C}{v_i + m} = \underbrace{\frac{v_i}{v_i + m}}_{\alpha_i} \underbrace{R_i}_{\bar{X}_i} + \underbrace{\frac{m}{v_i + m}}_{1 - \alpha_i} C$$

▸ $R_i$ = average rating of movie $i$ ($\bar{X}_i$)
▸ $v_i$ = total number of votes from regular voters
▸ $m$ = minimum number of votes to make the list = 3000
▸ $C$ = grand mean across all movies in the database = 6.9
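The weighted-rating formula can be checked against a few rows of the Top 250 table. The sketch below is only an illustration: the function name `weighted_rating` is made up here, and the inputs are the rounded R values from the table, so the results are approximate.

```python
def weighted_rating(R, v, m=3000, C=6.9):
    """Shrink the raw mean rating R toward the grand mean C.

    alpha = v / (v + m) is the weight on the movie's own average,
    so movies with few votes are pulled harder toward C.
    """
    alpha = v / (v + m)
    return alpha * R + (1 - alpha) * C


# (title, R, votes) taken from the Top 250 table above
movies = [
    ("The Shawshank Redemption (1994)", 9.2, 546_155),
    ("Black Swan (2010)", 8.7, 20_326),
    ("True Grit (2010)", 8.5, 6_444),
]

for title, R, v in movies:
    print(f"{title}: WR = {weighted_rating(R, v):.1f}")
```

With only 6,444 votes, True Grit is pulled from 8.5 down to about 8.0, while The Shawshank Redemption barely moves, which agrees with the WR column of the table.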

A Bayesian Calculation

$X_i = (X_{i,1}, \ldots, X_{i,v_i})$ represents the $v_i$ ratings of movie $i$.

prior: $\mu_i \sim N(\mu_0, \sigma_0^2)$
conditional: $X_{i,j} \mid \mu_i \overset{\text{iid}}{\sim} N(\mu_i, \sigma^2) \quad (j = 1, \ldots, v_i)$

$$\hat{\mu}_i^{(\text{Bayes})} = E[\mu_i \mid X_i] = \left( \frac{v_i}{v_i + \sigma^2/\sigma_0^2} \right) \bar{X}_i + \left( \frac{\sigma^2/\sigma_0^2}{v_i + \sigma^2/\sigma_0^2} \right) \mu_0$$

Setting $\mu_0 = C$ and $m = \sigma^2/\sigma_0^2$ gives

$$\hat{\mu}_i^{(\text{Bayes})} = \frac{v_i}{v_i + m} R_i + \frac{m}{v_i + m} C$$
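The slide states the posterior mean without the intermediate step. A brief sketch of that step, using the standard normal-normal conjugate result (precisions add, and the posterior mean is the precision-weighted average of the data mean and the prior mean):

```latex
\begin{align*}
p(\mu_i \mid X_i)
  &\propto \exp\!\left(-\frac{(\mu_i - \mu_0)^2}{2\sigma_0^2}\right)
     \prod_{j=1}^{v_i} \exp\!\left(-\frac{(X_{i,j} - \mu_i)^2}{2\sigma^2}\right), \\
E[\mu_i \mid X_i]
  &= \frac{\dfrac{v_i}{\sigma^2}\,\bar{X}_i + \dfrac{1}{\sigma_0^2}\,\mu_0}
          {\dfrac{v_i}{\sigma^2} + \dfrac{1}{\sigma_0^2}}
   = \frac{v_i}{v_i + \sigma^2/\sigma_0^2}\,\bar{X}_i
     + \frac{\sigma^2/\sigma_0^2}{v_i + \sigma^2/\sigma_0^2}\,\mu_0 .
\end{align*}
```

The last equality multiplies numerator and denominator by $\sigma^2$, which is where the ratio $\sigma^2/\sigma_0^2$ (the IMDb constant $m$) comes from.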

1. Does shrinking really help?
2. How much to shrink by?

$$\text{Prediction Error} = \sum_{i=1}^{n} (\mu_i - \hat{\mu}_i)^2$$
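One way to get a feel for the first question is a small simulation under the normal-normal model. Everything numeric below is an assumption chosen for illustration (prior spread `sigma0`, rating noise `sigma`, five votes per movie, a fixed seed); simulated ratings are also not clipped to the 1 to 10 scale.

```python
import random


def simulate(n=2000, votes=5, mu0=6.9, sigma0=1.0, sigma=2.0, seed=0):
    """Return (frequentist, Bayes) prediction errors summed over n movies."""
    rng = random.Random(seed)
    m = sigma**2 / sigma0**2          # shrinkage constant m = sigma^2 / sigma_0^2
    alpha = votes / (votes + m)       # weight on each movie's own average
    err_freq = err_bayes = 0.0
    for _ in range(n):
        mu = rng.gauss(mu0, sigma0)   # true mean rating, drawn from the prior
        xbar = sum(rng.gauss(mu, sigma) for _ in range(votes)) / votes
        shrunk = alpha * xbar + (1 - alpha) * mu0
        err_freq += (mu - xbar) ** 2
        err_bayes += (mu - shrunk) ** 2
    return err_freq, err_bayes


err_freq, err_bayes = simulate()
print(f"prediction error, sample mean: {err_freq:.1f}")
print(f"prediction error, shrunk mean: {err_bayes:.1f}")
```

Under these assumptions the shrunk estimate has the smaller total prediction error. The simulation uses the true prior parameters, which are unknown in practice; that gap is what the second question is about.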

Standing Between a Bayesian and a Frequentist

▸ In 1956, Charles Stein proved the existence of an estimator better than the sample mean under certain assumptions.
▸ In 1961, Willard James and Charles Stein explicitly constructed such an estimator.

The James-Stein Estimator ($n \geq 4$)

$$\mu_i \overset{\text{iid}}{\sim} N(\mu_0, \sigma_0^2), \qquad X_i \mid \mu_i \sim N(\mu_i, \sigma^2) \quad (i = 1, \ldots, n)$$

$$\hat{\mu}_i^{(\text{Bayes})} = E[\mu_i \mid X_i] = \underbrace{\left( \frac{\sigma^2}{\sigma_0^2 + \sigma^2} \right)}_{\alpha} \mu_0 + \underbrace{\left( \frac{\sigma_0^2}{\sigma_0^2 + \sigma^2} \right)}_{1 - \alpha} X_i$$

$$\hat{\mu}_i^{(\text{JS})} = \underbrace{\left( \frac{(n - 3)\, \sigma^2}{\sum (X_i - \bar{X})^2} \right)}_{\alpha} \bar{X} + \underbrace{\left( 1 - \frac{(n - 3)\, \sigma^2}{\sum (X_i - \bar{X})^2} \right)}_{1 - \alpha} X_i$$

In practice, if $\sigma^2$ is unknown, an estimate is used.
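The James-Stein formula translates directly into a few lines of code. The sketch below assumes a known `sigma2` and applies the formula as written on the slide, with no clipping of the shrinkage weight; the function name `james_stein` and the toy data are made up for illustration.

```python
def james_stein(x, sigma2):
    """Shrink each observation toward the grand mean (requires n >= 4).

    alpha = (n - 3) * sigma2 / sum((x_i - xbar)^2) is the weight placed
    on the grand mean; 1 - alpha is the weight on each observation.
    """
    n = len(x)
    if n < 4:
        raise ValueError("the James-Stein estimator here needs n >= 4")
    xbar = sum(x) / n
    s = sum((xi - xbar) ** 2 for xi in x)
    alpha = (n - 3) * sigma2 / s
    return [alpha * xbar + (1 - alpha) * xi for xi in x]


# Toy data: n = 5, mean 3, sum of squares 10, so alpha = 2 * 1 / 10 = 0.2
estimates = james_stein([1.0, 2.0, 3.0, 4.0, 5.0], sigma2=1.0)
print([round(e, 2) for e in estimates])
```

Each toy observation moves 20% of the way toward the grand mean of 3. When the observations are tightly clustered relative to `sigma2`, the weight `alpha` can exceed 1; the plain formula used here does not guard against that.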

Predicting Batting Averages

2nd Goal: Predict final batting averages from pre-season performances.

Pre-season batting averages for 18 major league players are provided. Season final batting averages for the same players are also recorded. The data are from the 1970 season and were published by Efron and Morris in JASA (1975) and Scientific American (1977).

The frequentist approach uses only $X_i$, the pre-season batting average for player $i$:

$$\hat{p}_i^{(\text{Fisher})} = X_i$$

The Empirical Bayes approach shrinks $X_i$ towards $\bar{X}$ by some empirically determined amount:

$$\hat{p}_i^{(\text{Stein})} = \hat{\alpha} X_i + (1 - \hat{\alpha}) \bar{X}, \quad \text{where } \hat{\alpha} \in (0, 1)$$

#   Name        hits/AB  pre-season ($\hat{\mu}^{(\text{ML})}$)  season final ($\mu$)
1   Clemente    18/45    0.400                  0.346
2   Robinson    17/45    0.378                  0.298
3   Howard      16/45    0.356                  0.276
4   Johnstone   15/45    0.333                  0.222
5   Berry       14/45    0.311                  0.273
6   Spencer     14/45    0.311                  0.270
7   Kessinger   13/45    0.289                  0.263
8   Alvarado    12/45    0.267                  0.210
9   Santo       11/45    0.244                  0.269
10  Swoboda     11/45    0.244                  0.230
11  Unser       10/45    0.222                  0.264
12  Williams    10/45    0.222                  0.256
13  Scott       10/45    0.222                  0.303
14  Petrocelli  10/45    0.222                  0.264
15  Rodriguez   10/45    0.222                  0.226
16  Campaneris  9/45     0.200                  0.286
17  Munson      8/45     0.178                  0.316
18  Alvis       7/45     0.156                  0.200
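The table gives enough to try the James-Stein estimator on the 18 players and compare prediction errors. The slides do not say how $\sigma^2$ was estimated, so the sketch below makes its own assumption: a binomial variance $\bar{X}(1 - \bar{X})/45$ shared by all players. Efron and Morris worked with a variance-stabilizing transform instead, so the numbers here are illustrative only.

```python
AT_BATS = 45
hits = [18, 17, 16, 15, 14, 14, 13, 12, 11, 11, 10, 10, 10, 10, 10, 9, 8, 7]
final = [0.346, 0.298, 0.276, 0.222, 0.273, 0.270, 0.263, 0.210, 0.269,
         0.230, 0.264, 0.256, 0.303, 0.264, 0.226, 0.286, 0.316, 0.200]

x = [h / AT_BATS for h in hits]             # pre-season averages (ML estimates)
n = len(x)
xbar = sum(x) / n

# ASSUMPTION: common binomial variance at the grand mean (not from the slides)
sigma2 = xbar * (1 - xbar) / AT_BATS

s = sum((xi - xbar) ** 2 for xi in x)
alpha = (n - 3) * sigma2 / s                # weight on the grand mean
js = [alpha * xbar + (1 - alpha) * xi for xi in x]


def prediction_error(truth, estimate):
    """Sum of squared differences between truth and estimate."""
    return sum((t - e) ** 2 for t, e in zip(truth, estimate))


err_ml = prediction_error(final, x)
err_js = prediction_error(final, js)
print(f"weight on own average (1 - alpha): {1 - alpha:.3f}")
print(f"prediction error, pre-season averages: {err_ml:.4f}")
print(f"prediction error, James-Stein:         {err_js:.4f}")
```

Under this variance assumption each player keeps only about a fifth of the distance between his pre-season average and the grand mean, and the James-Stein predictions have a smaller total squared error against the season-final column than the raw pre-season averages do.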

[Figure: Batting Average Dataset. Pre-season and season final batting averages (y-axis: Batting Average, 0.0 to 0.4) for players 1 to 18.]

[Figure: James-Stein Estimation of Batting Averages. Pre-season and season final batting averages (y-axis: Batting Average, 0.0 to 0.4) for players 1 to 18.]

Ranking Bias: Empirical Bayes + Order Statistics

▸ Genome-wide association studies
▸ SNPs: AA/Aa/aa or 0/1/2 ($\sim 10^7$)
▸ Estimated effects of the top SNPs are biased up (winner's curse).
▸ Ranking bias estimator: part frequentist, part Bayesian, with robust properties
▸ Applied to 2 GWAS studies with 2,000 cases and 3,000 controls (Crohn's Disease, Type 1 Diabetes)

[Figure: pre-season and season final batting averages (y-axis: Batting Average, 0.0 to 0.4) for players 1 to 18.]

49ers Statistics: http://www.longbeachstate.com/

Opponents Over 3 Seasons (08-09, 09-10, 10-11)

opponent               #
alaska anchorage       1
arizona state          1
boise state            1
byu cougars            1
byu hawaii             1
cal poly               7
cal state fullerton    6
cal state northridge   6
clemson                2
cs monterey bay        1
duke                   1
green bay              2
idaho                  1
idaho state            1
iowa                   1
kentucky               1
loyola marymount       2
montana                1
montana state          1
new mexico state       1
north carolina         1
notre dame             1
oregon                 1
pacific                8
pepperdine             2
saint mary's           1
saint peter's          1
san diego state        1
san francisco state    1
syracuse               1
temple                 1
texas                  1
uc davis               6
uc irvine              6
uc riverside           6
uc santa barbara       7
ucla                   1
univ. san francisco    1
utah state             2
washington             1
weber state            2
west virginia          1
wisconsin              1

Winning Percentages

Period          Games  All Games  Conference Games
All 3 Seasons   93     56%        67%
08-09 Season    30     50%        63%
09-10 Season    33     52%        50%
10-11 Season    30     67%        88%
