


Two Envelopes Revisited

  • The “two envelopes” problem set-up
  • Two envelopes: one contains $X, other contains $2X
  • You select an envelope and open it
  • Let Y = $ in envelope you selected
  • Let Z = $ in other envelope
  • Before opening an envelope, you think either one is equally good
  • So, what changed by opening the envelope?
    E[Z \mid Y] = \frac{1}{2}(2Y) + \frac{1}{2}\left(\frac{Y}{2}\right) = \frac{5}{4}Y

  • E[Z | Y] above assumes all values X (where 0 < X < ∞) are equally likely
  • Note: there are infinitely many values of X
  • So, not a true probability distribution over X (doesn’t integrate to 1)

Subjectivity of Probability

  • Belief about contents of envelopes
  • Since implied distribution over X is not a true probability distribution, what is our distribution over X?
  • Frequentist: play the game infinitely many times and see how often different values come up

  • Problem: I only allow you to play the game once
  • Bayesian probability
  • Have prior belief of distribution for X (or anything for that matter)
  • Prior belief is a subjective probability
  • By extension, all probabilities are subjective
  • Allows us to answer question when we have no/limited data
  • E.g., probability a coin you’ve never flipped lands on heads
  • As we get more data, prior belief is “swamped” by data

The Envelope, Please

  • Bayesian: have prior distribution over X, P(X)
  • Let Y = $ in envelope you selected
  • Let Z = $ in other envelope
  • Open your envelope to determine Y
  • If Y > E[Z | Y], keep your envelope, otherwise switch
  • No inconsistency!
  • Opening envelope provides data to compute P(X | Y) and thereby compute E[Z | Y]
  • Of course, there’s the issue of how you determined your prior distribution over X…
  • Bayesian: Doesn’t matter how you determined prior, but you must have one (whatever it is)
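
To make the decision rule above concrete, here is a minimal Python sketch. The Exponential(mean 10) prior over X and the helper expected_other are illustrative assumptions, not from the slides. It weighs the two cases "Y is the smaller amount" and "Y is the larger amount" by the prior density, computes E[Z | Y], and applies the keep/switch rule; with a proper prior you switch for small Y and keep for large Y, so there is no always-switch inconsistency.

```python
import numpy as np
from scipy import stats

# Assumed prior over X (NOT from the slides -- any proper prior works):
prior = stats.expon(scale=10)   # X ~ Exponential with mean 10

def expected_other(y):
    """E[Z | Y = y] under the assumed prior.

    Given Y = y, either y is the smaller amount (X = y, other envelope has 2y)
    or y is the larger amount (X = y/2, other envelope has y/2). The weights
    come from the prior density; the extra 1/2 on the second case is the
    Jacobian from the change of variables Y = 2X.
    """
    w_small = 0.5 * prior.pdf(y)        # chose the $X envelope
    w_large = 0.25 * prior.pdf(y / 2)   # chose the $2X envelope
    return (w_small * 2 * y + w_large * (y / 2)) / (w_small + w_large)

for y in [1.0, 5.0, 20.0, 60.0]:
    action = "keep" if y > expected_other(y) else "switch"
    print(f"Y = {y:5.1f}   E[Z | Y] = {expected_other(y):6.2f}   -> {action}")
```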

Revisiting Bayes’ Theorem

  • Bayes’ Theorem (θ = model parameters, D = data):

    P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)}

    where P(θ) is the “Prior”, P(D | θ) is the “Likelihood”, and P(θ | D) is the “Posterior”

  • Likelihood: you’ve seen this before (in context of MLE)
  • Probability of data given probability model (parameter θ)
  • Prior: before seeing any data, what is belief about model
  • I.e., what is distribution over parameters θ
  • Posterior: after seeing data, what is belief about model
  • After data D observed, have posterior distribution P(θ | D) over parameters θ conditioned on data. Use this to predict new data.
  • Here, we assume prior and posterior distribution have same parametric form (we call them “conjugate”)

Computing P(θ | D)

  • Bayes’ Theorem (θ = model parameters, D = data):

    P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)}

  • We have prior P(θ) and can compute P(D | θ)
  • But how do we calculate P(D)?
  • Complicated answer:

    P(D) = \int P(D \mid \theta)\, P(\theta)\, d\theta

  • Easy answer: it does not depend on θ, so ignore it
  • Just a constant that forces P(θ | D) to integrate to 1

P(θ | D) for Beta and Bernoulli

  • Prior: θ ~ Beta(a, b); D = {n heads, m tails}
  • Posterior:

    f(\theta \mid D) = \frac{P(D \mid \theta)\, f(\theta)}{P(D)}
                     = \frac{\left[C_1\, \theta^{n}(1-\theta)^{m}\right]\left[C_2\, \theta^{a-1}(1-\theta)^{b-1}\right]}{P(D)}
                     = C_3\, \theta^{a+n-1}(1-\theta)^{b+m-1}

  • By definition, this is Beta(a + n, b + m)
  • All constant factors combine into a single constant C_3
  • Could just ignore constant factors along the way
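
As a quick numerical check of the derivation above (a sketch using scipy; the prior Beta(3, 8) and the 58 heads / 42 tails are illustrative choices, not part of the derivation), normalizing likelihood × prior on a grid reproduces the Beta(a + n, b + m) density:

```python
import numpy as np
from scipy import stats

a, b = 3, 8          # prior hyperparameters (illustrative)
n, m = 58, 42        # observed heads and tails

theta = np.linspace(1e-6, 1 - 1e-6, 20001)
dtheta = theta[1] - theta[0]

unnormalized = theta**n * (1 - theta)**m * stats.beta(a, b).pdf(theta)
posterior_numeric = unnormalized / (unnormalized.sum() * dtheta)   # divide by P(D)
posterior_closed = stats.beta(a + n, b + m).pdf(theta)

print("max |difference|:", np.max(np.abs(posterior_numeric - posterior_closed)))
```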


Where’d Ya Get Them P(θ)?

  • θ is the probability a coin turns up heads
  • Model θ with 2 different priors:
  • P1(θ) is Beta(3, 8) (blue)
  • P2(θ) is Beta(7, 4) (red)
  • They look pretty different!
  • Now flip 100 coins; get 58 heads and 42 tails
  • What do posteriors look like?

It’s Like Having Twins

  • As long as we collect enough data, posteriors will converge to the correct value!
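
A minimal scipy sketch of this convergence, using the two priors and the 58 heads / 42 tails from the previous slide:

```python
from scipy import stats

heads, tails = 58, 42
priors = {"P1 ~ Beta(3, 8)": (3, 8), "P2 ~ Beta(7, 4)": (7, 4)}

for name, (a, b) in priors.items():
    posterior = stats.beta(a + heads, b + tails)   # conjugate update
    print(f"{name}: prior mean = {a / (a + b):.3f}, "
          f"posterior mean = {posterior.mean():.3f}")
# Both posteriors concentrate near the MLE of 0.58; with more flips they converge.
```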

From MLE to Maximum A Posteriori

  • Recall Maximum Likelihood Estimator (MLE) of θ:

    \theta_{MLE} = \arg\max_{\theta} \prod_{i=1}^{n} f(X_i \mid \theta)

  • Maximum A Posteriori (MAP) estimator of θ:

    \theta_{MAP} = \arg\max_{\theta} f(\theta \mid X_1, X_2, \ldots, X_n)
                 = \arg\max_{\theta} \frac{f(X_1, X_2, \ldots, X_n \mid \theta)\, g(\theta)}{h(X_1, X_2, \ldots, X_n)}
                 = \arg\max_{\theta} g(\theta) \prod_{i=1}^{n} f(X_i \mid \theta)

    where g(θ) is prior distribution of θ, and h(X_1, ..., X_n) does not depend on θ.

  • As before, can often be more convenient to use log:

    \theta_{MAP} = \arg\max_{\theta} \left( \log(g(\theta)) + \sum_{i=1}^{n} \log(f(X_i \mid \theta)) \right)

  • MAP estimate is the mode of the posterior distribution
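
A small sketch of MAP estimation for the Beta–Bernoulli case (the prior Beta(3, 8) and the 58/42 data are illustrative assumptions): maximize the log posterior on a grid and compare against the closed-form mode of Beta(a + n, b + m) and the MLE.

```python
import numpy as np
from scipy import stats

a, b = 3, 8                  # Beta prior hyperparameters (illustrative)
n, m = 58, 42                # observed heads and tails

theta = np.linspace(1e-6, 1 - 1e-6, 100001)
log_post = (stats.beta(a, b).logpdf(theta)                 # log g(theta)
            + n * np.log(theta) + m * np.log(1 - theta))   # sum_i log f(x_i | theta)

theta_map_grid = theta[np.argmax(log_post)]
theta_map_closed = (a + n - 1) / (a + b + n + m - 2)       # mode of Beta(a + n, b + m)
theta_mle = n / (n + m)

print(theta_map_grid, theta_map_closed, theta_mle)         # ~0.5505, 0.5505, 0.58
```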

Conjugate Distributions Without Tears

  • Just for review…
  • Have coin with unknown probability θ of heads
  • Our prior (subjective) belief is that θ ~ Beta(a, b)
  • Now flip coin k = n + m times, getting n heads, m tails
  • Posterior density: (θ | n heads, m tails) ~ Beta(a + n, b + m)
  • Beta is conjugate for Bernoulli, Binomial, Geometric, and Negative Binomial
  • a and b are called “hyperparameters”
  • Saw (a + b – 2) imaginary trials, of those (a – 1) are “successes”
  • For a coin you never flipped before, use Beta(x, x) to denote you think coin likely to be fair

  • How strongly you feel coin is fair is a function of x

Mo’ Beta: Multinomial is Multiple Times the Fun

  • Dirichlet(a1, a2, ..., am) distribution
  • Conjugate for Multinomial
  • Dirichlet generalizes Beta in same way Multinomial generalizes Bernoulli/Binomial
  • Intuitive understanding of hyperparameters:
  • Saw (a1 + a2 + ... + am) – m imaginary trials, with (ai – 1) of outcome i
  • Updating to get the posterior distribution
  • After observing n1 + n2 + ... + nm new trials, with ni of outcome i...
  • ... posterior distribution is Dirichlet(a1 + n1, a2 + n2, ..., am + nm)

Dirichlet density:

    f(x_1, x_2, \ldots, x_m) = \frac{1}{B(a_1, a_2, \ldots, a_m)} \prod_{i=1}^{m} x_i^{a_i - 1}
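
A minimal sketch of the Dirichlet update (the prior parameters and counts below are illustrative, not from the slides):

```python
import numpy as np

a = np.array([2.0, 2.0, 2.0])        # prior Dirichlet(a1, a2, a3), illustrative values
n = np.array([5, 1, 3])              # observed counts of each outcome

a_post = a + n                                     # posterior is Dirichlet(a_i + n_i)
print("posterior parameters:", a_post)
print("posterior mean:", a_post / a_post.sum())    # E[x_i] = a_i / sum_j(a_j)
```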


Best Short Film in the Dirichlet Category

  • And now a cool animation of Dirichlet(a, a, a)
  • This is actually log density (but you get the idea…)

Thanks Wikipedia!

Getting Back to your Happy Laplace

  • Recall example of 6-sided die rolls:
  • X ~ Multinomial(p1, p2, p3, p4, p5, p6)
  • Roll n = 12 times
  • Result: 3 ones, 2 twos, 0 threes, 3 fours, 1 five, 3 sixes
  • MLE: p1=3/12, p2=2/12, p3=0/12, p4=3/12, p5=1/12, p6=3/12
  • Dirichlet prior allows us to pretend we saw each outcome k times before. MAP estimate (m = number of outcomes, ni = count of outcome i):

    \hat{p}_i = P(X = i) = \frac{n_i + k}{n + mk}

  • Laplace’s “law of succession”: idea above with k = 1
  • Laplace estimate:

    \hat{p}_i = P(X = i) = \frac{n_i + 1}{n + m}

  • Laplace: p1=4/18, p2=3/18, p3=1/18, p4=4/18, p5=2/18, p6=4/18
  • No longer have 0 probability of rolling a three!
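
A small sketch of the Laplace estimate for the die-roll counts on this slide (plain NumPy; nothing here beyond the formulas above):

```python
import numpy as np

counts = np.array([3, 2, 0, 3, 1, 3])      # die rolls from the slide, n = 12
n, m = counts.sum(), len(counts)           # total rolls, number of outcomes

mle = counts / n                           # can assign probability 0 to a face
k = 1                                      # Laplace's law of succession
laplace = (counts + k) / (n + m * k)       # pretend each face was seen once before

print("MLE:    ", mle)                     # 3/12, 2/12, 0/12, 3/12, 1/12, 3/12
print("Laplace:", laplace)                 # 4/18, 3/18, 1/18, 4/18, 2/18, 4/18
```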

Good Times With Gamma

  • Gamma(a, l) distribution
  • Conjugate for Poisson
  • Also conjugate for Exponential, but we won’t delve into that
  • Intuitive understanding of hyperparameters:
  • Saw a total of a imaginary events during l prior time periods
  • Updating to get the posterior distribution
  • After observing n events during next k time periods...
  • ... posterior distribution is Gamma(a + n, l + k)
  • Example: Gamma(10, 5)
  • Saw 10 events in 5 time periods. Like observing at rate = 2
  • Now see 11 events in next 2 time periods → Gamma(21, 7)
  • Equivalent to updated rate = 3
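
A minimal sketch of this Gamma–Poisson update using scipy (the numbers are the slide’s Gamma(10, 5) example):

```python
from scipy import stats

a, l = 10, 5                    # prior Gamma(a, l): 10 imaginary events over 5 periods
n, k = 11, 2                    # then observe 11 events during the next 2 periods

a_post, l_post = a + n, l + k                        # posterior is Gamma(21, 7)
posterior = stats.gamma(a_post, scale=1 / l_post)    # scipy uses shape & scale (= 1/rate)
print(a_post, l_post, posterior.mean())              # 21 7 3.0  (updated rate estimate)
```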

It’s Normal to Be Normal

  • Normal(μ0, σ0²) distribution
  • Conjugate for Normal (with unknown μ, known σ²)
  • Intuitive understanding of hyperparameters:
  • A priori, believe true μ distributed ~ N(μ0, σ0²)
  • Updating to get the posterior distribution
  • After observing n data points...
  • ... posterior distribution is:

    \mu \mid x_1, \ldots, x_n \sim N\!\left( \left(\frac{\mu_0}{\sigma_0^2} + \frac{\sum_{i=1}^{n} x_i}{\sigma^2}\right)\left(\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}\right)^{-1},\; \left(\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}\right)^{-1} \right)
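
A minimal sketch of this Normal–Normal update (the prior N(100, 15²), the known σ = 10, and the four observations are illustrative assumptions, not from the slides):

```python
import numpy as np

mu0, sigma0 = 100.0, 15.0                  # prior: mu ~ N(mu0, sigma0^2)
sigma = 10.0                               # known data standard deviation
x = np.array([112.0, 95.0, 108.0, 103.0])  # hypothetical observations
n = len(x)

precision = 1 / sigma0**2 + n / sigma**2                 # 1/sigma0^2 + n/sigma^2
post_var = 1 / precision
post_mean = (mu0 / sigma0**2 + x.sum() / sigma**2) * post_var

print(f"posterior for mu: N({post_mean:.2f}, {post_var:.2f})")
```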