Statistics & Bayesian Inference Lecture 1
Joe Zuntz
Lecture 1 outline:
- Motivations
- Definitions
- Essentials of probability
- Some analytic distributions
- Bayes' Theorem
- Probability distributions
- Models & parameter spaces
Motivation: we want to extract numbers from our (expensive) data, i.e. the values of parameters in models.
For example: H0 = (72 ± 8) km s⁻¹ Mpc⁻¹
A random variable is a member of a sample space (discrete or continuous, finite or infinite).
Discrete example: the outcome of a coin toss, described by a probability mass function (PMF).
Continuous example: the height of the next person to walk through the door, described by a probability density function (PDF).
Events need not map one-to-one to randomness, e.g. for two coin tosses: was the sequence Heads-Tails? Were both tosses the same? (For continuous variables an event is a region of the space, and we have to integrate to answer questions.)
Usually we just write P(X) = f(x).
Normalisation: ∑_{x∈X} P(x) = 1 for discrete variables, and ∫_{x∈X} P(x) dx = 1 for continuous ones.
Notation:
- Joint: P(X = x and Y = y), written P(XY) or P(X ∩ Y)
- Union: P(X = x or Y = y), written P(X ∪ Y)
- Conditional: P(X = x given Y = y), written P(X | Y)
The expectation is given by:
E(X) = ∑ P(X) X (discrete)      E(X) = ∫ P(X) X dX (continuous)
E(f(X)) = ∑ P(X) f(X)           E(f(X)) = ∫ P(X) f(X) dX
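Expectations like these can also be approximated numerically by Monte Carlo, replacing the integral with an average over samples. A minimal sketch assuming NumPy; the standard Gaussian and f(X) = X² are illustrative choices, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(42)

# Monte Carlo estimate of an expectation: E[f(X)] is approximated by
# the average of f(x_i) over samples x_i drawn from P(X).
# Here P(X) is a standard Gaussian and f(X) = X^2, so the exact
# answer is E[X^2] = Var(X) = 1.
samples = rng.normal(0.0, 1.0, size=1_000_000)
estimate = np.mean(samples**2)
```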
The mean is a measure of centrality, and not always a good one. It does not even exist for some distributions.
[Figure: a skewed distribution, with the mode and the mean marked at different places.]
Marginalisation:
P(x) = ∫ P(x|y) P(y) dy      P(x) = ∑ᵢ P(x|yᵢ) P(yᵢ)
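The discrete sum can be written as a matrix-vector product. A toy sketch assuming NumPy; the two-coin setup and all the numbers are invented for illustration:

```python
import numpy as np

# Discrete marginalisation: P(x) = sum_i P(x|y_i) P(y_i).
# Toy setup: y selects one of two biased coins, x is the flip outcome.
p_y = np.array([0.3, 0.7])            # P(y_i)
p_x_given_y = np.array([[0.9, 0.1],   # P(x|y_0) for x = heads, tails
                        [0.2, 0.8]])  # P(x|y_1)
p_x = p_y @ p_x_given_y               # marginal distribution P(x)
```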
When changing variables, probability must be conserved, not density; in more dimensions this brings in the Jacobian. In 1D, with u = f(x):
P(u) du = P(x) dx   ⟹   P(u) = P(x) dx/du = P(x)/f′(x)
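Conservation of probability can be checked numerically: transform samples and compare their histogram with P(x)/f′(x). A sketch assuming NumPy, with x uniform on [0, 1] and u = x² as an illustrative transformation:

```python
import numpy as np

rng = np.random.default_rng(0)

# With x ~ Uniform(0, 1) and u = f(x) = x^2, the transformed density is
# P(u) = P(x) / f'(x) = 1 / (2 sqrt(u)).
x = rng.uniform(0.0, 1.0, size=2_000_000)
u = x**2

# Histogram of u, normalised to a density, compared with the formula
# in the bin covering u in [0.25, 0.26].
hist, edges = np.histogram(u, bins=100, range=(0.0, 1.0), density=True)
empirical = hist[25]                      # density estimate in that bin
analytic = 1.0 / (2.0 * np.sqrt(0.255))  # formula at the bin centre
```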
Some analytic distributions:
- Uniform: P(x) = 1/(b − a), x ∈ [a, b]
- Delta function: P(x) = δ(x − x₀)
- Gaussian: P(x) = (1/√(2πσ²)) exp(−(x − µ)²/2σ²)
- Exponential: P(x) = λ e^(−λx), x > 0
- Poisson: P(n) = λⁿ e^(−λ)/n!
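All of these (apart from the delta function) can be sampled directly with NumPy. A sketch with arbitrary illustrative parameters; note that NumPy's exponential sampler takes the scale 1/λ rather than the rate λ:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

uniform = rng.uniform(2.0, 5.0, size=n)           # P(x) = 1/(b - a) on [a, b]
gaussian = rng.normal(0.0, 3.0, size=n)           # mu = 0, sigma = 3
exponential = rng.exponential(1.0 / 2.0, size=n)  # rate lambda = 2, mean 1/lambda
poisson = rng.poisson(4.0, size=n)                # mean = variance = lambda = 4
```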
P(AB) = P(A|B) P(B) = P(B|A) P(A)
∴ P(A|B) = P(B|A) P(A) / P(B)
Applied to inference, with observed data d, parameters p, and model M:
P(p|dM) = P(d|pM) P(p|M) / P(d|M) ∝ P(d|pM) P(p|M)
where P(d|pM) is the likelihood and P(p|M) is the prior.
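A minimal sketch of this in practice: evaluate likelihood × prior on a grid of parameter values and normalise. The coin-toss data (7 heads in 10 tosses) and the flat prior are invented for illustration:

```python
import numpy as np

# Posterior on a grid: P(p|dM) is proportional to P(d|pM) P(p|M).
# Data d: 7 heads in 10 tosses; parameter p: probability of heads.
p_grid = np.linspace(0.0, 1.0, 1001)
prior = np.ones_like(p_grid)               # flat prior P(p|M)
likelihood = p_grid**7 * (1 - p_grid)**3   # binomial P(d|pM), constant dropped
posterior = likelihood * prior

dp = p_grid[1] - p_grid[0]
posterior /= posterior.sum() * dp          # normalising plays the role of P(d|M)
p_peak = p_grid[np.argmax(posterior)]      # posterior peak, here 7/10
```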
What you know after looking at the data = what you knew before + what the data told you
A model is a theory that describes how your data arose from what you wanted to measure. It typically has some deterministic and some stochastic parts.
As well as the parameters you care about, many (most?) astrophysical models also have other, nuisance parameters in your model too.
Probability distributions carry information about all your parameters. Think of parameters as dimensions in an abstract space.
Posteriors are functions of many variables: P(uvwxyz…). As the dimension of the space increases, your intuition becomes worse.
[Figure: a 2D posterior over parameters m and c.]
We use characteristic numbers to describe a distribution. Sample statistics are estimators/approximations to the underlying distribution statistics.
Mean:
E[X] = ∫ X P(X) dX      X̄ = ∑ Xᵢ / N
The mean can be misleading for asymmetric distributions!
Variance and covariance:
Var(X) = E[(X − X̄)²] = ∫ (X − X̄)² P(X) dX
σ²_X = ∑(Xᵢ − X̄)²/N        s²_X = ∑(Xᵢ − X̄)²/(N − 1)
Cov(X, Y) = E[(X − X̄)(Y − Ȳ)] = ∫ (X − X̄)(Y − Ȳ) P(XY) dX dY
σ_XY = ∑(Xᵢ − X̄)(Yᵢ − Ȳ)/N
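These estimators are easy to check against samples with known correlation. A sketch assuming NumPy; the linear construction of Y from X (giving Cov(X, Y) = 0.8) is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000

# X ~ N(0, 1); Y built so that Cov(X, Y) = 0.8 * Var(X) = 0.8.
x = rng.normal(0.0, 1.0, size=n)
y = 0.8 * x + rng.normal(0.0, 0.6, size=n)

xbar, ybar = x.mean(), y.mean()
var_biased = np.sum((x - xbar)**2) / n          # divides by N
var_unbiased = np.sum((x - xbar)**2) / (n - 1)  # divides by N - 1
cov_xy = np.sum((x - xbar) * (y - ybar)) / n    # sample covariance
```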
[Figure: scatter plots of X against Y, including an anti-correlated case with σ_XY < 0.]
The Gaussian is a continuous PDF with mean µ and standard deviation σ:
P(x; µ, σ) = (1/√(2πσ²)) exp(−(x − µ)²/2σ²)
Widths are quoted in numbers of standard deviations sigma: 68% of the probability lies within 1σ, 95% within 2σ, and 99.7% within 3σ.
Probabilities can be read off from the cumulative integral of the Gaussian.
Sums of Gaussians are Gaussian:
X ∼ N(µₓ, σₓ²), Y ∼ N(µ_y, σ_y²)   ⟹   X + Y ∼ N(µₓ + µ_y, σₓ² + σ_y²)
This leads to the formula that the error on the mean scales as σ/n^(1/2).
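This addition rule is easy to verify by simulation. A sketch assuming NumPy, with arbitrary illustrative means and widths:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

# X ~ N(1, 2^2) and Y ~ N(3, 1.5^2), so X + Y should be
# N(1 + 3, 2^2 + 1.5^2): mean 4 and variance 6.25.
x = rng.normal(1.0, 2.0, size=n)
y = rng.normal(3.0, 1.5, size=n)
total = x + y
```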
Central limit theorem: given a collection of random variables Xᵢ with means µᵢ and variances σᵢ²,
(1/sₙ) ∑ᵢ₌₁ⁿ (Xᵢ − µᵢ) → N(0, 1),   where sₙ² = ∑ᵢ₌₁ⁿ σᵢ²,
provided that (1/sₙ²) ∑ᵢ E[(Xᵢ − µᵢ)²; |Xᵢ − µᵢ| > ε sₙ] → 0 (the Lindeberg condition).
[Figure: the distribution of a single variable, and of the mean of 2, 3, and 4 draws, approaching a Gaussian.]
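The theorem can be demonstrated by averaging draws from a decidedly non-Gaussian distribution. A sketch assuming NumPy, using Uniform(0, 1) draws (µ = 1/2, σ² = 1/12) as the illustrative starting distribution:

```python
import numpy as np

rng = np.random.default_rng(4)

# Means of n uniform draws approach a Gaussian with variance 1/(12 n).
n = 30
means = rng.uniform(0.0, 1.0, size=(200_000, n)).mean(axis=1)

# If the means are close to Gaussian, about 68.3% of them should lie
# within one standard deviation of 1/2.
sigma = np.sqrt(1.0 / (12.0 * n))
coverage = np.mean(np.abs(means - 0.5) < sigma)
```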
The multivariate Gaussian describes correlations between quantities/errors:
P(x; µ, C) = (1/((2π)^(n/2) |C|^(1/2))) exp(−½ (x − µ)ᵀ C⁻¹ (x − µ))
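The formula is short enough to implement directly. A sketch assuming NumPy, with an invented 2D covariance matrix, checked against the normalisation factor at the peak:

```python
import numpy as np

def gaussian_nd(x, mu, C):
    """Multivariate Gaussian density P(x; mu, C)."""
    n = len(mu)
    diff = x - mu
    norm = (2.0 * np.pi) ** (n / 2.0) * np.sqrt(np.linalg.det(C))
    return np.exp(-0.5 * diff @ np.linalg.solve(C, diff)) / norm

mu = np.array([1.0, 2.0])
C = np.array([[2.0, 0.5],    # off-diagonal entries encode the
              [0.5, 1.0]])   # correlation between the two quantities

value = gaussian_nd(mu, mu, C)                  # density at the mean
expected = 1.0 / (2.0 * np.pi * np.sqrt(1.75))  # |C| = 2*1 - 0.5^2 = 1.75
```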
Frequentists vs Bayesians:
- Use probabilities to … describe frequencies / quantify information.
- Think model parameters are … fixed unknowns / random variables with probabilities.
- Think data are … a repeatable random variable / fixed.
- Call their work … "Statistics" / "Inference".
- Make statements about … intervals covering the truth x% of the time / constraints on model parameters.
- Have … many approaches with lots of implicit choices / one approach with explicit choices.
- Appeal to … hypothetical ensembles of experiments / the data.
- Work by … computing estimators derived from your data points and simulating hypotheses to see how often the measured estimator value appears / exploring points in parameter space to see if they are good fits.
Either of these approaches is better than things you half remember from undergrad.
Model the process that led to your data.
Distrust point estimates.
All probabilities are conditional.
Exercise (the Linda problem): As a student, Linda was deeply concerned with racial discrimination and other social issues, and participated in anti-nuclear demonstrations. Rank the following statements by probability:
(1) Linda is active in the feminist movement.
(2) Linda is a bank teller.
(3) Linda is a bank teller and active in the feminist movement.
Exercise: Photons arrive at a detector as a Poisson process with λ = 1 photon/s. Each photon has an energy drawn from a Gaussian distribution with µ = 1000 eV and σ = 100 eV, and the energy of each photon is independent of the number that arrive. What is the expectation of the energy arriving per second? Plot the probability distribution of the amount of energy arriving per second.
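One way to explore this setup is by direct simulation rather than deriving the compound distribution analytically. A hedged Monte Carlo sketch assuming NumPy; only the mean is checked here, and a histogram of `totals` would give the requested distribution:

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulate many one-second intervals: a Poisson number of photons
# (lambda = 1/s), each with a Gaussian energy (mu = 1000 eV, sigma = 100 eV).
n_seconds = 100_000
counts = rng.poisson(1.0, size=n_seconds)
totals = np.array([rng.normal(1000.0, 100.0, size=k).sum() for k in counts])

# For this compound process, E[energy per second] = lambda * mu = 1000 eV.
mean_energy = totals.mean()
```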
Exercise: I can see a bus stop during the last 3 minutes of my walk towards it. On my first day I saw one bus go past it before I got there. When should I expect the next bus? Make whatever assumptions you think are reasonable for British buses, but describe and justify them.