Statistics & Bayesian Inference: Lecture 1
Joe Zuntz


  1. Statistics & Bayesian Inference Lecture 1 Joe Zuntz

  2. Lecture 1: Essentials of probability • Motivations • Definitions • Probability Distributions • Basic probability operations • Some analytic distributions • Bayes Theorem • Models & Parameter Spaces • How scientists can use probability

  3. Motivations • Learn as much as possible from our (expensive) data, e.g. H₀ = (72 ± 8) km s⁻¹ Mpc⁻¹ • Constrain parameters in models • Test & compare models • Characterize collections of numbers

  4. Probability Distributions: Definitions • Assign a real number P ≥ 0 to each member of a sample space (discrete or continuous, finite or infinite) • P = probability density function (PDF) or probability mass function (PMF) • This set represents the possible outcomes of an experiment/game/event/situation • e.g. the possible results of tossing two coins (HH, HT, TH, TT, each with probability 0.25), or the height of the next person to walk through the door
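As a minimal sketch of the two-coin sample space (illustrative Python, not part of the slides), the four outcomes and their probability masses can be enumerated directly:

```python
from itertools import product

# Enumerate the sample space for two coin tosses; each of the four
# outcomes gets probability mass 0.25 (a uniform PMF over outcomes).
sample_space = list(product("HT", repeat=2))       # [('H','H'), ('H','T'), ...]
pmf = {outcome: 0.25 for outcome in sample_space}

# A question about the experiment is answered by summing masses,
# e.g. "was the first toss heads?"
p_first_heads = sum(p for o, p in pmf.items() if o[0] == "H")
```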


  6. Probability Distributions: Definitions • A random variable X is any value subject to randomness, e.g.: was the first toss heads? was the sequence Heads-Tails? were both tosses the same? • Discrete X: P is a list of values • Continuous X: P is a function, the PDF (which we have to integrate to answer questions)

  7. Probability Distributions: Basic properties • Since X must have exactly one value: • Discrete: Σ_{x∈X} P(x) = 1 • Continuous: ∫_{x∈X} P(x) dx = 1 • P(X=x) = f(x); usually just write P(X) = f(x) • 0 ≤ P(x) ≤ 1 for a PMF (a PDF need only satisfy P(x) ≥ 0; its values can exceed 1)
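The normalization conditions can be checked numerically; this sketch (the example distributions are my own) verifies a discrete PMF by summing and a continuous PDF by a Riemann sum:

```python
import numpy as np

# Discrete PMF (fair die): the masses must sum to 1.
die_pmf = {k: 1 / 6 for k in range(1, 7)}
discrete_total = sum(die_pmf.values())

# Continuous PDF (standard Gaussian): integrate numerically over a wide
# range; the tails beyond +/-10 contribute negligibly.
x = np.linspace(-10.0, 10.0, 200_001)
pdf = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
continuous_total = pdf.sum() * (x[1] - x[0])
```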

  8. Probability Distributions: Combining Probabilities • Joint probability: P(XY), i.e. P(X=x and Y=y), also written P(X ∩ Y) • Union: P(X=x or Y=y), written P(X ∪ Y)

  9. Probability Distributions: Combining Probabilities • Conditional: P(X=x given Y=y), written P(X|Y) • Independence: if P(X|Y) = P(X), then X is independent of Y

  10. Probability Distributions: Identities • P(not X) = 1 − P(X) • P(XY) = P(X|Y) P(Y) • P(X ∪ Y) = P(X) + P(Y) − P(X ∩ Y)
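These identities can be verified on a small made-up joint distribution of two binary variables (the probability values here are arbitrary, chosen only to sum to 1):

```python
# Joint distribution P(X, Y) over two binary variables.
joint = {("x0", "y0"): 0.1, ("x0", "y1"): 0.3,
         ("x1", "y0"): 0.2, ("x1", "y1"): 0.4}

p_x1 = sum(p for (xv, _), p in joint.items() if xv == "x1")
p_y1 = sum(p for (_, yv), p in joint.items() if yv == "y1")
p_x1_and_y1 = joint[("x1", "y1")]

p_not_x1 = 1 - p_x1                            # complement identity
p_x1_given_y1 = p_x1_and_y1 / p_y1             # from P(XY) = P(X|Y) P(Y)
p_x1_or_y1 = p_x1 + p_y1 - p_x1_and_y1         # union identity
```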

  11. Probability Distributions: Expectations • The expectation (or mean) of a random variable X is given by: E(X) = Σ_x P(x) x (discrete) or E(X) = ∫ P(X) X dX (continuous) • Or of a function of it by: E(f(X)) = Σ_x P(x) f(x) or E(f(X)) = ∫ P(X) f(X) dX
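A quick sketch of both expectation formulas in the discrete case (a fair die, my own example):

```python
# PMF of a fair six-sided die.
pmf = {k: 1 / 6 for k in range(1, 7)}

mean = sum(p * k for k, p in pmf.items())          # E(X) = sum_x P(x) x
mean_sq = sum(p * k**2 for k, p in pmf.items())    # E(f(X)) with f(x) = x^2
```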

  12. Probability Distributions: Expectations • Expectations are one measure of centrality, and not always a good one • The mode and median also exist • All are just ways of reducing or characterizing a distribution [Figure: a skewed distribution with the mode and mean marked]

  13. Probability Distributions: Marginalizing • Discrete: P(x) = Σ_i P(x|y_i) P(y_i) • Continuous: P(x) = ∫ P(x|y) P(y) dy • If you don’t care about something, marginalize over it
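The discrete rule can be sketched on a toy joint table (the weather labels are invented for illustration). Since P(x, y_i) = P(x|y_i) P(y_i), summing the joint over i gives the marginal P(x):

```python
# A small joint distribution P(x, y).
joint = {("rain", "cold"): 0.3, ("rain", "warm"): 0.1,
         ("dry",  "cold"): 0.2, ("dry",  "warm"): 0.4}

# Marginalize over y: P(x) = sum_i P(x, y_i).
p_x = {}
for (xv, _), p in joint.items():
    p_x[xv] = p_x.get(xv, 0.0) + p
```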

  14. Probability Distributions: Changing variables • For u = f(x), probability mass must be conserved, not density: P(u) du = P(x) dx • Relate with a Jacobian: P(u) = P(x) dx/du = P(x) / f′(x) • Be especially careful in more dimensions
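One way to check a change of variables numerically (my own example): with X uniform on [0, 1] and u = f(x) = x², the Jacobian rule gives P(u) = P(x)/f′(x) = 1/(2√u), whose mean is ∫₀¹ u · 1/(2√u) du = 1/3. A sample-based check:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.random(200_000)   # X ~ Uniform(0, 1), so P(x) = 1
u = x**2                  # transformed variable u = f(x)

# The samples of u are distributed as P(u) = 1/(2*sqrt(u)) on (0, 1],
# so their mean should be close to the analytic value 1/3.
sample_mean = u.mean()
```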

  15. Probability Distributions: Drawing samples • Generate values of X with probability specified by P(X) • Draw enough samples: histogram looks like PDF • See lecture 3
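One standard recipe for this, inverse-CDF sampling, can be sketched for the exponential distribution (an assumed method for illustration; the slides defer the details to lecture 3):

```python
import numpy as np

# For P(x) = lam * exp(-lam * x), the CDF is 1 - exp(-lam * x), which
# inverts to x = -ln(1 - v) / lam with v ~ Uniform(0, 1).
rng = np.random.default_rng(0)
lam = 2.0
v = rng.random(500_000)
samples = -np.log(1.0 - v) / lam

# With enough samples the histogram tracks the PDF; in particular the
# sample mean approaches the analytic mean 1/lam.
sample_mean = samples.mean()
```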

  16. Probability Distributions: Analytic examples • Wikipedia is brilliant for this • Uniform • Delta function • Gaussian (normal) • Exponential • Poisson • Uniform: P(x) = 1/(b − a), x ∈ [a, b]

  17. Probability Distributions: Analytic examples • Wikipedia is brilliant for this • Uniform • Delta function • Gaussian (normal) • Exponential • Poisson P ( x ) = δ ( x − x 0 )

  18. Probability Distributions: Analytic examples • Wikipedia is brilliant for this • Uniform • Delta function • Gaussian (normal) • Exponential • Poisson • Gaussian: P(x) = (1/√(2πσ²)) exp(−(x − µ)² / (2σ²))

  19. Probability Distributions: Analytic examples • Wikipedia is brilliant for this • Uniform • Delta function • Gaussian (normal) • Exponential • Poisson P ( x ) = λ e − λ x , x > 0

  20. Probability Distributions: Analytic examples • Wikipedia is brilliant for this • Uniform • Delta function • Gaussian (normal) • Exponential • Poisson • Poisson: P(n) = λⁿ e^(−λ) / n!
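The analytic forms above (except the delta function, which is not an ordinary function) can be written as plain Python and sanity-checked:

```python
import math

def uniform_pdf(x, a, b):
    # P(x) = 1/(b - a) on [a, b], zero elsewhere.
    return 1.0 / (b - a) if a <= x <= b else 0.0

def gaussian_pdf(x, mu, sigma):
    # P(x) = (1/sqrt(2*pi*sigma^2)) exp(-(x - mu)^2 / (2*sigma^2))
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

def exponential_pdf(x, lam):
    # P(x) = lam * exp(-lam * x) for x > 0.
    return lam * math.exp(-lam * x) if x > 0 else 0.0

def poisson_pmf(n, lam):
    # P(n) = lam^n exp(-lam) / n!
    return lam**n * math.exp(-lam) / math.factorial(n)

# The Poisson PMF sums to 1 over n = 0, 1, 2, ... (the tail beyond
# n = 100 is negligible for lam = 3).
poisson_total = sum(poisson_pmf(n, 3.0) for n in range(100))
```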

  21. Bayes Theorem and Inference • P(AB) = P(A|B) P(B) = P(B|A) P(A)

  22. Bayes Theorem and Inference • P(AB) = P(A|B) P(B) = P(B|A) P(A) • ∴ P(A|B) = P(B|A) P(A) / P(B)

  23. Bayes Theorem and Inference • P(p|dM) = P(d|pM) P(p|M) / P(d|M) ∝ P(d|pM) P(p|M) • Posterior ∝ Likelihood × Prior, where d = observed data, p = parameters, M = model

  24. Bayes Theorem and Inference • What you know after looking at the data = what you knew before + what the data told you

  25. Models & Parameters • A model is the mathematical theory that describes how your data arose • It is not a theory of how the thing you wanted to measure arose • Non-trivial models include some deterministic and some stochastic parts • Noise is one stochastic part; many (most?) astrophysical models have others too

  26. Models & Parameters • Parameters are any unknown numerical values in your model • A parameter can have a probability distribution • You need (and have) some prior (background) information about all your parameters • This may be subjective!

  27. Parameter Spaces • Can use continuous parameters as dimensions in an abstract space • Probabilities become functions of many variables: P(u, v, w, x, y, z) • As the dimension of this space increases, your intuition becomes worse [Figure: a 2-D parameter space with axes m and c]

  28. Descriptive Statistics • Reduce samples or a distribution to a set of characteristic numbers • In analytic cases this is all you need to describe a distribution • Statistics of samples = estimators/approximations of the underlying distribution’s statistics

  29. Descriptive Statistics: Mean • Distribution mean: E[X] = ∫ X P(X) dX • Sample mean: X̄ = (Σ X_i) / N

  30. Descriptive Statistics: Mean • Means can be misleading! • Most distributions are asymmetric

  31. Descriptive Statistics: Variance • Distribution variance: Var(X) = E[(X − X̄)²] = ∫ (X − X̄)² P(X) dX • Sample variance: σ²_X = Σ (X_i − X̄)² / N • Population variance (unbiased estimate): s²_X = Σ (X_i − X̄)² / (N − 1)
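The N and N − 1 denominators correspond to numpy's `ddof` argument (the example data are my own):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

var_n  = np.var(x, ddof=0)   # divide by N: biased for the population variance
var_n1 = np.var(x, ddof=1)   # divide by N-1: unbiased estimator
```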

  32. Descriptive Statistics: Covariance • Distribution covariance: Cov(X, Y) = E[(X − X̄)(Y − Ȳ)] = ∫ (X − X̄)(Y − Ȳ) P(XY) dX dY • Sample covariance: σ_XY = Σ (X_i − X̄)(Y_i − Ȳ) / N

  33. Descriptive Statistics: Covariance [Figure: scatter plots of Y against X for σ_XY > 0 and σ_XY < 0]

  34. Gaussians: The Basics • One-dimensional continuous PDF: P(x; µ, σ) = (1/√(2πσ²)) exp(−(x − µ)² / (2σ²)) • Two parameters: mean µ and standard deviation σ • Symmetric • Common! But often an over-simplification.

  35. Gaussians: Sigma numbers • Distance from the mean defined in numbers of standard deviations (“sigma”) • Probability mass: 68% within 1σ, 95% within 2σ, 99.7% within 3σ

  36. Gaussians: Properties • The error function is the cumulative integral of the Gaussian • The sigma numbers can be read off it
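Reading the sigma numbers off the error function uses the identity P(|X − µ| < kσ) = erf(k/√2) for a Gaussian:

```python
import math

# Probability mass within k standard deviations of the mean.
within = {k: math.erf(k / math.sqrt(2)) for k in (1, 2, 3)}
# Gives approximately 0.6827, 0.9545, 0.9973 -- the 68/95/99.7 numbers.
```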

  37. Gaussians: Properties • The sum of Gaussians has a simple form: X ~ N(µ_x, σ²_x), Y ~ N(µ_y, σ²_y) ⇒ X + Y ~ N(µ_x + µ_y, σ²_x + σ²_y) • Especially useful for a sum of identical Gaussians; leads to the formula that the error on the mean ~ σ/√n

  38. Gaussians: Properties • Central limit theorem: given a collection of random variables X_i, (1/s_n) Σ_{i=1}^n (X_i − µ_i) → N(0, 1), where s²_n = Σ_{i=1}^n σ²_i • Provided that no single term dominates the variance (the Lindeberg condition)

  39. Gaussians: Properties • Central limit theorem: [Figure: histograms of a single distribution and of the means of 2, 3, and 4 draws]

  40. Gaussians: Multivariate • P(x; µ, C) = (1/√((2π)ⁿ |C|)) exp(−½ (x − µ)ᵀ C⁻¹ (x − µ)) • C is the covariance matrix; it describes correlations between quantities • For example: data points often have correlated errors
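A sketch of working with a 2-D multivariate Gaussian (the covariance values here are illustrative): draw correlated samples and recover C from them.

```python
import numpy as np

mu = np.array([0.0, 0.0])
C = np.array([[2.0, 0.8],
              [0.8, 1.0]])   # covariance matrix encoding correlated errors

rng = np.random.default_rng(3)
samples = rng.multivariate_normal(mu, C, size=200_000)

# The sample covariance approximates C for large sample sizes.
C_hat = np.cov(samples, rowvar=False)
```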

  41. Interpretations of Probability

                                    Frequentists                        Bayesians
      Use probabilities to …        describe frequencies                quantify information
      Think model parameters are …  fixed unknowns                      random variables with probabilities
      Think data is …               a repeatable random variable        observed and therefore fixed
      Call their work …             “Statistics”                        “Inference”
      Make statements about …       intervals covering the truth        constraints on model parameters
                                    x% of the time
      Have …                        many approaches with lots of        one approach with explicit choices
                                    implicit choices

  42. Why Bayesian probability for science? • Answers the right question • We want facts about the world, not about hypothetical ensembles of experiments • The ideal process is always clear • Practical implementations are more difficult • Problems and questions are more explicit
