Probability Review
Applied Bayesian Statistics
Dr. Earvin Balderama
Department of Mathematics & Statistics Loyola University Chicago
August 31, 2017
Applied Bayesian Statistics Last edited September 8, 2017 by Earvin Balderama <ebalderama@luc.edu>
Mathematically, a random variable is a function that maps a sample space into the real numbers: X : S → ℝ. The sample space can be:

1. Countable (discrete).
2. Uncountable (continuous).

Example: 3 coin tosses.
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
We may want to create a random variable, X, defined as the number of tails. X ∈ {0, 1, 2, 3}
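The coin-toss example can be sketched in a few lines of Python; this is just an illustration of X : S → ℝ, enumerating the sample space and mapping each outcome to the number of tails.

```python
# Enumerate the sample space S of 3 coin tosses and define the random
# variable X = number of tails as a mapping from outcomes to numbers.
from itertools import product

S = ["".join(toss) for toss in product("HT", repeat=3)]  # 8 outcomes
X = {s: s.count("T") for s in S}  # X maps each outcome to a real number

print(len(S))                    # 8
print(sorted(set(X.values())))   # [0, 1, 2, 3]
```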
Mathematically, a probability function assigns numbers (between 0 and 1) to subsets of a sample space: P : B → [0, 1], for all B ⊆ S. Two interpretations:

1. (Frequentist) Based on long-run relative frequencies of possible outcomes.
2. (Bayesian) Based on belief about how likely each possible outcome is.

Regardless of interpretation, the same basic probability laws apply, e.g., P(A) ≥ 0, P(S) = 1, and P(A ∪ B) = P(A) + P(B) for mutually exclusive A and B.
A probability distribution is a list of all possible values of a random variable and their corresponding probabilities.

1. Discrete random variable: probability mass function (PMF)
   PMF: f(x) = Prob(X = x) ≥ 0
   Mean: E(X) = Σ_x x f(x)
   Variance: V(X) = Σ_x [x − E(X)]² f(x)

2. Continuous random variable: probability density function (PDF)
   Prob(X = x) = 0 for all x
   PDF: f(x) ≥ 0, Prob(X ∈ B) = ∫_B f(x) dx
   Mean: E(X) = ∫ x f(x) dx
   Variance: V(X) = ∫ [x − E(X)]² f(x) dx
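The discrete formulas above can be checked directly; a minimal sketch, assuming the fair-coin example (X = number of tails in 3 tosses, so f(0) = f(3) = 1/8 and f(1) = f(2) = 3/8):

```python
# Mean and variance of a discrete random variable computed from its PMF,
# using X = number of tails in 3 fair coin tosses.
pmf = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

mean = sum(x * p for x, p in pmf.items())            # E(X) = sum x f(x)
var = sum((x - mean) ** 2 * p for x, p in pmf.items())  # V(X)

print(mean)  # 1.5
print(var)   # 0.75
```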
A statistical analysis typically proceeds by selecting a PMF (or PDF) that seems to match the distribution of a sample. We rarely know the PMF (or PDF) exactly, but we may assume it is from a parametric family of distributions, and estimate the parameters.
1. Discrete random variables:
   Binomial (Bernoulli is a special case), Poisson, Negative Binomial

2. Continuous random variables:
   Normal, Gamma (Exponential and χ² are special cases), Inverse-Gamma, Beta (Uniform is a special case)
Bernoulli distribution: only two outcomes (success/failure, 0/1, zero/nonzero, etc.), where θ is the probability of success.
X ∈ {0, 1}
PMF: f(x) = Prob(X = x) = 1 − θ if x = 0, and θ if x = 1.
Mean: E(X) = Σ_x x f(x) = 0(1 − θ) + 1(θ) = θ
Variance: V(X) = Σ_x [x − θ]² f(x) = (0 − θ)²(1 − θ) + (1 − θ)²θ = θ(1 − θ)
Binomial distribution: X = number of “successes” in n independent “Bernoulli trials,” where θ is the probability of success on each trial.
X ∈ {0, 1, . . . , n}
PMF: f(x) = Prob(X = x) = C(n, x) θ^x (1 − θ)^(n−x), where C(n, x) = n!/(x!(n − x)!) is the binomial coefficient.
Mean: E(X) = nθ
Variance: V(X) = nθ(1 − θ)
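A quick numerical check of the Binomial PMF and its mean; n = 10 and θ = 0.3 are assumed example values, not from the slides:

```python
# Binomial PMF via math.comb; verify it sums to 1 and that E(X) = n*theta.
from math import comb

def binom_pmf(x, n, theta):
    """Prob(X = x) for X ~ Binomial(n, theta)."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

n, theta = 10, 0.3
probs = [binom_pmf(x, n, theta) for x in range(n + 1)]

print(abs(sum(probs) - 1) < 1e-12)                                       # True
print(abs(sum(x * p for x, p in enumerate(probs)) - n * theta) < 1e-12)  # True
```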
Poisson distribution: X = number of events that occur in a unit of time.
X ∈ {0, 1, . . .}
PMF: f(x) = Prob(X = x) = λ^x e^(−λ) / x!
Mean: E(X) = λ
Variance: V(X) = λ
Note: Can be parameterized with λ = nθ, where θ is the expected number of events per unit time. E(X) = V(X) = nθ.
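The equal-mean-and-variance property of the Poisson can be verified numerically; λ = 2.5 is an assumed example value, and the infinite support is truncated where the tail is negligible:

```python
# Poisson PMF; check E(X) = V(X) = lambda numerically.
from math import exp, factorial

def pois_pmf(x, lam):
    """Prob(X = x) for X ~ Poisson(lam)."""
    return lam**x * exp(-lam) / factorial(x)

lam = 2.5
probs = [pois_pmf(x, lam) for x in range(60)]  # tail beyond 60 is negligible
mean = sum(x * p for x, p in enumerate(probs))
var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))

print(abs(mean - lam) < 1e-9)  # True: E(X) = lambda
print(abs(var - lam) < 1e-9)   # True: V(X) = lambda
```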
Negative Binomial distribution: X = number of “failures” until r “successes” in a sequence of independent “Bernoulli trials,” where θ is the probability of success on each trial.
X ∈ {0, 1, . . .}
PMF: f(x) = Prob(X = x) = C(x + r − 1, x) θ^r (1 − θ)^x
Mean: E(X) = r(1 − θ)/θ
Variance: V(X) = r(1 − θ)/θ²
Note: The geometric distribution is a special case: Geom(θ) = NB(1, θ).
Note: There are MANY different ways to specify the NB distribution. The important thing to note is that NB is a discrete count distribution that is a more flexible model than the Poisson.
Normal distribution:
X ∈ (−∞, ∞)
PDF: f(x) = (1/(σ√(2π))) exp{ −(1/2) ((x − µ)/σ)² }
Mean: E(X) = µ
Variance: V(X) = σ²
Gamma distribution:
X ∈ (0, ∞)
PDF: f(x) = (b^a / Γ(a)) x^(a−1) e^(−bx)
Mean: E(X) = a/b
Variance: V(X) = a/b²
Parameters: shape a > 0, rate b > 0.
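The Gamma mean formula can be checked with a simple midpoint-rule integral; a = 2 and b = 0.5 are assumed example values:

```python
# Verify E(X) = a/b for the Gamma(a, b) density by numerical integration.
from math import gamma, exp

def gamma_pdf(x, a, b):
    return b**a / gamma(a) * x**(a - 1) * exp(-b * x)

a, b = 2.0, 0.5
h = 0.001
xs = [(i + 0.5) * h for i in range(100_000)]  # midpoint grid on (0, 100)
mean = sum(x * gamma_pdf(x, a, b) * h for x in xs)

print(abs(mean - a / b) < 1e-3)  # True: E(X) = a/b = 4
```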
If Y ∼ Gamma(a, b), then X = 1/Y ∼ InverseGamma(a, b).
X ∈ (0, ∞)
PDF: f(x) = (b^a / Γ(a)) x^(−a−1) e^(−b/x)
Mean: E(X) = b/(a − 1), for a > 1.
Variance: V(X) = b²/((a − 1)²(a − 2)), for a > 2.
Parameters: shape a > 0, scale b > 0.
Beta distribution:
X ∈ [0, 1]
PDF: f(x) = (Γ(a + b) / (Γ(a)Γ(b))) x^(a−1) (1 − x)^(b−1)
Mean: E(X) = a/(a + b)
Variance: V(X) = ab/((a + b)²(a + b + 1))
Parameters: a > 0, b > 0.
A random vector of p random variables: X = (X1, X2, . . . , Xp). For now, suppose we have just p = 2 random variables, X and Y. (X, Y) can be discrete or continuous.
1. Discrete (X, Y)
   Joint PMF: f(x, y) = Prob(X = x, Y = y)
   Marginal PMF for X: fX(x) = Prob(X = x) = Σ_y f(x, y)
   Marginal PMF for Y: fY(y) = Prob(Y = y) = Σ_x f(x, y)

2. Continuous (X, Y)
   Joint PDF: f(x, y), with Prob[(X, Y) ∈ B] = ∫∫_B f(x, y) dx dy
   Marginal PDF for X: fX(x) = ∫ f(x, y) dy
   Marginal PDF for Y: fY(y) = ∫ f(x, y) dx
Example: Patients are randomly assigned a dose and followed to determine whether they develop a tumor. X ∈ {5, 10, 20} is the dose; Y ∈ {0, 1} is 1 if a tumor develops and 0 otherwise. The joint PMF f(x, y) is given by:

           X = 5   X = 10   X = 20
   Y = 0   0.469   0.124    0.049
   Y = 1   0.231   0.076    0.051
Example: Find the marginal PMFs of X and Y.

fY(0) = Σ_x f(x, 0) = 0.469 + 0.124 + 0.049 = 0.642
fY(1) = Σ_x f(x, 1) = 0.231 + 0.076 + 0.051 = 0.358
fX(5) = 0.7, fX(10) = 0.2, fX(20) = 0.1

           X = 5   X = 10   X = 20   fY(y)
   Y = 0   0.469   0.124    0.049    0.642
   Y = 1   0.231   0.076    0.051    0.358
   fX(x)   0.7     0.2      0.1      1
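The marginalization above (sum the joint over the other variable) can be sketched directly from the dose/tumor table:

```python
# Compute marginal PMFs fX and fY from the joint PMF table by summing
# over the other variable.
joint = {(5, 0): 0.469, (10, 0): 0.124, (20, 0): 0.049,
         (5, 1): 0.231, (10, 1): 0.076, (20, 1): 0.051}

fX, fY = {}, {}
for (x, y), p in joint.items():
    fX[x] = fX.get(x, 0) + p  # sum over y
    fY[y] = fY.get(y, 0) + p  # sum over x

print(round(fX[5], 3), round(fX[10], 3), round(fX[20], 3))  # 0.7 0.2 0.1
print(round(fY[0], 3), round(fY[1], 3))                     # 0.642 0.358
```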
Conditional PMF of Y given X:
f(y | x) = Prob(Y = y | X = x) = Prob(X = x, Y = y) / Prob(X = x) = f(x, y) / fX(x)
conditional = joint / marginal
Note: Here, x is treated as fixed, so f(y | x) is only a function of y.
Note: This is not Σ_x f(x, y) = fY(y), nor Σ_y f(x, y) = fX(x).
Note: To show that f(y | x) is a valid PMF,
Σ_y f(y | x) = Σ_y f(y, x) / fX(x) = (1/fX(x)) Σ_y f(y, x) = fX(x) / fX(x) = 1.
Example: Find f(y | x) and f(x | y) from the joint PMF above.

Prob(Y = 0 | X = 5) = 0.469/0.7 = 0.67
Prob(Y = 1 | X = 5) = 0.231/0.7 = 0.33
Prob(X = 5 | Y = 0) = 0.469/0.642 = 0.73
. . .
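The conditional = joint/marginal rule applied to the same table, as a minimal sketch:

```python
# Conditional PMF f(y | x) = f(x, y) / fX(x) from the joint PMF table.
joint = {(5, 0): 0.469, (10, 0): 0.124, (20, 0): 0.049,
         (5, 1): 0.231, (10, 1): 0.076, (20, 1): 0.051}

def cond_y_given_x(y, x):
    """f(y | x) = f(x, y) / fX(x)."""
    fX_x = sum(p for (xi, yi), p in joint.items() if xi == x)  # marginal
    return joint[(x, y)] / fX_x

print(round(cond_y_given_x(0, 5), 2))  # 0.67
print(round(cond_y_given_x(1, 5), 2))  # 0.33
```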
Example: Let X = birthweight, Y = gestational age. X ∈ (2, 10) pounds; Y ∈ (20, 50) weeks. The joint PDF is given by f(x, y) = 0.26 exp(−|x − 7| − |y − 40|).

Find Prob(X > 7, Y > 40) = ∫₄₀⁵⁰ ∫₇¹⁰ f(x, y) dx dy = . . . = 0.25
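The double integral above can be checked with a crude midpoint-rule approximation:

```python
# Numerically check Prob(X > 7, Y > 40) ~ 0.25 for the joint PDF
# f(x, y) = 0.26 exp(-|x - 7| - |y - 40|).
from math import exp

def f(x, y):
    return 0.26 * exp(-abs(x - 7) - abs(y - 40))

hx, hy = 0.01, 0.01
prob = 0.0
for i in range(int(3 / hx)):        # x from 7 to 10
    x = 7 + (i + 0.5) * hx
    for j in range(int(10 / hy)):   # y from 40 to 50
        y = 40 + (j + 0.5) * hy
        prob += f(x, y) * hx * hy

print(round(prob, 2))  # 0.25
```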
Example: For the same joint PDF, find the marginal PDF of X:

fX(x) = ∫₂₀⁵⁰ f(x, y) dy = . . . = 0.52 e^(−|x−7|)
Conditional PDF of Y given X:
f(y | x) = f(x, y) / fX(x)
conditional = joint / marginal
Note: Here, x is treated as fixed, so f(y | x) is only a function of y.
Note: This is not ∫ f(x, y) dx = fY(y), nor ∫ f(x, y) dy = fX(x).
Note: To show that f(y | x) is a valid PDF,
∫ f(y | x) dy = ∫ f(y, x) / fX(x) dy = (1/fX(x)) ∫ f(y, x) dy = fX(x) / fX(x) = 1.
Example: Let X = birthweight, Y = gestational age. X ∈ (2, 10) pounds; Y ∈ (20, 50) weeks. The joint PDF is given by f(x, y) = 0.26 exp(−|x − 7| − |y − 40|). Find f(y | x).
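One possible worked solution (not shown on the slide), using the approximate marginal fX(x) = 0.52 e^(−|x−7|) found earlier:

```latex
f(y \mid x) = \frac{f(x, y)}{f_X(x)}
            = \frac{0.26\, e^{-|x-7| - |y-40|}}{0.52\, e^{-|x-7|}}
            = 0.5\, e^{-|y-40|}, \qquad y \in (20, 50).
```

Note that the result does not depend on x, which reflects the fact that X and Y are independent under this joint PDF.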
The bivariate normal is the most common multivariate family. There are 5 parameters:

1. µX is the marginal mean of X.
2. µY is the marginal mean of Y.
3. σ²X is the marginal variance of X.
4. σ²Y is the marginal variance of Y.
5. ρ = ρXY is the correlation between X and Y.

The joint PDF is

f(x, y) = (1 / (2πσXσY√(1 − ρ²))) exp{ −(1 / (2(1 − ρ²))) [ ((x − µX)/σX)² + ((y − µY)/σY)² − 2ρ ((x − µX)/σX)((y − µY)/σY) ] }
Example: Suppose X and Y are bivariate normal with µX = µY = 0 and σX = σY = 1. Find the marginal distribution of X.
Example: Suppose X and Y are bivariate normal with µX = µY = 0 and σX = σY = 1. Find the conditional distribution of Y given X.
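The answers to these two examples are not shown on the slides; for the standardized case (µX = µY = 0, σX = σY = 1) the classical bivariate-normal results are:

```latex
X \sim \mathrm{N}(0, 1) \quad \text{(marginal)},
\qquad
Y \mid X = x \;\sim\; \mathrm{N}\!\left(\rho x,\; 1 - \rho^2\right) \quad \text{(conditional)}.
```

The conditional mean is linear in x and the conditional variance shrinks as |ρ| → 1, which is the usual regression interpretation of the bivariate normal.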
Recall conditional distributions:

f(y | x) = f(x, y) / f(x)
conditional = joint / marginal

This can be extended to

f(y | x) = f(x, y) / f(x) = f(x | y) f(y) / f(x) = f(x | y) f(y) / Σ_y f(x | y) f(y)

This is the form of the famous “Bayes’ theorem” (or “Bayes’ rule”). Note: the denominator is simply a normalizing constant.
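A minimal sketch of discrete Bayes' rule using the dose/tumor example: the conditional likelihoods f(Y = 1 | X = x) below are computed as f(x, 1)/fX(x) from the joint table.

```python
# Bayes' rule for discrete X: f(x | y) = f(y | x) fX(x) / sum_x f(y | x) fX(x).
fX = {5: 0.7, 10: 0.2, 20: 0.1}               # marginal (prior) for X
f_y1_given_x = {5: 0.33, 10: 0.38, 20: 0.51}  # f(Y=1 | X=x) from the table

unnorm = {x: f_y1_given_x[x] * fX[x] for x in fX}
Z = sum(unnorm.values())                      # normalizing constant f(Y=1)
posterior = {x: p / Z for x, p in unnorm.items()}

print(round(Z, 3))             # 0.358, matching fY(1) from the table
print(round(posterior[5], 2))  # 0.65
```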
In a Bayesian data analysis, we select:

1. the prior f(θ),
2. the likelihood f(y | θ).

Based on these, we must compute

3. the posterior f(θ | y).

Bayes’ theorem: the mathematical formula to convert the likelihood and prior to the posterior.

f(θ | y) = f(y | θ) f(θ) / f(y)

Posterior ∝ Likelihood × Prior
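The Posterior ∝ Likelihood × Prior recipe can be sketched on a grid; the Binomial data (y = 6 successes in n = 10 trials) and Uniform (i.e., Beta(1, 1)) prior are assumed example choices, for which the exact posterior is Beta(1 + y, 1 + n − y):

```python
# Grid approximation of Posterior ∝ Likelihood × Prior for a Binomial
# likelihood with a Uniform prior on theta.
from math import comb

n, y = 10, 6
thetas = [i / 1000 for i in range(1, 1000)]  # grid over (0, 1)
prior = [1.0 for _ in thetas]                # Uniform = Beta(1, 1) prior
like = [comb(n, y) * t**y * (1 - t)**(n - y) for t in thetas]

unnorm = [l * p for l, p in zip(like, prior)]
Z = sum(unnorm)                              # normalizing constant
post = [u / Z for u in unnorm]

post_mean = sum(t * p for t, p in zip(thetas, post))
print(round(post_mean, 3))  # 0.583, the exact Beta(7, 5) mean (1+y)/(2+n)
```

Normalizing by Z is exactly the role of f(y) in Bayes' theorem: it makes the posterior sum (or integrate) to 1.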