Chapter 2: Basics from Probability Theory and Statistics
  1. Chapter 2: Basics from Probability Theory and Statistics
     2.1 Probability Theory: Events, Probabilities, Random Variables, Distributions, Moments, Generating Functions, Deviation Bounds, Limit Theorems; Basics from Information Theory
     2.2 Statistical Inference: Sampling and Estimation (Moment Estimation, Confidence Intervals, Parameter Estimation, Maximum Likelihood, EM Iteration)
     2.3 Statistical Inference: Hypothesis Testing and Regression (Statistical Tests, p-Values, Chi-Square Test, Linear and Logistic Regression)
     Mostly following L. Wasserman, Chapters 1-5, with additions from other textbooks on stochastics.

  2. 2.1 Basic Probability Theory
     A probability space is a triple (Ω, E, P) with
     • a set Ω of elementary events (sample space),
     • a family E of subsets of Ω with Ω ∈ E, closed under ∩, ∪, and − with a countable number of operands (with finite Ω usually E = 2^Ω), and
     • a probability measure P: E → [0,1] with P[Ω] = 1 and $P[\bigcup_i A_i] = \sum_i P[A_i]$ for countably many, pairwise disjoint A_i.
     Properties of P:
     P[A] + P[¬A] = 1
     P[A ∪ B] = P[A] + P[B] − P[A ∩ B]
     P[∅] = 0 (null/impossible event)
     P[Ω] = 1 (true/certain event)
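As a sanity check, these axioms and properties can be verified mechanically on a small finite space. The following minimal Python sketch (a made-up example, not from the slides) uses the fair-die space Ω = {1,...,6} with E = 2^Ω and the uniform measure.

```python
from fractions import Fraction

# Finite probability space for one fair die roll:
# Omega = {1,...,6}, E = 2^Omega, P[A] = |A| / |Omega|.
omega = frozenset(range(1, 7))

def P(event):
    """Uniform probability measure: P[A] = |A| / |Omega|."""
    return Fraction(len(event), len(omega))

A = frozenset({2, 4, 6})   # "even"
B = frozenset({4, 5, 6})   # "at least 4"

# The listed properties of P hold exactly:
assert P(A) + P(omega - A) == 1                 # P[A] + P[not A] = 1
assert P(A | B) == P(A) + P(B) - P(A & B)       # inclusion-exclusion
assert P(frozenset()) == 0 and P(omega) == 1    # impossible / certain event
```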

  3. Independence and Conditional Probabilities
     Two events A, B of a probability space are independent if P[A ∩ B] = P[A] P[B].
     A finite set of events A = {A_1, ..., A_n} is independent if for every subset S ⊆ A the equation $P[\bigcap_{A_i \in S} A_i] = \prod_{A_i \in S} P[A_i]$ holds.
     The conditional probability P[A | B] of A under the condition (hypothesis) B is defined as:
     $P[A \mid B] = \frac{P[A \cap B]}{P[B]}$
     Event A is conditionally independent of B given C if P[A | BC] = P[A | C].
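A short sketch of these definitions on the two-dice sample space (a made-up example, not from the slides): the events "first die shows 6" and "the sum is 7" turn out to be independent, and the conditional probability follows directly from the definition above.

```python
from fractions import Fraction
from itertools import product

# Sample space of two fair dice: 36 equally likely outcomes.
omega = set(product(range(1, 7), repeat=2))

def P(event):
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == 6}           # first die shows 6
B = {w for w in omega if w[0] + w[1] == 7}    # sum is 7

# A and B are independent here: P[A ∩ B] = P[A] * P[B] = 1/36.
assert P(A & B) == P(A) * P(B)

# Conditional probability via the definition above.
print(P(A & B) / P(B))  # 1/6
```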

  4. Total Probability and Bayes’ Theorem
     Total probability theorem: For a partitioning of Ω into events B_1, ..., B_n:
     $P[A] = \sum_{i=1}^{n} P[A \mid B_i] \, P[B_i]$
     Bayes‘ theorem:
     $P[A \mid B] = \frac{P[B \mid A] \, P[A]}{P[B]}$
     P[A|B] is called the posterior probability, P[A] is called the prior probability.
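A standard illustration of both theorems is a diagnostic test; the numbers below are assumed for the sake of the example and do not come from the slides.

```python
# Hypothetical diagnostic test:
# prior P[disease] = 0.01, P[positive | disease] = 0.99,
# P[positive | no disease] = 0.05.
p_d = 0.01          # prior P[A]
p_pos_d = 0.99      # P[B | A]
p_pos_nd = 0.05     # P[B | not A]

# Total probability over the partition {disease, no disease}:
p_pos = p_pos_d * p_d + p_pos_nd * (1 - p_d)

# Bayes' theorem: posterior P[A | B] = P[B | A] P[A] / P[B].
posterior = p_pos_d * p_d / p_pos
print(f"P[disease | positive] = {posterior:.3f}")  # about 0.167
```

Despite the accurate test, the posterior stays small because the prior is small, which is exactly what the weighting by P[B_i] in the total probability theorem captures.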

  5. Random Variables
     A random variable (RV) X on the probability space (Ω, E, P) is a function X: Ω → M with M ⊆ ℝ s.t. {e | X(e) ≤ x} ∈ E for all x ∈ M (X is measurable).
     F_X: M → [0,1] with F_X(x) = P[X ≤ x] is the (cumulative) distribution function (cdf) of X.
     With a countable set M, the function f_X: M → [0,1] with f_X(x) = P[X = x] is called the (probability) density function (pdf) of X; in general f_X(x) is F′_X(x).
     For a random variable X with distribution function F, the inverse function F⁻¹(q) := inf{x | F(x) > q} for q ∈ [0,1] is called the quantile function of X. (The 0.5 quantile, i.e. the 50th percentile, is called the median.)
     Random variables with countable M are called discrete, otherwise they are called continuous. For discrete random variables the density function is also referred to as the probability mass function.
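The following sketch relates cdf, pdf/pmf, and quantile function concretely; it assumes scipy is available (any distribution object exposing cdf, pdf, and ppf would do equally well).

```python
from scipy import stats

X = stats.norm(loc=0, scale=1)   # a continuous RV, X ~ N(0, 1)

print(X.cdf(1.0))    # F_X(1.0) = P[X <= 1.0], about 0.841
print(X.pdf(1.0))    # f_X(1.0) = F'_X(1.0), about 0.242
print(X.ppf(0.5))    # quantile function F^{-1}(0.5): the median, 0.0

# For a discrete RV the density becomes a probability mass function (pmf):
Y = stats.binom(n=10, p=0.5)
print(Y.pmf(5))      # P[Y = 5]
```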

  6. Important Discrete Distributions
     • Bernoulli distribution with parameter p:
       $P[X = x] = p^x (1-p)^{1-x}$ for x ∈ {0,1}
     • Uniform distribution over {1, 2, ..., m}:
       $P[X = k] = f_X(k) = \frac{1}{m}$ for 1 ≤ k ≤ m
     • Binomial distribution (coin toss repeated n times; X: #heads):
       $P[X = k] = f_X(k) = \binom{n}{k} p^k (1-p)^{n-k}$
     • Poisson distribution (with rate λ):
       $P[X = k] = f_X(k) = e^{-\lambda} \frac{\lambda^k}{k!}$
     • Geometric distribution (#coin tosses until first head):
       $P[X = k] = f_X(k) = (1-p)^k \, p$
     • 2-Poisson mixture (with a_1 + a_2 = 1):
       $P[X = k] = f_X(k) = a_1 e^{-\lambda_1} \frac{\lambda_1^k}{k!} + a_2 e^{-\lambda_2} \frac{\lambda_2^k}{k!}$
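These pmfs are simple enough to write out directly; in the sketch below (function names are mine, not from the slides) two of them are also checked numerically to sum to 1.

```python
from math import comb, exp, factorial

def bernoulli_pmf(x, p):
    return p**x * (1 - p)**(1 - x)            # x in {0, 1}

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

def geometric_pmf(k, p):
    return (1 - p)**k * p                     # k tails before the first head

def two_poisson_pmf(k, a1, lam1, lam2):
    # 2-Poisson mixture with weights a1 and a2 = 1 - a1
    return a1 * poisson_pmf(k, lam1) + (1 - a1) * poisson_pmf(k, lam2)

# Sanity check: each pmf should sum to (approximately) 1.
assert abs(sum(binomial_pmf(k, 10, 0.3) for k in range(11)) - 1) < 1e-12
assert abs(sum(poisson_pmf(k, 4.0) for k in range(100)) - 1) < 1e-12
```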

  7. Important Continuous Distributions
     • Uniform distribution in the interval [a,b]:
       $f_X(x) = \frac{1}{b-a}$ for a ≤ x ≤ b (0 otherwise)
     • Exponential distribution (e.g. time until the next event of a Poisson process) with rate λ = lim_{Δt→0} (#events in Δt) / Δt:
       $f_X(x) = \lambda e^{-\lambda x}$ for x ≥ 0 (0 otherwise)
     • Hyperexponential distribution:
       $f_X(x) = p \, \lambda_1 e^{-\lambda_1 x} + (1-p) \, \lambda_2 e^{-\lambda_2 x}$
     • Pareto distribution:
       $f_X(x) = \frac{a}{b} \left( \frac{b}{x} \right)^{a+1}$ for x > b (0 otherwise),
       an example of a „heavy-tailed“ distribution with $f_X(x) \sim \frac{c}{x^{\alpha+1}}$
     • Logistic distribution:
       $F_X(x) = \frac{1}{1 + e^{-x}}$
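Analogously, these densities can be written out directly; below is a minimal sketch with assumed parameter names, including a crude numeric check that the exponential density integrates to about 1.

```python
from math import exp

def uniform_pdf(x, a, b):
    return 1 / (b - a) if a <= x <= b else 0.0

def exponential_pdf(x, lam):
    return lam * exp(-lam * x) if x >= 0 else 0.0

def hyperexponential_pdf(x, p, lam1, lam2):
    return p * exponential_pdf(x, lam1) + (1 - p) * exponential_pdf(x, lam2)

def pareto_pdf(x, a, b):
    return (a / b) * (b / x)**(a + 1) if x > b else 0.0

def logistic_cdf(x):
    return 1 / (1 + exp(-x))

# Crude Riemann-sum check that the exponential density integrates to ~1:
dx = 0.001
print(sum(exponential_pdf(i * dx, 2.0) * dx for i in range(20000)))  # ~ 1.0
```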

  8. Normal Distribution (Gaussian Distribution)
     • Normal distribution N(μ, σ²) (Gauss distribution; approximates sums of independent, identically distributed random variables):
       $f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
     • Distribution function of N(0,1):
       $\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-x^2/2} \, dx$
     Theorem: Let X be normally distributed with expectation μ and variance σ².
     Then $Y := \frac{X - \mu}{\sigma}$ is normally distributed with expectation 0 and variance 1.
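A small sketch of the N(μ, σ²) density and the standardization theorem; expressing Φ via the error function erf is a standard identity, not something stated on the slide.

```python
from math import pi, sqrt, exp, erf

def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma^2)."""
    return exp(-(x - mu)**2 / (2 * sigma2)) / sqrt(2 * pi * sigma2)

def Phi(z):
    """cdf of N(0,1), via the identity Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Standardization: for X ~ N(mu, sigma^2), P[X <= x] = Phi((x - mu) / sigma),
# since Y = (X - mu) / sigma is N(0,1).
mu, sigma = 10.0, 2.0
print(Phi((13.0 - mu) / sigma))   # P[X <= 13], about 0.933
```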

  9. Multidimensional (Multivariate) Distributions
     Let X_1, ..., X_m be random variables over the same probability space with domains dom(X_1), ..., dom(X_m).
     The joint distribution of X_1, ..., X_m has a density function $f_{X_1,...,X_m}(x_1, ..., x_m)$ with
     $\sum_{x_1 \in dom(X_1)} \cdots \sum_{x_m \in dom(X_m)} f_{X_1,...,X_m}(x_1, ..., x_m) = 1$ or
     $\int_{dom(X_1)} \cdots \int_{dom(X_m)} f_{X_1,...,X_m}(x_1, ..., x_m) \, dx_m \ldots dx_1 = 1$
     The marginal distribution of X_i in the joint distribution of X_1, ..., X_m has the density function
     $\sum_{x_1} \cdots \sum_{x_{i-1}} \sum_{x_{i+1}} \cdots \sum_{x_m} f_{X_1,...,X_m}(x_1, ..., x_m)$ or
     $\int_{X_1} \cdots \int_{X_{i-1}} \int_{X_{i+1}} \cdots \int_{X_m} f_{X_1,...,X_m}(x_1, ..., x_m) \, dx_m \ldots dx_{i+1} \, dx_{i-1} \ldots dx_1$
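For the discrete case, marginalization is just summing out the other variables; below is a tiny sketch with a made-up 2×2 joint distribution.

```python
# Discrete joint density f_{X1,X2} as a dict; the values are made up
# for illustration but sum to 1.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

# Joint density sums to 1 over dom(X1) x dom(X2).
assert abs(sum(joint.values()) - 1.0) < 1e-12

# Marginal of X1: sum out x2.
marginal_x1 = {}
for (x1, x2), p in joint.items():
    marginal_x1[x1] = marginal_x1.get(x1, 0.0) + p
print(marginal_x1)   # {0: 0.5, 1: 0.5}
```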

  10. Important Multivariate Distributions
      Multinomial distribution (n trials with an m-sided die):
      $P[X_1 = k_1 \wedge \ldots \wedge X_m = k_m] = f_{X_1,...,X_m}(k_1, ..., k_m) = \binom{n}{k_1 \, \ldots \, k_m} \, p_1^{k_1} \cdots p_m^{k_m}$
      with $\binom{n}{k_1 \, \ldots \, k_m} := \frac{n!}{k_1! \cdots k_m!}$
      Multidimensional normal distribution:
      $f_{X_1,...,X_m}(\vec{x}) = \frac{1}{\sqrt{(2\pi)^m \, |\Sigma|}} \, e^{-\frac{1}{2} (\vec{x}-\vec{\mu})^T \Sigma^{-1} (\vec{x}-\vec{\mu})}$
      with covariance matrix Σ with Σ_ij := Cov(X_i, X_j)
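The multinomial pmf translates directly into code, with the coefficient computed exactly in integer arithmetic; the example values below are made up.

```python
from math import factorial

def multinomial_pmf(k, p):
    """pmf of the multinomial distribution for counts k = (k_1,...,k_m)
    and probabilities p = (p_1,...,p_m); n = k_1 + ... + k_m trials."""
    n = sum(k)
    coeff = factorial(n)
    for ki in k:
        coeff //= factorial(ki)   # n! / (k_1! ... k_m!), exact integer
    prob = 1.0
    for ki, pi in zip(k, p):
        prob *= pi**ki
    return coeff * prob

# Example: 6 rolls of a fair 3-sided die, 2 of each outcome.
print(multinomial_pmf((2, 2, 2), (1/3, 1/3, 1/3)))  # ~ 0.1235
```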

  11. Moments
      For a discrete random variable X with density f_X:
      $E[X] = \sum_{k \in M} k \, f_X(k)$ is the expectation value (mean) of X
      $E[X^i] = \sum_{k \in M} k^i \, f_X(k)$ is the i-th moment of X
      $V[X] = E[(X - E[X])^2] = E[X^2] - E[X]^2$ is the variance of X
      For a continuous random variable X with density f_X:
      $E[X] = \int_{-\infty}^{+\infty} x \, f_X(x) \, dx$ is the expectation value of X
      $E[X^i] = \int_{-\infty}^{+\infty} x^i \, f_X(x) \, dx$ is the i-th moment of X
      $V[X] = E[(X - E[X])^2] = E[X^2] - E[X]^2$ is the variance of X
      Theorem: Expectation values are additive: $E[X + Y] = E[X] + E[Y]$ (distributions are not).
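The discrete formulas translate directly into code; the sketch below (a made-up fair-die example) computes mean, i-th moment, and variance from a density given as a dict.

```python
# Discrete density f_X as a dict {k: f_X(k)}.
def expectation(f):
    return sum(k * p for k, p in f.items())

def moment(f, i):
    return sum(k**i * p for k, p in f.items())

def variance(f):
    return moment(f, 2) - expectation(f)**2   # V[X] = E[X^2] - E[X]^2

fair_die = {k: 1/6 for k in range(1, 7)}
print(expectation(fair_die))   # 3.5
print(variance(fair_die))      # ~ 2.917
```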

  12. Properties of Expectation and Variance
      E[aX + b] = a E[X] + b for constants a, b
      E[X_1 + X_2 + ... + X_n] = E[X_1] + E[X_2] + ... + E[X_n]
      (i.e. expectation values are generally additive, but distributions are not!)
      E[X_1 + X_2 + ... + X_N] = E[N] E[X] if X_1, X_2, ..., X_N are independent and identically distributed (iid RVs) with mean E[X] and N is a stopping-time RV
      Var[aX + b] = a² Var[X] for constants a, b
      Var[X_1 + X_2 + ... + X_n] = Var[X_1] + Var[X_2] + ... + Var[X_n] if X_1, X_2, ..., X_n are independent RVs
      Var[X_1 + X_2 + ... + X_N] = E[N] Var[X] + E[X]² Var[N] if X_1, X_2, ..., X_N are iid RVs with mean E[X] and variance Var[X] and N is a stopping-time RV
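The random-sum rules above are easy to check by simulation; the sketch below uses an assumed setup (not from the slides) with X_i uniform on [0,1] and a geometric stopping time, so the mean of X_1 + ... + X_N should come out near E[N] · E[X] = 4 · 0.5 = 2.

```python
import random

random.seed(42)

def random_sum():
    """Sum X_1 + ... + X_N with X_i ~ Uniform[0,1] (E[X] = 0.5) and a
    stopping-time N: after each term, stop with prob 0.25, so E[N] = 4."""
    total = 0.0
    while True:
        total += random.random()
        if random.random() < 0.25:
            return total

samples = [random_sum() for _ in range(200_000)]
print(sum(samples) / len(samples))   # ~ E[N] * E[X] = 2.0
```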

  13. Correlation of Random Variables
      Covariance of random variables X_i and X_j:
      $Cov(X_i, X_j) := E[\,(X_i - E[X_i])\,(X_j - E[X_j])\,]$
      $Var(X_i) = Cov(X_i, X_i) = E[X_i^2] - E[X_i]^2$
      Correlation coefficient of X_i and X_j:
      $\rho(X_i, X_j) := \frac{Cov(X_i, X_j)}{\sqrt{Var(X_i)\,Var(X_j)}}$
      Conditional expectation of X given Y = y:
      $E[X \mid Y = y] = \sum_x x \, f_{X|Y}(x \mid y)$ (discrete case)
      $E[X \mid Y = y] = \int x \, f_{X|Y}(x \mid y) \, dx$ (continuous case)
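Covariance and the correlation coefficient translate directly into code; the sketch below computes population-style sample versions (dividing by n) on made-up data.

```python
def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Sample covariance, straight from the definition (dividing by n)."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

def corr(xs, ys):
    """Correlation coefficient rho = Cov(X,Y) / sqrt(Var(X) Var(Y))."""
    return cov(xs, ys) / (cov(xs, xs) * cov(ys, ys)) ** 0.5

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]       # made-up, positively related data
print(corr(xs, ys))        # ~ 0.85, a strong positive correlation
```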

  14. Transformations of Random Variables
      Consider expressions r(X,Y) over RVs, such as X+Y, max(X,Y), etc.
      1. For each z find $A_z = \{(x,y) \mid r(x,y) \le z\}$
      2. Find the cdf $F_Z(z) = P[r(X,Y) \le z] = \iint_{A_z} f_{X,Y}(x,y) \, dx \, dy$
      3. Find the pdf $f_Z(z) = F'_Z(z)$
      Important case: sum of independent (non-negative) RVs, Z = X + Y:
      $F_Z(z) = P[X + Y \le z] = \iint_{x+y \le z} f_X(x) f_Y(y) \, dx \, dy = \int_{x=0}^{z} \int_{y=0}^{z-x} f_X(x) f_Y(y) \, dy \, dx = \int_{x=0}^{z} f_X(x) \, F_Y(z-x) \, dx$ (convolution)
      or in the discrete case: $F_Z(z) = \sum_{x+y \le z} f_X(x) f_Y(y)$
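In the discrete case the convolution is a double loop over the two pmfs; below is a minimal sketch computing the distribution of the sum of two fair dice.

```python
from collections import defaultdict

def convolve(f_x, f_y):
    """pmf of Z = X + Y for independent discrete RVs with pmfs given
    as dicts {value: probability}."""
    f_z = defaultdict(float)
    for x, px in f_x.items():
        for y, py in f_y.items():
            f_z[x + y] += px * py
    return dict(f_z)

die = {k: 1/6 for k in range(1, 7)}
two_dice = convolve(die, die)      # pmf of the sum of two fair dice
print(two_dice[7])                 # 6/36, ~ 0.167
```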
