

SLIDE 1

Probability Review

Gonzalo Mateos

Dept. of ECE and Goergen Institute for Data Science
University of Rochester
gmateosb@ece.rochester.edu
http://www.ece.rochester.edu/~gmateosb/

September 16, 2020

SLIDE 2

Markov and Chebyshev’s inequalities

Markov and Chebyshev’s inequalities
Convergence of random variables
Limit theorems
Conditional probabilities
Conditional expectation

SLIDE 3

Markov’s inequality

◮ RV X with E[|X|] < ∞, constant a > 0
◮ Markov’s inequality states ⇒ P(|X| ≥ a) ≤ E[|X|]/a

Proof.

◮ I{|X| ≥ a} = 1 when |X| ≥ a and 0 else. Then (see figure)

      a I{|X| ≥ a} ≤ |X|

◮ Use linearity of expected value

      a E[I{|X| ≥ a}] ≤ E[|X|]

(Figure: |X| and the step function a I{|X| ≥ a} versus X, with steps at ±a.)

◮ Indicator function’s expectation = Probability of indicated event

      a P(|X| ≥ a) ≤ E[|X|]
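A minimal Monte Carlo check of the bound (a sketch, not from the slides; NumPy and the unit-mean exponential RV are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=200_000)  # illustrative RV with E[|X|] = 1

for a in (0.5, 1.0, 2.0, 5.0):
    tail = np.mean(np.abs(x) >= a)    # empirical P(|X| >= a)
    bound = np.mean(np.abs(x)) / a    # Markov bound E[|X|]/a
    print(f"a = {a}: P(|X| >= a) ~ {tail:.4f} <= {bound:.4f}")
```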

SLIDE 4

Chebyshev’s inequality

◮ RV X with E[X] = µ and E[(X − µ)²] = σ², constant k > 0
◮ Chebyshev’s inequality states ⇒ P(|X − µ| ≥ k) ≤ σ²/k²

Proof.

◮ Markov’s inequality for the RV Z = (X − µ)² and constant a = k²

      P((X − µ)² ≥ k²) = P(|Z| ≥ k²) ≤ E[|Z|]/k² = E[(X − µ)²]/k²

◮ Notice that (X − µ)² ≥ k² if and only if |X − µ| ≥ k, thus

      P(|X − µ| ≥ k) ≤ E[(X − µ)²]/k²

◮ Chebyshev’s inequality follows from the definition of variance
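The same kind of sanity check for Chebyshev’s bound (a sketch; the standard normal, with µ = 0 and σ² = 1, is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(500_000)  # mu = 0, sigma^2 = 1

for k in (1.0, 2.0, 3.0):
    tail = np.mean(np.abs(x) >= k)  # empirical P(|X - mu| >= k)
    print(f"k = {k}: P(|X - mu| >= k) ~ {tail:.4f} <= sigma^2/k^2 = {1 / k**2:.4f}")
```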

SLIDE 5

Comments and observations

◮ If the absolute expected value is finite, i.e., E[|X|] < ∞
  ⇒ Complementary cdf (ccdf) decreases at least like x⁻¹ (Markov’s)
◮ If mean E[X] and variance E[(X − µ)²] are finite
  ⇒ Ccdf decreases at least like x⁻² (Chebyshev’s)
◮ Most ccdfs decrease exponentially (e.g., like e^{−x²} for the normal)
  ⇒ Power-law bounds ∝ x^{−α} are loose but still useful
◮ Markov’s inequality often derived for nonnegative RV X ≥ 0
  ⇒ Can drop the absolute value to obtain P(X ≥ a) ≤ E[X]/a
  ⇒ General bound P(X ≥ a) ≤ E[X^r]/a^r holds for r > 0
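To see how the exponent r affects tightness, a small sketch (assumptions: X exponential with mean 1 and a = 5, both picked for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=500_000)  # X >= 0 with E[X] = 1
a = 5.0

exact = np.mean(x >= a)           # true tail ~ e^{-5} ~ 0.0067
for r in (1, 2, 3, 4):
    bound = np.mean(x**r) / a**r  # moment bound E[X^r]/a^r
    print(f"r = {r}: bound {bound:.5f} vs P(X >= a) ~ {exact:.5f}")
```

Here larger r tightens the bound, though for heavy-tailed RVs higher moments may not even exist.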

SLIDE 6

Convergence of random variables

Markov and Chebyshev’s inequalities
Convergence of random variables
Limit theorems
Conditional probabilities
Conditional expectation

SLIDE 7

Limits

◮ Sequence of RVs XN = X1, X2, . . . , Xn, . . .
  ⇒ Distinguish between random process XN and realizations xN
Q1) Say something about Xn for n large? ⇒ Not clear, Xn is a RV
Q2) Say something about xn for n large? ⇒ Certainly, look at lim_{n→∞} xn
Q3) Say something about P(Xn ∈ X) for n large? ⇒ Yes, lim_{n→∞} P(Xn ∈ X)
◮ Translate what we know about regular limits to definitions for RVs
◮ Can start from convergence of sequences: lim_{n→∞} xn
  ⇒ Sure and almost sure convergence
◮ Or from convergence of probabilities: lim_{n→∞} P(Xn ∈ X)
  ⇒ Convergence in probability, in mean square and distribution

SLIDE 8

Convergence of sequences and sure convergence

◮ Denote sequence of numbers xN = x1, x2, . . . , xn, . . .
◮ Def: Sequence xN converges to the value x if given any ǫ > 0
  ⇒ There exists n0 such that for all n > n0, |xn − x| < ǫ
◮ Sequence xn comes arbitrarily close to its limit ⇒ |xn − x| < ǫ
  ⇒ And stays close to its limit for all n > n0
◮ Random process (sequence of RVs) XN = X1, X2, . . . , Xn, . . .
  ⇒ Realizations of XN are sequences xN
◮ Def: We say XN converges surely to RV X if
  ⇒ lim_{n→∞} xn = x for all realizations xN of XN
◮ Said differently, lim_{n→∞} Xn(s) = X(s) for all s ∈ S
◮ Not really adequate. Even a (practically unimportant) outcome that happens with vanishingly small probability prevents sure convergence

SLIDE 9

Almost sure convergence

◮ RV X and random process XN = X1, X2, . . . , Xn, . . .
◮ Def: We say XN converges almost surely to RV X if

      P(lim_{n→∞} Xn = X) = 1

  ⇒ Almost all sequences converge, except for a set of measure 0
◮ Almost sure convergence denoted as ⇒ lim_{n→∞} Xn = X a.s.
  ⇒ Limit X is a random variable

Example
◮ X0 ∼ N(0, 1) (normal, mean 0, variance 1)
◮ Zn sequence of Bernoulli RVs, parameter p
◮ Define ⇒ Xn = X0 − Zn/n
◮ Zn/n → 0 so lim_{n→∞} Xn = X0 a.s. (also surely)

(Figure: sample paths of Xn for n = 1, . . . , 100, settling at the realization x0.)
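A one-path sketch of this example (p = 0.5 and the horizon of 100 steps are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
x0 = rng.standard_normal()             # X0 ~ N(0, 1)
n = np.arange(1, 101)
z = rng.binomial(1, 0.5, size=n.size)  # Zn ~ Bernoulli(p), p = 0.5
xn = x0 - z / n                        # Xn = X0 - Zn/n

# deviations |Xn - X0| = Zn/n shrink like 1/n along the whole path
print(np.abs(xn - x0).max(), np.abs(xn[-10:] - x0).max())
```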

SLIDE 10

Almost sure convergence example

◮ Consider S = [0, 1] and let P (·) be the uniform probability distribution

⇒ P ([a, b]) = b − a for 0 ≤ a ≤ b ≤ 1

◮ Define the RVs Xn(s) = s + sⁿ and X(s) = s
◮ For all s ∈ [0, 1) ⇒ sⁿ → 0 as n → ∞, hence Xn(s) → s = X(s)
◮ For s = 1 ⇒ Xn(1) = 2 for all n, while X(1) = 1
◮ Convergence only occurs on the set [0, 1), and P([0, 1)) = 1
  ⇒ We say lim_{n→∞} Xn = X a.s.
  ⇒ Once more, note the limit X is a random variable

SLIDE 11

Convergence in probability

◮ Def: We say XN converges in probability to RV X if for any ǫ > 0

      lim_{n→∞} P(|Xn − X| < ǫ) = 1

  ⇒ Prob. of distance |Xn − X| becoming smaller than ǫ tends to 1
◮ Statement is about probabilities, not about realizations (sequences)
  ⇒ Probability converges, realizations xN may or may not converge
  ⇒ Limit and prob. interchanged with respect to a.s. convergence

Theorem
Almost sure (a.s.) convergence implies convergence in probability

Proof.

◮ If lim_{n→∞} Xn = X then for any ǫ > 0 there is n0 such that

      |Xn − X| < ǫ for all n ≥ n0

◮ True for almost all sequences, so P(|Xn − X| < ǫ) → 1

SLIDE 12

Convergence in probability example

◮ X0 ∼ N(0, 1) (normal, mean 0, variance 1)
◮ Zn sequence of Bernoulli RVs, parameter 1/n
◮ Define ⇒ Xn = X0 − Zn
◮ Xn converges in probability to X0 because

      P(|Xn − X0| < ǫ) = P(|Zn| < ǫ) = 1 − P(Zn = 1) = 1 − 1/n → 1

◮ Plot of path xn up to n = 10², n = 10³, n = 10⁴
  ⇒ Zn = 1 becomes ever rarer but still happens

(Figure: three sample paths of xn over n up to 10², 10³, and 10⁴.)
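A sketch of the same experiment (horizon 10⁴ and ǫ = 0.5 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
x0 = rng.standard_normal()
n = np.arange(1, 10_001)
z = rng.binomial(1, 1.0 / n)  # Zn ~ Bernoulli(1/n), one draw per n
xn = x0 - z                   # Xn = X0 - Zn

eps = 0.5
for lo, hi in ((1, 100), (101, 1000), (1001, 10_000)):
    frac = np.mean(np.abs(xn[lo - 1:hi] - x0) < eps)
    print(f"n in [{lo}, {hi}]: fraction within eps ~ {frac:.4f}")
```

The fraction approaches 1, yet every window still contains a few disturbances Zn = 1.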

SLIDE 13

Difference between a.s. and in probability

◮ Almost sure convergence implies that almost all sequences converge
◮ Convergence in probability does not imply convergence of sequences
◮ Latter example: Xn = X0 − Zn, Zn is Bernoulli with parameter 1/n
  ⇒ Showed it converges in probability, P(|Xn − X0| < ǫ) = 1 − 1/n → 1
  ⇒ But for almost all sequences, lim_{n→∞} xn does not exist
◮ Almost sure convergence ⇒ disturbances stop happening
◮ Convergence in prob. ⇒ disturbances happen with vanishing freq.
◮ Difference not irrelevant
  ◮ Interpret Zn as rate of change in savings
  ◮ With a.s. convergence risk is eliminated
  ◮ With convergence in prob. risk decreases but does not disappear

SLIDE 14

Mean-square convergence

◮ Def: We say XN converges in mean square to RV X if

      lim_{n→∞} E[|Xn − X|²] = 0

  ⇒ Sometimes (very) easy to check

Theorem
Convergence in mean square implies convergence in probability

Proof.

◮ From Markov’s inequality

      P(|Xn − X| ≥ ǫ) = P(|Xn − X|² ≥ ǫ²) ≤ E[|Xn − X|²]/ǫ²

◮ If Xn → X in mean-square sense, E[|Xn − X|²]/ǫ² → 0 for all ǫ

◮ Almost sure and mean square ⇒ neither one implies the other

SLIDE 15

Convergence in distribution

◮ Consider a random process XN. Cdf of Xn is Fn(x)
◮ Def: We say XN converges in distribution to RV X with cdf FX(x) if
  ⇒ lim_{n→∞} Fn(x) = FX(x) for all x at which FX(x) is continuous
◮ No claim about individual sequences, just the cdf of Xn
  ⇒ Weakest form of convergence covered
◮ Implied by almost sure, in probability, and mean square convergence

Example
◮ Yn ∼ N(0, 1)
◮ Zn Bernoulli with parameter p
◮ Define ⇒ Xn = Yn − 10Zn/n
◮ Zn/n → 0 so lim_{n→∞} Fn(x) “=” N(0, 1)

(Figure: one sample path of Xn for n = 1, . . . , 100.)
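A sketch comparing the empirical cdf of Xn against the standard normal at one point (x = −1; p = 0.5 and the sample size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
p, size = 0.5, 200_000

for n in (1, 10, 100):
    xn = rng.standard_normal(size) - 10 * rng.binomial(1, p, size) / n
    # empirical Fn(-1) should approach Phi(-1) ~ 0.1587 as n grows
    print(f"n = {n}: Fn(-1) ~ {np.mean(xn <= -1.0):.4f}")
```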

SLIDE 16

Convergence in distribution (continued)

◮ Individual sequences xn do not converge in any sense
  ⇒ It is the distribution that converges

(Figure: pdfs of Xn for n = 1, n = 10, n = 100, approaching the standard normal.)

◮ As the effect of Zn/n vanishes, the pdf of Xn converges to the pdf of Yn
  ⇒ Standard normal N(0, 1)

SLIDE 17

Implications

◮ Sure ⇒ almost sure ⇒ in probability ⇒ in distribution
◮ Mean square ⇒ in probability ⇒ in distribution
◮ In probability ⇒ in distribution

(Figure: nested implication diagram, with sure inside almost sure inside in probability inside in distribution, and mean square also inside in probability.)

SLIDE 18

Limit theorems

Markov and Chebyshev’s inequalities
Convergence of random variables
Limit theorems
Conditional probabilities
Conditional expectation

SLIDE 19

Sum of independent identically distributed RVs

◮ Independent identically distributed (i.i.d.) RVs X1, X2, . . . , Xn, . . .
◮ Mean E[Xn] = µ and variance E[(Xn − µ)²] = σ² for all n
◮ Q: What happens with the sum SN := Σ_{n=1}^{N} Xn as N grows?
◮ Expected value of sum is E[SN] = Nµ ⇒ Diverges if µ ≠ 0
◮ Variance is E[(SN − Nµ)²] = Nσ² ⇒ Diverges if σ ≠ 0
  (always true unless Xn is a constant, boring)
◮ One interesting normalization ⇒ X̄N := (1/N) Σ_{n=1}^{N} Xn
◮ Now E[X̄N] = µ and var[X̄N] = σ²/N
  ⇒ Law of large numbers (weak and strong)
◮ Another interesting normalization ⇒ ZN := (Σ_{n=1}^{N} Xn − Nµ)/(σ√N)
◮ Now E[ZN] = 0 and var[ZN] = 1 for all values of N
  ⇒ Central limit theorem

SLIDE 20

Law of large numbers

◮ Sequence of i.i.d. RVs X1, X2, . . . , Xn, . . . with mean µ
◮ Define sample average X̄N := (1/N) Σ_{n=1}^{N} Xn

Theorem (Weak law of large numbers)
Sample average X̄N of i.i.d. sequence converges in prob. to µ = E[Xn]

      lim_{N→∞} P(|X̄N − µ| < ǫ) = 1, for all ǫ > 0

Theorem (Strong law of large numbers)
Sample average X̄N of i.i.d. sequence converges a.s. to µ = E[Xn]

      P(lim_{N→∞} X̄N = µ) = 1

◮ Strong law implies weak law. Can forget weak law if so wished
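A quick sketch of the sample average settling at µ (exponential RVs with µ = 2 are an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.exponential(scale=2.0, size=100_000)  # i.i.d. with mu = 2

for N in (10, 1_000, 100_000):
    print(f"N = {N}: sample average {x[:N].mean():.4f} (mu = 2.0)")
```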

SLIDE 21

Proof of weak law of large numbers

◮ Weak law of large numbers is very simple to prove

Proof.

◮ Variance of X̄N vanishes for N large

      var[X̄N] = (1/N²) Σ_{n=1}^{N} var[Xn] = σ²/N → 0

◮ But, what is the variance of X̄N?

      0 ← σ²/N = var[X̄N] = E[(X̄N − µ)²]

◮ Then, X̄N converges to µ in mean-square sense
  ⇒ Which implies convergence in probability

◮ Strong law is a little more challenging. Will not prove it here

SLIDE 22

Coming full circle

◮ Repeated experiment ⇒ Sequence of i.i.d. RVs X1, X2, . . . , Xn, . . .
  ⇒ Consider an event of interest X ∈ E. Ex: coin comes up ‘H’
◮ Fraction of times X ∈ E happens in N experiments is

      X̄N = (1/N) Σ_{n=1}^{N} I{Xn ∈ E}

◮ Since the indicators are also i.i.d., the strong law asserts that

      lim_{N→∞} X̄N = E[I{X1 ∈ E}] = P(X1 ∈ E) a.s.

◮ Strong law consistent with our intuitive notion of probability
  ⇒ Relative frequency of occurrence of an event in many trials
  ⇒ Justifies simulation-based prob. estimates (e.g. histograms)
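A sketch of this frequency interpretation with a fair coin (sample sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
flips = rng.integers(0, 2, size=1_000_000)  # fair coin, 1 codes 'H'

for N in (100, 10_000, 1_000_000):
    # relative frequency of 'H' converges to P('H') = 0.5
    print(f"N = {N}: frequency ~ {np.mean(flips[:N] == 1):.4f}")
```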

SLIDE 23

Central limit theorem (CLT)

Theorem (Central limit theorem)
Consider a sequence of i.i.d. RVs X1, X2, . . . , Xn, . . . with mean E[Xn] = µ and variance E[(Xn − µ)²] = σ² for all n. Then

      lim_{N→∞} P((Σ_{n=1}^{N} Xn − Nµ)/(σ√N) ≤ x) = (1/√(2π)) ∫_{−∞}^{x} e^{−u²/2} du

◮ Former statement implies that for N sufficiently large

      ZN := (Σ_{n=1}^{N} Xn − Nµ)/(σ√N) ∼ N(0, 1)

  ⇒ ZN converges in distribution to a standard normal RV
  ⇒ Remarkable universality. Distribution of Xn arbitrary

SLIDE 24

CLT (continued)

◮ Equivalently can say ⇒ Σ_{n=1}^{N} Xn ∼ N(Nµ, Nσ²)
◮ Sum of a large number of i.i.d. RVs has a normal distribution
  ⇒ Cannot take a meaningful limit here
  ⇒ But intuitively, this is what the CLT states

Example
◮ Binomial RV X with parameters (n, p)
◮ Write as X = Σ_{i=1}^{n} Xi with Xi i.i.d. Bernoulli with parameter p
◮ Mean E[Xi] = p and variance var[Xi] = p(1 − p)
  ⇒ For sufficiently large n ⇒ X ∼ N(np, np(1 − p))
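A sketch of the binomial case (n = 1000, p = 0.3 and the number of replicates are illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)
n, p, reps = 1_000, 0.3, 200_000

x = rng.binomial(n, p, size=reps)            # each draw is a sum of n Bernoulli trials
z = (x - n * p) / np.sqrt(n * p * (1 - p))   # centered and scaled as in the CLT

print(z.mean(), z.var())   # ~ 0 and ~ 1
print(np.mean(z <= 1.0))   # ~ Phi(1) ~ 0.8413
```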

SLIDE 25

Conditional probabilities

Markov and Chebyshev’s inequalities
Convergence of random variables
Limit theorems
Conditional probabilities
Conditional expectation

SLIDE 26

Conditional pmf and cdf for discrete RVs

◮ Recall definition of conditional probability for events E and F

      P(E|F) = P(E ∩ F)/P(F)

  ⇒ Change in likelihoods when information is given, renormalization
◮ Def: Conditional pmf of RV X given Y is (both RVs discrete)

      pX|Y(x|y) = P(X = x | Y = y) = P(X = x, Y = y)/P(Y = y)

◮ Which we can rewrite as

      pX|Y(x|y) = P(X = x, Y = y)/P(Y = y) = pXY(x, y)/pY(y)

  ⇒ Pmf for RV X, given parameter y (“Y not random anymore”)
◮ Def: Conditional cdf is (a range of X conditioned on a value of Y)

      FX|Y(x|y) = P(X ≤ x | Y = y) = Σ_{z≤x} pX|Y(z|y)

SLIDE 27

Conditional pmf example

◮ Consider independent Bernoulli RVs Y and Z, define X = Y + Z
◮ Q: Conditional pmf of X given Y? For X = 0, Y = 0

      pX|Y(0|0) = P(X = 0, Y = 0)/P(Y = 0) = (1 − p)²/(1 − p) = 1 − p

◮ Or, from joint and marginal pmfs (just a matter of definition)

      pX|Y(0|0) = pXY(0, 0)/pY(0) = (1 − p)²/(1 − p) = 1 − p

◮ Can compute the rest analogously

      pX|Y(0|0) = 1 − p,  pX|Y(1|0) = p,      pX|Y(2|0) = 0
      pX|Y(0|1) = 0,      pX|Y(1|1) = 1 − p,  pX|Y(2|1) = p
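A sketch checking these conditional pmf values by simulation (p = 0.3 and the sample size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(9)
p, size = 0.3, 1_000_000
y = rng.binomial(1, p, size)
z = rng.binomial(1, p, size)
x = y + z                      # X = Y + Z

given = (y == 0)               # condition on Y = 0
for k in (0, 1, 2):
    print(f"pX|Y({k}|0) ~ {np.mean(x[given] == k):.4f}")
# expected: 1 - p = 0.7, p = 0.3, 0
```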

SLIDE 28

Conditioning on sum of Poisson RVs

◮ Consider independent Poisson RVs Y and Z, parameters λ1 and λ2
◮ Define X = Y + Z. Q: Conditional pmf of Y given X?

      pY|X(y|x) = P(Y = y, X = x)/P(X = x) = P(Y = y) P(Z = x − y)/P(X = x)

◮ Used Y and Z independent. Now recall X is Poisson, λ = λ1 + λ2

      pY|X(y|x) = [e^{−λ1} λ1^y/y!] × [e^{−λ2} λ2^{x−y}/(x − y)!] × [e^{−(λ1+λ2)} (λ1 + λ2)^x/x!]^{−1}
                = (x!/(y!(x − y)!)) λ1^y λ2^{x−y}/(λ1 + λ2)^x
                = (x choose y) (λ1/(λ1 + λ2))^y (λ2/(λ1 + λ2))^{x−y}

  ⇒ Conditioned on X = x, Y is binomial (x, λ1/(λ1 + λ2))

SLIDE 29

Conditional pdf and cdf for continuous RVs

◮ Def: Conditional pdf of RV X given Y is (both RVs continuous)

      fX|Y(x|y) = fXY(x, y)/fY(y)

◮ For motivation, define intervals ∆x = [x, x + dx] and ∆y = [y, y + dy]
  ⇒ Approximate conditional probability P(X ∈ ∆x | Y ∈ ∆y) as

      P(X ∈ ∆x | Y ∈ ∆y) = P(X ∈ ∆x, Y ∈ ∆y)/P(Y ∈ ∆y) ≈ fXY(x, y)dxdy/(fY(y)dy)

◮ From definition of conditional pdf it follows

      P(X ∈ ∆x | Y ∈ ∆y) ≈ fX|Y(x|y)dx

  ⇒ What we would expect of a density
◮ Def: Conditional cdf is ⇒ FX|Y(x|y) = ∫_{−∞}^{x} fX|Y(u|y)du

SLIDE 30

Communications channel example

◮ Random message (RV) Y, transmit signal y (realization of Y)
◮ Received signal is x = y + z (z realization of random noise)
  ⇒ Model communication system as a relation between RVs X = Y + Z
  ⇒ Model additive noise as Z ∼ N(0, σ²) independent of Y
◮ Q: Conditional pdf of X given Y? Try the definition

      fX|Y(x|y) = fXY(x, y)/fY(y) = ?/fY(y)

  ⇒ Problem is we don’t know fXY(x, y). Have to calculate
◮ Computing conditional probs. typically easier than computing joints

SLIDE 31

Communications channel example (continued)

◮ If Y = y is given, then “Y not random anymore”
  ⇒ It is still random in reality, we are thinking of it as given
◮ If Y were not random, say Y = y with y given, then X = y + Z
  ⇒ Cdf of X given Y = y now easy (use Y and Z independent)

      P(X ≤ x | Y = y) = P(y + Z ≤ x | Y = y) = P(Z ≤ x − y)

◮ But since Z is normal with zero mean and variance σ²

      P(X ≤ x | Y = y) = (1/(√(2π)σ)) ∫_{−∞}^{x−y} e^{−z²/(2σ²)} dz = (1/(√(2π)σ)) ∫_{−∞}^{x} e^{−(z−y)²/(2σ²)} dz

  ⇒ X given Y = y is normal with mean y and variance σ²

SLIDE 32

Digital communications channel

◮ Conditioning is a common tool to compute probabilities
◮ Message 1 (w.p. p) ⇒ Transmit Y = 1
◮ Message 2 (w.p. q) ⇒ Transmit Y = −1
◮ Received signal ⇒ X = Y + Z

(Figure: block diagram, Y = ±1 plus noise Z ∼ N(0, σ²) yields X.)

◮ Decoding rule ⇒ Ŷ = 1 if X ≥ 0, Ŷ = −1 if X < 0
  ⇒ Errors: X < 0 when Y = 1, or X ≥ 0 when Y = −1

(Figure: decision regions on the x axis, Ŷ = −1 for x < 0 and Ŷ = 1 for x ≥ 0.)

◮ Q: What is the probability of error, Pe := P(Ŷ ≠ Y)?

SLIDE 33

Output pdf

◮ From the communications channel example we know
  ⇒ If Y = 1 then X | Y = 1 ∼ N(1, σ²). Conditional pdf is

      fX|Y(x|1) = (1/(√(2π)σ)) e^{−(x−1)²/(2σ²)}

  ⇒ If Y = −1 then X | Y = −1 ∼ N(−1, σ²). Conditional pdf is

      fX|Y(x|−1) = (1/(√(2π)σ)) e^{−(x+1)²/(2σ²)}

(Figure: the two conditional pdfs N(−1, σ²) and N(1, σ²), centered at −1 and 1.)

SLIDE 34

Probability of error

◮ Write probability of error by conditioning on Y = ±1 (total probability)

      Pe = P(Ŷ ≠ Y | Y = 1) P(Y = 1) + P(Ŷ ≠ Y | Y = −1) P(Y = −1)
         = P(Ŷ = −1 | Y = 1) p + P(Ŷ = 1 | Y = −1) q

◮ According to the decision rule

      Pe = P(X < 0 | Y = 1) p + P(X ≥ 0 | Y = −1) q

◮ But X given Y is normally distributed, then

      Pe = (p/(√(2π)σ)) ∫_{−∞}^{0} e^{−(x−1)²/(2σ²)} dx + (q/(√(2π)σ)) ∫_{0}^{∞} e^{−(x+1)²/(2σ²)} dx

(Figure: the error probabilities are the tail areas of the two conditional pdfs across the threshold at 0.)
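Both tails equal Φ(−1/σ), so the error probability has a closed form; a sketch evaluating it (assumes q = 1 − p, since the two messages exhaust the alphabet):

```python
import math

def pe(p: float, sigma: float) -> float:
    """Pe for the +/-1 channel with threshold at 0."""
    # P(X < 0 | Y = 1) = P(X >= 0 | Y = -1) = Phi(-1/sigma)
    phi = 0.5 * math.erfc(1.0 / (sigma * math.sqrt(2.0)))
    return p * phi + (1.0 - p) * phi  # = Phi(-1/sigma) for any p

for sigma in (0.25, 0.5, 1.0):
    print(f"sigma = {sigma}: Pe = {pe(0.5, sigma):.6f}")
```

Less noise (smaller σ) separates the two conditional pdfs and drives Pe to zero.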

SLIDE 35

Conditional expectation

Markov and Chebyshev’s inequalities
Convergence of random variables
Limit theorems
Conditional probabilities
Conditional expectation

SLIDE 36

Definition of conditional expectation

◮ Def: For continuous RVs X, Y, conditional expectation is

      E[X | Y = y] = ∫_{−∞}^{∞} x fX|Y(x|y) dx

◮ Def: For discrete RVs X, Y, conditional expectation is

      E[X | Y = y] = Σ_x x pX|Y(x|y)

◮ Defined for given y ⇒ E[X | Y = y] is a number
  ⇒ All possible values y of Y ⇒ random variable E[X | Y]
◮ E[X | Y] is a function of the RV Y, hence itself a RV
  ⇒ E[X | Y = y] is the value associated with outcome Y = y
◮ If X and Y independent, then E[X | Y] = E[X]

SLIDE 37

Conditional expectation example

◮ Consider independent Bernoulli RVs Y and Z, define X = Y + Z
◮ Q: What is E[X | Y = 0]? Recall we found the conditional pmf

      pX|Y(0|0) = 1 − p,  pX|Y(1|0) = p,      pX|Y(2|0) = 0
      pX|Y(0|1) = 0,      pX|Y(1|1) = 1 − p,  pX|Y(2|1) = p

◮ Use definition of conditional expectation for discrete RVs

      E[X | Y = 0] = Σ_x x pX|Y(x|0) = 0 × (1 − p) + 1 × p + 2 × 0 = p

SLIDE 38

Iterated expectations

◮ If E[X | Y] is a RV, can compute expected value EY[EX[X | Y]]
  ⇒ Subindices clarify innermost expectation is w.r.t. X, outermost w.r.t. Y
◮ Q: What is EY[EX[X | Y]]? Not surprisingly ⇒ E[X] = EY[EX[X | Y]]
◮ Show for discrete RVs (write integrals for continuous)

      EY[EX[X | Y]] = Σ_y EX[X | Y = y] pY(y) = Σ_y (Σ_x x pX|Y(x|y)) pY(y)
                    = Σ_x x (Σ_y pX|Y(x|y) pY(y)) = Σ_x x (Σ_y pXY(x, y)) = Σ_x x pX(x) = E[X]

◮ Offers a useful method to compute expected values
  ⇒ Condition on Y = y ⇒ X | Y = y
  ⇒ Compute expected value over X for given y ⇒ EX[X | Y = y]
  ⇒ Compute expected value over all values y of Y ⇒ EY[EX[X | Y]]

SLIDE 39

Iterated expectations example

◮ Consider a probability class in some university
  ⇒ Seniors get an A = 4 w.p. 0.5, B = 3 w.p. 0.5
  ⇒ Juniors get a B = 3 w.p. 0.6, C = 2 w.p. 0.4
  ⇒ An exchange student is a senior w.p. 0.7, and a junior w.p. 0.3
◮ Q: Expectation of X = exchange student’s grade?
◮ Start by conditioning on standing

      E[X | Senior] = 0.5 × 4 + 0.5 × 3 = 3.5
      E[X | Junior] = 0.6 × 3 + 0.4 × 2 = 2.6

◮ Now sum over standing’s probability

      E[X] = E[X | Senior] P(Senior) + E[X | Junior] P(Junior)
           = 3.5 × 0.7 + 2.6 × 0.3 = 3.23

SLIDE 40

Conditioning on sum of Poisson RVs

◮ Consider independent Poisson RVs Y and Z, parameters λ1 and λ2
◮ Define X = Y + Z. What is E[Y | X = x]?
  ⇒ We found Y | X = x is binomial (x, λ1/(λ1 + λ2)), hence

      E[Y | X = x] = xλ1/(λ1 + λ2)

◮ Now use iterated expectations to obtain E[Y]
  ⇒ Recall X is Poisson with parameter λ = λ1 + λ2

      E[Y] = Σ_{x=0}^{∞} E[Y | X = x] pX(x) = Σ_{x=0}^{∞} (xλ1/(λ1 + λ2)) pX(x)
           = (λ1/(λ1 + λ2)) E[X] = (λ1/(λ1 + λ2)) (λ1 + λ2) = λ1

◮ Of course, since Y is Poisson with parameter λ1

SLIDE 41

Conditioning to compute expectations

◮ As with probabilities, conditioning is useful to compute expectations
  ⇒ Spreads difficulty into simpler problems (divide and conquer)

Example
◮ A baseball player scores Xi runs per game
  ⇒ Expected runs are E[Xi] = E[X] independently of game
◮ Player plays N games in the season. N is random (playoffs, injuries?)
  ⇒ Expected value of number of games is E[N]
◮ What is the expected number of runs in the season? ⇒ E[Σ_{i=1}^{N} Xi]
◮ Both N and Xi are random, and here also assumed independent
  ⇒ The sum Σ_{i=1}^{N} Xi is known as a compound RV

SLIDE 42

Sum of random number of random quantities

Step 1: Condition on N = n, then

      Σ_{i=1}^{N} Xi | N = n  =  Σ_{i=1}^{n} Xi

Step 2: Compute expected value w.r.t. Xi, use N and the Xi independent

      EXi[Σ_{i=1}^{N} Xi | N = n] = EXi[Σ_{i=1}^{n} Xi | N = n] = EXi[Σ_{i=1}^{n} Xi] = nE[X]

  ⇒ Third equality possible because n is a number (not a RV)

Step 3: Compute expected value w.r.t. values n of N

      EN[EXi[Σ_{i=1}^{N} Xi | N]] = EN[NE[X]] = E[N] E[X]

Yielding result ⇒ E[Σ_{i=1}^{N} Xi] = E[N] E[X]

SLIDE 43

Expectation of geometric RV

Ex: Suppose X is a geometric RV with parameter p
◮ Calculate E[X] by conditioning on Y = I{“first trial is a success”}
  ⇒ If Y = 1, then clearly E[X | Y = 1] = 1
  ⇒ If Y = 0, independence of trials yields E[X | Y = 0] = 1 + E[X]
◮ Use iterated expectations

      E[X] = E[X | Y = 1] P(Y = 1) + E[X | Y = 0] P(Y = 0)
           = 1 × p + (1 + E[X]) × (1 − p)

◮ Solving for E[X] yields E[X] = 1/p
◮ Here, direct approach is straightforward (geometric series, derivative)
  ⇒ Oftentimes simplifications can be major

SLIDE 44

The trapped miner example

◮ A miner is trapped in a mine containing three doors
◮ At all times n ≥ 1 while still trapped
  ◮ The miner chooses a door Dn = j, j = 1, 2, 3
  ◮ Choice of door Dn made independently of prior choices
  ◮ Equally likely to pick either door, i.e., P(Dn = j) = 1/3
◮ Each door leads to a tunnel, but only one leads to safety
  ◮ Door 1: the miner reaches safety after two hours of travel
  ◮ Door 2: the miner returns back after three hours of travel
  ◮ Door 3: the miner returns back after five hours of travel
◮ Let X denote the total time traveled till the miner reaches safety
◮ Q: What is E[X]?

SLIDE 45

The trapped miner example (continued)

◮ Calculate E[X] by conditioning on first door choice D1
  ⇒ If D1 = 1, then 2 hours and out, i.e., E[X | D1 = 1] = 2
  ⇒ If D1 = 2, door choices independent so E[X | D1 = 2] = 3 + E[X]
  ⇒ Likewise for D1 = 3, we have E[X | D1 = 3] = 5 + E[X]
◮ Use iterated expectations

      E[X] = Σ_{j=1}^{3} E[X | D1 = j] P(D1 = j) = (1/3) Σ_{j=1}^{3} E[X | D1 = j]
           = (2 + (3 + E[X]) + (5 + E[X]))/3 = (10 + 2E[X])/3

◮ Solving for E[X] yields E[X] = 10
◮ You will solve it again using compound RVs in the homework
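A direct simulation of the miner (a sketch; 100,000 replicates chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(12)
times = {1: 2.0, 2: 3.0, 3: 5.0}  # travel hours behind each door
total, reps = 0.0, 100_000

for _ in range(reps):
    t = 0.0
    while True:
        door = int(rng.integers(1, 4))  # doors 1, 2, 3 equally likely
        t += times[door]
        if door == 1:                   # door 1 leads to safety
            break
    total += t

print(total / reps)                     # ~ E[X] = 10
```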

SLIDE 46

Conditional variance formula

◮ Def: The conditional variance of X given Y = y is

      var[X | Y = y] = E[(X − E[X | Y = y])² | Y = y] = E[X² | Y = y] − (E[X | Y = y])²

  ⇒ var[X | Y] is a function of RV Y, whose value for Y = y is var[X | Y = y]
◮ Calculate var[X] by conditioning on Y = y. Quick guesses?
  ⇒ var[X] = EY[varX(X | Y)]?
  ⇒ var[X] = varY[EX(X | Y)]?
◮ Neither. The following conditional variance formula is the correct way

      var[X] = EY[varX(X | Y)] + varY[EX(X | Y)]

SLIDE 47

Conditional variance formula (continued)

Proof.

◮ Start from the first summand, use linearity, iterated expectations

      EY[varX(X | Y)] = EY[EX(X² | Y) − (EX(X | Y))²]
                      = EY[EX(X² | Y)] − EY[(EX(X | Y))²]
                      = E[X²] − EY[(EX(X | Y))²]

◮ For the second term use variance definition, iterated expectations

      varY[EX(X | Y)] = EY[(EX(X | Y))²] − (EY[EX(X | Y)])²
                      = EY[(EX(X | Y))²] − (E[X])²

◮ Summing up both terms yields (the two EY[(EX(X | Y))²] terms cancel)

      EY[varX(X | Y)] + varY[EX(X | Y)] = E[X²] − (E[X])² = var[X]

SLIDE 48

Variance of a compound RV

◮ Let X1, X2, . . . be i.i.d. RVs with E[X1] = µ and var[X1] = σ²
◮ Let N be a nonnegative integer-valued RV independent of the Xi
◮ Consider the compound RV S = Σ_{i=1}^{N} Xi. What is var[S]?
◮ The conditional variance formula is useful here
◮ Earlier, we found E[S | N] = Nµ. What about var[S | N = n]?

      var[Σ_{i=1}^{N} Xi | N = n] = var[Σ_{i=1}^{n} Xi | N = n] = var[Σ_{i=1}^{n} Xi] = nσ²

  ⇒ var[S | N] = Nσ². Used independence of N and the i.i.d. Xi
◮ The conditional variance formula is var[S] = E[Nσ²] + var[Nµ]

Yielding result ⇒ var[Σ_{i=1}^{N} Xi] = E[N] σ² + var[N] µ²
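A sketch of the compound-variance formula (illustrative choices: N ∼ Poisson(5), so E[N] = var[N] = 5, and Xi ∼ N(µ, σ²) with µ = 2, σ = 1, so given N = n the sum is N(nµ, nσ²)):

```python
import numpy as np

rng = np.random.default_rng(13)
reps, mu, sigma = 200_000, 2.0, 1.0
n = rng.poisson(5.0, size=reps)  # E[N] = var[N] = 5

# sample S directly from its conditional law: S | N = n ~ N(n*mu, n*sigma^2)
s = n * mu + np.sqrt(n) * sigma * rng.standard_normal(reps)

print(s.var())                       # empirical var[S]
print(5.0 * sigma**2 + 5.0 * mu**2)  # E[N] sigma^2 + var[N] mu^2 = 25.0
```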

SLIDE 49

Glossary

◮ Markov’s inequality
◮ Chebyshev’s inequality
◮ Limit of a sequence
◮ Almost sure convergence
◮ Convergence in probability
◮ Mean-square convergence
◮ Convergence in distribution
◮ I.i.d. random variables
◮ Sample average
◮ Centering and scaling
◮ Law of large numbers
◮ Central limit theorem
◮ Conditional distribution
◮ Communication channel
◮ Probability of error
◮ Conditional expectation
◮ Iterated expectations
◮ Expectations by conditioning
◮ Compound random variable
◮ Conditional variance
