SLIDE 1 Probability Review
CMSC 473/673 UMBC
Some slides adapted from 3SLP, Jason Eisner
SLIDE 2
Probability Prerequisites
- Basic probability axioms and definitions
- Joint probability
- Probabilistic Independence
- Marginal probability
- Definition of conditional probability
- Bayes rule
- Probability chain rule
- Expected Value (of a function) of a Random Variable
SLIDE 3 Interpretations of Probability
- Past performance: 58% of the past 100 flips were heads
- Hypothetical performance: if I flipped the coin in many parallel universes…
- Subjective strength of belief: would pay up to 58 cents for a chance to win $1
- Output of some computable formula? p(heads) vs. q(heads)
SLIDE 4 (Most) Probability Axioms
p(everything) = 1
p(nothing) = p(∅) = 0 (∅ is the empty set)
p(A) ≤ p(B), when A ⊆ B
p(A ∪ B) = p(A) + p(B), when A ∩ B = ∅
In general, p(A ∪ B) = p(A) + p(B) − p(A ∩ B), so p(A ∪ B) ≠ p(A) + p(B) when A and B overlap.
[Venn diagram: overlapping events A and B inside everything]
SLIDE 5
Examining p(everything) = 1
If p(everything) = 1…
SLIDE 6
Examining p(everything) = 1
If p(everything) = 1… and you can break everything into N unique items y_1, y_2, …, y_N…
SLIDE 7
Examining p(everything) = 1
If p(everything) = 1… and you can break everything into N unique items y_1, y_2, …, y_N… then each pair y_j and y_k is disjoint (y_j ∩ y_k = ∅)…
SLIDE 8
Examining p(everything) = 1
If p(everything) = 1… and you can break everything into N unique items y_1, y_2, …, y_N… then each pair y_j and y_k is disjoint (y_j ∩ y_k = ∅)… and because everything is the union of y_1, y_2, …, y_N…
SLIDE 9 Examining p(everything) = 1
If p(everything) = 1… and you can break everything into N unique items y_1, y_2, …, y_N… then each pair y_j and y_k is disjoint (y_j ∩ y_k = ∅)… and because everything is the union of y_1, y_2, …, y_N… then

p(everything) = ∑_{j=1}^{N} p(y_j) = 1
SLIDE 10 A Very Important Concept to Remember
The probabilities of all unique (disjoint) items y_1, y_2, …, y_N must sum to 1:

p(everything) = ∑_{j=1}^{N} p(y_j) = 1
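In code, the sum-to-1 requirement is easy to check. A minimal sketch (the outcome names and probabilities below are made-up illustrative values, not from the slides):

```python
# Disjoint outcome probabilities must sum to 1.
p = {"heads": 0.58, "tails": 0.42}
total = sum(p.values())          # p(everything) = sum_j p(y_j)
assert abs(total - 1.0) < 1e-9

# Unnormalized scores become a valid distribution after dividing
# each score by the total (normalization).
scores = {"cat": 3.0, "dog": 1.0}
z = sum(scores.values())
q = {y: s / z for y, s in scores.items()}
assert abs(sum(q.values()) - 1.0) < 1e-9
```

Normalizing by the total is the standard way to turn arbitrary non-negative scores into a distribution that satisfies this axiom.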
SLIDE 11
Probabilities and Random Variables
Random variables: variables that represent the possible outcomes of some random “process”
SLIDE 12 Probabilities and Random Variables
Random variables: variables that represent the possible outcomes of some random “process” Example #1: A (weighted) coin that can come up heads or tails
X is a random variable denoting the possible outcomes:
X=HEADS or X=TAILS
SLIDE 13 Distribution Notation
If X is a R.V. and G is a distribution:
- X ∼ G means X is distributed according to
("sampled from") G
SLIDE 14 Distribution Notation
If X is a R.V. and G is a distribution:
- X ∼ G means X is distributed according to
("sampled from") G
- G often has parameters θ = (θ_1, θ_2, …, θ_N)
that govern its "shape"
- Formally written as X ∼ G(θ)
SLIDE 15 Distribution Notation
If X is a R.V. and G is a distribution:
- X ∼ G means X is distributed according to
("sampled from") G
- G often has parameters θ = (θ_1, θ_2, …, θ_N) that
govern its "shape"
- Formally written as X ∼ G(θ)
i.i.d.: If X_1, X_2, …, X_N are all independently sampled
from G(θ), they are independent and identically distributed
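A minimal sketch of i.i.d. sampling, using the weighted coin from the earlier slides (the parameter value 0.58 and the function name are illustrative assumptions):

```python
import random

# Each draw comes independently from the same weighted-coin
# distribution G(theta), with theta = p(HEADS) = 0.58 (made-up value).
def sample_iid(theta, n, seed=0):
    rng = random.Random(seed)
    # every flip ignores all previous flips: independent and
    # identically distributed
    return ["HEADS" if rng.random() < theta else "TAILS" for _ in range(n)]

flips = sample_iid(0.58, 1000)
frac_heads = flips.count("HEADS") / len(flips)  # should land near theta
```

Because the draws are i.i.d., the empirical fraction of heads converges toward θ as n grows.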
SLIDE 16
Probability Prerequisites
- Basic probability axioms and definitions
- Joint probability
- Probabilistic Independence
- Marginal probability
- Definition of conditional probability
- Bayes rule
- Probability chain rule
- Expected Value (of a function) of a Random Variable
SLIDE 17 Joint Probability
Probability that multiple things “happen together”
[Venn diagram: overlapping events A and B inside everything; their overlap is the joint probability]
SLIDE 18 Joint Probability
Probability that multiple things "happen together": p(x, y), p(x, y, z), p(x, y, w, z)
Symmetric: p(x, y) = p(y, x)
SLIDE 19 Joint Probability
Probability that multiple things "happen together": p(x, y), p(x, y, z), p(x, y, w, z)
Symmetric: p(x, y) = p(y, x)
Form a table based on outcomes: sum across cells = 1

p(x, y)      Y=0   Y=1
X="cat"      .04   .32
X="dog"      .20   .04
X="bird"     .10   .10
X="human"    .10   .10
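The joint table can be stored directly as a dict keyed by outcome pairs; summing every cell verifies it is a valid distribution:

```python
# The joint table from this slide, keyed by (x, y) outcome pairs.
p_xy = {
    ("cat", 0): 0.04, ("cat", 1): 0.32,
    ("dog", 0): 0.20, ("dog", 1): 0.04,
    ("bird", 0): 0.10, ("bird", 1): 0.10,
    ("human", 0): 0.10, ("human", 1): 0.10,
}

# Summing across all cells of the joint table gives 1.
total = sum(p_xy.values())
```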
SLIDE 20 Joint Probabilities
What happens as we add conjuncts?
1 ≥ p(A)
SLIDE 21 Joint Probabilities
What happens as we add conjuncts?
1 ≥ p(A) ≥ p(A, B)
SLIDE 22 Joint Probabilities
What happens as we add conjuncts?
1 ≥ p(A) ≥ p(A, B) ≥ p(A, B, C)
SLIDE 23 Joint Probabilities
What happens as we add conjuncts?
1 ≥ p(A) ≥ p(A, B) ≥ p(A, B, C) ≥ p(A, B, C, D)
SLIDE 24 Joint Probabilities
What happens as we add conjuncts?
1 ≥ p(A) ≥ p(A, B) ≥ p(A, B, C) ≥ p(A, B, C, D) ≥ p(A, B, C, D, E)
Each added conjunct can only keep the probability the same or shrink it.
SLIDE 25 A Note on Notation
p(X INCLUSIVE_OR Y) is written p(X ∪ Y)
p(X AND Y) is written p(X, Y)
p(X, Y) = p(Y, X)
– except when order matters (should be obvious from context)
SLIDE 26
Probability Prerequisites
- Basic probability axioms and definitions
- Joint probability
- Probabilistic Independence
- Marginal probability
- Definition of conditional probability
- Bayes rule
- Probability chain rule
- Expected Value (of a function) of a Random Variable
SLIDE 27 Probabilistic Independence
Independence: when events can occur without affecting each other's probability
Formally: p(x, y) = p(x) · p(y)
Generalizable to > 2 random variables
Q: Are the results of flipping the same coin twice in succession independent?
SLIDE 28 Probabilistic Independence
Independence: when events can occur without affecting each other's probability
Formally: p(x, y) = p(x) · p(y)
Generalizable to > 2 random variables
Q: Are the results of flipping the same coin twice in succession independent? A: Yes (assuming no weird effects)
SLIDE 29 Probabilistic Independence
Independence: when events can occur without affecting each other's probability
Formally: p(x, y) = p(x) · p(y)
Generalizable to > 2 random variables
[Venn diagram: overlapping events A and B inside everything]
Q: Are A and B independent?
SLIDE 30 Probabilistic Independence
Independence: when events can occur without affecting each other's probability
Formally: p(x, y) = p(x) · p(y)
Generalizable to > 2 random variables
[Venn diagram: overlapping events A and B inside everything]
Q: Are A and B independent? A: No (work it out from p(A, B) and the axioms)
SLIDE 31 Probabilistic Independence
Independence: when events can occur without affecting each other's probability
Formally: p(x, y) = p(x) · p(y)
Generalizable to > 2 random variables
Q: Are X and Y independent?
p(x, y)      Y=0   Y=1
X="cat"      .04   .32
X="dog"      .20   .04
X="bird"     .10   .10
X="human"    .10   .10
SLIDE 32 Probabilistic Independence
Independence: when events can occur without affecting each other's probability
Formally: p(x, y) = p(x) · p(y)
Generalizable to > 2 random variables
Q: Are X and Y independent?
p(x, y)      Y=0   Y=1
X="cat"      .04   .32
X="dog"      .20   .04
X="bird"     .10   .10
X="human"    .10   .10
A: No (compute the marginal probabilities p(x) and p(y) and compare with the cells)
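The independence check can be carried out mechanically: compute both marginals from the table, then test whether p(x, y) = p(x) · p(y) holds in every cell. A sketch:

```python
from collections import defaultdict

# Joint table from the slide.
p_xy = {
    ("cat", 0): 0.04, ("cat", 1): 0.32,
    ("dog", 0): 0.20, ("dog", 1): 0.04,
    ("bird", 0): 0.10, ("bird", 1): 0.10,
    ("human", 0): 0.10, ("human", 1): 0.10,
}

# Marginals: p(x) = sum_y p(x, y) and p(y) = sum_x p(x, y).
p_x = defaultdict(float)
p_y = defaultdict(float)
for (x, y), p in p_xy.items():
    p_x[x] += p
    p_y[y] += p

# X and Y are independent iff p(x, y) = p(x) * p(y) in every cell.
independent = all(abs(p - p_x[x] * p_y[y]) < 1e-9
                  for (x, y), p in p_xy.items())
```

Here the very first cell already fails: p("cat", 0) = .04 but p("cat") · p(Y=0) = .36 · .44 ≈ .158, so X and Y are not independent.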
SLIDE 33
Probability Prerequisites
- Basic probability axioms and definitions
- Joint probability
- Probabilistic Independence
- Marginal probability
- Definition of conditional probability
- Bayes rule
- Probability chain rule
- Expected Value (of a function) of a Random Variable
SLIDE 34 Marginal(ized) Probability: The Discrete Case
[Diagram: event y split into the disjoint pieces x_1 & y, x_2 & y, x_3 & y, x_4 & y]
Consider the mutually exclusive ways that different values of x could occur with y
Q: How do we write this in terms of joint probabilities?
SLIDE 35 Marginal(ized) Probability: The Discrete Case
[Diagram: event y split into the disjoint pieces x_1 & y, x_2 & y, x_3 & y, x_4 & y]

p(y) = ∑_x p(x, y)

Consider the mutually exclusive ways that different values of x could occur with y
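Marginalization is just a sum over the joint table. A sketch using the table from the earlier slides (the helper name is mine):

```python
# Joint table from the earlier slides.
p_xy = {
    ("cat", 0): 0.04, ("cat", 1): 0.32,
    ("dog", 0): 0.20, ("dog", 1): 0.04,
    ("bird", 0): 0.10, ("bird", 1): 0.10,
    ("human", 0): 0.10, ("human", 1): 0.10,
}

def marginal_y(p_xy, y):
    # p(y) = sum_x p(x, y): add up every disjoint way x co-occurs with y
    return sum(p for (x, y2), p in p_xy.items() if y2 == y)
```

Since the values of y partition everything, the marginals themselves sum to 1: marginal_y(p_xy, 0) + marginal_y(p_xy, 1) = 1.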
SLIDE 36
Probability Prerequisites
- Basic probability axioms and definitions
- Joint probability
- Probabilistic Independence
- Marginal probability
- Definition of conditional probability
- Bayes rule
- Probability chain rule
- Expected Value (of a function) of a Random Variable
SLIDE 37
Conditional Probability
p(X | Y) = p(X, Y) / p(Y)
Conditional Probabilities are Probabilities
SLIDE 38
Conditional Probability
p(X | Y) = p(X, Y) / p(Y), where p(Y) is the marginal probability of Y
SLIDE 39
Conditional Probability
p(X | Y) = p(X, Y) / p(Y), where p(Y) = ∑_x p(X = x, Y)
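A sketch of the definition in code, again on the joint table from the earlier slides (the function name is mine):

```python
# Joint table from the earlier slides.
p_xy = {
    ("cat", 0): 0.04, ("cat", 1): 0.32,
    ("dog", 0): 0.20, ("dog", 1): 0.04,
    ("bird", 0): 0.10, ("bird", 1): 0.10,
    ("human", 0): 0.10, ("human", 1): 0.10,
}

def conditional_x_given_y(p_xy, x, y):
    # p(x | y) = p(x, y) / p(y), with p(y) = sum_x p(x, y)
    p_y = sum(p for (x2, y2), p in p_xy.items() if y2 == y)
    return p_xy[(x, y)] / p_y
```

This also illustrates "conditional probabilities are probabilities": for a fixed y, the values p(x | y) over all x sum to 1.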
SLIDE 40 Revisiting Marginal Probability: The Discrete Case
[Diagram: event y split into the disjoint pieces x_1 & y, x_2 & y, x_3 & y, x_4 & y]

p(y) = ∑_x p(x, y) = ∑_x p(x) p(y | x)
SLIDE 41
Probability Prerequisites
- Basic probability axioms and definitions
- Joint probability
- Probabilistic Independence
- Marginal probability
- Definition of conditional probability
- Bayes rule
- Probability chain rule
- Expected Value (of a function) of a Random Variable
SLIDE 42 Deriving Bayes Rule
Start with conditional p(X | Y)
SLIDE 43 Deriving Bayes Rule
p(X | Y) = p(X, Y) / p(Y)
Solve for p(x,y)
SLIDE 44 Deriving Bayes Rule
p(X | Y) = p(X, Y) / p(Y)
Solve for p(X, Y): p(X, Y) = p(X | Y) p(Y)
Since p(X, Y) = p(Y, X), also p(X, Y) = p(Y | X) p(X)
Substituting back:
p(X | Y) = p(Y | X) · p(X) / p(Y)
SLIDE 45 Bayes Rule
p(X | Y) = p(Y | X) · p(X) / p(Y)

p(X | Y): posterior probability
p(Y | X): likelihood
p(X): prior probability
p(Y): marginal likelihood (probability)
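A numeric sketch of Bayes rule, using the prior and likelihood values implied by the joint table on the earlier slides (the variable names are mine):

```python
# Prior p(x) and likelihood p(Y=1 | x), derived from the earlier
# joint table: e.g. p(Y=1 | cat) = p(cat, 1) / p(cat) = .32 / .36.
prior = {"cat": 0.36, "dog": 0.24, "bird": 0.20, "human": 0.20}
likelihood = {"cat": 0.32 / 0.36, "dog": 0.04 / 0.24,
              "bird": 0.10 / 0.20, "human": 0.10 / 0.20}

# Marginal likelihood: p(y) = sum_x p(y | x) p(x)
p_y = sum(likelihood[x] * prior[x] for x in prior)

# Posterior: p(x | y) = p(y | x) p(x) / p(y)
posterior = {x: likelihood[x] * prior[x] / p_y for x in prior}
```

Note how the denominator p(y) is itself a marginalization, so the posterior is guaranteed to sum to 1 over x.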
SLIDE 46
Probability Prerequisites
- Basic probability axioms and definitions
- Joint probability
- Probabilistic Independence
- Marginal probability
- Definition of conditional probability
- Bayes rule
- Probability chain rule
- Expected Value (of a function) of a Random Variable
SLIDE 47 Probability Chain Rule
p(x_1, x_2) = p(x_1) p(x_2 | x_1)
Bayes rule
SLIDE 48
Probability Chain Rule
p(x_1, x_2, …, x_T) = p(x_1) p(x_2 | x_1) p(x_3 | x_1, x_2) ⋯ p(x_T | x_1, …, x_{T−1})
SLIDE 49 Probability Chain Rule
p(x_1, x_2, …, x_T) = p(x_1) p(x_2 | x_1) p(x_3 | x_1, x_2) ⋯ p(x_T | x_1, …, x_{T−1}) = ∏_{j=1}^{T} p(x_j | x_1, …, x_{j−1})
SLIDE 50 Probability Chain Rule
p(x_1, x_2, …, x_T) = p(x_1) p(x_2 | x_1) p(x_3 | x_1, x_2) ⋯ p(x_T | x_1, …, x_{T−1}) = ∏_{j=1}^{T} p(x_j | x_1, …, x_{j−1})
extension of Bayes rule
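In code, applying the chain rule is just a product of per-step conditionals. A minimal sketch (the three conditional values are made-up illustrative numbers, not from the slides):

```python
import math

# cond_probs[j] stands for p(x_{j+1} | x_1, ..., x_j):
# here p(x1), p(x2 | x1), p(x3 | x1, x2).
cond_probs = [0.1, 0.4, 0.25]

# Chain rule: the joint probability of the whole sequence is the
# product of the per-step conditionals.
p_sequence = math.prod(cond_probs)
```

This is exactly the decomposition language models use: the probability of a sentence is the product of each word's probability given the preceding words.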
SLIDE 51 Probability Prerequisites
- Basic probability axioms and definitions
- Joint probability
- Probabilistic Independence
- Marginal probability
- Definition of conditional probability
- Bayes rule
- Probability chain rule
- Common distributions
- Expected Value (of a function) of a Random Variable
SLIDE 52 Expected Value of a Random Variable
X ∼ p(·)  (random variable)
SLIDE 53 Expected Value of a Random Variable
X ∼ p(·)  (random variable)
E[X] = ∑_x x p(x)  (expected value; the distribution p is implicit)
SLIDE 54 Expected Value: Example
[Bar chart: uniform distribution over outcomes 1–6, the number of cats I have]

E[X] = ∑_x x p(x) = (1/6)·1 + (1/6)·2 + (1/6)·3 + (1/6)·4 + (1/6)·5 + (1/6)·6 = 3.5
SLIDE 55 Expected Value: Example 2
[Bar chart: non-uniform distribution over outcomes 1–6, the number of cats a normal cat person has]

E[X] = ∑_x x p(x) = (1/2)·1 + (1/10)·2 + (1/10)·3 + (1/10)·4 + (1/10)·5 + (1/10)·6 = 2.5
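Both examples follow the same recipe, so a single helper covers them. A sketch using exact fractions to avoid rounding (the names are mine):

```python
from fractions import Fraction

# The two cat-count distributions from these slides.
uniform = {x: Fraction(1, 6) for x in range(1, 7)}
cat_person = {1: Fraction(1, 2), **{x: Fraction(1, 10) for x in range(2, 7)}}

def expected_value(p):
    # E[X] = sum_x x * p(x)
    return sum(x * px for x, px in p.items())
```

expected_value(uniform) gives 7/2 = 3.5 and expected_value(cat_person) gives 5/2 = 2.5, matching the slides.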
SLIDE 56
Expected Value of a Function of a Random Variable
X ∼ p(·)
E[X] = ∑_x x p(x)
E[g(X)] = ???
SLIDE 57
Expected Value of a Function of a Random Variable
X ∼ p(·)
E[X] = ∑_x x p(x)
E[g(X)] = ∑_x g(x) p(x)
SLIDE 58 Expected Value of Function: Example
[Bar chart: non-uniform distribution over outcomes 1–6, the number of cats I start with]

What if each cat magically becomes two? g(x) = 2^x
E[g(X)] = ∑_x g(x) p(x)
SLIDE 59 Expected Value of Function: Example
[Bar chart: non-uniform distribution over outcomes 1–6, the number of cats I start with]

What if each cat magically becomes two? g(x) = 2^x
E[g(X)] = ∑_x g(x) p(x) = ∑_x 2^x p(x) = (1/2)·2^1 + (1/10)·2^2 + (1/10)·2^3 + (1/10)·2^4 + (1/10)·2^5 + (1/10)·2^6 = 13.4
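The same sum in code, taking g as an argument so any transformation can be plugged in (the helper name is mine):

```python
# Non-uniform cat distribution from the slide.
p = {1: 0.5, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.1}

def expected_g(p, g):
    # E[g(X)] = sum_x g(x) * p(x)
    return sum(g(x) * px for x, px in p.items())

e_gx = expected_g(p, lambda x: 2 ** x)   # ≈ 13.4, matching the slide
```

Note that in general E[g(X)] ≠ g(E[X]): here E[X] = 2.5, and g(2.5) = 2^2.5 ≈ 5.66, nowhere near 13.4.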
SLIDE 60
Probability Prerequisites
- Basic probability axioms and definitions
- Joint probability
- Probabilistic Independence
- Marginal probability
- Definition of conditional probability
- Bayes rule
- Probability chain rule
- Expected Value (of a function) of a Random Variable