Carnegie Mellon University, 10-701 Machine Learning, Spring 2013
Instructors: Alex Smola, Barnabas Poczos
TA: Ina Fiterau (4th-year PhD student, MLD)

Review of Probabilities and Basic Statistics
Recitation 1: Statistics Intro, 1/25/2013
Introduction to Probability Theory

A probability space is a triple (Ω, F, P): a sample space Ω, a σ-field F of events, and a probability measure P with P(Ω) = 1 and countable additivity, i.e. P(⋃_{j=1}^∞ A_j) = Σ_{j=1}^∞ P(A_j) for pairwise disjoint events A_j.
A partition S_1, …, S_n of Ω satisfies S_j ∩ S_k = ∅ for j ≠ k and ⋃_{j=1}^n S_j = Ω.

Law of total probability: P(B) = Σ_{j=1}^n P(B | S_j) P(S_j).

Bayes' theorem for a partition:
P(S_j | B) = P(B | S_j) P(S_j) / Σ_{k=1}^n P(B | S_k) P(S_k)
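As a quick numeric illustration of the law of total probability and Bayes' theorem for a partition (the probabilities below are made up for illustration, not taken from the slides):

```python
# A hypothetical partition S1, S2, S3 of the sample space and an event B.
P_S = [0.5, 0.3, 0.2]          # P(S_j), sums to 1
P_B_given_S = [0.1, 0.4, 0.8]  # P(B | S_j)

# Law of total probability: P(B) = sum_j P(B | S_j) P(S_j)
P_B = sum(pb * ps for pb, ps in zip(P_B_given_S, P_S))

# Bayes' theorem: P(S_j | B) = P(B | S_j) P(S_j) / P(B)
P_S_given_B = [pb * ps / P_B for pb, ps in zip(P_B_given_S, P_S)]

print(P_B)            # 0.05 + 0.12 + 0.16 = 0.33
print(P_S_given_B)    # posterior over the partition, sums to 1
```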
Random Variables
Draw 2 numbers between 1 and 4. Let r.v. X be their sum.

E     11  12  13  14  21  22  23  24  31  32  33  34  41  42  43  44
X(E)   2   3   4   5   3   4   5   6   4   5   6   7   5   6   7   8

Induced probability function on X:

x        2     3     4     5     6     7     8
P(X=x)  1/16  2/16  3/16  4/16  3/16  2/16  1/16
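The induced pmf can be checked by brute-force enumeration of the 16 equally likely outcomes; a small sketch:

```python
from fractions import Fraction
from collections import Counter

# All 16 equally likely ordered pairs (a, b) with a, b in {1, 2, 3, 4}.
counts = Counter(a + b for a in range(1, 5) for b in range(1, 5))

# Induced pmf of X = a + b.
pmf = {x: Fraction(c, 16) for x, c in counts.items()}
```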
The cumulative distribution function of X is F_X(x) = P(X ≤ x) for all x. A CDF satisfies:
lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1
F is non-decreasing
F is right-continuous: lim_{x→x_0, x > x_0} F(x) = F(x_0)
For a discrete r.v., the probability mass function is p_X(x) = P(X = x) for all x.
For a continuous r.v., the density f_X satisfies F_X(x) = ∫_{−∞}^{x} f_X(u) du.
P(x_1 ≤ X ≤ x_2) = ∫_{x_1}^{x_2} f_X(x) dx

This explains why P(X = x) = 0 for continuous distributions:
P(X = x) ≤ lim_{ε→0, ε>0} [F_X(x) − F_X(x − ε)] = 0
The expected value of a function g of a r.v. X ~ P is defined as E[g(X)] = ∫ g(x) p(x) dx.
The n-th moment: μ_n = ∫ x^n p(x) dx.
The n-th central moment: μ_n′ = ∫ (x − μ)^n p(x) dx.
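For a discrete r.v. the integrals become sums; as a sketch, here are the first moment, second moment, and second central moment of the sum-of-two-draws pmf from the earlier example:

```python
# pmf of X = sum of two draws from {1, 2, 3, 4} (from the earlier example).
pmf = {2: 1/16, 3: 2/16, 4: 3/16, 5: 4/16, 6: 3/16, 7: 2/16, 8: 1/16}

mu1 = sum(x * p for x, p in pmf.items())                   # first moment (mean)
mu2 = sum(x**2 * p for x, p in pmf.items())                # second moment
central2 = sum((x - mu1)**2 * p for x, p in pmf.items())   # second central moment (variance)

print(mu1, central2)  # 5.0 2.5
```

Note that central2 equals mu2 − mu1², the usual shortcut Var[X] = E[X²] − (E[X])².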
For jointly distributed r.v.'s, probabilities of events (X, Y) ∈ A are computed from the joint pmf/density p(x, y).
Marginal: p_Y(y) = Σ_x p(x, y).
X_1, …, X_n are independent iff p(x_1, …, x_n) = p_{X_1}(x_1) ⋯ p_{X_n}(x_n).
Example joint pmf of (W, V), with marginals:

        V=0   V=1   V=2  | P(W)
W=2     1/9    0     0   | 1/9
W=3      0    2/9    0   | 2/9
W=4     1/9    0    2/9  | 3/9
W=5      0    2/9    0   | 2/9
W=6     1/9    0     0   | 1/9
P(V)    3/9   4/9   2/9  | 1
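The marginals P(W) and P(V) come from summing the joint pmf over the other variable; a small sketch, reading the table as a dictionary of its nonzero cells:

```python
from fractions import Fraction as F

# Nonzero cells of the joint pmf of (W, V) from the table above.
joint = {
    (2, 0): F(1, 9), (3, 1): F(2, 9), (4, 0): F(1, 9),
    (4, 2): F(2, 9), (5, 1): F(2, 9), (6, 0): F(1, 9),
}

# Marginalize: sum the joint over the other variable.
PW, PV = {}, {}
for (w, v), p in joint.items():
    PW[w] = PW.get(w, 0) + p
    PV[v] = PV.get(v, 0) + p
```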
For X ~ Bernoulli(p):
E[X] = 1·p + 0·(1 − p) = p
Var[X] = (1 − p)²·p + (0 − p)²·(1 − p) = p(1 − p)

If X_1, …, X_n ~ Bern(p), then Y = Σ_{j=1}^n X_j is Binomial(n, p).
Geometric distribution: the number of Bernoulli trials needed to get one success.
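A simulation sketch of the Bernoulli-to-Binomial fact above (the parameters, sample size, and seed are arbitrary choices for illustration):

```python
import random

def binomial_sample(n, p, rng):
    # A sum of n independent Bernoulli(p) draws is one Binomial(n, p) sample.
    return sum(1 if rng.random() < p else 0 for _ in range(n))

rng = random.Random(0)
samples = [binomial_sample(10, 0.3, rng) for _ in range(20_000)]
mean = sum(samples) / len(samples)  # should be close to n*p = 3
```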
Properties of Common Distributions
For a discrete r.v. taking values 0, …, n: E[X] = Σ_{x=0}^{n} x·P(X = x).
In general, Var[X] = E[X²] − (E[X])².
If X_1, X_2 ~ Norm(0, 1), then X_1 ± X_2 ~ N(0, 2) and X_1 / X_2 ~ Cauchy(0, 1).
If X_1 ~ Norm(μ_1, σ_1²), X_2 ~ Norm(μ_2, σ_2²) and X_1 ⊥ X_2, then Z = X_1 + X_2 ~ Norm(μ_1 + μ_2, σ_1² + σ_2²).
If (X, Y) ~ N([μ_x, μ_y], [[σ_X², ρσ_Xσ_Y], [ρσ_Xσ_Y, σ_Y²]]), then X + Y is still normally distributed; the mean is the sum of the means and the variance is σ²_{X+Y} = σ_X² + σ_Y² + 2ρσ_Xσ_Y, where ρ is the correlation.
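A simulation sketch of the independent case: the variance of a sum of two independent normals is σ_1² + σ_2² (the particular means, variances, seed, and sample size are arbitrary):

```python
import random

rng = random.Random(1)
n = 200_000
# X1 ~ N(1, 2^2), X2 ~ N(-1, 1.5^2), drawn independently; Z = X1 + X2.
z = [rng.gauss(1.0, 2.0) + rng.gauss(-1.0, 1.5) for _ in range(n)]

mean = sum(z) / n                          # ~ mu1 + mu2 = 0
var = sum((x - mean) ** 2 for x in z) / n  # ~ 4 + 2.25 = 6.25
```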
Estimators

Sample mean: x̄ = (1/n) Σ_{j=1}^n x_j
Sample variance: (1/n) Σ_{j=1}^n (x_j − x̄)²
The mean squared error of an estimator decomposes into bias and variance:
MSE(θ̂) = E[(θ̂ − θ)²] = Bias(θ̂)² + Var(θ̂)
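A simulation sketch of estimator bias: the 1/n sample variance systematically underestimates the true variance by a factor (n − 1)/n (the distribution, n, trial count, and seed here are arbitrary choices):

```python
import random

rng = random.Random(2)
n, trials = 5, 50_000  # small samples of a standard normal (true variance 1)

biased_estimates = []
for _ in range(trials):
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    m = sum(xs) / n
    biased_estimates.append(sum((x - m) ** 2 for x in xs) / n)  # 1/n version

avg = sum(biased_estimates) / trials
# E[(1/n) sum (x_j - xbar)^2] = (n-1)/n * sigma^2 = 0.8 here; dividing by n-1
# instead of n removes this bias.
```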
Conditional Probabilities

P(X | Y) = P(X, Y) / P(Y); note that X | Y is a different r.v.
X and Y are conditionally independent given Z iff P(X, Y | Z) = P(X | Z) P(Y | Z).

Properties of conditional independence (can you prove these?):
Symmetry: X ⊥ Y | Z ⟺ Y ⊥ X | Z
Decomposition: X ⊥ Y, W | Z ⇒ X ⊥ Y | Z
Weak Union: X ⊥ Y, W | Z ⇒ X ⊥ Y | Z, W
Contraction: (X ⊥ W | Z, Y) and (X ⊥ Y | Z) ⇒ X ⊥ Y, W | Z
Bayes Rule

P(θ | Data) = P(Data | θ) P(θ) / P(Data)
If the posterior distribution P(θ | Data) is in the same family as the prior distribution P(θ), then the prior and the posterior are called conjugate distributions, and P(θ) is called a conjugate prior.
Likelihood                            Conjugate Prior
Bernoulli/Binomial                    Beta
Poisson                               Gamma
(MV) Normal with known (co)variance   Normal
Exponential                           Gamma
Multinomial                           Dirichlet
How to compute the parameters of the Posterior?
I’ll send a derivation
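As one concrete case, here is the standard Beta-Bernoulli update, stated as a sketch rather than a derivation: with a Beta(a, b) prior on the Bernoulli parameter p and k successes in n trials, the posterior is Beta(a + k, b + n − k).

```python
def beta_bernoulli_update(a, b, successes, failures):
    # Conjugate update: Beta(a, b) prior + Bernoulli data -> Beta posterior.
    return a + successes, b + failures

# E.g. prior Beta(2, 2), then observe 7 successes and 3 failures.
a_post, b_post = beta_bernoulli_update(2, 2, successes=7, failures=3)
post_mean = a_post / (a_post + b_post)  # Beta mean a/(a+b) = 9/14
```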
Probabilistic Inference
Problem: You're planning a weekend biking trip with your best friend, Min. Alas, your path to outdoor leisure is strewn with many obstacles: it might rain that day, not counting other factors. Independent of this, Min might be able to bring a tent, the lack of which will only matter if you notice the symptoms of a flu before the trip. Finally, the trip won't happen if your advisor is unhappy with your weekly progress report.
Variables:
O – the outdoor trip happens
A – advisor is happy
R – it rains that day
T – you have a tent
F – you show flu symptoms
[Figure: graphical model over the variables O, A, R, T, F]
How many parameters determine this model?
P(A|O) => 1 parameter
P(R|O) => 1 parameter
P(F, T|O) => 3 parameters
In this problem the values are given; otherwise, we would have had to estimate them.
The weather forecast is optimistic: the chance of rain is 20%. You've barely slacked off this week, so your advisor is probably happy; call it 80%. Luckily, you don't seem to have the flu. What are the chances that the trip will happen? Think of how you would do this.
Hint #1: do the variables F and T influence the result in this case?
Hint #2: the combinations of values for A and R form a partition; apply the law of total probability.
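A sketch of the partition approach. Only P(R) = 0.2 and P(A) = 0.8 come from the problem; the model's conditional probabilities are given on the slides but not reproduced in this text, so the P(O | A, R) values below are made-up placeholders. Since there are no flu symptoms, F and T drop out.

```python
P_A = 0.8  # advisor is happy (from the problem)
P_R = 0.2  # it rains (from the problem)

# Hypothetical P(O = trip happens | A, R); the trip is impossible if the
# advisor is unhappy. These numbers are placeholders, not the slides' values.
P_O_given = {
    (True, False): 0.9,
    (True, True): 0.3,
    (False, False): 0.0,
    (False, True): 0.0,
}

# Law of total probability over the four (A, R) combinations (A and R
# are independent here, so the joint factorizes).
P_O = sum(
    P_O_given[(a, r)]
    * (P_A if a else 1 - P_A)
    * (P_R if r else 1 - P_R)
    for a in (True, False)
    for r in (True, False)
)
# With these placeholder values: 0.9*0.8*0.8 + 0.3*0.8*0.2 = 0.624
```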